Predicting Premier League Matches: Prepare the data

Using Python and Random Forests to predict football matches

Author

Thomas H. Simm

Introduction

The aim of this page is to prepare the data for a model that predicts match results.

Source of the data

The data was downloaded with this code, which uses the website FBREF.

The data from the following seasons were used (season = year the competition started), as earlier years had a different format:

  • 2017
  • 2018
  • 2019
  • 2020
  • 2021

The contents of each downloaded column are described further down.

Methodology

The data gives details of a given match, such as shots on goal and possession, along with the result. But I don’t want to predict the result of a match from the data of that same match. Instead I want to predict the result based on data from previous matches.

  • So the data for a match needs to come from previous matches, with preference given to the most recent ones
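
One way to build such features, sketched here under assumptions rather than taken from the code used for this page, is to shift each team's statistics down by one match so that a row only sees earlier games, and then take an exponentially weighted mean so that recent games count more. The toy frame, the `span=3` setting and the `_prev` suffix are illustrative choices.

```python
import pandas as pd

# Toy data: one row per team per match (values are made up for illustration)
toy = pd.DataFrame({
    "team": ["A", "A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2021-08-14", "2021-08-21", "2021-08-28", "2021-09-11",
                            "2021-08-14", "2021-08-21", "2021-08-28"]),
    "xG":   [1.0, 2.0, 3.0, 0.5, 0.4, 0.8, 1.2],
})

def previous_match_average(df, cols, span=3):
    """Exponentially weighted mean of earlier matches only, per team."""
    df = df.sort_values(["team", "Date"]).copy()
    for col in cols:
        # shift(1) hides the current match; ewm weights recent matches more
        df[col + "_prev"] = (
            df.groupby("team")[col]
              .transform(lambda s: s.shift(1).ewm(span=span, min_periods=1).mean())
        )
    return df

features = previous_match_average(toy, ["xG"])
print(features)
```

The first match of each team has no history, so its `xG_prev` is NaN; such rows would need to be dropped or filled before modelling.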

The second important step with this data is to combine the data of the home and away teams, so that a prediction can be made for a single match.
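
As a minimal sketch of this combination step (an assumption about how it could be done, not the code used for this page), the table can be split on the `Venue` column and the home rows merged with the away rows of the same fixture. The toy frame below assumes team names already agree between `team` and `Opponent`; in the scraped data they do not always (for example `NorwichCity` versus `West Ham`), so a name mapping would be needed first.

```python
import pandas as pd

# Toy data in the same shape as the scraped table: one row per team per match.
# Team names are assumed to already match between `team` and `Opponent`.
toy = pd.DataFrame({
    "Date":     ["2021-08-14", "2021-08-14", "2021-08-21", "2021-08-21"],
    "team":     ["A", "B", "B", "A"],
    "Opponent": ["B", "A", "A", "B"],
    "Venue":    ["Home", "Away", "Home", "Away"],
    "GF":       [2, 1, 0, 0],
    "xG":       [1.8, 0.9, 0.7, 1.1],
})

home = toy[toy["Venue"] == "Home"]
away = toy[toy["Venue"] == "Away"]

# Pair each home row with the away row of the same fixture:
# the home side's opponent is the away side's team, and vice versa
matches = home.merge(
    away,
    left_on=["Date", "team", "Opponent"],
    right_on=["Date", "Opponent", "team"],
    suffixes=("_home", "_away"),
)

matches = matches[["Date", "team_home", "team_away",
                   "GF_home", "GF_away", "xG_home", "xG_away"]]
print(matches)
```

Each fixture now occupies one row, with the home and away statistics side by side.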

The data

Overall match details

  • Date – Date listed is local to the match
  • Time – Time listed is local to the match venue
    • Time is written in 24-hour notation
  • Comp – Competition
    • Number next to competition states which level in the country’s league pyramid this league occupies.
  • Round – Round or Phase of Competition
  • Day – Day of week
  • GF – Goals For
  • GA – Goals Against
  • Opponent – Opposing team

Scores & Fixtures

  • Day – Day of week
  • xG – Expected Goals
    • xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted).
    • Provided by StatsBomb.
    • An underline indicates there is a match that is missing data, but will be updated when available.
  • xGA – Expected Goals Allowed
    • xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted).
    • Provided by StatsBomb.
    • An underline indicates there is a match that is missing data, but will be updated when available.
  • Poss – Possession
    • Calculated as the percentage of passes attempted
  • Formation – Number of players in each row from defenders to forwards, not including the goalkeeper.
    • Formations provided by Data Sports Group and StatsBomb.

Shooting

Standard

  • Gls – Goals scored or allowed
  • Sh – Shots Total
    • Does not include penalty kicks
  • SoT – Shots on target
    • Note: Shots on target do not include penalty kicks
  • SoT% – Percentage of shots that are on target
    • Minimum .395 shots per squad game to qualify as a leader
    • Note: Shots on target do not include penalty kicks
  • G/Sh – Goals per shot
    • Minimum .395 shots per squad game to qualify as a leader
  • G/SoT – Goals per shot on target
    • Minimum .111 shots on target per squad game to qualify as a leader
    • Note: Shots on target do not include penalty kicks
  • Dist – Average distance, in yards, from goal of all shots taken
    • Minimum .395 shots per squad game to qualify as a leader
    • Does not include penalty kicks
  • FK – Shots from free kicks
  • PK – Penalty Kicks Made
  • PKatt – Penalty Kicks Attempted

Expected

  • xG – Expected Goals
    • xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted).
    • Provided by StatsBomb.
    • An underline indicates there is a match that is missing data, but will be updated when available.
  • npxG – Non-Penalty Expected Goals
    • Provided by StatsBomb.
    • An underline indicates there is a match that is missing data, but will be updated when available.
  • npxG/Sh – Non-Penalty Expected Goals per shot
    • Provided by StatsBomb.
    • An underline indicates there is a match that is missing data, but will be updated when available.
    • Minimum .395 shots per squad game to qualify as a leader
  • G-xG – Goals minus Expected Goals
    • xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted).
    • Provided by StatsBomb.
    • An underline indicates there is a match that is missing data, but will be updated when available.
  • np:G-xG – Non-Penalty Goals minus Non-Penalty Expected Goals
    • xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted).
    • Provided by StatsBomb.
    • An underline indicates there is a match that is missing data, but will be updated when available.
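
The ratio columns in the Standard list can be recomputed from the counts, which is a useful check on the definitions. The sketch below uses two rows copied from the table shown later on this page; `safe_ratio` is a hypothetical helper, and neither row involves a penalty, so the penalty-kick exclusions noted above do not affect the check.

```python
import numpy as np
import pandas as pd

# Two rows copied from the table shown later on this page
# (Manchester City v Brighton, 2017-08-12; Norwich City v Tottenham, 2022-05-22)
shots = pd.DataFrame({
    "Gls": [1.0, 0.0],
    "Sh":  [14.0, 9.0],
    "SoT": [4.0, 0.0],
})

def safe_ratio(num, den):
    """Divide two Series, returning NaN where the denominator is zero."""
    return num / den.replace(0, np.nan)

shots["SoT%"]  = (100 * safe_ratio(shots["SoT"], shots["Sh"])).round(1)
shots["G/Sh"]  = safe_ratio(shots["Gls"], shots["Sh"]).round(2)
shots["G/SoT"] = safe_ratio(shots["Gls"], shots["SoT"]).round(2)
print(shots)
```

The first row reproduces the published 28.6, 0.07 and 0.25. The second has no shots on target, so `G/SoT` is undefined and comes out as NaN, as it does in the downloaded data.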

Goalkeeping

Performance

  • SoTA – Shots on Target Against
  • GA – Goals Against
  • Save% – Save Percentage
    • (Shots on Target Against - Goals Against)/Shots on Target Against
    • Note that not all shots on target are stopped by the keeper, many will be stopped by defenders
    • Does not include penalty kicks
  • CS – Clean Sheets
    • Full matches by goalkeeper where no goals are allowed.
  • PSxG – Post-Shot Expected Goals
    • PSxG is expected goals based on how likely the goalkeeper is to save the shot
    • xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted).
    • Provided by StatsBomb.
    • An underline indicates there is a match that is missing data, but will be updated when available.
  • PSxG+/- – Post-Shot Expected Goals minus Goals Allowed
    • Positive numbers suggest better luck or an above average ability to stop shots
    • PSxG is expected goals based on how likely the goalkeeper is to save the shot
    • Note: Does not include own goals
    • xG totals include penalty kicks, but do not include penalty shootouts (unless otherwise noted).
    • Provided by StatsBomb.
    • An underline indicates there is a match that is missing data, but will be updated when available.

Penalty Kicks

  • PKatt – Penalty Kicks Attempted
  • PKA – Penalty Kicks Allowed
  • PKsv – Penalty Kicks Saved
  • PKm – Penalty Kicks Missed

Launched

  • Cmp – Passes Completed
    • Passes longer than 40 yards
  • Att – Passes Attempted
    • Passes longer than 40 yards
  • Cmp% – Pass Completion Percentage
    • Passes longer than 40 yards

Passes

  • Att – Passes Attempted
    • Not including goal kicks
  • Thr – Throws Attempted
  • Launch% – Percentage of Passes that were Launched
    • Not including goal kicks
    • Passes longer than 40 yards
  • AvgLen – Average length of passes, in yards
    • Not including goal kicks

Goal Kicks

  • Att – Goal Kicks Attempted
  • Launch% – Percentage of Goal Kicks that were Launched
    • Passes longer than 40 yards
  • AvgLen – Average length of goal kicks, in yards

Crosses

  • Opp – Opponent’s attempted crosses into penalty area
  • Stp – Number of crosses into penalty area which were successfully stopped by the goalkeeper
  • Stp% – Percentage of crosses into penalty area which were successfully stopped by the goalkeeper

Sweeper

  • #OPA – # of defensive actions outside of penalty area
  • AvgDist – Average distance from goal (in yards) of all defensive actions
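
The Save% and PSxG+/- definitions in the Performance list above can be checked against the data. This is a small sketch using values copied from three rows of the table shown later on this page. The plain formula takes no account of penalties, which the definition says are excluded, so rows where a penalty was conceded may not match the published value.

```python
import pandas as pd

# Values copied from three rows of the table shown later on this page
keeper = pd.DataFrame({
    "SoTA": [2.0, 2.0, 12.0],
    "GA":   [0, 1, 5],
    "PSxG": [0.4, 0.8, 4.6],
})

# Save% = (Shots on Target Against - Goals Against) / Shots on Target Against
keeper["Save%"] = (100 * (keeper["SoTA"] - keeper["GA"]) / keeper["SoTA"]).round(1)

# PSxG+/- = Post-Shot Expected Goals minus Goals Allowed
keeper["PSxG+/-"] = (keeper["PSxG"] - keeper["GA"]).round(1)
print(keeper)
```

These rows give Save% values of 100.0, 50.0 and 58.3 and PSxG+/- values of 0.4, -0.2 and -0.4, in line with the downloaded columns.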

Passing

Total

  • Cmp – Passes Completed
  • Att – Passes Attempted
  • Cmp% – Pass Completion Percentage
    • Minimum 30 minutes played per squad game to qualify as a leader
  • TotDist – Total distance, in yards, that completed passes have traveled in any direction
  • PrgDist – Progressive Distance
    • Total distance, in yards, that completed passes have traveled towards the opponent’s goal. Note: Passes away from opponent’s goal are counted as zero progressive yards.

Short

  • Cmp – Passes Completed
    • Passes between 5 and 15 yards
  • Att – Passes Attempted
    • Passes between 5 and 15 yards
  • Cmp% – Pass Completion Percentage
    • Passes between 5 and 15 yards
    • Minimum 30 minutes played per squad game to qualify as a leader

Medium

  • Cmp – Passes Completed
    • Passes between 15 and 30 yards
  • Att – Passes Attempted
    • Passes between 15 and 30 yards
  • Cmp% – Pass Completion Percentage
    • Passes between 15 and 30 yards
    • Minimum 30 minutes played per squad game to qualify as a leader

Long

  • Cmp – Passes Completed
    • Passes longer than 30 yards
  • Att – Passes Attempted
    • Passes longer than 30 yards
  • Cmp% – Pass Completion Percentage
    • Passes longer than 30 yards
    • Minimum 30 minutes played per squad game to qualify as a leader

Others

  • Ast – Assists
  • xA – xG Assisted
    • xG which follows a pass that assists a shot
    • Provided by StatsBomb.
    • An underline indicates there is a match that is missing data, but will be updated when available.
  • KP – Passes that directly lead to a shot (assisted shots)
  • 1/3 – Completed passes that enter the 1/3 of the pitch closest to the goal
    • Not including set pieces
  • PPA – Completed passes into the 18-yard box
    • Not including set pieces
  • CrsPA – Completed crosses into the 18-yard box
    • Not including set pieces
  • Prog – Progressive Passes
    • Completed passes that move the ball towards the opponent’s goal at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area. Excludes passes from the defending 40% of the pitch
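
Because `Cmp`, `Att` and `Cmp%` appear once per distance band, the loaded table distinguishes them only by a numeric suffix (`Cmp_passing`, `Cmp_passing.1`, and so on). The sketch below renames them to readable names and recomputes the completion percentages for one row copied from the table shown later on this page. The mapping of suffixes to bands is an assumption inferred from the order of the sections above, not something stated in the data.

```python
import pandas as pd

# One row copied from the table shown later on this page
# (Manchester City v Brighton, 2017-08-12)
row = pd.DataFrame({
    "Cmp_passing":   [712.0], "Att_passing":   [808.0],
    "Cmp_passing.1": [297.0], "Att_passing.1": [320.0],
    "Cmp_passing.2": [315.0], "Att_passing.2": [346.0],
    "Cmp_passing.3": [89.0],  "Att_passing.3": [117.0],
})

# Assumed mapping from the repeated column names to the distance bands,
# inferred from the order of the sections above
bands = {"": "Total", ".1": "Short", ".2": "Medium", ".3": "Long"}

renamed = row.rename(columns={
    f"{stat}_passing{suffix}": f"{stat}_{band}"
    for suffix, band in bands.items()
    for stat in ("Cmp", "Att")
})

# Completion percentage per band = completed / attempted
for band in bands.values():
    renamed[f"Cmp%_{band}"] = (100 * renamed[f"Cmp_{band}"] / renamed[f"Att_{band}"]).round(1)

print(renamed.filter(like="Cmp%"))
```

The recomputed values (88.1, 92.8, 91.0 and 76.1) agree with the `Cmp%_passing` columns of that row, which supports the assumed ordering.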

Pass Types

Total

  • Att – Passes Attempted

Pass Types

  • Live – Live-ball passes
  • Dead – Dead-ball passes
    • Includes free kicks, corner kicks, kick offs, throw-ins and goal kicks
  • FK – Passes attempted from free kicks
  • TB – Completed pass sent between back defenders into open space
  • Press – Passes made while under pressure from opponent
  • Sw – Passes that travel more than 40 yards of the width of the pitch
  • Crs – Crosses
  • CK – Corner Kicks

Corner Kicks

  • In – Inswinging Corner Kicks
  • Out – Outswinging Corner Kicks
  • Str – Straight Corner Kicks

Height

  • Ground – Ground passes
  • Low – Passes that leave the ground, but stay below shoulder-level
  • High – Passes that are above shoulder-level at the peak height

Body Parts

  • Left – Passes attempted using left foot
  • Right – Passes attempted using right foot
  • Head – Passes attempted using head
  • TI – Throw-Ins taken
  • Other – Passes attempted using body parts other than the player’s head or feet

Outcomes

  • Cmp – Passes Completed
  • Off – Offsides
  • Out – Out of bounds
  • Int – Intercepted
  • Blocks – Blocked by the opponent who was standing in the path

Goal and Shot Creation

SCA Types

  • SCA – Shot-Creating Actions
    • The two offensive actions directly leading to a shot, such as passes, dribbles and drawing fouls. Note: A single player can receive credit for multiple actions and the shot-taker can also receive credit.
  • PassLive – Completed live-ball passes that lead to a shot attempt
  • PassDead – Completed dead-ball passes that lead to a shot attempt.
    • Includes free kicks, corner kicks, kick offs, throw-ins and goal kicks
  • Drib – Successful dribbles that lead to a shot attempt
  • Sh – Shots that lead to another shot attempt
  • Fld – Fouls drawn that lead to a shot attempt
  • Def – Defensive actions that lead to a shot attempt

GCA Types

  • GCA – Goal-Creating Actions
    • The two offensive actions directly leading to a goal, such as passes, dribbles and drawing fouls. Note: A single player can receive credit for multiple actions and the shot-taker can also receive credit.
  • PassLive – Completed live-ball passes that lead to a goal
  • PassDead – Completed dead-ball passes that lead to a goal.
    • Includes free kicks, corner kicks, kick offs, throw-ins and goal kicks
  • Drib – Successful dribbles that lead to a goal
  • Sh – Shots that lead to another goal-scoring shot
  • Fld – Fouls drawn that lead to a goal
  • Def – Defensive actions that lead to a goal

Defensive Actions

Tackles

  • Tkl – Number of players tackled
  • TklW – Tackles in which the tackler’s team won possession of the ball
  • Def 3rd – Tackles in defensive 1/3
  • Mid 3rd – Tackles in middle 1/3
  • Att 3rd – Tackles in attacking 1/3

Vs Dribbles

  • Tkl – Number of dribblers tackled
  • Att – Number of times dribbled past plus number of tackles
  • Tkl% – Percentage of dribblers tackled
    • Dribblers tackled divided by dribblers tackled plus times dribbled past
    • Minimum .625 dribblers contested per squad game to qualify as a leader
  • Past – Number of times dribbled past by an opposing player

Pressures

  • Press – Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball
  • Succ – Number of times the squad gained possession within five seconds of applying pressure
  • % – Successful Pressure Percentage
    • Percentage of time the squad gained possession within five seconds of applying pressure
    • Minimum 6.44 pressures per squad game to qualify as a leader
  • Def 3rd – Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball, in the defensive 1/3
  • Mid 3rd – Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball, in the middle 1/3
  • Att 3rd – Number of times applying pressure to opposing player who is receiving, carrying or releasing the ball, in the attacking 1/3
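
The two percentages defined in the Vs Dribbles and Pressures lists above follow directly from the counts. The sketch below recomputes them for two rows copied from the table shown later on this page; the short column names are illustrative, with the names used in the loaded table given in the comments.

```python
import pandas as pd

# Two rows copied from the table shown later on this page
defense = pd.DataFrame({
    "Tkl_vs_drib": [4.0, 7.0],      # "Tkl.1" in the loaded table
    "Past":        [10.0, 14.0],
    "Press":       [104.0, 135.0],  # "Press_defense" in the loaded table
    "Succ":        [28.0, 43.0],    # "Succ_defense" in the loaded table
})

# Att = dribblers tackled + times dribbled past
defense["Att"] = defense["Tkl_vs_drib"] + defense["Past"]

# Tkl% = dribblers tackled / (dribblers tackled + times dribbled past)
defense["Tkl%"] = (100 * defense["Tkl_vs_drib"] / defense["Att"]).round(1)

# Successful pressure % = successful pressures / pressures applied
defense["Press%"] = (100 * defense["Succ"] / defense["Press"]).round(1)
print(defense)
```

Both rows reproduce the published values: 28.6 and 33.3 for `Tkl%`, 26.9 and 31.9 for the pressure percentage.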

Blocks

  • Blocks – Number of times blocking the ball by standing in its path
  • Sh – Number of times blocking a shot by standing in its path
  • ShSv – Number of times blocking a shot that was on target, by standing in its path
  • Pass – Number of times blocking a pass by standing in its path
  • Int – Interceptions
  • Tkl+Int – Number of players tackled plus number of interceptions
  • Clr – Clearances
  • Err – Mistakes leading to an opponent’s shot

Possession

  • Poss – Possession
    • Calculated as the percentage of passes attempted

Touches

  • Touches – Number of times a player touched the ball. Note: Receiving a pass, then dribbling, then sending a pass counts as one touch
  • Def Pen – Touches in defensive penalty area
  • Def 3rd – Touches in defensive 1/3
  • Mid 3rd – Touches in middle 1/3
  • Att 3rd – Touches in attacking 1/3
  • Att Pen – Touches in attacking penalty area
  • Live – Live-ball touches. Does not include corner kicks, free kicks, throw-ins, kick-offs, goal kicks or penalty kicks

Dribbles

  • Succ – Dribbles Completed Successfully
  • Att – Dribbles Attempted
  • Succ% – Percentage of Dribbles Completed Successfully
    • Minimum .5 dribbles per squad game to qualify as a leader
  • #Pl – Number of Players Dribbled Past
  • Megs – Number of times a player dribbled the ball through an opposing player’s legs

Carries

  • Carries – Number of times the player controlled the ball with their feet
  • TotDist – Total distance, in yards, a player moved the ball while controlling it with their feet, in any direction
  • PrgDist – Progressive Distance
    • Total distance, in yards, a player moved the ball while controlling it with their feet towards the opponent’s goal
  • Prog – Carries that move the ball towards the opponent’s goal at least 5 yards, or any carry into the penalty area. Excludes carries from the defending 40% of the pitch
  • 1/3 – Carries that enter the 1/3 of the pitch closest to the goal
  • CPA – Carries into the 18-yard box
  • Mis – Number of times a player failed when attempting to gain control of a ball
  • Dis – Number of times a player loses control of the ball after being tackled by an opposing player. Does not include attempted dribbles

Receiving

  • Targ – Number of times a player was the target of an attempted pass
  • Rec – Number of times a player successfully received a pass
  • Rec% – Passes Received Percentage
    • Percentage of time a player successfully received a pass
    • Minimum 30 minutes played per squad game to qualify as a leader
  • Prog – Progressive Passes Received
    • Completed passes that move the ball towards the opponent’s goal at least 10 yards from its furthest point in the last six passes, or any completed pass into the penalty area. Excludes passes from the defending 40% of the pitch

Miscellaneous Stats

Performance

  • CrdY – Yellow Cards
  • CrdR – Red Cards
  • 2CrdY – Second Yellow Card
  • Fls – Fouls Committed
  • Fld – Fouls Drawn
  • Off – Offsides
  • Crs – Crosses
  • Int – Interceptions
  • TklW – Tackles in which the tackler’s team won possession of the ball
  • PKwon – Penalty Kicks Won
  • PKcon – Penalty Kicks Conceded
  • OG – Own Goals
  • Recov – Number of loose balls recovered

Aerial Duels

  • Won – Aerials won
  • Lost – Aerials lost
  • Won% – Percentage of aerials won
    • Minimum .97 aerial duels per squad game to qualify as a leader

Prepare the data

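import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Folder holding the per-season CSV files; the trailing "" keeps a final path separator
cwd = os.getcwd()
folda = os.path.join(cwd, "data", "epl", "")
# Sort the listing so files are always processed in the same order
dira = sorted(os.listdir(folda))
dira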
['dfEPL_2017.csv',
 'dfEPL_2018.csv',
 'dfEPL_2019.csv',
 'dfEPL_2020.csv',
 'dfEPL_2021.csv',
 'epl2017-2021.csv',
 'epl2017-2021_wivnetscore.csv',
 'epl2017-2021_wivnetscoreAndGFGA_both-HA.csv',
 'epl2017-2021_wivnetscoreAndGFGA_both-HA_modPC.csv',
 'epl2017-2021_wivnetscore_both-HA.csv',
 'epl_beforeAVG_HA.csv']

Load the data and combine in one big DataFrame

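#collapse-output
# Read each per-season file (dfEPL_<year>.csv) and tag its rows with the season
frames = []
for d in dira:
    if d.startswith('dfEPL_'):
        df = pd.read_csv(folda+d, index_col=0)
        df['Season'] = int(d.split('.')[0].split('_')[-1])
        frames.append(df)

# Stack the seasons; each file keeps its own index, so index values repeat
dfAll = pd.concat(frames)

with pd.option_context("display.max_columns", None):
    display(dfAll)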
Date Time Comp Round Day Venue Result GF GA Opponent Gls Sh_shooting SoT SoT% G/Sh G/SoT Dist FK_shooting PK PKatt_shooting xG npxG npxG/Sh G-xG np:G-xG Match Report SoTA Saves Save% CS PSxG PSxG+/- PKatt_keeper PKA PKsv PKm Cmp_keeper Att_keeper Cmp%_keeper Att_keeper.1 Thr Launch% AvgLen Att_keeper.2 Launch%.1 AvgLen.1 Opp Stp Stp% #OPA AvgDist Cmp_passing Att_passing Cmp%_passing TotDist_passing PrgDist_passing Cmp_passing.1 Att_passing.1 Cmp%_passing.1 Cmp_passing.2 Att_passing.2 Cmp%_passing.2 Cmp_passing.3 Att_passing.3 Cmp%_passing.3 Ast xA KP 1/3_passing PPA CrsPA Prog_passing Att_passing_types Live_passing_types Dead FK_passing_types TB Press_passing_types Sw Crs_passing_types CK In Out Str Ground Low High Left Right Head TI Other Cmp_passing_types Off_passing_types Out.1 Int_passing_types Blocks_passing_types SCA PassLive PassDead Drib Sh_gca Fld_gca Def GCA PassLive.1 PassDead.1 Drib.1 Sh_gca.1 Fld_gca.1 Def.1 Tkl TklW_defense Def 3rd_defense Mid 3rd_defense Att 3rd_defense Tkl.1 Att_defense Tkl% Past Press_defense Succ_defense % Def 3rd_defense.1 Mid 3rd_defense.1 Att 3rd_defense.1 Blocks_defense Sh_defense ShSv Pass Int_defense Tkl+Int Clr Err Poss Touches Def Pen Def 3rd_possession Mid 3rd_possession Att 3rd_possession Att Pen Live_possession Succ_possession Att_possession Succ% #Pl Megs Carries TotDist_possession PrgDist_possession Prog_possession 1/3_possession CPA Mis Dis Targ Rec Rec% Prog_possession.1 CrdY CrdR 2CrdY Fls Fld_misc Off_misc Crs_misc Int_misc TklW_misc PKwon PKcon OG Recov Won Lost Won% team Season
0 2017-08-12 17:30 Premier League Matchweek 1 Sat Away W 2 0 Brighton 1.0 14.0 4.0 28.6 0.07 0.25 19.4 2.0 0.0 0.0 1.8 1.8 0.14 -0.8 -0.8 Match Report 2.0 2.0 100.0 1.0 0.4 0.4 0.0 0.0 0.0 0.0 2.0 5.0 40.0 20.0 7.0 10.0 22.5 4.0 75.0 56.8 3.0 0.0 0.0 1.0 24.2 712.0 808.0 88.1 13422.0 3465.0 297.0 320.0 92.8 315.0 346.0 91.0 89.0 117.0 76.1 1.0 1.1 9.0 69.0 15.0 1.0 60.0 808.0 766.0 42.0 10.0 2.0 63.0 28.0 16.0 10.0 1.0 5.0 0.0 612.0 105.0 91.0 182.0 570.0 20.0 17.0 10.0 712.0 1.0 8.0 19.0 17.0 22.0 17.0 1.0 0.0 1.0 2.0 1.0 2.0 2.0 0.0 0.0 0.0 0.0 0.0 11.0 4.0 5.0 6.0 0.0 4.0 14.0 28.6 10.0 104.0 28.0 26.9 25.0 42.0 37.0 7.0 2.0 0.0 5.0 13.0 NaN 15.0 0.0 77.0 902.0 30.0 146.0 484.0 329.0 31.0 858.0 8.0 15.0 53.3 9.0 1.0 646.0 3346.0 1924.0 91.0 27.0 3.0 15.0 4.0 794.0 712.0 89.7 60.0 2.0 0.0 0.0 9.0 8.0 1.0 16.0 13.0 4.0 0.0 0.0 0.0 85.0 19.0 17.0 52.8 ManchesterCity 2017
1 2017-08-21 20:00 Premier League Matchweek 2 Mon Home D 1 1 Everton 1.0 20.0 6.0 30.0 0.05 0.17 18.9 1.0 0.0 0.0 1.2 1.2 0.06 -0.2 -0.2 Match Report 2.0 1.0 50.0 0.0 0.8 -0.2 0.0 0.0 0.0 0.0 7.0 10.0 70.0 27.0 5.0 25.9 33.5 6.0 50.0 46.0 3.0 2.0 66.7 1.0 16.0 497.0 611.0 81.3 9615.0 3476.0 199.0 228.0 87.3 217.0 254.0 85.4 72.0 110.0 65.5 0.0 1.0 16.0 39.0 16.0 0.0 67.0 611.0 556.0 55.0 11.0 2.0 135.0 24.0 14.0 7.0 0.0 6.0 0.0 389.0 113.0 109.0 158.0 370.0 22.0 29.0 10.0 497.0 0.0 13.0 10.0 14.0 33.0 27.0 3.0 0.0 1.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 15.0 10.0 9.0 5.0 1.0 7.0 14.0 50.0 7.0 127.0 42.0 33.1 42.0 51.0 34.0 13.0 0.0 0.0 13.0 13.0 NaN 18.0 0.0 63.0 732.0 47.0 190.0 374.0 227.0 42.0 678.0 6.0 15.0 40.0 7.0 0.0 447.0 2863.0 1715.0 77.0 25.0 9.0 12.0 16.0 580.0 497.0 85.7 67.0 2.0 1.0 1.0 8.0 13.0 0.0 14.0 13.0 10.0 0.0 0.0 0.0 108.0 28.0 14.0 66.7 ManchesterCity 2017
2 2017-08-26 12:30 Premier League Matchweek 3 Sat Away W 2 1 Bournemouth 2.0 18.0 8.0 44.4 0.11 0.25 16.4 1.0 0.0 0.0 1.6 1.6 0.09 0.4 0.4 Match Report 3.0 2.0 66.7 0.0 1.1 0.1 0.0 0.0 0.0 0.0 1.0 3.0 33.3 17.0 6.0 11.8 26.9 1.0 100.0 98.0 7.0 1.0 14.3 0.0 18.6 568.0 676.0 84.0 10625.0 3119.0 243.0 268.0 90.7 240.0 280.0 85.7 78.0 107.0 72.9 2.0 1.3 12.0 47.0 14.0 2.0 48.0 676.0 621.0 55.0 12.0 1.0 96.0 16.0 17.0 5.0 2.0 2.0 0.0 445.0 126.0 105.0 201.0 377.0 30.0 35.0 7.0 568.0 3.0 7.0 7.0 22.0 26.0 18.0 3.0 1.0 2.0 2.0 0.0 4.0 3.0 1.0 0.0 0.0 0.0 0.0 8.0 6.0 5.0 1.0 2.0 1.0 11.0 9.1 10.0 91.0 29.0 31.9 15.0 38.0 38.0 13.0 4.0 0.0 9.0 8.0 NaN 23.0 0.0 71.0 802.0 54.0 191.0 403.0 268.0 41.0 748.0 11.0 17.0 64.7 13.0 0.0 566.0 2875.0 1691.0 77.0 25.0 5.0 25.0 15.0 653.0 568.0 87.0 56.0 5.0 1.0 1.0 14.0 16.0 3.0 17.0 8.0 6.0 0.0 0.0 0.0 119.0 30.0 11.0 73.2 ManchesterCity 2017
3 2017-09-09 12:30 Premier League Matchweek 4 Sat Home W 5 0 Liverpool 5.0 13.0 10.0 76.9 0.38 0.50 14.2 0.0 0.0 0.0 2.7 2.7 0.21 2.3 2.3 Match Report 3.0 3.0 100.0 2.0 0.5 0.5 0.0 0.0 0.0 0.0 5.0 8.0 62.5 30.0 5.0 23.3 33.6 3.0 33.3 50.3 0.0 0.0 NaN 1.0 16.0 694.0 773.0 89.8 13656.0 3583.0 254.0 268.0 94.8 338.0 365.0 92.6 94.0 122.0 77.0 5.0 2.6 11.0 32.0 8.0 5.0 44.0 773.0 729.0 44.0 11.0 2.0 111.0 22.0 20.0 8.0 0.0 3.0 0.0 631.0 73.0 69.0 241.0 475.0 20.0 21.0 6.0 694.0 5.0 6.0 18.0 11.0 23.0 17.0 3.0 1.0 0.0 1.0 1.0 10.0 9.0 0.0 1.0 0.0 0.0 0.0 20.0 14.0 9.0 7.0 4.0 7.0 21.0 33.3 14.0 135.0 43.0 31.9 33.0 62.0 40.0 10.0 2.0 0.0 8.0 16.0 NaN 18.0 0.0 65.0 878.0 51.0 182.0 588.0 152.0 26.0 835.0 3.0 12.0 25.0 5.0 2.0 678.0 2392.0 1341.0 54.0 17.0 7.0 10.0 9.0 754.0 694.0 92.0 44.0 2.0 0.0 0.0 12.0 11.0 5.0 20.0 16.0 14.0 0.0 0.0 0.0 62.0 12.0 8.0 60.0 ManchesterCity 2017
4 2017-09-13 20:45 Champions Lg Group stage Wed Away W 4 0 nl Feyenoord 4.0 11.0 8.0 72.7 0.36 0.50 NaN NaN 0.0 0.0 NaN NaN NaN NaN NaN Match Report 1.0 1.0 100.0 1.0 NaN NaN 0.0 0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 22.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 6.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 9.0 NaN NaN NaN 72.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 0.0 0.0 10.0 12.0 6.0 22.0 9.0 6.0 0.0 0.0 0.0 NaN NaN NaN NaN ManchesterCity 2017
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
39 2022-05-08 14:00 Premier League Matchweek 36 Sun Home L 0 4 West Ham 0.0 8.0 2.0 25.0 0.00 0.00 21.5 1.0 0.0 0.0 0.7 0.7 0.08 -0.7 -0.7 Match Report 4.0 1.0 25.0 0.0 2.3 -1.7 1.0 1.0 0.0 0.0 4.0 10.0 40.0 25.0 4.0 28.0 31.3 5.0 60.0 52.0 10.0 0.0 0.0 0.0 11.3 335.0 412.0 81.3 6189.0 2105.0 140.0 158.0 88.6 146.0 162.0 90.1 42.0 72.0 58.3 0.0 0.5 5.0 19.0 6.0 1.0 25.0 412.0 375.0 37.0 10.0 4.0 36.0 11.0 12.0 9.0 7.0 0.0 0.0 306.0 37.0 69.0 102.0 275.0 12.0 8.0 8.0 335.0 3.0 4.0 11.0 14.0 12.0 8.0 0.0 1.0 0.0 2.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 11.0 6.0 3.0 5.0 3.0 3.0 14.0 21.4 11.0 161.0 32.0 19.9 64.0 65.0 32.0 16.0 6.0 0.0 10.0 12.0 NaN 21.0 1.0 37.0 499.0 66.0 188.0 215.0 132.0 19.0 462.0 7.0 8.0 87.5 7.0 1.0 310.0 1933.0 1122.0 55.0 19.0 4.0 8.0 7.0 374.0 335.0 89.6 25.0 1.0 0.0 0.0 15.0 12.0 2.0 12.0 12.0 6.0 0.0 1.0 0.0 57.0 12.0 13.0 48.0 NorwichCity 2021
40 2022-05-11 19:45 Premier League Matchweek 21 Wed Away L 0 3 Leicester City 0.0 9.0 5.0 55.6 0.00 0.00 16.2 0.0 0.0 0.0 1.2 1.2 0.15 -1.2 -1.2 Match Report 8.0 5.0 62.5 0.0 3.0 0.0 0.0 0.0 0.0 0.0 3.0 20.0 15.0 26.0 3.0 50.0 38.9 9.0 77.8 57.4 10.0 1.0 10.0 0.0 8.0 295.0 367.0 80.4 5864.0 2054.0 132.0 148.0 89.2 119.0 133.0 89.5 41.0 74.0 55.4 0.0 0.8 5.0 16.0 7.0 1.0 22.0 367.0 323.0 44.0 11.0 0.0 58.0 18.0 7.0 2.0 2.0 0.0 0.0 250.0 46.0 71.0 90.0 238.0 11.0 18.0 4.0 295.0 0.0 6.0 9.0 11.0 12.0 8.0 1.0 1.0 2.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 20.0 9.0 9.0 9.0 2.0 7.0 16.0 43.8 9.0 264.0 56.0 21.2 102.0 124.0 38.0 19.0 8.0 0.0 11.0 16.0 NaN 26.0 0.0 36.0 511.0 95.0 242.0 194.0 99.0 19.0 468.0 6.0 15.0 40.0 7.0 0.0 310.0 1330.0 723.0 26.0 9.0 1.0 13.0 17.0 307.0 295.0 96.1 22.0 0.0 0.0 0.0 10.0 12.0 0.0 7.0 16.0 9.0 0.0 0.0 0.0 71.0 8.0 14.0 36.4 NorwichCity 2021
41 2022-05-15 14:00 Premier League Matchweek 37 Sun Away D 1 1 Wolves 1.0 11.0 2.0 18.2 0.09 0.50 13.4 0.0 0.0 0.0 1.3 1.3 0.12 -0.3 -0.3 Match Report 4.0 3.0 75.0 0.0 1.1 0.1 0.0 0.0 0.0 0.0 10.0 22.0 45.5 25.0 4.0 60.0 40.9 14.0 50.0 40.6 15.0 0.0 0.0 1.0 14.0 275.0 363.0 75.8 4915.0 1856.0 125.0 143.0 87.4 112.0 131.0 85.5 31.0 73.0 42.5 1.0 1.2 9.0 12.0 2.0 0.0 14.0 363.0 320.0 43.0 5.0 2.0 40.0 6.0 4.0 3.0 3.0 0.0 0.0 243.0 47.0 73.0 120.0 199.0 13.0 19.0 7.0 275.0 0.0 10.0 15.0 6.0 19.0 8.0 4.0 3.0 0.0 1.0 3.0 2.0 1.0 0.0 0.0 0.0 0.0 1.0 24.0 15.0 16.0 6.0 2.0 8.0 22.0 36.4 14.0 180.0 41.0 22.8 99.0 58.0 23.0 10.0 5.0 0.0 5.0 22.0 NaN 23.0 0.0 37.0 465.0 96.0 250.0 173.0 72.0 14.0 423.0 12.0 16.0 75.0 13.0 2.0 275.0 1615.0 844.0 29.0 6.0 5.0 4.0 5.0 340.0 275.0 80.9 14.0 3.0 0.0 0.0 11.0 6.0 0.0 4.0 22.0 15.0 0.0 0.0 0.0 53.0 11.0 16.0 40.7 NorwichCity 2021
42 2022-05-22 16:00 Premier League Matchweek 38 Sun Home L 0 5 Tottenham 0.0 9.0 0.0 0.0 0.00 NaN 17.1 0.0 0.0 0.0 0.3 0.3 0.04 -0.3 -0.3 Match Report 12.0 7.0 58.3 0.0 4.6 -0.4 0.0 0.0 0.0 0.0 4.0 14.0 28.6 28.0 7.0 39.3 39.1 4.0 75.0 59.3 5.0 0.0 0.0 0.0 11.0 335.0 422.0 79.4 7435.0 1726.0 99.0 114.0 86.8 163.0 184.0 88.6 71.0 116.0 61.2 0.0 0.2 5.0 18.0 4.0 1.0 21.0 422.0 383.0 39.0 9.0 0.0 57.0 24.0 12.0 3.0 2.0 1.0 0.0 299.0 42.0 81.0 136.0 244.0 12.0 17.0 8.0 335.0 0.0 9.0 16.0 8.0 12.0 6.0 2.0 1.0 1.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 11.0 5.0 9.0 2.0 0.0 3.0 16.0 18.8 13.0 117.0 32.0 27.4 44.0 40.0 33.0 8.0 4.0 1.0 4.0 16.0 NaN 7.0 1.0 40.0 505.0 61.0 206.0 222.0 101.0 17.0 466.0 6.0 13.0 46.2 8.0 2.0 344.0 1648.0 875.0 42.0 9.0 4.0 11.0 4.0 417.0 335.0 80.3 21.0 3.0 0.0 0.0 17.0 8.0 0.0 12.0 16.0 5.0 0.0 0.0 0.0 58.0 9.0 13.0 40.9 NorwichCity 2021
43 NaN NaN NaN NaN NaN NaN 8-7-28 32 89 NaN 30.0 426.0 131.0 30.8 0.06 0.21 18.1 18.0 3.0 4.0 33.3 31.0 0.09 -3.3 -4.0 NaN 224.0 142.0 64.3 9.0 78.7 -3.3 13.0 9.0 2.0 2.0 250.0 649.0 38.5 1046.0 194.0 46.6 39.6 309.0 52.4 44.1 350.0 17.0 4.9 17.0 12.9 11974.0 15658.0 76.5 242050.0 81553.0 4705.0 5440.0 86.5 5095.0 6038.0 84.4 1985.0 3585.0 55.4 24.0 22.7 254.0 705.0 230.0 47.0 922.0 15658.0 13820.0 1838.0 475.0 27.0 2280.0 518.0 411.0 164.0 87.0 61.0 4.0 10004.0 2064.0 3590.0 4770.0 8734.0 721.0 767.0 253.0 11974.0 48.0 315.0 518.0 399.0 546.0 368.0 59.0 30.0 29.0 40.0 20.0 32.0 25.0 2.0 1.0 0.0 2.0 2.0 678.0 429.0 358.0 256.0 64.0 256.0 643.0 39.8 387.0 6146.0 1570.0 25.5 2410.0 2488.0 1248.0 639.0 207.0 5.0 432.0 673.0 NaN 840.0 9.0 42.7 20165.0 3068.0 8529.0 8372.0 4463.0 692.0 18371.0 289.0 576.0 50.2 317.0 35.0 11712.0 62580.0 33620.0 1194.0 384.0 115.0 415.0 388.0 14702.0 11974.0 81.4 922.0 72.0 1.0 1.0 524.0 548.0 71.0 411.0 673.0 429.0 2.0 12.0 2.0 2883.0 585.0 669.0 46.7 NorwichCity 2021

4884 rows × 177 columns
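
The rows displayed above include more than league matches: row 4 is a Champions League game, and the last row (index 43) has no date and a `Result` of `8-7-28`, which looks like a season totals row. A possible clean-up step, sketched here on a toy frame as an assumption about what comes next rather than as the code used for this page, is to drop undated rows and keep only Premier League fixtures.

```python
import numpy as np
import pandas as pd

# Toy frame mimicking the three kinds of rows seen above:
# league matches, other competitions, and a per-season totals row
toy = pd.DataFrame({
    "Date":   ["2017-08-12", "2017-09-13", "2022-05-22", np.nan],
    "Comp":   ["Premier League", "Champions Lg", "Premier League", np.nan],
    "Result": ["W", "W", "L", "8-7-28"],
    "team":   ["ManchesterCity", "ManchesterCity", "NorwichCity", "NorwichCity"],
})

# Totals rows have no date; dropping them leaves individual matches only
matches = toy.dropna(subset=["Date"])

# Keep league fixtures and parse the date so rows can be sorted chronologically
league = matches[matches["Comp"] == "Premier League"].copy()
league["Date"] = pd.to_datetime(league["Date"])
league = league.sort_values(["team", "Date"]).reset_index(drop=True)
print(league)
```

In the summary further down, `Comp` has `Premier League` as its most frequent value with 3800 rows, which matches 5 seasons × 380 matches × 2 team rows per match.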

#collapse-output
with pd.option_context("display.max_columns", None):
    display(dfAll.describe(include='all'))
Date Time Comp Round Day Venue Result GF GA Opponent Gls Sh_shooting SoT SoT% G/Sh G/SoT Dist FK_shooting PK PKatt_shooting xG npxG npxG/Sh G-xG np:G-xG Match Report SoTA Saves Save% CS PSxG PSxG+/- PKatt_keeper PKA PKsv PKm Cmp_keeper Att_keeper Cmp%_keeper Att_keeper.1 Thr Launch% AvgLen Att_keeper.2 Launch%.1 AvgLen.1 Opp Stp Stp% #OPA AvgDist Cmp_passing Att_passing Cmp%_passing TotDist_passing PrgDist_passing Cmp_passing.1 Att_passing.1 Cmp%_passing.1 Cmp_passing.2 Att_passing.2 Cmp%_passing.2 Cmp_passing.3 Att_passing.3 Cmp%_passing.3 Ast xA KP 1/3_passing PPA CrsPA Prog_passing Att_passing_types Live_passing_types Dead FK_passing_types TB Press_passing_types Sw Crs_passing_types CK In Out Str Ground Low High Left Right Head TI Other Cmp_passing_types Off_passing_types Out.1 Int_passing_types Blocks_passing_types SCA PassLive PassDead Drib Sh_gca Fld_gca Def GCA PassLive.1 PassDead.1 Drib.1 Sh_gca.1 Fld_gca.1 Def.1 Tkl TklW_defense Def 3rd_defense Mid 3rd_defense Att 3rd_defense Tkl.1 Att_defense Tkl% Past Press_defense Succ_defense % Def 3rd_defense.1 Mid 3rd_defense.1 Att 3rd_defense.1 Blocks_defense Sh_defense ShSv Pass Int_defense Tkl+Int Clr Err Poss Touches Def Pen Def 3rd_possession Mid 3rd_possession Att 3rd_possession Att Pen Live_possession Succ_possession Att_possession Succ% #Pl Megs Carries TotDist_possession PrgDist_possession Prog_possession 1/3_possession CPA Mis Dis Targ Rec Rec% Prog_possession.1 CrdY CrdR 2CrdY Fls Fld_misc Off_misc Crs_misc Int_misc TklW_misc PKwon PKcon OG Recov Won Lost Won% team Season
count 4784 4784 4784 4784 4784 4784 4884 4884 4884 4784 4882.000000 4855.000000 4855.000000 4852.000000 4852.000000 4702.000000 4183.000000 4186.000000 4882.000000 4882.000000 4186.000000 4186.000000 4183.000000 4186.000000 4186.000000 4784 4855.000000 4855.000000 4665.000000 4872.000000 4186.000000 4186.000000 4882.000000 4882.000000 4882.000000 4882.000000 4186.000000 4186.000000 4180.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4174.000000 4174.000000 4186.000000 4186.000000 4167.000000 4186.000000 4152.00000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.00000 4186.000000 4880.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4855.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4855.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4855.000000 0.0 4186.000000 4186.000000 4853.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4185.000000 4185.000000 4185.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4186.000000 4882.000000 4882.000000 4882.000000 4855.000000 4855.000000 4851.000000 4855.000000 4855.000000 4855.000000 4429.000000 4429.000000 4882.000000 4186.000000 4186.000000 
4186.000000 4186.000000 4884 4884.000000
unique 932 55 8 57 7 3 102 100 79 196 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 28 NaN
top 2020-07-26 15:00 Premier League Group stage Sat Away W 1 1 Chelsea NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Match Report NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Chelsea NaN
freq 20 1304 3800 204 1955 2379 2024 1458 1530 223 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4784 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 302 NaN
mean NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2.807046 25.024099 8.454377 34.208780 0.107178 0.302303 17.016161 0.916866 0.212618 0.265465 2.650358 2.447611 0.102305 0.325991 0.297731 NaN 7.774253 5.405973 69.542851 0.599343 2.554754 0.007215 0.244982 0.191315 0.038918 0.014748 12.899188 33.093168 41.963493 47.056856 7.976589 49.877926 41.947181 14.445294 65.545208 51.673766 16.774009 1.260870 7.495824 1.270425 14.44039 778.702819 979.663641 77.703321 15260.310081 5068.769231 318.322026 361.893454 86.891519 330.830387 384.102246 84.293048 115.584329 197.11419 57.708624 2.017623 1.808576 17.715719 58.741042 16.486861 3.909221 64.507883 979.663641 886.577162 93.086479 22.768275 1.907788 146.339226 28.565217 25.307930 10.187769 4.153846 3.428571 0.769231 646.799331 132.541806 200.322504 271.279025 588.523650 39.738175 41.058290 12.734353 778.702819 3.241280 17.334448 22.790731 23.553751 38.194458 27.595318 3.312470 2.381749 1.967511 2.090301 0.847109 4.365504 2.998567 0.281892 0.305781 0.356904 0.320115 0.102246 34.440038 20.515345 17.107023 13.040134 4.292881 11.609651 31.862398 36.524319 20.252747 294.902054 85.661252 29.545485 100.644052 128.161968 66.096034 30.767320 7.319637 0.153846 23.447683 23.006797 NaN 47.913521 0.542284 51.159242 1213.857143 127.489250 388.689441 576.553273 322.073101 47.647396 1122.888677 19.130435 32.597707 58.473340 20.764931 1.397516 763.685141 3891.837076 2095.075490 85.597611 25.697013 8.394265 23.881032 23.030100 923.780698 778.700430 82.872934 68.917821 3.081934 0.107743 0.046293 23.768486 22.819773 3.692022 25.307930 23.006797 20.515345 0.222624 0.245202 0.081934 175.380315 36.778309 36.482083 50.300263 NaN 2018.999795
std NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 10.633434 88.799785 30.533485 15.894195 0.106387 0.260790 2.956683 3.218861 0.931731 1.123947 8.894023 8.208661 0.045897 3.151378 3.001137 NaN 26.650273 18.596239 27.910813 2.295011 8.196880 1.103346 0.979508 0.793245 0.237192 0.134986 41.759788 107.882512 17.911863 150.427337 26.089799 23.767444 11.250174 45.771924 31.433860 17.035460 53.173125 4.175571 10.930395 4.553672 4.93127 2605.489763 3208.035980 7.414447 50630.636493 16299.172616 1068.855698 1202.422025 4.805741 1123.135331 1282.943241 6.690822 377.078549 624.91629 11.307517 7.763338 6.131791 58.080111 197.829053 55.986741 12.654073 214.622704 3208.035980 2925.037009 292.535663 72.090205 7.136800 470.335604 93.135721 88.390685 33.119199 13.731657 12.164358 2.915172 2207.910917 423.589872 630.932794 917.313240 1963.754151 125.726147 129.347165 40.881764 2605.489763 10.552076 54.773459 75.022925 74.970220 125.518394 91.755801 10.691361 8.137498 6.618276 6.873735 2.885309 15.374347 10.847460 1.038030 1.256439 1.326036 1.209979 0.497602 108.322327 70.287520 54.061008 41.315181 13.986381 36.726989 100.519999 14.096407 64.120993 927.776033 270.289930 6.137151 318.495026 404.369150 212.614304 96.855501 23.584690 0.649772 73.919927 80.317731 NaN 152.214124 1.913732 12.765971 3922.818669 401.789918 1232.300695 1895.667845 1063.463054 158.530092 3643.085181 62.084728 104.776235 14.357941 67.270208 4.731968 2539.014612 12895.247448 7030.655371 293.742138 86.595433 29.033431 75.264289 73.034432 3043.493873 2605.477967 6.643909 228.821950 10.680582 0.491127 0.269891 81.167475 78.675002 13.030748 88.390685 80.317731 70.287520 0.936620 0.944928 0.389166 552.150279 116.590220 116.000905 10.049628 NaN 1.414431
min NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.020000 -10.300000 -9.700000 NaN 0.000000 0.000000 -100.000000 0.000000 0.000000 -11.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 5.000000 0.000000 0.000000 15.800000 0.000000 0.000000 6.700000 0.000000 0.000000 0.000000 0.000000 2.00000 97.000000 179.000000 49.100000 2201.000000 931.000000 34.000000 54.000000 60.700000 22.000000 41.000000 53.700000 11.000000 46.00000 20.400000 0.000000 0.000000 0.000000 3.000000 0.000000 0.000000 2.000000 179.000000 130.000000 21.000000 0.000000 0.000000 5.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 56.000000 12.000000 41.000000 26.000000 92.000000 2.000000 3.000000 0.000000 97.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 4.000000 0.000000 0.000000 0.000000 0.000000 0.000000 3.000000 0.000000 0.000000 39.000000 10.000000 11.500000 4.000000 11.000000 3.000000 2.000000 0.000000 0.000000 1.000000 0.000000 NaN 1.000000 0.000000 17.000000 299.000000 16.000000 56.000000 91.000000 25.000000 0.000000 261.000000 0.000000 3.000000 0.000000 0.000000 0.000000 105.000000 446.000000 188.000000 3.000000 0.000000 0.000000 0.000000 0.000000 150.000000 97.000000 55.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 37.000000 1.000000 1.000000 10.000000 NaN 2017.000000
25% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.000000 9.000000 2.000000 23.800000 0.000000 0.000000 15.100000 0.000000 0.000000 0.000000 0.800000 0.700000 0.070000 -0.600000 -0.600000 NaN 2.000000 1.000000 50.000000 0.000000 0.600000 -0.400000 0.000000 0.000000 0.000000 0.000000 4.000000 10.000000 30.000000 19.000000 2.000000 30.800000 33.000000 5.000000 42.900000 38.600000 5.000000 0.000000 0.000000 0.000000 11.30000 289.250000 395.000000 73.000000 5805.250000 2197.000000 117.000000 139.000000 84.100000 114.000000 141.000000 80.625000 45.000000 88.00000 49.600000 0.000000 0.500000 6.000000 20.000000 5.000000 1.000000 23.000000 395.000000 346.250000 42.000000 9.000000 0.000000 54.000000 10.000000 8.000000 3.000000 1.000000 0.000000 0.000000 224.000000 53.000000 86.000000 99.000000 220.000000 15.000000 17.000000 4.000000 289.250000 1.000000 7.000000 7.000000 9.000000 13.000000 9.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 14.000000 8.000000 6.000000 5.000000 1.000000 4.000000 12.000000 27.300000 7.000000 122.000000 35.000000 25.400000 36.000000 51.000000 25.000000 12.000000 2.000000 0.000000 9.000000 7.000000 NaN 16.000000 0.000000 42.000000 521.000000 53.000000 171.000000 227.000000 120.000000 16.000000 473.000000 7.000000 12.000000 50.000000 7.000000 0.000000 289.000000 1457.500000 743.000000 27.000000 8.000000 2.000000 9.000000 9.000000 365.000000 289.250000 78.700000 25.000000 1.000000 0.000000 0.000000 9.000000 9.000000 1.000000 8.000000 7.000000 8.000000 0.000000 0.000000 0.000000 78.000000 13.000000 13.000000 44.000000 NaN 2018.000000
50% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.000000 12.000000 4.000000 33.300000 0.090000 0.290000 16.900000 0.000000 0.000000 0.000000 1.300000 1.100000 0.090000 -0.100000 -0.100000 NaN 4.000000 3.000000 71.400000 0.000000 1.200000 0.000000 0.000000 0.000000 0.000000 0.000000 6.000000 17.000000 40.000000 23.000000 4.000000 50.000000 40.900000 7.000000 71.400000 54.300000 8.000000 0.000000 0.000000 0.000000 14.00000 384.000000 491.000000 78.600000 7600.500000 2575.000000 155.500000 178.000000 87.700000 162.000000 189.000000 85.600000 58.000000 100.00000 57.800000 1.000000 0.800000 9.000000 28.000000 8.000000 2.000000 32.000000 491.000000 441.000000 48.000000 12.000000 1.000000 72.000000 14.000000 12.000000 5.000000 2.000000 1.000000 0.000000 315.000000 66.000000 102.000000 129.000000 290.000000 20.000000 21.000000 6.000000 384.000000 1.000000 9.000000 11.000000 12.000000 19.000000 13.000000 2.000000 1.000000 1.000000 1.000000 0.000000 2.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 17.000000 10.000000 8.000000 6.000000 2.000000 6.000000 16.000000 36.250000 10.000000 148.000000 43.000000 29.300000 49.000000 64.000000 33.000000 15.000000 3.000000 0.000000 12.000000 11.000000 NaN 23.000000 0.000000 51.000000 611.000000 64.000000 198.000000 288.000000 155.000000 23.000000 564.000000 9.000000 16.000000 58.800000 10.000000 0.000000 379.000000 1935.500000 1024.500000 40.000000 12.000000 4.000000 12.000000 12.000000 460.500000 384.000000 83.800000 34.000000 1.000000 0.000000 0.000000 12.000000 12.000000 2.000000 12.000000 11.000000 10.000000 0.000000 0.000000 0.000000 89.000000 18.000000 18.000000 50.000000 NaN 2019.000000
75% NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2.000000 16.000000 6.000000 43.800000 0.170000 0.500000 18.700000 1.000000 0.000000 0.000000 1.900000 1.700000 0.120000 0.600000 0.600000 NaN 6.000000 4.000000 100.000000 1.000000 1.900000 0.400000 0.000000 0.000000 0.000000 0.000000 9.000000 24.000000 50.000000 29.000000 6.000000 68.400000 50.400000 10.000000 100.000000 66.000000 11.000000 1.000000 12.500000 1.000000 16.90000 502.000000 605.000000 83.300000 9785.250000 3022.750000 203.000000 228.000000 90.300000 223.000000 252.000000 89.400000 73.000000 114.00000 66.100000 2.000000 1.300000 12.000000 39.000000 11.000000 3.000000 43.000000 605.000000 557.000000 53.750000 14.000000 2.000000 93.000000 19.000000 17.000000 7.000000 3.000000 3.000000 1.000000 433.000000 82.000000 119.000000 169.000000 379.000000 25.000000 25.000000 9.000000 502.000000 3.000000 11.000000 16.000000 15.000000 26.000000 19.000000 3.000000 2.000000 2.000000 2.000000 1.000000 4.000000 2.000000 0.000000 0.000000 0.000000 0.000000 0.000000 21.000000 13.000000 11.000000 9.000000 3.000000 8.000000 20.000000 45.500000 13.000000 179.000000 53.000000 33.500000 65.000000 80.000000 43.000000 20.000000 5.000000 0.000000 15.000000 16.000000 NaN 32.000000 0.000000 61.000000 723.000000 77.000000 226.000000 358.000000 206.000000 32.000000 676.000000 13.000000 21.000000 68.150000 14.000000 1.000000 491.000000 2538.000000 1391.750000 58.000000 17.000000 6.000000 15.000000 15.000000 578.000000 502.000000 87.800000 45.000000 2.000000 0.000000 0.000000 15.000000 14.000000 3.000000 17.000000 16.000000 13.000000 0.000000 0.000000 0.000000 102.000000 24.000000 24.000000 56.700000 NaN 2020.000000
max NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 162.000000 1140.000000 418.000000 100.000000 1.000000 1.000000 34.900000 50.000000 18.000000 22.000000 115.600000 107.600000 0.370000 59.200000 57.500000 NaN 246.000000 184.000000 100.000000 33.000000 78.700000 11.900000 13.000000 13.000000 3.000000 3.000000 435.000000 1122.000000 100.000000 1702.000000 290.000000 100.000000 78.000000 417.000000 100.000000 106.000000 464.000000 53.000000 100.000000 80.000000 85.00000 32015.000000 37015.000000 93.000000 597272.000000 176070.000000 14699.000000 16107.000000 97.700000 14051.000000 15205.000000 96.400000 4291.000000 6303.00000 89.000000 127.000000 82.600000 676.000000 2552.000000 734.000000 135.000000 2578.000000 37015.000000 34410.000000 2728.000000 751.000000 136.000000 5062.000000 1145.000000 1035.000000 398.000000 170.000000 267.000000 51.000000 28713.000000 4655.000000 5891.000000 14393.000000 27064.000000 1190.000000 1223.000000 419.000000 32015.000000 110.000000 509.000000 962.000000 744.000000 1471.000000 1128.000000 130.000000 119.000000 97.000000 81.000000 39.000000 216.000000 164.000000 14.000000 19.000000 22.000000 17.000000 10.000000 901.000000 670.000000 476.000000 377.000000 168.000000 373.000000 916.000000 100.000000 585.000000 8089.000000 2429.000000 53.800000 2948.000000 3705.000000 2388.000000 867.000000 251.000000 9.000000 688.000000 846.000000 NaN 1486.000000 29.000000 85.000000 43032.000000 3617.000000 12063.000000 22515.000000 13154.000000 2090.000000 40508.000000 695.000000 1086.000000 100.000000 743.000000 70.000000 31423.000000 154478.000000 89994.000000 4119.000000 1150.000000 463.000000 697.000000 735.000000 35814.000000 32015.000000 99.600000 2725.000000 110.000000 7.000000 4.000000 836.000000 889.000000 170.000000 1035.000000 846.000000 670.000000 16.000000 12.000000 5.000000 5087.000000 1155.000000 1193.000000 90.900000 NaN 2021.000000
#collapse-output
[ (x,dfAll[x].dtype) for x in dfAll]
[('Date', dtype('O')),
 ('Time', dtype('O')),
 ('Comp', dtype('O')),
 ('Round', dtype('O')),
 ('Day', dtype('O')),
 ('Venue', dtype('O')),
 ('Result', dtype('O')),
 ('GF', dtype('O')),
 ('GA', dtype('O')),
 ('Opponent', dtype('O')),
 ('Gls', dtype('float64')),
 ('Sh_shooting', dtype('float64')),
 ('SoT', dtype('float64')),
 ('SoT%', dtype('float64')),
 ('G/Sh', dtype('float64')),
 ('G/SoT', dtype('float64')),
 ('Dist', dtype('float64')),
 ('FK_shooting', dtype('float64')),
 ('PK', dtype('float64')),
 ('PKatt_shooting', dtype('float64')),
 ('xG', dtype('float64')),
 ('npxG', dtype('float64')),
 ('npxG/Sh', dtype('float64')),
 ('G-xG', dtype('float64')),
 ('np:G-xG', dtype('float64')),
 ('Match Report', dtype('O')),
 ('SoTA', dtype('float64')),
 ('Saves', dtype('float64')),
 ('Save%', dtype('float64')),
 ('CS', dtype('float64')),
 ('PSxG', dtype('float64')),
 ('PSxG+/-', dtype('float64')),
 ('PKatt_keeper', dtype('float64')),
 ('PKA', dtype('float64')),
 ('PKsv', dtype('float64')),
 ('PKm', dtype('float64')),
 ('Cmp_keeper', dtype('float64')),
 ('Att_keeper', dtype('float64')),
 ('Cmp%_keeper', dtype('float64')),
 ('Att_keeper.1', dtype('float64')),
 ('Thr', dtype('float64')),
 ('Launch%', dtype('float64')),
 ('AvgLen', dtype('float64')),
 ('Att_keeper.2', dtype('float64')),
 ('Launch%.1', dtype('float64')),
 ('AvgLen.1', dtype('float64')),
 ('Opp', dtype('float64')),
 ('Stp', dtype('float64')),
 ('Stp%', dtype('float64')),
 ('#OPA', dtype('float64')),
 ('AvgDist', dtype('float64')),
 ('Cmp_passing', dtype('float64')),
 ('Att_passing', dtype('float64')),
 ('Cmp%_passing', dtype('float64')),
 ('TotDist_passing', dtype('float64')),
 ('PrgDist_passing', dtype('float64')),
 ('Cmp_passing.1', dtype('float64')),
 ('Att_passing.1', dtype('float64')),
 ('Cmp%_passing.1', dtype('float64')),
 ('Cmp_passing.2', dtype('float64')),
 ('Att_passing.2', dtype('float64')),
 ('Cmp%_passing.2', dtype('float64')),
 ('Cmp_passing.3', dtype('float64')),
 ('Att_passing.3', dtype('float64')),
 ('Cmp%_passing.3', dtype('float64')),
 ('Ast', dtype('float64')),
 ('xA', dtype('float64')),
 ('KP', dtype('float64')),
 ('1/3_passing', dtype('float64')),
 ('PPA', dtype('float64')),
 ('CrsPA', dtype('float64')),
 ('Prog_passing', dtype('float64')),
 ('Att_passing_types', dtype('float64')),
 ('Live_passing_types', dtype('float64')),
 ('Dead', dtype('float64')),
 ('FK_passing_types', dtype('float64')),
 ('TB', dtype('float64')),
 ('Press_passing_types', dtype('float64')),
 ('Sw', dtype('float64')),
 ('Crs_passing_types', dtype('float64')),
 ('CK', dtype('float64')),
 ('In', dtype('float64')),
 ('Out', dtype('float64')),
 ('Str', dtype('float64')),
 ('Ground', dtype('float64')),
 ('Low', dtype('float64')),
 ('High', dtype('float64')),
 ('Left', dtype('float64')),
 ('Right', dtype('float64')),
 ('Head', dtype('float64')),
 ('TI', dtype('float64')),
 ('Other', dtype('float64')),
 ('Cmp_passing_types', dtype('float64')),
 ('Off_passing_types', dtype('float64')),
 ('Out.1', dtype('float64')),
 ('Int_passing_types', dtype('float64')),
 ('Blocks_passing_types', dtype('float64')),
 ('SCA', dtype('float64')),
 ('PassLive', dtype('float64')),
 ('PassDead', dtype('float64')),
 ('Drib', dtype('float64')),
 ('Sh_gca', dtype('float64')),
 ('Fld_gca', dtype('float64')),
 ('Def', dtype('float64')),
 ('GCA', dtype('float64')),
 ('PassLive.1', dtype('float64')),
 ('PassDead.1', dtype('float64')),
 ('Drib.1', dtype('float64')),
 ('Sh_gca.1', dtype('float64')),
 ('Fld_gca.1', dtype('float64')),
 ('Def.1', dtype('float64')),
 ('Tkl', dtype('float64')),
 ('TklW_defense', dtype('float64')),
 ('Def 3rd_defense', dtype('float64')),
 ('Mid 3rd_defense', dtype('float64')),
 ('Att 3rd_defense', dtype('float64')),
 ('Tkl.1', dtype('float64')),
 ('Att_defense', dtype('float64')),
 ('Tkl%', dtype('float64')),
 ('Past', dtype('float64')),
 ('Press_defense', dtype('float64')),
 ('Succ_defense', dtype('float64')),
 ('%', dtype('float64')),
 ('Def 3rd_defense.1', dtype('float64')),
 ('Mid 3rd_defense.1', dtype('float64')),
 ('Att 3rd_defense.1', dtype('float64')),
 ('Blocks_defense', dtype('float64')),
 ('Sh_defense', dtype('float64')),
 ('ShSv', dtype('float64')),
 ('Pass', dtype('float64')),
 ('Int_defense', dtype('float64')),
 ('Tkl+Int', dtype('float64')),
 ('Clr', dtype('float64')),
 ('Err', dtype('float64')),
 ('Poss', dtype('float64')),
 ('Touches', dtype('float64')),
 ('Def Pen', dtype('float64')),
 ('Def 3rd_possession', dtype('float64')),
 ('Mid 3rd_possession', dtype('float64')),
 ('Att 3rd_possession', dtype('float64')),
 ('Att Pen', dtype('float64')),
 ('Live_possession', dtype('float64')),
 ('Succ_possession', dtype('float64')),
 ('Att_possession', dtype('float64')),
 ('Succ%', dtype('float64')),
 ('#Pl', dtype('float64')),
 ('Megs', dtype('float64')),
 ('Carries', dtype('float64')),
 ('TotDist_possession', dtype('float64')),
 ('PrgDist_possession', dtype('float64')),
 ('Prog_possession', dtype('float64')),
 ('1/3_possession', dtype('float64')),
 ('CPA', dtype('float64')),
 ('Mis', dtype('float64')),
 ('Dis', dtype('float64')),
 ('Targ', dtype('float64')),
 ('Rec', dtype('float64')),
 ('Rec%', dtype('float64')),
 ('Prog_possession.1', dtype('float64')),
 ('CrdY', dtype('float64')),
 ('CrdR', dtype('float64')),
 ('2CrdY', dtype('float64')),
 ('Fls', dtype('float64')),
 ('Fld_misc', dtype('float64')),
 ('Off_misc', dtype('float64')),
 ('Crs_misc', dtype('float64')),
 ('Int_misc', dtype('float64')),
 ('TklW_misc', dtype('float64')),
 ('PKwon', dtype('float64')),
 ('PKcon', dtype('float64')),
 ('OG', dtype('float64')),
 ('Recov', dtype('float64')),
 ('Won', dtype('float64')),
 ('Lost', dtype('float64')),
 ('Won%', dtype('float64')),
 ('team', dtype('O')),
 ('Season', dtype('int64'))]

Change some columns

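  • Add the target column Win, copied from the result column (W/D/L)
  • Change time to a decimal number of hours, e.g. 16:30 becomes 16.5 (the helper just_time does this, although columnMods below currently drops the time column)
  • Split date into day, month, year and day of the week (weekday)
  • Encode the result as a number: the helper Result maps W/D/L to 2/1/0, but columnMods below currently stores the goal difference (gf - ga) in result instead
  • Change round to just an int of the matchweek
  • Convert some columns to int
  • Sort the DataFrame by season and then round
  • Make sure team names are consistent (sometimes the name in the opponent column differs from the one in the team column)
  • Drop the columns that won’t be used
  • Select matches from the Premier League only
  • Create a new DataFrame that is not linked to the old one, in case we want the old details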
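
As a minimal sketch of two of the conversions in this list (not the notebook's own code), the snippet below turns a kick-off time string into decimal hours and maps W/D/L to 2/1/0 using only the standard library. The function names `time_to_decimal_hours` and `encode_result` are illustrative assumptions, as is the `HH:MM` input format.

```python
from datetime import datetime

def time_to_decimal_hours(time_str):
    """Convert a 24-hour 'HH:MM' string to decimal hours, e.g. '16:30' -> 16.5."""
    t = datetime.strptime(time_str, "%H:%M")
    return t.hour + t.minute / 60

def encode_result(result):
    """Map 'W'/'D'/'L' to 2/1/0; raise KeyError on anything else rather than guessing."""
    mapping = {"W": 2, "D": 1, "L": 0}
    return mapping[result]

print(time_to_decimal_hours("16:30"))     # 16.5
print([encode_result(r) for r in "WDL"])  # [2, 1, 0]
```

Unlike the Result helper defined below, which treats every value other than W or D as a loss, this version fails loudly on unexpected input. Which behaviour is preferable depends on how clean the result column is.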
import re
import pandas as pd  # used by columnMods (pd.concat)
# Some date time functions
def just_time(time):
    return time.time().hour+time.time().minute/60
def just_datesDay(time):
    return time.day
def just_datesMonth(time):
    return time.month
def just_datesYear(time):
    return time.year
def just_datesWeekDay(time):
    return time.weekday()

# Map the W/D/L result string to 2/1/0 (anything other than W or D counts as a loss)
def Result(string):
    if string=='W':
        return 2
    elif string=='D':
        return 1
    else:
        return 0
    
def columnMods(matches):
    
    cols= matches.columns
    cols=[x.lower() for x in cols]
    matches.columns = cols
    
    # Misc
    
    matches = matches.astype({'gf': 'int'})
    matches = matches.astype({'ga': 'int'})
    
    
    # some adjustments to time/dates
    
    matches=matches.astype({'date': 'datetime64[ns]'})
    matches = matches.astype({'time': 'datetime64[ns]'})
    
    matches['day'] = matches['date'].apply(just_datesDay)
    matches['month'] = matches['date'].apply(just_datesMonth)
    matches['year'] = matches['date'].apply(just_datesYear)
    matches['weekday'] = matches['date'].apply(just_datesWeekDay)

    # target and results
    matches['Win']=matches['result']
    
    matches['NetScore']=matches['gf']-matches['ga']
    matches['GoalsFor']=matches['gf']
    matches['GoalsAgainst']=matches['ga']

    matches['result'] = matches['NetScore']#matches['result'].apply(Result)
    
    # Change round to an int
    matches['round']=matches['round'].str.replace('Matchweek ','').astype('int')
    
    # drop some columns
    
    matches.drop(columns=['comp','match report','date','time','tkl+int'],inplace=True)
    
    # drop leftover index columns if they are present
    matches = matches.drop(columns=['index', 'unnamed: 0'], errors='ignore')
    
    
    # order by season, then matchweek
    matches=matches.sort_values(['season','round']).reset_index(drop=True)

    
    # team name changes
    changeTeamName={'Brighton':'Brighton and Hove Albion',
                'Manchester Utd':'Manchester United',
                'Newcastle Utd':'Newcastle United',
                'Sheffield Utd' : 'Sheffield United',
                'Huddersfield' : 'Huddersfield Town',
                'Tottenham' : 'Tottenham Hotspur',
                'West Brom' : 'West Bromwich Albion',
                'West Ham' : 'West Ham United',
                'Wolves' : 'Wolverhampton Wanderers'}
    matches['opponent']=matches['opponent'].replace(changeTeamName)
    matches['team']=matches['team'].replace(changeTeamName)

    noms=pd.concat([matches.team,matches.opponent]).sort_values().unique()

    teamDict={}
    for team in noms:

        x= re.sub(r"([a-z])([A-Z])",r"\1 \2",team) 
        teamout =  re.sub(r"(and)\s",r" \1 ",x).strip().replace('  ',' ')    
        teamDict[team]=teamout
        
    matches['team']=matches['team'].replace(teamDict)
    matches['opponent']=matches['opponent'].replace(teamDict)
    
    return matches

import copy
# work on an independent copy of the Premier League rows so dfAll keeps the original details
matches = dfAll[dfAll.Comp == 'Premier League'].copy()

matches = columnMods(matches)
matches[matches.season==2019]
round day venue result gf ga opponent gls sh_shooting sot ... won% team season month year weekday Win NetScore GoalsFor GoalsAgainst
1520 1 9 Home 3 4 1 Norwich City 3.0 15.0 7.0 ... 77.8 Liverpool 2019 8 2019 4 W 3 4 1
1521 1 10 Away 5 5 0 West Ham United 5.0 13.0 8.0 ... 50.0 Manchester City 2019 8 2019 5 W 5 5 0
1522 1 11 Home 4 4 0 Chelsea 4.0 10.0 4.0 ... 70.6 Manchester United 2019 8 2019 6 W 4 4 0
1523 1 11 Away -4 0 4 Manchester United 0.0 18.0 7.0 ... 29.4 Chelsea 2019 8 2019 6 L -4 0 4
1524 1 11 Home 0 0 0 Wolverhampton Wanderers 0.0 17.0 1.0 ... 56.8 Leicester City 2019 8 2019 6 D 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2275 38 26 Home 0 1 1 Aston Villa 1.0 10.0 1.0 ... 61.0 West Ham United 2019 7 2020 6 D 0 1 1
2276 38 26 Away 0 1 1 West Ham United 1.0 13.0 4.0 ... 39.0 Aston Villa 2019 7 2020 6 D 0 1 1
2277 38 26 Away 2 3 1 Everton 3.0 12.0 6.0 ... 46.3 Bournemouth 2019 7 2020 6 W 2 3 1
2278 38 26 Away -1 2 3 Arsenal 2.0 18.0 5.0 ... 56.1 Watford 2019 7 2020 6 L -1 2 3
2279 38 26 Away -5 0 5 Manchester City 0.0 5.0 4.0 ... 40.0 Norwich City 2019 7 2020 6 L -5 0 5

760 rows × 179 columns

cluster_features_shootingAll = [
    "gls_x","sh_shooting_x",
    "sot_x","sot%_x","g/sh_x","g/sot_x",
    "dist_x",'fk_shooting_x',"pk_x","pkatt_shooting_x",
    'xg_x','npxg_x','npxg/sh_x','g-xg_x','np:g-xg_x'
]

cluster_features_keeperAll=[
    'sota_x','saves_x','save%_x','cs_x',
 'psxg_x','psxg+/-_x','pkatt_keeper_x','pka_x',
 'pksv_x','pkm_x','cmp_keeper_x','att_keeper_x','cmp%_keeper_x',
'att_keeper.1_x','thr_x','launch%_x','avglen_x',
'att_keeper.2_x','launch%.1_x','avglen.1_x','opp_x','stp_x','stp%_x',
    '#opa_x','avgdist_x'
]

cluster_features_passingAll=[
 'cmp_passing_x', 'att_passing_x', 'cmp%_passing_x', 'totdist_passing_x', 'prgdist_passing_x', 'cmp_passing.1_x', 'att_passing.1_x', 'cmp%_passing.1_x',
 'cmp_passing.2_x', 'att_passing.2_x', 'cmp%_passing.2_x', 'cmp_passing.3_x', 'att_passing.3_x', 'cmp%_passing.3_x', 'ast_x', 'xa_x', 'kp_x', '1/3_passing_x',
 'ppa_x', 'crspa_x', 'prog_passing_x']

cluster_features_passtypeAll=[
 'att_passing_types_x', 'live_passing_types_x', 'dead_x', 'fk_passing_types_x', 'tb_x', 'press_passing_types_x', 'sw_x', 'crs_passing_types_x',
 'ck_x', 'in_x', 'out_x', 'str_x', 'ground_x', 'low_x', 'high_x', 'left_x', 'right_x', 'head_x', 'ti_x', 'other_x', 'cmp_passing_types_x', 'off_passing_types_x', 'out.1_x', 'int_passing_types_x', 'blocks_passing_types_x']

cluster_features_shotcreateAll=[
 'sca_x', 'passlive_x', 'passdead_x', 'drib_x', 'sh_gca_x', 'fld_gca_x', 'def_x', 'gca_x', 'passlive.1_x', 'passdead.1_x', 'drib.1_x', 'sh_gca.1_x',
 'fld_gca.1_x', 'def.1_x']

cluster_features_tackleAll=[ 'tkl_x', 'tklw_defense_x', 'def 3rd_defense_x', 'mid 3rd_defense_x', 'att 3rd_defense_x', 'tkl.1_x', 'att_defense_x',
 'tkl%_x', 'past_x', 'press_defense_x', 'succ_defense_x', '%_x', 'def 3rd_defense.1_x', 'mid 3rd_defense.1_x', 'att 3rd_defense.1_x', 'blocks_defense_x',
 'sh_defense_x', 'shsv_x', 'pass_x', 'tkl+int_x','int_defense_x', 'clr_x', 'err_x']

cluster_features_possessionAll=[
 'poss_x', 'touches_x', 'def pen_x', 'def 3rd_possession_x', 'mid 3rd_possession_x', 'att 3rd_possession_x', 'att pen_x', 'live_possession_x', 'succ_possession_x', 'att_possession_x', 'succ%_x',
 '#pl_x', 'megs_x', 'carries_x', 'totdist_possession_x', 'prgdist_possession_x', 'prog_possession_x', '1/3_possession_x', 'cpa_x', 'mis_x', 'dis_x', 'targ_x', 'rec_x',
 'rec%_x', 'prog_possession.1_x']

cluster_features_miscAll=[ 'crdy_x', 'crdr_x', '2crdy_x', 'fls_x', 'fld_misc_x', 'off_misc_x', 'crs_misc_x', 'int_misc_x', 'tklw_misc_x', 'pkwon_x', 'pkcon_x', 'og_x', 'recov_x', 'won_x', 'lost_x', 'won%_x']
# [x for x in df]
import re

def do_dict_col_names(dict1,string,col_names):
    regexp=string
    
    for x in col_names:
        x=re.sub('(_x$)','',x)
        if not re.search(regexp,x):
            x2=string+'_'+x
#             print(x2)
            dict1[x]=x2
        else:
            x2=re.sub(regexp,'',x)
            x2=string+'_'+x2
            x2=x2.strip('_')
            dict1[x]=x2
#             print(x,'---',x2)
    return dict1

dict1={}
dict1=do_dict_col_names(dict1,r'shooting',cluster_features_shootingAll)
dict1=do_dict_col_names(dict1,r'keeper',cluster_features_keeperAll)
dict1=do_dict_col_names(dict1,r'passing',cluster_features_passingAll)
dict1=do_dict_col_names(dict1,r'passing_types',cluster_features_passtypeAll)
dict1=do_dict_col_names(dict1,r'shotcreate',cluster_features_shotcreateAll)
dict1=do_dict_col_names(dict1,r'tackle',cluster_features_tackleAll)
dict1=do_dict_col_names(dict1,r'possession',cluster_features_possessionAll)
dict1=do_dict_col_names(dict1,r'misc',cluster_features_miscAll)


dictall=dict1.copy()

matches.columns=matches.columns.str.lower()
    
# rename the columns found in dictall; leave all other column names unchanged
cols=[dictall.get(x, x) for x in matches]
        
# for i,x in enumerate(cols):
#     if re.search(r'^passing_types',x):
#         x=re.sub(r'(^passing_types)','passingtypes',x)
#         cols[i]=x
matches.columns=cols
matches.columns=matches.columns.str.replace('passing_types','passingtypes')
noms=pd.concat([matches.team,matches.opponent]).sort_values().unique()
noms
array(['Arsenal', 'Aston Villa', 'Bournemouth', 'Brentford',
       'Brighton and Hove Albion', 'Burnley', 'Cardiff City', 'Chelsea',
       'Crystal Palace', 'Everton', 'Fulham', 'Huddersfield Town',
       'Leeds United', 'Leicester City', 'Liverpool', 'Manchester City',
       'Manchester United', 'Newcastle United', 'Norwich City',
       'Sheffield United', 'Southampton', 'Stoke City', 'Swansea City',
       'Tottenham Hotspur', 'Watford', 'West Bromwich Albion',
       'West Ham United', 'Wolverhampton Wanderers'], dtype=object)

I want to turn some count columns into percentages of a related total

i.e. number of passes into the final 3rd -> percentage of completed passes that go into the final 3rd
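
A small worked example of the idea, using made-up counts for one match (the numbers and the helper names `to_percent` and `pc_name` are hypothetical): each count is divided by its total and multiplied by 100, and the column name gets a `PC_` marker after its group prefix.

```python
import re

def to_percent(count, total):
    """Return count as a percentage of total."""
    return count / total * 100

def pc_name(col):
    """Insert 'PC_' after the leading lowercase group prefix, e.g. 'passing_'."""
    return re.sub(r"(^[a-z]*_)", r"\1PC_", col)

# hypothetical match: 100 of 400 completed passes went into the final third
row = {"passing_cmp": 400, "passing_1/3": 100}
new_row = {
    "passing_cmp": row["passing_cmp"],
    pc_name("passing_1/3"): to_percent(row["passing_1/3"], row["passing_cmp"]),
}
print(new_row)  # {'passing_cmp': 400, 'passing_PC_1/3': 25.0}
```

Expressing these stats as shares rather than raw counts should make teams with very different passing volumes easier to compare.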

def change_to_pc(col,col_tot,df_matches):
    df_matches.loc[:,col]=df_matches.loc[:,col].div(df_matches.loc[:,col_tot])*100
    col_new=re.sub(r'(^[a-z]*_)',r"\1PC_",col)

    df_matches.rename(columns={col:col_new},inplace=True)
    return df_matches
# drop columns 'attempts' but keep the % one after it
cols_to_drop=[x for x in matches if re.search(r'^passing_',x) and re.search(r'att',x)]
print(cols_to_drop)
matches = matches.drop(columns=cols_to_drop)
['passing_att', 'passing_att_.1', 'passing_att_.2', 'passing_att_.3']
cols_to_pc=[x for x in matches if re.search(r'passing_',x) and not re.search(r'passing_ty',x) and not re.search(r'%',x)\
and not re.search(r'xa',x) and not re.search(r'kp',x) and not re.search(r'ast',x)\
and not re.search(r'dist',x) and not re.search(r'passing_cmp$',x) ]

print(cols_to_pc)
col_tot='passing_cmp'
for col in cols_to_pc:
    matches = change_to_pc(col,col_tot,matches)
    


   
matches = change_to_pc('passing_prgdist','passing_cmp',matches)
matches = change_to_pc('passing_totdist','passing_cmp',matches)
['passing_cmp_.1', 'passing_cmp_.2', 'passing_cmp_.3', 'passing_1/3', 'passing_ppa', 'passing_crspa', 'passing_prog']
# change_to_pc inserts 'PC_' after the 'passing_' prefix, so the converted columns are named 'passing_PC_cmp_.N'
matches=matches.rename(columns={'passing_cmp': 'passing_pass_complete',
 'passing_PC_cmp_.1': 'passing_pass_complete.shortPC',
 'passing_PC_cmp_.2': 'passing_pass_complete.mediumPC',
 'passing_PC_cmp_.3': 'passing_pass_complete.longPC',
})

cols_to_pc=[x for x in matches if re.search(r'passingtype',x) and not re.search(r'passingtypes_att',x)]
print(cols_to_pc)
col_tot='passingtypes_att'
for col in cols_to_pc:
    matches = change_to_pc(col,col_tot,matches)
   
matches=matches.drop(columns='passingtypes_att')
['passingtypes_live', 'passingtypes_dead', 'passingtypes_fk', 'passingtypes_tb', 'passingtypes_press', 'passingtypes_sw', 'passingtypes_crs', 'passingtypes_ck', 'passingtypes_in', 'passingtypes_out', 'passingtypes_str', 'passingtypes_ground', 'passingtypes_low', 'passingtypes_high', 'passingtypes_left', 'passingtypes_right', 'passingtypes_head', 'passingtypes_ti', 'passingtypes_other', 'passingtypes_cmp', 'passingtypes_off', 'passingtypes_out.1', 'passingtypes_int', 'passingtypes_blocks']

matches=matches.rename(columns={'tackle_tkl.1':'tackle_tkl_dribble',
                      'tackle_tkl%' : 'tackle_dribble%',
                      'tackle_past' : 'tackle_dribllepast',
                       'tackle_def 3rd_defense.1':'tackle_press_def3rd',
                       'tackle_mid 3rd_defense.1':'tackle_press_mid3rd',    
                       'tackle_att 3rd_defense.1':'tackle_press_att3rd',
                     })
matches=matches.drop(columns='tackle_att_defense')
matches=matches.drop(columns='tackle_succ_defense')
## express the five tackle columns after the first as a % of the first one (total tackles)
cols_to_pc=[x for x in matches if re.search(r'tackle',x)]
col_tot=cols_to_pc[0]
cols_to_pc = cols_to_pc[1:6]

for col in cols_to_pc:
    matches = change_to_pc(col,col_tot,matches)
    

cols_to_pc=['tackle_press_def3rd',
'tackle_press_mid3rd',    
'tackle_press_att3rd']
col_tot='tackle_press_defense'
for col in cols_to_pc:
    matches = change_to_pc(col,col_tot,matches)
    
col_tot='tackle_blocks_defense'
cols_to_pc=['tackle_sh_defense','tackle_shsv' ,'tackle_pass']
for col in cols_to_pc:
    matches = change_to_pc(col,col_tot,matches)
## possession touches
cols_to_pc=[x for x in matches if re.search(r'possession',x)  ]


col_tot=cols_to_pc[1]
cols_to_pc=cols_to_pc[2:8]
print(cols_to_pc, col_tot)
for col in cols_to_pc:
    matches = change_to_pc(col,col_tot,matches)
['possession_def pen', 'possession_def 3rd', 'possession_mid 3rd', 'possession_att 3rd', 'possession_att pen', 'possession_live'] possession_touches
matches=matches.rename(columns={'possession_succ':'possession_dribblesucc',
'possession_att':'possession_dribbleatt',
'possession_succ%':'possession_dribblesucc%',
'possession_#pl':'possession_dribblepast'})
col_tot='possession_totdist'
col='possession_prgdist'
matches = change_to_pc(col,col_tot,matches)

col_tot='possession_carries'
cols_to_pc=['possession_prog','possession_1/3','possession_cpa','possession_mis','possession_dis']
for col in cols_to_pc:
    matches = change_to_pc(col,col_tot,matches)
    

col_tot='shooting_sh'
col='shooting_dist'
matches = change_to_pc(col,col_tot,matches)
matches.to_csv(folda+'epl_beforeAVG_HA.csv')

I want to predict the results without knowing the details of the game being predicted, so the match statistics need some modification.

As a first step, each gameweek is given the average of the team's stats over its previous gameweeks (the rolling-average code further down uses a window of 3 matches).

These stats are everything EXCEPT:

  • round
  • day
  • venue
  • opponent
  • team
  • month
  • year
  • weekday
  • season
  • Win
#collapse-output
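# Identifier/target columns to leave un-averaged. The match is case-sensitive: 'Win', 'NetScore',
# 'GoalsFor' and 'GoalsAgainst' do not match the lower-case column names, which stay in colRepeat.
no_change=['round','day','venue','opponent','team',
 'month','year','weekday','season','Win','NetScore','GoalsFor','GoalsAgainst']
colRepeat = [x for x in matches.columns if x not in no_change]
colRepeat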
['result',
 'gf',
 'ga',
 'shooting_gls',
 'shooting_sh',
 'shooting_sot',
 'shooting_sot%',
 'shooting_g/sh',
 'shooting_g/sot',
 'shooting_PC_dist',
 'shooting_fk',
 'shooting_pk',
 'shooting_pkatt',
 'shooting_xg',
 'shooting_npxg',
 'shooting_npxg/sh',
 'shooting_g-xg',
 'shooting_np:g-xg',
 'keeper_sota',
 'keeper_saves',
 'keeper_save%',
 'keeper_cs',
 'keeper_psxg',
 'keeper_psxg+/-',
 'keeper_pkatt',
 'keeper_pka',
 'keeper_pksv',
 'keeper_pkm',
 'keeper_cmp',
 'keeper_att',
 'keeper_cmp%',
 'keeper_att_.1',
 'keeper_thr',
 'keeper_launch%',
 'keeper_avglen',
 'keeper_att_.2',
 'keeper_launch%.1',
 'keeper_avglen.1',
 'keeper_opp',
 'keeper_stp',
 'keeper_stp%',
 'keeper_#opa',
 'keeper_avgdist',
 'passing_pass_complete',
 'passing_cmp%',
 'passing_PC_totdist',
 'passing_PC_prgdist',
 'passing_PC_cmp_.1',
 'passing_cmp%_.1',
 'passing_PC_cmp_.2',
 'passing_cmp%_.2',
 'passing_PC_cmp_.3',
 'passing_cmp%_.3',
 'passing_ast',
 'passing_xa',
 'passing_kp',
 'passing_PC_1/3',
 'passing_PC_ppa',
 'passing_PC_crspa',
 'passing_PC_prog',
 'passingtypes_PC_live',
 'passingtypes_PC_dead',
 'passingtypes_PC_fk',
 'passingtypes_PC_tb',
 'passingtypes_PC_press',
 'passingtypes_PC_sw',
 'passingtypes_PC_crs',
 'passingtypes_PC_ck',
 'passingtypes_PC_in',
 'passingtypes_PC_out',
 'passingtypes_PC_str',
 'passingtypes_PC_ground',
 'passingtypes_PC_low',
 'passingtypes_PC_high',
 'passingtypes_PC_left',
 'passingtypes_PC_right',
 'passingtypes_PC_head',
 'passingtypes_PC_ti',
 'passingtypes_PC_other',
 'passingtypes_PC_cmp',
 'passingtypes_PC_off',
 'passingtypes_PC_out.1',
 'passingtypes_PC_int',
 'passingtypes_PC_blocks',
 'shotcreate_sca',
 'shotcreate_passlive',
 'shotcreate_passdead',
 'shotcreate_drib',
 'shotcreate_sh_gca',
 'shotcreate_fld_gca',
 'shotcreate_def',
 'shotcreate_gca',
 'shotcreate_passlive.1',
 'shotcreate_passdead.1',
 'shotcreate_drib.1',
 'shotcreate_sh_gca.1',
 'shotcreate_fld_gca.1',
 'shotcreate_def.1',
 'tackle_tkl',
 'tackle_PC_tklw_defense',
 'tackle_PC_def 3rd_defense',
 'tackle_PC_mid 3rd_defense',
 'tackle_PC_att 3rd_defense',
 'tackle_PC_tkl_dribble',
 'tackle_dribble%',
 'tackle_dribllepast',
 'tackle_press_defense',
 'tackle_%',
 'tackle_PC_press_def3rd',
 'tackle_PC_press_mid3rd',
 'tackle_PC_press_att3rd',
 'tackle_blocks_defense',
 'tackle_PC_sh_defense',
 'tackle_PC_shsv',
 'tackle_PC_pass',
 'tackle_int_defense',
 'tackle_clr',
 'tackle_err',
 'possession_poss',
 'possession_touches',
 'possession_PC_def pen',
 'possession_PC_def 3rd',
 'possession_PC_mid 3rd',
 'possession_PC_att 3rd',
 'possession_PC_att pen',
 'possession_PC_live',
 'possession_dribblesucc',
 'possession_dribbleatt',
 'possession_dribblesucc%',
 'possession_dribblepast',
 'possession_megs',
 'possession_carries',
 'possession_totdist',
 'possession_PC_prgdist',
 'possession_PC_prog',
 'possession_PC_1/3',
 'possession_PC_cpa',
 'possession_PC_mis',
 'possession_PC_dis',
 'possession_targ',
 'possession_rec',
 'possession_rec%',
 'possession_prog_.1',
 'misc_crdy',
 'misc_crdr',
 'misc_2crdy',
 'misc_fls',
 'misc_fld',
 'misc_off',
 'misc_crs',
 'misc_int',
 'misc_tklw',
 'misc_pkwon',
 'misc_pkcon',
 'misc_og',
 'misc_recov',
 'misc_won',
 'misc_lost',
 'misc_won%',
 'win',
 'netscore',
 'goalsfor',
 'goalsagainst']
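
The list above still ends with 'win', 'netscore', 'goalsfor' and 'goalsagainst', even though capitalised versions of those names are in `no_change`. A quick check for this kind of mismatch might look like the sketch below. It uses a made-up, shortened column list, and `unmatched`, `excluded` and `col_repeat` are names introduced only for this illustration. Whether those columns should be averaged at all is a modelling choice this page does not settle.

```python
# Hypothetical, shortened column list; the real frame has about 170 columns.
columns = ['round', 'team', 'gf', 'shooting_sh', 'win', 'netscore']
no_change = ['round', 'team', 'Win', 'NetScore']

# Names that were meant to be excluded but are not actual column names.
unmatched = [name for name in no_change if name not in columns]

# Case-insensitive version of the exclusion.
excluded = {name.lower() for name in no_change}
col_repeat = [c for c in columns if c.lower() not in excluded]

print(unmatched)   # entries of no_change with no exact match
print(col_repeat)  # columns left to average after case-insensitive exclusion
```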

Convert the data into a rolling average of previous matches

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.rolling.html

The relevant parameters, as described in the pandas documentation:

window : int, offset, or BaseIndexer subclass
  • Size of the moving window.

  • If an integer, the fixed number of observations used for each window.

  • If an offset, the time period of each window. Each window will be variable-sized, based on the observations included in the time period. This is only valid for datetime-like indexes. The pandas documentation on offsets and frequency strings gives more detail.

  • If a BaseIndexer subclass, the window boundaries are based on the defined get_window_bounds method. Additional rolling keyword arguments, namely min_periods, center, and closed, will be passed to get_window_bounds.

min_periods : int, default None
  • Minimum number of observations in the window required to have a value; otherwise, the result is np.nan.

  • For a window that is specified by an offset, min_periods will default to 1.

  • For a window that is specified by an integer, min_periods will default to the size of the window.

closed : str, default None
  • If ‘right’, the first point in the window is excluded from calculations.

  • If ‘left’, the last point in the window is excluded from calculations.

  • If ‘both’, no points in the window are excluded from calculations.

  • If ‘neither’, the first and last points in the window are excluded from calculations.

  • Default None (‘right’).
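
To see what `closed='left'` does with an integer window, here is a minimal sketch on Arsenal's first five raw results (1, -1, -4, 3, 0, taken from the raw table further down). The variable names are introduced here for illustration only.

```python
import pandas as pd

# Arsenal's first five net results in the raw data.
raw = pd.Series([1.0, -1.0, -4.0, 3.0, 0.0])

# closed='left' drops the current row from its own window, so each value is
# the mean of up to three PREVIOUS matches; min_periods=1 allows short windows.
rolled = raw.rolling(window=3, closed='left', min_periods=1).mean()
print(rolled)
```

The first entry is NaN because there is no earlier match, the second is just the first result, and from the fourth entry onwards the window holds three previous matches. The current match never contributes to its own value, which is the property needed to avoid predicting a match from its own statistics.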

#collapse-output
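# For each team, replace every stat with the mean of that team's previous (up to) 3 matches.
# closed='left' leaves the current match out of its own window. Rows are assumed to be in
# date order, and the window runs across season boundaries.
matches_=matches.copy()
for x in matches_.team.unique():
    matches_.loc[matches_.team==x,colRepeat] = matches_.loc[matches_.team==x,colRepeat].rolling(window=3,closed='left',min_periods=1).mean()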

matches_.loc[matches_['round']>10].head()
round day venue result gf ga opponent shooting_gls shooting_sh shooting_sot ... misc_won% team season month year weekday win netscore goalsfor goalsagainst
200 11 5 Home 3.000000 4.333333 1.333333 Arsenal 4.333333 16.666667 8.666667 ... 63.633333 Manchester City 2017 11 2017 6 NaN 3.000000 4.333333 1.333333
201 11 5 Away 0.000000 0.666667 0.666667 Chelsea 0.666667 9.000000 2.666667 ... 54.000000 Manchester United 2017 11 2017 6 NaN 0.000000 0.666667 0.666667
202 11 5 Home 1.000000 1.666667 0.666667 Crystal Palace 1.666667 14.666667 5.000000 ... 51.466667 Tottenham Hotspur 2017 11 2017 6 NaN 1.000000 1.666667 0.666667
203 11 4 Away 0.000000 1.333333 1.333333 West Ham United 1.333333 16.000000 6.333333 ... 46.800000 Liverpool 2017 11 2017 5 NaN 0.000000 1.333333 1.333333
204 11 5 Home 0.666667 2.000000 1.333333 Manchester United 2.000000 16.000000 5.666667 ... 63.433333 Chelsea 2017 11 2017 6 NaN 0.666667 2.000000 1.333333

5 rows × 172 columns
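
The per-team loop above lets the window run from one season into the next. If that is not wanted, one possible alternative (not what this page does) is to group by team and season. The sketch below uses a made-up single-team frame; `previous_mean`, `gf_across` and `gf_within` are hypothetical names. It uses `shift(1)` followed by an ordinary rolling mean, which should give the same numbers as `closed='left'` when no values are missing.

```python
import pandas as pd

# Made-up data: one team, two seasons of three rounds each.
toy = pd.DataFrame({
    'team':   ['A', 'A', 'A', 'A', 'A', 'A'],
    'season': [2017, 2017, 2017, 2018, 2018, 2018],
    'round':  [1, 2, 3, 1, 2, 3],
    'gf':     [1.0, 3.0, 2.0, 0.0, 4.0, 2.0],
})

def previous_mean(s, window=3):
    """Mean of up to `window` previous values, excluding the current one."""
    return s.shift(1).rolling(window, min_periods=1).mean()

# Window carries over the season boundary (same behaviour as the loop above).
toy['gf_across'] = toy.groupby('team')['gf'].transform(previous_mean)

# Window restarts at the first round of every season.
toy['gf_within'] = toy.groupby(['team', 'season'])['gf'].transform(previous_mean)
print(toy)
```

With the season in the group key, the first round of each season has no history and becomes NaN, so those rows would need to be dropped or filled before modelling.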

matches_.loc[matches_.team=='Arsenal',['round']]
round
5 1
25 2
45 3
65 4
85 5
... ...
3704 34
3724 35
3744 36
3764 37
3784 38

190 rows × 1 columns

matches.loc[(matches.team=='Arsenal')]
round day venue result gf ga opponent shooting_gls shooting_sh shooting_sot ... misc_won% team season month year weekday win netscore goalsfor goalsagainst
5 1 11 Home 1 4 3 Leicester City 4.0 27.0 10.0 ... 50.0 Arsenal 2017 8 2017 4 W 1 4 3
25 2 19 Away -1 0 1 Stoke City 0.0 19.0 7.0 ... 40.0 Arsenal 2017 8 2017 5 L -1 0 1
45 3 27 Away -4 0 4 Liverpool 0.0 8.0 0.0 ... 54.5 Arsenal 2017 8 2017 6 L -4 0 4
65 4 9 Home 3 3 0 Bournemouth 3.0 17.0 9.0 ... 51.1 Arsenal 2017 9 2017 5 W 3 3 0
85 5 17 Away 0 0 0 Chelsea 0.0 11.0 2.0 ... 48.8 Arsenal 2017 9 2017 6 D 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3704 34 23 Home 2 3 1 Manchester United 3.0 13.0 6.0 ... 19.0 Arsenal 2021 4 2022 5 W 2 3 1
3724 35 1 Away 1 2 1 West Ham United 2.0 13.0 7.0 ... 51.5 Arsenal 2021 5 2022 6 W 1 2 1
3744 36 8 Home 1 2 1 Leeds United 2.0 19.0 9.0 ... 42.9 Arsenal 2021 5 2022 6 W 1 2 1
3764 37 16 Away -2 0 2 Newcastle United 0.0 11.0 2.0 ... 45.0 Arsenal 2021 5 2022 0 L -2 0 2
3784 38 22 Home 4 5 1 Everton 5.0 25.0 8.0 ... 48.4 Arsenal 2021 5 2022 6 W 4 5 1

190 rows × 172 columns

Combine results for both home and away teams

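That is, join each team's row to its opponent's row for the same match, so that a single row holds the rolling statistics of both sides. Columns ending in _x belong to the team and columns ending in _y to its opponent. Each match therefore appears twice, once from each team's point of view.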

#collapse-output
# Self-merge: pair each team's row (left, suffix _x) with the row of its opponent
# (right, suffix _y) for the same fixture.
match_keys = ["month",'year','weekday',"round","day",'season']
matchesC=matches_.merge(matches_, left_on = match_keys + ["team"],
                       right_on= match_keys + ["opponent"])
matchesC
round day venue_x result_x gf_x ga_x opponent_x shooting_gls_x shooting_sh_x shooting_sot_x ... misc_og_y misc_recov_y misc_won_y misc_lost_y misc_won%_y team_y win_y netscore_y goalsfor_y goalsagainst_y
0 1 12 Away NaN NaN NaN Brighton and Hove Albion NaN NaN NaN ... NaN NaN NaN NaN NaN Brighton and Hove Albion NaN NaN NaN NaN
1 1 13 Home NaN NaN NaN West Ham United NaN NaN NaN ... NaN NaN NaN NaN NaN West Ham United NaN NaN NaN NaN
2 1 13 Away NaN NaN NaN Newcastle United NaN NaN NaN ... NaN NaN NaN NaN NaN Newcastle United NaN NaN NaN NaN
3 1 12 Away NaN NaN NaN Watford NaN NaN NaN ... NaN NaN NaN NaN NaN Watford NaN NaN NaN NaN
4 1 12 Home NaN NaN NaN Burnley NaN NaN NaN ... NaN NaN NaN NaN NaN Burnley NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3795 38 22 Away 0.333333 1.666667 1.333333 Arsenal 1.666667 9.333333 4.000000 ... 0.333333 74.333333 14.666667 16.666667 46.466667 Arsenal NaN 0.000000 1.333333 1.333333
3796 38 22 Away -1.666667 0.666667 2.333333 Brentford 0.666667 9.666667 2.333333 ... 0.000000 80.333333 15.333333 15.666667 48.333333 Brentford NaN 0.333333 2.000000 1.666667
3797 38 22 Home -0.666667 1.000000 1.666667 Newcastle United 1.000000 13.000000 4.333333 ... 0.000000 64.333333 20.000000 19.000000 48.766667 Newcastle United NaN -1.333333 0.666667 2.000000
3798 38 22 Away -2.000000 0.666667 2.666667 Chelsea 0.333333 10.666667 2.666667 ... 0.000000 88.000000 17.666667 13.666667 57.066667 Chelsea NaN -0.333333 1.000000 1.333333
3799 38 22 Home -2.000000 0.333333 2.333333 Tottenham Hotspur 0.333333 9.666667 2.333333 ... 0.000000 81.666667 23.666667 17.000000 57.633333 Tottenham Hotspur NaN 1.000000 1.666667 0.666667

3800 rows × 338 columns
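
The self-merge is easier to follow on a tiny example. The sketch below uses a made-up round with two matches; the frame `long_form` and the column `sh` are hypothetical stand-ins for `matches_` and its statistics.

```python
import pandas as pd

# One row per team per match (two matches in round 1).
long_form = pd.DataFrame({
    'round':    [1, 1, 1, 1],
    'team':     ['A', 'B', 'C', 'D'],
    'opponent': ['B', 'A', 'D', 'C'],
    'venue':    ['Home', 'Away', 'Home', 'Away'],
    'sh':       [10.0, 7.0, 12.0, 9.0],
})

# Left row's team must equal right row's opponent in the same round.
wide = long_form.merge(long_form,
                       left_on=['round', 'team'],
                       right_on=['round', 'opponent'],
                       suffixes=('_x', '_y'))

# Keeping only the home side leaves one row per match.
home = wide[wide['venue_x'] == 'Home']
print(home[['round', 'team_x', 'team_y', 'sh_x', 'sh_y']])
```

Shared key columns such as `round` keep their name, while every other column is duplicated with `_x` (team) and `_y` (opponent) suffixes, which is why the merged table above has roughly twice as many columns as the input.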

Save the data

# whatsave selects which columns are dropped before saving:
#   0 -> drop venue and goals for/against on both sides (netscore_x is kept)
#   1 -> drop only the opponent-side (_y) copies
whatsave=1

if whatsave==0:
    matchesOut=matchesC.copy()
    matchesOut=matchesOut.drop(columns=['venue_x','venue_y','win_y','netscore_y','goalsfor_x','goalsfor_y',
                                        'goalsagainst_x','goalsagainst_y'])
    matchesOut.to_csv(folda+'epl2017-2021_wivnetscore_both-HA.csv')
elif whatsave==1:
    matchesOut=matchesC.copy()
    matchesOut=matchesOut.drop(columns=['venue_y','win_y','netscore_y','goalsfor_y','goalsagainst_y'])
    matchesOut.to_csv(folda+'epl2017-2021_wivnetscoreAndGFGA_both-HA.csv')

# matchesC=matchesC.drop(columns=['opponent_x','opponent_y'])
# matchesC=matchesC.loc[matchesC['venue_x']=='Home']
matchesC[['venue_x','venue_y','win_y','netscore_y','goalsfor_x','goalsfor_y',
          'goalsagainst_x','goalsagainst_y']]
venue_x venue_y win_y netscore_y goalsfor_x goalsfor_y goalsagainst_x goalsagainst_y
0 Away Home NaN NaN NaN NaN NaN NaN
1 Home Away NaN NaN NaN NaN NaN NaN
2 Away Home NaN NaN NaN NaN NaN NaN
3 Away Home NaN NaN NaN NaN NaN NaN
4 Home Away NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ...
3795 Away Home NaN 0.000000 1.666667 1.333333 1.333333 1.333333
3796 Away Home NaN 0.333333 0.666667 2.000000 2.333333 1.666667
3797 Home Away NaN -1.333333 1.000000 0.666667 1.666667 2.000000
3798 Away Home NaN -0.333333 0.666667 1.000000 2.666667 1.333333
3799 Home Away NaN 1.000000 0.333333 1.666667 2.333333 0.666667

3800 rows × 8 columns

## SANITY CHECKS
# Compare one team's raw results with the rolling average stored in the merged table
teamName='Manchester United'
year=2017
mc=matchesC.loc[matchesC.team_x==teamName][['round','team_x','team_y','result_x','season']]

mraw=matches.loc[matches.team==teamName][['round','team','opponent','result','season']]
plt.subplots(figsize=(15,5))
# x axis: seasons elapsed since the start of 2017 (38 rounds per season)
plt.plot( (mraw['round']+(mraw['season']-2017)*38)/38, mraw['result'] ,'.-')
plt.plot( (mc['round']+(mc['season']-2017)*38)/38, mc['result_x'] ,'.-')
plt.legend(['raw','running average']);
plt.grid(True)

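# numeric_only=True skips text columns such as venue and opponent (needed in pandas 2.0+)
mg = matchesC.groupby('team_x').mean(numeric_only=True)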
mg
round day result_x gf_x ga_x shooting_gls_x shooting_sh_x shooting_sot_x shooting_sot%_x shooting_g/sh_x ... misc_pkwon_y misc_pkcon_y misc_og_y misc_recov_y misc_won_y misc_lost_y misc_won%_y netscore_y goalsfor_y goalsagainst_y
team_x
Arsenal 19.5 15.210526 0.421517 1.675485 1.253968 1.633157 13.111111 4.514109 35.496473 0.126499 ... 0.112589 0.136525 0.042553 91.012411 19.406915 19.757979 49.480762 -0.061170 1.287234 1.348404
Aston Villa 19.5 15.903509 -0.181416 1.286136 1.467552 1.233038 12.609145 4.261062 33.371829 0.092640 ... 0.102339 0.122807 0.052632 89.173977 17.751462 18.026316 49.695760 0.114035 1.388889 1.274854
Bournemouth 19.5 15.868421 -0.500000 1.218289 1.718289 1.191740 11.336283 3.691740 32.124926 0.095973 ... 0.093093 0.123123 0.030030 93.247748 20.250751 20.214715 50.193544 0.057057 1.409910 1.352853
Brentford 19.5 15.394737 -0.189189 1.234234 1.423423 1.198198 11.297297 3.549550 31.855405 0.092117 ... 0.114035 0.140351 0.061404 82.771930 16.850877 16.429825 51.121930 0.052632 1.394737 1.342105
Brighton and Hove Albion 19.5 16.052632 -0.388007 0.978836 1.366843 0.936508 11.276014 3.218695 30.280688 0.080811 ... 0.095238 0.127866 0.040564 89.408289 19.255732 19.731041 49.229630 0.059965 1.424162 1.364198
Burnley 19.5 16.168421 -0.375661 1.016755 1.392416 0.974427 10.093474 3.194004 31.874427 0.103527 ... 0.095238 0.128748 0.038801 88.843034 18.495591 18.074074 50.651764 -0.073192 1.338624 1.411817
Cardiff City 19.5 15.921053 -1.018018 0.828829 1.846847 0.819820 10.761261 2.842342 29.641441 0.074865 ... 0.078947 0.149123 0.026316 91.561404 19.140351 18.307018 51.198246 0.026316 1.412281 1.385965
Chelsea 19.5 15.963158 0.668430 1.728395 1.059965 1.691358 15.545855 5.257496 34.136684 0.103307 ... 0.121693 0.126984 0.052028 91.537919 20.075838 19.852734 50.077778 -0.105820 1.294533 1.400353
Crystal Palace 19.5 16.310526 -0.296296 1.142857 1.439153 1.100529 10.937390 3.427690 32.483686 0.100838 ... 0.109929 0.143617 0.028369 90.376773 19.329787 19.449468 49.848493 0.152482 1.443262 1.290780
Everton 19.5 14.710526 -0.194885 1.216931 1.411817 1.174603 11.298060 3.654321 32.695414 0.100979 ... 0.093972 0.126773 0.057624 89.239362 18.835993 18.802305 49.994415 -0.075355 1.358156 1.433511
Fulham 19.5 15.289474 -0.964444 0.802222 1.766667 0.775556 11.673333 3.460000 29.526667 0.058756 ... 0.118421 0.184211 0.043860 88.429825 18.258772 18.456140 49.566228 -0.188596 1.381579 1.570175
Huddersfield Town 19.5 16.026316 -1.062222 0.680000 1.742222 0.602222 10.044444 2.935556 30.048889 0.055511 ... 0.146667 0.142222 0.044444 91.108889 21.348889 21.073333 50.432222 -0.233333 1.282222 1.515556
Leeds United 19.5 16.078947 -0.404444 1.388889 1.793333 1.362222 13.184444 4.442222 35.603778 0.110378 ... 0.114035 0.162281 0.048246 84.732456 18.061404 17.293860 51.365789 0.008772 1.364035 1.355263
Leicester City 19.5 15.889474 0.212522 1.582892 1.370370 1.519400 12.416226 4.320106 35.845767 0.121587 ... 0.121693 0.111111 0.045855 88.839506 18.775132 19.029101 49.624868 -0.096120 1.342152 1.438272
Liverpool 19.5 16.289474 1.357143 2.211640 0.854497 2.158730 16.320106 5.810406 36.395767 0.129691 ... 0.096257 0.126560 0.048128 89.806595 19.374332 19.647950 49.776827 0.067736 1.356506 1.288770
Manchester City 19.5 15.947368 1.804233 2.544092 0.739859 2.474427 17.603175 6.336861 36.914815 0.140406 ... 0.134039 0.121693 0.046737 91.272487 19.769841 19.767196 50.053086 0.152557 1.456790 1.304233
Manchester United 19.5 16.047368 0.615520 1.758377 1.142857 1.726631 13.648148 5.078483 38.115608 0.119489 ... 0.097002 0.116402 0.040564 90.966490 19.932981 19.534392 50.678660 0.007937 1.358907 1.350970
Newcastle United 19.5 16.042105 -0.373016 1.088183 1.461199 1.054674 11.138448 3.601411 32.866755 0.095344 ... 0.119929 0.125220 0.047619 90.180776 18.980600 19.029982 49.676367 0.069665 1.417989 1.348325
Norwich City 19.5 15.973684 -1.411111 0.662222 2.073333 0.622222 10.364444 3.097778 30.114889 0.054400 ... 0.092105 0.144737 0.043860 89.100877 18.320175 18.162281 50.484211 -0.127193 1.271930 1.399123
Sheffield United 19.5 15.657895 -0.580000 0.782222 1.362222 0.728889 8.780000 2.657778 30.685556 0.083556 ... 0.129386 0.164474 0.039474 90.660088 16.978070 17.546053 49.354605 -0.109649 1.361842 1.471491
Southampton 19.5 16.231579 -0.466490 1.173721 1.640212 1.152557 12.388007 4.113757 34.837654 0.090088 ... 0.125220 0.121693 0.045855 89.746914 19.088183 18.665785 50.723016 0.035273 1.398589 1.363316
Stoke City 19.5 16.394737 -0.918919 0.878378 1.797297 0.851351 10.432432 3.184685 33.167117 0.087342 ... 0.081081 0.072072 0.009009 94.153153 21.887387 23.333333 48.231532 0.450450 1.522523 1.072072
Swansea City 19.5 16.868421 -0.720721 0.729730 1.450450 0.702703 8.189189 2.193694 28.216667 0.092162 ... 0.099099 0.081081 0.031532 95.824324 24.725225 22.824324 52.114414 0.216216 1.459459 1.243243
Tottenham Hotspur 19.5 15.394737 0.676367 1.771605 1.095238 1.675485 13.314815 4.734568 36.830423 0.129233 ... 0.093972 0.125887 0.035461 90.899823 19.563830 19.655142 49.633688 0.017730 1.366135 1.348404
Watford 19.5 16.184211 -0.620309 1.107064 1.727373 1.073951 11.065121 3.399558 31.508940 0.089536 ... 0.099338 0.116998 0.038631 90.599338 19.486755 19.296909 50.105740 -0.099338 1.304636 1.403974
West Bromwich Albion 19.5 16.421053 -0.817778 0.866667 1.684444 0.840000 9.457778 2.793333 31.112000 0.083800 ... 0.111111 0.151111 0.057778 89.895556 20.335556 20.268889 50.009333 0.017778 1.380000 1.362222
West Ham United 19.5 16.157895 -0.078483 1.416226 1.494709 1.389771 11.238095 3.870370 35.340388 0.120414 ... 0.125220 0.114638 0.047619 90.360670 19.217813 19.444444 49.579894 0.006173 1.402116 1.395944
Wolverhampton Wanderers 19.5 15.559211 -0.048565 1.134658 1.183223 1.081678 11.768212 3.706402 31.860706 0.091347 ... 0.118421 0.120614 0.067982 89.114035 17.844298 17.971491 49.912939 -0.085526 1.357456 1.442982

28 rows × 330 columns

# Correlation matrix of all numeric columns
X=matchesC.corr(numeric_only=True)
X
round day result_x gf_x ga_x shooting_gls_x shooting_sh_x shooting_sot_x shooting_sot%_x shooting_g/sh_x ... misc_pkwon_y misc_pkcon_y misc_og_y misc_recov_y misc_won_y misc_lost_y misc_won%_y netscore_y goalsfor_y goalsagainst_y
round 1.000000 -0.101106 -0.000427 -0.032829 -0.033619 -0.031728 -0.033182 -0.031253 -0.012582 -0.005572 ... -0.026473 -0.039209 -0.016878 0.053467 0.027972 0.028820 0.000147 -0.000427 -0.032829 -0.033619
day -0.101106 1.000000 0.001526 0.012942 0.011046 0.017277 0.018508 0.013796 0.011117 0.013813 ... 0.017057 -0.014209 -0.019603 0.000052 0.011315 0.008934 0.007511 0.001526 0.012942 0.011046
result_x -0.000427 0.001526 1.000000 0.804871 -0.784386 0.796159 0.489624 0.598669 0.302810 0.553999 ... -0.008706 -0.020023 0.015524 0.000158 -0.003279 -0.004667 0.006454 0.020826 0.017049 -0.016039
gf_x -0.032829 0.012942 0.804871 1.000000 -0.263228 0.988970 0.489024 0.682367 0.418380 0.724261 ... 0.010279 -0.026590 0.016782 0.015622 0.005549 0.006301 0.001188 0.017049 0.011392 -0.015809
ga_x -0.033619 0.011046 -0.784386 -0.263228 1.000000 -0.260595 -0.284823 -0.260010 -0.054968 -0.143603 ... 0.024876 0.004775 -0.007703 0.016053 0.011122 0.014161 -0.009246 -0.016039 -0.015809 0.009557
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
misc_lost_y 0.028820 0.008934 -0.004667 0.006301 0.014161 0.007396 0.001735 0.012405 0.009634 0.014438 ... -0.119425 -0.028281 0.011898 0.373947 0.694807 1.000000 -0.398785 -0.222763 -0.280020 0.069455
misc_won%_y 0.000147 0.007511 0.006454 0.001188 -0.009246 -0.000937 -0.002138 0.005459 0.010180 0.009635 ... 0.058702 -0.067979 -0.029776 0.040269 0.327227 -0.398785 1.000000 0.202235 0.155459 -0.166274
netscore_y -0.000427 0.001526 0.020826 0.017049 -0.016039 0.018878 -0.000026 0.013563 0.023359 0.019314 ... 0.189385 -0.197948 -0.164797 0.084513 -0.111881 -0.222763 0.202235 1.000000 0.804871 -0.784386
goalsfor_y -0.032829 0.012942 0.017049 0.011392 -0.015809 0.012763 0.007157 0.006324 0.017994 0.014789 ... 0.253819 -0.045267 -0.062638 0.016541 -0.197868 -0.280020 0.155459 0.804871 1.000000 -0.263228
goalsagainst_y -0.033619 0.011046 -0.016039 -0.015809 0.009557 -0.017349 0.007515 -0.015435 -0.019168 -0.015943 ... -0.042581 0.274478 0.202431 -0.120098 -0.024934 0.069455 -0.166274 -0.784386 -0.263228 1.000000

330 rows × 330 columns

plt.plot(mg.netscore_x,mg.shooting_sh_x,'ok')

# plt.plot(matchesC.gf_x,matchesC.NetScore_x,'ok')

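# Correlate all numeric columns, then order the rows by their correlation with netscore_x
X=matchesC.corr(numeric_only=True)
corrnetscore=X.sort_values(by="netscore_x").reset_index()
corrnetscore=corrnetscore.rename(columns={'index':'category'})
corrnetscore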
category round day result_x gf_x ga_x shooting_gls_x shooting_sh_x shooting_sot_x shooting_sot%_x ... misc_pkwon_y misc_pkcon_y misc_og_y misc_recov_y misc_won_y misc_lost_y misc_won%_y netscore_y goalsfor_y goalsagainst_y
0 ga_x -0.033619 0.011046 -0.784386 -0.263228 1.000000 -0.260595 -0.284823 -0.260010 -0.054968 ... 0.024876 0.004775 -0.007703 0.016053 0.011122 0.014161 -0.009246 -0.016039 -0.015809 0.009557
1 goalsagainst_x -0.033619 0.011046 -0.784386 -0.263228 1.000000 -0.260595 -0.284823 -0.260010 -0.054968 ... 0.024876 0.004775 -0.007703 0.016053 0.011122 0.014161 -0.009246 -0.016039 -0.015809 0.009557
2 keeper_psxg_x -0.020151 0.004270 -0.689039 -0.288664 0.818415 -0.285579 -0.342578 -0.292562 -0.040214 ... 0.016138 0.005909 0.004696 0.022612 0.008525 0.021666 -0.019468 -0.026014 -0.025878 0.015252
3 keeper_sota_x -0.027422 0.009134 -0.585212 -0.306845 0.630628 -0.303735 -0.390630 -0.326298 -0.029044 ... -0.003544 0.019901 -0.012280 0.008879 0.015911 0.029753 -0.018988 -0.016977 -0.017976 0.008819
4 passingtypes_PC_dead_x -0.020593 0.025862 -0.428442 -0.400000 0.278412 -0.400698 -0.520973 -0.469203 -0.092371 ... -0.022421 0.000464 -0.011870 -0.003227 0.015526 0.010683 0.004033 -0.001923 -0.001384 0.001681
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
325 shooting_gls_x -0.031728 0.017277 0.796159 0.988970 -0.260595 1.000000 0.489307 0.687750 0.424698 ... 0.008137 -0.028288 0.015246 0.016248 0.005333 0.007396 -0.000937 0.018878 0.012763 -0.017349
326 gf_x -0.032829 0.012942 0.804871 1.000000 -0.263228 0.988970 0.489024 0.682367 0.418380 ... 0.010279 -0.026590 0.016782 0.015622 0.005549 0.006301 0.001188 0.017049 0.011392 -0.015809
327 goalsfor_x -0.032829 0.012942 0.804871 1.000000 -0.263228 0.988970 0.489024 0.682367 0.418380 ... 0.010279 -0.026590 0.016782 0.015622 0.005549 0.006301 0.001188 0.017049 0.011392 -0.015809
328 result_x -0.000427 0.001526 1.000000 0.804871 -0.784386 0.796159 0.489624 0.598669 0.302810 ... -0.008706 -0.020023 0.015524 0.000158 -0.003279 -0.004667 0.006454 0.020826 0.017049 -0.016039
329 netscore_x -0.000427 0.001526 1.000000 0.804871 -0.784386 0.796159 0.489624 0.598669 0.302810 ... -0.008706 -0.020023 0.015524 0.000158 -0.003279 -0.004667 0.006454 0.020826 0.017049 -0.016039

330 rows × 331 columns
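
To read off which features track the net score most closely, a single column of the correlation matrix is enough. The sketch below shows the idea on made-up numbers; `toy` and `corr_with_target` are names introduced here, and on the real data the target column would be `netscore_x`.

```python
import pandas as pd

# Made-up match data: net score, goals for, goals against.
toy = pd.DataFrame({
    'netscore': [2, -1, 0, 3, -2],
    'gf':       [3, 0, 1, 4, 0],
    'ga':       [1, 1, 1, 1, 2],
})

# Take the target's column from the correlation matrix, drop its trivial
# correlation with itself, and sort from most negative to most positive.
corr_with_target = toy.corr()['netscore'].drop('netscore').sort_values()
print(corr_with_target)
```

Duplicated columns (here `result_x` and `netscore_x`, or `gf_x` and `goalsfor_x`) show a correlation of exactly 1 with each other, as in the table above, so one of each pair can be ignored when ranking features.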