Outline

The purpose of this notebook is to calculate zone entry and exit-related data for tracked games, and generate TOI and shot-related data for the same games.

Here's what we'll do:

Download NZ data (not included)
Aggregate NZ data
Augment with PBP/TOI data as needed
Calculate metrics at team, individual, line, and pair levels, for and against. For example:
- Entries per 60
- Failed entries per 60
- Controlled entries per 60
- Failed entries per 60
- Controlled entry%
- Failed entry%
- Shots per entry
- Controlled exits per 60



In [1]:

    
from os import listdir, chdir, getcwd
import pandas as pd
from pylab import *
from tqdm import tqdm # progress bar
%matplotlib inline
current_wd = getcwd()

Aggregate NZ data

They're in a bunch of separate .xlsx files. We just aggregate them together.

Each excel file has three sheets we want:

Shot Data
Raw Entries
Zone Exits Raw Data

First, let's copy everything to csv, which will make for faster file read-in later on.



In [2]:

    
# Want to combine all files
folders = ['/Users/muneebalam/Downloads/Game Reports 1718/', 
           #'/Users/muneebalam/Downloads/Passing Game Archive 1718/',
           '/Users/muneebalam/Downloads/Game Reports 1617/']
sheets = {'shots': 'Shot Data', 'entries': 'Raw Entries', 'exits': 'Zone Exits Raw Data'}

copy = False

if copy:
    for folder in folders:
        chdir(folder)
        files = listdir()
        files = [f for f in files if f[-5:] == '.xlsx']
        for file in tqdm(files, desc='Converting to csv'):
            xl = pd.ExcelFile(file)
            sheetnames = xl.sheet_names
            for s in sheetnames:
                df = xl.parse(s)
                fout = '{0:s}_{1:s}.csv'.format(file[:-5], s)
                df.to_csv(fout, index=False)
        print('Done with', folder)

Now, let's check--do the sheets we're interested in (listed in cell above) contain the same columns?



In [3]:

    
colnames = {}
for skey, sval in sheets.items():
    colnames[sval] = {}
    for folder in folders:
        chdir(folder)
        files = listdir()
        files = [f for f in files if f[f.rfind('_')+1:-4] == sval]
        for file in tqdm(files, desc='Reading files'):
            try:
                cnames = pd.read_csv(file).columns
            except Exception as e:
                print(skey, sval, file, e, e.args)
                continue
                
            cnames = tuple(sorted(cnames))
            if cnames not in colnames[sval]:
                colnames[sval][cnames] = set()
            colnames[sval][cnames].add('{0:s}-{1:s}'.format(folder[-5:-1], file[:5]))  # Season and game number
        print('Done with', skey, folder)









    



Reading files: 100%|██████████| 220/220 [00:01<00:00, 147.72it/s]
Reading files:   6%|▌         | 18/301 [00:00<00:01, 171.62it/s]





    



Done with shots /Users/muneebalam/Downloads/Game Reports 1718/






    



Reading files: 100%|██████████| 301/301 [00:01<00:00, 184.26it/s]
Reading files:   8%|▊         | 17/220 [00:00<00:01, 167.05it/s]





    



Done with shots /Users/muneebalam/Downloads/Game Reports 1617/






    



Reading files: 100%|██████████| 220/220 [00:01<00:00, 192.22it/s]
Reading files:   7%|▋         | 20/301 [00:00<00:01, 197.02it/s]





    



Done with entries /Users/muneebalam/Downloads/Game Reports 1718/






    



Reading files: 100%|██████████| 301/301 [00:01<00:00, 200.99it/s]
Reading files:  13%|█▎        | 29/220 [00:00<00:00, 279.97it/s]





    



Done with entries /Users/muneebalam/Downloads/Game Reports 1617/






    



Reading files: 100%|██████████| 220/220 [00:00<00:00, 272.17it/s]
Reading files:   9%|▉         | 27/301 [00:00<00:01, 267.12it/s]





    



exits Zone Exits Raw Data 20330 Dallas at Colorado_Zone Exits Raw Data.csv No columns to parse from file ('No columns to parse from file',)
Done with exits /Users/muneebalam/Downloads/Game Reports 1718/






    



Reading files: 100%|██████████| 301/301 [00:01<00:00, 251.42it/s]





    



Done with exits /Users/muneebalam/Downloads/Game Reports 1617/



In [4]:

    
def intersect(*sets):
    if len(sets) == 0:
        return set()
    if len(sets) == 1:
        return sets[0]
    if len(sets) == 2:
        return set(sets[0]) & set(sets[1])
    return set(sets[0]) & intersect(*tuple(sets[1:]))
        
for sval in colnames:
    # Figure out column name frequency
    colcount = {}
    for clist in colnames[sval].keys():
        for c in clist:
            if c not in colcount:
                colcount[c] = 0
            colcount[c] += len(colnames[sval][clist])
    colcount = [(k, v) for k, v in colcount.items()]
    colcount = sorted(colcount, key = lambda x: x[1], reverse=True)
    print(sval)
    for k, v in colcount:
        # Only ones with more than 200, for brevity's sake
        if v >= 200:
            print(k, v)
    print('')









    



Shot Data
A1 517
A2 517
A3 517
Away Goalies 517
Away:  517
Date 517
Game ID 517
Goalie 517
Home Goalies 517
Home: 517
SOG? 517
Strength 517
Team 517
Time 517
PBP Time 516
Period 516
Unnamed: 28 513
Unnamed: 29 478
Screen 462
A1 Zone 297
A2 Zone 297
A3 Zone 297
G? 297
Oddman 297
RB/2C 297
RB/2C SOG? 297
SC? 297
Shot Type? 297
RB/2C G? 296
Home Score State 295
Shooter 295
Player 221
Chance? 220
Goal? 220
Goal?.1 220
Oddman? 220
P1 Z 220
P2 Z 220
P3 Z 220
Rebound? 220
SOG 220
State 220
Type 220

Raw Entries
Controlled? 516
Defended by 516
Dump recovered? 516
Entry by 516
Entry type 516
Fail 516
Fen total 516
Game 516
Goal total 516
Goalie touch? 516
Location 516
Middle driver 516
Opp 516
Opp strength 516
Team strength 516
Time 516
Period 515

Zone Exits Raw Data
Attempt 515
Direction 515
Entry? 515
Pass Target 515
Period 515
Pressured? 515
Result 515
Time 515

They look like they're in decent shape--there's a lot of overlap. I'll keep items with at least 515 occurrences.



In [5]:

    
cols_to_keep = {}
for sval in colnames:
    # Figure out column name frequency
    colcount = {}
    for clist in colnames[sval].keys():
        for c in clist:
            if c not in colcount:
                colcount[c] = 0
            colcount[c] += len(colnames[sval][clist])
    cols_to_keep[sval] = [k for k, v in colcount.items() if v >= 515]
print(cols_to_keep)









    



{'Shot Data': ['A1', 'A2', 'A3', 'Away Goalies', 'Away: ', 'Date', 'Game ID', 'Goalie', 'Home Goalies', 'Home:', 'PBP Time', 'Period', 'SOG?', 'Strength', 'Team', 'Time'], 'Raw Entries': ['Controlled?', 'Defended by', 'Dump recovered?', 'Entry by', 'Entry type', 'Fail', 'Fen total', 'Game', 'Goal total', 'Goalie touch?', 'Location', 'Middle driver', 'Opp', 'Opp strength', 'Period', 'Team strength', 'Time'], 'Zone Exits Raw Data': ['Attempt', 'Direction', 'Entry?', 'Pass Target', 'Period', 'Pressured?', 'Result', 'Time']}

So now, we'll combine.



In [6]:

    
dfs = {k: [] for k in sheets.keys()}

generate = False

for skey, sval in sheets.items():
    fout = skey + ' combined.csv'
    
    if generate:
        for folder in folders:
            chdir(folder)
            files = listdir()
            files = [f for f in files if f[f.rfind('_')+1:-4] == sval]
            for file in tqdm(files, desc='Reading files'):
                try:
                    df = pd.read_csv(file)

                    # Exclude columns I don't want
                    cols = set(df.columns) - set(cols_to_keep[sval])
                    df = df.drop(cols, axis=1, errors='ignore')
                    df = df.assign(Season=2000 + int(folder[-5:-3]), Game=int(file[:5]))
                    dfs[skey].append(df)
                except Exception as e:
                    print(skey, sval, file, e, e.args)
                    continue

            print('Done with', skey, folder)
        dfs[skey] = pd.concat(dfs[skey])
        dfs[skey].to_csv(fout, index=False)
    dfs[skey] = pd.read_csv(fout)
chdir(current_wd)
print('Done aggregating and reading files')









    



/Users/muneebalam/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2717: DtypeWarning: Columns (7,9) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)






    



Done aggregating and reading files






    



/Users/muneebalam/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2717: DtypeWarning: Columns (13,16) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)
/Users/muneebalam/anaconda/lib/python3.6/site-packages/IPython/core/interactiveshell.py:2717: DtypeWarning: Columns (0) have mixed types. Specify dtype option on import or set low_memory=False.
  interactivity=interactivity, compiler=compiler, result=result)

Here's what the dataframes look like:



In [7]:

    
dfs['shots'].head()









    Out[7]:







  
    
      
      A1
      A2
      A3
      Away Goalies
      Away:
      Date
      Game
      Game ID
      Goalie
      Home Goalies
      Home:
      PBP Time
      Period
      SOG?
      Season
      Strength
      Team
      Time
    
  
  
    
      0
      16NYI
      13NYI
      NYI
      35.0
      PHI
      2017-01-15
      20279
      20279
      33.0
      31
      WSH
      00:53:00
      1.0
      NaN
      2017
      5v5
      NYI
      19:23:00
    
    
      1
      NYI
      NYI
      NYI
      30.0
      PHI
      2017-01-15
      20279
      20279
      33.0
      70
      WSH
      02:10:00
      1.0
      NaN
      2017
      5v5
      NYI
      19:16:00
    
    
      2
      CAR
      CAR
      CAR
      NaN
      PHI
      2017-01-15
      20279
      20279
      1.0
      NaN
      WSH
      02:15:00
      1.0
      NaN
      2017
      5v5
      CAR
      18:31:00
    
    
      3
      66NYI
      72NYI
      NYI
      NaN
      PHI
      2017-01-15
      20279
      20279
      33.0
      NaN
      WSH
      02:17:00
      1.0
      NaN
      2017
      5v5
      NYI
      18:21:00
    
    
      4
      25NYI
      66NYI
      NYI
      NaN
      PHI
      2017-01-15
      20279
      20279
      33.0
      NaN
      WSH
      02:18:00
      1.0
      NaN
      2017
      5v5
      NYI
      17:51:00



In [8]:

    
dfs['entries'].head()









    Out[8]:







  
    
      
      Controlled?
      Defended by
      Dump recovered?
      Entry by
      Entry type
      Fail
      Fen total
      Game
      Goal total
      Goalie touch?
      Location
      Middle driver
      Opp
      Opp strength
      Period
      Season
      Team strength
      Time
    
  
  
    
      0
      55WPG
      N
      N
      55WPG
      C
      NaN
      0.0
      20193
      0.0
      R
      Home
      N
      DAL
      5.0
      1.0
      2017
      5
      19:45:00
    
    
      1
      NaN
      44WPG
      N
      3DAL
      F
      3DAL
      0.0
      20193
      0.0
      R
      Home
      NaN
      DAL
      5.0
      1.0
      2017
      5
      19:33:00
    
    
      2
      NaN
      N
      N
      13WPG
      D
      NaN
      0.0
      20193
      0.0
      R
      Home
      NaN
      DAL
      5.0
      1.0
      2017
      5
      19:12:00
    
    
      3
      NaN
      N
      N
      12DAL
      D
      NaN
      0.0
      20193
      0.0
      R
      Home
      NaN
      DAL
      5.0
      1.0
      2017
      5
      19:02:00
    
    
      4
      NaN
      N
      16WPG
      9WPG
      D
      NaN
      1.0
      20193
      0.0
      R
      Home
      NaN
      DAL
      5.0
      1.0
      2017
      5
      18:56:00



In [9]:

    
dfs['exits'].head()









    Out[9]:







  
    
      
      Period
      Time
      Attempt
      Result
      Pressured?
      Pass Target
      Entry?
      Direction
      Game
      Season
    
  
  
    
      0
      1
      19:19:00
      25EDM
      P
      N
      93EDM
      Y
      L
      20198
      2017
    
    
      1
      1
      18:33:00
      63NJ
      M
      N
      N
      N
      R
      20198
      2017
    
    
      2
      1
      18:25:00
      14NJ
      M
      N
      11NJ
      N
      L
      20198
      2017
    
    
      3
      1
      18:05:00
      25NJ
      P
      N
      11NJ
      Y
      L
      20198
      2017
    
    
      4
      1
      18:00:00
      55EDM
      C
      N
      N
      Y
      C
      20198
      2017

Augment with PBP/TOI

Let's look at how complete the shot data is. I'll take a 10% sample of Caps games and compare shot counts in the tracked data with the PBP.

First, the games selected:



In [10]:

    
np.random.seed(8)  # Obv, for shots, pick Ovechkin
wsh_games = dfs['shots'].query('Team == "WSH"')[['Season', 'Game']].drop_duplicates().sort_values(['Season', 'Game'])
wsh_games.loc[:, 'RandomNum'] = np.random.randint(low=0, high=100, size=len(wsh_games))
wsh_games.loc[:, 'Selected'] = wsh_games.RandomNum.apply(lambda x: x <= 10)
wsh_games = wsh_games.query('Selected == True').drop('RandomNum', axis=1)
wsh_games.set_index(['Season', 'Game'])  # Just for viewing purposes

Here are the shot counts from the tracked data:



In [11]:

    
wsh_shots = dfs['shots'].merge(wsh_games[['Season', 'Game']], how='inner', on=['Season', 'Game']) \
    .query('Strength == "5v5"')
wsh_shots.loc[:, 'Team'] = wsh_shots.Team.apply(lambda x: x if x == 'WSH' else 'Opp')
wsh_shots = wsh_shots[['Season', 'Game', 'Team']].assign(Count=1) \
    .groupby(['Season', 'Game', 'Team'], as_index=False) \
    .count()
wsh_shots = wsh_shots.pivot_table(index=['Season', 'Game'], columns='Team', values='Count')
wsh_shots

Now, let's pull shot counts for those games from our scraped data.



In [12]:

    
from scrapenhl2.scrape import teams
df1 = teams.get_team_pbp(2016, 'WSH').assign(Season=2016)
df2 = teams.get_team_pbp(2017, 'WSH').assign(Season=2017)
df3 = pd.concat([df1, df2]).merge(wsh_games[['Season', 'Game']], how='inner', on=['Season', 'Game'])

# Go to 5v5 only
from scrapenhl2.manipulate import manipulate as manip
df3 = manip.filter_for_five_on_five(manip.filter_for_corsi(df3))

counts = df3[['Season', 'Game', 'Team']]
counts.loc[:, 'Team'] = counts.Team.apply(lambda x: 'WSH' if x == 15 else 'Opp')

counts = counts.assign(Count=1) \
    .groupby(['Season', 'Game', 'Team'], as_index=False) \
    .count()
counts = counts.pivot_table(index=['Season', 'Game'], columns='Team', values='Count')
counts









    



/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:517: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s






    Out[12]:







  
    
      
      Team
      Opp
      WSH
    
    
      Season
      Game
      
      
    
  
  
    
      2016
      20022
      40
      42
    
    
      20071
      40
      43
    
    
      20860
      43
      31
    
    
      2017
      20047
      43
      44
    
    
      20555
      42
      52

The counts are pretty close, so we'll use the tracked data.

We still need TOI, though. I'll pull a second-by-second log into a dataframe.



In [13]:

    
from scrapenhl2.scrape import players, team_info

dfs['toi'] = {}

team_convert = {'LA': 'LAK', 'NJ': 'NJD', 'TB': 'TBL', 'SJ': 'SJS', 
                'L.A': 'LAK', 'N.J': 'NJD', 'T.B': 'TBL', 'S.J': 'SJS'}
for team in dfs['shots'].Team.unique():
    if team in team_convert:
        team = team_convert[team]
    if not isinstance(team, str) or len(team) != 3:
        #print('Skipping', team)
        continue
            
    toi = []
    for season in range(2016, 2018):
        try:
            toi.append(teams.get_team_toi(season, team).assign(Season=season))
        except Exception as e:
            print('Could not read', team, e, e.args)
            
    toi = pd.concat(toi)
    
    # Filter for appropriate games using an inner join, and filter for 5v5
    toi = toi.merge(dfs['shots'][['Season', 'Game']].drop_duplicates(), 
                    how='inner', on=['Season', 'Game']) 
    toi = manip.filter_for_five_on_five(toi)
    
    # Get only certain columns
    toi = toi[['Season', 'Game', 'Time', 'Team1', 'Team2', 'Team3', 'Team4', 'Team5']]
    renaming = {'Team{0:d}'.format(i): '{0:s}{1:d}'.format(team, i) for i in range(1, 6)}
    toi = toi.rename(columns=renaming)
    
    # Convert to player names
    for col in toi.columns[3:]:
        toi.loc[:, col] = toi[col].apply(lambda x: players.player_as_str(x))
    dfs['toi'][team] = toi

dfs['toi']['WSH'].head()









    



Could not read VGK IOError: Failed to open file: /Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/data/teams/toi/2016/VGK.feather ('IOError: Failed to open file: /Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/data/teams/toi/2016/VGK.feather',)






    Out[13]:







  
    
      
      Season
      Game
      Time
      WSH1
      WSH2
      WSH3
      WSH4
      WSH5
    
  
  
    
      0
      2016
      20007
      1
      Matt Niskanen
      Karl Alzner
      Evgeny Kuznetsov
      Alex Ovechkin
      T.J. Oshie
    
    
      1
      2016
      20007
      2
      Matt Niskanen
      Karl Alzner
      Evgeny Kuznetsov
      Alex Ovechkin
      T.J. Oshie
    
    
      2
      2016
      20007
      3
      Matt Niskanen
      Karl Alzner
      Evgeny Kuznetsov
      Alex Ovechkin
      T.J. Oshie
    
    
      3
      2016
      20007
      4
      Matt Niskanen
      Karl Alzner
      Evgeny Kuznetsov
      Alex Ovechkin
      T.J. Oshie
    
    
      4
      2016
      20007
      5
      Matt Niskanen
      Karl Alzner
      Evgeny Kuznetsov
      Alex Ovechkin
      T.J. Oshie

One final item: let's filter the entries, exits, and shots dataframes to 5v5 using this TOI data.



In [14]:

    
fives = {}

from scrapenhl2.manipulate import add_onice_players as onice
dfs['shots'].loc[:, 'Time2'] = dfs['shots'].Time.str.extract(r'(\d{1,2}:\d{1,2}):\d{1,2}$')
dfs['shots'] = onice.add_times_to_file(dfs['shots'].dropna(subset=['Time']),
                                       periodcol='Period', timecol='Time2', time_format='remaining')
fives['shots'] = dfs['shots']
for team, toi in dfs['toi'].items():
    toi = toi.rename(columns={'Time': '_Secs'})
    fives['shots'] = fives['shots'].merge(toi, how='left', on=['Season', 'Game', '_Secs'])

fives['shots'].head()









    



/Users/muneebalam/anaconda/lib/python3.6/site-packages/ipykernel_launcher.py:4: FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame)
  after removing the cwd from sys.path.
/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:517: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:337: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)






    Out[14]:







  
    
      
      A1
      A2
      A3
      Away Goalies
      Away:
      Date
      Game
      Game ID
      Goalie
      Home Goalies
      ...
      VGK1
      VGK2
      VGK3
      VGK4
      VGK5
      VAN1
      VAN2
      VAN3
      VAN4
      VAN5
    
  
  
    
      0
      16NYI
      13NYI
      NYI
      35.0
      PHI
      2017-01-15
      20279
      20279
      33.0
      31
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      1
      NYI
      NYI
      NYI
      30.0
      PHI
      2017-01-15
      20279
      20279
      33.0
      70
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      2
      CAR
      CAR
      CAR
      NaN
      PHI
      2017-01-15
      20279
      20279
      1.0
      NaN
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      3
      66NYI
      72NYI
      NYI
      NaN
      PHI
      2017-01-15
      20279
      20279
      33.0
      NaN
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4
      25NYI
      66NYI
      NYI
      NaN
      PHI
      2017-01-15
      20279
      20279
      33.0
      NaN
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
  

5 rows × 175 columns



In [15]:

    
dfs['entries'].loc[:, 'Time2'] = dfs['entries'].Time.str.extract(r'(\d{1,2}:\d{1,2}):\d{1,2}$')
dfs['entries'] = onice.add_times_to_file(dfs['entries'].dropna(subset=['Time']),
                                         periodcol='Period', timecol='Time2', time_format='remaining')

fives['entries'] = dfs['entries']
for team, toi in dfs['toi'].items():
    toi = toi.rename(columns={'Time': '_Secs'})
    fives['entries'] = fives['entries'].merge(toi, how='left', on=['Season', 'Game', '_Secs'])

fives['entries'].head()









    



/Users/muneebalam/anaconda/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame)
  """Entry point for launching an IPython kernel.
/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:517: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:337: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)






    Out[15]:







  
    
      
      Controlled?
      Defended by
      Dump recovered?
      Entry by
      Entry type
      Fail
      Fen total
      Game
      Goal total
      Goalie touch?
      ...
      VGK1
      VGK2
      VGK3
      VGK4
      VGK5
      VAN1
      VAN2
      VAN3
      VAN4
      VAN5
    
  
  
    
      0
      55WPG
      N
      N
      55WPG
      C
      NaN
      0.0
      20193
      0.0
      R
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      1
      NaN
      44WPG
      N
      3DAL
      F
      3DAL
      0.0
      20193
      0.0
      R
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      2
      NaN
      N
      N
      13WPG
      D
      NaN
      0.0
      20193
      0.0
      R
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      3
      NaN
      N
      N
      12DAL
      D
      NaN
      0.0
      20193
      0.0
      R
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4
      NaN
      N
      16WPG
      9WPG
      D
      NaN
      1.0
      20193
      0.0
      R
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
  

5 rows × 175 columns



In [16]:

    
dfs['exits'].loc[:, 'Time2'] = dfs['exits'].Time.str.extract(r'(\d{1,2}:\d{1,2}):\d{1,2}$')
dfs['exits'] = onice.add_times_to_file(dfs['exits'].dropna(subset=['Time']),
                                       periodcol='Period', timecol='Time2', time_format='remaining')

fives['exits'] = dfs['exits']
for team, toi in dfs['toi'].items():
    toi = toi.rename(columns={'Time': '_Secs'})
    fives['exits'] = fives['exits'].merge(toi, how='left', on=['Season', 'Game', '_Secs'])

fives['exits'].head()









    



/Users/muneebalam/anaconda/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame)
  """Entry point for launching an IPython kernel.
/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:517: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s
/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:337: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)






    Out[16]:







  
    
      
      Period
      Time
      Attempt
      Result
      Pressured?
      Pass Target
      Entry?
      Direction
      Game
      Season
      ...
      VGK1
      VGK2
      VGK3
      VGK4
      VGK5
      VAN1
      VAN2
      VAN3
      VAN4
      VAN5
    
  
  
    
      0
      1
      19:19:00
      25EDM
      P
      N
      93EDM
      Y
      L
      20198
      2017
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      1
      1
      18:33:00
      63NJ
      M
      N
      N
      N
      R
      20198
      2017
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      2
      1
      18:25:00
      14NJ
      M
      N
      11NJ
      N
      L
      20198
      2017
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      3
      1
      18:05:00
      25NJ
      P
      N
      11NJ
      Y
      L
      20198
      2017
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4
      1
      18:00:00
      55EDM
      C
      N
      N
      Y
      C
      20198
      2017
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
  

5 rows × 167 columns

Let's look at games included by team.



In [17]:

    
teamcolnames = [x for x in fives['shots'].columns if x == x.upper() and len(x) == 4 and x[-1] == '1']
gps = []
for teamcol in teamcolnames:
    gp = len(dfs['toi'][teamcol[:-1]][['Season', 'Game']].drop_duplicates())
    print(teamcol[:3], gp)
    gps.append(gp)
print('Total', sum(gps))









    



NYI 46
CAR 28
BOS 28
LAK 25
MIN 32
MTL 36
NYR 52
TOR 35
PIT 37
EDM 34
ANA 29
CBJ 43
FLA 28
NJD 30
STL 25
TBL 39
COL 29
SJS 27
ARI 21
PHI 99
DET 27
NSH 29
WPG 28
CHI 23
WSH 34
CGY 39
OTT 27
BUF 25
DAL 28
VGK 12
VAN 21
Total 1016

Let's reduce this to just Caps games now.



In [18]:

    
wsh = {key: fives[key].dropna(subset=['WSH1']) for key in ['shots', 'entries', 'exits']}
wsh['shots'].head()









    Out[18]:







  
    
      
      A1
      A2
      A3
      Away Goalies
      Away:
      Date
      Game
      Game ID
      Goalie
      Home Goalies
      ...
      VGK1
      VGK2
      VGK3
      VGK4
      VGK5
      VAN1
      VAN2
      VAN3
      VAN4
      VAN5
    
  
  
    
      4143
      51TOR
      TOR
      TOR
      35.0
      PHI
      2017-01-15
      20084
      20084
      70.0
      31
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4144
      TOR
      TOR
      TOR
      30.0
      PHI
      2017-01-15
      20084
      20084
      70.0
      70
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4145
      34TOR
      TOR
      TOR
      NaN
      PHI
      2017-01-15
      20084
      20084
      70.0
      NaN
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4146
      19WSH
      77WSH
      74WSH
      NaN
      PHI
      2017-01-15
      20084
      20084
      31.0
      NaN
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4147
      12TOR
      43TOR
      TOR
      NaN
      PHI
      2017-01-15
      20084
      20084
      70.0
      NaN
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
  

5 rows × 175 columns

Let's convert numbers to names. I don't have a lookup table handy, so I'll do it by hand.



In [19]:

    
wsh['shots'][['WSH1', 'WSH2', 'WSH3', 'WSH4', 'WSH5']] \
    .melt(var_name='P', value_name='Name') \
    ['Name'].value_counts()









    Out[19]:





Dmitry Orlov           1532
John Carlson           1450
Matt Niskanen          1344
Alex Ovechkin          1294
Brooks Orpik           1293
Evgeny Kuznetsov       1214
Nicklas Backstrom      1181
T.J. Oshie             1084
Lars Eller             1023
Karl Alzner             957
Brett Connolly          865
Jay Beagle              839
Andre Burakovsky        798
Tom Wilson              775
Marcus Johansson        685
Justin Williams         656
Nate Schmidt            585
Jakub Vrana             522
Daniel Winnik           430
Christian Djoos         376
Madison Bowey           323
Devante Smith-Pelly     307
Alex Chiasson           257
Zach Sanford            217
Chandler Stephenson     184
Taylor Chorney          174
Aaron Ness              152
Kevin Shattenkirk       106
Nathan Walker            59
Tyler Graovac            38
Name: Name, dtype: int64



In [20]:

    
wsh_players = {'Alex Ovechkin': 8, 'Nicklas Backstrom': 19, 'Andre Burakovsky': 65,
              'T.J. Oshie': 77, 'John Carlson': 74, 'Evgeny Kuznetsov': 92,
              'Dmitry Orlov': 9, 'Christian Djoos': 29, 'Devante Smith-Pelly': 25,
              'Jay Beagle': 83, 'Brooks Orpik': 44, 'Chandler Stephenson': 18,
              'Jakub Vrana': 13, 'Tom Wilson': 43, 'Lars Eller': 20,
              'Alex Chiasson': 39, 'Brett Connolly': 10, 'Madison Bowey': 22, 'Kevin Shattenkirk': 22,
              'Tyler Graovac': 91, '': 36, 'Matt Niskanen': 2, 
              'Aaron Ness': 55, 'Nathan Walker': 79, 'Taylor Chorney': 4,
              'Zach Sanford': 12, 'Karl Alzner': 27, 'Marcus Johansson': 90, 
               'Zachary Sanford': 12,
               'Justin Williams': 14, 'Daniel Winnik': 26, 'Nate Schmidt': 88}

for col in ['WSH1', 'WSH2', 'WSH3', 'WSH4', 'WSH5']:
    for key in ['shots', 'entries', 'exits']:
        wsh[key].loc[:, col] = wsh[key][col].apply(lambda x: str(wsh_players[x]) + 'WSH' if x in wsh_players else x)
wsh['shots'].head()









    



/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:517: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s






    Out[20]:







  
    
      
      A1
      A2
      A3
      Away Goalies
      Away:
      Date
      Game
      Game ID
      Goalie
      Home Goalies
      ...
      VGK1
      VGK2
      VGK3
      VGK4
      VGK5
      VAN1
      VAN2
      VAN3
      VAN4
      VAN5
    
  
  
    
      4143
      51TOR
      TOR
      TOR
      35.0
      PHI
      2017-01-15
      20084
      20084
      70.0
      31
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4144
      TOR
      TOR
      TOR
      30.0
      PHI
      2017-01-15
      20084
      20084
      70.0
      70
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4145
      34TOR
      TOR
      TOR
      NaN
      PHI
      2017-01-15
      20084
      20084
      70.0
      NaN
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4146
      19WSH
      77WSH
      74WSH
      NaN
      PHI
      2017-01-15
      20084
      20084
      31.0
      NaN
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
    
      4147
      12TOR
      43TOR
      TOR
      NaN
      PHI
      2017-01-15
      20084
      20084
      70.0
      NaN
      ...
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
      NaN
    
  

5 rows × 175 columns

Entries per 60



In [21]:

    
# Team comparison

# Drop extra team cols
allteamcols = [x for x in fives['entries'].columns if not x.upper() == x]
allteams = fives['entries'][allteamcols]

# Remove fails and faceoffs
allteams = allteams[pd.isnull(allteams.Fail)]
allteams = allteams[allteams['Entry type'] != 'FAC']

# Counts by game and team
allteams = allteams[['Season', 'Game', 'Entry by', 'Entry type']]
# Extract ending text part of entry. 
import re
def extract_team(string):
    result = re.search('\d*(\w{2,3})$', str(string))
    if result:
        return result.group(1)
    return string
allteams.loc[:, 'Team'] = allteams['Entry by'].apply(lambda x: extract_team(x))

# Get rid of typo teams -- about 100 in 77000
valid_teams = set(allteams.Team.value_counts().index[:31])
allteams = allteams[allteams.Team.apply(lambda x: x in valid_teams)]

allteams = allteams.drop('Entry by', axis=1) \
    .assign(Count=1) \
    .groupby(['Season', 'Game', 'Team', 'Entry type'], as_index=False) \
    .count()
    
# Add opp numbers
gametotals = allteams.drop('Team', axis=1) \
    .groupby(['Season', 'Game', 'Entry type'], as_index=False) \
    .sum() \
    .rename(columns={'Count': 'OppCount'})

fives['toi'] = pd.concat([dfs['toi'][team] \
                          [['Season', 'Game']] \
                          .assign(TOI=1) \
                          .groupby(['Season', 'Game'], as_index=False) \
                          .count() for team in dfs['toi']]).drop_duplicates()

# Sum by season and calculate per 60
allteams = allteams.merge(fives['toi'], how='left', on=['Season', 'Game'])
allteams = allteams.merge(gametotals, how='inner', on=['Season', 'Game', 'Entry type']) \
    .drop('Game', axis=1) \
    .groupby(['Season', 'Entry type', 'Team'], as_index=False) \
    .sum()
allteams.loc[:, 'OppCount'] = allteams['OppCount'] - allteams['Count']

allteams.loc[:, 'Per60'] = allteams['Count'] / (allteams.TOI / 3600)
allteams.loc[:, 'OppPer60'] = allteams['OppCount'] / (allteams.TOI / 3600)

allteams.head()



In [22]:

    
f = figure(figsize=[8, 8])

tmp = allteams[allteams['Entry type'] != 'X'] \
    .drop(['Count', 'OppCount', 'TOI'], axis=1) \
    .pivot_table(index=['Entry type', 'Team'], columns='Season', values='Per60') \
    .reset_index()
    
for etype in tmp['Entry type'].unique():
    tmp2 = tmp[tmp['Entry type'] == etype]
    scatter(tmp2.loc[:, 2016].values, tmp2.loc[:, 2017].values, label=etype, s=200, alpha=0.5)

for s, etype, t, r1, r2 in tmp.itertuples():
    annotate(t, xy=(r1, r2), ha='center', va='center') 

from scrapenhl2.plot import visualization_helper as vhelper
vhelper.add_good_bad_fast_slow(bottomleft='Bad', topleft='Improved', topright='Good', bottomright='Declined')
vhelper.add_cfpct_ref_lines_to_plot(ax=gca(), refs=[50])

title('Team 5v5 entry rate, 2016 vs 2017')
xlabel('2016')
ylabel('2017')
legend(loc=2, bbox_to_anchor=(1, 1))









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:678: RuntimeWarning: invalid value encountered in subtract
  m = (y2 - y1) / (x2 - x1)
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes






    Out[22]:





<matplotlib.legend.Legend at 0x11c175f98>



In [23]:

    
f = figure(figsize=[8, 8])

tmp = allteams[allteams['Entry type'] != 'X'] \
    .drop(['Count', 'OppCount', 'TOI'], axis=1) \
    .pivot_table(index=['Season', 'Team'], columns='Entry type', values='Per60') \
    .reset_index()
    
tmp.loc[:, 'Ctrl%'] = tmp.C / (tmp.C + tmp.D)
tmp = tmp.drop(['C', 'D'], axis=1)
tmp = tmp.pivot_table(index='Team', columns='Season', values='Ctrl%').reset_index()
    
scatter(tmp.loc[:, 2016].values, tmp.loc[:, 2017].values, s=200, alpha=0.5)

for i, t, r1, r2 in tmp.itertuples():
    annotate(t, xy=(r1, r2), ha='center', va='center') 
    
vhelper.add_good_bad_fast_slow(bottomleft='Bad', topleft='Improved', topright='Good', bottomright='Declined')
vhelper.add_cfpct_ref_lines_to_plot(ax=gca(), refs=[50])

title('Team 5v5 controlled entry rate, 2016 vs 2017')
xlabel('2016')
ylabel('2017')









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:678: RuntimeWarning: invalid value encountered in subtract
  m = (y2 - y1) / (x2 - x1)
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes






    Out[23]:





<matplotlib.text.Text at 0x11c113358>



In [24]:

    
fig, axes = subplots(1, 2, sharex=True, sharey=True, figsize=[12, 6])

for i, season in enumerate(allteams.Season.unique()):
    tmp = allteams[(allteams.Season == season) & (allteams['Entry type'] != 'X')]
    for etype in tmp['Entry type'].unique():
        tmp2 = tmp[tmp['Entry type'] == etype]
        axes[i].scatter(tmp2.Per60.values, tmp2.OppPer60.values, s=250, alpha=0.3, label=etype)
    axes[i].set_title('Entries for and against, {0:d}'.format(season))
    axes[i].set_xlabel('Entries per 60')
    if i == 0:
        axes[i].set_ylabel('Entries against per 60')
    for _, t, r1, r2 in tmp[['Team', 'Per60', 'OppPer60']].itertuples():
        axes[i].annotate(t, xy=(r1, r2), ha='center', va='center')
    vhelper.add_good_bad_fast_slow(bottomleft='NZ Jam', topleft='Bad', topright='Racetrack', bottomright='Good')
    vhelper.add_cfpct_ref_lines_to_plot(ax=axes[i])
    
legend(loc=2, bbox_to_anchor=(1, 1))









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:678: RuntimeWarning: invalid value encountered in subtract
  m = (y2 - y1) / (x2 - x1)
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes






    Out[24]:





<matplotlib.legend.Legend at 0x11eb71cc0>



In [25]:

    
tmp = allteams[allteams['Entry type'] != 'X'] \
    .drop(['Count', 'OppCount', 'TOI'], axis=1) \
    .melt(id_vars=['Season', 'Entry type', 'Team']) \
    .pivot_table(index=['Season', 'Team', 'variable'], columns='Entry type', values='value') \
    .reset_index()

tmp.loc[:, 'CE%'] = tmp.C / (tmp.C + tmp.D)
tmp = tmp.drop(['C', 'D'], axis=1) \
    .pivot_table(index=['Season', 'Team'], columns='variable', values='CE%') \
    .rename(columns={'Per60': 'TeamCE%', 'OppPer60': 'OppCE%'}) \
    .reset_index()
    
fig, axes = subplots(1, 2, sharex=True, sharey=True, figsize=[12, 6])

for i, season in enumerate(tmp.Season.unique()):
    tmp2 = tmp[(tmp.Season == season)]
    axes[i].scatter(tmp2['TeamCE%'].values, tmp2['OppCE%'].values, s=250, alpha=0.3)
    
    axes[i].set_title('Controlled entries for and against, {0:d}'.format(season))
    axes[i].set_xlabel('Team CE%')
    if i == 0:
        axes[i].set_ylabel('Opp CE%')
    for _, t, r1, r2 in tmp2[['Team', 'TeamCE%', 'OppCE%']].itertuples():
        axes[i].annotate(t, xy=(r1, r2), ha='center', va='center')
        
    vhelper.add_good_bad_fast_slow(bottomleft='NZ Jam', topleft='Bad', topright='Racetrack', bottomright='Good')
    vhelper.add_cfpct_ref_lines_to_plot(ax=axes[i])
    
legend(loc=2, bbox_to_anchor=(1, 1))









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:678: RuntimeWarning: invalid value encountered in subtract
  m = (y2 - y1) / (x2 - x1)
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes
/Users/muneebalam/anaconda/lib/python3.6/site-packages/matplotlib/axes/_axes.py:545: UserWarning: No labelled objects found. Use label='...' kwarg on individual plots.
  warnings.warn("No labelled objects found. "



In [26]:

    
entries = wsh['entries']

# Drop extra team cols
colnames = [x for x in entries.columns if not x.upper() == x or x[:3] == 'WSH']
entries = entries[colnames]

# Remove fails
entries = entries[pd.isnull(entries.Fail)]

# Flag entries as WSH or Opp
entries.loc[:, 'Team'] = entries['Entry by'].apply(lambda x: 'WSH' if str(x)[-3:] == 'WSH' else 'Opp')

# Remove faceoffs
# entries2 = entries
entries2 = entries[entries['Entry type'] != 'FAC']

# Melt to long
idvars = [x for x in entries2.columns if x not in ['WSH1', 'WSH2', 'WSH3', 'WSH4', 'WSH5']]
entries2 = entries2.melt(id_vars=idvars, value_name='Player').sort_values(['Season', 'Game', '_Secs', 'variable'])

entries2.head()









    Out[26]:







  
    
      
      Controlled?
      Defended by
      Dump recovered?
      Entry by
      Entry type
      Fail
      Fen total
      Game
      Goal total
      Goalie touch?
      ...
      Opp strength
      Period
      Season
      Team strength
      Time
      Time2
      _Secs
      Team
      variable
      Player
    
  
  
    
      2850
      19WSH
      3PIT
      N
      19WSH
      C
      NaN
      0.0
      20007
      0.0
      R
      ...
      5
      1.0
      2016
      5
      19:28:00
      19:28
      33
      WSH
      WSH1
      2WSH
    
    
      7131
      19WSH
      3PIT
      N
      19WSH
      C
      NaN
      0.0
      20007
      0.0
      R
      ...
      5
      1.0
      2016
      5
      19:28:00
      19:28
      33
      WSH
      WSH2
      27WSH
    
    
      11412
      19WSH
      3PIT
      N
      19WSH
      C
      NaN
      0.0
      20007
      0.0
      R
      ...
      5
      1.0
      2016
      5
      19:28:00
      19:28
      33
      WSH
      WSH3
      65WSH
    
    
      15693
      19WSH
      3PIT
      N
      19WSH
      C
      NaN
      0.0
      20007
      0.0
      R
      ...
      5
      1.0
      2016
      5
      19:28:00
      19:28
      33
      WSH
      WSH4
      19WSH
    
    
      19974
      19WSH
      3PIT
      N
      19WSH
      C
      NaN
      0.0
      20007
      0.0
      R
      ...
      5
      1.0
      2016
      5
      19:28:00
      19:28
      33
      WSH
      WSH5
      90WSH
    
  

5 rows × 23 columns



In [27]:

    
# Season level

# Count by entry type
entries60 = entries2[['Season', 'Entry type', 'Team', 'Game', 'Period', 'Time']] \
    .drop_duplicates() \
    [['Season', 'Entry type', 'Team']] \
    .assign(Count=1) \
    .groupby(['Season', 'Entry type', 'Team'], as_index=False) \
    .count()

# Add TOI
toi = dfs['toi']['WSH'] \
    [['Season']].assign(TOI=1) \
    .groupby('Season', as_index=False).count()

entries60 = entries60.merge(toi, how='left', on='Season')
entries60.loc[:, 'Per60'] = entries60.Count / (entries60.TOI / 3600)

entries60



In [28]:

    
tmp = entries60.assign(Height=entries60.Per60).sort_values(['Season', 'Team', 'Entry type'])
tmp.loc[:, 'Left'] = tmp.Team.apply(lambda x: 0 if x == 'WSH' else 1) + tmp.Season.apply(lambda x: 0 if x == 2016 else 0.4)
tmp.loc[:, 'Bottom'] = tmp.groupby(['Season', 'Team']).Height.cumsum() - tmp.Height
for etype in tmp['Entry type'].unique():
    tmp2 = tmp[tmp['Entry type'] == etype]
    bar(left=tmp2.Left.values, height=tmp2.Height.values, bottom=tmp2.Bottom.values, label=etype, width=0.3)

xlabs = tmp.drop_duplicates(subset=['Season', 'Team', 'Left'])
xticks([x for x in xlabs.Left], ['{0:d} {1:s}'.format(s, t) for s, t in zip(tmp2.Season, tmp2.Team)])
legend(loc=2, bbox_to_anchor=(1, 1))
title('Zone entries per 60, Caps vs opponents')









    Out[28]:





<matplotlib.text.Text at 0x1159e0320>



In [29]:

    
# Individual level, on-ice

# Count by entry type
entries60 = entries2[['Season', 'Entry type', 'Team', 'Game', 'Period', 'Time', 'Player']] \
    [['Season', 'Entry type', 'Team', 'Player']] \
    .assign(Count=1) \
    .groupby(['Season', 'Entry type', 'Team', 'Player'], as_index=False) \
    .count() \
    .pivot_table(index=['Season', 'Entry type', 'Player'], columns='Team', values='Count') \
    .reset_index()

# Add TOI
toi = dfs['toi']['WSH'] \
    [['Season', 'WSH1', 'WSH2', 'WSH3', 'WSH4', 'WSH5']] \
    .melt(id_vars='Season', value_name='Player') \
    .drop('variable', axis=1) \
    .assign(TOI=1) \
    .groupby(['Season', 'Player'], as_index=False).count()
toi.loc[:, 'Player'] = toi['Player'].apply(lambda x: str(wsh_players[x]) + 'WSH' if x in wsh_players else x)

entries60 = entries60.merge(toi, how='left', on=['Season', 'Player']) \
    .sort_values(['Player', 'Season', 'Entry type'])
entries60.loc[:, 'WSH60'] = entries60.WSH / (entries60.TOI / 3600)
entries60.loc[:, 'Opp60'] = entries60.Opp / (entries60.TOI / 3600)

entries60









    Out[29]:







  
    
      
      Season
      Entry type
      Player
      Opp
      WSH
      TOI
      WSH60
      Opp60
    
  
  
    
      0
      2016
      C
      10WSH
      91.0
      133.0
      10314
      46.422339
      31.762653
    
    
      23
      2016
      D
      10WSH
      95.0
      97.0
      10314
      33.856894
      33.158813
    
    
      46
      2016
      X
      10WSH
      10.0
      4.0
      10314
      1.396161
      3.490401
    
    
      68
      2017
      C
      10WSH
      72.0
      71.0
      6894
      37.075718
      37.597911
    
    
      91
      2017
      D
      10WSH
      58.0
      81.0
      6894
      42.297650
      30.287206
    
    
      114
      2017
      X
      10WSH
      13.0
      9.0
      6894
      4.699739
      6.788512
    
    
      1
      2016
      C
      12WSH
      59.0
      67.0
      5321
      45.329825
      39.917309
    
    
      24
      2016
      D
      12WSH
      52.0
      58.0
      5321
      39.240744
      35.181357
    
    
      47
      2016
      X
      12WSH
      2.0
      3.0
      5321
      2.029694
      1.353129
    
    
      2
      2016
      C
      13WSH
      13.0
      14.0
      1433
      35.170970
      32.658758
    
    
      25
      2016
      D
      13WSH
      12.0
      11.0
      1433
      27.634334
      30.146546
    
    
      48
      2016
      X
      13WSH
      NaN
      2.0
      1433
      5.024424
      NaN
    
    
      69
      2017
      C
      13WSH
      101.0
      146.0
      10468
      50.210164
      34.734429
    
    
      92
      2017
      D
      13WSH
      117.0
      78.0
      10468
      26.824608
      40.236912
    
    
      115
      2017
      X
      13WSH
      13.0
      2.0
      10468
      0.687810
      4.470768
    
    
      3
      2016
      C
      14WSH
      147.0
      161.0
      14716
      39.385703
      35.960859
    
    
      26
      2016
      D
      14WSH
      143.0
      158.0
      14716
      38.651808
      34.982332
    
    
      49
      2016
      X
      14WSH
      5.0
      11.0
      14716
      2.690949
      1.223158
    
    
      4
      2016
      C
      18WSH
      1.0
      4.0
      466
      30.901288
      7.725322
    
    
      27
      2016
      D
      18WSH
      7.0
      6.0
      466
      46.351931
      54.077253
    
    
      70
      2017
      C
      18WSH
      39.0
      39.0
      4723
      29.726869
      29.726869
    
    
      93
      2017
      D
      18WSH
      55.0
      60.0
      4723
      45.733644
      41.922507
    
    
      116
      2017
      X
      18WSH
      8.0
      4.0
      4723
      3.048910
      6.097819
    
    
      5
      2016
      C
      19WSH
      181.0
      201.0
      17399
      41.588597
      37.450428
    
    
      28
      2016
      D
      19WSH
      171.0
      177.0
      17399
      36.622794
      35.381344
    
    
      50
      2016
      X
      19WSH
      9.0
      12.0
      17399
      2.482901
      1.862176
    
    
      71
      2017
      C
      19WSH
      114.0
      143.0
      11360
      45.316901
      36.126761
    
    
      94
      2017
      D
      19WSH
      110.0
      109.0
      11360
      34.542254
      34.859155
    
    
      117
      2017
      X
      19WSH
      9.0
      13.0
      11360
      4.119718
      2.852113
    
    
      6
      2016
      C
      20WSH
      136.0
      181.0
      15087
      43.189501
      32.451780
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      86
      2017
      C
      83WSH
      66.0
      62.0
      7677
      29.073857
      30.949590
    
    
      109
      2017
      D
      83WSH
      84.0
      88.0
      7677
      41.266120
      39.390387
    
    
      132
      2017
      X
      83WSH
      9.0
      13.0
      7677
      6.096131
      4.220399
    
    
      18
      2016
      C
      88WSH
      129.0
      189.0
      14731
      46.188310
      31.525355
    
    
      41
      2016
      D
      88WSH
      155.0
      138.0
      14731
      33.724798
      37.879302
    
    
      63
      2016
      X
      88WSH
      8.0
      5.0
      14731
      1.221913
      1.955061
    
    
      19
      2016
      C
      8WSH
      201.0
      224.0
      18229
      44.237204
      39.694991
    
    
      42
      2016
      D
      8WSH
      187.0
      163.0
      18229
      32.190466
      36.930166
    
    
      64
      2016
      X
      8WSH
      10.0
      19.0
      18229
      3.752263
      1.974875
    
    
      87
      2017
      C
      8WSH
      126.0
      148.0
      11351
      46.938596
      39.961237
    
    
      110
      2017
      D
      8WSH
      127.0
      109.0
      11351
      34.569641
      40.278390
    
    
      133
      2017
      X
      8WSH
      11.0
      13.0
      11351
      4.122985
      3.488679
    
    
      20
      2016
      C
      90WSH
      178.0
      181.0
      16914
      38.524299
      37.885775
    
    
      43
      2016
      D
      90WSH
      177.0
      170.0
      16914
      36.183044
      37.672934
    
    
      65
      2016
      X
      90WSH
      6.0
      17.0
      16914
      3.618304
      1.277049
    
    
      88
      2017
      C
      91WSH
      11.0
      7.0
      1441
      17.487856
      27.480916
    
    
      111
      2017
      D
      91WSH
      14.0
      14.0
      1441
      34.975711
      34.975711
    
    
      134
      2017
      X
      91WSH
      1.0
      3.0
      1441
      7.494795
      2.498265
    
    
      21
      2016
      C
      92WSH
      187.0
      203.0
      17422
      41.946964
      38.640799
    
    
      44
      2016
      D
      92WSH
      182.0
      158.0
      17422
      32.648376
      37.607623
    
    
      66
      2016
      X
      92WSH
      6.0
      19.0
      17422
      3.926070
      1.239812
    
    
      89
      2017
      C
      92WSH
      113.0
      149.0
      11133
      48.181083
      36.540016
    
    
      112
      2017
      D
      92WSH
      124.0
      89.0
      11133
      28.779305
      40.097009
    
    
      135
      2017
      X
      92WSH
      13.0
      8.0
      11133
      2.586904
      4.203719
    
    
      22
      2016
      C
      9WSH
      236.0
      261.0
      22548
      41.671102
      37.679617
    
    
      45
      2016
      D
      9WSH
      222.0
      223.0
      22548
      35.604045
      35.444385
    
    
      67
      2016
      X
      9WSH
      13.0
      20.0
      22548
      3.193188
      2.075572
    
    
      90
      2017
      C
      9WSH
      127.0
      193.0
      15255
      45.545723
      29.970501
    
    
      113
      2017
      D
      9WSH
      160.0
      163.0
      15255
      38.466077
      37.758112
    
    
      136
      2017
      X
      9WSH
      16.0
      11.0
      15255
      2.595870
      3.775811
    
  

137 rows × 8 columns



In [30]:

    
title('Entry%, 2016 vs 2017')
xlabel('2016')
ylabel('2017')

tmp = entries60[['Season', 'Player', 'WSH', 'Opp']] \
    .groupby(['Season', 'Player', 'WSH', 'Opp'], as_index=False) \
    .sum()

tmp.loc[:, 'Entry%'] = tmp.WSH / (tmp.WSH + tmp.Opp)
tmp = tmp.drop(['WSH', 'Opp'], axis=1)
tmp = tmp.pivot_table(index='Player', columns='Season', values='Entry%')

scatter(tmp.loc[:, 2016].values, tmp.loc[:, 2017].values, s=200, alpha=0.5)
for p, e1, e2 in tmp.itertuples():
    annotate(p[:-3], xy=(e1, e2), ha='center', va='center')
    
vhelper.add_good_bad_fast_slow(bottomleft='Bad', topleft='Improved', topright='Good', bottomright='Declined')
vhelper.add_cfpct_ref_lines_to_plot(ax=gca(), refs=[50])









    



/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/reshape/pivot.py:135: FutureWarning: 'Player' is both a column name and an index level.
Defaulting to column but this will raise an ambiguity error in a future version
  grouped = data.groupby(keys)
/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/reshape/pivot.py:135: FutureWarning: 'Season' is both a column name and an index level.
Defaulting to column but this will raise an ambiguity error in a future version
  grouped = data.groupby(keys)
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:678: RuntimeWarning: invalid value encountered in subtract
  m = (y2 - y1) / (x2 - x1)
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes



In [31]:

    
title('Entries per 60, 2016 vs 2017')
xlabel('2016')
ylabel('2017')

tmp = entries60[['Season', 'Player', 'WSH60']] \
    .groupby(['Season', 'Player'], as_index=False) \
    .sum()

tmp = tmp.pivot_table(index='Player', columns='Season', values='WSH60')

scatter(tmp.loc[:, 2016].values, tmp.loc[:, 2017].values, s=200, alpha=0.5)
for p, e1, e2 in tmp.itertuples():
    annotate(p[:-3], xy=(e1, e2), ha='center', va='center')
    
vhelper.add_good_bad_fast_slow(bottomleft='Bad', topleft='Improved', topright='Good', bottomright='Declined')
vhelper.add_cfpct_ref_lines_to_plot(ax=gca(), refs=[50])









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:678: RuntimeWarning: invalid value encountered in subtract
  m = (y2 - y1) / (x2 - x1)
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes



In [32]:

    
title('Entries against per 60, 2016 vs 2017')
xlabel('2016')
ylabel('2017')

tmp = entries60[['Season', 'Player', 'Opp60']] \
    .groupby(['Season', 'Player'], as_index=False) \
    .sum()

tmp = tmp.pivot_table(index='Player', columns='Season', values='Opp60')

scatter(tmp.loc[:, 2016].values, tmp.loc[:, 2017].values, s=200, alpha=0.5)
for p, e1, e2 in tmp.itertuples():
    annotate(p[:-3], xy=(e1, e2), ha='center', va='center')
    
vhelper.add_good_bad_fast_slow(bottomleft='Good', topleft='Declined', topright='Bad', bottomright='Improved')
vhelper.add_cfpct_ref_lines_to_plot(ax=gca(), refs=[50])

ylim(bottom=xlim()[0])
xlim(right=ylim()[1])









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:678: RuntimeWarning: invalid value encountered in subtract
  m = (y2 - y1) / (x2 - x1)
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes






    Out[32]:





(59.225931217350521, 86.247630499935255)



In [33]:

    
sums = entries60[['Season', 'Player', 'WSH', 'Opp']] \
    .groupby(['Season', 'Player'], as_index=False) \
    .sum()
    
cep = entries60[entries60['Entry type'] == 'C'] \
    [['Season', 'Player', 'WSH', 'Opp']] \
    .merge(sums, how='inner', on=['Season', 'Player'], suffixes=['', '_Tot']) 
    
cep.loc[:, 'CE%'] = cep.WSH / (cep.WSH + cep.WSH_Tot)
cep.loc[:, 'Opp CE%'] = cep.Opp / (cep.Opp + cep.Opp_Tot)

ce = cep[['Season', 'Player', 'CE%']] \
    .pivot_table(index='Player', columns='Season', values='CE%') 
oppce = cep[['Season', 'Player', 'Opp CE%']] \
    .pivot_table(index='Player', columns='Season', values='Opp CE%')



In [34]:

    
title('On-ice controlled entry%, 2016 vs 2017')
xlabel('2016')
ylabel('2017')

scatter(ce.loc[:, 2016].values, ce.loc[:, 2017].values, s=200, alpha=0.5)
for p, e1, e2 in ce.itertuples():
    annotate(p[:-3], xy=(e1, e2), ha='center', va='center')
    
vhelper.add_good_bad_fast_slow(bottomleft='Bad', topleft='Improved', topright='Good', bottomright='Declined')
vhelper.add_cfpct_ref_lines_to_plot(ax=gca(), refs=[50])









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:678: RuntimeWarning: invalid value encountered in subtract
  m = (y2 - y1) / (x2 - x1)
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes



In [35]:

    
title('Opp on-ice controlled entry%, 2016 vs 2017')
xlabel('2016')
ylabel('2017')

scatter(oppce.loc[:, 2016].values, oppce.loc[:, 2017].values, s=200, alpha=0.5)
for p, e1, e2 in oppce.itertuples():
    annotate(p[:-3], xy=(e1, e2), ha='center', va='center')
    
vhelper.add_good_bad_fast_slow(bottomleft='Good', topleft='Declined', topright='Bad', bottomright='Improved')
vhelper.add_cfpct_ref_lines_to_plot(ax=gca(), refs=[50])
xlim(left=0.2)









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:678: RuntimeWarning: invalid value encountered in subtract
  m = (y2 - y1) / (x2 - x1)
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes






    Out[35]:





(0.2, 0.43302679618489265)



In [36]:

    
# Individual level, individual

# Count by entry type
ientries60 = entries2[pd.notnull(entries2['Entry by'])]
ientries60 = ientries60[ientries60['Entry by'].str.contains('WSH')] \
    [['Season', 'Entry type', 'Team', 'Game', 'Period', 'Time', 'Entry by', 'Fen total']] \
    .drop_duplicates() \
    [['Season', 'Entry type', 'Team', 'Entry by', 'Fen total']] \
    .assign(Count=1) \
    .groupby(['Season', 'Entry type', 'Team', 'Entry by'], as_index=False) \
    .count() \
    .pivot_table(index=['Season', 'Entry type', 'Entry by'], columns='Team', values='Count') \
    .reset_index()

# Add TOI
toi = dfs['toi']['WSH'] \
    [['Season', 'WSH1', 'WSH2', 'WSH3', 'WSH4', 'WSH5']] \
    .melt(id_vars='Season', value_name='Entry by') \
    .drop('variable', axis=1) \
    .assign(TOI=1) \
    .groupby(['Season', 'Entry by'], as_index=False).count()
toi.loc[:, 'Entry by'] = toi['Entry by'].apply(lambda x: str(wsh_players[x]) + 'WSH' if x in wsh_players else x)

ientries60 = ientries60.merge(toi, how='left', on=['Season', 'Entry by']) \
    .sort_values(['Entry by', 'Season', 'Entry type'])
ientries60.loc[:, 'WSH60'] = entries60.WSH / (entries60.TOI / 3600)

ientries60.drop(['TOI', 'WSH'], axis=1).pivot_table(index=['Entry by', 'Season'], columns='Entry type', values='WSH60')



In [37]:

    
tmp = entries60[['Season', 'Entry type', 'Player', 'WSH']] \
    .merge(ientries60[['Season', 'Entry type', 'Entry by', 'WSH']] \
               .rename(columns={'Entry by': 'Player', 'WSH': 'iWSH'}), 
           how='left', on=['Season', 'Entry type', 'Player']) \
    .drop('Entry type', axis=1) \
    .groupby(['Season', 'Player'], as_index=False) \
    .sum()
    
tmp.loc[:, 'iE%'] = tmp.iWSH / tmp.WSH
tmp = tmp.drop({'WSH', 'iWSH'}, axis=1) \
    .pivot_table(index='Player', columns='Season', values='iE%') \
    .fillna(0)
    
title('Individual entry share, 2016 vs 2017')
xlabel('2016')
ylabel('2017')

scatter(tmp.loc[:, 2016].values, tmp.loc[:, 2017].values, s=200, alpha=0.5)
for p, e1, e2 in tmp.itertuples():
    annotate(p[:-3], xy=(e1, e2), ha='center', va='center')
    
vhelper.add_good_bad_fast_slow(bottomleft='Bad', topleft='Improved', topright='Good', bottomright='Declined')
vhelper.add_cfpct_ref_lines_to_plot(ax=gca(), refs=[50])









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:679: RuntimeWarning: invalid value encountered in subtract
  b = y1 - m*x1
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes

Shots per entry

Luckily, Corey has a Fenwick-post-entry column. We'll just use that.



In [38]:

    
# Season level

# Count by entry type
spe = entries2[['Season', 'Entry type', 'Team', 'Game', 'Period', 'Time', 'Fen total']] \
    .drop_duplicates() \
    [['Season', 'Entry type', 'Team', 'Fen total']] \
    .groupby(['Season', 'Entry type', 'Team'], as_index=False) \
    .mean() \
    .pivot_table(index=['Season', 'Entry type'], columns='Team', values='Fen total') \
    .reset_index() \
    .sort_values(['Entry type', 'Season'])

spe



In [39]:

    
# Individual level, on-ice

# Count by entry type
spe = entries2[['Season', 'Entry type', 'Team', 'Game', 'Period', 'Time', 'Player', 'Fen total']] \
    [['Season', 'Entry type', 'Team', 'Player', 'Fen total']] \
    .assign(Count=1) \
    .groupby(['Season', 'Entry type', 'Team', 'Player'], as_index=False) \
    .mean() \
    .pivot_table(index=['Season', 'Entry type', 'Player'], columns='Team', values='Fen total') \
    .reset_index() \
    .sort_values(['Player', 'Season', 'Entry type'])

spe









    Out[39]:







  
    
      Team
      Season
      Entry type
      Player
      Opp
      WSH
    
  
  
    
      0
      2016
      C
      10WSH
      0.626374
      0.541353
    
    
      23
      2016
      D
      10WSH
      0.263158
      0.226804
    
    
      46
      2016
      X
      10WSH
      1.000000
      0.000000
    
    
      68
      2017
      C
      10WSH
      0.875000
      0.478873
    
    
      91
      2017
      D
      10WSH
      0.293103
      0.234568
    
    
      114
      2017
      X
      10WSH
      0.538462
      0.555556
    
    
      1
      2016
      C
      12WSH
      0.644068
      0.597015
    
    
      24
      2016
      D
      12WSH
      0.288462
      0.310345
    
    
      47
      2016
      X
      12WSH
      1.000000
      0.333333
    
    
      2
      2016
      C
      13WSH
      0.692308
      0.428571
    
    
      25
      2016
      D
      13WSH
      0.166667
      0.181818
    
    
      48
      2016
      X
      13WSH
      NaN
      0.000000
    
    
      69
      2017
      C
      13WSH
      0.693069
      0.541096
    
    
      92
      2017
      D
      13WSH
      0.358974
      0.205128
    
    
      115
      2017
      X
      13WSH
      0.461538
      1.000000
    
    
      3
      2016
      C
      14WSH
      0.632653
      0.596273
    
    
      26
      2016
      D
      14WSH
      0.237762
      0.272152
    
    
      49
      2016
      X
      14WSH
      0.200000
      0.363636
    
    
      4
      2016
      C
      18WSH
      0.000000
      0.500000
    
    
      27
      2016
      D
      18WSH
      0.285714
      0.500000
    
    
      70
      2017
      C
      18WSH
      0.538462
      0.717949
    
    
      93
      2017
      D
      18WSH
      0.345455
      0.183333
    
    
      116
      2017
      X
      18WSH
      0.625000
      0.500000
    
    
      5
      2016
      C
      19WSH
      0.674033
      0.542289
    
    
      28
      2016
      D
      19WSH
      0.198830
      0.242938
    
    
      50
      2016
      X
      19WSH
      0.888889
      0.666667
    
    
      71
      2017
      C
      19WSH
      0.596491
      0.601399
    
    
      94
      2017
      D
      19WSH
      0.290909
      0.311927
    
    
      117
      2017
      X
      19WSH
      0.333333
      0.076923
    
    
      6
      2016
      C
      20WSH
      0.625000
      0.546961
    
    
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      86
      2017
      C
      83WSH
      0.757576
      0.516129
    
    
      109
      2017
      D
      83WSH
      0.333333
      0.181818
    
    
      132
      2017
      X
      83WSH
      0.777778
      0.615385
    
    
      18
      2016
      C
      88WSH
      0.589147
      0.507937
    
    
      41
      2016
      D
      88WSH
      0.309677
      0.210145
    
    
      63
      2016
      X
      88WSH
      0.500000
      0.200000
    
    
      19
      2016
      C
      8WSH
      0.651741
      0.549107
    
    
      42
      2016
      D
      8WSH
      0.181818
      0.269939
    
    
      64
      2016
      X
      8WSH
      0.800000
      0.736842
    
    
      87
      2017
      C
      8WSH
      0.650794
      0.513514
    
    
      110
      2017
      D
      8WSH
      0.322835
      0.275229
    
    
      133
      2017
      X
      8WSH
      0.545455
      0.384615
    
    
      20
      2016
      C
      90WSH
      0.640449
      0.563536
    
    
      43
      2016
      D
      90WSH
      0.203390
      0.241176
    
    
      65
      2016
      X
      90WSH
      0.666667
      0.470588
    
    
      88
      2017
      C
      91WSH
      0.909091
      0.285714
    
    
      111
      2017
      D
      91WSH
      0.285714
      0.000000
    
    
      134
      2017
      X
      91WSH
      0.000000
      1.000000
    
    
      21
      2016
      C
      92WSH
      0.657754
      0.591133
    
    
      44
      2016
      D
      92WSH
      0.219780
      0.234177
    
    
      66
      2016
      X
      92WSH
      0.666667
      0.526316
    
    
      89
      2017
      C
      92WSH
      0.716814
      0.483221
    
    
      112
      2017
      D
      92WSH
      0.387097
      0.269663
    
    
      135
      2017
      X
      92WSH
      0.461538
      0.500000
    
    
      22
      2016
      C
      9WSH
      0.639831
      0.528736
    
    
      45
      2016
      D
      9WSH
      0.193694
      0.278027
    
    
      67
      2016
      X
      9WSH
      0.923077
      0.500000
    
    
      90
      2017
      C
      9WSH
      0.724409
      0.575130
    
    
      113
      2017
      D
      9WSH
      0.256250
      0.257669
    
    
      136
      2017
      X
      9WSH
      0.687500
      0.272727
    
  

137 rows × 5 columns



In [40]:

    
tmp = spe[spe['Entry type'] != 'X'] \
    .drop('Opp', axis=1) \
    .pivot_table(index=['Player', 'Entry type'], columns='Season', values='WSH') \
    .fillna(0) \
    .reset_index()
    
for etype in tmp['Entry type'].unique():
    tmp2 = tmp[tmp['Entry type'] == etype]
    scatter(tmp2.loc[:, 2016].values, tmp2.loc[:, 2017].values, label=etype, s=200, alpha=0.5)

for s, p, etype, e1, e2 in tmp.itertuples():
    annotate(p[:-3], xy=(e1, e2), ha='center', va='center') 
    
vhelper.add_good_bad_fast_slow(bottomleft='Bad', topleft='Improved', topright='Good', bottomright='Declined')
vhelper.add_cfpct_ref_lines_to_plot(ax=gca(), refs=[50])

title('On-ice shots per entry')
xlabel('2016')
ylabel('2017')
legend(loc=2, bbox_to_anchor=(1, 1))









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:679: RuntimeWarning: invalid value encountered in subtract
  b = y1 - m*x1
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes






    Out[40]:





<matplotlib.legend.Legend at 0x11a432cc0>



In [41]:

    
tmp = spe[spe['Entry type'] != 'X'] \
    .drop('WSH', axis=1) \
    .pivot_table(index=['Player', 'Entry type'], columns='Season', values='Opp') \
    .fillna(0) \
    .reset_index()
    
for etype in tmp['Entry type'].unique():
    tmp2 = tmp[tmp['Entry type'] == etype]
    scatter(tmp2.loc[:, 2016].values, tmp2.loc[:, 2017].values, label=etype, s=200, alpha=0.5)

for s, p, etype, e1, e2 in tmp.itertuples():
    annotate(p[:-3], xy=(e1, e2), ha='center', va='center') 
    
vhelper.add_good_bad_fast_slow(bottomleft='Good', topleft='Declined', topright='Bad', bottomright='Improved')
vhelper.add_cfpct_ref_lines_to_plot(ax=gca(), refs=[50])

title('On-ice shots against per entry')
xlabel('2016')
ylabel('2017')
legend(loc=2, bbox_to_anchor=(1, 1))









    



/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:578: RuntimeWarning: divide by zero encountered in true_divide
  return cf/cfpct - cf
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:679: RuntimeWarning: invalid value encountered in subtract
  b = y1 - m*x1
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:614: RuntimeWarning: divide by zero encountered in true_divide
  bottomx = bottomy / slopes
/Users/muneebalam/PycharmProjects/scrapenhl2/scrapenhl2/plot/visualization_helper.py:615: RuntimeWarning: divide by zero encountered in true_divide
  topx = topy / slopes






    Out[41]:





<matplotlib.legend.Legend at 0x11f13bdd8>



In [42]:

    
# Individual level, individual

# Count by entry type
spe = entries2[pd.notnull(entries2['Entry by'])]
spe = spe[spe['Entry by'].str.contains('WSH')] \
    [['Season', 'Entry type', 'Team', 'Game', 'Period', 'Time', 'Entry by', 'Fen total']] \
    .drop_duplicates() \
    [['Season', 'Entry type', 'Team', 'Entry by', 'Fen total']] \
    .assign(Count=1) \
    .groupby(['Season', 'Entry type', 'Team', 'Entry by'], as_index=False) \
    .mean() \
    .pivot_table(index=['Season', 'Entry type', 'Entry by'], columns='Team', values='Fen total') \
    .reset_index() \
    .sort_values(['Entry by', 'Season', 'Entry type'])

spe.pivot_table(index=['Entry by', 'Season'], columns='Entry type', values='WSH')

Exits per 60

Some of the relevant result codes:

P for pass
C for carry
D for dump
M for clear
F for fail



In [43]:

    
exits = wsh['exits']

# Drop extra team cols
colnames = [x for x in exits.columns if not x.upper() == x or x[:3] == 'WSH']
exits = exits[colnames]

# Flag exits as WSH or Opp
exits.loc[:, 'Team'] = exits['Attempt'].apply(lambda x: 'WSH' if str(x)[-3:] == 'WSH' else 'Opp')

# Melt to long
idvars = [x for x in exits.columns if x not in ['WSH1', 'WSH2', 'WSH3', 'WSH4', 'WSH5']]
exits2 = exits.melt(id_vars=idvars, value_name='Player').sort_values(['Season', 'Game', '_Secs', 'variable'])

exits2.head()









    



/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:337: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[key] = _infer_fill_value(value)
/Users/muneebalam/anaconda/lib/python3.6/site-packages/pandas/core/indexing.py:517: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self.obj[item] = s






    Out[43]:







  
    
      
      Period
      Time
      Attempt
      Result
      Pressured?
      Pass Target
      Entry?
      Direction
      Game
      Season
      Time2
      _Secs
      Team
      variable
      Player
    
  
  
    
      2406
      1
      19:55:00
      8PIT
      I
      N
      13PIT
      NaN
      L
      20007
      2016
      19:55
      6
      Opp
      WSH1
      2WSH
    
    
      9068
      1
      19:55:00
      8PIT
      I
      N
      13PIT
      NaN
      L
      20007
      2016
      19:55
      6
      Opp
      WSH2
      27WSH
    
    
      15730
      1
      19:55:00
      8PIT
      I
      N
      13PIT
      NaN
      L
      20007
      2016
      19:55
      6
      Opp
      WSH3
      92WSH
    
    
      22392
      1
      19:55:00
      8PIT
      I
      N
      13PIT
      NaN
      L
      20007
      2016
      19:55
      6
      Opp
      WSH4
      8WSH
    
    
      29054
      1
      19:55:00
      8PIT
      I
      N
      13PIT
      NaN
      L
      20007
      2016
      19:55
      6
      Opp
      WSH5
      77WSH



In [44]:

    
# Season level

# Count by exit type
exits60 = exits2[['Season', 'Result', 'Team', 'Game', 'Period', 'Time']] \
    .drop_duplicates() \
    [['Season', 'Result', 'Team']] \
    .assign(Count=1) \
    .groupby(['Season', 'Result', 'Team'], as_index=False) \
    .count() \
    .pivot_table(index=['Season', 'Result'], columns='Team', values='Count') \
    .reset_index() \
    .sort_values(['Result', 'Season'])

# Add TOI
toi = dfs['toi']['WSH'] \
    [['Season']].assign(TOI=1) \
    .groupby('Season', as_index=False).count()

exits60 = exits60.merge(toi, how='left', on='Season')
exits60.loc[:, 'WSH60'] = exits60.WSH / (exits60.TOI / 3600)
exits60.loc[:, 'Opp60'] = exits60.Opp / (exits60.TOI / 3600)

exits60



In [45]:

    
# Individual level, on-ice

# Count by exit type
exits60 = exits2[['Season', 'Result', 'Team', 'Game', 'Period', 'Time', 'Player']] \
    [['Season', 'Result', 'Team', 'Player']] \
    .assign(Count=1) \
    .groupby(['Season', 'Result', 'Team', 'Player'], as_index=False) \
    .count() \
    .pivot_table(index=['Season', 'Result', 'Player'], columns='Team', values='Count') \
    .reset_index() \
    .sort_values(['Result', 'Season', 'Player'])

# Add TOI
toi = dfs['toi']['WSH'] \
    [['Season', 'WSH1', 'WSH2', 'WSH3', 'WSH4', 'WSH5']] \
    .melt(id_vars='Season', value_name='Player') \
    .drop('variable', axis=1) \
    .assign(TOI=1) \
    .groupby(['Season', 'Player'], as_index=False).count()
toi.loc[:, 'Player'] = toi['Player'].apply(lambda x: str(wsh_players[x]) + 'WSH' if x in wsh_players else x)

exits60 = exits60.merge(toi, how='left', on=['Season', 'Player'])
exits60.loc[:, 'WSH60'] = exits60.WSH / (exits60.TOI / 3600)
exits60.loc[:, 'Opp60'] = exits60.Opp / (exits60.TOI / 3600)

exits60









    Out[45]:







  
    
      
      Season
      Result
      Player
      Opp
      WSH
      TOI
      WSH60
      Opp60
    
  
  
    
      0
      2016
      C
      10WSH
      70.0
      72.0
      10314
      25.130890
      24.432810
    
    
      1
      2016
      C
      12WSH
      40.0
      40.0
      5321
      27.062582
      27.062582
    
    
      2
      2016
      C
      13WSH
      12.0
      11.0
      1433
      27.634334
      30.146546
    
    
      3
      2016
      C
      14WSH
      88.0
      80.0
      14716
      19.570535
      21.527589
    
    
      4
      2016
      C
      18WSH
      1.0
      4.0
      466
      30.901288
      7.725322
    
    
      5
      2016
      C
      19WSH
      85.0
      96.0
      17399
      19.863211
      17.587218
    
    
      6
      2016
      C
      20WSH
      103.0
      107.0
      15087
      25.531915
      24.577451
    
    
      7
      2016
      C
      22WSH
      17.0
      17.0
      2676
      22.869955
      22.869955
    
    
      8
      2016
      C
      26WSH
      72.0
      64.0
      11752
      19.605174
      22.055820
    
    
      9
      2016
      C
      27WSH
      124.0
      110.0
      20882
      18.963701
      21.377263
    
    
      10
      2016
      C
      2WSH
      147.0
      123.0
      21875
      20.242286
      24.192000
    
    
      11
      2016
      C
      43WSH
      74.0
      60.0
      12407
      17.409527
      21.471750
    
    
      12
      2016
      C
      44WSH
      116.0
      110.0
      18880
      20.974576
      22.118644
    
    
      13
      2016
      C
      4WSH
      3.0
      7.0
      789
      31.939163
      13.688213
    
    
      14
      2016
      C
      65WSH
      78.0
      78.0
      11876
      23.644325
      23.644325
    
    
      15
      2016
      C
      74WSH
      104.0
      100.0
      19699
      18.275039
      19.006041
    
    
      16
      2016
      C
      77WSH
      88.0
      81.0
      16742
      17.417274
      18.922470
    
    
      17
      2016
      C
      83WSH
      82.0
      70.0
      12722
      19.808206
      23.203899
    
    
      18
      2016
      C
      88WSH
      90.0
      92.0
      14731
      22.483199
      21.994434
    
    
      19
      2016
      C
      8WSH
      102.0
      87.0
      18229
      17.181414
      20.143727
    
    
      20
      2016
      C
      90WSH
      103.0
      95.0
      16914
      20.219936
      21.922668
    
    
      21
      2016
      C
      92WSH
      109.0
      85.0
      17422
      17.564000
      22.523246
    
    
      22
      2016
      C
      9WSH
      137.0
      131.0
      22548
      20.915381
      21.873337
    
    
      23
      2017
      C
      10WSH
      23.0
      41.0
      6894
      21.409922
      12.010444
    
    
      24
      2017
      C
      13WSH
      47.0
      59.0
      10468
      20.290409
      16.163546
    
    
      25
      2017
      C
      18WSH
      19.0
      24.0
      4723
      18.293458
      14.482321
    
    
      26
      2017
      C
      19WSH
      58.0
      57.0
      11360
      18.063380
      18.380282
    
    
      27
      2017
      C
      20WSH
      28.0
      52.0
      8071
      23.194152
      12.489159
    
    
      28
      2017
      C
      22WSH
      37.0
      40.0
      7878
      18.278751
      16.907845
    
    
      29
      2017
      C
      25WSH
      41.0
      45.0
      7975
      20.313480
      18.507837
    
    
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
      ...
    
    
      285
      2017
      P
      19WSH
      95.0
      125.0
      11360
      39.612676
      30.105634
    
    
      286
      2017
      P
      20WSH
      69.0
      72.0
      8071
      32.114980
      30.776855
    
    
      287
      2017
      P
      22WSH
      77.0
      68.0
      7878
      31.073877
      35.186596
    
    
      288
      2017
      P
      25WSH
      67.0
      68.0
      7975
      30.695925
      30.244514
    
    
      289
      2017
      P
      29WSH
      73.0
      86.0
      8784
      35.245902
      29.918033
    
    
      290
      2017
      P
      2WSH
      84.0
      99.0
      10013
      35.593728
      30.200739
    
    
      291
      2017
      P
      39WSH
      59.0
      49.0
      6624
      26.630435
      32.065217
    
    
      292
      2017
      P
      43WSH
      71.0
      81.0
      7687
      37.934175
      33.250943
    
    
      293
      2017
      P
      44WSH
      113.0
      119.0
      12994
      32.969063
      31.306757
    
    
      294
      2017
      P
      4WSH
      33.0
      22.0
      3533
      22.417209
      33.625814
    
    
      295
      2017
      P
      55WSH
      24.0
      21.0
      2908
      25.997249
      29.711142
    
    
      296
      2017
      P
      65WSH
      51.0
      74.0
      7199
      37.005140
      25.503542
    
    
      297
      2017
      P
      74WSH
      112.0
      154.0
      14439
      38.396011
      27.924371
    
    
      298
      2017
      P
      77WSH
      70.0
      99.0
      9137
      39.006238
      27.580169
    
    
      299
      2017
      P
      79WSH
      10.0
      10.0
      1696
      21.226415
      21.226415
    
    
      300
      2017
      P
      83WSH
      57.0
      57.0
      7677
      26.729191
      26.729191
    
    
      301
      2017
      P
      8WSH
      100.0
      115.0
      11351
      36.472557
      31.715267
    
    
      302
      2017
      P
      91WSH
      12.0
      8.0
      1441
      19.986121
      29.979181
    
    
      303
      2017
      P
      92WSH
      101.0
      105.0
      11133
      33.953112
      32.659660
    
    
      304
      2017
      P
      9WSH
      128.0
      139.0
      15255
      32.802360
      30.206490
    
    
      305
      2017
      PN
      20WSH
      1.0
      NaN
      8071
      NaN
      0.446041
    
    
      306
      2017
      PN
      25WSH
      1.0
      NaN
      7975
      NaN
      0.451411
    
    
      307
      2017
      PN
      39WSH
      1.0
      NaN
      6624
      NaN
      0.543478
    
    
      308
      2017
      PN
      55WSH
      1.0
      NaN
      2908
      NaN
      1.237964
    
    
      309
      2017
      PN
      9WSH
      1.0
      NaN
      15255
      NaN
      0.235988
    
    
      310
      2017
      p
      22WSH
      NaN
      1.0
      7878
      0.456969
      NaN
    
    
      311
      2017
      p
      43WSH
      NaN
      1.0
      7687
      0.468323
      NaN
    
    
      312
      2017
      p
      44WSH
      NaN
      1.0
      12994
      0.277051
      NaN
    
    
      313
      2017
      p
      8WSH
      NaN
      1.0
      11351
      0.317153
      NaN
    
    
      314
      2017
      p
      92WSH
      NaN
      1.0
      11133
      0.323363
      NaN
    
  

315 rows × 8 columns



In [46]:

    
# Individual level, individual

# Count by exit type
exits60 = exits2[exits2.Attempt.str.contains('WSH')] \
    .query('Game == 20502') \
    [['Season', 'Result', 'Team', 'Game', 'Period', 'Time', 'Attempt']] \
    .drop_duplicates() \
    [['Season', 'Result', 'Team', 'Attempt']] \
    .assign(Count=1) \
    .groupby(['Season', 'Result', 'Team', 'Attempt'], as_index=False) \
    .count() \
    .pivot_table(index=['Season', 'Result', 'Attempt'], columns='Team', values='Count') \
    .reset_index() \
    .sort_values(['Result', 'Season', 'Attempt'])

# Add TOI
toi = dfs['toi']['WSH'] \
    [['Season', 'WSH1', 'WSH2', 'WSH3', 'WSH4', 'WSH5']] \
    .melt(id_vars='Season', value_name='Attempt') \
    .drop('variable', axis=1) \
    .assign(TOI=1) \
    .groupby(['Season', 'Attempt'], as_index=False).count()
toi.loc[:, 'Attempt'] = toi['Attempt'].apply(lambda x: str(wsh_players[x]) + 'WSH' if x in wsh_players else x)

exits60 = exits60.merge(toi, how='left', on=['Season', 'Attempt'])
exits60.loc[:, 'WSH60'] = exits60.WSH / (exits60.TOI / 3600)

exits60.pivot_table(index=['Attempt', 'Season'], columns='Result', values='WSH60')

	Season	Result	Opp	WSH	TOI	WSH60	Opp60
0	2016	C	369.0	344.0	60976	20.309630	21.785621
1	2017	C	171.0	207.0	37848	19.689283	16.265060
2	2016	C	1.0	NaN	60976	NaN	0.059040
3	2017	C8	NaN	1.0	37848	0.095117	NaN
4	2016	CD	NaN	1.0	60976	0.059040	NaN
5	2016	D	168.0	157.0	60976	9.269221	9.918657
6	2017	D	115.0	111.0	37848	10.558022	10.938491
7	2016	F	402.0	362.0	60976	21.372343	23.733928
8	2017	F	199.0	222.0	37848	21.116043	18.928345
9	2016	I	67.0	34.0	60976	2.007347	3.955655
10	2017	I	32.0	35.0	37848	3.329106	3.043754
11	2016	M	568.0	544.0	60976	32.117554	33.534505
12	2017	M	348.0	267.0	37848	25.396322	33.100824
13	2016	N	1.0	1.0	60976	0.059040	0.059040
14	2017	N	1.0	NaN	37848	NaN	0.095117
15	2016	P	560.0	640.0	60976	37.785358	33.062188
16	2017	P	321.0	353.0	37848	33.576411	30.532657
17	2017	PN	1.0	NaN	37848	NaN	0.095117
18	2017	p	NaN	1.0	37848	0.095117	NaN

	A1	A2	A3	Away Goalies	Away:	Date	Game	Game ID	Goalie	Home Goalies	Home:	PBP Time	Period	SOG?	Season	Strength	Team	Time
0	16NYI	13NYI	NYI	35.0	PHI	2017-01-15	20279	20279	33.0	31	WSH	00:53:00	1.0	NaN	2017	5v5	NYI	19:23:00
1	NYI	NYI	NYI	30.0	PHI	2017-01-15	20279	20279	33.0	70	WSH	02:10:00	1.0	NaN	2017	5v5	NYI	19:16:00
2	CAR	CAR	CAR	NaN	PHI	2017-01-15	20279	20279	1.0	NaN	WSH	02:15:00	1.0	NaN	2017	5v5	CAR	18:31:00
3	66NYI	72NYI	NYI	NaN	PHI	2017-01-15	20279	20279	33.0	NaN	WSH	02:17:00	1.0	NaN	2017	5v5	NYI	18:21:00
4	25NYI	66NYI	NYI	NaN	PHI	2017-01-15	20279	20279	33.0	NaN	WSH	02:18:00	1.0	NaN	2017	5v5	NYI	17:51:00

	Controlled?	Defended by	Dump recovered?	Entry by	Entry type	Fail	Fen total	Game	Goalie touch?	Location	Middle driver	Opp	Opp strength	Period	Season	Team strength	Time
0	55WPG	N	N	55WPG	C	NaN	0.0	20193	R	Home	N	DAL	5.0	1.0	2017	5	19:45:00
1	NaN	44WPG	N	3DAL	F	3DAL	0.0	20193	R	Home	NaN	DAL	5.0	1.0	2017	5	19:33:00
2	NaN	N	N	13WPG	D	NaN	0.0	20193	R	Home	NaN	DAL	5.0	1.0	2017	5	19:12:00
3	NaN	N	N	12DAL	D	NaN	0.0	20193	R	Home	NaN	DAL	5.0	1.0	2017	5	19:02:00
4	NaN	N	16WPG	9WPG	D	NaN	1.0	20193	R	Home	NaN	DAL	5.0	1.0	2017	5	18:56:00

	Period	Time	Attempt	Result	Pressured?	Pass Target	Entry?	Direction	Game	Season
0	1	19:19:00	25EDM	P	N	93EDM	Y	L	20198	2017
1	1	18:33:00	63NJ	M	N	N	N	R	20198	2017
2	1	18:25:00	14NJ	M	N	11NJ	N	L	20198	2017
3	1	18:05:00	25NJ	P	N	11NJ	Y	L	20198	2017
4	1	18:00:00	55EDM	C	N	N	Y	C	20198	2017

		Selected
Season	Game
2016	20022	True
	20071	True
	20860	True
2017	20047	True
2017	20555	True

	Season	Game	Time	WSH1	WSH2	WSH3	WSH4	WSH5
0	2016	20007	1	Matt Niskanen	Karl Alzner	Evgeny Kuznetsov	Alex Ovechkin	T.J. Oshie
1	2016	20007	2	Matt Niskanen	Karl Alzner	Evgeny Kuznetsov	Alex Ovechkin	T.J. Oshie
2	2016	20007	3	Matt Niskanen	Karl Alzner	Evgeny Kuznetsov	Alex Ovechkin	T.J. Oshie
3	2016	20007	4	Matt Niskanen	Karl Alzner	Evgeny Kuznetsov	Alex Ovechkin	T.J. Oshie
4	2016	20007	5	Matt Niskanen	Karl Alzner	Evgeny Kuznetsov	Alex Ovechkin	T.J. Oshie

	A1	A2	A3	Away Goalies	Away:	Date	Game	Game ID	Goalie	Home Goalies	...	VGK1	VGK2	VGK3	VGK4	VGK5	VAN1	VAN2	VAN3	VAN4	VAN5
4143	51TOR	TOR	TOR	35.0	PHI	2017-01-15	20084	20084	70.0	31	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4144	TOR	TOR	TOR	30.0	PHI	2017-01-15	20084	20084	70.0	70	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4145	34TOR	TOR	TOR	NaN	PHI	2017-01-15	20084	20084	70.0	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4146	19WSH	77WSH	74WSH	NaN	PHI	2017-01-15	20084	20084	31.0	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
4147	12TOR	43TOR	TOR	NaN	PHI	2017-01-15	20084	20084	70.0	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

	Season	Entry type	Team	Count	TOI	OppCount	Per60	OppPer60
0	2016	C	ANA	607	47929.0	542	45.592439	40.710217
1	2016	C	ARI	483	37469.0	574	46.406363	55.149590
2	2016	C	BOS	665	48305.0	589	49.560087	43.896077
3	2016	C	BUF	560	44498.0	609	45.305407	49.269630
4	2016	C	CAR	830	58917.0	742	50.715413	45.338357

	Controlled?	Defended by	Dump recovered?	Entry by	Entry type	Fail	Game	Goalie touch?	...	Opp strength	Period	Season	Team strength	Time	Time2	_Secs	Team	variable	Player
2850	19WSH	3PIT	N	19WSH	C	NaN	20007	R	...	5	1.0	2016	5	19:28:00	19:28	33	WSH	WSH1	2WSH
7131	19WSH	3PIT	N	19WSH	C	NaN	20007	R	...	5	1.0	2016	5	19:28:00	19:28	33	WSH	WSH2	27WSH
11412	19WSH	3PIT	N	19WSH	C	NaN	20007	R	...	5	1.0	2016	5	19:28:00	19:28	33	WSH	WSH3	65WSH
15693	19WSH	3PIT	N	19WSH	C	NaN	20007	R	...	5	1.0	2016	5	19:28:00	19:28	33	WSH	WSH4	19WSH
19974	19WSH	3PIT	N	19WSH	C	NaN	20007	R	...	5	1.0	2016	5	19:28:00	19:28	33	WSH	WSH5	90WSH

	Season	Entry type	Team	Count	TOI	Per60
0	2016	C	Opp	601	60976	35.482813
1	2016	C	WSH	686	60976	40.501181
2	2016	D	Opp	609	60976	35.955130
3	2016	D	WSH	611	60976	36.073209
4	2016	X	Opp	36	60976	2.125426
5	2016	X	WSH	48	60976	2.833902
6	2017	C	Opp	361	37848	34.337349
7	2017	C	WSH	429	37848	40.805327
8	2017	D	Opp	385	37848	36.620165
9	2017	D	WSH	376	37848	35.764109
10	2017	X	Opp	44	37848	4.185162
11	2017	X	WSH	41	37848	3.899810

	Season	Entry type	Player	Opp	WSH	TOI	WSH60	Opp60
0	2016	C	10WSH	91.0	133.0	10314	46.422339	31.762653
23	2016	D	10WSH	95.0	97.0	10314	33.856894	33.158813
46	2016	X	10WSH	10.0	4.0	10314	1.396161	3.490401
68	2017	C	10WSH	72.0	71.0	6894	37.075718	37.597911
91	2017	D	10WSH	58.0	81.0	6894	42.297650	30.287206
114	2017	X	10WSH	13.0	9.0	6894	4.699739	6.788512
1	2016	C	12WSH	59.0	67.0	5321	45.329825	39.917309
24	2016	D	12WSH	52.0	58.0	5321	39.240744	35.181357
47	2016	X	12WSH	2.0	3.0	5321	2.029694	1.353129
2	2016	C	13WSH	13.0	14.0	1433	35.170970	32.658758
25	2016	D	13WSH	12.0	11.0	1433	27.634334	30.146546
48	2016	X	13WSH	NaN	2.0	1433	5.024424	NaN
69	2017	C	13WSH	101.0	146.0	10468	50.210164	34.734429
92	2017	D	13WSH	117.0	78.0	10468	26.824608	40.236912
115	2017	X	13WSH	13.0	2.0	10468	0.687810	4.470768
3	2016	C	14WSH	147.0	161.0	14716	39.385703	35.960859
26	2016	D	14WSH	143.0	158.0	14716	38.651808	34.982332
49	2016	X	14WSH	5.0	11.0	14716	2.690949	1.223158
4	2016	C	18WSH	1.0	4.0	466	30.901288	7.725322
27	2016	D	18WSH	7.0	6.0	466	46.351931	54.077253
70	2017	C	18WSH	39.0	39.0	4723	29.726869	29.726869
93	2017	D	18WSH	55.0	60.0	4723	45.733644	41.922507
116	2017	X	18WSH	8.0	4.0	4723	3.048910	6.097819
5	2016	C	19WSH	181.0	201.0	17399	41.588597	37.450428
28	2016	D	19WSH	171.0	177.0	17399	36.622794	35.381344
50	2016	X	19WSH	9.0	12.0	17399	2.482901	1.862176
71	2017	C	19WSH	114.0	143.0	11360	45.316901	36.126761
94	2017	D	19WSH	110.0	109.0	11360	34.542254	34.859155
117	2017	X	19WSH	9.0	13.0	11360	4.119718	2.852113
6	2016	C	20WSH	136.0	181.0	15087	43.189501	32.451780
...	...	...	...	...	...	...	...	...
86	2017	C	83WSH	66.0	62.0	7677	29.073857	30.949590
109	2017	D	83WSH	84.0	88.0	7677	41.266120	39.390387
132	2017	X	83WSH	9.0	13.0	7677	6.096131	4.220399
18	2016	C	88WSH	129.0	189.0	14731	46.188310	31.525355
41	2016	D	88WSH	155.0	138.0	14731	33.724798	37.879302
63	2016	X	88WSH	8.0	5.0	14731	1.221913	1.955061
19	2016	C	8WSH	201.0	224.0	18229	44.237204	39.694991
42	2016	D	8WSH	187.0	163.0	18229	32.190466	36.930166
64	2016	X	8WSH	10.0	19.0	18229	3.752263	1.974875
87	2017	C	8WSH	126.0	148.0	11351	46.938596	39.961237
110	2017	D	8WSH	127.0	109.0	11351	34.569641	40.278390
133	2017	X	8WSH	11.0	13.0	11351	4.122985	3.488679
20	2016	C	90WSH	178.0	181.0	16914	38.524299	37.885775
43	2016	D	90WSH	177.0	170.0	16914	36.183044	37.672934
65	2016	X	90WSH	6.0	17.0	16914	3.618304	1.277049
88	2017	C	91WSH	11.0	7.0	1441	17.487856	27.480916
111	2017	D	91WSH	14.0	14.0	1441	34.975711	34.975711
134	2017	X	91WSH	1.0	3.0	1441	7.494795	2.498265
21	2016	C	92WSH	187.0	203.0	17422	41.946964	38.640799
44	2016	D	92WSH	182.0	158.0	17422	32.648376	37.607623
66	2016	X	92WSH	6.0	19.0	17422	3.926070	1.239812
89	2017	C	92WSH	113.0	149.0	11133	48.181083	36.540016
112	2017	D	92WSH	124.0	89.0	11133	28.779305	40.097009
135	2017	X	92WSH	13.0	8.0	11133	2.586904	4.203719
22	2016	C	9WSH	236.0	261.0	22548	41.671102	37.679617
45	2016	D	9WSH	222.0	223.0	22548	35.604045	35.444385
67	2016	X	9WSH	13.0	20.0	22548	3.193188	2.075572
90	2017	C	9WSH	127.0	193.0	15255	45.545723	29.970501
113	2017	D	9WSH	160.0	163.0	15255	38.466077	37.758112
136	2017	X	9WSH	16.0	11.0	15255	2.595870	3.775811

Team	Season	Entry type	Opp	WSH
0	2016	C	0.623960	0.555394
3	2017	C	0.700831	0.561772
1	2016	D	0.224959	0.261866
4	2017	D	0.311688	0.252660
2	2016	X	0.777778	0.500000
5	2017	X	0.568182	0.439024

Team	Season	Entry type	Player	Opp	WSH
0	2016	C	10WSH	0.626374	0.541353
23	2016	D	10WSH	0.263158	0.226804
46	2016	X	10WSH	1.000000	0.000000
68	2017	C	10WSH	0.875000	0.478873
91	2017	D	10WSH	0.293103	0.234568
114	2017	X	10WSH	0.538462	0.555556
1	2016	C	12WSH	0.644068	0.597015
24	2016	D	12WSH	0.288462	0.310345
47	2016	X	12WSH	1.000000	0.333333
2	2016	C	13WSH	0.692308	0.428571
25	2016	D	13WSH	0.166667	0.181818
48	2016	X	13WSH	NaN	0.000000
69	2017	C	13WSH	0.693069	0.541096
92	2017	D	13WSH	0.358974	0.205128
115	2017	X	13WSH	0.461538	1.000000
3	2016	C	14WSH	0.632653	0.596273
26	2016	D	14WSH	0.237762	0.272152
49	2016	X	14WSH	0.200000	0.363636
4	2016	C	18WSH	0.000000	0.500000
27	2016	D	18WSH	0.285714	0.500000
70	2017	C	18WSH	0.538462	0.717949
93	2017	D	18WSH	0.345455	0.183333
116	2017	X	18WSH	0.625000	0.500000
5	2016	C	19WSH	0.674033	0.542289
28	2016	D	19WSH	0.198830	0.242938
50	2016	X	19WSH	0.888889	0.666667
71	2017	C	19WSH	0.596491	0.601399
94	2017	D	19WSH	0.290909	0.311927
117	2017	X	19WSH	0.333333	0.076923
6	2016	C	20WSH	0.625000	0.546961
...	...	...	...	...	...
86	2017	C	83WSH	0.757576	0.516129
109	2017	D	83WSH	0.333333	0.181818
132	2017	X	83WSH	0.777778	0.615385
18	2016	C	88WSH	0.589147	0.507937
41	2016	D	88WSH	0.309677	0.210145
63	2016	X	88WSH	0.500000	0.200000
19	2016	C	8WSH	0.651741	0.549107
42	2016	D	8WSH	0.181818	0.269939
64	2016	X	8WSH	0.800000	0.736842
87	2017	C	8WSH	0.650794	0.513514
110	2017	D	8WSH	0.322835	0.275229
133	2017	X	8WSH	0.545455	0.384615
20	2016	C	90WSH	0.640449	0.563536
43	2016	D	90WSH	0.203390	0.241176
65	2016	X	90WSH	0.666667	0.470588
88	2017	C	91WSH	0.909091	0.285714
111	2017	D	91WSH	0.285714	0.000000
134	2017	X	91WSH	0.000000	1.000000
21	2016	C	92WSH	0.657754	0.591133
44	2016	D	92WSH	0.219780	0.234177
66	2016	X	92WSH	0.666667	0.526316
89	2017	C	92WSH	0.716814	0.483221
112	2017	D	92WSH	0.387097	0.269663
135	2017	X	92WSH	0.461538	0.500000
22	2016	C	9WSH	0.639831	0.528736
45	2016	D	9WSH	0.193694	0.278027
67	2016	X	9WSH	0.923077	0.500000
90	2017	C	9WSH	0.724409	0.575130
113	2017	D	9WSH	0.256250	0.257669
136	2017	X	9WSH	0.687500	0.272727

	Period	Time	Attempt	Result	Pressured?	Pass Target	Entry?	Direction	Game	Season	Time2	_Secs	Team	variable	Player
2406	1	19:55:00	8PIT	I	N	13PIT	NaN	L	20007	2016	19:55	6	Opp	WSH1	2WSH
9068	1	19:55:00	8PIT	I	N	13PIT	NaN	L	20007	2016	19:55	6	Opp	WSH2	27WSH
15730	1	19:55:00	8PIT	I	N	13PIT	NaN	L	20007	2016	19:55	6	Opp	WSH3	92WSH
22392	1	19:55:00	8PIT	I	N	13PIT	NaN	L	20007	2016	19:55	6	Opp	WSH4	8WSH
29054	1	19:55:00	8PIT	I	N	13PIT	NaN	L	20007	2016	19:55	6	Opp	WSH5	77WSH