As one of the biggest tournaments hosted by USAU, the D-I College Nationals is one of the few tournaments where player statistics are tracked relatively reliably. For each tournament game, each player's aggregate scores, assists, Ds, and turns are counted, although it's quite possible that the definition of a "D" or a "Turn" differs across stat-keepers.
The data below was scraped from the USAU website. First, we'll set up some imports so that we can load it.
In [1]:
import usau.reports
import usau.fantasy
In [2]:
from IPython.display import display, HTML
import pandas as pd
pd.options.display.width = 200
pd.options.display.max_colwidth = 200
pd.options.display.max_columns = 200
In [3]:
def display_url_column(df):
    """Helper for formatting url links"""
    df.url = df.url.apply(lambda url: "<a href='{base}{url}'>Match Report Link</a>"
                          .format(base=usau.reports.USAUResults.BASE_URL, url=url))
    display(HTML(df.to_html(escape=False)))
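As a rough illustration of what the helper does to each row, the sketch below builds the same anchor tag with plain string formatting. `EXAMPLE_BASE_URL`, `example_path`, and `make_link` are made-up names and values for illustration, not the real contents of `usau.reports.USAUResults.BASE_URL`. It also hints at why `to_html` is called with `escape=False`: here `html.escape` stands in for the kind of escaping that would otherwise turn the markup into literal text instead of a clickable link.

```python
import html

# Hypothetical values for illustration only; the real base URL lives in
# usau.reports.USAUResults.BASE_URL.
EXAMPLE_BASE_URL = "https://example.org"
example_path = "/match/123"


def make_link(base, url):
    """Build the anchor tag that the helper writes into the url column."""
    return "<a href='{base}{url}'>Match Report Link</a>".format(base=base, url=url)


link = make_link(EXAMPLE_BASE_URL, example_path)

# Escaping replaces the angle brackets with entities, so a browser would
# show the tag as text rather than render a link.
escaped = html.escape(link)

print(link)
print(escaped)
```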
Since the data should already be saved as CSV files in this repository, we don't need to re-scrape it. Omit this cell to download directly from the USAU website instead (may be slow).
In [4]:
# Read data from csv files
usau.reports.d1_college_nats_men_2016.load_from_csvs()
usau.reports.d1_college_nats_women_2016.load_from_csvs()
Let's take a look at the games for which the sum of the player goals/assists is less than the final score of the game:
In [5]:
display_url_column(pd.concat([usau.reports.d1_college_nats_men_2016.missing_tallies,
                              usau.reports.d1_college_nats_women_2016.missing_tallies])
                   [["Score", "Gs", "As", "Ds", "Ts", "Team", "Opponent", "url"]])
All in all, not too bad! A few of the women's consolation games are missing player statistics, and there are several other games for which a couple of goals or assists were missed. For missing assists, it is technically possible that one or more Callahans were scored in those games, but obviously that's not the case for all ~14 missing assists. Surprisingly, the statkeepers recorded 10 more assists than goals; I would have guessed that assists would be harder to keep track of.
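The check described above (player goals or assists summing to less than the final score) can be sketched on toy data as follows. How `missing_tallies` is actually computed inside the `usau` package is not shown here, so this is only an assumption based on the description; the team names and numbers are invented, and the `missing_Gs` and `missing_As` columns are hypothetical.

```python
import pandas as pd

# Toy per-team match rows; all values are invented for illustration.
matches = pd.DataFrame({
    "Team": ["A", "B", "C"],
    "Score": [15, 13, 11],  # final team score
    "Gs": [15, 12, 0],      # sum of recorded player goals
    "As": [15, 13, 0],      # sum of recorded player assists
})

# A row is incomplete when either tally falls short of the final score.
incomplete = (matches.Gs < matches.Score) | (matches.As < matches.Score)
missing = matches[incomplete].copy()

# How many goals and assists went unrecorded for each incomplete row.
missing["missing_Gs"] = missing.Score - missing.Gs
missing["missing_As"] = missing.Score - missing.As

print(missing)
```

In this toy table, team C looks like a game with no player statistics at all, while team B is only missing a single goal.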
Turns and Ds are the other stats available. In past tournaments these haven't been tracked very well, but actually there was only one game where no Turns or Ds were recorded:
In [6]:
men_matches = usau.reports.d1_college_nats_men_2016.match_results
women_matches = usau.reports.d1_college_nats_women_2016.match_results
display_url_column(pd.concat([men_matches[(men_matches.Ts == 0) & (men_matches.Gs > 0)],
                              women_matches[(women_matches.Ts == 0) & (women_matches.Gs > 0)]])
                   [["Score", "Gs", "As", "Ds", "Ts", "Team", "Opponent", "url"]])
This implies that there was a pretty good effort made to keep up with counting turns and Ds. By contrast, see how many teams did not keep track of Ds and turns last year (2015)!
In [7]:
# Read last year's data from csv files
usau.reports.d1_college_nats_men_2015.load_from_csvs()
usau.reports.d1_college_nats_women_2015.load_from_csvs()
display_url_column(pd.concat([usau.reports.d1_college_nats_men_2015.missing_tallies,
                              usau.reports.d1_college_nats_women_2015.missing_tallies])
                   [["Score", "Gs", "As", "Ds", "Ts", "Team", "Opponent", "url"]])