As one of the biggest tournaments hosted by USAU, Club Nationals is one of the few tournaments where player statistics are tracked relatively reliably. For each tournament game, each player's aggregate goals, assists, Ds, and turns are counted, although it's quite possible that the definition of a "D" or a "Turn" differs across stat-keepers.
The data below was scraped from the USAU website. First, we set up the imports needed to load it.
In [1]:
import usau.reports
import usau.fantasy
In [2]:
from IPython.display import display, HTML
import pandas as pd
pd.options.display.width = 200
pd.options.display.max_colwidth = 200
pd.options.display.max_columns = 200
In [3]:
def display_url_column(df):
    """Helper for formatting url links"""
    df.url = df.url.apply(lambda url: "<a href='{base}{url}'>Match Report Link</a>"
                          .format(base=usau.reports.USAUResults.BASE_URL, url=url))
    display(HTML(df.to_html(escape=False)))
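The helper above overwrites the url column of the frame it receives. As a rough sketch of an alternative, the version below works on a copy and returns it, which may avoid pandas copy warnings when a column subset is passed in. The names `with_link_column` and `BASE_URL` are hypothetical stand-ins (the real base URL lives in `usau.reports.USAUResults.BASE_URL`), and the toy data is invented for illustration.

```python
import pandas as pd

# Hypothetical placeholder for usau.reports.USAUResults.BASE_URL
BASE_URL = "https://example.org"


def with_link_column(df, base=BASE_URL):
    """Return a copy of df whose url column holds HTML anchor tags."""
    out = df.copy()  # leave the caller's frame untouched
    out["url"] = out["url"].map(
        lambda url: "<a href='{base}{url}'>Match Report Link</a>".format(base=base, url=url)
    )
    return out


# Invented toy data, not real match reports
toy = pd.DataFrame({"Team": ["A", "B"], "url": ["/match/1", "/match/2"]})
linked = with_link_column(toy)

# A wide max_colwidth keeps long anchor strings from being truncated in the HTML
with pd.option_context("display.max_colwidth", 200):
    html = linked.to_html(escape=False)
```

In a notebook, `display(HTML(html))` would then render the table with clickable links.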
Since the data should already be saved as CSV files in this repository, we do not need to re-scrape it. Omit this cell to download directly from the USAU website instead (this may be slow).
In [4]:
# Read data from csv files
usau.reports.club_nats_men_2016.load_from_csvs()
usau.reports.club_nats_mixed_2016.load_from_csvs()
usau.reports.club_nats_women_2016.load_from_csvs()
Out[4]:
Let's take a look at the games for which the sum of the player goals/assists is less than the final score of the game:
In [5]:
missing_tallies = pd.concat([usau.reports.club_nats_men_2016.missing_tallies,
                             usau.reports.club_nats_mixed_2016.missing_tallies,
                             usau.reports.club_nats_women_2016.missing_tallies,
                            ])
display_url_column(missing_tallies[["Score", "Gs", "As", "Ds", "Ts", "Team", "Opponent", "url"]])
There are a total of 69 unreported scorers and 86 unreported assisters (although it's possible that some of the 17 additional missing assists belong to Callahans, which are goals scored without an assist). At a quick glance, many of these missing results come from less important games, such as the Machine-Madison Club placement game.
In [6]:
(missing_tallies["Score"] - missing_tallies["Gs"]).sum(), (missing_tallies["Score"] - missing_tallies["As"]).sum()
Out[6]:
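To make the arithmetic behind these totals concrete, here is a minimal sketch on invented numbers. It assumes that a "missing tally" row is one where the recorded goals or assists fall short of the final score; how the `usau` package actually builds `missing_tallies` is not shown here, so treat the filter as an assumption.

```python
import pandas as pd

# Invented toy data: one row per team per game
toy = pd.DataFrame({
    "Team": ["A", "B", "C"],
    "Score": [15, 13, 11],
    "Gs": [15, 10, 11],   # goals credited to individual players
    "As": [14, 9, 11],    # assists credited to individual players
})

# Assumed definition: goals or assists sum to less than the final score
incomplete = toy[(toy["Gs"] < toy["Score"]) | (toy["As"] < toy["Score"])]

missing_goals = int((incomplete["Score"] - incomplete["Gs"]).sum())
missing_assists = int((incomplete["Score"] - incomplete["As"]).sum())

# Goals recorded without a matching assist: at most this many could be Callahans
unassisted_goals = missing_assists - missing_goals
```

With these toy numbers, team A is missing one assist and team B is missing three goals and four assists, so two recorded goals have no recorded assist.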
All games with reported goals also had reported turnovers:
In [7]:
men_matches = usau.reports.club_nats_men_2016.match_results
mixed_matches = usau.reports.club_nats_mixed_2016.match_results
women_matches = usau.reports.club_nats_women_2016.match_results
display_url_column(pd.concat([men_matches[(men_matches.Ts == 0) & (men_matches.Gs > 0)],
                              mixed_matches[(mixed_matches.Ts == 0) & (mixed_matches.Gs > 0)],
                              women_matches[(women_matches.Ts == 0) & (women_matches.Gs > 0)]])
                   [["Score", "Gs", "As", "Ds", "Ts", "Team", "Opponent", "url"]])
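The check above relies on a boolean mask: a row is suspicious when it has zero recorded turnovers but at least one recorded goal, and an empty result means every such game had turnovers reported. A small sketch on invented data (the column names mirror the ones used above; the values are made up):

```python
import pandas as pd

# Invented toy data, not real match results
matches = pd.DataFrame({
    "Team": ["A", "B", "C"],
    "Gs": [15, 0, 12],
    "Ts": [9, 0, 0],
})

# Rows with goals recorded but no turnovers recorded
suspicious = matches[(matches["Ts"] == 0) & (matches["Gs"] > 0)]

# True only when no row matches the filter
all_turnovers_reported = suspicious.empty
```

Here team B is excluded because it has no recorded goals either (likely an untracked game), while team C is flagged.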