Stats Quality for 2016 Club Nationals

As one of the biggest tournaments hosted by USAU, Club Nationals is one of the few tournaments where player statistics are tracked reasonably reliably. For each game, every player's aggregate goals, assists, Ds, and turns are recorded, although stat-keepers may well define a "D" or a "Turn" differently.
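As a rough illustration of how those per-player counts become the team-level columns used below (Gs, As, Ds, Ts), here is a minimal sketch that sums one team's stat lines for a single game. The `PlayerLine` record, the `team_totals` function, and the player values are all hypothetical; the `usau` package may store and aggregate this data differently.

```python
from collections import namedtuple

# Hypothetical per-player stat line for one team in one game
PlayerLine = namedtuple("PlayerLine", ["player", "goals", "assists", "ds", "turns"])


def team_totals(lines):
    """Sum per-player stat lines into team totals keyed like the notebook's columns."""
    totals = {"Gs": 0, "As": 0, "Ds": 0, "Ts": 0}
    for line in lines:
        totals["Gs"] += line.goals
        totals["As"] += line.assists
        totals["Ds"] += line.ds
        totals["Ts"] += line.turns
    return totals


# Made-up stat lines for three players
lines = [
    PlayerLine("Player A", 3, 1, 0, 2),
    PlayerLine("Player B", 1, 4, 2, 1),
    PlayerLine("Player C", 2, 0, 1, 0),
]
totals = team_totals(lines)
print(totals["Gs"], totals["As"], totals["Ds"], totals["Ts"])  # 6 5 3 3
```

If this team's final score were 7, the total of 6 goals would leave one goal with no recorded scorer, which is the kind of gap examined below.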

The data below was scraped from the USAU website. First, we set up the imports needed to load it.


In [1]:
import usau.reports
import usau.fantasy

In [2]:
from IPython.display import display, HTML
import pandas as pd
pd.options.display.width = 200
pd.options.display.max_colwidth = 200
pd.options.display.max_columns = 200

In [3]:
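def display_url_column(df):
    """Helper for formatting url links as HTML anchors"""
    # Work on a copy so we neither modify the caller's frame nor trigger
    # pandas' SettingWithCopyWarning when df is a slice of another frame
    df = df.copy()
    df["url"] = df["url"].apply(lambda url: "<a href='{base}{url}'>Match Report Link</a>"
                                .format(base=usau.reports.USAUResults.BASE_URL, url=url))
    display(HTML(df.to_html(escape=False)))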
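The helper turns each relative `url` into an HTML anchor by prefixing the site's base URL. A standalone sketch of that string-building step follows. `BASE_URL` is a placeholder rather than the real `usau.reports.USAUResults.BASE_URL`, the path in the example is made up, and the `html.escape` call is an extra safeguard the notebook's helper does not include (it keeps a stray quote in a path from breaking the `href` attribute). The sketch targets Python 3, since the `html` module is not available in Python 2.7.

```python
import html

# Placeholder; the notebook uses usau.reports.USAUResults.BASE_URL instead
BASE_URL = "https://example.org"


def report_link(url, base=BASE_URL):
    """Wrap a relative match-report path in an HTML anchor tag."""
    href = html.escape(base + url, quote=True)  # escapes &, <, >, " and '
    return "<a href='{href}'>Match Report Link</a>".format(href=href)


print(report_link("/hypothetical/report/path"))
# <a href='https://example.org/hypothetical/report/path'>Match Report Link</a>
```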

Since the data should already be downloaded as CSV files in this repository, we don't need to re-scrape it. Omit this cell to download directly from the USAU website instead (this may be slow).


In [4]:
# Read data from csv files
usau.reports.club_nats_men_2016.load_from_csvs()
usau.reports.club_nats_mixed_2016.load_from_csvs()
usau.reports.club_nats_women_2016.load_from_csvs()


Out[4]:
<usau.reports.USAUResults at 0x3d231d0>
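Loading from cached CSV files, with the option of scraping instead, is an instance of a common read-through cache pattern: use the local file when it exists, otherwise fetch the data and save it for next time. A minimal sketch with the standard `csv` module follows; `load_or_fetch` and the `fake_scrape` stand-in are hypothetical and are not how `usau.reports` is implemented. Values read back from a CSV file are strings, so the fake scraper returns strings too, which keeps the fetched and cached results identical.

```python
import csv
import os
import tempfile


def load_or_fetch(path, fetch_rows, fieldnames):
    """Read rows from the CSV at `path`; if it is missing, fetch the rows and cache them there."""
    if os.path.exists(path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))
    rows = fetch_rows()  # stands in for a slow scrape
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Usage: a fake scraper that records how often it is called (hypothetical)
calls = []


def fake_scrape():
    calls.append(1)
    return [{"Team": "Revolver", "Opponent": "Sockeye", "Score": "16"}]


path = os.path.join(tempfile.mkdtemp(), "matches.csv")
fields = ["Team", "Opponent", "Score"]
first = load_or_fetch(path, fake_scrape, fields)   # scrapes, then writes the CSV
second = load_or_fetch(path, fake_scrape, fields)  # reads the cached CSV instead
print(len(calls))  # 1
print(first == second)  # True
```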

Let's look at the games where the sum of the players' goals or assists is less than the team's final score:


In [5]:
missing_tallies = pd.concat([usau.reports.club_nats_men_2016.missing_tallies,
                             usau.reports.club_nats_mixed_2016.missing_tallies,
                             usau.reports.club_nats_women_2016.missing_tallies,
                            ])
display_url_column(missing_tallies[["Score", "Gs", "As", "Ds", "Ts", "Team", "Opponent", "url"]])


     Score  Gs  As  Ds  Ts  Team             Opponent         url
  0     14  13  12   6  13  Ironside         Prairie Fire     Match Report Link
  4     14  14  13   8  17  Ironside         Ring of Fire     Match Report Link
 12     15  14  14   4   9  Revolver         Patrol           Match Report Link
 16     15  15  14  12  21  Revolver         Doublewide       Match Report Link
 20     16  15  15   2  11  Revolver         Sockeye          Match Report Link
 26     15  14  14   1   5  Madison Club     Dig              Match Report Link
 30     10   8   7  10  26  Madison Club     HIGH FIVE        Match Report Link
 31      8   7   8  11  32  HIGH FIVE        Madison Club     Match Report Link
 33     11  10  11   8  24  Madison Club     Truck Stop       Match Report Link
 38     14  14  13   4  20  Chicago Machine  Furious George   Match Report Link
 40     15  14  15   6  18  Johnny Bravo     Furious George   Match Report Link
 41     10  10   9   9  23  Furious George   Johnny Bravo     Match Report Link
 43     13  13  11   3  16  H.I.P            Chicago Machine  Match Report Link
 53     13  12  13   4  10  Revolver         Ring of Fire     Match Report Link
 61     15  14  15   5   9  Revolver         Furious George   Match Report Link
 76     11  11   9   3  12  Furious George   Patrol           Match Report Link
 80     12  12  11   6  18  Madison Club     Patrol           Match Report Link
 83      7   6   6   7  30  Dig              H.I.P            Match Report Link
 84     14  13  14   9  16  HIGH FIVE        Doublewide       Match Report Link
 94      7   0   0   0   0  Madison Club     Chicago Machine  Match Report Link
 95      6   0   0   0   0  Chicago Machine  Madison Club     Match Report Link
 96     14  11  11   7  16  Madison Club     Dig              Match Report Link
104     11  10  11   8  19  HIGH FIVE        Prairie Fire     Match Report Link
  0     11  10  11   8  24  AMP              Ambiguous Grey   Match Report Link
  1      9   7   8   9  25  Ambiguous Grey   AMP              Match Report Link
  6     11  11  10  11  32  Metro North      Ambiguous Grey   Match Report Link
 10     14  13  11  10  39  Ambiguous Grey   Blackbird        Match Report Link
 14     12  12  11  14  45  Alloy            Public Enemy     Match Report Link
 20     15  14  15   7  18  Slow White       Alloy            Match Report Link
 28     15  15  14  23  32  Drag'n Thrust    No Touching!     Match Report Link
 30     14  14  13  11  30  Steamboat        Love Tractor     Match Report Link
 32     14  13  14  13  33  Drag'n Thrust    Steamboat        Match Report Link
 34      9   9   8   4  19  Love Tractor     No Touching!     Match Report Link
 35     15  15  12   4  22  No Touching!     Love Tractor     Match Report Link
 37     15  14  14   7   7  shame.           Seattle Mixtape  Match Report Link
 43     11  11  10   6  22  shame.           Mischief         Match Report Link
 50     14  14  13   9  14  Metro North      Mischief         Match Report Link
 54     15  15  14   8  19  Metro North      NOISE            Match Report Link
 69     13  12  13   7  24  AMP              shame.           Match Report Link
 73     10   9  10   7  18  NOISE            AMP              Match Report Link
 76     13  13  12  14  10  NOISE            Blackbird        Match Report Link
 77     15  14  14   9   5  Blackbird        NOISE            Match Report Link
 78     15  15  14   6   9  Blackbird        No Touching!     Match Report Link
 84     12  12  11   7  16  Ambiguous Grey   Seattle Mixtape  Match Report Link
 87     13  12  13   6  16  Ambiguous Grey   Public Enemy     Match Report Link
 89     11  10  10  14  31  Love Tractor     Public Enemy     Match Report Link
 96     10  10   9  12  26  shame.           Alloy            Match Report Link
 97     15  15  14   9  21  Alloy            shame.           Match Report Link
102     14  14  13  11  21  Alloy            G-Unit           Match Report Link
105     11  10  11   6  21  Blackbird        Love Tractor     Match Report Link
  1      3   3   2   8  23  Iris             Seattle Riot     Match Report Link
 10      6   5   7   8  39  Iris             Heist            Match Report Link
 11     10   9   8  19  44  Heist            Iris             Match Report Link
 12     14  13  14  19  49  Brute Squad      Showdown         Match Report Link
 18     12  10  11   7  12  Wildfire         Showdown         Match Report Link
 19     13  12  11   6   7  Showdown         Wildfire         Match Report Link
 22     13  11  11  10  23  Showdown         Rival            Match Report Link
 24     12  11  12  10  34  Molly Brown      Phoenix          Match Report Link
 28     15  15  14   4  22  Molly Brown      Green Means Go   Match Report Link
 33     10  10   8  10  44  Traffic          Molly Brown      Match Report Link
 34     15  12  14   5  13  Phoenix          Green Means Go   Match Report Link
 44     14  13  14  14  35  Fury             Scandal          Match Report Link
 56      7   5   6   2  16  Showdown         Fury             Match Report Link
 63     15  14  15  11  24  Traffic          Rival            Match Report Link
 64     14  13  12  10  24  Showdown         Phoenix          Match Report Link
 65     10  10   8  10  30  Phoenix          Showdown         Match Report Link
 69     15  14  15   7  25  Nightlock        Schwa            Match Report Link
 73      8   5   8   8  11  Showdown         Scandal          Match Report Link
 75     12  12  11  11  33  Traffic          Nightlock        Match Report Link
 76     11  10   7   3  11  Showdown         Ozone            Match Report Link
 78      7   6   7   6  25  Phoenix          Ozone            Match Report Link
 79     13  13  12   8  22  Ozone            Phoenix          Match Report Link
 80     14  14  13  16  35  Phoenix          Iris             Match Report Link
 89     10   9  10  15  27  Green Means Go   Rival            Match Report Link
 94     15  15  14  18  20  Iris             Wildfire         Match Report Link
 95     10  10   9  10  23  Wildfire         Iris             Match Report Link
 99     10  10   9  14  18  Wildfire         Green Means Go   Match Report Link
101     13  13  12  13  14  Green Means Go   Schwa            Match Report Link
103     11  10  10  21  32  Green Means Go   Wildfire         Match Report Link
105      9   8   9   9  26  Iris             Ozone            Match Report Link
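Judging from the rows above, a game lands in `missing_tallies` when either the players' goals or the players' assists sum to less than the team's score (Ironside vs. Ring of Fire, for instance, has all 14 goals but only 13 assists). A minimal sketch of that rule in plain Python follows, using two rows copied from the table plus one made-up complete game; the actual `usau.reports` logic works on DataFrames and may differ in details.

```python
# Two rows copied from the table above, plus one hypothetical complete game
games = [
    {"Team": "Ironside", "Opponent": "Ring of Fire", "Score": 14, "Gs": 14, "As": 13},
    {"Team": "Madison Club", "Opponent": "Chicago Machine", "Score": 7, "Gs": 0, "As": 0},
    {"Team": "Team X", "Opponent": "Team Y", "Score": 15, "Gs": 15, "As": 15},  # made up
]


def has_missing_tallies(game):
    """True if player goals or player assists sum to less than the team's score."""
    return game["Gs"] < game["Score"] or game["As"] < game["Score"]


flagged = [g["Team"] for g in games if has_missing_tallies(g)]
print(flagged)  # ['Ironside', 'Madison Club']
```

With a pandas DataFrame of match results, the equivalent filter would be something like `df[(df.Gs < df.Score) | (df.As < df.Score)]`.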

That's a total of 69 goals with no recorded scorer and 86 with no recorded assister (although some of the 17-goal gap between the two could be Callahans, which have a scorer but no assist). At a quick glance, many of these gaps come from less important games, such as the Machine-Madison Club placement game, where no player stats were recorded at all.


In [6]:
(missing_tallies["Score"] - missing_tallies["Gs"]).sum(), (missing_tallies["Score"] - missing_tallies["As"]).sum()


Out[6]:
(69, 86)
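Each total is the per-game shortfall summed over the flagged games: `Score - Gs` for goals and `Score - As` for assists. A Callahan goal is recorded with a scorer but no assister, which is why it would widen the assist shortfall without touching the goal shortfall. The sketch below repeats the arithmetic on three rows copied from the table; applied to every row of `missing_tallies`, the same sums give the `(69, 86)` above.

```python
# (Score, Gs, As) for three rows copied from the table above
rows = [
    (14, 13, 12),  # Ironside vs Prairie Fire
    (7, 0, 0),     # Madison Club vs Chicago Machine
    (15, 15, 12),  # No Touching! vs Love Tractor
]

# Goals with no recorded scorer, and goals with no recorded assister
missing_goals = sum(score - gs for score, gs, _ in rows)
missing_assists = sum(score - assists for score, _, assists in rows)
print(missing_goals, missing_assists)  # 8 12
```

Here the No Touching! row contributes 3 missing assists but no missing goals; a Callahan is one possible explanation for that kind of gap, though not the only one.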

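Every game with at least one recorded goal also has recorded turnovers; the query for such games with zero turnovers returns nothing: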


In [7]:
men_matches = usau.reports.club_nats_men_2016.match_results
mixed_matches = usau.reports.club_nats_mixed_2016.match_results
women_matches = usau.reports.club_nats_women_2016.match_results
display_url_column(pd.concat([men_matches[(men_matches.Ts == 0) & (men_matches.Gs > 0)],
                              mixed_matches[(mixed_matches.Ts == 0) & (mixed_matches.Gs > 0)],
                              women_matches[(women_matches.Ts == 0) & (women_matches.Gs > 0)]])
                   [["Score", "Gs", "As", "Ds", "Ts", "Team", "Opponent", "url"]])


Score Gs As Ds Ts Team Opponent url
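The header-only output means no game matched `(Ts == 0) & (Gs > 0)`. The `Gs > 0` guard matters: both Madison Club vs. Chicago Machine rows have zeros in every stat column, so a bare `Ts == 0` filter would return them even though they reflect missing stats rather than a turnover-free game. A plain-Python sketch of both filters, on rows copied from the missing-tallies table:

```python
# (Team, Opponent, Gs, Ts) rows copied from the missing-tallies table above
rows = [
    ("Madison Club", "Chicago Machine", 0, 0),
    ("Chicago Machine", "Madison Club", 0, 0),
    ("Revolver", "Sockeye", 15, 11),
]

# Goals recorded but no turnovers: suggests turns were never tracked for that game
suspicious = [(team, opp) for team, opp, gs, ts in rows if ts == 0 and gs > 0]
# Without the gs > 0 guard, the completely blank games would be flagged too
no_turns = [(team, opp) for team, opp, gs, ts in rows if ts == 0]
print(suspicious)  # []
print(no_turns)    # [('Madison Club', 'Chicago Machine'), ('Chicago Machine', 'Madison Club')]
```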
