In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
In [3]:
nit_df = pd.read_csv("../data/nit-participants-2010-2014.csv")
In [4]:
nit_df.head(1)
Out[4]:
In [5]:
nit_df["Conference"].value_counts()[:10]
Out[5]:
In [6]:
nit_df["School"].value_counts()[:10]
Out[6]:
In [7]:
# Keep only the teams that received an at-large berth.
at_large_df = nit_df[nit_df["Berth type"] == "At-Large"]
In [8]:
at_large_df["RPI"].describe()
Out[8]:
In [9]:
at_large_df["RPI"].hist()
pass  # trailing no-op so the cell does not echo the returned Axes object
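The mean RPI for an at-large berth to the NIT is ~68. If the distribution is roughly normal, about 95% of at-large bids should fall within two standard deviations of the mean, i.e. between RPIs of 36 and 100. Let's check out the big outliers on both sides.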
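Here's a minimal sketch of where a range like that comes from, assuming it is simply the mean plus or minus two sample standard deviations. The RPI values and the `two_sigma_range` helper below are made up for illustration; they are not from the NIT data.

```python
import pandas as pd

# Hypothetical RPI values for illustration only; not taken from the NIT data.
example_rpi = pd.Series([40, 55, 60, 68, 70, 75, 80, 96])


def two_sigma_range(values):
    """Return (mean - 2*std, mean + 2*std) using the sample standard deviation."""
    mean = values.mean()
    std = values.std()  # pandas defaults to ddof=1, the same as describe()
    return mean - 2 * std, mean + 2 * std


low, high = two_sigma_range(example_rpi)
print(f"approx. 95% range: {low:.1f} to {high:.1f}")
```

On the real data the same call would be `two_sigma_range(at_large_df["RPI"])`. The 95% figure only holds if the RPIs are close to normally distributed, so the histogram above is worth a glance before trusting it.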
In [10]:
at_large_df[at_large_df["RPI"] < 36]
Out[10]:
In [11]:
at_large_df[at_large_df["RPI"] > 100]
Out[11]:
In [12]:
at_large_df[at_large_df["Conference"].str.contains("Ivy")]
Out[12]:
Whoa! The Ivy League has had only one at-large bid in the past 5 seasons? Yes, but... Best RPI of an Ivy non-winner in each of the other 4 seasons:
Alright, now I feel a little better about Yale's or Harvard's chances this season.
In [13]:
at_large_df[at_large_df["Mid-Major"] == "Y"]["RPI"].describe()
Out[13]:
In [14]:
at_large_df[at_large_df["Mid-Major"] == "Y"]["RPI"].hist()
pass
In [15]:
at_large_df[at_large_df["Mid-Major"] == "N"]["RPI"].describe()
Out[15]:
In [16]:
at_large_df[at_large_df["Mid-Major"] == "N"]["RPI"].hist()
pass
In [17]:
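# Count teams per (berth type, year), then pivot so each row is a year
# and each column is a berth type.
berth_composition = nit_df.groupby(["Berth type", "Year"]).count()["Seed"].unstack().T
berth_composition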
Out[17]:
In [18]:
round(berth_composition["Automatic"].mean(), 1)
Out[18]:
In [19]:
# Same pivot, split by mid-major status ("Y"/"N") instead of berth type.
level_composition = nit_df.groupby(["Mid-Major", "Year"]).count()["Seed"].unstack().T
level_composition
Out[19]:
In [20]:
round(level_composition["Y"].mean(), 1)
Out[20]:
An average of 11 automatic berths have been given out each season. Around half of the overall field is typically filled by mid-majors (but remember that this count includes the 11 automatic bids).
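To see how much of the mid-major count is made up of automatic bids, one option is to cross-tabulate berth type against level. This is a rough sketch on a toy frame that reuses the column names from `nit_df`; every value in it is invented, so only the shape of the calculation carries over.

```python
import pandas as pd

# Toy data using the same column names as nit_df; the values are made up.
toy_df = pd.DataFrame({
    "Year": [2013, 2013, 2013, 2013, 2014, 2014, 2014, 2014],
    "Berth type": ["Automatic", "At-Large", "At-Large", "At-Large",
                   "Automatic", "Automatic", "At-Large", "At-Large"],
    "Mid-Major": ["Y", "Y", "N", "N", "Y", "Y", "N", "Y"],
})

# Teams per (Year, Berth type), with berth types as columns.
berths_by_year = toy_df.groupby(["Year", "Berth type"]).size().unstack(fill_value=0)
avg_automatic = berths_by_year["Automatic"].mean()

# Share of each season's field that is mid-major.
mid_major_share = (toy_df["Mid-Major"] == "Y").groupby(toy_df["Year"]).mean()

# Berth type vs. level: shows how many mid-major slots are automatic bids.
overlap = pd.crosstab(toy_df["Berth type"], toy_df["Mid-Major"])

print(avg_automatic)
print(mid_major_share)
print(overlap)
```

Swapping `toy_df` for `nit_df` should give the real breakdown, assuming the `Berth type` and `Mid-Major` columns are coded the same way as in the cells above.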