Analyzing NIT Bracketology

This notebook looks at RPI trends among NIT participants over the past five tournaments (2010-2014).



In [1]:

    
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Load the data:

I collected this data from:

Wikipedia (Seed, School, Conference, Record, Wins, Losses, Berth type, Year)
Basketball State (RPI)
My own classification (Mid-Major)

Note that the Atlantic 10 is not counted as a mid-major in the "Mid-Major" field.



In [3]:

    
nit_df = pd.read_csv("../data/nit-participants-2010-2014.csv")



In [4]:

    
nit_df.head(1)









    Out[4]:

   Seed         School  Conference  Record  Wins  Losses  Berth type  Year  Wins-Losses  RPI  Mid-Major
0     1  Florida State         ACC   19–13    19      13    At-Large  2014            6   51          N

Initial Analysis:

What conferences have had the most NIT participants?
What is the average RPI of an at-large berth?
What was the distribution of RPIs for at-large berths?



In [5]:

    
nit_df["Conference"].value_counts()[:10]









    Out[5]:





SEC            15
ACC            12
Big East       10
Big Ten         9
Pac-12          9
Atlantic 10     9
C-USA           7
MVC             6
Horizon         6
WAC             5
dtype: int64
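
One caveat on these counts: the list above shows "MVC", while the Illinois State row that appears later in this notebook carries the label "Missouri Valley", so the same conference may be split across two spellings in the raw file. Below is a minimal sketch of normalizing labels before counting. It runs on a small made-up Series rather than the real data, and the alias map and the helper name `normalize_conference` are hypothetical.

```python
import pandas as pd

# Hypothetical alias map: only the MVC spelling is suggested by this notebook.
# Extend it after inspecting nit_df["Conference"].unique().
CONFERENCE_ALIASES = {"Missouri Valley": "MVC"}


def normalize_conference(conferences: pd.Series) -> pd.Series:
    """Strip stray whitespace and collapse known alternate spellings."""
    return conferences.str.strip().replace(CONFERENCE_ALIASES)


# Made-up labels for illustration only (not rows from the data set).
toy = pd.Series(["MVC", "Missouri Valley", " MVC", "Ivy"])
counts = normalize_conference(toy).value_counts()
print(counts)
```

On the real frame the same helper could be applied as `normalize_conference(nit_df["Conference"]).value_counts()`; whether it changes the top ten depends on how many rows use the longer spelling.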



In [6]:

    
nit_df["School"].value_counts()[:10]









    Out[6]:





Mississippi        3
St. John's         3
Stony Brook        3
Dayton             3
Northwestern       3
Illinois           2
Cleveland State    2
Kent State         2
Arizona State      2
Iowa               2
dtype: int64



In [7]:

    
at_large_df = nit_df[nit_df["Berth type"] == "At-Large"]



In [8]:

    
at_large_df["RPI"].describe()









    Out[8]:





count    104.000000
mean      67.913462
std       15.753015
min       31.000000
25%       58.000000
50%       68.000000
75%       76.250000
max      121.000000
dtype: float64



In [9]:

    
at_large_df["RPI"].hist()
pass

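The mean RPI for an at-large berth to the NIT is ~68 with a standard deviation of ~16. If the RPIs are roughly normally distributed, about 95% of at-large bids should fall within two standard deviations of the mean, i.e. between RPIs of roughly 36 and 100. Let's check out the big outliers.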
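
The 36 and 100 cutoffs come from the mean plus or minus two standard deviations. A short sketch of that arithmetic, using the figures from the `describe()` output; treating the band as a 95% range assumes the RPIs are approximately normal, which the notebook does not verify.

```python
# Summary statistics copied from the describe() output above.
mean_rpi = 67.913462
std_rpi = 15.753015

# Two-standard-deviation band around the mean. It only covers about 95%
# of values if the RPIs are approximately normally distributed (assumption).
lower = mean_rpi - 2 * std_rpi
upper = mean_rpi + 2 * std_rpi
print(f"two-sigma band: {lower:.1f} to {upper:.1f}")  # 36.4 to 99.4
```

With the data loaded, `at_large_df["RPI"].between(lower, upper).mean()` would give the observed share inside the band instead of the theoretical one.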



In [10]:

    
at_large_df[at_large_df["RPI"] < 36]









    Out[10]:

     Seed                School  Conference  Record  Wins  Losses  Berth type  Year  Wins-Losses  RPI  Mid-Major
 35     1  Southern Mississippi       C-USA    25–9    25       9    At-Large  2013           16   31          Y
119     6               Harvard         Ivy    23–6    23       6    At-Large  2011           17   35          Y


In [11]:

    
at_large_df[at_large_df["RPI"] > 100]









    Out[11]:

     Seed          School       Conference  Record  Wins  Losses  Berth type  Year  Wins-Losses  RPI  Mid-Major
 88     7            Iowa          Big Ten   17–16    17      16    At-Large  2012            1  121          N
 90     7  Illinois State  Missouri Valley   20–13    20      13    At-Large  2012            7  109          Y
152     7    Northwestern          Big Ten   20-13    20      13    At-Large  2010            7  112          N
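
The two filters return two rows below 36 and three rows above 100, which allows a quick check of the 95% estimate against the observed counts. This is plain arithmetic on numbers already shown in the outputs above.

```python
total_at_large = 104  # count from the describe() output
below_36 = 2          # rows returned by the RPI < 36 filter
above_100 = 3         # rows returned by the RPI > 100 filter

inside = total_at_large - below_36 - above_100
share_inside = inside / total_at_large
print(f"{inside} of {total_at_large} at-large bids ({share_inside:.1%}) have an RPI in [36, 100]")
```

So the observed share is close to the 95% that the two-standard-deviation rule of thumb predicts.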

More On The Ivy League:

Wow, a six seed for a team with a 35 RPI? Does the Ivy League always need that impressive a performance to get an at-large? (Remember the league can't get an automatic bid due to how its NCAA berth is determined.)



In [12]:

    
at_large_df[at_large_df["Conference"].str.contains("Ivy")]









    Out[12]:

     Seed   School  Conference  Record  Wins  Losses  Berth type  Year  Wins-Losses  RPI  Mid-Major
119     6  Harvard         Ivy    23–6    23       6    At-Large  2011           17   35          Y

Whoa! The Ivy League has had only one at-large bid in the past five seasons? Yes, but look at the best RPI among Ivy teams that did not win the league in each of the other four seasons:

124 - Princeton (13-14)
121 - Princeton (12-13)
86 - Princeton (11-12)
100 - Harvard (09-10)

Alright, now I feel a little better about Yale's or Harvard's chances this season.
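
To put those four RPIs in context, here is a small sketch comparing them with two reference points from the at-large `describe()` output earlier: the 75th percentile (76.25) and the worst RPI to receive an at-large bid (121). The season labels and values are the ones listed above; the variable names are only illustrative.

```python
# Best RPI of an Ivy team that did not win the league, from the list above.
ivy_best_non_champion = {"13-14": 124, "12-13": 121, "11-12": 86, "09-10": 100}

# At-large reference points copied from the describe() output earlier.
at_large_q3 = 76.25   # 75th percentile of at-large RPIs
at_large_max = 121    # worst RPI that still received an at-large bid

for season, rpi in ivy_best_non_champion.items():
    worse_than_q3 = rpi > at_large_q3
    within_observed_range = rpi <= at_large_max
    print(season, rpi,
          "| worse than 75% of at-large bids:", worse_than_q3,
          "| within observed at-large range:", within_observed_range)

count_worse_than_q3 = sum(rpi > at_large_q3 for rpi in ivy_best_non_champion.values())
```

All four candidates sit behind at least three quarters of the teams that actually got at-large bids, which fits the single Ivy at-large berth in the data.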

Mid-Majors Need To Be (Slightly) Better:



In [13]:

    
at_large_df[at_large_df["Mid-Major"] == "Y"]["RPI"].describe()









    Out[13]:





count     28.000000
mean      62.357143
std       17.299570
min       31.000000
25%       50.250000
50%       65.500000
75%       72.000000
max      109.000000
dtype: float64



In [14]:

    
at_large_df[at_large_df["Mid-Major"] == "Y"]["RPI"].hist()
pass



In [15]:

    
at_large_df[at_large_df["Mid-Major"] == "N"]["RPI"].describe()









    Out[15]:





count     76.000000
mean      69.960526
std       14.740819
min       36.000000
25%       59.000000
50%       69.000000
75%       79.000000
max      121.000000
dtype: float64



In [16]:

    
at_large_df[at_large_df["Mid-Major"] == "N"]["RPI"].hist()
pass
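
The two `describe()` outputs can be compared directly. The sketch below computes the gap in mean RPI and, as a rough guide to whether the gap is more than noise, a Welch t statistic from the summary figures alone. The t statistic is not part of the original analysis; it is an added check that assumes the two groups are independent samples.

```python
import math

# (count, mean, std) copied from the two describe() outputs above.
mid_n, mid_mean, mid_std = 28, 62.357143, 17.299570
major_n, major_mean, major_std = 76, 69.960526, 14.740819

# Positive gap: mid-major at-large teams had the better (lower) mean RPI.
gap = major_mean - mid_mean

# Welch's t statistic from summary statistics (unequal variances allowed).
std_err = math.sqrt(mid_std**2 / mid_n + major_std**2 / major_n)
welch_t = (mid_mean - major_mean) / std_err

print(f"mean gap: {gap:.1f} RPI spots, Welch t: {welch_t:.2f}")
```

A gap of roughly 7.6 RPI spots with a t statistic of about 2 in magnitude is in line with the "slightly better" wording of the heading: noticeable, but based on only 28 mid-major bids.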

Bracket Distribution:

How many automatic bids are there typically in the NIT?
How many "mid-majors" play in the NIT each season?



In [17]:

    
berth_composition = nit_df.groupby(["Berth type", "Year"]).count()["Seed"].unstack().T
berth_composition
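
For reference, the same year-by-berth table can also be built with `pd.crosstab`. The sketch below uses a small made-up frame (not the real data) to show that both approaches agree when no `Seed` values are missing. Note that `count()["Seed"]` counts non-null `Seed` values, whereas `crosstab` counts rows.

```python
import pandas as pd

# Made-up rows for illustration only; the real table comes from nit_df.
toy = pd.DataFrame({
    "Seed": [1, 2, 7, 1, 6, 8],
    "Berth type": ["At-Large", "At-Large", "Automatic",
                   "At-Large", "Automatic", "Automatic"],
    "Year": [2013, 2013, 2013, 2014, 2014, 2014],
})

# Approach used in the cell above: non-null Seed counts per (berth, year).
via_groupby = toy.groupby(["Berth type", "Year"]).count()["Seed"].unstack().T

# Alternative: count rows directly, one row per Year, one column per berth type.
via_crosstab = pd.crosstab(toy["Year"], toy["Berth type"])

print(via_crosstab)
```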



In [18]:

    
round(berth_composition["Automatic"].mean(), 1)









    Out[18]:





11.2



In [19]:

    
level_composition = nit_df.groupby(["Mid-Major", "Year"]).count()["Seed"].unstack().T
level_composition



In [20]:

    
round(level_composition["Y"].mean(), 1)









    Out[20]:





16.6

An average of about 11 automatic berths (11.2) have been given out each season. Mid-majors typically fill around half of the overall field (16.6 teams per season on average), but remember that this count includes those roughly 11 automatic bids.
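
A quick sketch of the arithmetic behind "around half". The field size of 32 is an assumption not stated in the notebook, though it is consistent with the outputs: 104 at-large bids over five seasons plus 11.2 automatic bids per season comes to 32 teams.

```python
seasons = 5
field_size = 32            # assumption: 32 teams per NIT field
mid_major_avg = 16.6       # Out[20]: mean mid-major teams per season
automatic_avg = 11.2       # Out[18]: mean automatic bids per season
at_large_total = 104       # at-large bids over all five seasons
mid_major_at_large = 28    # mid-major at-large bids over all five seasons

at_large_avg = at_large_total / seasons
mid_major_share = mid_major_avg / field_size
mid_major_at_large_avg = mid_major_at_large / seasons

print(f"teams per season: {at_large_avg + automatic_avg:.1f}")
print(f"mid-major share of the field: {mid_major_share:.0%}")
print(f"mid-major at-large bids per season: {mid_major_at_large_avg:.1f}")
```

Only about 5.6 of the 16.6 mid-major teams per season arrive through at-large bids; the rest of the mid-major presence comes from automatic berths.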


