Analyzing NIT Bracketology

Analyzing RPI trends in NIT bracketology over the past five tournaments (2010–2014).


In [1]:
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

Load the data:

I collected this data from:

  • Wikipedia (Seed, School, Conference, Record, Wins, Losses, Berth type, Year)
  • Basketball State (RPI)
  • My own classification (Mid-Major)

Note: the Atlantic 10 is not counted as a mid-major in the "Mid-Major" column.


In [3]:
nit_df = pd.read_csv("../data/nit-participants-2010-2014.csv")

In [4]:
nit_df.head(1)


Out[4]:
   Seed         School  Conference  Record  Wins  Losses  Berth type  Year  Wins-Losses  RPI  Mid-Major
0     1  Florida State         ACC   19–13    19      13    At-Large  2014            6   51          N
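The Record column repeats what Wins, Losses and Wins-Losses already say, and it mixes en dashes ("19–13") with hyphens (Northwestern's "20-13" in 2010, Out[11]). Since the data was collected by hand, a quick consistency check is cheap insurance. This is a minimal sketch assuming the column names shown above; `check_records` is a hypothetical helper, not part of the original notebook.

```python
import pandas as pd


def check_records(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose Record, Wins, Losses and Wins-Losses disagree.

    Hypothetical helper. Records mix en dashes ("19–13") and hyphens
    ("20-13"), so the pattern accepts either. A Record that does not
    match the pattern becomes NaN and makes astype(int) raise.
    """
    parts = df["Record"].str.extract(r"(\d+)\s*[–-]\s*(\d+)").astype(int)
    wins, losses = parts[0], parts[1]
    mismatch = (
        (wins != df["Wins"])
        | (losses != df["Losses"])
        | (df["Wins"] - df["Losses"] != df["Wins-Losses"])
    )
    return df[mismatch]


# Toy rows for illustration only; the third row is deliberately inconsistent.
toy = pd.DataFrame({
    "Record": ["19–13", "20-13", "10-5"],
    "Wins": [19, 20, 10],
    "Losses": [13, 13, 6],
    "Wins-Losses": [6, 7, 4],
})
print(check_records(toy))
```

In the notebook, `check_records(nit_df)` should come back empty if every row is consistent.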

Initial Analysis:

  • What conferences have had the most NIT participants?
  • What is the average RPI of an at-large berth?
  • What is the distribution of RPIs for at-large berths?

In [5]:
nit_df["Conference"].value_counts()[:10]


Out[5]:
SEC            15
ACC            12
Big East       10
Big Ten         9
Pac-12          9
Atlantic 10     9
C-USA           7
MVC             6
Horizon         6
WAC             5
dtype: int64
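These counts depend on each conference being spelled one way, and at least one row is not: Illinois State's 2012 at-large bid (Out[11]) is labeled "Missouri Valley", while the counts above list "MVC", so the MVC total of 6 likely misses at least that berth. Listing `sorted(nit_df["Conference"].unique())` is a quick way to spot such aliases. Below is a minimal sketch of counting after normalizing names; the alias dictionary is hypothetical and covers only the one case visible in this notebook.

```python
import pandas as pd

# Hypothetical alias map: only "Missouri Valley" -> "MVC" is visible in this
# notebook. Extend it as other spellings turn up in the data.
CONFERENCE_ALIASES = {"Missouri Valley": "MVC"}


def conference_counts(df: pd.DataFrame, top: int = 10) -> pd.Series:
    """Count NIT berths per conference after collapsing known aliases."""
    conferences = df["Conference"].str.strip().replace(CONFERENCE_ALIASES)
    return conferences.value_counts().head(top)


# Toy frame for illustration only (not the real NIT data).
toy = pd.DataFrame({"Conference": ["MVC", "Missouri Valley", "ACC", "MVC "]})
print(conference_counts(toy))
```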

In [6]:
nit_df["School"].value_counts()[:10]


Out[6]:
Mississippi        3
St. John's         3
Stony Brook        3
Dayton             3
Northwestern       3
Illinois           2
Cleveland State    2
Kent State         2
Arizona State      2
Iowa               2
dtype: int64

In [7]:
at_large_df = nit_df[nit_df["Berth type"] == "At-Large"]

In [8]:
at_large_df["RPI"].describe()


Out[8]:
count    104.000000
mean      67.913462
std       15.753015
min       31.000000
25%       58.000000
50%       68.000000
75%       76.250000
max      121.000000
dtype: float64

In [9]:
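ax = at_large_df["RPI"].hist()  # histogram of all at-large RPIs
ax.set_xlabel("RPI")
ax.set_ylabel("Number of at-large bids")
plt.show()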


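The mean RPI for an at-large berth to the NIT is about 68, with a standard deviation of about 16. If RPIs are roughly normally distributed, about 95% of bids should fall within two standard deviations of the mean, i.e. between RPIs of roughly 36 and 100. Let's check out the big outliers.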
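The 36-to-100 range is the mean plus or minus two standard deviations from Out[8]: 67.91 ± 2 × 15.75 gives roughly 36.4 to 99.4. Because the 95% figure only holds if the distribution is close to normal, it is worth measuring how many bids actually land in the band. A minimal sketch; `two_sigma_band` and `share_inside` are hypothetical helper names, not part of the original notebook.

```python
import pandas as pd


def two_sigma_band(rpi: pd.Series):
    """Return (mean - 2*std, mean + 2*std) for a Series of RPI values."""
    mean = rpi.mean()
    std = rpi.std()  # pandas defaults to the sample standard deviation (ddof=1)
    return mean - 2 * std, mean + 2 * std


def share_inside(rpi: pd.Series, low: float, high: float) -> float:
    """Fraction of values in [low, high]; near 0.95 only if RPI is near-normal."""
    return float(rpi.between(low, high).mean())


# In the notebook (not run here):
# low, high = two_sigma_band(at_large_df["RPI"])
# share_inside(at_large_df["RPI"], low, high)
```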


In [10]:
at_large_df[at_large_df["RPI"] < 36]


Out[10]:
     Seed                School  Conference  Record  Wins  Losses  Berth type  Year  Wins-Losses  RPI  Mid-Major
 35     1  Southern Mississippi       C-USA    25–9    25       9    At-Large  2013           16   31          Y
119     6               Harvard         Ivy    23–6    23       6    At-Large  2011           17   35          Y

In [11]:
at_large_df[at_large_df["RPI"] > 100]


Out[11]:
     Seed          School       Conference  Record  Wins  Losses  Berth type  Year  Wins-Losses  RPI  Mid-Major
 88     7            Iowa          Big Ten   17–16    17      16    At-Large  2012            1  121          N
 90     7  Illinois State  Missouri Valley   20–13    20      13    At-Large  2012            7  109          Y
152     7    Northwestern          Big Ten   20-13    20      13    At-Large  2010            7  112          N

More On The Ivy League:

Wow, a six seed for a team with an RPI of 35? Does the Ivy League always need that impressive a résumé to get an at-large bid? (Remember that the league can't get an NIT automatic bid because of how its NCAA berth is determined.)


In [12]:
at_large_df[at_large_df["Conference"].str.contains("Ivy")]


Out[12]:
     Seed   School  Conference  Record  Wins  Losses  Berth type  Year  Wins-Losses  RPI  Mid-Major
119     6  Harvard         Ivy    23–6    23       6    At-Large  2011           17   35          Y

Whoa! The Ivy League has had only one at-large bid in the past five seasons? Yes, but look at the best RPI of a non-champion Ivy team in each of the other four seasons:

  • 124 - Princeton (13-14)
  • 121 - Princeton (12-13)
  • 86 - Princeton (11-12)
  • 100 - Harvard (09-10)
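To put those four RPIs on the same scale as the at-large field, each one can be expressed as a number of standard deviations from the at-large mean in Out[8]. Lower RPI is better, so a positive score means a weaker profile than the average at-large team. This is a rough yardstick under the assumption that the Out[8] mean and standard deviation describe the field well; `z_scores` is a hypothetical helper.

```python
# Summary statistics of the 104 at-large RPIs, copied from Out[8].
AT_LARGE_MEAN = 67.913462
AT_LARGE_STD = 15.753015

# Best RPI of a non-champion Ivy team, from the list above (season: RPI).
IVY_BEST_RPI = {"13-14": 124, "12-13": 121, "11-12": 86, "09-10": 100}


def z_scores(rpis, mean=AT_LARGE_MEAN, std=AT_LARGE_STD):
    """Standard deviations between each RPI and the at-large mean."""
    return {season: (rpi - mean) / std for season, rpi in rpis.items()}


for season, z in z_scores(IVY_BEST_RPI).items():
    print(f"{season}: {z:+.2f}")
```

This prints +3.56, +3.37, +1.15 and +2.04. Even the best of the four (Princeton in 11-12) sits more than one standard deviation worse than the average at-large team, and two of them are at or beyond the worst at-large RPI in the sample (121).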

Alright, now I feel a little better about Yale or Harvard's chances this season.

Mid-Majors Need To Be (Slightly) Better:


In [13]:
at_large_df[at_large_df["Mid-Major"] == "Y"]["RPI"].describe()


Out[13]:
count     28.000000
mean      62.357143
std       17.299570
min       31.000000
25%       50.250000
50%       65.500000
75%       72.000000
max      109.000000
dtype: float64

In [14]:
ax = at_large_df[at_large_df["Mid-Major"] == "Y"]["RPI"].hist()  # mid-majors only
ax.set_xlabel("RPI")
ax.set_ylabel("Number of at-large bids")
plt.show()



In [15]:
at_large_df[at_large_df["Mid-Major"] == "N"]["RPI"].describe()


Out[15]:
count     76.000000
mean      69.960526
std       14.740819
min       36.000000
25%       59.000000
50%       69.000000
75%       79.000000
max      121.000000
dtype: float64

In [16]:
ax = at_large_df[at_large_df["Mid-Major"] == "N"]["RPI"].hist()  # non-mid-majors only
ax.set_xlabel("RPI")
ax.set_ylabel("Number of at-large bids")
plt.show()
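Comparing Out[13] and Out[15]: the average mid-major at-large team has an RPI about 7.6 spots better than everyone else (62.36 vs 69.96), but the medians differ by only 3.5 (65.5 vs 69) and the mid-major sample is small (28 bids), which is why "slightly" fits. A single groupby produces both summaries in one table. A minimal sketch assuming the same column names and Y/N flags; `rpi_by_level` and `mean_gap` are hypothetical helpers.

```python
import pandas as pd


def rpi_by_level(at_large: pd.DataFrame) -> pd.DataFrame:
    """describe() of at-large RPIs, one row per Mid-Major flag ("N" / "Y")."""
    return at_large.groupby("Mid-Major")["RPI"].describe()


def mean_gap(summary: pd.DataFrame) -> float:
    """RPI spots by which the average mid-major bid is better (lower)."""
    return summary.loc["N", "mean"] - summary.loc["Y", "mean"]


# Toy frame for illustration only (not the real NIT data).
toy = pd.DataFrame({"Mid-Major": ["Y", "Y", "N", "N"], "RPI": [50, 60, 70, 80]})
summary = rpi_by_level(toy)
print(summary[["count", "mean", "50%"]])
print(mean_gap(summary))
```

In the notebook, `rpi_by_level(at_large_df)` would replace the two separate `describe()` cells above.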


Bracket Distribution:

  • How many automatic bids are there typically in the NIT?
  • How many "mid-majors" play in the NIT each season?

In [17]:
berth_composition = nit_df.groupby(["Berth type", "Year"]).count()["Seed"].unstack().T
berth_composition


Out[17]:
Berth type  At-Large  Automatic
Year
2010              24          8
2011              18         14
2012              21         11
2013              22         10
2014              19         13
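`pd.crosstab` is a more direct way to build count tables like this one. Unlike `groupby(...).count().unstack()`, it counts rows rather than non-null Seed values and fills missing combinations with 0 instead of NaN; `margins=True` adds "All" totals. A minimal sketch; `composition` is a hypothetical helper and the rows are toy data.

```python
import pandas as pd


def composition(df: pd.DataFrame, column: str, margins: bool = False) -> pd.DataFrame:
    """Count teams per Year (rows) and per value of `column` (columns)."""
    return pd.crosstab(df["Year"], df[column], margins=margins)


# Toy rows for illustration only (not the real NIT data).
toy = pd.DataFrame({
    "Year": [2010, 2010, 2010, 2011, 2011],
    "Berth type": ["At-Large", "At-Large", "Automatic", "At-Large", "Automatic"],
})
print(composition(toy, "Berth type", margins=True))
```

On the real data, `composition(nit_df, "Berth type", margins=True)` should show an All column of 32 for every season, matching the table above, and `composition(nit_df, "Mid-Major")` gives the level breakdown in the next cell.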

In [18]:
round(berth_composition["Automatic"].mean(), 1)


Out[18]:
11.2

In [19]:
level_composition = nit_df.groupby(["Mid-Major", "Year"]).count()["Seed"].unstack().T
level_composition


Out[19]:
Mid-Major   N   Y
Year
2010       16  16
2011       14  18
2012       17  15
2013       16  16
2014       14  18

In [20]:
round(level_composition["Y"].mean(), 1)


Out[20]:
16.6

Each season has averaged 11.2 automatic berths. Mid-majors typically fill about half of the 32-team field (16.6 teams per season on average), but remember that this count includes nearly all of the automatic bids.
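The remark about automatic bids can be made exact from the outputs above. Out[19] has 83 mid-major berths over five seasons and Out[13] counts 28 of them as at-large; since Out[17] shows only two berth types, the other 55 must be automatic, out of 56 automatic bids in total. The arithmetic, using only numbers copied from those outputs:

```python
# Per-season counts copied from Out[17] (automatic bids) and Out[19] (mid-majors).
AUTOMATIC = {2010: 8, 2011: 14, 2012: 11, 2013: 10, 2014: 13}
MID_MAJOR = {2010: 16, 2011: 18, 2012: 15, 2013: 16, 2014: 18}
MID_MAJOR_AT_LARGE = 28  # count from Out[13]

total_automatic = sum(AUTOMATIC.values())
total_mid_major = sum(MID_MAJOR.values())
# Assumes every berth is either "At-Large" or "Automatic", as in Out[17].
mid_major_automatic = total_mid_major - MID_MAJOR_AT_LARGE
seasons = len(MID_MAJOR)

print(f"Mid-major automatic bids: {mid_major_automatic} of {total_automatic}")
print(f"Per season: {mid_major_automatic / seasons:.1f} automatic, "
      f"{MID_MAJOR_AT_LARGE / seasons:.1f} at-large")
```

So of the roughly 16.6 mid-major slots per season, about 11 come from automatic bids and about 5.6 from at-large bids. `pd.crosstab(nit_df["Mid-Major"], nit_df["Berth type"])` should produce the same split straight from the data.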

