The goal of this project is to analyze the relationship between the time a basketball player spends in college and his performance at the professional level. Before the 2005 collective bargaining agreement, sufficiently talented high schoolers could be drafted into the NBA without enrolling in college and without being a year removed from high school graduation. Since then, a growing number of players have spent one year in college before moving to the NBA. This project assesses whether a longer unpaid period of preparation leads to greater success in a player's professional career.
This project combines two data sources: the NBA drafts from 1989 through 2015 (the years requested by the scraping loop below) and All-NBA team selections dating back to the 1988-89 season. It only takes first-round draft selections into account, and it counts every All-NBA selection, so a player with multiple selections is counted once per selection.
In [35]:
import sys
import pandas as pd
import matplotlib.pyplot as plt
import datetime as dt
import numpy as np
import html5lib
from bs4 import BeautifulSoup
import seaborn as sns  # seaborn.apionly and pandas.io.data/wb no longer exist in current releases
from plotly.offline import iplot, iplot_mpl
import plotly.graph_objs as go
import plotly
import cufflinks as cf
cf.set_config_file(offline=True, offline_show_link=False)
%matplotlib inline
plotly.offline.init_notebook_mode()
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Plotly version: ', plotly.__version__)
print('Today: ', dt.date.today())
Players selected into All-NBA teams:
Using the pandas read_html function, which parses pages with lxml or with html5lib and BeautifulSoup, we read the tables from Basketball-Reference.com as they are displayed on the website. read_html returns a list of DataFrames, so we take the first one, rename its columns, and tidy up the variables.
In [4]:
# Code reading allnba team selections since the 1988-1989 season.
url = 'http://www.basketball-reference.com/awards/all_league.html'
allnba = pd.read_html(url)
allnba = allnba[0]
allnba.columns = ['Year','League','All-NBA Team','Player1','Player2','Player3','Player4','Player5']
allnba = allnba.drop(columns='League')
allnba = allnba.head(106)
allnba
Out[4]:
In [9]:
# Combine all 'Player' columns into one and remove extra information at the end of the names such as C,F, and G
allnbaM = pd.melt(allnba, id_vars=['All-NBA Team'],
                  value_vars=['Player1','Player2','Player3','Player4','Player5'],
                  value_name='Player')
# Raw string keeps \s intact; regex=True is required for pattern matching in pandas >= 2.0
allnbaM['Player'] = allnbaM['Player'].str.replace(r"(C|F|G)\s*$", "", regex=True)
allnbaM['Player'] = allnbaM['Player'].str.strip()
allnbaM = allnbaM.drop(columns='variable')
allnbaM.tail()
Out[9]:
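To make the reshaping step concrete, here is a minimal sketch on a toy table. The column layout mirrors the cell above, but the two-row frame is illustrative and not the scraped data. The pattern also differs slightly from the notebook's: it requires whitespace before the position letter, which is my assumption, so that a name ending in a capital C, F, or G would not be clipped.

```python
import pandas as pd

# Toy stand-in for the scraped All-NBA table (layout is illustrative).
toy = pd.DataFrame({
    'All-NBA Team': ['1st', '2nd'],
    'Player1': ['Tim Duncan F', 'Chris Paul G'],
    'Player2': ["Shaquille O'Neal C", 'Dirk Nowitzki F'],
})

# Wide -> long: one row per (team, player) pair.
long_form = pd.melt(toy, id_vars=['All-NBA Team'],
                    value_vars=['Player1', 'Player2'],
                    value_name='Player')

# Strip a trailing position letter (C, F, or G) that follows whitespace.
long_form['Player'] = (long_form['Player']
                       .str.replace(r'\s+[CFG]\s*$', '', regex=True)
                       .str.strip())

# melt adds a 'variable' column holding the old column names; it is not needed.
long_form = long_form.drop(columns='variable')
print(long_form)
```

Each player now sits on its own row next to the team he was named to, which is the shape the later merge expects.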
Players Drafted:
Having read the All-NBA teams from Basketball-Reference.com, we do the same with the draft data from RealGM.com. The data is spread over multiple pages, so we write a loop that reads each page, identified by the year at the end of its URL. Afterward we use the pd.concat function to combine the list of tables returned by the loop into a single DataFrame.
In [38]:
# Code for reading players drafted in every first round of the nba draft since 1989
draft = []
url1 = 'http://basketball.realgm.com/nba/draft/past_drafts/'
for number in range(27):
    # Walk backwards from the 2015 draft to the 1989 draft
    Year = str(2015 - number)
    # read_html returns every table on the page; the first one is kept
    thisdraft = pd.read_html(url1 + Year)[0]
    print('Number:', number)
    print('Type: ', type(thisdraft))
    print(thisdraft.head())
    draft.append(thisdraft)
In [10]:
# Code for placing all players drafted in the first round into a pandas dataframe
alldrafts = pd.concat(draft)
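alldrafts = alldrafts.drop(columns='Team')
alldrafts.drop(alldrafts.columns[[2]], axis=1, inplace=True)
# Rows whose Class cell holds a date of birth instead of a college class are labeled N/A
alldrafts.loc[alldrafts.Class.str.contains("DOB"), "Class"] = "N/A"
# regex=False treats "*" as a literal asterisk rather than a regex quantifier
alldrafts['Class'] = alldrafts['Class'].str.replace("*", "", regex=False)
alldrafts['Class'] = alldrafts['Class'].str.strip()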
alldrafts.tail(10)
Out[10]:
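The stack-then-clean pattern used for the draft pages can be sketched on toy data as follows. The two small frames stand in for per-year pages and their values are made up. Tagging each row with a `Year` column is my addition for illustration; the notebook itself does not record the draft year.

```python
import pandas as pd

# Two toy "pages" standing in for per-year draft tables (values are made up).
page_a = pd.DataFrame({'Pick': [1, 2],
                       'Player': ['Player A', 'Player B'],
                       'Class': ['Fr *', 'DOB: 01/01/1990']})
page_b = pd.DataFrame({'Pick': [1, 2],
                       'Player': ['Player C', 'Player D'],
                       'Class': ['Sr', 'Jr*']})
pages = {2015: page_a, 2014: page_b}

# Tag each page with its (hypothetical) year, then stack into one frame.
stacked = pd.concat(
    [page.assign(Year=year) for year, page in pages.items()],
    ignore_index=True,
)

# A Class cell holding a date of birth means no college class was listed.
is_dob = stacked['Class'].str.contains('DOB')
stacked.loc[is_dob, 'Class'] = 'N/A'

# Remove the literal asterisk, then trim leftover whitespace.
stacked['Class'] = stacked['Class'].str.replace('*', '', regex=False).str.strip()
print(stacked)
```

Passing `ignore_index=True` gives the stacked frame a fresh 0..n-1 index, which avoids repeated index labels when label-based assignment with `.loc` follows.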
In [30]:
# Get means of draft positions based on college class
print('Average draft position for freshmen in the first round is:',
      alldrafts[alldrafts['Class'].str.contains('Fr')]['Pick'].mean())
print('Average draft position for sophomores in the first round is:',
      alldrafts[alldrafts['Class'].str.contains('So')]['Pick'].mean())
print('Average draft position for juniors in the first round is:',
      alldrafts[alldrafts['Class'].str.contains('Jr')]['Pick'].mean())
print('Average draft position for seniors in the first round is:',
      alldrafts[alldrafts['Class'].str.contains('Sr')]['Pick'].mean())
print("Average draft position for those who didn't attend college in the first round is:",
      alldrafts[alldrafts['Class'].str.contains('N/A')]['Pick'].mean())
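The five per-class averages can also be computed in one pass with a grouped mean. The sketch below uses a made-up toy frame rather than the scraped drafts. Note that it groups on exact class labels, whereas the cell above matches substrings with `str.contains`, so the two approaches agree only if every `Class` value is exactly one of the five labels.

```python
import pandas as pd

# Toy picks (made up) with the same two columns the averages rely on.
picks = pd.DataFrame({
    'Class': ['Fr', 'Fr', 'So', 'Jr', 'Sr', 'Sr', 'N/A'],
    'Pick':  [1, 5, 10, 12, 20, 30, 8],
})

# An ordered categorical makes the result come out as Fr, So, Jr, Sr, N/A.
order = ['Fr', 'So', 'Jr', 'Sr', 'N/A']
picks['Class'] = pd.Categorical(picks['Class'], categories=order, ordered=True)

# One grouped mean replaces five separate filters.
mean_pick = picks.groupby('Class', observed=True)['Pick'].mean()
print(mean_pick)
```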
In [33]:
# Gives us the distribution of selections by class ordered by pick number
# But first we must reorder the classes by Fr, So, Jr, Sr, N/A
Classes = ['Fr', 'So', 'Jr', 'Sr', 'N/A']
mapping = {Class: i for i, Class in enumerate(Classes)}
key = alldrafts['Class'].map(mapping)
alldrafts_in_order = alldrafts.iloc[key.argsort()]
ax = sns.swarmplot(x="Class", y="Pick", data=alldrafts_in_order)
ax.set_title('Number of Draft Selections Per Class by Pick Number')
ax.set_ylim(0)
Out[33]:
In [34]:
#code for number of players drafted in each class => Seniors are most prevalent followed by juniors
clAss = ['Fr','So','Jr','Sr','N/A']
alldraftsC = alldrafts.copy()
grades = []
for x in clAss:
    grades.append(x)
    # 1 if the player's Class matches this label, else 0
    alldraftsC[x] = alldraftsC['Class'].str.contains(x)*1
classes = alldraftsC[grades]
classes_counts = classes.sum()
print(classes_counts)
fig, ax = plt.subplots()
classes_counts.plot(ax=ax, legend=False, kind = 'bar', color=['blue','green','red','turquoise','purple'])
ax.set_xlabel("Player's Class")
ax.set_ylabel('Number of Players Drafted')
ax.set_title('Players Drafted by College Class')
Out[34]:
Merge both data tables:
In [25]:
# Merge on Player (Add an 'All-NBA Team' column)
draftallnba = pd.merge(alldrafts, allnbaM,
how='left',
on='Player')
draftallnba.tail(30)
Out[25]:
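A small sketch of what the left merge does, using made-up player names: every drafted player is kept, a player with several All-NBA selections appears once per selection, and a player with none gets a missing value. The `indicator=True` flag is an optional extra I add here to show where each row came from; the notebook's merge does not use it.

```python
import pandas as pd

# Toy draft and All-NBA tables (names are placeholders).
drafts = pd.DataFrame({'Player': ['Player A', 'Player B', 'Player C'],
                       'Class': ['Fr', 'Sr', 'N/A']})
honors = pd.DataFrame({'Player': ['Player A', 'Player A', 'Player C'],
                       'All-NBA Team': ['1st', '2nd', '1st']})

# Left join: keep every drafted player, attach selections where they exist.
merged = pd.merge(drafts, honors, how='left', on='Player', indicator=True)
print(merged)
```

Because the join key is the player's name alone, two different players who share a name would be matched to the same selections, so the merged table is worth spot-checking.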
In [26]:
# Count the number of All-NBA selections by time spent in college
counts = draftallnba['All-NBA Team'].groupby([draftallnba['Class'], draftallnba['All-NBA Team']]).count()
counts = pd.DataFrame(counts)
counts.columns = ['Number of Selections']
counts = counts.unstack(level=0)
counts = counts['Number of Selections']
counts = counts[['Fr', 'So', 'Jr', 'Sr', 'N/A']]
countsT = counts.T
# Total selections per class, summed across the All-NBA teams
countsT['Total'] = countsT.sum(axis=1)
counts = countsT.transpose()
counts
Out[26]:
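As an alternative way to build the same kind of table, the sketch below cross-tabulates team against class and computes the totals from the data. It runs on a made-up toy frame, so the numbers are not the notebook's results, and `pd.crosstab` is a technique I am swapping in, not the method used in the cell above.

```python
import pandas as pd

# Toy merged table (made up); None marks a drafted player with no selection.
merged = pd.DataFrame({
    'Class': ['Fr', 'Fr', 'Sr', 'N/A', 'N/A', 'N/A', 'So'],
    'All-NBA Team': ['1st', '2nd', '1st', '1st', '1st', '3rd', None],
})

# Only rows with an actual selection should be counted.
selected = merged.dropna(subset=['All-NBA Team'])

# Rows: All-NBA team, columns: college class, cells: number of selections.
table = pd.crosstab(selected['All-NBA Team'], selected['Class'])

# Put the classes in a fixed order and show classes with no selections as 0.
order = ['Fr', 'So', 'Jr', 'Sr', 'N/A']
table = table.reindex(columns=order, fill_value=0)

# Append a Total row computed from the counts.
table.loc['Total'] = table.sum()
print(table)
```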
In [28]:
# Plot All-NBA selections by time spent in college
fig, ax = plt.subplots()
counts.plot(ax=ax, legend=True, kind = 'barh')
ax.set_xlabel('Number of All-NBA Selections')
ax.set_title('All-NBA Selections by College Class')
Out[28]:
Ultimately, the results show that freshmen are drafted with earlier first-round picks on average than any other class, closely followed by sophomores, while seniors are the most commonly drafted players in the first round. Nevertheless, it is the players who jumped straight to the NBA from high school, or who played a year or more abroad before joining the league, who have garnered the largest number of All-NBA honors.