Aditya Garg, Sachin Vishwanathan, Sami Siddiqui
Data Bootcamp
Fall 2016
Our project attempts to settle the long-running debate over whether Lebron James or Michael Jordan was the superior basketball player. We look at both traditional and advanced statistics: each player's Player Efficiency Rating and his Win Shares (offensive, defensive, and total). Below we describe these statistics in more detail, along with our efforts to account for the fact that Jordan and James played in different eras and therefore against very different competition. Based on our analysis, Michael Jordan appears to have been the better player, though the Herfindahl-Hirschman Index (HHI) of championships and our own qualitative research suggest that other factors may also be affecting our results.
Michael Jordan took basketball to heights it had never previously reached, especially on the global stage. Jordan was famously cut from his high school’s varsity basketball team his sophomore year but used the snub as motivation to become one of the greatest basketball players of all time. He played college basketball at UNC-Chapel Hill before being drafted in 1984 as the third overall pick by the Chicago Bulls. Jordan went on to become a 14-time All-Star, 5-time NBA regular season MVP, and 6-time NBA Champion and Finals MVP. His success did not come without interruptions: he missed almost his entire second season with a broken foot, and he retired for the first time in 1993 to play minor league baseball, sitting out the 1993-1994 season and most of 1994-1995. He “retired” again after the 1998 NBA Finals, only to come back for the 2001-2002 season, and retired for good after 2002-2003. He is often regarded as the Greatest of All Time because of his dominance on the court as well as his off-court ventures, but we needn’t get into the latter for the purpose of our project.
Lebron James was drafted straight out of St. Vincent-St. Mary High School in 2003 as the first overall pick by the Cleveland Cavaliers, his hometown team. Lebron led the Cavaliers to the NBA Finals in 2007 almost singlehandedly, where they fell to the San Antonio Spurs in four games. James failed to win a championship in his first stint with the Cavaliers and left for the Miami Heat in 2010, where he spent four years. He led the Heat to the NBA Finals in each of those four years, winning two championships and two Finals MVP awards along the way. He returned to Cleveland in 2014 and led the Cavaliers to another NBA Finals appearance in 2015, where they fell to the Golden State Warriors. This past season he led the Cavaliers to their first NBA title, capturing another Finals MVP award along the way. Though Lebron still has much of his career remaining, he has most certainly reached his peak, accumulating an impressive twelve (and counting) All-Star appearances, four regular season MVPs, and three Finals MVPs. Lebron has been widely regarded as the best basketball player since Jordan, and our project aims to answer whether he was actually better than Jordan, as many of his fans claim.
The data we looked at covers 13 seasons for each player. For Lebron James we used the 2003-04 through 2015-16 seasons, while for Michael Jordan we drew on the period from 1984 to 2003. It is important to note that we excluded the following seasons for Jordan: 1985-86, 1993-94, 1994-95, and 1998-99 through 2000-01, because Jordan was either injured or retired for all or most of each. To read the data, we first saved each player's tables as Excel files and then read them into Jupyter with pandas. The code we used to load the data is:
In [2]:
import sys # system module
import pandas as pd # data package
import matplotlib.pyplot as plt # graphics module
import datetime as dt # date and time module
import numpy as np # foundation for Pandas
import seaborn as sns # fancy matplotlib graphics (seaborn.apionly was removed in seaborn 0.9)
from pandas_datareader import wb, data as web # worldbank data
# plotly imports
from plotly.offline import iplot, iplot_mpl # plotting functions
import plotly.graph_objs as go
import plotly # just to print version and init notebook
import cufflinks as cf # gives us df.iplot that feels like df.plot
cf.set_config_file(offline=True, offline_show_link=False)
# these lines make our graphics show up in the notebook
%matplotlib inline
plotly.offline.init_notebook_mode(connected=True)
# check versions
print('Python version:', sys.version)
print('Pandas version: ', pd.__version__)
print('Plotly version: ', plotly.__version__)
print('Today: ', dt.date.today())
path = 'data/Jordan_Per_Game.xlsx'
df_J_I = pd.read_excel(path)
path_1 = 'data/Jordan_Advanced.xlsx'
df_J_A = pd.read_excel(path_1)
path_2 = 'data/Lebron_Per_Game.xlsx'
df_L_I = pd.read_excel(path_2)
path_3 = 'data/Lebron_Advanced_Stats.xlsx'
df_L_A = pd.read_excel(path_3)
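The season exclusions described above could also be applied in pandas rather than by hand. The following is a minimal sketch, assuming a table with a `Season` column holding strings such as '1985-86'. The column names and the values in `seasons` are illustrative and are not taken from the project's files.

```python
import pandas as pd

# Illustrative table with one row per season; the 'Season' and 'G' column
# names are assumptions, and the values are placeholders.
seasons = pd.DataFrame({
    'Season': ['1984-85', '1985-86', '1986-87', '1994-95', '1995-96'],
    'G': [82, 18, 82, 17, 82],
})

# Seasons to drop (here: the two partial seasons named in the text)
excluded = ['1985-86', '1994-95']

# Keep every row whose season is not excluded, then renumber rows 0..n-1
kept = seasons[~seasons['Season'].isin(excluded)].reset_index(drop=True)
print(kept)
```

Resetting the index matters here because the later cells look rows up by the labels 0 through 12, so the remaining seasons need consecutive labels.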
The data shown below are Michael Jordan's "traditional" career statistics for the seasons we consider in this project. These tables exclude most of the advanced analytics that we weigh more heavily, but we include them because all advanced statistics are built on these basic stats to some degree. Further below we dig deeper with the advanced statistics, such as Win Shares and VORP. We provide the same data for James, covering his thirteen seasons in the league so far.
In [3]:
df_J_I #Jordan's traditional statistics
Out[3]:
In [4]:
df_J_A #Jordan's advanced statistics
Out[4]:
In [5]:
df_L_I #James traditional statistics
Out[5]:
In [6]:
df_L_A #James advanced statistics
Out[6]:
The Player Efficiency Rating (PER) aims to provide a comprehensive measure of every contribution a player can make on a basketball court and to condense it into a single number. The version we compute is the simplified, fantasy-style score shown in the formula below.
Over the years, many fans have tried to create an impartial formula for comparing legends from different eras, but the most thoroughly tested may be the scoring system behind ESPN’s online competition for sports fans, Fantasy Basketball. The formula used to score each player per game combines individual statistics such as points scored, blocks, steals, assists, rebounds, turnovers, field goals, and free throws. Using the historical data we collected, we can build a similar equation to compare full careers, performance on different teams throughout a career, or starting seasons alone, among others.
Formula: Score = Points + Blocks + Steals + Assists + Rebounds - Turnovers + Field Goals Made - Field Goals Attempted + Free Throws Made - Free Throws Attempted
In [7]:
#Calculating PER using above formula for Michael Jordan
df_J_I['PER'] = (df_J_I['PTS'] + df_J_I['BLK'] + df_J_I['STL'] + df_J_I['AST'] + df_J_I['TRB'] + df_J_I['FG']
+ df_J_I['FT'] - df_J_I['TOV'] - df_J_I['FGA'] - df_J_I['FTA'])
df_J_I
Out[7]:
In [8]:
#Calculating PER using above formula for Lebron James
df_L_I['PER'] = (df_L_I['PTS'] + df_L_I['BLK'] + df_L_I['STL'] + df_L_I['AST'] + df_L_I['TRB']
+ df_L_I['FG'] + df_L_I['FT'] - df_L_I['TOV'] - df_L_I['FGA'] - df_L_I['FTA'])
df_L_I
Out[8]:
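The two cells above repeat the same expression for each player. As a sketch of one way to avoid the duplication, the formula can be wrapped in a small helper. The function name `fantasy_score` and the one-row `demo` table are hypothetical; only the column names come from the cells above.

```python
import pandas as pd

def fantasy_score(df):
    """Return the score from the formula above, one value per row (season)."""
    # Terms that add to the score
    positive = df[['PTS', 'BLK', 'STL', 'AST', 'TRB', 'FG', 'FT']].sum(axis=1)
    # Terms that subtract from the score
    negative = df[['TOV', 'FGA', 'FTA']].sum(axis=1)
    return positive - negative

# Made-up single-season line, for illustration only (not real data)
demo = pd.DataFrame({'PTS': [30.0], 'BLK': [1.0], 'STL': [2.0], 'AST': [6.0],
                     'TRB': [7.0], 'FG': [11.0], 'FT': [7.0],
                     'TOV': [3.0], 'FGA': [22.0], 'FTA': [8.0]})
demo['PER'] = fantasy_score(demo)
print(demo['PER'])
```

With a helper like this, the same call could be applied to `df_J_I` and `df_L_I`, which keeps the two players' scores on exactly the same formula.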
In [9]:
#finding average PER for Lebron
s = 0
c = 0
for i in range(13):
    s = s + df_L_I['PER'][i]
    c = c + 1
Lebron_avg_PER = s / c
#finding average PER for Jordan
s = 0
c = 0
for i in range(13):
    s = s + df_J_I['PER'][i]
    c = c + 1
Jordan_avg_PER = s / c
print("Jordan's Average PER is: ", round(Jordan_avg_PER,2))
print("Lebron's Average PER is: ", round(Lebron_avg_PER,2))
avg_PER = pd.DataFrame({'Name':['Lebron James', 'Michael Jordan'], 'Average PER':[Lebron_avg_PER, Jordan_avg_PER]})
avg_PER = avg_PER.set_index(['Name'])
fig, ax = plt.subplots()
plt.style.use('bmh')
avg_PER.plot(ax=ax, legend=False, kind = 'bar',color = ['blue','purple'], alpha = 0.65,rot = 0)
ax.set_xlabel("Players", fontsize = 14)
ax.set_ylabel('Average PER',fontsize = 14)
ax.set_title('Average PERs for Players', fontsize = 14)
ax.set_ylim(0,40)
Out[9]:
As the above information shows, Michael Jordan had the higher PER, indicating higher "efficiency" and better performance on average across the thirteen seasons.
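The averaging loops above can also be written with pandas' built-in `mean`. The sketch below uses a made-up `scores` table; the extra final row stands in for any non-season row a sheet might contain, which is an assumption about the files rather than something the notebook states.

```python
import pandas as pd

# Hypothetical per-season scores, for illustration only (not the project's data).
# The last row stands in for an extra non-season row, such as a career line.
scores = pd.DataFrame({'PER': [28.0, 30.5, 31.5, 99.0]})

n_seasons = 3  # the notebook uses 13

# Average over the first n_seasons rows only, mirroring range(n_seasons) in the loop
avg = scores['PER'].iloc[:n_seasons].mean()
print(round(avg, 2))
```

Slicing with `iloc[:n_seasons]` keeps the same row limit as the loop, so the result matches the loop as long as the table has its default 0, 1, 2, ... index.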
In [10]:
#List of Lebron PER
list_Lebron_PER = []
for i in range(13):
    list_Lebron_PER.append(df_L_I['PER'][i])
#List of Jordan PER
list_Jordan_PER = []
for i in range(13):
    list_Jordan_PER.append(df_J_I['PER'][i])
#x-axis values
list_Seasons = list(range(1, 14))
plt.plot(list_Seasons, list_Lebron_PER, label = "Lebron James")
plt.plot(list_Seasons, list_Jordan_PER, label = "Michael Jordan")
plt.xlabel("Season")
plt.ylabel("Player Efficiency Rating")
plt.title("Comparing Player Efficiency Ratings across Seasons")
plt.legend()
plt.show()
Even though Michael Jordan had the higher average PER, Lebron James has had the more stable rating across his thirteen measured seasons. This can be attributed to a variety of factors, most notably the fact that we exclude several of Jordan's seasons. Since Lebron was drafted out of high school, he began his NBA career about three years earlier in life than Jordan, who played three seasons of college basketball first. Additionally, Jordan's two "retirements" took him away from the NBA for years at a time, so he was significantly older at the end of the period we examine than Lebron currently is. Age, especially in the NBA, is an important factor to consider when analyzing individual statistics, particularly one as holistic as the Player Efficiency Rating.
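The claim that one rating is "more stable" can be made concrete with a measure of spread. The sketch below compares the standard deviation and the coefficient of variation (standard deviation divided by mean) for two hypothetical players. The numbers are made up for illustration and are not the notebook's results.

```python
import pandas as pd

# Made-up ratings for two hypothetical players, one column per player
ratings = pd.DataFrame({
    'Player A': [30.0, 31.0, 29.0, 30.0],
    'Player B': [36.0, 34.0, 30.0, 20.0],
})

# Mean and sample standard deviation per player
summary = pd.DataFrame({'mean': ratings.mean(), 'std': ratings.std()})

# Coefficient of variation: spread relative to the player's own average
summary['cv'] = summary['std'] / summary['mean']

# The player with the smallest standard deviation is the more stable one
more_stable = summary['std'].idxmin()
print(summary)
print("More stable:", more_stable)
```

In the notebook the two columns could be `df_L_I['PER']` and `df_J_I['PER']` restricted to the thirteen seasons. The coefficient of variation is included because two players with different averages are not directly comparable on raw spread alone.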
In [11]:
#Finding Lebron's average OWS
s = 0
c = 0
for i in range(13):
    s = s + df_L_A['OWS'][i]
    c = c + 1
Lebron_avg_OWS = s / c
#Finding Jordan's average OWS
d = 0
e = 0
for i in range(13):
    d = d + df_J_A['OWS'][i]
    e = e + 1
Jordan_avg_OWS = d / e
print("Jordan's Average OWS is: ", round(Jordan_avg_OWS, 2))
print("Lebron's Average OWS is: ", round(Lebron_avg_OWS, 2))
#column is labelled 'Average OWS' to match the statistic being plotted
avg_OWS = pd.DataFrame({'Name':['Lebron James', 'Michael Jordan'], 'Average OWS':[Lebron_avg_OWS, Jordan_avg_OWS]})
avg_OWS = avg_OWS.set_index(['Name'])
fig, ax = plt.subplots()
plt.style.use('bmh')
avg_OWS.plot(ax=ax, legend=False, kind = 'bar',color = ['blue','purple'], alpha = 0.65,rot = 0)
ax.set_xlabel("Players", fontsize = 14)
ax.set_ylabel('Average OWS',fontsize = 14)
ax.set_title('Average OWS for Players', fontsize = 14)
ax.set_ylim(0,15)
Out[11]:
This would appear to indicate Jordan was the superior offensive player.
In [12]:
#List of Lebron OWS
list_Lebron_OWS = []
for i in range(13):
    list_Lebron_OWS.append(df_L_A['OWS'][i])
#List of Jordan OWS
list_Jordan_OWS = []
for i in range(13):
    list_Jordan_OWS.append(df_J_A['OWS'][i])
#x-axis values
list_Seasons = [Season for Season in range(1, 14)]
plt.plot(list_Seasons, list_Lebron_OWS, label = "Lebron James")
plt.plot(list_Seasons, list_Jordan_OWS, label = "Michael Jordan")
plt.xlabel("Season")
plt.ylabel("Offensive Win Shares")
plt.title("Comparing Offensive Win Shares across Seasons")
plt.legend()
plt.show()
As with PER above, Jordan experiences a significant decline toward the tail end of his career for the reasons mentioned previously. There are stretches where Lebron temporarily has the higher OWS, especially in the more recent seasons; on average, however, Jordan appears to have been the superior offensive talent.
In [13]:
#Finding Lebron's average DWS
s = 0
c = 0
for i in range(13):
    s = s + df_L_A['DWS'][i]
    c = c + 1
Lebron_avg_DWS = s / c
#Finding Jordan's average DWS
g = 0
h = 0
for i in range(13):
    g = g + df_J_A['DWS'][i]
    h = h + 1
Jordan_avg_DWS = g / h
print("Jordan's Average DWS: ", round(Jordan_avg_DWS,2))
print("Lebron's Average DWS: ", round(Lebron_avg_DWS, 2))
#column is labelled 'Average DWS' to match the statistic being plotted
avg_DWS = pd.DataFrame({'Name':['Lebron James', 'Michael Jordan'], 'Average DWS':[Lebron_avg_DWS, Jordan_avg_DWS]})
avg_DWS = avg_DWS.set_index(['Name'])
fig, ax = plt.subplots()
plt.style.use('bmh')
avg_DWS.plot(ax=ax, legend=False, kind = 'bar',color = ['blue','purple'], alpha = 0.65,rot = 0)
ax.set_xlabel("Players", fontsize = 14)
ax.set_ylabel('Average DWS',fontsize = 14)
ax.set_title('Average DWS for Players', fontsize = 14)
ax.set_ylim(0,6)
Out[13]:
Again, Jordan is the superior player in this statistic. This was a little more of a surprise as Lebron is known as much for his defense as he is for anything while Jordan was more prominent for his offensive prowess. The statistics here show Jordan to be the better defensive player, though statistics can be misleading (more on this below).
In [14]:
#List of Lebron DWS
list_Lebron_DWS = []
for i in range(13):
    list_Lebron_DWS.append(df_L_A['DWS'][i])
#List of Jordan DWS
list_Jordan_DWS = []
for i in range(13):
    list_Jordan_DWS.append(df_J_A['DWS'][i])
#x-axis values
list_Seasons = [Season for Season in range(1, 14)]
plt.plot(list_Seasons, list_Lebron_DWS, label = "Lebron James")
plt.plot(list_Seasons, list_Jordan_DWS, label = "Michael Jordan")
plt.xlabel("Season")
plt.ylabel("Defensive Win Shares")
plt.title("Comparing Defensive Win Shares across Seasons")
plt.legend()
plt.show()
The DWS for both players appear to trend along the same lines, though again there is a sharp drop for Jordan towards the end of his career.
In [15]:
#Finding Lebron Average WS
s = 0
c = 0
for i in range(13):
    s = s + df_L_A['WS'][i]
    c = c + 1
Lebron_avg_WS = s / c
#Finding Jordan Average WS
s = 0
c = 0
for i in range(13):
    s = s + df_J_A['WS'][i]
    c = c + 1
Jordan_avg_WS = s / c
print("Jordan's average WS is: ", round(Jordan_avg_WS,2))
print("James's average WS is: ", round(Lebron_avg_WS,2))
#column is labelled 'Average WS' to match the statistic being plotted
avg_WS = pd.DataFrame({'Name':['Lebron James', 'Michael Jordan'], 'Average WS':[Lebron_avg_WS, Jordan_avg_WS]})
avg_WS = avg_WS.set_index(['Name'])
fig, ax = plt.subplots()
plt.style.use('bmh')
avg_WS.plot(ax=ax, legend=False, kind = 'bar',color = ['blue','purple'], alpha = 0.65,rot = 0)
ax.set_xlabel("Players", fontsize = 14)
ax.set_ylabel('Average WS',fontsize = 14)
ax.set_title('Average WS for Players', fontsize = 14)
ax.set_ylim(0,18)
Out[15]:
As expected given the OWS and DWS results, Jordan appears to be responsible for almost 1.5 more wins per season than Lebron on average, making him the superior player by this measure.
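The three Win Shares averages can also be gathered into a single comparison table instead of three separate cells. This is a minimal sketch with made-up tables for two hypothetical players; only the column names `OWS`, `DWS`, and `WS` come from the notebook.

```python
import pandas as pd

cols = ['OWS', 'DWS', 'WS']

# Made-up advanced-stat tables, for illustration only (not real data)
player_a = pd.DataFrame({'OWS': [10.0, 12.0], 'DWS': [4.0, 6.0], 'WS': [14.0, 18.0]})
player_b = pd.DataFrame({'OWS': [9.0, 11.0], 'DWS': [3.0, 5.0], 'WS': [12.0, 16.0]})

# One row per statistic, one column per player
averages = pd.DataFrame({
    'Player A': player_a[cols].mean(),
    'Player B': player_b[cols].mean(),
})

# Positive values mean Player A averaged more than Player B
averages['Difference'] = averages['Player A'] - averages['Player B']
print(averages)
```

A table like this puts the per-season gap for each statistic side by side, which makes a statement such as "almost 1.5 more wins" easy to read off directly.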
In [16]:
#List of Lebron WS
list_Lebron_WS = []
for i in range(13):
    list_Lebron_WS.append(df_L_A['WS'][i])
#List of Jordan WS
list_Jordan_WS = []
for i in range(13):
    list_Jordan_WS.append(df_J_A['WS'][i])
#x-axis values
list_Seasons = [Season for Season in range(1, 14)]
plt.plot(list_Seasons, list_Lebron_WS, label = "Lebron James")
plt.plot(list_Seasons, list_Jordan_WS, label = "Michael Jordan")
plt.xlabel("Season")
plt.ylabel("Win Shares")
plt.title("Comparing Win Shares across Seasons")
plt.legend()
plt.show()
The data confirm the other statistics we computed earlier. Looking at the trend over time, we again see that the two players track each other closely except at the end of Jordan's career, where there is a sharp drop.
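One rough way to gauge how much a late-career drop pulls down an average is to recompute it without the final seasons. The sketch below does this for a made-up series; the values and the choice of `k = 2` seasons are arbitrary assumptions for illustration, not part of our analysis.

```python
import pandas as pd

# Made-up season values with a late drop, for illustration only
ws = pd.Series([15.0, 18.0, 20.0, 19.0, 9.0, 3.0])

k = 2  # number of final seasons to set aside (an arbitrary choice)

mean_all = ws.mean()                  # average over every season
mean_without_tail = ws.iloc[:-k].mean()  # average with the last k seasons removed

print("All seasons:        ", round(mean_all, 2))
print("Without last", k, "     ", round(mean_without_tail, 2))
```

If the two averages differ substantially, the final seasons account for a large share of the gap, which is worth keeping in mind when comparing a finished career with one still in progress.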
In [17]:
path = 'data/NBA_Champions_Lebron.xlsx'
df_NBA_Champ_Lebron = pd.read_excel(path)
df_NBA_Champ_Lebron
Out[17]:
In [18]:
dict_Lebron = {}
for i in range(13):
    if df_NBA_Champ_Lebron['Champion'][i] in dict_Lebron.keys(): #check and see if in dictionary -- if so, add one to value
        dict_Lebron[df_NBA_Champ_Lebron['Champion'][i]] = 1 + dict_Lebron.get(df_NBA_Champ_Lebron['Champion'][i])
    else: #if not in dictionary, add in dictionary
        dict_Lebron[df_NBA_Champ_Lebron['Champion'][i]] = 1
total_Lebron = 0 #calculating HHI
for i in dict_Lebron.values():
    total_Lebron = total_Lebron + i*i
HHI_Lebron = total_Lebron / 13
print("The HHI during the years Lebron played is: ", round(HHI_Lebron, 2))
In [19]:
path = 'data/NBA_Champions_Jordan.xlsx'
df_NBA_Champ_Jordan = pd.read_excel(path)
df_NBA_Champ_Jordan
Out[19]:
In [20]:
dict_Jordan = {}
for i in range(13):
    if df_NBA_Champ_Jordan['Champion'][i] in dict_Jordan.keys(): #check and see if in dictionary -- if so, add one to value
        dict_Jordan[df_NBA_Champ_Jordan['Champion'][i]] = 1 + dict_Jordan.get(df_NBA_Champ_Jordan['Champion'][i])
    else: # add to dictionary if not in it already
        dict_Jordan[df_NBA_Champ_Jordan['Champion'][i]] = 1
total_Jordan = 0 #calculating HHI
for i in dict_Jordan.values():
    total_Jordan = total_Jordan + i*i
HHI_Jordan = total_Jordan / 13
print("The HHI during the years Jordan played is: ", round(HHI_Jordan, 2))
HHIs = pd.DataFrame({'Name':['Lebron James Era', 'Michael Jordan Era'], 'HHI':[HHI_Lebron, HHI_Jordan]})
HHIs = HHIs.set_index(['Name'])
fig, ax = plt.subplots()
plt.style.use('bmh')
HHIs.plot(ax=ax, legend=False, kind = 'bar',color = ['blue','purple'], alpha = 0.65,rot = 0)
ax.set_xlabel("Eras", fontsize = 14)
ax.set_ylabel('HHI',fontsize = 14)
ax.set_title('HHI Across Eras', fontsize = 14)
Out[20]:
The HHI during Jordan's years was much higher than during Lebron's, which suggests that talent was concentrated in a few select teams in Jordan's time. This would have allowed him to post better statistics, as many of the teams he faced were not nearly up to par. James, on the other hand, faces a much more even playing field, and thus it is harder for him to accumulate the same statistics. This leads us to believe that Lebron's statistics, and the narrative painted by the above analysis, may somewhat understate Lebron's abilities.
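For reference, the Herfindahl-Hirschman Index is conventionally defined as the sum of squared shares, where each team's share here is its fraction of the championships in the period. The cells above instead divide the sum of squared title counts by the number of seasons, which is the conventional value multiplied by that number of seasons, so comparisons between two eras of equal length are unaffected. The sketch below shows both versions on a made-up list of champions; the team names and the five-season length are hypothetical.

```python
import pandas as pd

# Made-up champions over five seasons, for illustration only
champions = pd.Series(['Team A', 'Team A', 'Team B', 'Team A', 'Team C'])

# Titles per team
counts = champions.value_counts()

# Conventional HHI: sum of squared shares, between 1/len(counts) and 1
shares = counts / len(champions)
hhi = (shares ** 2).sum()

# The notebook's version: sum of squared counts divided by the number of seasons
notebook_style = (counts ** 2).sum() / len(champions)

print("Conventional HHI:", round(hhi, 2))
print("Notebook-style value:", round(notebook_style, 2))
```

Using `value_counts` replaces the dictionary-building loop, and reporting shares keeps the index on the usual 0 to 1 scale, which would also allow comparing eras of different lengths.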