Russell Westbrook of the Oklahoma City Thunder has just finished a historic season, becoming only the second player in NBA history to average a triple double over an entire season. A triple double means reaching double figures in at least three of five statistical categories: points, assists, rebounds, steals, and blocks. It is most commonly achieved with points, rebounds, and assists. During the 2017 NBA regular season, Westbrook averaged 31.6 points, 10.4 assists, and 10.7 rebounds per game.
Oscar Robertson, who played for the Cincinnati Royals, is the only other player to have averaged a triple double over an entire regular season, and he did so 55 years earlier. During the 1962 NBA regular season, Robertson averaged 30.8 points, 11.4 assists, and 12.5 rebounds per game. Many thought no one would ever do it again.
My project compares the two seasons. In the 55 years between them, much has changed about the NBA and how basketball is played. I want to examine those differences through the two players' respective seasons to get a better understanding of who had the more impressive season.
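As a quick illustration of the triple-double definition given above, here is a minimal sketch. The helper name `is_triple_double` is made up for this example and is not part of the analysis below. It treats the season averages quoted above as if they were a single stat line, and the third stat line is hypothetical.

```python
def is_triple_double(points, rebounds, assists, steals=0, blocks=0):
    """Return True if at least three of the five categories reach double figures."""
    totals = [points, rebounds, assists, steals, blocks]
    return sum(1 for value in totals if value >= 10) >= 3


# Season averages quoted in the introduction, treated as one stat line each.
westbrook_2017 = is_triple_double(points=31.6, rebounds=10.7, assists=10.4)
robertson_1962 = is_triple_double(points=30.8, rebounds=12.5, assists=11.4)

# Hypothetical stat line with only two categories in double figures.
two_categories = is_triple_double(points=25, rebounds=11, assists=9)

print(westbrook_2017, robertson_1962, two_categories)
```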
In [46]:
# importing packages
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
In [47]:
# all data is obtained through basketball-reference.com
# http://www.basketball-reference.com/teams/OKC/2017.html
# http://www.basketball-reference.com/teams/CIN/1962.html
# http://www.basketball-reference.com/leagues/NBA_stats.html
In [48]:
# all 2017 okc thunder player per game stats
okc = pd.read_csv('/Users/rohanpatel/Downloads/Per_Game_OKC_2017.csv')
okc.head()
Out[48]:
In [49]:
# all 1962 cincinnati royals player per game stats
cin = pd.read_csv('/Users/rohanpatel/Downloads/Per_Game_CincRoy_1962.csv')
cin.head()
Out[49]:
In [50]:
# only russell westbrook's points, rebounds, assists, and minutes per game
RW = okc.loc[:0]
RW = RW[['PTS/G', 'TRB', 'AST', 'MP']]
RW = RW.rename(columns={'PTS/G': 'PTS'})
RW
Out[50]:
In [51]:
# only oscar robertson's points, rebounds, assists, and minutes per game
OR = cin.loc[:0]
OR = OR[['PTS', 'TRB', 'AST', 'MP']]
OR
Out[51]:
In [52]:
# robertson played considerably more minutes per game than westbrook
# scaling per game stats to a per 36 minutes basis
rw_min_factor = 36/RW['MP']
or_min_factor = 36/OR['MP']
In [53]:
RW[['PTS', 'TRB', 'AST']] = RW[['PTS', 'TRB', 'AST']].apply(lambda x: x*rw_min_factor)
RW_36 = RW[['PTS', 'TRB', 'AST']]
print(RW_36)
In [54]:
OR[['PTS', 'TRB', 'AST']] = OR[['PTS', 'TRB', 'AST']].apply(lambda x: x*or_min_factor)
OR_36 = OR[['PTS', 'TRB', 'AST']]
print(OR_36)
In [55]:
# difference between Westbrook and Robertson's per 36 minute stats
RW_36 - OR_36
Out[55]:
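To make the per-36-minute scaling in the cells above concrete, here is a small stand-alone sketch of the same arithmetic: each per-game stat is multiplied by 36 divided by minutes per game. The helper `per_36` and the stat line are hypothetical placeholders chosen for easy arithmetic. They are not the values read from the CSV files.

```python
def per_36(per_game_stats, minutes_per_game):
    """Scale per-game stats to a per-36-minutes basis."""
    factor = 36 / minutes_per_game
    return {name: value * factor for name, value in per_game_stats.items()}


# Hypothetical player: 30 points, 10 rebounds, 10 assists in 40 minutes per game.
example_per_game = {"PTS": 30.0, "TRB": 10.0, "AST": 10.0}
example_per_36 = per_36(example_per_game, minutes_per_game=40.0)

print(example_per_36)
```

A player who logs more than 36 minutes per game has every stat scaled down by the same factor (here 36/40), so the adjustment changes the level of the numbers but not the balance between categories.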
In [56]:
# 2017 NBA stats
df_2017 = pd.read_csv('/Users/rohanpatel/Downloads/2017_NBA_Stats.csv')
df_2017
# 2017 okc thunder stats
okc_2017 = df_2017.loc[9]
okc_2017
Out[56]:
In [57]:
# 1962 NBA stats
df_1962 = pd.read_csv('/Users/rohanpatel/Downloads/1962_NBA_Stats.csv')
df_1962
# 1962 cincinnati royals stats
cin_1962 = df_1962.loc[4]
cin_1962
Out[57]:
There is a noticeable difference in the 'Pace' stat between the 2017 Thunder and the 1962 Royals. Pace estimates how many possessions a team plays per 48 minutes, so a higher pace means more possessions per game. The 1962 Cincinnati Royals played about 125 possessions per game, while the 2017 Oklahoma City Thunder played about 98. The number of possessions in a game should affect players' stat totals: the more possessions a team plays, the more chances there are to accumulate points, rebounds, and assists. I am going to look at how league pace has changed over time and how well it correlates with points, rebounds, and assists per game, to decide whether Westbrook's and Robertson's stats should be adjusted for the number of possessions played.
In [58]:
# nba averages per game for every season
nba_avgs = pd.read_csv('/Users/rohanpatel/Downloads/NBA_Averages_Over_Time.csv')
nba_avgs = nba_avgs[['Pace', 'PTS', 'AST', 'TRB', 'FGA']]
# pace values after the 44th row are missing
nba_avgs = nba_avgs.iloc[:44]
print(nba_avgs)
In [59]:
# scatterplots of stats against number of possessions
fig, ax = plt.subplots(nrows = 4, ncols = 1, sharex = True, figsize=(10, 20))
ax[0].scatter(nba_avgs['Pace'], nba_avgs['PTS'], color = 'green')
ax[1].scatter(nba_avgs['Pace'], nba_avgs['TRB'], color = 'blue')
ax[2].scatter(nba_avgs['Pace'], nba_avgs['AST'], color = 'red')
ax[3].scatter(nba_avgs['Pace'], nba_avgs['FGA'], color = 'orange')
ax[0].set_ylabel('POINTS', fontsize = 18)
ax[1].set_ylabel('REBOUNDS', fontsize = 18)
ax[2].set_ylabel('ASSISTS', fontsize = 18)
ax[3].set_ylabel('SHOT ATTEMPTS', fontsize = 18)
ax[3].set_xlabel('NUMBER OF POSSESSIONS', fontsize = 18)
plt.suptitle('STAT TOTALS VS NUMBER OF POSSESSIONS (PER GAME)', fontsize = 22)
plt.show()
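The scatterplots show the relationship visually. A single number that summarizes how tightly two columns move together is the Pearson correlation coefficient. The sketch below computes it by hand on made-up values. The lists `pace` and `points` are hypothetical, not the league averages loaded above, and `pearson_r` is an illustrative helper.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical league-average values, for illustration only.
pace = [90.0, 100.0, 110.0, 120.0]
points = [95.0, 104.0, 111.0, 120.0]

r = pearson_r(pace, points)
print(round(r, 3))
```

On the real data, pandas offers the same calculation through `DataFrame.corr()`, which returns the pairwise correlations between all numeric columns at once.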
In [60]:
import statsmodels.api as sm
from pandas.plotting import scatter_matrix  # pandas.tools.plotting was removed in later pandas releases
y = np.matrix(nba_avgs['PTS']).transpose()
x1 = np.matrix(nba_avgs['Pace']).transpose()
X = sm.add_constant(x1)
model = sm.OLS(y,X)
f = model.fit()
print(f.summary())
In [61]:
y = np.matrix(nba_avgs['AST']).transpose()
x1 = np.matrix(nba_avgs['Pace']).transpose()
X = sm.add_constant(x1)
model = sm.OLS(y,X)
f = model.fit()
print(f.summary())
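With one predictor and a constant, ordinary least squares reduces to a closed-form slope and intercept. The sketch below computes them directly, together with R-squared, as a way to see what the regression summaries above are estimating. It is not the statsmodels implementation. The helper `fit_line` is illustrative and the data are made up to lie exactly on a line.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R-squared: share of the variance in y explained by the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical data lying exactly on the line y = 5 + 1.0 * x.
pace = [90.0, 100.0, 110.0, 120.0]
points = [95.0, 105.0, 115.0, 125.0]

slope, intercept, r_squared = fit_line(pace, points)
print(slope, intercept, r_squared)
```

A slope close to the ratio of the stat to pace, together with a high R-squared, is the kind of evidence that would support scaling stats in proportion to possessions.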
In [62]:
# adjusting both players' per 36 minute points, rebounds, and assists to per 100 team possessions
rw_pace_factor = 100/okc_2017['Pace']
or_pace_factor = 100/cin_1962['Pace']
In [63]:
RW_36_100 = RW_36.apply(lambda x: x*rw_pace_factor)
print(RW_36_100)
In [64]:
OR_36_100 = OR_36.apply(lambda x: x*or_pace_factor)
print(OR_36_100)
In [65]:
print(RW_36_100 - OR_36_100)
In [66]:
# westbrook's per 36 minute stats adjusted for 1962 Cincinnati Royals pace
RW_36_1962 = RW_36 * (cin_1962['Pace']/okc_2017['Pace'])
print(RW_36_1962)
In [67]:
# robertson's per 36 minute stats adjusted for 2017 OKC Thunder Pace
OR_36_2017 = OR_36 * (okc_2017['Pace']/cin_1962['Pace'])
print(OR_36_2017)
In [68]:
# difference between the two if westbrook played at 1962 robertson's pace per 36 minutes
print(RW_36_1962 - OR_36)
In [69]:
# difference between the two if robertson played at 2017 westbrook's pace per 36 minutes
print(RW_36 - OR_36_2017)
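The two pace adjustments used above are the same idea with different reference points: divide out the team's own pace, then multiply by a target pace (100 possessions, or the other team's pace). Here is a minimal sketch using the approximate paces mentioned earlier (about 125 for the 1962 Royals and about 98 for the 2017 Thunder). The 25.0 points per 36 minutes is a hypothetical input, and both helper functions are illustrative.

```python
def per_100_possessions(stat, team_pace):
    """Express a stat as if the team played exactly 100 possessions."""
    return stat * 100 / team_pace


def rescale_to_pace(stat, from_pace, to_pace):
    """Convert a stat accumulated at one pace to what it would be at another pace."""
    return stat * to_pace / from_pace


ROYALS_1962_PACE = 125.0   # approximate figure quoted in the text
THUNDER_2017_PACE = 98.0   # approximate figure quoted in the text

hypothetical_points_per_36 = 25.0  # made-up input, not a value from the CSVs

points_per_100 = per_100_possessions(hypothetical_points_per_36, ROYALS_1962_PACE)
points_at_2017_pace = rescale_to_pace(
    hypothetical_points_per_36, ROYALS_1962_PACE, THUNDER_2017_PACE
)

print(points_per_100, points_at_2017_pace)
```

Whichever reference pace is chosen, the ratio between two players' adjusted numbers stays the same. Only the absolute gap changes, which is why the differences printed for the 1962 pace and the 2017 pace are not identical.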
In [70]:
# huge advantages for westbrook after adjusting for possessions
PACE MATTERS. Pace almost never gets mentioned in basketball debates that compare players across eras; per-game statistics are what is mainly used to decide who is better. But as we saw, the number of possessions a player gets varies widely between teams and is a major factor in the totals he is able to accumulate. Pace has slowed steadily as the NBA has gotten older, apart from a slight surge in the past few years. In Robertson's time, a team playing 130 possessions per game was not unusual, but 130 possessions per game today is unheard of. It no longer seems like a coincidence that 1962 was also the season in which Wilt Chamberlain famously averaged 50 points per game and scored 100 points in a single game.
This goes to show how impressive Westbrook's 2017 season actually was. Many recognize its greatness because it is only the second time anyone has averaged a triple double for an entire season, but few realize how much better it was than Robertson's. A glance at the per-game statistics makes the two seasons look similar and might even give Robertson an edge. But once we adjust for pace and minutes played, Westbrook averaged about 13.5 more points, 3.25 more rebounds, and 3.65 more assists than Robertson per 36 minutes and 100 possessions. Robertson's adjusted averages of about 20 points, 8 rebounds, and 7.5 assists are roughly what you would expect from a regular All-Star today. Those numbers are still very good, but not what they seem from the box score alone. The typical NBA box score is limited because it ignores many of the factors that drive the accumulation of statistics, which is why per-game numbers can be very misleading.