NBA Top Scoring Trends

Josh Ottensoser, Ben Rapaport, Alexander Stadtmauer, Jacob Sternberg

12/21/16

Professor Lyon & Professor Coleman

Data Bootcamp - UG Fall 2016

December 2016

Abstract

As NBA fans, we have seen the eye test over recent years suggest a shift in player scoring, with an increased emphasis on the 3 Pointer. Stephen Curry is the poster child of this shift, having obliterated the record for most 3 Pointers made in a single season, and we wanted to find out whether the eye test was accurate (that Steph is in fact part of a greater trend) or whether he is just an outlier. Therefore, in our project, we aimed to uncover evidence of scoring trends as they relate to shot distance.

We accomplished this by retrieving data from basketball-reference.com and ESPN's database on the top 5 scorers of each season over the past 15 years (as far back as such data has been recorded). We used data on each season’s top 5 scorers rather than the league average for a few reasons: 1. Only individual player data, rather than league-wide data, is readily available for these metrics; 2. We see the top 5 scorers as proxies for scoring trends, since these are the players around whom offensive game planning revolves, and their attempts are indicative of what NBA offenses are trying to accomplish; and 3. Examining actual players rather than faceless averages lets us see the drivers behind the data, making it easy to spot potential outliers rather than forcing us to guess whether certain players are skewing the results.

After compiling the data and analyzing the basic points per game statistics, we delved deeper into the reasons behind the trends by organizing the data in various charts and graphs. The graphics are built on metrics such as points per 36 minutes, HHI (a Herfindahl-Hirschman-style concentration index), and shot distance, which together show where on the court the top NBA players are scoring from.

The results of our research can be found in the charts and explanations below. They portray the changes that the NBA game has undergone and what the Association might trend towards in the future; however, we understand that the league is always shifting and adapting to new strategies and changes so we acknowledge that the current trends may not prevail for long.



In [1]:

    
##imports
import sys                             # system module
import pandas as pd                    # data package
import matplotlib as mpl               # graphics package
import matplotlib.pyplot as plt         # pyplot module
%matplotlib inline                      
import datetime as dt                   # datetime module
import seaborn as sns                   # import seaborn module

##read the csv and save it as a dataframe
path = 'https://raw.githubusercontent.com/joshuaott3/DBProject/master/DBData.csv'
df= pd.read_csv(path)

##rename columns
df.columns=['Season','Name', 'Age','Team', 'League','Position','Games Played','Minutes Played', 'PPG','FGA','FG%','Average Shot Distance','%FGA 2P','%FGA 0-3','%FGA 3-10','%FGA 10-16','%FGA 16<3','%FGA 3P','FG% 2P','FG% 0-3','FG% 3-10','FG% 10-16','FG% 16<3','FG% 3P','%AST 2P','%FGA Dunks', 'Dunks Made','%ASTD 3P','%3PA Corner','3P% Corner', '3P Heaves Attempt','3P Heaves Made','OUT']

##drop column that isn't necessary
df = df.drop(columns='OUT')

##set season as the index
df = df.set_index('Season')

#drop the two rows that we do not need now (one row that is useless*, one redundant season row)
##*We got rid of this row by deleting the row that had the word 'Dunks' in the column 'Dunks Made' (the row we didn't want)
df=df.drop(['Season'])
df = df[df['Dunks Made'] != 'Dunks']

##Convert the appropriate columns from strings to floats
tofloats= ['Age','Games Played','Minutes Played', 'PPG','FGA','FG%','Average Shot Distance','%FGA 2P','%FGA 0-3','%FGA 3-10','%FGA 10-16','%FGA 16<3','%FGA 3P','FG% 2P','FG% 0-3','FG% 3-10','FG% 10-16','FG% 16<3','FG% 3P','%AST 2P','%FGA Dunks', 'Dunks Made','%ASTD 3P','%3PA Corner','3P% Corner']
for i in tofloats:
    df[i] = df[i].astype(float)
    
##Create variables that will be necessary for proper weighting
df['FGA 2P']=df['%FGA 2P'] * df['FGA']
df['FGA 0-3']=df['%FGA 0-3'] * df['FGA']
df['FGA 3-10']=df['%FGA 3-10'] * df['FGA']
df['FGA 10-16']=df['%FGA 10-16'] * df['FGA']
df['FGA 16 <3']=df['%FGA 16<3'] * df['FGA']
df['FGA 3P']=df['%FGA 3P'] * df['FGA']

##Creates Minute per Game variable
df['MPG'] = df['Minutes Played']/df['Games Played']

##Creates Points per Minute variable
df['PPM']= (df['PPG']*df['Games Played'])/df['Minutes Played']

##Creates Points per 36 Minutes variable
df['PP36']=df['PPM']*36

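##getting the average for each year, will be useful for PPG and MPG
##(numeric_only=True skips text columns such as Name and Team, which newer pandas versions no longer drop automatically)
mean_years=df.groupby(df.index).mean(numeric_only=True)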









    



/Users/sglyon/anaconda3/lib/python3.5/site-packages/matplotlib/__init__.py:878: UserWarning: axes.color_cycle is deprecated and replaced with axes.prop_cycle; please use the latter.
  warnings.warn(self.msg_depr % (key, alt_key))



In [2]:

    
##Checking to make sure the dataframe is there how we like it
df.head(5)









    Out[2]:






  
    
      
Season   Name              Age   Team  League  Position  Games Played  Minutes Played  PPG   FGA     FG%    ...  3P Heaves Made  FGA 2P    FGA 0-3  FGA 3-10  FGA 10-16  FGA 16 <3  FGA 3P   MPG        PPM       PP36
2015-16  Stephen Curry     27.0  GSW   NBA     PG        79.0          2700.0          30.1  1598.0  0.504  ...  2\              712.708   359.550  134.232   76.704     140.624    885.292  34.177215  0.880704  31.705333
2015-16  James Harden      26.0  HOU   NBA     SG        82.0          3125.0          29.0  1617.0  0.439  ...  0\              960.498   397.782  200.508   169.785    194.040    656.502  38.109756  0.760960  27.394560
2015-16  LeBron James      31.0  CLE   NBA     SF        76.0          2709.0          25.3  1416.0  0.520  ...  0\              1134.216  649.944  168.504   131.688    182.664    281.784  35.644737  0.709782  25.552159
2015-16  Kevin Durant      27.0  OKC   NBA     SF        72.0          2578.0          28.2  1381.0  0.505  ...  0\              900.412   292.772  135.338   251.342    222.341    480.588  35.805556  0.787587  28.353142
2015-16  DeMarcus Cousins  25.0  SAC   NBA     C         65.0          2246.0          26.9  1332.0  0.451  ...  0\              1121.544  479.520  380.952   46.620     214.452    210.456  34.553846  0.778495  28.025824
    
  

5 rows × 40 columns



In [3]:

    
##Making sure all of the data is in the format we need 
##*There will be no need for 3-Point Heave data so we did not change it
df.dtypes









    Out[3]:





Name                      object
Age                      float64
Team                      object
League                    object
Position                  object
Games Played             float64
Minutes Played           float64
PPG                      float64
FGA                      float64
FG%                      float64
Average Shot Distance    float64
%FGA 2P                  float64
%FGA 0-3                 float64
%FGA 3-10                float64
%FGA 10-16               float64
%FGA 16<3                float64
%FGA 3P                  float64
FG% 2P                   float64
FG% 0-3                  float64
FG% 3-10                 float64
FG% 10-16                float64
FG% 16<3                 float64
FG% 3P                   float64
%AST 2P                  float64
%FGA Dunks               float64
Dunks Made               float64
%ASTD 3P                 float64
%3PA Corner              float64
3P% Corner               float64
3P Heaves Attempt         object
3P Heaves Made            object
FGA 2P                   float64
FGA 0-3                  float64
FGA 3-10                 float64
FGA 10-16                float64
FGA 16 <3                float64
FGA 3P                   float64
MPG                      float64
PPM                      float64
PP36                     float64
dtype: object



In [4]:

    
##Making sure the mean dataframe is there and how we'd like it
mean_years.tail(5)









    Out[4]:






  
    
      
Season   Age   Games Played  Minutes Played  PPG    FGA     FG%     Average Shot Distance  %FGA 2P  %FGA 0-3  %FGA 3-10  ...  3P% Corner  FGA 2P     FGA 0-3   FGA 3-10  FGA 10-16  FGA 16 <3  FGA 3P    MPG        PPM       PP36
2011-12  25.8  61.4          2316.0          26.52  1225.4  0.4724  12.88                  0.7944   0.2600    0.1536     ...  0.3720      973.9336   312.0770  188.5060  197.4066   276.2368   251.4664  37.778565  0.701616  25.258163
2012-13  27.4  76.0          2895.2          27.36  1441.6  0.4850  13.38                  0.7364   0.2748    0.1338     ...  0.3518      1062.4448  391.7522  192.3648  192.6238   285.9748   379.1552  38.060730  0.719194  25.891001
2013-14  26.4  77.0          2916.0          27.60  1462.0  0.4870  13.90                  0.6948   0.2678    0.1280     ...  0.4252      1021.9350  383.7124  186.4180  197.5046   254.3000   440.0650  37.864913  0.728605  26.229768
2014-15  25.2  68.8          2448.8          25.86  1297.0  0.4712  11.10                  0.8286   0.3522    0.1530     ...  0.2238      1056.3142  452.0788  190.1542  160.4626   253.1496   240.6858  35.502540  0.728963  26.242660
2015-16  27.2  74.8          2671.6          27.90  1468.8  0.4838  13.04                  0.6670   0.3004    0.1422     ...  0.3408      965.8756   435.9136  203.9068  135.2278   190.8242   502.9244  35.658222  0.783506  28.206204
    
  

5 rows × 34 columns

PPG

We first seek to examine whether there has been a trend in scoring patterns over the course of our dataset. To do so, we take the average of each season and graph points statistics over time. The following graph shows the points per game trend for the top 5 scorers over our dataset.



In [5]:

    
##set seaborn
sns.set()

##create subplot
fig, ax = plt.subplots() 

##plot out the PPG from mean years
mean_years['PPG'].plot(ax=ax,legend=None,color='teal', linewidth=5,ls='dashdot')

##title it, place the legend and decide the style
plt.title('PPG by Year', color='Navy',fontsize='18', fontweight='bold')

#sets label titles and style
ax.set_xlabel('NBA Season', fontsize='14', fontweight='bold')
ax.set_ylabel('Points Per Game', fontsize='14', fontweight='bold')

##gets labels
locs, labels = plt.xticks()

##sets rotation of the x labels
plt.setp(labels, rotation=90)









    









    Out[5]:





[None, None, None, None, None, None, None, None, None]

PPG to PP36

From our PPG graph above, we could not draw a conclusion one way or the other as to what trend, if any, has occurred over the past 15 years. Therefore, we chose to delve deeper and see whether minutes played per game have played a role as well.



In [6]:

    
##set seaborn
sns.set()

##create subplot
fig, ax = plt.subplots()   

##plot points per 36 minutes from mean_years
mean_years['PP36'].plot(ax=ax,legend=None,color='green', linewidth=5, ls=':')

#title it
plt.title('Points per 36 Minutes Over the Years', color='Navy',fontsize='14', fontweight='bold')

##sets label titles and style
ax.set_xlabel('NBA Season', fontsize='14', fontweight='bold')
ax.set_ylabel('Points Per 36 Minutes', fontsize='14', fontweight='bold')


##gets labels
locs, labels = plt.xticks()

##sets rotation of the x labels
plt.setp(labels, rotation=90)









    Out[6]:





[None, None, None, None, None, None, None, None, None]

Here, we see that while points per game on a raw basis decreased, points per 36 minutes among the top scorers increased. This indicates to us that there is in fact a trend towards increased scoring efficiency. Now that we have established a trend of increased scoring efficiency, we seek to discover what is driving the increase in efficiency.
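To make the PPG versus PP36 distinction concrete, here is a minimal sketch in plain Python that mirrors the PPM and PP36 columns built in the first cell. The helper name `points_per_36` is ours, introduced only for illustration; the inputs are the 2015-16 values for Stephen Curry and James Harden shown in the dataframe preview above.

```python
def points_per_36(ppg, games_played, minutes_played):
    """Return points per 36 minutes, computed from season-level figures."""
    total_points = ppg * games_played                   # season points implied by PPG
    points_per_minute = total_points / minutes_played   # same idea as the PPM column
    return points_per_minute * 36                       # same idea as the PP36 column

# 2015-16 values from the dataframe preview
curry_pp36 = points_per_36(ppg=30.1, games_played=79, minutes_played=2700)
harden_pp36 = points_per_36(ppg=29.0, games_played=82, minutes_played=3125)

print(round(curry_pp36, 2), round(harden_pp36, 2))
```

The two players have nearly identical implied season point totals (about 2,378 each), but Harden needed 425 more minutes to get there, so his PP36 comes out noticeably lower. That gap is exactly what raw PPG hides.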

Calculate HHI

We calculate HHI to determine whether concentration of shot type (by location) is a factor in increased scoring efficiency.

HHI is a measure of how concentrated a distribution is. With our formula, the maximum HHI is 1. If a player takes 100 shots and they are all in the 3-10 foot range, his HHI will be calculated as [(100 3-10 shots)/(100 total shots)]^2 = 1. Had this shooter taken 50 shots in the 3-10 foot range and 50 shots in the 10-16 foot range, his HHI would be calculated as [(50 3-10 shots)/(100 total shots)]^2 + [(50 10-16 shots)/(100 total shots)]^2 = .5.
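The two worked cases above can be checked with a short sketch. The helper `hhi` is a hypothetical name used only here, and it divides by the sum of the zone attempts, whereas the notebook cell below divides by the FGA column; the two agree up to rounding in the source data.

```python
def hhi(attempts_by_zone):
    """Concentration index: the sum of squared shot shares across zones."""
    total = sum(attempts_by_zone)
    return sum((attempts / total) ** 2 for attempts in attempts_by_zone)

# Zone order: 0-3 ft, 3-10 ft, 10-16 ft, 16 ft to the 3-point line, 3-pointers
all_in_one_zone = hhi([0, 100, 0, 0, 0])    # 1.0, the maximum
split_two_zones = hhi([0, 50, 50, 0, 0])    # 0.5
spread_evenly = hhi([20, 20, 20, 20, 20])   # about 0.2, the minimum for five zones

# 2015-16 zone totals for the top 5 scorers, taken from the sum_years output
top5_2015_16 = hhi([2179.568, 1019.534, 676.139, 954.121, 2514.622])  # about 0.25

print(all_in_one_zone, split_two_zones, round(spread_evenly, 3), round(top5_2015_16, 3))
```

Because five zones can never produce an HHI below 0.2, the yearly values in this project should be read against a floor of 0.2 rather than 0.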



In [6]:

    
##create sum_years table, will be of use for everything else
##(numeric_only=True keeps text columns such as Name out of the totals)
sum_years=df.groupby(df.index).sum(numeric_only=True)

##create HHI column
sum_years['HHI']=0

##create list that will be vital for HHI calculation
HHI_list=['FGA 0-3','FGA 3-10', 'FGA 10-16', 'FGA 16 <3', 'FGA 3P']

##Create a for loop to get the HHI for each year
for i in HHI_list:
    sum_years['HHI']+=(sum_years[i]/sum_years['FGA'])**2
    
##create list that will be vital to determine shot distribution
FGA_list=['%FGA 0-3','%FGA 3-10','%FGA 10-16','%FGA 16<3','%FGA 3P']

##for loop that gives the percentage of all shots taken from each location
##(before, each column was a sum of the players' individual percentages, which made it greater than 1;
## we want each location's share of all of the players' shots combined)
for pct_col, attempts_col in zip(FGA_list, HHI_list):
    sum_years[pct_col]=sum_years[attempts_col]/sum_years['FGA']
    
##create MoreyBall column that will be sum of % of shots that are 0-3 feet and 3P
sum_years['MoreyBall']=sum_years['%FGA 0-3']+sum_years['%FGA 3P']



In [7]:

    
##Making sure the sum dataframe is there and how we'd like it
sum_years.tail(5)









    Out[7]:






  
    
      
Season   Age    Games Played  Minutes Played  PPG    FGA     FG%    Average Shot Distance  %FGA 2P  %FGA 0-3  %FGA 3-10  ...  FGA 0-3   FGA 3-10  FGA 10-16  FGA 16 <3  FGA 3P    MPG         PPM       PP36        HHI       MoreyBall
2011-12  129.0  307.0         11580.0         132.6  6127.0  2.362  64.4                   3.972    0.254674  0.153832   ...  1560.385  942.530   987.033    1381.184   1257.332  188.892827  3.508078  126.290814  0.207403  0.459885
2012-13  137.0  380.0         14476.0         136.8  7208.0  2.425  66.9                   3.682    0.271748  0.133438   ...  1958.761  961.824   963.119    1429.874   1895.776  190.303648  3.595972  129.455003  0.218033  0.534758
2013-14  132.0  385.0         14580.0         138.0  7310.0  2.435  69.5                   3.474    0.262457  0.127509   ...  1918.562  932.090   987.523    1271.500   2200.325  189.324566  3.643023  131.148842  0.224249  0.563459
2014-15  126.0  344.0         12244.0         129.3  6485.0  2.356  55.5                   4.143    0.348557  0.146611   ...  2260.394  950.771   802.313    1265.748   1203.429  177.512698  3.644814  131.213299  0.230825  0.534128
2015-16  136.0  374.0         13358.0         139.5  7344.0  2.419  65.2                   3.335    0.296782  0.138825   ...  2179.568  1019.534  676.139    954.121    2514.622  178.291110  3.917528  141.031018  0.249948  0.639187
    
  

5 rows × 36 columns



In [8]:

    
##set seaborn
sns.set()

##create subplot
fig, ax = plt.subplots()    

##plot out the HHI from sum_years
sum_years['HHI'].plot(ax=ax,legend=False,color='Red', linewidth=5,linestyle='--')

##title it and format
plt.title('HHI by Year', color='Navy',fontsize='16',fontweight='bold')

##Title y and x label and format
ax.set_ylabel('HHI Level (max of 1)',fontsize='14',fontweight='bold')
ax.set_xlabel('NBA Season',fontsize='14',fontweight='bold')

#gets labels
locs, labels = plt.xticks()

#sets rotation of the x labels
plt.setp(labels, rotation=90)









    









    Out[8]:





[None, None, None, None, None, None, None, None, None]

As can be seen in the graph above, there has been a recent upward trend in HHI. This signifies that shots are becoming more concentrated in fewer locations; however, we do not yet know which locations those are. It is clear players are choosing to shoot from the same locations more often, but where from?

Test 1: Average Shot Distance

Having established a recent upward trend in HHI, we would like to more closely examine which shot distance is being favored and is therefore leading to an increase in shot concentration. Our first test that we have chosen to conduct is to chart the average shot distance for the top 5 scorers over the years. Perhaps looking at the trend regarding what distance these players are shooting from will help us understand what changes have been made.



In [9]:

    
##set seaborn
sns.set()

##create subplot
fig, ax = plt.subplots()   

##plot Average Shot Distance from mean_years
mean_years['Average Shot Distance'].plot(ax=ax,legend=None,color='purple', linewidth=5, linestyle = 'solid')

##title it and format
plt.title('Average Shot Distance by Year', color='Navy',fontsize='16', fontweight='bold')

##title x and y label and format
ax.set_ylabel('Average Shot Distance',fontsize='14', fontweight='bold')
ax.set_xlabel('NBA Season',fontsize='14', fontweight='bold')

##gets labels
locs, labels = plt.xticks()

##sets rotation of the x labels
plt.setp(labels, rotation=90)









    









    Out[9]:





[None, None, None, None, None, None, None, None, None]

Test 2: Shot Distribution

We had hypothesized that a shot concentration increase would be a result of a rise in the share of 3 Pointers attempted, and that the average shot distance would therefore rise considerably. However, the data shows that the story is more complicated than that. Since the 2011-2012 season, the average shot distance has hardly risen. This means that if 3 Pointers attempted have increased, close-range shots may have also increased to counterbalance the effect.

We will further examine the breakdown of shots attempted by distance range to see why exactly the average shot distance has not changed despite our perception of an increase in 3 Pointers.



In [10]:

    
##Set seaborn
sns.set()

##create a subplot
fig, ax = plt.subplots()  

##plot the %FGA from each distance (with separate linewidths for 16<3 and 3P)
sum_years[['%FGA 0-3','%FGA 3-10','%FGA 10-16']].plot(ax=ax,ls='-.')
sum_years[['%FGA 16<3', '%FGA 3P']].plot(ax=ax,linewidth=5)


##title it and format
plt.title('Shot Distribution by Year', color='Maroon',fontsize='16',fontweight='bold')

#title x and y label and format
ax.set_ylabel('% of Total Shots [.30 = 30%]',fontsize='14',fontweight='bold')
ax.set_xlabel('NBA Season',fontsize='14',fontweight='bold')

##place legend
plt.legend(loc='center left', bbox_to_anchor=(1, 0.5))

##gets labels
locs, labels = plt.xticks()

##sets rotation of the x labels
plt.setp(labels, rotation=90)









    









    Out[10]:





[None, None, None, None, None, None, None, None, None]

Looking closely at this graph, we can see that 3 Pointers and FGA 0-3 have both been on the rise, with these two shot locations dominating the top scorers' attempts in 2015-2016 (a combined ~64% of total shots).

Additionally, this graph helps explain why the average shot distance has not changed much. The shot type that has declined the most is FGA 16 < 3. Because the effect we are seeing essentially replaces FGA 16 < 3 with 3 Pointers on a percentage basis, and because these two shot types are close in distance, the average distance does not change much (especially when coupled with the increase in FGA 0-3).
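The counterbalancing argument can be illustrated with a rough sketch. The zone attempt totals come from the sum_years output, but the representative distance assigned to each zone is purely our assumption (roughly the midpoint of each range), so the resulting averages are illustrative and will not match the Average Shot Distance column.

```python
# Hypothetical representative distance (feet) for each zone; these are
# assumptions for illustration, not values from the dataset.
ZONE_DISTANCE_FT = {'0-3': 1.5, '3-10': 6.5, '10-16': 13.0, '16<3': 19.5, '3P': 25.0}

# Zone attempt totals for the top 5 scorers, from the sum_years output
ATTEMPTS = {
    '2011-12': {'0-3': 1560.385, '3-10': 942.530, '10-16': 987.033,
                '16<3': 1381.184, '3P': 1257.332},
    '2015-16': {'0-3': 2179.568, '3-10': 1019.534, '10-16': 676.139,
                '16<3': 954.121, '3P': 2514.622},
}

def zone_share(attempts, zone):
    """Share of all attempts taken from one zone."""
    return attempts[zone] / sum(attempts.values())

def approx_avg_distance(attempts):
    """Attempt-weighted average of the assumed zone distances."""
    total = sum(attempts.values())
    return sum(ZONE_DISTANCE_FT[zone] * n for zone, n in attempts.items()) / total

for season, attempts in ATTEMPTS.items():
    print(season,
          '3P share:', round(zone_share(attempts, '3P'), 3),
          'approx. avg distance (ft):', round(approx_avg_distance(attempts), 1))
```

Under these assumed distances, the 3-point share rises by well over ten percentage points between the two seasons while the approximate average distance moves by less than a foot, because long twos give way to 3 Pointers and shots at the rim increase at the same time.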

We'd like to crystallize this trend toward layups and 3 Pointers in a visual that focuses on the prominence of these two shot types.

Test 3: MoreyBall

In this graph, we examine the proportion of the top 5 scorers' total field goal attempts that came from within 0-3 ft. of the hoop or from beyond the 3 point line. 'Moreyball' gets its name from Houston Rockets General Manager Daryl Morey, who popularized the theory that long-range two-pointers are the least efficient shots in the game and therefore pushes his team to shoot only from 0-3 ft or from 3-point range (~70% of their shots in recent years).

This graph shows a generally positive trend in Moreyball going back to 2000, but especially since 2011-2012, right around where we saw HHI take off.
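As a sketch of what the MoreyBall column measures, the snippet below recomputes it in plain Python for the five most recent seasons, using the zone totals shown in the sum_years output. The function name `moreyball_share` and the `SEASON_TOTALS` dictionary are ours, introduced only for this illustration.

```python
# season: (FGA 0-3, FGA 3P, total FGA) for the top 5 scorers, from sum_years
SEASON_TOTALS = {
    '2011-12': (1560.385, 1257.332, 6127.0),
    '2012-13': (1958.761, 1895.776, 7208.0),
    '2013-14': (1918.562, 2200.325, 7310.0),
    '2014-15': (2260.394, 1203.429, 6485.0),
    '2015-16': (2179.568, 2514.622, 7344.0),
}

def moreyball_share(fga_0_3, fga_3p, fga_total):
    """Share of attempts taken at the rim (0-3 ft) or from 3-point range."""
    return (fga_0_3 + fga_3p) / fga_total

moreyball_by_season = {
    season: moreyball_share(*totals) for season, totals in SEASON_TOTALS.items()
}

for season, share in moreyball_by_season.items():
    print(season, round(share, 3))
```

The shares rise from about 46% in 2011-12 to about 64% in 2015-16, with a small dip in 2014-15.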



In [11]:

    
##set seaborn
sns.set()

##create subplot
fig, ax = plt.subplots()   

##plot Moreyball from sum_years
sum_years['MoreyBall'].plot(ax=ax,legend=False, linewidth=5)

##title it
plt.title('Growth of MoreyBall', color='Blue', fontsize='16',fontweight='bold')

##title x and y label
ax.set_ylabel('% of FGA [.50 = 50%]',fontsize='14',fontweight='bold')
ax.set_xlabel('NBA Season',fontsize='14',fontweight='bold')

#gets labels
locs, labels = plt.xticks()

#sets rotation of the x labels
plt.setp(labels, rotation=90)









    









    Out[11]:





[None, None, None, None, None, None, None, None, None]

To look further into this trend and how much the game has changed, we now show two pie charts that compare shot location in the 2000-2001 season with shot location in the 2015-2016 season. Anyone who watches basketball knows that they are watching a different game today than they were 15 years ago, but how different has it become? The results below tell a striking tale.



In [12]:

    
##Set seaborn
sns.set()

##create subplots
fig, ax = plt.subplots(2)

##create an explode list so we can separate 0-3 and 3P from the pie chart
explode = (0.1, 0, 0, 0,0.1)

##create a dataframe that only has the data from 2000-2001
twothousand=mean_years.loc['2000-01']

##only keep the data we want here (using the HHI list we made earlier)
twothousand=twothousand[HHI_list]

##plot it as a pie chart on the top plot (and set a startangle that we prefer, and display percentage breakdown)
twothousand.plot(ax=ax[0],kind='pie',legend=None,autopct='%1.1f%%',explode=explode, startangle=180)

##supertitle it
fig.suptitle('Total Distributions: 2000-01 and 2015-16 Seasons', fontsize='16', fontweight='bold')

##create a dataframe that only has the data from 2015-2016
twothousandfifteen=mean_years.loc['2015-16']

##only keep the data we want here (using the HHI list we made earlier)
twothousandfifteen=twothousandfifteen[HHI_list]

##plot it as a pie chart (and set a startangle that we prefer)
twothousandfifteen.plot(ax=ax[1],kind='pie',autopct='%1.1f%%',explode=explode,startangle=180)









    









    Out[12]:





<matplotlib.axes._subplots.AxesSubplot at 0x11add5f98>

As you can clearly see, over this time frame, the percentage of 3 Pointers attempted has more than doubled and FGA 16 < 3 has more than halved. Other 2-pointers - FGA 10-16 and FGA 3-10 - have also decreased while FGA 0-3 has risen and has continued to be a critical component of the game.

We now know that while HHI and Moreyball have been on a fairly steady upward trend since 2011-2012, the years of the new millennium before that season appear to have been far more volatile. This may mean that the trend has not been around as long as this side-by-side analysis might lead one to believe.

What is clear, however, is that the game of basketball has changed dramatically over the years, and the league is clearly beginning to catch on to something.

Conclusion

Ultimately, after analyzing the results from the PPG, PP36, HHI, shot distance, shot distribution, and specialized shot distribution, we have come to the conclusion that while it has taken the league some time to figure out, the NBA has ultimately seen a clear shift away from shots in the midrange, a doubling of 3 Pointers attempted, and an increase in layups / dunks as well. This seems to strengthen Daryl Morey's argument.

Our findings from the HHI make it clear that fewer shot types are responsible for most of the total shots among the top scorers in the league. Our pie charts and distance graphs indicate that this comes from a spike in 3 Pointers at the expense of long 2 Pointers, along with an increase in layups / dunks. This comes together to strengthen our original hypothesis that the league truly is searching for an edge in efficiency with its shot selection. Top scorers are playing fewer minutes and scoring just as many points as in the past.

It is important to point out that our data set is limited, and one way of making this report more robust would be to increase the number of data points. Taking the top 10 or 15 scorers per season, rather than the top 5, would likely yield different results, and we wonder how different our conclusion would be. Although we found a trend in our project, our analysis could have been stronger had we retrieved more data points. For example, in the last season, Stephen Curry's immense 3-point tendency had a significant pull on the season’s data, which played a large role in our project. Increasing our data would reduce the dependence on a single player's statistics.
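One simple way to gauge how much a single player pulls on a five-player sample is a leave-one-out comparison. The sketch below is not part of our original analysis; it uses the 2015-16 FGA and FGA 3P values from the dataframe preview, and the names `PLAYERS_2015_16` and `three_point_share` are hypothetical helpers for this illustration.

```python
# name: (total FGA, 3-point attempts) for 2015-16, from the dataframe preview
PLAYERS_2015_16 = {
    'Stephen Curry': (1598.0, 885.292),
    'James Harden': (1617.0, 656.502),
    'LeBron James': (1416.0, 281.784),
    'Kevin Durant': (1381.0, 480.588),
    'DeMarcus Cousins': (1332.0, 210.456),
}

def three_point_share(players):
    """Share of the group's combined attempts that were 3-pointers."""
    total_fga = sum(fga for fga, _ in players.values())
    total_3pa = sum(three_pa for _, three_pa in players.values())
    return total_3pa / total_fga

share_all = three_point_share(PLAYERS_2015_16)

# Drop Curry and recompute to see how far the group figure moves
without_curry = {name: vals for name, vals in PLAYERS_2015_16.items()
                 if name != 'Stephen Curry'}
share_without_curry = three_point_share(without_curry)

print(round(share_all, 3), round(share_without_curry, 3))
```

Removing Curry lowers the group's 3-point share from roughly 34% to roughly 28%, which shows how sensitive a top-5 sample is to one player.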

So Moreyball is on the rise, but has it proven to be a winning strategy on a team level? As a preliminary step towards where our analysis may lead researchers next, the data seems to show that since Moreyball has been on the rise, the top scorers with the lowest Moreyball scores are increasingly showing up on worse teams:

-In 2015-2016, DeMarcus Cousins scored lowest on the Moreyball scale. His team did in fact have the fewest wins of the five, and he was the only one of the season's top 5 scorers whose team failed to make the playoffs.

-In 2014-2015, Cousins, Anthony Davis and Russell Westbrook scored the lowest and their teams also had the three fewest wins of the five.

-In 2013-2014, Carmelo Anthony's Moreyball score was the lowest and his team's wins were the lowest.

-In 2012-2013, the data is scattered.

-In 2011-2012, the lower Moreyball scores actually appear on teams with more wins, with the highest Moreyball score of the season, Kevin Love's, appearing on the team with by far the fewest wins.
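The notebook does not show how the per-player Moreyball scores behind these bullets were computed. One plausible reading, assumed here, is each player's combined share of 0-3 ft and 3-point attempts. A minimal sketch for 2015-16, using the values from the dataframe preview (the dictionary and variable names are ours):

```python
# name: (FGA 0-3, FGA 3P, total FGA) for 2015-16, from the dataframe preview
PLAYERS_2015_16 = {
    'Stephen Curry': (359.550, 885.292, 1598.0),
    'James Harden': (397.782, 656.502, 1617.0),
    'LeBron James': (649.944, 281.784, 1416.0),
    'Kevin Durant': (292.772, 480.588, 1381.0),
    'DeMarcus Cousins': (479.520, 210.456, 1332.0),
}

# Assumed definition: share of a player's attempts from 0-3 ft or 3-point range
moreyball_scores = {
    name: (at_rim + threes) / total
    for name, (at_rim, threes, total) in PLAYERS_2015_16.items()
}

# Rank players from lowest to highest Moreyball score
ranked = sorted(moreyball_scores, key=moreyball_scores.get)

for name in ranked:
    print(name, round(moreyball_scores[name], 3))
```

Under this assumed definition, Cousins ranks lowest and Curry highest for 2015-16, consistent with the first bullet above. Team win totals are not in our dataset, so relating the ranking to team success would require merging in an outside source.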

We are ultimately left with the question of whether Moreyball is peaking or whether the league will catch on and reverse the trend outlined above. As technology and advanced statistics increasingly permeate the Association, we predict that this current flourishing of exciting and smart basketball will continue in the near term and evolve to account for counter-strategies and newly found evidence in the long term. We look forward to seeing where the leading basketball minds of our time take the game next.


