By: Nathan Ho
In sports, a 'clutch' play is typically one that occurs at a key moment, when the outcome of the game is in question. This could be a Hail Mary pass in football, an 11th-inning double play in baseball, or a save on a penalty kick in soccer.
In basketball, however, where the game often revolves around a select few superstars, 'clutch'ness describes the player more than the play. A 'clutch' player is one who thrives in high-pressure situations and becomes a more versatile and efficient player when the game is on the line. 'Clutch' plays are seen as the handiwork of 'clutch' players.
It's easy for most basketball fans and sports commentators to talk about 'clutch' qualitatively. Conversations about the strength of a player's mentality can get very opinionated, since mentality itself is hard to quantify. In this project, we will explore the quantitative side of 'clutch' by analyzing the career statistics of two players widely regarded as 'clutch': Kobe Bryant and Michael Jordan. Comparisons have often been drawn between the two, and it will be interesting to take a statistical approach to evaluating them.
We will specifically be looking at their career per-game averages, the consistency of those averages, the percent change in key statistics from the regular season to the playoffs (which we will call the 'clutch factor'), and their scoring efficiency in terms of points per minute.
We will begin by importing the relevant libraries:
In [1]:
#importing data organization and graphics packages
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import scipy.stats as stats
import datetime as dt
import sys
import seaborn as sb
import warnings
print('Today is', dt.date.today())
print('What version of Python are we running? \n', sys.version, sep='')
warnings.filterwarnings('ignore')
The data for this project comes from Savvastj's nbashots module, which accesses the stats.nba.com API through specialized methods.
To install, please run on your Terminal/Command Prompt:
pip install nbashots
We will be importing this package for the project:
In [3]:
#NBA module
import nbashots as nba
The module organizes player statistics by Player IDs. Here we will take a brief look at Kobe's ID and game statistics from the most recent season, 2015-16:
In [4]:
kobe_id = nba.get_player_id("Bryant, Kobe")[0]
kobe_id
Out[4]:
In [5]:
kobe_gamelogs = nba.PlayerLog(kobe_id)
kobe_df = kobe_gamelogs.get_game_logs()
kobe_df.head()
Out[5]:
In [6]:
print(kobe_df.dtypes)
print()
print(kobe_df.shape)
Thankfully, the data returned is mostly in int and float form, which makes it easier to analyze. The dataframe shape shows that Kobe played 66 games during the 2015-16 regular season.
Now we take a brief look at Jordan's ID and statistics:
In [7]:
jordan_id = nba.get_player_id("Jordan, Michael")[0]
jordan_id
Out[7]:
In [8]:
jordan_gamelogs = nba.PlayerLog(jordan_id)
jordan_df = jordan_gamelogs.get_game_logs()
jordan_df
Out[8]:
The lack of data is due to the PlayerLog method defaulting to the 2015-16 season. Michael Jordan, who retired in 2003, obviously did not play then.
According to the module creator, Savvastj, the PlayerLog method has four parameters: player_id, league_id (default '00'), Season (default '2015-16'), and Season_Type (default 'Regular Season').
This can be corrected by changing the Season parameter to the 1995-96 season instead. 1995-96 is a very famous season for the Chicago Bulls, when Jordan's Bulls earned 72 wins and 10 losses, becoming the only team in NBA history to win over 70 games and the NBA title in the same season.
In [9]:
jordan_gamelogs = nba.PlayerLog(jordan_id, '00', '1995-96', 'Regular Season')
jordan_df = jordan_gamelogs.get_game_logs()
jordan_df.head()
Out[9]:
By changing the Season_Type parameter to 'Playoffs', we can pull Jordan's playoff games instead.
In [10]:
jordan_gamelogs1 = nba.PlayerLog(jordan_id, '00', '1995-96', 'Playoffs')
jordan_df1 = jordan_gamelogs1.get_game_logs()
jordan_df1.head()
Out[10]:
We can do some simple analysis on the data we pull by applying some pandas methods.
According to the shape and mean of the jordan_df dataframe, Jordan played 82 games, the full season, in the 1995-96 regular season, and averaged 30 points, 4 assists, 2 steals, and almost 7 rebounds per game:
In [11]:
print(jordan_df.shape)
In [12]:
jordan_df.mean()
Out[12]:
Next, we assign Kobe's active seasons to a variable:
In [13]:
k_active_seasons = ['1996-97','1997-98','1998-99','1999-00','2000-01','2001-02','2002-03','2003-04','2004-05','2005-06','2006-07','2007-08','2008-09','2009-10','2010-11','2011-12','2012-13','2013-14','2014-15','2015-16']
Then we loop through the seasons to aggregate all of his career games during the regular season:
In [14]:
def total_reg(player_id, active_seasons):
    # Aggregate a player's regular-season game logs across all given seasons.
    a = pd.DataFrame()
    for i in active_seasons:
        gamelogs = nba.PlayerLog(player_id, '00', i, 'Regular Season')
        df = gamelogs.get_game_logs()
        a = a.append(df, ignore_index = True)
    return a
In [15]:
agg_kobe_reg = total_reg(kobe_id, k_active_seasons)
agg_kobe_reg.shape
Out[15]:
Since any season in which no games were played simply returns an empty DataFrame, we can apply the same loop to Kobe's playoff games:
In [16]:
def total_po(player_id, active_seasons):
    # Aggregate a player's playoff game logs across all given seasons.
    a = pd.DataFrame()
    for i in active_seasons:
        gamelogs = nba.PlayerLog(player_id, '00', i, 'Playoffs')
        df = gamelogs.get_game_logs()
        a = a.append(df, ignore_index = True)
    return a
In [17]:
agg_kobe_po = total_po(kobe_id, k_active_seasons)
agg_kobe_po.shape
Out[17]:
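Incidentally, total_reg and total_po differ only in the Season_Type string they pass to nba.PlayerLog. A single helper could cover both cases; the sketch below is built on the same nbashots calls used above, and the season_type parameter name is our own:

# Sketch: aggregate game logs for a list of seasons, for either
# 'Regular Season' or 'Playoffs'.
def total_games(player_id, active_seasons, season_type='Regular Season'):
    frames = []
    for season in active_seasons:
        gamelogs = nba.PlayerLog(player_id, '00', season, season_type)
        frames.append(gamelogs.get_game_logs())
    return pd.concat(frames, ignore_index=True)

With this helper, agg_kobe_reg would be total_games(kobe_id, k_active_seasons) and agg_kobe_po would be total_games(kobe_id, k_active_seasons, 'Playoffs').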
We assign all of Kobe's game statistics, regular season and playoffs combined, to the variable agg_kobe.
In [18]:
agg_kobe = agg_kobe_reg.append(agg_kobe_po)
agg_kobe.shape
Out[18]:
By assigning Jordan's active seasons to a variable, we can aggregate Jordan's regular season games and playoff games as well:
In [19]:
j_active_seasons = ['1984-85','1985-86','1986-87','1987-88','1988-89', '1989-90','1990-91','1991-92','1992-93','1993-94','1994-95','1995-96','1996-97','1997-98','2001-02','2002-03']
In [20]:
agg_jordan_reg = total_reg(jordan_id, j_active_seasons)
agg_jordan_reg.shape
Out[20]:
In [21]:
agg_jordan_po = total_po(jordan_id, j_active_seasons)
agg_jordan_po.shape
Out[21]:
In [22]:
agg_jordan = agg_jordan_reg.append(agg_jordan_po, ignore_index = True)
agg_jordan.shape
Out[22]:
We can compute the career per-game averages for Kobe and Jordan as follows:
In [23]:
agg_kobe.mean()
Out[23]:
Going through the agg_jordan dataframe, I've noticed that there are 'None' entries where the value should be 0. We will use the .fillna(0) method to fix that:
In [24]:
agg_jordan.head()
Out[24]:
In [25]:
agg_jordan.fillna(0).mean()
Out[25]:
In [26]:
agg_disparity = agg_kobe.mean() - agg_jordan.fillna(0).mean()
agg_disparity
Out[26]:
According to the data, Jordan contributed more on a per-game basis than Kobe Bryant: he averaged around 5 more points, 1 more rebound, 0.62 more assists, and 0.71 more steals per game over his career.
For players of the calibre of Jordan and Kobe, such a difference is relatively insignificant. The disparity is likely due to Kobe playing alongside Hall of Fame center Shaquille O'Neal for a significant portion of his career, and thus having to share playing time and the ball more often than Jordan did.
We then look at how their per-game statistics are distributed over their careers. Focusing on these distributions is a good way to gauge how consistently each player performed on the court. We will look specifically at points scored, assists, rebounds, and steals, as those are typically the key stats for evaluating the guard position.
We can plot normalized histograms of their career game stats to see if there are any significant differences:
In [27]:
fig, ax = plt.subplots(2,4)
fig.set_size_inches(9.5, 5.5)
fig.suptitle('Statistic Distribution Comparison: Kobe vs. MJ ')
agg_kobe['PTS'].plot(ax = ax[0,0],
kind='hist',
bins = 50,
sharex= True,
title = 'Kobe Pt Dist.',
fontsize=8,
normed=True)
agg_jordan['PTS'].plot(ax = ax[1,0],
kind='hist',
bins = 50,
sharex = True,
sharey = True,
title = 'MJ Pt Dist.',
fontsize=8,
normed=True)
agg_kobe['AST'].plot(ax = ax[0,1],
kind='hist',
bins = 50,
sharey = True,
title = 'Kobe Assist Dist.',
fontsize=8,
normed=True)
agg_jordan['AST'].plot(ax = ax[1,1],
kind='hist',
bins = 50,
sharex= True,
sharey=True,
title = 'MJ Assist Dist.',
fontsize=8,
normed=True)
agg_kobe['REB'].plot(ax = ax[0,2],
kind='hist',
bins = 50,
sharey = True,
title = 'Kobe Reb Dist.',
fontsize=8,
normed=True)
agg_jordan['REB'].plot(ax = ax[1,2],
kind='hist',
bins = 50,
sharex= True,
sharey=True,
title = 'MJ Reb Dist.',
fontsize=8,
normed=True)
agg_kobe['STL'].plot(ax = ax[0,3],
kind='hist',
bins = 50,
sharey = True,
title = 'Kobe Stls Dist.',
fontsize=8,
normed=True)
agg_jordan['STL'].plot(ax = ax[1,3],
kind='hist',
bins = 50,
sharex= True,
sharey=True,
title = 'MJ Stls Dist.',
fontsize=8,
normed=True)
Out[27]:
Aside from Jordan's points distribution having a higher mean than Kobe's, and Jordan's career rebound distribution being somewhat tighter, we do not see any significant differences.
We then take a more numeric approach. Through the .std() method, we can look at the standard deviations of Kobe's and Jordan's career statistics to measure how much each player's per-game numbers fluctuate:
In [28]:
print(agg_kobe.std())
print()
print(agg_jordan.std())
In [29]:
print(agg_kobe.std()-agg_jordan.fillna(0).std())
We see that Jordan was more consistent in his scoring throughout his career than Kobe, with a standard deviation of 9.69 points compared to Kobe's 10.60, and slightly more consistent in assists, with a standard deviation of 2.76 for Jordan versus Kobe's 2.79. However, Kobe shows stronger career consistency in rebounds and steals.
These differences are not significant enough to draw any conclusions for comparing the two players.
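One caveat: because the two players' per-game means differ, raw standard deviations are not directly comparable. A scale-free alternative is the coefficient of variation (standard deviation divided by mean); the sketch below uses the aggregated dataframes already built and is only meant as a rough check:

# Coefficient of variation (std / mean) for the key guard stats.
key_stats = ['PTS', 'AST', 'REB', 'STL']
kobe_cv = agg_kobe[key_stats].std() / agg_kobe[key_stats].mean()
jordan_cv = agg_jordan[key_stats].fillna(0).std() / agg_jordan[key_stats].fillna(0).mean()
print(pd.DataFrame({'Kobe CV': kobe_cv, 'Jordan CV': jordan_cv}))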
We move on to evaluate how playing in the playoffs specifically affects each player's performance statistics, or their respective 'clutch'. We will call the percent difference between regular-season and postseason performance statistics the 'clutch factor'.
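Concretely, for each statistic the 'clutch factor' is 100 * (playoff mean - regular-season mean) / regular-season mean. A minimal sketch of that computation, assuming two aggregated game-log dataframes like the ones built above (the function name here is our own):

# Sketch: percent change in each per-game average from the regular
# season to the playoffs.
def clutch_pct(reg_df, po_df):
    reg_mean = reg_df.mean()
    po_mean = po_df.mean()
    return 100 * (po_mean - reg_mean) / reg_mean

The cells below compute the same quantity step by step so the intermediate dataframes can be inspected.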
First, we compile Kobe's statistical averages for the regular season and the postseason into one dataframe:
In [30]:
kobe_rmean_df = pd.DataFrame(agg_kobe_reg.mean())
kobe_rmean_df.columns = ['kobe_reg_mean']
kobe_rmean_df = kobe_rmean_df.reset_index()
kobe_rmean_df.tail()
Out[30]:
In [31]:
kobe_pmean_df = pd.DataFrame(agg_kobe_po.mean())
kobe_pmean_df.columns = ['kobe_po_mean']
kobe_pmean_df = kobe_pmean_df.reset_index()
kobe_pmean_df.tail()
Out[31]:
In [32]:
kobe_clutch = pd.merge(kobe_rmean_df, kobe_pmean_df, how = 'right')
kobe_clutch.tail()
Out[32]:
Then we add a new column showing the percent improvement from the regular season to the playoffs:
In [33]:
kobe_clutch['k_clutch_factor'] = 100*(kobe_clutch['kobe_po_mean'] - kobe_clutch['kobe_reg_mean'])/kobe_clutch['kobe_reg_mean']
kobe_clutch.tail()
Out[33]:
We apply the same to Jordan's data:
In [34]:
jordan_rmean_df = pd.DataFrame(agg_jordan_reg.fillna(0).mean())
jordan_rmean_df.columns = ['jordan_reg_mean']
jordan_rmean_df = jordan_rmean_df.reset_index()
jordan_pmean_df = pd.DataFrame(agg_jordan_po.mean())
jordan_pmean_df.columns = ['jordan_po_mean']
jordan_pmean_df = jordan_pmean_df.reset_index()
jordan_clutch = pd.merge(jordan_rmean_df, jordan_pmean_df, how = 'right')
jordan_clutch['j_clutch_factor'] = 100*(jordan_clutch['jordan_po_mean'] - jordan_clutch['jordan_reg_mean'])/jordan_clutch['jordan_reg_mean']
jordan_clutch.tail()
Out[34]:
We can then merge the two dataframes and drop all rows unrelated to points scored, assists, rebounds, and steals.
In [35]:
clutch_factor = pd.merge(kobe_clutch, jordan_clutch)
clutch_factor=clutch_factor.set_index('index')
clutch_factor.shape
Out[35]:
In [36]:
clutch_factor_t = clutch_factor.drop(clutch_factor.index[0:12])
clutch_factor_tt = clutch_factor_t.drop(clutch_factor_t.index[3:6])
clutch_factor_tt = clutch_factor_tt.drop(clutch_factor_tt.index[4])
clutch_factor_f = clutch_factor_tt
In [37]:
clutch_factor_f
Out[37]:
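As an aside, the positional drops above depend on the exact row order returned by the API. Assuming the merged index contains the stat names used earlier (PTS, REB, AST, STL), a label-based selection would be a more robust way to build the same table; this is a hypothetical alternative, not what was run above:

# Hypothetical label-based alternative to the positional drops.
clutch_factor_alt = clutch_factor.loc[['PTS', 'REB', 'AST', 'STL']]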
To make the differences in 'clutch factor' easier to compare, we can plot them for both players:
In [38]:
fig, ax = plt.subplots()
clutch_factor_f['k_clutch_factor'].plot(kind = 'line',
color = 'blue',
legend = True)
clutch_factor_f['j_clutch_factor'].plot(kind = 'line',
color = 'red',
legend = True)
clutch_factor_f['k_clutch_factor'].plot(kind = 'bar',
color = 'blue',
alpha = .5,
legend = True)
clutch_factor_f['j_clutch_factor'].plot(kind = 'bar',
color = 'red',
alpha = .5,
legend = True)
ax.set_ylabel('Clutch Factor%')
ax.set_xlabel('Statistic')
ax.set_title('Kobe vs. Jordan: Clutch Comparison')
Out[38]:
The difference between the two players is more obvious in this figure. In every statistic other than steals, Michael Jordan's numbers improve come playoff time. Although Kobe does perform better in the postseason in terms of assists and points scored, he rebounds less and makes fewer steals. Jordan's improvements in assists and points are also much larger than Kobe's.
We will now extend the definition of 'clutch' to cover efficiency improvements during the playoffs as well. Both Kobe and Jordan are shooting guards (SG), a role built heavily around the points they contribute to the game. To understand their scoring efficiency, we can look at points per minute: we will regress points scored on minutes played and compare the slopes of the regressions between the regular season and the postseason for the two players.
We will use NumPy's .polyfit method to fit a least-squares regression to the data. Some of Michael Jordan's games are missing minutes data, so we drop those data points for this regression.
In [39]:
fig, ax = plt.subplots(2)
fig.suptitle('OLS Regression Points by Minute', horizontalalignment= 'right', verticalalignment = 'bottom', size = 12)
agg_jordan_m = agg_jordan.drop(agg_jordan.index[agg_jordan['MIN']==0.0],axis = 0) #dropping missing or broken data
k_coeff = np.polyfit(x = agg_kobe['MIN'], y = agg_kobe['PTS'], deg =1)
k_ffit = np.poly1d(k_coeff)
k_linreg = k_ffit(agg_kobe['MIN'])
k_linreg = pd.DataFrame(k_linreg)
k_linreg.columns = ['VAL']
j_coeff = np.polyfit(x = agg_jordan_m['MIN'], y = agg_jordan_m['PTS'], deg =1)
j_ffit = np.poly1d(j_coeff)
j_linreg = j_ffit(agg_jordan_m['MIN'])
j_linreg = pd.DataFrame(j_linreg)
j_linreg.columns = ['VAL']
k_linreg.plot(ax=ax[0],
kind = 'line',
x = agg_kobe['MIN'],
legend = False,
color = 'red')
j_linreg.plot(ax=ax[1],
kind = 'line',
x = agg_jordan_m['MIN'],
legend = False,
color = 'red')
agg_kobe.plot(ax=ax[0],
kind = 'scatter',
x = 'MIN',
y = 'PTS',
color = 'blue',
alpha = 0.5,
title = 'Kobe Bryant',
sharex = True,
ylim = (0,85))
agg_jordan_m.plot(ax=ax[1],
kind = 'scatter',
x = 'MIN',
y = 'PTS',
color = 'blue',
alpha = 0.5,
title = 'Michael Jordan',
ylim = (0,85))
ax[0].set_xlim(0,60)
ax[1].set_xlim(0,60)
Out[39]:
According to the slope of the ordinary least-squares regression, the career points per minute for Kobe and Jordan are as follows:
In [40]:
kobe_career_ppm = k_ffit[1]
kobe_career_ppm
Out[40]:
In [41]:
jordan_career_ppm = j_ffit[1]
jordan_career_ppm
Out[41]:
Judging by the slope of the OLS regression, Jordan is overall more efficient than Kobe at producing points per minute.
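As a quick cross-check of these slopes, scipy.stats (imported at the top) provides linregress, which also reports a correlation coefficient. This is only a sketch using the same dataframes as the polyfit calls above:

# Cross-check the career points-per-minute slopes with scipy.stats.linregress.
k_slope, k_int, k_r, k_p, k_err = stats.linregress(agg_kobe['MIN'], agg_kobe['PTS'])
j_slope, j_int, j_r, j_p, j_err = stats.linregress(agg_jordan_m['MIN'], agg_jordan_m['PTS'])
print('Kobe slope:', k_slope, 'r^2:', k_r**2)
print('Jordan slope:', j_slope, 'r^2:', j_r**2)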
Now we can separate the regular-season and playoff statistics to see how the points-per-minute metric, or their production-efficiency 'clutch', changes in the postseason.
We fit a separate OLS regression to each subset of the data:
In [42]:
agg_jordan_reg1 = agg_jordan_reg.drop(agg_jordan_reg.index[agg_jordan_reg['MIN']==0.0],axis = 0)
k_reg_coeff = np.polyfit(x = agg_kobe_reg['MIN'], y = agg_kobe_reg['PTS'], deg =1)
k_reg_ffit = np.poly1d(k_reg_coeff)
k_reg_linreg = k_reg_ffit(agg_kobe_reg['MIN'])  # evaluate the regular-season fit
k_reg_linreg = pd.DataFrame(k_reg_linreg)
k_reg_linreg.columns = ['VAL']
k_po_coeff = np.polyfit(x = agg_kobe_po['MIN'], y = agg_kobe_po['PTS'], deg =1)
k_po_ffit = np.poly1d(k_po_coeff)
k_po_linreg = k_po_ffit(agg_kobe_po['MIN'])  # evaluate the playoff fit
k_po_linreg = pd.DataFrame(k_po_linreg)
k_po_linreg.columns = ['VAL']
j_reg_coeff = np.polyfit(x = agg_jordan_reg1['MIN'], y = agg_jordan_reg1['PTS'], deg =1)
j_reg_ffit = np.poly1d(j_reg_coeff)
j_reg_linreg = j_reg_ffit(agg_jordan_reg1['MIN'])
j_reg_linreg = pd.DataFrame(j_reg_linreg)
j_reg_linreg.columns = ['VAL']
j_po_coeff = np.polyfit(x = agg_jordan_po['MIN'], y = agg_jordan_po['PTS'], deg =1)
j_po_ffit = np.poly1d(j_po_coeff)
j_po_linreg = j_po_ffit(agg_jordan_po['MIN'])
j_po_linreg = pd.DataFrame(j_po_linreg)
j_po_linreg.columns = ['VAL']
Using the regressions and career statistics data we had previously compiled, we can plot the data into a 2 x 2 subplot:
In [43]:
fig, ax = plt.subplots(2,2)
agg_jordan_reg1.plot(ax=ax[0,1],
kind = 'scatter',
x = 'MIN',
y = 'PTS',
color = 'blue',
alpha = 0.5,
title = 'MJ Regular',
ylim = (0,85),
sharex = True,
sharey = True)
j_reg_linreg.plot(ax=ax[0,1],
kind = 'line',
x = agg_jordan_reg1['MIN'],
legend = False,
color = 'red')
agg_jordan_po.plot(ax=ax[1,1],
kind = 'scatter',
x = 'MIN',
y = 'PTS',
color = 'blue',
alpha = 0.5,
title = 'MJ Post',
ylim = (0,85))
j_po_linreg.plot(ax=ax[1,1],
kind = 'line',
x = agg_jordan_po['MIN'],
legend = False,
color = 'red')
agg_kobe_reg.plot(ax=ax[0,0],
kind = 'scatter',
x = 'MIN',
y = 'PTS',
color = 'blue',
alpha = 0.5,
title = 'Kobe Regular',
ylim = (0,85))
k_reg_linreg.plot(ax=ax[0,0],
kind = 'line',
x = agg_kobe_reg['MIN'],
legend = False,
color = 'red')
agg_kobe_po.plot(ax=ax[1,0],
kind = 'scatter',
x = 'MIN',
y = 'PTS',
color = 'blue',
alpha = 0.5,
title = 'Kobe Post',
ylim = (0,85))
k_po_linreg.plot(ax=ax[1,0],
kind = 'line',
x = agg_kobe_po['MIN'],
legend = False,
color = 'red')
ax[0,0].set_xlim(0,60)
ax[0,1].set_xlim(0,60)
ax[1,0].set_xlim(0,60)
ax[1,1].set_xlim(0,60)
Out[43]:
Looking at the scatter data, Jordan's postseason points-per-minute fit has a noticeably steeper slope than Kobe's, while their regular-season fits look relatively similar. The efficiency 'clutch' improvement is calculated as follows:
In [44]:
k_reg_ppm = k_reg_ffit[1]
k_po_ppm = k_po_ffit[1]
print('Kobe:',k_reg_ppm, 'points per minute during the Regular Season')
print('Kobe:',k_po_ppm, 'points per minute during the Post Season')
print('Production Efficiency Clutch:', ((k_po_ppm-k_reg_ppm)/k_reg_ppm)*100,'%')
In [45]:
j_reg_ppm = j_reg_ffit[1]
j_po_ppm = j_po_ffit[1]
print('Jordan:',j_reg_ppm, 'points per minute during the Regular Season')
print('Jordan:',j_po_ppm, 'points per minute during the Post Season')
print('Production Efficiency Clutch:', ((j_po_ppm-j_reg_ppm)/j_reg_ppm)*100,'%')
Looking at 'clutch' in terms of production efficiency, Jordan again does better than Kobe: Kobe actually scores over 10% fewer points per minute in the postseason than in the regular season. This suggests that our earlier finding of Kobe scoring more points in the playoffs may simply be a product of him playing more minutes in his career postseasons.
Looking at the results from Kobe's and Jordan's career stats, we see that Jordan holds the edge in overall improvement during the playoffs relative to the regular season, or what we defined as 'clutch'. Although there was not an enormous difference in their raw career stats in quantity or consistency, the disparity between the regular season and the postseason is observable both in key stats such as points scored, assists, rebounds, and steals, and in production efficiency in terms of points per minute.
This difference in 'clutch' provides some quantitative reasoning behind the difference in the awards and accolades Kobe and Jordan garnered over their respective careers. Both are obviously great players, but this may be part of the reason why many diehard fans, as well as mainstream sports analysts such as ESPN, consider Jordan to be the G.O.A.T. (Greatest of All Time) and Kobe just a legend in passing (check No.12).
Potential sources of error in this report include the limitations of the single-variable points-per-minute OLS regression, as well as minor distortions in the averages from Kobe playing 20 NBA seasons compared to Jordan's 15 (Jordan took a year off to pursue professional baseball, as well as a hiatus before joining the Washington Wizards).
This is the end of my project. I hope you had as much fun reading it as I had writing it.