KNN exercise with NBA player data

Introduction

  • NBA player statistics from the 2014-2015 season (partial season), with a data dictionary describing each column
  • Goal: Predict player position using assists, steals, blocks, turnovers, and personal fouls

Step 1: Read the data into Pandas


In [1]:
# read the data into a DataFrame
import pandas as pd
url = 'https://raw.githubusercontent.com/kjones8812/DAT4-students/master/kerry/Final/NBA_players_2015.csv'
nba = pd.read_csv(url, index_col=0)
nba.head()


Out[1]:
season_end player pos age bref_team_id g gs mp fg fga ... TOV% USG% OWS DWS WS WS/48 OBPM DBPM BPM VORP
0 2015 Quincy Acy F 24 NYK 52 21 19.2 2.2 4.6 ... 15.1 14.7 0.6 0.5 1.0 0.050 -2.6 -0.7 -3.4 -0.3
1 2015 Jordan Adams G 20 MEM 18 0 7.3 1.0 2.1 ... 15.9 17.7 0.0 0.2 0.2 0.076 -2.3 1.8 -0.5 0.0
2 2015 Steven Adams C 21 OKC 51 50 24.2 3.0 5.5 ... 19.2 14.8 1.0 1.8 2.8 0.109 -2.0 2.0 -0.1 0.6
3 2015 Jeff Adrien F 28 MIN 17 0 12.6 1.1 2.6 ... 12.9 14.1 0.2 0.2 0.4 0.093 -2.6 0.8 -1.8 0.0
4 2015 Arron Afflalo G 29 TOT 60 54 32.5 5.0 11.8 ... 10.9 19.6 1.4 0.7 2.1 0.051 -0.2 -1.4 -1.6 0.2

5 rows × 49 columns


In [2]:
# examine the columns

In [3]:
# examine the positions
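One way to fill in the two cells above, assuming `nba` has been loaded as in Step 1. Because the real CSV needs a network download, this sketch wraps both checks in a small helper (`summarize_positions`, a hypothetical name) and runs it on a tiny stand-in DataFrame. The players and positions in that stand-in are copied from the `nba.head()` output above; every other column is left out.

```python
import pandas as pd

def summarize_positions(df):
    """Print the column names and return how many players are listed at each position."""
    # examine the columns
    print(list(df.columns))
    # examine the positions: count the players for each value of 'pos'
    return df['pos'].value_counts()

# Tiny stand-in for `nba`, built from the five rows shown by nba.head() above
sample = pd.DataFrame({
    'player': ['Quincy Acy', 'Jordan Adams', 'Steven Adams', 'Jeff Adrien', 'Arron Afflalo'],
    'pos': ['F', 'G', 'C', 'F', 'G'],
})
counts = summarize_positions(sample)
print(counts)
```

With the real data, `nba.columns` and `nba.pos.value_counts()` give the same information directly. The head output shows the codes `C`, `F`, and `G`; `value_counts()` is the quickest way to confirm that no other codes appear.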

Step 2: Create X and y

Use the following features: assists (ast), steals (stl), blocks (blk), turnovers (tov), and personal fouls (pf)


In [4]:
# map positions to numbers

In [5]:
# create feature matrix (X) (use fields: 'ast', 'stl', 'blk', 'tov', 'pf')

In [6]:
# create response vector (y)
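A sketch of the three Step 2 cells, assuming the positions are coded `C`, `F`, and `G`. The specific numbers (C=0, F=1, G=2) are an assumption; any one-to-one mapping works because KNN treats the labels as categories, and scikit-learn classifiers would also accept the string labels directly. `make_X_y` is a hypothetical helper name, and it raises an error instead of silently producing `NaN` labels if an unexpected position code shows up.

```python
import pandas as pd

# Assumed numeric code for each position (any one-to-one mapping would do)
POSITION_CODES = {'C': 0, 'F': 1, 'G': 2}
# The five features named in the exercise
FEATURE_COLS = ['ast', 'stl', 'blk', 'tov', 'pf']

def make_X_y(df):
    """Return the feature matrix X and the response vector y for the KNN exercise."""
    df = df.copy()  # leave the caller's DataFrame unchanged
    # map positions to numbers; a code missing from POSITION_CODES becomes NaN
    df['pos_num'] = df['pos'].map(POSITION_CODES)
    unmapped = df.loc[df['pos_num'].isna(), 'pos'].unique()
    if len(unmapped) > 0:
        raise ValueError(f'unmapped position codes: {sorted(unmapped)}')
    X = df[FEATURE_COLS]           # feature matrix: one row per player, five columns
    y = df['pos_num'].astype(int)  # response vector: one numeric label per player
    return X, y
```

In the notebook this would be called as `X, y = make_X_y(nba)`.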

Step 3: Train a KNN model (K=5)


In [7]:
# import class

In [8]:
# instantiate with K=5

In [9]:
# fit with data
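A minimal version of Step 3 using scikit-learn's `KNeighborsClassifier`. Wrapping the three cells in a function (hypothetical name `fit_knn`) makes the same code reusable for K=50 in Step 5.

```python
from sklearn.neighbors import KNeighborsClassifier

def fit_knn(X, y, k=5):
    """Instantiate a KNN classifier with k neighbors and fit it to (X, y)."""
    # defaults: uniform weights and Minkowski distance with p=2, i.e. Euclidean distance
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    return knn
```

With `X` and `y` from Step 2, the call is `knn = fit_knn(X, y, k=5)`. Fitting a KNN model mainly stores (and, depending on the `algorithm` parameter, indexes) the training data; the distance computations happen at prediction time.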

Step 4: Predict player position and calculate the predicted probability of each position

Predict the position of a player with these statistics: 1 assist, 1 steal, 0 blocks, 1 turnover, 2 personal fouls
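Before calling scikit-learn, it may help to see what the predicted probabilities mean. With the default `weights='uniform'`, `KNeighborsClassifier.predict_proba` returns, for each class, the fraction of the K nearest training points that belong to that class, and `predict` returns the class with the largest fraction. The hypothetical NumPy function `knn_vote_fractions` below is a from-scratch sketch of that vote; it is not part of scikit-learn, and it may break ties in distance differently than scikit-learn does.

```python
import numpy as np

def knn_vote_fractions(X_train, y_train, x_new, k):
    """Return {label: fraction of the k nearest training points with that label}."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    # Euclidean distance from the new point to every training row
    dists = np.sqrt(((X_train - np.asarray(x_new, dtype=float)) ** 2).sum(axis=1))
    # indices of the k smallest distances (a stable sort keeps row order for ties)
    nearest = np.argsort(dists, kind='stable')[:k]
    # count how many of those k neighbors carry each label
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return {label: count / k for label, count in zip(labels.tolist(), counts.tolist())}
```

For the player in this step, the call would be `knn_vote_fractions(X, y, [1, 1, 0, 1, 2], k=5)`, which should match the K=5 probabilities from scikit-learn unless distance ties occur at the fifth neighbor.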


In [10]:
# create a list to represent a player

In [11]:
# make a prediction

In [12]:
# calculate predicted probabilities
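A sketch of the three Step 4 cells, assuming `knn` is the model fit in Step 3 on the columns `['ast', 'stl', 'blk', 'tov', 'pf']` in that order. The helper name `predict_position` is hypothetical. The probabilities come back in the order of `knn.classes_`, so the sketch pairs each one with its label (the numeric codes chosen in Step 2).

```python
import pandas as pd

FEATURE_COLS = ['ast', 'stl', 'blk', 'tov', 'pf']

# The player from Step 4: 1 assist, 1 steal, 0 blocks, 1 turnover, 2 personal fouls,
# listed in the same order as FEATURE_COLS
new_player = [1, 1, 0, 1, 2]

def predict_position(knn, player):
    """Return (predicted label, {label: predicted probability}) for one player's stats."""
    # Using the training column names avoids scikit-learn's feature-names warning
    # when the model was fit on a DataFrame with these columns
    row = pd.DataFrame([player], columns=FEATURE_COLS)
    label = knn.predict(row)[0]
    probs = knn.predict_proba(row)[0]  # one probability per class, ordered as knn.classes_
    return label, dict(zip(knn.classes_.tolist(), probs.tolist()))
```

In the notebook: `label, probs = predict_position(knn, new_player)`.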

Step 5: Repeat steps 3 and 4 using K=50


In [13]:
# repeat for K=50

In [14]:
# calculate predicted probabilities
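Step 5 can reuse the same pieces. This sketch (hypothetical helper `probabilities_by_k`) fits one model per K and lines up the predicted probabilities side by side. It assumes `X` and `y` come from Step 2 and that there are at least 50 training rows, because scikit-learn cannot return more neighbors than there are samples.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

FEATURE_COLS = ['ast', 'stl', 'blk', 'tov', 'pf']

def probabilities_by_k(X, y, player, ks=(5, 50)):
    """Fit one KNN model per K and return predicted probabilities (rows = K, columns = classes)."""
    row = pd.DataFrame([player], columns=FEATURE_COLS)
    results = {}
    for k in ks:
        knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
        # label each probability with its class so the two models line up by column
        results[k] = pd.Series(knn.predict_proba(row)[0], index=knn.classes_)
    return pd.DataFrame(results).T  # one row per K
```

In the notebook: `probabilities_by_k(X, y, [1, 1, 0, 1, 2])`. With uniform weights every probability is a multiple of 1/K, so K=5 can only produce values such as 0.2 or 0.6, while K=50 allows much finer steps. The larger K also averages over a wider neighborhood, which can change the predicted class.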

Bonus: Explore the features to decide which ones are predictive


In [15]:
# allow plots to appear in the notebook
%matplotlib inline
import matplotlib.pyplot as plt

# increase default figure and font sizes for easier viewing
plt.rcParams['figure.figsize'] = (6, 4)
plt.rcParams['font.size'] = 14
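For the bonus, one simple starting point (not the only one) is to compare the average of each feature across positions: a feature whose mean differs a lot between positions is a plausible candidate for being predictive. `position_means` is a hypothetical helper name.

```python
import pandas as pd

FEATURE_COLS = ['ast', 'stl', 'blk', 'tov', 'pf']

def position_means(df):
    """Mean of each feature for each position (rows = positions, columns = features)."""
    return df.groupby('pos')[FEATURE_COLS].mean()
```

In the notebook, `position_means(nba).plot(kind='bar')` draws the comparison, and `nba.boxplot(column='blk', by='pos')` shows the spread of a single feature within each position, which the means alone can hide.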