# KNN exercise with NBA player data

## Introduction

• NBA player statistics from 2014-2015 (partial season): data, data dictionary
• Goal: Predict player position using assists, steals, blocks, turnovers, and personal fouls

## Step 1: Read the data into Pandas

``````

In [1]:

# read the data into a DataFrame
import pandas as pd
url = 'https://raw.githubusercontent.com/kjones8812/DAT4-students/master/kerry/Final/NBA_players_2015.csv'
nba = pd.read_csv(url)

# preview the first five rows
nba.head()

``````
``````

Out[1]:

   season_end         player  pos  age  bref_team_id   g  gs    mp   fg   fga  ...  TOV%  USG%  OWS  DWS   WS  WS/48  OBPM  DBPM   BPM  VORP
0        2015     Quincy Acy    F   24           NYK  52  21  19.2  2.2   4.6  ...  15.1  14.7  0.6  0.5  1.0  0.050  -2.6  -0.7  -3.4  -0.3
1        2015                   G   20           MEM  18   0   7.3  1.0   2.1  ...  15.9  17.7  0.0  0.2  0.2  0.076  -2.3   1.8  -0.5   0.0
2        2015                   C   21           OKC  51  50  24.2  3.0   5.5  ...  19.2  14.8  1.0  1.8  2.8  0.109  -2.0   2.0  -0.1   0.6
3        2015                   F   28           MIN  17   0  12.6  1.1   2.6  ...  12.9  14.1  0.2  0.2  0.4  0.093  -2.6   0.8  -1.8   0.0
4        2015  Arron Afflalo    G   29           TOT  60  54  32.5  5.0  11.8  ...  10.9  19.6  1.4  0.7  2.1  0.051  -0.2  -1.4  -1.6   0.2

5 rows × 49 columns

``````
``````

In [2]:

# examine the columns

``````
``````

In [3]:

# examine the positions

``````
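
One possible way to fill in the two inspection cells, sketched on a made-up stand-in frame so it runs without the download. The name `nba` and every value in the frame are assumptions for illustration only; on the real data the same attribute and method apply to the DataFrame read in Step 1.

```python
import pandas as pd

# Hypothetical stand-in for the real DataFrame: invented rows that reuse
# the column names from the exercise ('pos', 'ast', 'stl', 'blk', 'tov', 'pf').
nba = pd.DataFrame({
    'pos': ['F', 'G', 'C', 'F', 'G'],
    'ast': [1.0, 2.5, 0.8, 1.2, 3.1],
    'stl': [0.4, 0.9, 0.5, 0.3, 1.0],
    'blk': [0.3, 0.1, 1.2, 0.4, 0.1],
    'tov': [0.9, 1.4, 1.1, 0.7, 1.6],
    'pf':  [2.2, 1.5, 3.0, 1.9, 1.7],
})

# examine the columns
print(nba.columns.tolist())

# examine the positions: number of players per position label
pos_counts = nba['pos'].value_counts()
print(pos_counts)
```

`value_counts()` also reveals any unexpected labels, which matters before mapping positions to numbers in Step 2.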

## Step 2: Create X and y

Use the following features: assists, steals, blocks, turnovers, personal fouls

``````

In [4]:

# map positions to numbers

``````
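
A minimal sketch of the mapping step, assuming the only labels in `pos` are `C`, `F`, and `G`, as the preview in Step 1 suggests. The integer codes, the dictionary name `pos_to_num`, and the new column name `pos_num` are arbitrary choices, and the frame below is invented.

```python
import pandas as pd

# Made-up position labels standing in for the real 'pos' column.
nba = pd.DataFrame({'pos': ['F', 'G', 'C', 'F', 'G']})

# Assumed encoding; any three distinct integers would work equally well.
pos_to_num = {'C': 0, 'F': 1, 'G': 2}
nba['pos_num'] = nba['pos'].map(pos_to_num)

# map() produces NaN for labels missing from the dictionary,
# so confirm that every row received a code.
assert nba['pos_num'].notna().all()
print(nba)
```
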
``````

In [5]:

# create feature matrix (X) (use fields: 'ast', 'stl', 'blk', 'tov', 'pf')

``````
``````

In [6]:

# create response vector (y)

``````
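
The feature matrix and response vector can be sketched as follows. The frame is a made-up stand-in, and `pos_num` is an assumed name for the numeric position column created in the mapping cell; only the five field names come from the exercise.

```python
import pandas as pd

# Hypothetical frame: invented values, with an extra 'age' column to show
# that only the listed features end up in X.
nba = pd.DataFrame({
    'pos_num': [1, 2, 0, 1, 2],
    'age': [24, 20, 21, 28, 29],
    'ast': [1.0, 2.5, 0.8, 1.2, 3.1],
    'stl': [0.4, 0.9, 0.5, 0.3, 1.0],
    'blk': [0.3, 0.1, 1.2, 0.4, 0.1],
    'tov': [0.9, 1.4, 1.1, 0.7, 1.6],
    'pf':  [2.2, 1.5, 3.0, 1.9, 1.7],
})

# create feature matrix (X): a DataFrame with one column per feature
feature_cols = ['ast', 'stl', 'blk', 'tov', 'pf']
X = nba[feature_cols]

# create response vector (y): a Series with one label per player
y = nba['pos_num']

print(X.shape, y.shape)
```

scikit-learn expects `X` to be two-dimensional (rows by features) and `y` to be one-dimensional with the same number of rows.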

## Step 3: Train a KNN model (K=5)

``````

In [7]:

# import class

``````
``````

In [8]:

# instantiate with K=5

``````
``````

In [9]:

# fit with data

``````
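
A sketch of the three cells in this step using scikit-learn's `KNeighborsClassifier`. To keep it self-contained, `X` and `y` here are synthetic random arrays, not the NBA data; the variable name `knn` is also just a choice.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier  # import class

# Synthetic stand-in for the real X and y: 60 fake "players" with
# 5 random features each, and 20 rows per position code (0, 1, 2).
rng = np.random.default_rng(0)
X = rng.random((60, 5)) * 3
y = np.repeat([0, 1, 2], 20)

# instantiate with K=5
knn = KNeighborsClassifier(n_neighbors=5)

# fit with data (KNN simply stores the training set at this point)
knn.fit(X, y)

print(knn.classes_)
```
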

## Step 4: Predict player position and calculate predicted probability of each position

Predict for a player with these statistics: 1 assist, 1 steal, 0 blocks, 1 turnover, 2 personal fouls

``````

In [10]:

# create a list to represent a player

``````
``````

In [11]:

# make a prediction

``````
``````

In [12]:

# calculate predicted probabilities

``````
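
The prediction cells might look like the sketch below. The model is fitted on synthetic random arrays (an assumption made so the snippet runs on its own), so the printed values say nothing about real players; the point is the shape of the input and of the two outputs.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic training data, standing in for the real X and y.
rng = np.random.default_rng(0)
X = rng.random((60, 5)) * 3
y = np.repeat([0, 1, 2], 20)
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)

# create a list to represent a player:
# 1 assist, 1 steal, 0 blocks, 1 turnover, 2 personal fouls.
# The outer list makes it 2-D: one row, five features.
player = [[1, 1, 0, 1, 2]]

# make a prediction: an array with one class label per input row
pred = knn.predict(player)

# calculate predicted probabilities: one column per class in knn.classes_
proba = knn.predict_proba(player)

print(pred, proba)
```

With the default uniform weights, each probability is the share of the 5 nearest neighbors that belong to that class, so every entry is a multiple of 0.2.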

## Step 5: Repeat steps 3 and 4 using K=50

``````

In [13]:

# repeat for K=50

``````
``````

In [14]:

# calculate predicted probabilities

``````
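
Repeating the fit and prediction with K=50 changes only the `n_neighbors` argument. The sketch below again uses synthetic random arrays as a stand-in for the real data; note that K cannot exceed the number of training rows, so the fake set has 60 rows.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic training data with more than 50 rows, standing in for X and y.
rng = np.random.default_rng(0)
X = rng.random((60, 5)) * 3
y = np.repeat([0, 1, 2], 20)

# same hypothetical player as in Step 4
player = [[1, 1, 0, 1, 2]]

# repeat for K=50
knn50 = KNeighborsClassifier(n_neighbors=50)
knn50.fit(X, y)
pred50 = knn50.predict(player)

# calculate predicted probabilities
proba50 = knn50.predict_proba(player)

print(pred50, proba50)
```

Each probability is now a count out of 50 neighbors, so the estimates are finer-grained than with K=5, at the cost of averaging over players that are farther from the query point.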

## Bonus: Explore the features to decide which ones are predictive

``````

In [15]:

# allow plots to appear in the notebook
%matplotlib inline
import matplotlib.pyplot as plt

# increase default figure and font sizes for easier viewing
plt.rcParams['figure.figsize'] = (6, 4)
plt.rcParams['font.size'] = 14

``````
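
One simple starting point for the exploration is to compare the average of each feature across positions. The sketch below does this with `groupby` on an invented frame; the numbers are arbitrary and are not meant to reflect the real data.

```python
import pandas as pd

# Hypothetical frame: made-up values for two features and three positions.
nba = pd.DataFrame({
    'pos': ['C', 'C', 'F', 'F', 'G', 'G'],
    'ast': [1.0, 2.0, 1.0, 3.0, 4.0, 6.0],
    'blk': [1.5, 0.5, 0.4, 0.6, 0.1, 0.3],
})

# mean of each feature within each position
feature_cols = ['ast', 'blk']
means = nba.groupby('pos')[feature_cols].mean()
print(means)
```

A feature whose group means differ clearly between positions is a candidate predictor. For a visual check, `nba.boxplot(column='ast', by='pos')` draws one box per position using the matplotlib setup above.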