


In [22]:

    
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
import pandas as pd



In [23]:

    
models = {'svm': LinearSVC(), 
          'log_reg': LogisticRegression(), 
          'naive_bayes': MultinomialNB(), 
          'knn': KNeighborsClassifier(),
          'dec_tree': DecisionTreeClassifier()}

Read in the Kobe Bryant shooting data [https://www.kaggle.com/c/kobe-bryant-shot-selection].



In [3]:

    
kobe = pd.read_csv('../data/kobe.csv')

For now, use only the numeric columns. They are collected below in `num_columns`.



In [20]:

    
# keep every column whose dtype is not `object` (i.e. the numeric columns)
num_columns = [col for col, dtype in zip(kobe.columns, kobe.dtypes) if dtype != 'object']
num_columns

    Out[20]:

['game_event_id',
 'game_id',
 'lat',
 'loc_x',
 'loc_y',
 'lon',
 'minutes_remaining',
 'period',
 'playoffs',
 'seconds_remaining',
 'shot_distance',
 'shot_made_flag',
 'team_id',
 'shot_id']
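The list comprehension above keeps every non-`object` column. pandas also offers `DataFrame.select_dtypes`, which selects columns by dtype directly. The sketch below runs it on a hypothetical three-row frame (not the real `kobe` data). Two caveats: `select_dtypes(include='number')` is not identical to the `!= 'object'` test (a datetime column, for example, passes the original test but is not numeric), and `num_columns` still contains `shot_made_flag`, the target, which has to be removed before the list is used as features.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for `kobe`; column names are modeled on the Kaggle data
toy_shots = pd.DataFrame({
    'shot_distance': [15, 2, 24],
    'period': [1, 3, 4],
    'action_type': ['Jump Shot', 'Layup Shot', 'Jump Shot'],
    'shot_made_flag': [np.nan, 1.0, 0.0],
})

# select_dtypes keeps the numeric (integer and float) columns and skips object columns
toy_num_columns = toy_shots.select_dtypes(include='number').columns.tolist()
print(toy_num_columns)  # ['shot_distance', 'period', 'shot_made_flag']

# shot_made_flag is the target, so it must not be used as a feature
toy_features = [col for col in toy_num_columns if col != 'shot_made_flag']
print(toy_features)  # ['shot_distance', 'period']
```

Identifier columns such as `game_id` and `shot_id` are also numeric, but they label rows rather than describe the shot, so they are probably poor feature choices.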

The `shot_made_flag` column is the outcome of each shot Kobe took (1 = made, 0 = missed). Some of the values are missing (`NaN`); drop those rows.



In [21]:

    
# keep only the shots whose outcome is known
kobe = kobe.dropna(subset=['shot_made_flag'])
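`dropna(subset=[...])` only looks at the listed columns, so a row is dropped when its target is missing even if every other value is present, and rows with gaps elsewhere are kept. A minimal sketch on a hypothetical four-row frame (not the real data), which also casts the target to integers once no `NaN` is left:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: the first shot has no recorded outcome
toy = pd.DataFrame({
    'shot_distance': [15, 2, 24, 8],
    'shot_made_flag': [np.nan, 1.0, 0.0, 1.0],
})

# drop only the rows whose target is missing
labelled = toy.dropna(subset=['shot_made_flag'])

# the column was float because of the NaN; make it an integer 0/1 label
labelled = labelled.assign(shot_made_flag=labelled['shot_made_flag'].astype(int))
print(len(toy), '->', len(labelled))  # 4 -> 3
```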

Use `num_columns` and the `kobe` dataframe to `fit()` each of the `models`. Choose one or more of the entries in `num_columns` as features, leaving out `shot_made_flag` itself since it is the target. The fitted models predict whether Kobe makes or misses a shot given those inputs.

Compute each model's accuracy on the same data used to fit it. This training accuracy tends to overstate how well a model would do on shots it has not seen.
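Because every classifier in `models` exposes the same `fit()`/`predict()` interface, the exercise reduces to one loop. The sketch below wraps it in a hypothetical helper, `training_accuracies`, and runs it on synthetic random data rather than the real shots; the feature choice (`shot_distance`, `period`) and the make-probability rule are assumptions for illustration only. Both features are non-negative, which matters because `MultinomialNB` rejects negative inputs, so columns such as `loc_x` or `lon` would need rescaling before being passed to it. `LinearSVC` may print a convergence warning on this data.

```python
import numpy as np
import pandas as pd
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score


def training_accuracies(df, feature_cols, models, target='shot_made_flag'):
    """Hypothetical helper: fit each model on df[feature_cols], score it on that same data."""
    X = df[feature_cols]
    y = df[target]
    scores = {}
    for name, model in models.items():
        model.fit(X, y)
        scores[name] = accuracy_score(y, model.predict(X))
    # highest training accuracy first
    return pd.Series(scores).sort_values(ascending=False)


# Synthetic stand-in for the cleaned `kobe` frame: random numbers, NOT real shots
rng = np.random.default_rng(0)
n = 200
fake = pd.DataFrame({
    'shot_distance': rng.integers(0, 30, n),
    'period': rng.integers(1, 5, n),
})
# Hypothetical rule: shorter shots are made more often, plus random noise
dist = fake['shot_distance'].to_numpy()
p_make = 1 / (1 + np.exp(0.15 * (dist - 12)))
fake['shot_made_flag'] = (rng.random(n) < p_make).astype(int)

# fresh instances so the notebook's `models` dict is left untouched
demo_models = {'svm': LinearSVC(),
               'log_reg': LogisticRegression(),
               'naive_bayes': MultinomialNB(),
               'knn': KNeighborsClassifier(),
               'dec_tree': DecisionTreeClassifier()}

# both features are non-negative, as MultinomialNB requires
scores = training_accuracies(fake, ['shot_distance', 'period'], demo_models)
print(scores)
```

On the real data the same call would take the cleaned `kobe` frame and the `models` dict defined earlier. An unpruned decision tree usually tops a table like this because it can nearly memorize the training rows, which is exactly why training accuracy alone is a weak basis for comparing models.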



In [27]:

    
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
sns.set(font_scale=1.5)

The following is a reminder of how scikit-learn models are used:



In [37]:

    
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

# toy data: one feature column `a` and a 0/1 target `b`
example = pd.DataFrame({'a':[1,2,3,4,5,6], 'b':[1,1,0,0,0,1]})
feature_cols = ['a']
X = example[feature_cols]
y = example.b

# fit a linear regression model and store the predictions
linreg = LinearRegression()
linreg.fit(X, y)
example['pred'] = linreg.predict(X)

# scatter plot that includes the regression line
plt.scatter(example.a, example.b)
plt.plot(example.a, example.pred, color='red')
plt.xlabel('a')
plt.ylabel('b')

# accuracy needs class labels; astype(int) truncates the continuous predictions toward zero
accuracy_score(example.b, example.pred.astype(int))

    Out[37]:

0.5
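The 0.5 above comes from `astype(int)`, which truncates toward zero: every regression prediction here lies between 0 and 1, so every shot is labelled 0. Thresholding at 0.5, which assigns each prediction to the nearer label, is the more usual way to turn a regression output into classes. A minimal sketch on the same toy data (renamed `demo` so `example` is not overwritten):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import accuracy_score

# same toy data as the cell above, under a new name
demo = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6], 'b': [1, 1, 0, 0, 0, 1]})
linreg_demo = LinearRegression().fit(demo[['a']], demo.b)
pred = linreg_demo.predict(demo[['a']])  # continuous values, here all between 0 and 1

# astype(int) truncates toward zero, so every prediction becomes class 0
truncated = pred.astype(int)
# thresholding at 0.5 assigns each prediction to the nearer label
thresholded = (pred >= 0.5).astype(int)

print(accuracy_score(demo.b, truncated))    # 0.5, matching the output above
print(accuracy_score(demo.b, thresholded))  # 4 of 6 correct
```

For the Kobe exercise itself, the classifiers in `models` already return 0/1 labels from `predict()`, so no thresholding step is needed there.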



In [ ]:
