Testing a Model

Based on Kevin Markham's video series: Introduction to machine learning with scikit-learn

jupyter notebook 05_model_evaluation_ta.ipynb

In [ ]:
# read in the iris data
from sklearn.datasets import load_iris
iris = load_iris()

# create X (features) and y (response)
X = iris.data
y = iris.target
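
The iris dataset has 150 observations and 4 features, so a quick shape check confirms everything loaded as expected:

In [ ]:
# confirm the dimensions of the feature matrix and response vector
print(X.shape)  # (150, 4)
print(y.shape)  # (150,)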

Logistic regression


In [ ]:
# import the class
from sklearn.linear_model import LogisticRegression

# instantiate the model (using the default parameters)
logreg = LogisticRegression()

# fit the model with data
logreg.fit(X, y)

# predict the response values for the observations in X
y_pred = logreg.predict(X)
print(y_pred)
print("{0} predictions".format(len(y_pred)))

Classification accuracy:

  • Proportion of correct predictions
  • Common evaluation metric for classification problems
  • Known as training accuracy when, as here, the model is evaluated on the same data it was trained on

In [ ]:
# compute classification accuracy for the logistic regression model
from sklearn import metrics
print(metrics.accuracy_score(y, y_pred))
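
As a sanity check, the same value can be computed by hand: accuracy is simply the mean of the element-wise comparison between y and y_pred. A minimal sketch using NumPy:

In [ ]:
# accuracy computed by hand: fraction of predictions that match the true labels
import numpy as np
print(np.mean(y == y_pred))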

Generating an optimal KNN classifier

Look back at 04_model_training and see how high an accuracy you can achieve for different values of n_neighbors. Try to understand why some values do better than others in terms of the pictures we saw there.
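
As a starting point, here is a minimal sketch of such a sweep. It assumes KNeighborsClassifier as used in 04_model_training and, like the cells above, scores on the same data used for training:

In [ ]:
# try a range of n_neighbors values and report accuracy on the training data
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

for k in range(1, 26):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    print(k, metrics.accuracy_score(y, knn.predict(X)))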

You can change feature1 and feature2 in the cell below to visualize different projections of the data.


In [ ]:
feature1 = 1 # feature on x axis
feature2 = 3 # feature on y axis

f1vals = X[:, feature1]
f2vals = X[:, feature2]

import numpy as np
targets = dict(zip(range(3), iris.target_names))
features = dict(zip(range(4), iris.feature_names))
%matplotlib inline
import matplotlib.pyplot as plt
colors = ['g', 'r', 'b']
fig = plt.figure(figsize=(8,8))
ax = plt.subplot()
for species in targets:
    # plot the two selected features for one species at a time
    f1 = f1vals[y == species]
    f2 = f2vals[y == species]
    ax.scatter(f1, f2, c=colors[species], label=targets[species], s=40)
ax.set(xlabel=features[feature1], ylabel=features[feature2])
ax.legend()
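
To connect these projections back to the classifier, the sketch below (an illustrative addition, reusing feature1, feature2, and the plotting setup above) fits a KNN on just the two selected features and shades its decision regions. Varying k shows how the boundaries smooth out for large values and fragment for small ones:

In [ ]:
# sketch: shade the decision regions of a KNN fit on the two selected features
from sklearn.neighbors import KNeighborsClassifier

k = 5  # try other values of n_neighbors here
knn2d = KNeighborsClassifier(n_neighbors=k)
knn2d.fit(np.c_[f1vals, f2vals], y)

# predict the class at every point of a grid covering the feature ranges
xx, yy = np.meshgrid(
    np.linspace(f1vals.min() - 0.5, f1vals.max() + 0.5, 200),
    np.linspace(f2vals.min() - 0.5, f2vals.max() + 0.5, 200))
Z = knn2d.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)

fig = plt.figure(figsize=(8,8))
ax = plt.subplot()
ax.contourf(xx, yy, Z, levels=[-0.5, 0.5, 1.5, 2.5], colors=colors, alpha=0.2)
for species in targets:
    ax.scatter(f1vals[y == species], f2vals[y == species],
               c=colors[species], label=targets[species], s=40)
ax.set(xlabel=features[feature1], ylabel=features[feature2])
ax.legend()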