Evaluating Models

Evaluating using metrics

  • Confusion matrix - visually inspect the quality of a classifier's predictions - very useful for seeing whether a particular class is problematic

Here, we will process some data, classify it with SVM (see [here](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC) for more info), and view the quality of the classification with a confusion matrix.


In [ ]:
import pandas as pd

# import model algorithm and data
from sklearn import svm, datasets

# import splitter (train_test_split lives in sklearn.model_selection;
#   the older sklearn.cross_validation module has been removed)
from sklearn.model_selection import train_test_split

# import metrics
from sklearn.metrics import confusion_matrix

# feature data (X) and labels (y)
iris = datasets.load_iris()
X, y = iris.data, iris.target

# split data into training and test sets
X_train, X_test, y_train, y_test = \
    train_test_split(X, y, train_size = 0.70, random_state = 42)

In [ ]:
# perform the classification step and run a prediction on test set from above
clf = svm.SVC(kernel = 'linear', C = 0.01)
y_pred = clf.fit(X_train, y_train).predict(X_test)

pd.DataFrame({'Prediction': iris.target_names[y_pred],
    'Actual': iris.target_names[y_test]})

In [ ]:
# accuracy score
clf.score(X_test, y_test)

In [ ]:
# Define a plotting function for confusion matrices
#  (from http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html)

import numpy as np
import matplotlib.pyplot as plt

def plot_confusion_matrix(cm, target_names, title = 'The Confusion Matrix', cmap = plt.cm.YlOrRd):
    plt.imshow(cm, interpolation = 'nearest', cmap = cmap)
    plt.title(title)
    plt.colorbar()
    
    # Add class labels to x and y axes
    tick_marks = np.arange(len(target_names))
    plt.xticks(tick_marks, target_names, rotation=45)
    plt.yticks(tick_marks, target_names)
    
    plt.ylabel('True Label')
    plt.xlabel('Predicted Label')
    
    plt.tight_layout()

Numbers in confusion matrix:

  • on-diagonal - counts of points for which the predicted label is equal to the true label
  • off-diagonal - counts of mislabeled points

In [ ]:
%matplotlib inline

cm = confusion_matrix(y_test, y_pred)

# see the actual counts
print(cm)

# visually inspect how well the classifier matched predictions to true labels
plot_confusion_matrix(cm, iris.target_names)

  • Classification report - a text report with important classification metrics (e.g. precision and recall)

In [ ]:
from sklearn.metrics import classification_report

# Using the test and prediction sets from above
print(classification_report(y_test, y_pred, target_names = iris.target_names))

In [ ]:
# Another example with some toy data

y_test = ['cat', 'dog', 'mouse', 'mouse', 'cat', 'cat']
y_pred = ['mouse', 'dog', 'cat', 'mouse', 'cat', 'mouse']

# How did our predictor do?
print(classification_report(y_test, ___, target_names = ___)) # <-- fill in the blanks

QUICK QUESTION: Is it better to have too many false positives or too many false negatives?
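A hint: the answer depends on the problem, and precision and recall capture the two sides of it. Below is a minimal sketch (the spam-filter labels are invented purely for illustration) showing that false positives pull precision down while false negatives pull recall down.

In [ ]:
from sklearn.metrics import precision_score, recall_score

# Hypothetical spam-filter labels, made up for illustration (1 = spam, 0 = not spam)
y_true       = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred_eager = [1, 1, 1, 1, 1, 0, 0, 0]   # catches all spam but flags two good emails (false positives)
y_pred_timid = [1, 0, 0, 0, 0, 0, 0, 0]   # never flags a good email but misses two spam emails (false negatives)

# false positives lower precision; false negatives lower recall
print('eager filter: precision = %.2f, recall = %.2f'
      % (precision_score(y_true, y_pred_eager), recall_score(y_true, y_pred_eager)))
print('timid filter: precision = %.2f, recall = %.2f'
      % (precision_score(y_true, y_pred_timid), recall_score(y_true, y_pred_timid)))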

Evaluating Models and Under/Over-Fitting

  • Over-fitting or under-fitting can be visualized as below and tuned, as we will see later, with GridSearchCV parameter tuning
  • A validation curve gives one an idea of the relationship of model complexity to model performance.
  • For this examination it would help to understand the idea of the bias-variance tradeoff.
  • A learning curve helps answer the question of whether adding more training data to a model yields an added benefit. It is also a tool for investigating whether an estimator is more affected by variance error or bias error (see the sketch after this list).
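
As a rough sketch of the two curves just described (re-using the iris X, y and the SVM classifier from the cells above; the range of C values and the cross-validation settings below are arbitrary choices for illustration), scikit-learn's validation_curve and learning_curve compute the underlying scores:

In [ ]:
import numpy as np
from sklearn.model_selection import validation_curve, learning_curve

# validation curve: performance vs. model complexity (here, the SVM's C parameter;
#   the parameter range is an arbitrary choice for illustration)
param_range = np.logspace(-3, 2, 6)
train_scores, valid_scores = validation_curve(
    svm.SVC(kernel = 'linear'), X, y,
    param_name = 'C', param_range = param_range, cv = 5)
print('validation curve - mean cross-validation score per value of C:')
print(valid_scores.mean(axis = 1))

# learning curve: performance vs. amount of training data
train_sizes, train_scores, valid_scores = learning_curve(
    svm.SVC(kernel = 'linear', C = 0.01), X, y,
    train_sizes = np.linspace(0.2, 1.0, 5), cv = 5)
print('learning curve - mean cross-validation score per training-set size:')
print(valid_scores.mean(axis = 1))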

PARTING THOUGHT: When a given parameter is increased or decreased, does it cause overfitting or underfitting? What are the implications of each case?

Created by a Microsoft Employee.

The MIT License (MIT)
Copyright (c) 2016 Micheleen Harris