  1. Create Dummy Data for Classification
  2. Classify Dummy Data
  3. Breakdown of Metrics Included in Classification Report
  4. List of Other Classification Metrics Available in sklearn.metrics

1. Create Dummy Data for Classification


In [1]:
import seaborn as sns 
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.datasets import make_blobs

data, labels = make_blobs(n_samples=100, n_features=2, centers=2, cluster_std=4, random_state=2)

plt.scatter(data[:,0], data[:,1], c = labels, cmap='coolwarm');
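
Since the metrics below depend on how the samples are distributed across the two classes, a quick optional check (a minimal sketch, assuming NumPy is available) shows how many of the 100 points landed in each cluster:

import numpy as np

# Count how many samples were generated for each label (0 and 1)
print(np.bincount(labels))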


2. Classify Dummy Data


In [2]:
# Import LinearSVC
from sklearn.svm import LinearSVC

# Create instance of Support Vector Classifier
svc = LinearSVC()

# Fit estimator to 70% of the data
svc.fit(data[:70], labels[:70])

# Predict final 30%
y_pred = svc.predict(data[70:])

# Establish true y values
y_true = labels[70:]
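
The slicing above trains on the first 70 points and tests on the remaining 30. An equivalent, more idiomatic way to do this (a sketch only; the rest of the notebook keeps the manual split) is sklearn's train_test_split:

from sklearn.model_selection import train_test_split

# 70/30 split with a fixed random_state for reproducibility
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.3, random_state=2)

svc_alt = LinearSVC()
svc_alt.fit(X_train, y_train)
y_pred_alt = svc_alt.predict(X_test)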

3. Breakdown of Metrics Included in Classification Report

Precision Score

TP - True Positives
FP - False Positives

Precision - Accuracy of positive predictions, i.e. the fraction of predicted positives that are actually positive.
Precision = TP/(TP + FP)


In [3]:
from sklearn.metrics import precision_score

print("Precision score: {}".format(precision_score(y_true,y_pred)))


Precision score: 0.9285714285714286
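
The same value can be reproduced by counting the true and false positives directly (a minimal sketch assuming NumPy, with class 1 treated as the positive class, which is sklearn's default for 0/1 labels):

import numpy as np

# True positives: predicted 1 and actually 1
tp = np.sum((y_pred == 1) & (y_true == 1))
# False positives: predicted 1 but actually 0
fp = np.sum((y_pred == 1) & (y_true == 0))

print("Precision by hand: {}".format(tp / (tp + fp)))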

Recall Score

FN - False Negatives

Recall (aka sensitivity or true positive rate): the fraction of actual positives that were correctly identified.
Recall = TP/(TP+FN)


In [4]:
from sklearn.metrics import recall_score

print("Recall score: {}".format(recall_score(y_true,y_pred)))


Recall score: 0.8666666666666667
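
Recall can be recovered the same way from the true positive and false negative counts (same assumptions as the precision sketch above):

import numpy as np

# True positives: predicted 1 and actually 1
tp = np.sum((y_pred == 1) & (y_true == 1))
# False negatives: actually 1 but predicted 0
fn = np.sum((y_pred == 0) & (y_true == 1))

print("Recall by hand: {}".format(tp / (tp + fn)))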

F1 Score

F1 Score (aka F-Score or F-Measure) - A helpful single metric for comparing two classifiers. The F1 Score takes both precision and recall into account and is computed as the harmonic mean of the two.

F1 = 2 x (precision x recall)/(precision + recall)


In [5]:
from sklearn.metrics import f1_score

print("F1 Score: {}".format(f1_score(y_true,y_pred)))


F1 Score: 0.896551724137931
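
As a check on the formula, the same number falls out of the precision and recall scores computed above (a minimal sketch):

from sklearn.metrics import precision_score, recall_score

# Harmonic mean of precision and recall
p = precision_score(y_true, y_pred)
r = recall_score(y_true, y_pred)

print("F1 by hand: {}".format(2 * p * r / (p + r)))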

Classification Report

Report which includes Precision, Recall, F1-Score, and Support (the number of true samples of each class) for every class.


In [6]:
from sklearn.metrics import classification_report

print(classification_report(y_true,y_pred))


             precision    recall  f1-score   support

          0       0.88      0.93      0.90        15
          1       0.93      0.87      0.90        15

avg / total       0.90      0.90      0.90        30
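
If you need the report's numbers programmatically rather than as a printed table, newer versions of scikit-learn (roughly 0.20 onward) accept an output_dict=True argument; a small sketch, noting that the exact keys can vary with version:

# Get the same report as a nested dict instead of a formatted string
report = classification_report(y_true, y_pred, output_dict=True)

# e.g. the precision of class 1
print(report["1"]["precision"])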

Confusion Matrix

The confusion matrix shows how many examples of each class were classified correctly and how many were assigned to each wrong class, so you can inspect the misclassifications yourself and perform any further calculations as desired.


In [7]:
from sklearn.metrics import confusion_matrix
import pandas as pd

confusion_df = pd.DataFrame(confusion_matrix(y_true,y_pred),
             columns=["Predicted Class " + str(class_name) for class_name in [0,1]],
             index = ["Class " + str(class_name) for class_name in [0,1]])

print(confusion_df)


         Predicted Class 0  Predicted Class 1
Class 0                 14                  1
Class 1                  2                 13
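
Because seaborn was imported at the top of the notebook, the same table can also be drawn as an annotated heatmap, which is often easier to read than the raw counts:

# Plot the confusion matrix counts as a heatmap
sns.heatmap(confusion_df, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted class')
plt.ylabel('True class');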

4. List of Other Classification Metrics Available in sklearn.metrics

  • accuracy_score
  • auc
  • average_precision_score
  • brier_score_loss
  • cohen_kappa_score
  • dcg_score
  • fbeta_score
  • hamming_loss
  • hinge_loss
  • jaccard_score
  • log_loss
  • matthews_corrcoef
  • ndcg_score
  • precision_recall_curve
  • precision_recall_fscore_support
  • roc_auc_score
  • roc_curve
  • zero_one_loss

sklearn.metrics also offers Regression metrics, Multilabel ranking metrics, Clustering metrics, Biclustering metrics, Pairwise metrics, and scorer objects for model selection.
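
Many of the metrics listed above follow the same (y_true, y_pred) calling convention used throughout this notebook (others, such as roc_auc_score, log_loss, and the curve functions, expect scores or probabilities rather than hard labels). For example, accuracy_score and matthews_corrcoef can be dropped straight in (a minimal sketch):

from sklearn.metrics import accuracy_score, matthews_corrcoef

print("Accuracy: {}".format(accuracy_score(y_true, y_pred)))
print("Matthews correlation coefficient: {}".format(matthews_corrcoef(y_true, y_pred)))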