How do we measure a model? How can we tell whether it is doing well or just producing useless predictions?
This job is done by metrics. The scikit-learn documentation on model evaluation describes a whole range of metrics, and we are going to look at some of them. Metrics are computed on data the model has not seen during training, which is why we first split the dataset into a training set and a test set.
In [1]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
X, y = load_iris(return_X_y=True)
print('X.shape =', X.shape)
print('y.shape =', y.shape)
print()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, shuffle=True)
print('X_train.shape =', X_train.shape)
print('y_train.shape =', y_train.shape)
print('X_test.shape =', X_test.shape)
print('y_test.shape =', y_test.shape)
Accuracy is the fraction of correct predictions:
$$ accuracy(y, \hat{y}) = \frac{1}{m} \sum_{i=1}^m 1(y^{(i)} = \hat{y}^{(i)}) $$
where $ 1(x) $ is the indicator function and $ m $ is the number of samples.
In [2]:
import numpy as np
from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)
Out[2]:
0.5
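To check that this matches the formula above, the same value can be computed by hand; a quick sketch in plain NumPy:
import numpy as np

y_pred = np.array([0, 2, 1, 3])
y_true = np.array([0, 1, 2, 3])

# average of the indicator 1(y == y_hat) over the m samples
print(np.mean(y_true == y_pred))  # 0.5, same as accuracy_score above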
The confusion matrix gives a more detailed picture of a classifier. For multiclass classification, the entry $ C_{i,j} $ counts the samples whose true class is $ i $ and whose predicted class is $ j $, and for binary classification it reduces to
$$ C = \begin{bmatrix} tn & fp \\ fn & tp \end{bmatrix} $$
with true negatives, false positives, false negatives and true positives.
In [3]:
import numpy as np
from sklearn.metrics import confusion_matrix
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
np.array([[tn, fp],
          [fn, tp]])
Out[3]:
array([[2, 1],
       [2, 3]])
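From these counts we get precision, recall and the F1 score (the standard definitions, which is what the scikit-learn functions below compute for the positive class):
$$ precision = \frac{tp}{tp + fp}, \qquad recall = \frac{tp}{tp + fn}, \qquad F_1 = 2 \cdot \frac{precision \cdot recall}{precision + recall} $$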
In [4]:
from sklearn.metrics import precision_score, recall_score, f1_score
y_pred = [0, 1, 0, 0]
y_true = [0, 1, 0, 1]
print('[[tn fp]\n [fn tp]]')
print(confusion_matrix(y_true, y_pred))
print()
print('Recall =', recall_score(y_true, y_pred))
print('Precision =', precision_score(y_true, y_pred))
print('F1 =', f1_score(y_true, y_pred))
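With more than two classes these scores are computed per class and then averaged; scikit-learn exposes this through the average parameter. A minimal example with 'macro' averaging (the unweighted mean over classes):
from sklearn.metrics import precision_score

y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]

# unweighted mean of the per-class precisions: (2/3 + 0 + 1) / 3
print(precision_score(y_true, y_pred, average='macro'))

classification_report prints the per-class scores together with these averages: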
In [5]:
from sklearn.metrics import classification_report
y_true = [0, 1, 2, 2, 0]
y_pred = [0, 0, 2, 1, 0]
target_names = ['class 0', 'class 1', 'class 2']
print(classification_report(y_true, y_pred, target_names=target_names))
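Log loss (cross-entropy) evaluates probabilistic predictions rather than hard labels. For binary labels, with $ p^{(i)} $ the predicted probability of the positive class, the standard definition is
$$ L_{\log}(y, p) = - \frac{1}{m} \sum_{i=1}^m \left( y^{(i)} \log p^{(i)} + (1 - y^{(i)}) \log (1 - p^{(i)}) \right) $$
Lower is better, and a perfect classifier has log loss 0. scikit-learn's log_loss accepts one column of probabilities per class, as in the example below.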
In [6]:
from sklearn.metrics import log_loss
y_true = [0, 0, 1, 1]
y_pred = [[.9, .1], [.8, .2], [.3, .7], [.01, .99]]
log_loss(y_true, y_pred)
Out[6]:
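Mean absolute error is a regression metric: the average absolute difference between predictions and targets,
$$ MAE(y, \hat{y}) = \frac{1}{m} \sum_{i=1}^m \left| y^{(i)} - \hat{y}^{(i)} \right| $$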
In [7]:
from sklearn.metrics import mean_absolute_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_absolute_error(y_true, y_pred)
Out[7]:
0.5
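Mean squared error averages the squared differences instead, which penalizes large errors more heavily:
$$ MSE(y, \hat{y}) = \frac{1}{m} \sum_{i=1}^m \left( y^{(i)} - \hat{y}^{(i)} \right)^2 $$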
In [8]:
from sklearn.metrics import mean_squared_error
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
mean_squared_error(y_true, y_pred)
Out[8]:
0.375
From scikit-learn documentation:
$$ R^2(y, \hat{y}) = 1 - \frac{\sum_{i=1}^m (y^{(i)} - \hat{y}^{(i)})^2}{\sum_{i=1}^m (y^{(i)} - \bar{y})^2} $$
The $ R^2 $ score is the coefficient of determination. It provides a measure of how well future samples are likely to be predicted by the model. The best possible score is 1.0 and it can be negative (because the model can be arbitrarily worse). A constant model that always predicts the expected value of $ y $, disregarding the input features, would get an $ R^2 $ score of 0.0.
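A minimal sketch of computing it with r2_score, reusing the regression targets from the cells above:
from sklearn.metrics import r2_score

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

# 1 - SS_res / SS_tot = 1 - 1.5 / 29.1875, roughly 0.9486
print(r2_score(y_true, y_pred))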