Title: Calibrate Predicted Probabilities In SVC
Slug: calibrate_predicted_probabilities_in_svc
Summary: How to calibrate predicted probabilities in support vector classifier in Scikit-Learn
Date: 2017-09-22 12:00
Category: Machine Learning
Tags: Support Vector Machines
Authors: Chris Albon
Because an SVC uses a hyperplane to create decision regions, it does not naturally output a probability estimate that an observation is a member of a certain class. However, we can output calibrated class probabilities, with a few caveats. In an SVC, Platt scaling can be used: first the SVC is trained, then a separate cross-validated logistic regression is trained to map the SVC outputs into probabilities:
$$P(y=1 \mid x) = \frac{1}{1 + e^{A f(x) + B}}$$

where $A$ and $B$ are scalar parameters learned by the logistic regression and $f(x)$ is the signed distance of observation $x$ from the hyperplane. When we have more than two classes, an extension of Platt scaling is used.
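To make the two-step procedure concrete, here is a minimal binary sketch of Platt scaling done by hand. It is an illustration, not what scikit-learn runs internally. It assumes the versicolor-vs-virginica subset of iris, uses `cross_val_predict` to get out-of-fold values of $f(x)$, and approximates Platt's fit with a nearly unregularized `LogisticRegression` (Platt's original method also smooths the 0/1 targets, which this sketch skips).

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Binary problem (an assumption for this sketch): versicolor vs. virginica
iris = datasets.load_iris()
mask = iris.target > 0
X = iris.data[mask]
y = (iris.target[mask] == 2).astype(int)  # 1 = virginica, 0 = versicolor

# Step 1: out-of-fold signed distances f(x) from a linear SVC
svc = make_pipeline(StandardScaler(), SVC(kernel="linear"))
f = cross_val_predict(svc, X, y, cv=5, method="decision_function")

# Step 2: fit a sigmoid on f(x); a very large C makes the fit nearly unregularized
lr = LogisticRegression(C=1e6, max_iter=1000).fit(f.reshape(-1, 1), y)

# LogisticRegression models 1 / (1 + exp(-(w*f + b))), so A = -w and B = -b
A = -lr.coef_[0, 0]
B = -lr.intercept_[0]

# Apply the Platt formula directly: P(y=1 | x) = 1 / (1 + exp(A*f(x) + B))
p = 1.0 / (1.0 + np.exp(A * f + B))
print(f"A = {A:.3f}, B = {B:.3f}")
```

With this sign convention $A$ comes out negative, so observations farther on the positive side of the hyperplane receive probabilities closer to 1.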
In scikit-learn, the probability calibration must be fitted when the model is trained, which we request by setting SVC's `probability` parameter to `True`. This slows down training, because scikit-learn fits the calibration with an internal 5-fold cross-validation. After the model is trained, we can output the estimated probability of each class using `predict_proba`.
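A few practical consequences of `probability=True` can be checked directly. This sketch, assuming the same standardized iris setup used below, shows that `predict_proba` is only available when the flag is set, that `random_state` makes the internal cross-validation reproducible, and that `predict` (which uses the decision function, not the calibrated probabilities) is not guaranteed to agree with the most probable class. The disagreement count is printed rather than assumed.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data)
y = iris.target

# probability=True fits the internal Platt scaling during fit()
model = SVC(kernel="linear", probability=True, random_state=0).fit(X_std, y)
proba = model.predict_proba(X_std)

# Without probability=True, predict_proba is not available at all
plain = SVC(kernel="linear").fit(X_std, y)
has_proba = hasattr(plain, "predict_proba")

# random_state seeds the shuffling used for probability estimation,
# so refitting with the same seed reproduces the same probabilities
again = SVC(kernel="linear", probability=True, random_state=0).fit(X_std, y)
same = np.allclose(proba, again.predict_proba(X_std))

# predict() can disagree with the class of highest calibrated probability
most_probable = model.classes_[proba.argmax(axis=1)]
disagreements = int(np.sum(model.predict(X_std) != most_probable))
print(proba.shape, has_proba, same, disagreements)
```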
```python
# Load libraries
from sklearn.svm import SVC
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
import numpy as np
```
```python
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target
```
```python
# Standardize features
scaler = StandardScaler()
X_std = scaler.fit_transform(X)
```
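Since the classifier is trained on `X_std`, any observation scored later has to be expressed in the same standardized units. A minimal sketch of reusing the fitted scaler with `transform` (rather than refitting it), assuming a hypothetical raw measurement in centimetres that is not part of the original post:

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
scaler = StandardScaler()
X_std = scaler.fit_transform(iris.data)
model = SVC(kernel="linear", probability=True, random_state=0).fit(X_std, iris.target)

# Hypothetical raw measurement (sepal length, sepal width, petal length,
# petal width in cm); chosen for illustration only
raw_observation = np.array([[5.0, 3.4, 1.5, 0.2]])

# Reuse the scaler fitted on the training data; do not refit it on new data
std_observation = scaler.transform(raw_observation)
proba = model.predict_proba(std_observation)
print(proba)
```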
```python
# Create support vector classifier object
svc = SVC(kernel='linear', probability=True, random_state=0)

# Train classifier
model = svc.fit(X_std, y)
```
```python
# Create new observation (values are in standardized units, like X_std)
new_observation = [[.4, .4, .4, .4]]
```
```python
# View predicted probabilities
model.predict_proba(new_observation)
```
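scikit-learn also exposes Platt-style sigmoid calibration as a separate wrapper, `CalibratedClassifierCV` with `method="sigmoid"`. This is an alternative route, not the one used in this post. A minimal sketch, assuming the same standardized iris data, that calibrates a plain `SVC` (without `probability=True`) and scores the same new observation:

```python
import numpy as np
from sklearn import datasets
from sklearn.calibration import CalibratedClassifierCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = datasets.load_iris()
X_std = StandardScaler().fit_transform(iris.data)
y = iris.target

# Sigmoid (Platt-style) calibration wrapped around a plain SVC.
# For each of the 5 folds, an SVC is trained on the other folds and sigmoids
# are fitted on its held-out fold; predict_proba averages these models.
calibrated = CalibratedClassifierCV(SVC(kernel="linear"), method="sigmoid", cv=5)
calibrated.fit(X_std, y)

new_observation = [[.4, .4, .4, .4]]
proba = calibrated.predict_proba(new_observation)
print(proba)
```

For more than two classes, the wrapper fits one sigmoid per class on the SVC's one-vs-rest decision values and renormalizes so each row sums to 1. The estimator is passed positionally because the keyword was renamed from `base_estimator` to `estimator` in scikit-learn 1.2.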