In [1]:
%matplotlib inline
import warnings # avoid a bunch of warnings that we'll ignore
warnings.filterwarnings("ignore")
This example is a small modification of the scikit-learn tutorial example that compares different linear SVM classifiers on a 2D projection of the iris dataset. Here I consider only two features of the dataset.
This example shows how to plot the decision surface for four SVM classifiers with different kernels.
The linear models LinearSVC() and SVC(kernel='linear') yield slightly different decision boundaries. This can be a consequence of the following differences:
- LinearSVC minimizes the squared hinge loss while SVC minimizes the regular hinge loss.
- LinearSVC uses the One-vs-All (also known as One-vs-Rest) multiclass reduction while SVC uses the One-vs-One multiclass reduction.
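For reference, here is a minimal sketch (an addition, not part of the original example) of where these two differences surface in the scikit-learn API; the values shown are the library defaults.
from sklearn import svm
# LinearSVC minimizes the squared hinge loss by default ('squared_hinge')
# and reduces multiclass problems with a One-vs-Rest scheme.
lin = svm.LinearSVC(loss='squared_hinge', C=1.0)
# SVC with a linear kernel minimizes the regular hinge loss and handles
# multiclass problems with a One-vs-One scheme internally.
svc_lin = svm.SVC(kernel='linear', C=1.0)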
Use pandas to read the database.
In this example, I use the popular seeds dataset from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets.html), where one can find many useful academic and real-world datasets.
In [2]:
print(__doc__)
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, datasets
import pandas as pd
#I use this dataset because it has clearly separated categories.
#Read the database using pandas.
#Note that malformed lines are skipped with error_bad_lines=False
#(newer pandas versions use on_bad_lines='skip' instead).
df = pd.read_csv('https://archive.ics.uci.edu/ml/'
                 'machine-learning-databases/00236/seeds_dataset.txt',
                 header=None, sep="\t", error_bad_lines=False)
#The headers are not given in the dataset, so we give them afterwards:
#1. area A,
#2. perimeter P,
#3. compactness C = 4*pi*A/P^2,
#4. length of kernel,
#5. width of kernel,
#6. asymmetry coefficient
#7. length of kernel groove.
#8. Class: 1=Kama, 2=Rosa, 3=Canadian
df.columns = ["area","perimeter","compactness","kernel-length","kernel-width",
"asymmetry","kernel-groove-length","class"]
#This shows the first rows of the database:
df.head()
Out[2]:
We take only two features from the dataset and we standardize them.
Standardization is a common practice in machine learning algorithms to give the same weight to all features.
To standardize the values of a given feature, just use:
X_i = (X_i - M) / D
where X_i is a given entry, M is the mean of the feature and D is its standard deviation (https://en.wikipedia.org/wiki/Standard_deviation).
These functions are provided in numpy: see mean() and std().
In this particular example, there are 3 classes of seeds.
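As a quick sanity check (a small addition, not in the original notebook), the class distribution can be inspected directly from the DataFrame loaded above:
# Count how many samples belong to each seed variety
# (1=Kama, 2=Rosa, 3=Canadian)
df['class'].value_counts()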
In [3]:
import numpy as np
#Extract the class labels (1=Kama, 2=Rosa, 3=Canadian):
y = df.loc[:,'class']
#Extract only two features:
X = df.loc[:,["area","asymmetry"]]
#Convert the DataFrame into a numpy array to later standardize it
#(as_matrix() was removed from pandas; .values is the equivalent):
X = X.values
# standardize features
X_std = np.copy(X)
X_std[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
X_std[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()
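The same standardization can also be done with scikit-learn's StandardScaler; this is just an equivalent alternative sketch and is not used in the rest of the notebook:
from sklearn.preprocessing import StandardScaler
# fit_transform subtracts the per-feature mean and divides by the
# per-feature standard deviation, like the manual version above
X_std = StandardScaler().fit_transform(X)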
In [4]:
h = .02  # step size in the mesh

# we create an instance of SVM and fit our data. We do not scale our
# data since we want to plot the support vectors
C = 1.0  # SVM regularization parameter
svc = svm.SVC(kernel='linear', C=C).fit(X, y)
rbf_svc = svm.SVC(kernel='rbf', gamma=0.7, C=C).fit(X, y)
poly_svc = svm.SVC(kernel='poly', degree=3, C=C).fit(X, y)
lin_svc = svm.LinearSVC(C=C).fit(X, y)

# create a mesh to plot in
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h),
                     np.arange(y_min, y_max, h))

# title for the plots
titles = ['SVC with linear kernel',
          'LinearSVC (linear kernel)',
          'SVC with RBF kernel',
          'SVC with polynomial (degree 3) kernel']

for i, clf in enumerate((svc, lin_svc, rbf_svc, poly_svc)):
    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].
    plt.subplot(2, 2, i + 1)
    plt.subplots_adjust(wspace=0.4, hspace=0.4)

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.coolwarm, alpha=0.8)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
    plt.xlabel('Area')
    plt.ylabel('Asymmetry')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xticks(())
    plt.yticks(())
    plt.title(titles[i])

plt.show()
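The comment above mentions support vectors, although the plot does not mark them explicitly. As a small illustrative sketch (an addition, using the svc classifier fitted above), they can be highlighted through the support_vectors_ attribute:
# Overlay the support vectors of the linear SVC on the training points;
# support_vectors_ holds the samples that define the margin.
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.scatter(svc.support_vectors_[:, 0], svc.support_vectors_[:, 1],
            s=80, facecolors='none', edgecolors='k', label='support vectors')
plt.xlabel('Area')
plt.ylabel('Asymmetry')
plt.legend()
plt.show()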