(Maximal margin classifiers)
Support Vector Machines (SVMs) separate classes of data by maximizing the "space" (margin) between pairs of these groups. Classification for multiple classes is then supported by a one-vs-all method (just like we previously did with Logistic Regression for multi-class classification).
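As a quick illustrative sketch of what one-vs-rest multi-class classification with an SVM looks like in scikit-learn (this cell is not part of the lecture flow; it simply previews the Iris data we load later):
In [ ]:
# Illustrative sketch: wrap a binary SVM in an explicit one-vs-rest scheme
from sklearn import datasets
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

iris = datasets.load_iris()
X, Y = iris.data, iris.target

# One binary classifier per class, each separating that class from all the others
ovr_model = OneVsRestClassifier(SVC(kernel='linear'))
ovr_model.fit(X, Y)
print(len(ovr_model.estimators_))  # 3 underlying binary classifiers, one per class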
A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.
In which sense is the hyperplane obtained optimal? Let’s consider the following simple problem:
We'll start by imagining a situation in which we want to separate a training set containing two classes, blue and red. We plot the points in the feature space and try to place a green line that separates the two classes.
In [3]:
from IPython.display import Image
Image(url="http://docs.opencv.org/2.4/_images/separating-lines.png")
Out[3]:
In the picture above you can see that there exist multiple lines that solve the problem. Is any one of them better than the others? We can intuitively define a criterion to estimate the worth of each line:
A line is bad if it passes too close to the points, because it will be sensitive to noise and will not generalize correctly. Our goal, therefore, should be to find the line passing as far as possible from all points. The SVM algorithm is thus based on finding the hyperplane that gives the largest minimum distance to the training examples; twice this distance is called the margin in SVM theory. The optimal separating hyperplane is the one that maximizes the margin of the training data.
In [4]:
Image(url="http://docs.opencv.org/2.4/_images/optimal-hyperplane.png")
Out[4]:
In machine learning, support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data and recognize patterns, and are used for classification and regression analysis. Given a set of training examples, each marked as belonging to one of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall on.
The advantages of support vector machines include:
- They are effective in high-dimensional spaces, even when the number of dimensions is greater than the number of samples.
- They are memory efficient, since the decision function only uses a subset of the training points (the support vectors).
- They are versatile: different kernel functions can be specified for the decision function.
The disadvantages of support vector machines include:
- If the number of features is much greater than the number of samples, the method is prone to over-fitting unless the kernel and regularization term are chosen carefully.
- SVMs do not directly provide probability estimates; obtaining them requires an expensive cross-validation procedure.
So how do we actually compute that optimal hyperplane mathematically? A full explanation can be found on Wikipedia, but the core optimization problem is sketched below.
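In short (a sketch of the standard hard-margin formulation; see the Wikipedia article for the full derivation): given training points $x_i$ with labels $y_i \in \{-1, +1\}$, we look for the weight vector $w$ and bias $b$ that solve

$$\min_{w,\,b} \ \tfrac{1}{2}\|w\|^2 \quad \text{subject to} \quad y_i\,(w \cdot x_i + b) \ge 1 \ \text{ for all } i.$$

Under these constraints the closest training points lie at distance $1/\|w\|$ from the hyperplane $w \cdot x + b = 0$, so the margin is $2/\|w\|$, and minimizing $\|w\|$ is the same as maximizing the margin.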
In [10]:
#Imports
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
First we'll start by importing a data set we are already very familiar with: the Iris data set from the last lecture.
In [2]:
from sklearn import datasets
# load the iris datasets
iris = datasets.load_iris()
# Grab features (X) and the Target (Y)
X = iris.data
Y = iris.target
# Show the Built-in Data Description
print(iris.DESCR)
Now we will import SVC (Support Vector Classification) from scikit-learn's SVM library. I encourage you to check out the other SVM options in the scikit-learn documentation (a few are listed below)!
In [3]:
# Support Vector Machine Imports
from sklearn.svm import SVC
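As an aside (a quick, non-exhaustive list), sklearn.svm also contains several other SVM estimators worth exploring:
In [ ]:
# Other SVM estimators in scikit-learn (not used in this lecture):
from sklearn.svm import LinearSVC   # linear kernel only, built on liblinear, scales to larger data sets
from sklearn.svm import NuSVC       # like SVC, but parameterized by nu instead of C
from sklearn.svm import SVR         # Support Vector Regression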
In [4]:
# Create an SVC model (with the default parameters)
model = SVC()
Now we will split the data into a training set and a testing set and then train our model.
In [5]:
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
In [6]:
# Fit the model
model.fit(X_train, Y_train)
Out[6]:
Now we'll go ahead and see how well our model did!
In [7]:
from sklearn import metrics
# Get predictions
predicted = model.predict(X_test)
expected = Y_test
# Compare results
print(metrics.accuracy_score(expected, predicted))
Looks like we achieved 100% accuracy with Support Vector Classification on this particular train/test split!
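Note that accuracy on a single random split can vary from run to run. As an optional sketch (reusing the expected and predicted arrays from the cell above), the metrics module can also give us a more detailed breakdown:
In [ ]:
# Optional: per-class breakdown of the predictions
print(metrics.confusion_matrix(expected, predicted))
print(metrics.classification_report(expected, predicted))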
Now that we've gone through a basic implementation of SVM, let's go ahead and quickly explore the various kernel types we can use for classification. We can do this by plotting the decision boundaries created by each kernel type! We'll start with some imports and by setting up the data.
If we want to do non-linear classification we can employ the kernel trick. The kernel trick implicitly maps the data into a higher-dimensional feature space, where we can "slice" it with a hyperplane. For a quick illustration of what this looks like, check out both the image and the video below!
In [19]:
# Kernel Trick for the Feature Space
from IPython.display import Image
url='http://i.imgur.com/WuxyO.png'
Image(url)
Out[19]:
In [20]:
# Kernel Trick Visualization
from IPython.display import YouTubeVideo
YouTubeVideo('3liCbRZPrZA')
Out[20]:
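To see the kernel trick pay off in code (a small illustrative sketch, not part of the original lecture), we can generate data that is not linearly separable, such as scikit-learn's make_circles data set, and compare a linear kernel against an RBF kernel:
In [ ]:
# Illustrative sketch: linear vs. RBF kernel on non-linearly-separable data
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings of points -- no straight line can separate them
X_circ, Y_circ = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_model = SVC(kernel='linear').fit(X_circ, Y_circ)
rbf_model = SVC(kernel='rbf').fit(X_circ, Y_circ)

# The RBF kernel implicitly maps the points into a higher-dimensional space
# where a separating hyperplane exists
print(linear_model.score(X_circ, Y_circ))  # expect roughly chance-level accuracy
print(rbf_model.score(X_circ, Y_circ))     # expect close to 1.0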
The four models we will explore are two linear models, an SVC with a Gaussian Radial Basis Function (RBF) kernel, and an SVC with a polynomial (3rd degree) kernel.
The linear models LinearSVC() and SVC(kernel='linear') yield slightly different decision boundaries. This can be a consequence of the following differences:
- LinearSVC minimizes the squared hinge loss, while SVC minimizes the regular hinge loss.
- LinearSVC uses the one-vs-rest multi-class reduction, while SVC uses the one-vs-one multi-class reduction.
In [8]:
# Import all SVM
from sklearn import svm
# We'll use all the data and not bother with a split between training and testing. We'll also only use two features.
X = iris.data[:,:2]
Y = iris.target
# SVM regularization parameter
C = 1.0
# SVC with a Linear Kernel (our original example)
svc = svm.SVC(kernel='linear', C=C).fit(X, Y)
# SVC with a Gaussian Radial Basis Function (RBF) kernel
rbf_svc = svm.SVC(kernel='rbf', gamma=0.7, C=C).fit(X, Y)
# SVC with a 3rd degree polynomial kernel
poly_svc = svm.SVC(kernel='poly', degree=3, C=C).fit(X, Y)
# LinearSVC (linear kernel)
lin_svc = svm.LinearSVC(C=C).fit(X, Y)
Now that we have fitted the four models, we will go ahead and begin the process of setting up the visual plots. Note: this example is adapted from the scikit-learn documentation.
First we define a mesh to plot in. We set the min and max of the plot for the x and y axes from the smallest and largest feature values in the data set (padded by 1 on each side). We can then use numpy's built-in meshgrid method to construct the grid.
In [11]:
# Set the step size
h = 0.02
# X axis min and max
x_min = X[:, 0].min() - 1
x_max = X[:, 0].max() + 1
# Y axis min and max
y_min = X[:, 1].min() - 1
y_max = X[:, 1].max() + 1
# Finally, numpy can create a meshgrid
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Now we define the plot titles:
In [12]:
# title for the plots
titles = ['SVC with linear kernel',
'LinearSVC (linear kernel)',
'SVC with RBF kernel',
'SVC with polynomial (degree 3) kernel']
Finally we will go through each model, set its position as a subplot, scatter the data points, and draw a contour of the decision boundaries.
In [16]:
# Create one large figure to hold the 2 by 2 grid of subplots
plt.figure(figsize=(15, 15))

# Use enumerate for a count
for i, clf in enumerate((svc, lin_svc, rbf_svc, poly_svc)):

    # Plot the decision boundary. For that, we will assign a color to each
    # point in the mesh [x_min, x_max]x[y_min, y_max].

    # Set the subplot position (grid is 2 by 2, position defined by the i count)
    plt.subplot(2, 2, i + 1)

    # Subplot spacing
    plt.subplots_adjust(wspace=0.4, hspace=0.4)

    # Define Z as the prediction; note the use of ravel to flatten the mesh arrays
    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(xx.shape)

    # Contour plot (filled with contourf)
    plt.contourf(xx, yy, Z, cmap=plt.cm.terrain, alpha=0.5)

    # Plot also the training points
    plt.scatter(X[:, 0], X[:, 1], c=Y, cmap=plt.cm.Dark2)

    # Labels and titles
    plt.xlabel('Sepal length')
    plt.ylabel('Sepal width')
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())
    plt.xticks(())
    plt.yticks(())
    plt.title(titles[i])

plt.show()
1.) Microsoft Research Paper: SVM Tutorial
2.) scikit-learn Documentation
3.) Wikipedia