Binary Classification with Support Vector Machines

by Soeren Sonnenburg

This notebook illustrates how to train a binary support vector machine (SVM) classifier with shogun. A classifier attempts to distinguish objects of different types. In the case of binary classification there are just two types of objects that we want to distinguish.

More formally, we want to learn a decision function $f: \mathcal{X} \mapsto \{-1,+1\}$ based on a number of training examples $(x_i,y_i)_{i=0}^{N-1}$, where $x_i\in\mathcal{X}$ and $y_i\in\{-1,+1\}$. An SVM is a binary classifier that tries to separate objects of different classes by finding a (hyper-)plane such that the margin between the two classes is maximized.

A (kernel) support vector machine is then defined as $$f({\bf x})=\mbox{sign}\left(\sum_{i=0}^{N-1} \alpha_i k({\bf x}, {\bf x_i})+b\right)$$

where $N$ is the number of training examples, $\alpha_i$ are the weights assigned to the training examples, $k({\bf x},{\bf x'})$ is some kernel function and $b$ is the bias.

Using an a priori chosen kernel, the $\alpha_i$ and the bias $b$ are determined by solving the following quadratic program:

\begin{eqnarray*} \max_{\bf \alpha} && \sum_{i=0}^{N-1} \alpha_i - \frac{1}{2}\sum_{i=0}^{N-1}\sum_{j=0}^{N-1} \alpha_i y_i \alpha_j y_j k({\bf x_i}, {\bf x_j})\\ \mbox{s.t.} && 0\leq\alpha_i\leq C\\ && \sum_{i=0}^{N-1} \alpha_i y_i=0\\ \end{eqnarray*} where $C$ is a pre-specified regularization parameter trading off the size of the margin and the training error.
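
Before turning to shogun, here is a tiny self-contained numpy sketch with made-up toy numbers that simply evaluates the dual objective and the decision function above for a hand-picked feasible ${\bf \alpha}$; it is only meant to make the notation concrete and is not part of the shogun workflow.

In [ ]:
import numpy as np

# four toy points, two per class, with a linear kernel k(x,x') = <x,x'>
X = np.array([[1., 2.], [2., 1.], [-1., -2.], [-2., -1.]])
y = np.array([+1., +1., -1., -1.])
K = X.dot(X.T)

C = 1.0
alpha = np.array([0.1, 0.1, 0.1, 0.1])  # feasible: 0 <= alpha_i <= C and sum_i alpha_i y_i = 0
b = 0.0                                 # hand-picked bias, purely for illustration

# dual objective as written above
ay = alpha * y
dual = alpha.sum() - 0.5 * ay.dot(K).dot(ay)

# decision function for a new point x (here the label y_i is kept explicit;
# in the formula above the label sign is absorbed into alpha_i)
x_new = np.array([1.5, 1.5])
f = np.sign(ay.dot(X.dot(x_new)) + b)

print "dual objective:", dual
print "prediction for x_new:", f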

Now let us see how one can train a support vector machine with shogun:

In a first step we generate some data from two two-dimensional Gaussians,

$x_-\sim{\cal N}_2(0,I)-d$

$x_+\sim{\cal N}_2(0,I)+d$

where $d=(\mathrm{dist},\mathrm{dist})$ is a fixed offset, and attach the corresponding negative and positive labels. We create training and test sets with num negatively and num positively labelled examples each, stored in traindata, trainlab and testdata, testlab. For that we utilize shogun's Gaussian mixture model class (GMM), from which we sample the data points and plot them.


In [ ]:
from pylab import *   # numpy and matplotlib functions (array, scatter, imshow, ...) used below
from modshogun import *

num=1000    # number of examples per class
dist=1.0    # offset of the two Gaussian means (+/- dist in each dimension)

gmm=GMM(2)
gmm.set_nth_mean(array([-dist,-dist]),0)
gmm.set_nth_mean(array([dist,dist]),1)
gmm.set_nth_cov(array([[1.0,0.0],[0.0,1.0]]),0)
gmm.set_nth_cov(array([[1.0,0.0],[0.0,1.0]]),1)

gmm.set_coef(array([1.0,0.0]))  # sample only from the first (negative-class) component
xntr=array([gmm.sample() for i in xrange(num)]).T
xnte=array([gmm.sample() for i in xrange(num)]).T

gmm.set_coef(array([0.0,1.0]))  # sample only from the second (positive-class) component
xptr=array([gmm.sample() for i in xrange(num)]).T
xpte=array([gmm.sample() for i in xrange(num)]).T


traindata=concatenate((xntr,xptr), axis=1)
testdata=concatenate((xnte,xpte), axis=1)

trainlab=concatenate((-ones(num), ones(num)))
testlab=concatenate((-ones(num), ones(num)))
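
A quick sanity check on the sampled data (plain numpy, just to confirm that the empirical means sit near $\pm d$):

In [ ]:
print "negative training mean:", xntr.mean(axis=1)
print "positive training mean:", xptr.mean(axis=1)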

In [ ]:
_=scatter(traindata[0,:], traindata[1,:], c=trainlab, s=100)

Using this data we create shogun feature and label objects:


In [ ]:
feats_train=RealFeatures(traindata)
labels=BinaryLabels(trainlab)

Just for fun we compute the kernel matrix and display it. The nice block structure of the matrix suggests that the data will be easily separable: examples 0..999 mostly have higher kernel scores when compared to each other than when compared to examples 1000..1999, and vice versa.


In [ ]:
width=2    # width parameter of the Gaussian (RBF) kernel
kernel=GaussianKernel(feats_train, feats_train, width)

In [ ]:
km=kernel.get_kernel_matrix()
_=imshow(km)
_=colorbar()
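
To back the visual impression with numbers, here is a small check (plain numpy on the kernel matrix computed above) comparing the average kernel value within each 1000-example block to the average value between the blocks; the within-block means should come out clearly larger. The variable names are introduced just for this sketch.

In [ ]:
# negatives occupy the first num rows/columns, positives the last num (see the data construction above)
within_neg = km[:num, :num].mean()
within_pos = km[num:, num:].mean()
between = km[:num, num:].mean()
print "average kernel value within blocks:", within_neg, within_pos
print "average kernel value between blocks:", between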

Now we train an SVM with this GaussianKernel. We use LibSVM here, but we could use any of the other SVMs in shogun: they all build on the same kernel framework and so are drop-in replacements.


In [ ]:
C=1.0
svm=LibSVM(C, kernel, labels)
_=svm.train()
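
As a sketch of the drop-in claim: if your shogun build includes the optional SVMLight solver, the same problem can be trained by swapping nothing but the class name; if SVMLight was not compiled in, the class is simply not exported by modshogun, which the guard below accounts for.

In [ ]:
# SVMLight is an optional component; guard against builds that do not ship it
try:
    svm_alt = SVMLight(C, kernel, labels)
    _ = svm_alt.train()
    print "SVMLight support vectors:", svm_alt.get_num_support_vectors()
except NameError:
    print "SVMLight is not available in this shogun build"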

We could now check a number of properties like how many support vectors the trained SVM has, i.e. how many of the $\alpha_i$ above are non-zero:


In [ ]:
print svm.get_num_support_vectors()
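
We can also inspect which training examples became support vectors and what their weights are, using get_support_vectors(), get_alphas() and get_bias() on the trained machine. Note that the $\alpha_i$ stored by LibSVM already carry the label sign, matching the decision function written above.

In [ ]:
sv_idx = svm.get_support_vectors()   # indices of training examples with non-zero alpha
alphas = svm.get_alphas()            # the corresponding signed weights alpha_i
print "first few support vector indices:", sv_idx[:5]
print "first few alphas:", alphas[:5]
print "bias b:", svm.get_bias()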

We can also look at the value of the objective function returned by the particular SVM learning algorithm, as well as the explicitly computed primal and dual objective values:


In [ ]:
libsvm_obj=svm.get_objective()
primal_obj, dual_obj=svm.compute_svm_primal_objective(), svm.compute_svm_dual_objective()

print libsvm_obj, primal_obj, dual_obj

Based on these objectives we can compute the duality gap, a measure of the convergence quality of the SVM training algorithm. In theory it is 0 at the optimum, and in practice it should at least be close to 0.


In [ ]:
print "duality_gap", dual_obj-primal_obj

Let's now compute the test error:


In [ ]:
out=svm.apply(RealFeatures(testdata))

evaluator=ErrorRateMeasure()
print "Test error is %2.2f%%" % (100*evaluator.evaluate(out,BinaryLabels(testlab)))

Now let's plot the contour output on a $-5...+5$ grid for:

  1. the support vector machine's decision function $\mbox{sign}(f(x))$,
  2. the support vector machine's raw output $f(x)$,
  3. the original Gaussian mixture model distribution.

In [ ]:
size=100
x1=linspace(-5, 5, size)
x2=linspace(-5, 5, size)
x, y=meshgrid(x1, x2)
grid=RealFeatures(array((ravel(x), ravel(y))))
grid_out=svm.apply(grid)
z=grid_out.get_labels().reshape((size, size))

figure(figsize=(20,5))
subplot(131)
c=pcolor(x, y, z)
_=contour(x, y, z, linewidths=1, colors='black', hold=True)
_=colorbar(c)

z=grid_out.get_values().reshape((size, size))

subplot(132)
c=pcolor(x, y, z)
_=contour(x, y, z, linewidths=1, colors='black', hold=True)
_=colorbar(c)

subplot(133)
# evaluate the original GMM on the grid: likelihood of each component separately,
# then plot the difference of the positive- and negative-class likelihoods
gmm.set_coef(array([1.0,0.0]))
gmm.set_features(grid)
grid_out=gmm.get_likelihood_for_all_examples()
zn=grid_out.reshape((size, size))
gmm.set_coef(array([0.0,1.0]))
grid_out=gmm.get_likelihood_for_all_examples()
zp=grid_out.reshape((size, size))
z=zp-zn
c=pcolor(x, y, z)
_=contour(x, y, z, linewidths=1, colors='black', hold=True)
_=colorbar(c)

And voilà! The SVM decision rule reasonably distinguishes the red from the blue points. Despite being optimized only for the discriminative, margin-maximizing function, the SVM's raw output closely resembles the original Gaussian mixture model distribution.