The problem of data description or one-class classification is to make a description of a training set of objects and to detect which (new) objects resemble this training set.
It is widely used for outlier detection, i.e., the detection of uncharacteristic objects in a data set. Another possible application of data description is a classification problem in which one of the classes is sampled very well, while the other class is severely undersampled.
Here, the method proposed by Tax and Duin in 2004, called Support Vector Data Description (SVDD), has been implemented. This method obtains a spherically shaped boundary around the complete target set. To minimize the chance of accepting outliers, the volume of this description is minimized.
The error function to be minimized is defined as:
$$F(R, \mathbf{a}) = R^2$$with the constraint:
$$\|\Phi(\mathbf{x}_i)-\mathbf{a}\|^2 \le R^2, \hspace{3mm}\forall i,$$where $\Phi$ defines an implicit mapping of the data into another (possibly high-dimensional) feature space.
To allow for outliers in the training set, the squared distance from $\Phi(\mathbf{x}_i)$ to the center $\mathbf{a}$ is not required to be strictly smaller than $R^2$ for every object, but larger distances should be penalized. Therefore, slack variables $\xi_i \ge 0$ are introduced and the minimization problem becomes:
$$F(R, \mathbf{a}) = R^2 + C\sum_i{\xi_i}$$where the parameter $C$ controls the trade-off between the volume of the sphere and the number of errors, with the constraint that (almost) all objects are within the sphere:
$$\|\Phi(\mathbf{x}_i)-\mathbf{a}\|^2 \le R^2 + \xi_i, \hspace{3mm} \xi_i \ge 0 \hspace{3mm} \forall i.$$This constrained optimization problem can be solved using Lagrange multipliers, which leads to the following dual problem to be maximized with respect to $\alpha_i$:
$$L = \sum_i{\alpha_i(\Phi(\mathbf{x}_i)\cdot\Phi(\mathbf{x}_i))} - \sum_i\sum_j{\alpha_i \alpha_j(\Phi(\mathbf{x}_i)\cdot\Phi(\mathbf{x}_j))}$$subject to the constraints:
$$\sum_i{\alpha_i} = 1, \hspace{3mm} 0 \le \alpha_i \le C,$$while the center of the sphere is expressed as a linear combination of the mapped objects:$$\mathbf{a} = \sum_i{\alpha_i\Phi(\mathbf{x}_i)}.$$Only objects $\Phi(\mathbf{x}_i)$ with $\alpha_i > 0$ are needed in the data description; these objects are called the support vectors (SVs) of the description.
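Both the dual and these conditions follow from the Lagrangian of the original constrained problem, with multipliers $\alpha_i \ge 0$ and $\gamma_i \ge 0$:
$$L(R, \mathbf{a}, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\gamma}) = R^2 + C\sum_i{\xi_i} - \sum_i{\alpha_i\left(R^2 + \xi_i - \|\Phi(\mathbf{x}_i)-\mathbf{a}\|^2\right)} - \sum_i{\gamma_i\xi_i}.$$Setting the partial derivatives with respect to $R$, $\mathbf{a}$ and $\xi_i$ to zero gives $\sum_i{\alpha_i} = 1$, $\mathbf{a} = \sum_i{\alpha_i\Phi(\mathbf{x}_i)}$ and $C - \alpha_i - \gamma_i = 0$ (which, together with $\alpha_i, \gamma_i \ge 0$, implies $0 \le \alpha_i \le C$); resubstituting these relations into $L$ yields the dual form above.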
The inner products $(\Phi(\mathbf{x}_i)\cdot\Phi(\mathbf{x}_j))$ used above can be replaced by a kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = (\Phi(\mathbf{x}_i)\cdot\Phi(\mathbf{x}_j))$ to obtain a more flexible description, so the optimization problem becomes:
$$L = \sum_i{\alpha_iK(\mathbf{x}_i, \mathbf{x}_i)} - \sum_i\sum_j{\alpha_i \alpha_jK(\mathbf{x}_i, \mathbf{x}_j)}.$$
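A new object $\mathbf{z}$ is tested by computing its squared distance to the center of the sphere, which again requires only kernel evaluations; it is accepted as a target when
$$\|\Phi(\mathbf{z})-\mathbf{a}\|^2 = K(\mathbf{z},\mathbf{z}) - 2\sum_i{\alpha_i K(\mathbf{z},\mathbf{x}_i)} + \sum_i\sum_j{\alpha_i \alpha_j K(\mathbf{x}_i,\mathbf{x}_j)} \le R^2,$$where the sums run over the support vectors and $R^2$ is the squared distance from the center to any support vector on the boundary ($0 < \alpha_i < C$). This is why the test code below needs, besides the kernel between the test points and the support vectors, the kernel values $K(\mathbf{z},\mathbf{z})$ of the test points themselves.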
In [7]:
import cvxopt as co
import numpy as np
import matplotlib.pyplot as plt
from svdd import SVDD
from kernel import Kernel
In [8]:
# kernel parameter and type
kparam = 0.1
ktype = 'rbf'
# generate 100 samples of raw training data with dimensionality d=2
Dtrain = co.normal(2,100)
print 'Training phase started...\n'
# initialize the training kernel of the given type with the given parameter
kernel = Kernel.get_kernel(Dtrain,Dtrain,ktype,kparam)
# train svdd
svdd = SVDD(kernel,0.9)
svdd.train_dual()
print 'Training phase finished.'
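For orientation, the following is a minimal standalone sketch of what the training step presumably amounts to: building an RBF kernel matrix and solving the dual derived above as a quadratic program with cvxopt. The helpers rbf_kernel and solve_svdd_dual are hypothetical (they are not part of the kernel/svdd modules used here), and treating kparam as the RBF width $\sigma$ and 0.9 as $C$ are assumptions, not facts about those modules.

import numpy as np
import cvxopt as co
import cvxopt.solvers

def rbf_kernel(X, Y, sigma):
    # X: d x n, Y: d x m (samples in columns); returns the n x m RBF kernel matrix
    X, Y = np.asarray(X), np.asarray(Y)
    sq = np.sum(X**2, axis=0)[:, None] + np.sum(Y**2, axis=0)[None, :] - 2.0 * X.T.dot(Y)
    return np.exp(-sq / (2.0 * sigma**2))

def solve_svdd_dual(K, C):
    # maximize   sum_i a_i K_ii - sum_ij a_i a_j K_ij
    # <=> minimize (1/2) a' (2K) a - diag(K)' a   s.t.  0 <= a_i <= C,  sum_i a_i = 1
    n = K.shape[0]
    P = co.matrix(2.0 * K)
    q = co.matrix(-np.diag(K).reshape(n, 1))
    G = co.matrix(np.vstack((-np.eye(n), np.eye(n))))                 # -a_i <= 0  and  a_i <= C
    h = co.matrix(np.vstack((np.zeros((n, 1)), C * np.ones((n, 1)))))
    A = co.matrix(np.ones((1, n)))                                    # sum_i a_i = 1
    b = co.matrix(1.0)
    co.solvers.options['show_progress'] = False
    sol = co.solvers.qp(P, q, G, h, A, b)
    return np.array(sol['x']).ravel()                                 # the alphas

# hypothetical usage on the training data generated above
K = rbf_kernel(Dtrain, Dtrain, kparam)
alphas = solve_svdd_dual(K, 0.9)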
In [9]:
# generate the test data grid with 6400 samples with the same dimensionality d=2 as the training data
delta = 0.1
x = np.arange(-4.0, 4.0, delta)
y = np.arange(-4.0, 4.0, delta)
X, Y = np.meshgrid(x, y)
(sx,sy) = X.shape
Xf = np.reshape(X,(1,sx*sy))
Yf = np.reshape(Y,(1,sx*sy))
Dtest = np.append(Xf,Yf,axis=0)
In [10]:
print 'Test phase started...\n'
Dtest = co.matrix(Dtest)
# compute the support vector indices
sv_indices = svdd.get_support_dual()
# initialize the test kernel of the given type with the given parameter
kernel = Kernel.get_kernel(Dtest, Dtrain[:, sv_indices], ktype, kparam)
# for svdd we need the data norms additionally
norms = Kernel.get_diag_kernel(Dtest, ktype, kparam)
# apply the svdd
(res,state) = svdd.apply_dual(kernel,norms)
print '\nTest phase finished.'
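The decision values in res presumably correspond to the kernel expansion of the distance to the center given earlier (possibly up to the constant term and sign conventions of the implementation). As a sketch, and reusing the hypothetical rbf_kernel and alphas from the previous sketch, the same quantity can be computed directly:

import numpy as np

# squared distance of each test point z to the center a, using kernels only:
#   ||Phi(z) - a||^2 = K(z,z) - 2 * sum_i alpha_i K(z, x_i) + sum_ij alpha_i alpha_j K(x_i, x_j)
sv = np.where(alphas > 1e-8)[0]                  # support vector indices (the tolerance is a guess)
Xsv = np.asarray(Dtrain)[:, sv]                  # support vectors, one per column
K_test = rbf_kernel(Dtest, Xsv, kparam)          # kernel between test points and SVs
K_sv = rbf_kernel(Xsv, Xsv, kparam)              # kernel among the SVs
const = alphas[sv].dot(K_sv).dot(alphas[sv])     # sum_ij alpha_i alpha_j K(x_i, x_j)
self_sim = np.ones(K_test.shape[0])              # K(z,z) = 1 for the RBF kernel
dist2 = self_sim - 2.0 * K_test.dot(alphas[sv]) + const
# points with dist2 <= R^2 lie inside the description; R^2 is the squared distance
# of any boundary support vector (0 < alpha_i < C) to the center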
In [11]:
# visualize the decision values, the boundary at the learned threshold,
# the support vectors (black) and the remaining training data
Z = np.reshape(res,(sx,sy))
plt.contourf(X, Y, Z)
plt.contour(X, Y, Z, [np.array(svdd.get_threshold())[0,0]])
plt.scatter(Dtrain[0, sv_indices],Dtrain[1, sv_indices],40,c='k')
plt.scatter(Dtrain[0,:],Dtrain[1,:],10)
plt.show()