Linear Support Vector Machines

Classification Loss vs. Hinge Loss vs. Huberized Hinge Loss vs. Square Hinge Loss


In [1]:
%matplotlib nbagg
import matplotlib.pyplot as plt
plt.clf()
plt.cla()

import numpy as np

ax = plt.subplot(1,1,1)

x_plot=np.linspace(-2,2,1000)
y_plot1=x_plot.copy()
y_plot1[x_plot < 0]=1
y_plot1[x_plot == 0]=0
y_plot1[x_plot > 0]=0
plot1 = ax.plot(x_plot,y_plot1, label='Classification Loss')

y_plot2=np.maximum(np.zeros(x_plot.shape),1-x_plot.copy())
plot2 = ax.plot(x_plot,y_plot2, label='Hinge Loss')

y_plot4=np.power(np.maximum(np.zeros(x_plot.shape),1-x_plot.copy()),2)
plot4 = ax.plot(x_plot,y_plot4, label='Square Hinge Loss')

h=.8
y_plot3= -1 * np.ones(x_plot.shape)
y_plot3[x_plot > 1+h]=0
y_plot3[x_plot < 1-h]=1-x_plot[x_plot < 1-h]
y_plot3[y_plot3 == -1]= ((1+h-x_plot[y_plot3 == -1])**2)/(4*h)
plot3 = ax.plot(x_plot,y_plot3, label='Huberized Hinge Loss')

handles, labels = ax.get_legend_handles_labels()
plt.legend(handles, labels)
plt.title('Loss Comparison')
plt.ylabel('Loss')
plt.xlabel('Distance From Decision Boundary')


/Users/mich/anaconda/lib/python2.7/site-packages/IPython/kernel/__init__.py:13: ShimWarning: The `IPython.kernel` package has been deprecated. You should import from ipykernel or jupyter_client instead.
  "You should import from ipykernel or jupyter_client instead.", ShimWarning)
Out[1]:
<matplotlib.text.Text at 0x1020351d0>

Analytic Expressions

Let $X \in R^{n \times d+1}$ and $y = (y_1,...,y_n)^T \in R^{n+1}$ and $\texttt{loss}(...) \ge 0$

Objective function: $$F(w) = \|w\|^2 + \frac Cn \|\texttt{loss}(y,Xw)\|_1$$

Clearly, $$(\vec\nabla F(w))_j = 2w_j + \frac Cn \sum^{n}_{i=1}\frac{d}{d(Xw)_i}\texttt{loss}(y_i,(Xw)_i) \cdot X_{i,j} \quad \texttt{for} ~j = 1,2,...,d+1$$

For Hinge Loss:
$$l_{hinge}(y,t) := \max(0, 1 - yt)$$

Then $$\frac{d}{dt}l_{hinge}(y,t) := \begin{cases} 0, & \mbox{if } 1-yt \lt 0\\ -y, & \mbox{if } 1-yt \gt 0 \end{cases}$$ And $$F(w) = \|w\|^2 + \frac Cn \sum^n\max(0, 1 - y*(Xw))$$ And $\texttt{for j = 1,2,...,d+1}$ $$(\vec\nabla F(w))_j = \begin{cases} 2w_j, & \mbox{if } 1-y*(Xw) \lt 0\\ 2w_j + \frac Cn \sum^{n}_{i=1} -y \cdot X_{i,j}, & \mbox{if } 1-y*(Xw) \gt 0 \end{cases}$$ Where $y*(Xw) \in R^{n}$

For Square Hinge Loss:
$$l_{square-hinge}(y,t) := \max(0, 1 - yt)^2$$

Then $$\frac{d}{dt}l_{square-hinge}(y,t) := \begin{cases} 0, & \mbox{if } 1-yt \lt 0\\ 2(1 - yt)(-y), & \mbox{if } 1-yt \gt 0 \end{cases}$$ And $$F(w) = \|w\|^2 + \frac Cn \sum^n\max(0, 1 - y*(Xw))^2$$ And $\texttt{for j = 1,2,...,d+1}$ $$(\vec\nabla F(w))_j = \begin{cases} 2w_j, & \mbox{if } 1-y*(Xw) \le 0\\ 2w_j + \frac Cn \sum^{n}_{i=1} 2(1 - y*(Xw))(-y) \cdot X_{i,j}, & \mbox{if } 1-y*(Xw) \gt 0 \end{cases}$$ Where $y*(Xw) \in R^{n}$

For Huberized Hinge Loss:
$$l_{huber-hinge}(y,t) := \begin{cases} 0, & \mbox{if } yt \gt 1+h\\ \frac{(1+h-yt)^2}{4h}, & \mbox{if } |1-yt| \le h \\ 1-yt, & \mbox{if } yt \lt 1-h \end{cases}$$$$\frac{d}{dt}l_{huber-hinge}(y,t) := \begin{cases} 0, & \mbox{if } yt \gt 1+h\\ \frac{2(1+h-yt)(-y)}{4h}, & \mbox{if } |1-yt| \le h \\ -y, & \mbox{if } yt \lt 1-h \end{cases}$$

We have continuity in $\frac{d}{dt}l_{huber-hinge}(y,t)$ since the derivatives limits' agree at the critical points; $$ \lim_{t^+ \rightarrow \frac{1+h}{y} } \frac{d}{dt}l_{huber-hinge}(y,t) = 0$$ $$ \lim_{t^- \rightarrow \frac{1+h}{y} } \frac{d}{dt}l_{huber-hinge}(y,t) = \lim_{t^- \rightarrow \frac{1+h}{y} } \frac{2(1+h-yt)(-y)}{4h} = \lim_{t^- \rightarrow \frac{1+h}{y} } \frac{2(1+h-y\frac{1+h}{y})(-y)}{4h} = 0$$ So $$ \lim_{t^+ \rightarrow \frac{1+h}{y} } \frac{d}{dt}l_{huber-hinge}(y,t) = \lim_{t^- \rightarrow \frac{1+h}{y} } \frac{d}{dt}l_{huber-hinge}(y,t)$$ And $$ \lim_{t^+ \rightarrow \frac{1-h}{y} } \frac{d}{dt}l_{huber-hinge}(y,t) = \lim_{t^+ \rightarrow \frac{1-h}{y} } \frac{2(1+h-yt)(-y)}{4h} = \lim_{t^+ \rightarrow \frac{1-h}{y} } \frac{2(1+h-y\frac{1-h}{y})(-y)}{4h} = -y$$ $$\lim_{t^- \rightarrow \frac{1-h}{y} } \frac{d}{dt}l_{huber-hinge}(y,t) = \lim_{t^- \rightarrow \frac{1-h}{y} } -y = -y $$ So $$ \lim_{t^+ \rightarrow \frac{1-h}{y} } \frac{d}{dt}l_{huber-hinge}(y,t) = \lim_{t^- \rightarrow \frac{1-h}{y} } \frac{d}{dt}l_{huber-hinge}(y,t)$$

Also note for Huberized Hinge Loss: $$F(w) = \begin{cases} \|w\|^2, & \mbox{if } y*(Xw) \gt 1+h\\ \|w\|^2 + \frac Cn \sum^n_1 \frac{(1+h-y*(Xw))^2}{4h}, & \mbox{if } |1-y*(Xw)| \le h \\ \|w\|^2 + \frac Cn \sum^n_1 1-y*(Xw), & \mbox{if } y*(Xw) \lt 1-h \end{cases}$$ Where $y*(Xw) \in R^{n}$
And $\texttt{for j = 1,2,...,d+1}$ $$(\vec\nabla F(w))_j = \begin{cases} 2w_j, & \mbox{if } y*(Xw) \gt 1+h\\ 2w_j + \frac Cn \sum^n_1 \frac{2(1+h-y*(Xw))(-y)}{4h}*(Xw), & \mbox{if } |1-y*(Xw)| \le h \\ 2w_j + \frac Cn \sum^n_1 -yX_{ij}, & \mbox{if } y*(Xw) \lt 1-h \end{cases}$$ Where $y*(Xw) \in R^{n}$

SVM Implementation


In [10]:
from sklearn.datasets import make_regression
from sklearn.cross_validation import train_test_split
from sklearn.kernel_ridge import KernelRidge
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.svm import LinearSVC
from sklearn.linear_model import SGDClassifier
import matplotlib.pyplot as plt
import sys
import numpy as np
from numpy.linalg import norm
from numpy.random import randint
import random
import math
import time 


def loss_hinge(y,t): # const
    return max(0,1-(y*t))


def loss_hinge_der(y,t): # const
    if 1-(y*t) <= 0:
        return 0
    return -y


def loss_huber_hinge(y,t,h): # const
    if y*t > 1+h:
        return 0
    if abs(1-(y*t)) <= h:
        return pow((1+h-(y*t)),2)/(4*h)
    if y*t < 1-h:
        return 1 - (y*t)

def loss_huber_hinge_der(y,t,h): # const
    if y*t > 1+h:
        return 0
    if abs(1-(y*t)) <= h:
        return 2*(1+h-(y*t))*(-y)/(4*h)
    if y*t < 1-h:
        return -y


def loss(y,t,loss_type,h): # const
    if loss_type=='hinge':
        return loss_hinge(y,t)
    if loss_type=='modified_huber':
        return loss_huber_hinge(y,t,h)


def loss_der(y,t,loss_type,h): # const
    if loss_type=='hinge':
        return loss_hinge_der(y,t)
    if loss_type=='modified_huber':
        return loss_huber_hinge_der(y,t,h)


def compute_obj(w,C,X,y,loss_type,h): # const
    ret = 0.0
    assert len(X)==len(y)
    assert len(X[0])==len(w)
    for i in range(len(X)):
        ret += loss(y[i],np.dot(X[i],w),loss_type,h)
    return norm(w)**2 + C*ret/len(X)


def compute_grad(w,C,X,y,loss_type,h): # const
    if len(X)==len(w):
        grad = 2*w.copy()
        for i in range(len(w)):
            grad[i] += C*(loss_der(y,np.dot(X,w),loss_type,h)*X[i])
        return grad
    if len(X)==len(y) and len(X[0])==len(w):
        n=len(X)
        X[n-1,len(w)-1]
        grad = 2*w.copy()
        for i in range(len(w)):
            loss_sum = 0.0
            for j in range(n):
                loss_sum += \
                    loss_der(y[j],np.dot(X[j],w),loss_type,h)*X[j,i]
            grad[i] += C/n*loss_sum
        return grad
    assert False


def numer_grad(w,ep,delta,C,X,y,loss_type,h): # const
    return (compute_obj(w+(ep*delta),C,X,y,loss_type,h) \
           -compute_obj(w-(ep*delta),C,X,y,loss_type,h))/(2*ep)


def grad_checker(w0,C,X,y,loss_type,h): # const
    ep=.0001
    delta=0
    d=len(w0)
    w=[]
    for i in range(d):
        delta=np.zeros(w0.shape)
        delta[i] = 1
        w.append(numer_grad(w0,ep,delta,C,X,y,loss_type,h))
    return np.asarray(w)


def score(X, y, w): # const
    error = 0.0
    error_comp = 0.0
    for i in range(len(X)):
        prediction = np.sign(np.dot(w,X[i]))
        if prediction == 1 and y[i] == 1:
            error += 1
        elif (prediction == -1 or prediction == 0) and y[i] == -1:
            error += 1
        else:
            error_comp += 1
    return 'correct',error/len(X), 'failed',error_comp/len(X)


def my_gradient_descent(X,y,w0=None,initial_step_size=.1,max_iter=1000,C=1,
                        loss_type=None,h=.01,X_test=None, y_test=None,stochastic=False,back_track=True): # const
    tol=10**-4 # scikit learn default
    if w0 == None:
        w0 = np.zeros(len(X[0]))
    if len(X) == 0:
        return 'Error'
    diff = -1
    grad = -1
    
    w = w0
    obj_array = []

    training_error_array = []
    training_error_array.append(score(X, y, w=w))

    testing_error_array = []
    testing_error_array.append(score(X_test, y_test, w=w))
    
    w_array = []
    w_array.append(w.copy())
    
    for i in range(max_iter):
#         print 'i',i
        obj=compute_obj(w,C,X,y,loss_type,h)
#         print 'obj',obj
        obj_array.append(obj)
        w_p = w
        if stochastic:
            random_index = randint(1,len(X))
            grad = compute_grad(w,C,X[random_index],y[random_index],loss_type,h)
        else:
            grad = compute_grad(w,C,X,y,loss_type,h)
            assert norm(grad-grad_checker(w,C,X,y,loss_type,h)) < 10**-2
#         print 'grad',grad
        if norm(grad) < tol:
            break
        
        step_size = initial_step_size
        if back_track:
            while obj < compute_obj(w - (step_size * grad),C,X,y,loss_type,h):
                step_size = step_size/2.0
                if step_size < .00000001:
                    break
#         print 'step_size',step_size
        w += - step_size * grad
#         print 'w',w

        w_array.append(w.copy())
        training_error_array.append(score(X, y, w=w))
        testing_error_array.append(score(X_test, y_test, w=w))

        if training_error_array[len(training_error_array)-1][1] > 0.99:
            break
        
        diff = norm(w-w_p)
    if norm(grad) > tol:
        print 'Warning: Did not converge.'
    return w, w_array, obj_array, training_error_array, testing_error_array


def my_sgd(X,y,w0=None,step_size=.01,max_iter=1000,C=1,loss_type=None,h=.01,X_test=None, 
           y_test=None,stochastic=False): # const
    return my_gradient_descent(X_train, y_train,w0=w0,
                               loss_type=loss_type,
                               max_iter=max_iter,
                               h=h,C=C,X_test=X_test, 
                               y_test=y_test,
                               stochastic=stochastic)
    


def my_svm(X_train, y_train,loss_type=None,max_iter=None,h=None,C=None,X_test=None, y_test=None, 
           stochastic=False): # const
    w0=np.zeros(len(X_train[0]))
    if stochastic:
        w, w_array, obj_array, training_error_array, testing_error_array = my_sgd(X_train, y_train,w0=w0,
                                                                                   loss_type=loss_type,
                                                                                   max_iter=max_iter,
                                                                                   h=h,C=C,X_test=X_test, 
                                                                               y_test=y_test,stochastic=True)
    else:
        w, w_array, obj_array, training_error_array, testing_error_array = \
                        my_gradient_descent(X_train, y_train,w0=w0, loss_type=loss_type, max_iter=max_iter, h=h,C=C,
                                            X_test=X_test, y_test=y_test, stochastic=stochastic)
    return w, w_array, obj_array, training_error_array, testing_error_array

SVM Gradient Descent Usage Example


In [11]:
# Generate data
'''Generate 2 Gaussians samples with the same covariance matrix'''
n, dim = 500, 2
np.random.seed(0)
C = np.array([[0., -0.23], [0.83, .23]])
gap = 1
X = np.r_[np.dot(np.random.randn(n, dim)+gap, C),
          np.dot(np.random.randn(n, dim)-gap, C)]

# append constant dimension
X = np.column_stack((X, np.ones(X.shape[0])))

y = np.hstack((-1*np.ones(n), np.ones(n)))

assert len(X[y==-1])==len(y[y==-1]);assert len(X[y==1])==len(y[y==1]);assert len(X)==len(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, 
                            test_size=0.5, random_state=20140210)

assert len(X_train)>1;assert len(X_test)>1;assert len(X_train)==len(y_train);assert len(X_test)==len(y_test)

max_iter=1000
C=1.0

for loss_type in ['modified_huber','hinge']:
    for h_index in range(1,10):
        h=(.1*(1.1**h_index))
        print 'parameters: loss_type',loss_type,'h',h

        w, w_array, obj_array, training_error_array, testing_error_array = my_svm(X_train, y_train,
                                                                                  loss_type=loss_type,
                                                                                  max_iter=max_iter,h=h,C=C,
                                                                                  X_test=X_test, y_test=y_test,
                                                                                  stochastic=False)
        print 'Custom w =',w,' test score = ',score(X_test, y_test, w=w)

        clf = SGDClassifier(loss=loss_type, penalty="l2",alpha=1/C, fit_intercept=False)
        clf.fit(X_train, y_train);        assert clf.intercept_ == 0
        print 'SGDClassifier w = ',clf.coef_[0],' test score = ',score(X_test, y_test, 
                                                                       w=clf.coef_[0])
        clf = LinearSVC( penalty="l2",C=C, fit_intercept=False); clf.fit(X_train, y_train); assert clf.intercept_ == 0
        print 'LinearSVC w = ',clf.coef_[0],' test score = ',score(X_test, y_test, w=clf.coef_[0])
        print
        print
#         break
#     break


parameters: loss_type modified_huber h 0.11
Custom w = [-0.37883235 -0.00094899  0.00553782]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46127166  0.13011518 -0.00498563]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532443  2.73645895 -0.03684543]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.121
Custom w = [-0.37856912 -0.00089322  0.00555948]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46409046  0.13010087 -0.00243389]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.5453255   2.73645985 -0.03684486]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.1331
Custom w = [-0.37833309 -0.00086592  0.00557005]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [ -4.62116648e-01   1.29701474e-01  -1.74064845e-04]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532496  2.73645688 -0.03684494]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.14641
Custom w = [-0.37802441 -0.00082948  0.00553391]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46225198  0.12988215 -0.00107523]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532435  2.73645842 -0.03684603]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.161051
Custom w = [-0.37761211 -0.00078655  0.00556132]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46243281  0.12951609 -0.00445037]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532278  2.73645534 -0.03684396]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.1771561
Custom w = [-0.37699047 -0.00068713  0.00561157]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46352841  0.12992556 -0.00083436]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532413  2.73645783 -0.03684337]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.19487171
Custom w = [-0.37633265 -0.00056997  0.00562419]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46378754  0.12861855 -0.00085066]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532507  2.73645358 -0.03684756]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.214358881
Custom w = [-0.3754884  -0.00040932  0.00554961]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46517469  0.12948477 -0.00211966]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54531735  2.73646155 -0.03684732]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.2357947691
Custom w = [ -3.74524818e-01  -2.35690236e-04   5.43222493e-03]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46192225  0.12971754 -0.00408566]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532657  2.73644758 -0.03684356]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.11
Warning: Did not converge.
Custom w = [-0.38235703 -0.00201035  0.00694507]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56578704  0.02837218  0.0048    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532522  2.73645556 -0.03684578]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.121
Warning: Did not converge.
Custom w = [-0.38235703 -0.00201035  0.00694507]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56723627  0.02801803  0.0072    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54533053  2.73645058 -0.03685017]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.1331
Warning: Did not converge.
Custom w = [-0.38235703 -0.00201035  0.00694507]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56483093  0.02860691  0.006     ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532773  2.7364525  -0.03684688]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.14641
Warning: Did not converge.
Custom w = [-0.38235703 -0.00201035  0.00694507]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56600092  0.02817915  0.0044    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532565  2.73645387 -0.03684701]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.161051
Warning: Did not converge.
Custom w = [-0.38235703 -0.00201035  0.00694507]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56514875  0.02818257  0.0048    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.5453228   2.73645511 -0.03684426]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.1771561
Warning: Did not converge.
Custom w = [-0.38235703 -0.00201035  0.00694507]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56396978  0.02822439  0.0084    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54531477  2.73645921 -0.03685121]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.19487171
Warning: Did not converge.
Custom w = [-0.38235703 -0.00201035  0.00694507]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56518274  0.02794956  0.0044    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54533588  2.73644642 -0.0368444 ]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.214358881
Warning: Did not converge.
Custom w = [-0.38235703 -0.00201035  0.00694507]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56578985  0.02787265  0.0032    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532444  2.73645457 -0.03684487]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.2357947691
Warning: Did not converge.
Custom w = [-0.38235703 -0.00201035  0.00694507]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.5665965   0.02813209  0.0056    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532434  2.73645326 -0.03684476]  test score =  ('correct', 0.934, 'failed', 0.066)


/Users/mich/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py:121: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.

SVM Stochastic Gradient Descent Usage Example


In [12]:
# Generate data
'''Generate 2 Gaussians samples with the same covariance matrix'''
n, dim = 2*(10**7), 2
n, dim = 500, 2
np.random.seed(0)
C = np.array([[0., -0.23], [0.83, .23]])
gap = 1
X = np.r_[np.dot(np.random.randn(n, dim)+gap, C),
          np.dot(np.random.randn(n, dim)-gap, C)]

# append constant dimension
X = np.column_stack((X, np.ones(X.shape[0])))

y = np.hstack((-1*np.ones(n), np.ones(n)))

assert len(X[y==-1])==len(y[y==-1]);assert len(X[y==1])==len(y[y==1]);assert len(X)==len(y)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=20140210)

assert len(X_train)>1;assert len(X_test)>1;assert len(X_train)==len(y_train);assert len(X_test)==len(y_test)

max_iter=1000
C=1.0

for loss_type in ['modified_huber','hinge']:
    for h_index in range(1,10):
        h=(.1*(1.1**h_index))
        print 'parameters: loss_type',loss_type,'h',h
        w, w_stoch_array, obj_stoch_array, training_error_stoch_array, testing_error_stoch_array = \
        my_svm(X_train, 
            y_train, loss_type=loss_type, max_iter=max_iter,h=h,C=C, X_test=X_test, y_test=y_test,stochastic=True)

        print 'Custom w =',w,' test score = ',score(X_test, y_test, w=w)

        clf = SGDClassifier(loss=loss_type, penalty="l2",alpha=1/C, fit_intercept=False);clf.fit(X_train, y_train)
        assert clf.intercept_ == 0
        print 'SGDClassifier w = ',clf.coef_[0],' test score = ',score(X_test, y_test, w=clf.coef_[0])
        
        clf = LinearSVC( penalty="l2",C=C, fit_intercept=False)
        clf.fit(X_train, y_train)
        assert clf.intercept_ == 0
        print 'LinearSVC w = ',clf.coef_[0],' test score = ',score(X_test, y_test, w=clf.coef_[0])
        print
        print
#         break
#     break


parameters: loss_type modified_huber h 0.11
Warning: Did not converge.
Custom w = [-0.37885441 -0.00091213  0.00553882]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.4643678   0.12931546  0.00085613]  test score =  ('correct', 0.87, 'failed', 0.13)
LinearSVC w =  [-1.54531843  2.73645576 -0.03684222]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.121
Warning: Did not converge.
Custom w = [-0.3785888  -0.00085547  0.00555696]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46203111  0.12927274 -0.0046118 ]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532664  2.73645722 -0.03684584]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.1331
Warning: Did not converge.
Custom w = [-0.37835219 -0.00082919  0.0055649 ]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [ -4.63249554e-01   1.29314126e-01   2.71547916e-04]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532475  2.7364533  -0.03684598]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.14641
Warning: Did not converge.
Custom w = [-0.37804444 -0.00079358  0.00552722]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46091007  0.1301667  -0.00292313]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532601  2.73645738 -0.03684555]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.161051
Warning: Did not converge.
Custom w = [-0.37763034 -0.00075183  0.00555886]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46188417  0.13009975 -0.00293085]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532345  2.73645097 -0.03684801]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.1771561
Warning: Did not converge.
Custom w = [-0.37700592 -0.00065157  0.00561209]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [ -4.61115073e-01   1.29817001e-01  -2.92439657e-04]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532776  2.73645037 -0.03684651]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.19487171
Warning: Did not converge.
Custom w = [-0.37634854 -0.0005348   0.00562393]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46326275  0.12959528 -0.00104296]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532606  2.73645477 -0.03684623]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.214358881
Warning: Did not converge.
Custom w = [ -3.75504360e-01  -3.74103321e-04   5.54808350e-03]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46384184  0.12980073 -0.00406194]  test score =  ('correct', 0.872, 'failed', 0.128)
LinearSVC w =  [-1.54532361  2.73645138 -0.03684542]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type modified_huber h 0.2357947691
Warning: Did not converge.
Custom w = [ -3.74545898e-01  -1.92500008e-04   5.42817648e-03]  test score =  ('correct', 0.85, 'failed', 0.15)
SGDClassifier w =  [-0.46320157  0.12908163  0.00047513]  test score =  ('correct', 0.87, 'failed', 0.13)
LinearSVC w =  [-1.54532141  2.73645501 -0.03684269]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.11
Warning: Did not converge.
Custom w = [-0.38238113 -0.0019832   0.00690511]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56404019  0.02808877  0.0032    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532562  2.7364554  -0.03684625]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.121
Warning: Did not converge.
Custom w = [-0.3823708  -0.00201441  0.00690597]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.5654445   0.02821349  0.0064    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54531839  2.73645417 -0.03684516]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.1331
Warning: Did not converge.
Custom w = [-0.38238353 -0.00198222  0.00689972]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56530399  0.02813416  0.0056    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.5453018   2.73646493 -0.0368618 ]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.14641
Warning: Did not converge.
Custom w = [-0.38242033 -0.00186304  0.00690273]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56645388  0.02821714  0.0064    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532763  2.73645531 -0.03684406]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.161051
Warning: Did not converge.
Custom w = [-0.38237718 -0.00199772  0.00690331]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56741708  0.02805452  0.0052    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532638  2.73645645 -0.03684238]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.1771561
Warning: Did not converge.
Custom w = [-0.38237143 -0.00203003  0.00689138]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56698557  0.02750416  0.0064    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532434  2.73645157 -0.03684518]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.19487171
Warning: Did not converge.
Custom w = [-0.38241706 -0.00187345  0.00690275]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56486109  0.02836734  0.0064    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532401  2.73645278 -0.03684508]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.214358881
Warning: Did not converge.
Custom w = [-0.38238167 -0.00198578  0.00690149]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56453334  0.02813802  0.0072    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.54532731  2.73644959 -0.03684597]  test score =  ('correct', 0.934, 'failed', 0.066)


parameters: loss_type hinge h 0.2357947691
Warning: Did not converge.
Custom w = [-0.38238245 -0.00196879  0.00691353]  test score =  ('correct', 0.848, 'failed', 0.152)
SGDClassifier w =  [-0.56505017  0.02906051  0.0056    ]  test score =  ('correct', 0.852, 'failed', 0.148)
LinearSVC w =  [-1.5453229   2.73645335 -0.0368461 ]  test score =  ('correct', 0.934, 'failed', 0.066)


/Users/mich/anaconda/lib/python2.7/site-packages/ipykernel/__main__.py:121: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.

In [5]:
%matplotlib nbagg
plt.clf()
plt.cla()

ax = plt.subplot(1,1,1)

w_array = np.asarray(w_array)
ax.scatter(w_array[:,0],w_array[:,1],marker='^',label='Non-Stochastic')

w_stoch_array = np.asarray(w_stoch_array)
ax.scatter(w_stoch_array[:,0],w_stoch_array[:,1],marker='*',label='Stochastic')

handles, labels = ax.get_legend_handles_labels()
plt.legend(handles, labels)
plt.title('First Two Dimensions of Hyperplane over iterations')
plt.ylabel('w [1]')
plt.xlabel('w [0]')


Out[5]:
<matplotlib.text.Text at 0x107b71390>

Here we notice that non-stochastic descent walks straight to the solution, but stochastic descent randomly walks around, eventually making it.


In [6]:
%matplotlib nbagg
plt.clf()
plt.cla()

ax = plt.subplot(1,1,1)

obj_array = np.asarray(obj_array)
ax.scatter(range(1,len(obj_array)+1),obj_array,marker='^',label='Non-Stochastic')

obj_stoch_array = np.asarray(obj_stoch_array)
ax.scatter(range(1,len(obj_stoch_array)+1),obj_stoch_array,marker='*',label='Stochastic')

handles, labels = ax.get_legend_handles_labels()
plt.legend(handles, labels)
plt.title('Objective Function over iterations')
plt.ylabel('F (w)')
plt.xlabel('Iteration')


Out[6]:
<matplotlib.text.Text at 0x107b80590>

Here we notice that both methods decrease the objective function quickly. However, non-stochastic decreases faster.


In [7]:
%matplotlib nbagg
plt.clf()
plt.cla()

ax = plt.subplot(1,1,1)

training_error_array = np.asarray(training_error_array)
testing_error_array = np.asarray(testing_error_array)
ax.scatter(range(1,len(training_error_array)+1),training_error_array[:,1],marker='^',label='training error')
ax.scatter(range(1,len(testing_error_array)+1),testing_error_array[:,1],marker='*',label='testing error')

training_error_stoch_array = np.asarray(training_error_stoch_array)
testing_error_stoch_array = np.asarray(testing_error_stoch_array)
ax.scatter(range(1,len(training_error_stoch_array)+1),training_error_stoch_array[:,1],marker='+',
           label='stochastic training error')
ax.scatter(range(1,len(testing_error_stoch_array)+1),testing_error_stoch_array[:,1],marker='x',
           label='stochastic testing error')

handles, labels = ax.get_legend_handles_labels()
plt.legend(handles, labels)
plt.title('Classification Error over iterations')
plt.ylabel('Classification Error')
plt.xlabel('Iteration')


Out[7]:
<matplotlib.text.Text at 0x1080243d0>

Here we notice that training error is lower than test error, that is to be expected. And stochastic error is higher than non-stochastic error, but they both converge.


In [8]:
%matplotlib nbagg
plt.clf()
plt.cla()

ax = plt.subplot(1,1,1)

# print w
x_plot=[w[0], 0]
y_plot=[w[1], 0]
ax.plot(x_plot,y_plot)

# print clf.coef_[0]
x_plot=[clf.coef_[0][0], 0]
y_plot=[clf.coef_[0][1], 0]
ax.plot(x_plot,y_plot)

ax.scatter((X[y==0])[:,0],(X[y==0])[:,1],marker='*')
ax.scatter((X[y==1])[:,0],(X[y==1])[:,1],marker='^')

handles, labels = ax.get_legend_handles_labels()
plt.title('Data, Scikit-learn Hyperplane, and Our Own Hyperplane')
plt.ylabel('y')
plt.xlabel('x')


Out[8]:
<matplotlib.text.Text at 0x108351710>