sklearn: SVM classification

In this example we will use Optunity to optimize hyperparameters for a support vector machine classifier (SVC) in scikit-learn. We will learn a model to distinguish digits 8 and 9 in the handwritten digits data set that ships with scikit-learn (load_digits, a small MNIST-like set of 8x8 images), in two settings:

tune SVM with RBF kernel
tune SVM with RBF, polynomial or linear kernel, that is, choose the kernel function and its hyperparameters at once



In [1]:

    
import optunity
import optunity.metrics

# scikit-learn provides the SVM implementation; NumPy is used to add noise to the data
import sklearn.svm
import numpy as np

Create the data set: we use the scikit-learn digits data set (loaded with load_digits) and will build models to distinguish digits 8 and 9.



In [2]:

    
from sklearn.datasets import load_digits
digits = load_digits()
n = digits.data.shape[0]

positive_digit = 8
negative_digit = 9

positive_idx = [i for i in range(n) if digits.target[i] == positive_digit]
negative_idx = [i for i in range(n) if digits.target[i] == negative_digit]

# add some noise to the data to make it a little challenging
original_data = digits.data[positive_idx + negative_idx, ...]
data = original_data + 5 * np.random.randn(original_data.shape[0], original_data.shape[1])
labels = [True] * len(positive_idx) + [False] * len(negative_idx)
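
Because the noise above is drawn without a fixed seed, the numbers reported in the outputs of this notebook will vary from run to run. The following is a minimal sketch of a seeded variant, assuming NumPy's default_rng generator is an acceptable substitute for np.random.randn. The helper name make_noisy_data and the seed value 0 are arbitrary choices for illustration and are not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_digits


def make_noisy_data(positive_digit=8, negative_digit=9, noise_scale=5.0, seed=0):
    """Build the noisy two-class digits problem with a fixed random seed."""
    digits = load_digits()
    positive_mask = digits.target == positive_digit
    negative_mask = digits.target == negative_digit

    # Stack positives first, then negatives, mirroring the cell above.
    original = np.vstack([digits.data[positive_mask], digits.data[negative_mask]])
    labels = [True] * int(positive_mask.sum()) + [False] * int(negative_mask.sum())

    # Seeded generator: the same seed always yields the same noise.
    rng = np.random.default_rng(seed)
    noisy = original + noise_scale * rng.standard_normal(original.shape)
    return noisy, labels


data_a, labels_a = make_noisy_data(seed=0)
data_b, labels_b = make_noisy_data(seed=0)
print(data_a.shape, np.array_equal(data_a, data_b))
```

With a fixed seed, two runs of the notebook see identical data, which makes differences between tuning runs easier to attribute to the solver rather than to the noise.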

First, let's see the performance of an SVC with default hyperparameters.



In [3]:

    
# compute area under ROC curve of default parameters
@optunity.cross_validated(x=data, y=labels, num_folds=5)
def svm_default_auroc(x_train, y_train, x_test, y_test):
    model = sklearn.svm.SVC().fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    auc = optunity.metrics.roc_auc(y_test, decision_values)
    return auc

svm_default_auroc()









    Out[3]:





0.7328666183635757
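
To make explicit what the decorated function computes, here is a rough sketch of a 5-fold cross-validated AUROC written with scikit-learn only. It is not Optunity's implementation: the fold assignment here uses StratifiedKFold with a fixed seed, which is an assumption, and the data is rebuilt with seeded noise so the block is self-contained. Recent scikit-learn releases default to gamma='scale', so the value printed will likely differ from the output recorded above.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

# Rebuild a noisy 8-vs-9 problem (seeded, so the result is repeatable).
digits = load_digits()
mask = (digits.target == 8) | (digits.target == 9)
rng = np.random.default_rng(0)
x = digits.data[mask] + 5 * rng.standard_normal(digits.data[mask].shape)
y = digits.target[mask] == 8


def manual_cv_auroc(x, y, num_folds=5, seed=0):
    """Average the per-fold AUROC of a default SVC over num_folds folds."""
    folds = StratifiedKFold(n_splits=num_folds, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(x, y):
        model = SVC().fit(x[train_idx], y[train_idx])
        decision_values = model.decision_function(x[test_idx])
        scores.append(roc_auc_score(y[test_idx], decision_values))
    return float(np.mean(scores))


mean_auroc = manual_cv_auroc(x, y)
print("mean AUROC over folds: %1.3f" % mean_auroc)
```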

Tune SVC with RBF kernel

To optimize hyperparameters with Optunity, we first define the objective function: the 5-fold cross-validated area under the ROC curve. For now, let's restrict ourselves to the RBF kernel and optimize $C$ and $\gamma$.

The objective function svm_rbf_tuned_auroc() accepts $C$ and $\log_{10} \gamma$ (the logGamma argument) as arguments, so that $\gamma$ is searched on a logarithmic scale.



In [4]:

    
# we make the cross-validation decorator once, so we can reuse it later for the other tuning task
# by reusing the decorator, we get the same folds etc.
cv_decorator = optunity.cross_validated(x=data, y=labels, num_folds=5)

def svm_rbf_tuned_auroc(x_train, y_train, x_test, y_test, C, logGamma):
    model = sklearn.svm.SVC(C=C, gamma=10 ** logGamma).fit(x_train, y_train)
    decision_values = model.decision_function(x_test)
    auc = optunity.metrics.roc_auc(y_test, decision_values)
    return auc

svm_rbf_tuned_auroc = cv_decorator(svm_rbf_tuned_auroc)
# this is equivalent to the more common syntax below
# @optunity.cross_validated(x=data, y=labels, num_folds=5)
# def svm_rbf_tuned_auroc...

svm_rbf_tuned_auroc(C=1.0, logGamma=0.0)









    Out[4]:





0.5
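
An AUROC of exactly 0.5 at logGamma=0 (that is, $\gamma = 1$) is plausibly explained by the RBF kernel $\exp(-\gamma \|x - x'\|^2)$ being essentially zero between any two distinct images, so every test point receives nearly the same decision value. The back-of-the-envelope sketch below considers only the noise contribution to the squared distance (64 features, noise standard deviation 5, as in the data cell) and ignores the pixel differences themselves, which is a simplifying assumption.

```python
import math

num_features = 64      # 8x8 pixel images
noise_std = 5.0        # noise scale used when building the data

# The difference of two independent noise terms has variance 2 * noise_std**2,
# so the expected squared distance from noise alone is:
expected_sq_dist = num_features * 2 * noise_std ** 2


def rbf_kernel_value(log_gamma, sq_dist):
    """RBF kernel exp(-gamma * ||x - x'||^2) with gamma = 10 ** log_gamma."""
    return math.exp(-(10 ** log_gamma) * sq_dist)


for log_gamma in (0.0, -1.0, -3.0, -5.0):
    value = rbf_kernel_value(log_gamma, expected_sq_dist)
    print("logGamma=%5.1f  kernel value=%.3e" % (log_gamma, value))
```

Under this assumption the kernel only becomes informative for logGamma around -3 or smaller, which is consistent with where the solver later finds its optimum.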

Now we can use Optunity to find the hyperparameters that maximize AUROC.



In [5]:

    
optimal_rbf_pars, info, _ = optunity.maximize(svm_rbf_tuned_auroc, num_evals=150, C=[0, 10], logGamma=[-5, 0])
# when running this outside of IPython we can parallelize via optunity.pmap
# optimal_rbf_pars, _, _ = optunity.maximize(svm_rbf_tuned_auroc, 150, C=[0, 10], logGamma=[-5, 0], pmap=optunity.pmap)

print("Optimal parameters: " + str(optimal_rbf_pars))
print("AUROC of tuned SVM with RBF kernel: %1.3f" % info.optimum)









    



Optimal parameters: {'logGamma': -3.0716796875000005, 'C': 3.3025997497032007}
AUROC of tuned SVM with RBF kernel: 0.987
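
Two quick follow-up checks are often useful at this point: converting logGamma back to the $\gamma$ that SVC expects, and verifying that the optimum does not sit at the edge of the search box (which would suggest widening it). The sketch below uses the values printed above. The helper names relative_position and near_boundary and the 5% margin are illustrative assumptions, not part of Optunity.

```python
# Values copied from the recorded output above; a fresh run will differ.
optimal_rbf_pars = {'logGamma': -3.0716796875000005, 'C': 3.3025997497032007}
# Search box passed to optunity.maximize in the cell above.
search_box = {'C': (0.0, 10.0), 'logGamma': (-5.0, 0.0)}


def relative_position(value, lower, upper):
    """Map value to [0, 1] within its search interval (0 = lower bound)."""
    return (value - lower) / (upper - lower)


def near_boundary(pars, box, margin=0.05):
    """List the hyperparameters lying within `margin` of either bound."""
    flagged = []
    for name, (lower, upper) in box.items():
        position = relative_position(pars[name], lower, upper)
        if position < margin or position > 1.0 - margin:
            flagged.append(name)
    return flagged


gamma = 10 ** optimal_rbf_pars['logGamma']
print("gamma = %.3e" % gamma)
print("near a boundary:", near_boundary(optimal_rbf_pars, search_box))
```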

We can turn the call log into a pandas dataframe to efficiently inspect the solver trace.



In [6]:

    
import pandas
df = optunity.call_log2dataframe(info.call_log)

Let's look at the best 10 sets of hyperparameters to make sure the results are somewhat stable.



In [7]:

    
# DataFrame.sort was removed from pandas; sort_values is its replacement
df.sort_values('value', ascending=False)[:10]
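
One way to quantify "somewhat stable" is to look at how far the best rows spread in each column. The sketch below does this in plain Python for the ten best (C, logGamma, value) rows of the RBF-only run recorded in this notebook; the numbers are copied from that output, and the helper name spread is only for illustration.

```python
# (C, logGamma, value) rows copied from the top-10 table of the recorded run.
top10 = [
    (3.822811, -3.074680, 0.987413),
    (3.302600, -3.071680, 0.987413),
    (3.259690, -3.033531, 0.987252),
    (3.542839, -3.080013, 0.987237),
    (3.232732, -3.080968, 0.987237),
    (7.328411, -3.103471, 0.987237),
    (3.632562, -3.088346, 0.987237),
    (3.067660, -3.091143, 0.987237),
    (2.566381, -3.114649, 0.987237),
    (3.340268, -3.092535, 0.987237),
]


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


c_spread = spread([row[0] for row in top10])
log_gamma_spread = spread([row[1] for row in top10])
auroc_spread = spread([row[2] for row in top10])

print("spread in C:        %.3f" % c_spread)
print("spread in logGamma: %.3f" % log_gamma_spread)
print("spread in AUROC:    %.6f" % auroc_spread)
```

In this run the best rows cluster tightly in logGamma while C ranges widely, which suggests the AUROC is much more sensitive to the kernel width than to C in this region.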

Tune SVC without deciding the kernel in advance

In the previous part we chose to use an RBF kernel. Even though the RBF kernel is known to work well for a large variety of problems (and yielded a good AUROC here), our choice was somewhat arbitrary.

We will now use Optunity's conditional hyperparameter optimization feature to optimize over all kernel functions and their associated hyperparameters at once. This requires us to define the search space.



In [8]:

    
space = {'kernel': {'linear': {'C': [0, 2]},
                    'rbf': {'logGamma': [-5, 0], 'C': [0, 10]},
                    'poly': {'degree': [2, 5], 'C': [0, 5], 'coef0': [0, 2]}
                    }
         }
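
To illustrate how such a nested search space relates to the flat argument list of an objective function, the sketch below expands one kernel choice into a full set of arguments, filling hyperparameters that do not belong to the chosen kernel with None. This is a plain-Python illustration of the idea, not Optunity's internal code; the function names hyperparameter_names and expand_choice are hypothetical.

```python
space = {'kernel': {'linear': {'C': [0, 2]},
                    'rbf': {'logGamma': [-5, 0], 'C': [0, 10]},
                    'poly': {'degree': [2, 5], 'C': [0, 5], 'coef0': [0, 2]}
                    }
         }


def hyperparameter_names(space):
    """Collect every hyperparameter name used by any kernel, sorted."""
    names = set()
    for kernel_space in space['kernel'].values():
        names.update(kernel_space)
    return sorted(names)


def expand_choice(kernel, active_values, space):
    """Return all arguments for one kernel choice, None for inactive ones."""
    allowed = space['kernel'][kernel]
    unknown = set(active_values) - set(allowed)
    if unknown:
        raise ValueError("not valid for kernel %s: %s" % (kernel, sorted(unknown)))
    arguments = {name: None for name in hyperparameter_names(space)}
    arguments.update(active_values)
    arguments['kernel'] = kernel
    return arguments


rbf_arguments = expand_choice('rbf', {'C': 3.6, 'logGamma': -3.6}, space)
print(rbf_arguments)
```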

We also have to modify the objective function to cope with conditional hyperparameters. Optunity sets every hyperparameter that is irrelevant for the chosen kernel to None (e.g. degree when using an RBF kernel), and scikit-learn does not accept None for these arguments, so we must explicitly pass only the relevant ones.



In [9]:

    
def train_model(x_train, y_train, kernel, C, logGamma, degree, coef0):
    """A generic SVM training function, with arguments based on the chosen kernel."""
    if kernel == 'linear':
        model = sklearn.svm.SVC(kernel=kernel, C=C)
    elif kernel == 'poly':
        model = sklearn.svm.SVC(kernel=kernel, C=C, degree=degree, coef0=coef0)
    elif kernel == 'rbf':
        model = sklearn.svm.SVC(kernel=kernel, C=C, gamma=10 ** logGamma)
    else: 
        raise ValueError("Unknown kernel function: %s" % kernel)
    model.fit(x_train, y_train)
    return model

def svm_tuned_auroc(x_train, y_train, x_test, y_test, kernel='linear', C=0, logGamma=0, degree=0, coef0=0):
    model = train_model(x_train, y_train, kernel, C, logGamma, degree, coef0)
    decision_values = model.decision_function(x_test)
    return optunity.metrics.roc_auc(y_test, decision_values)

svm_tuned_auroc = cv_decorator(svm_tuned_auroc)
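
One possible alternative to the if/elif chain in train_model is to drop all None-valued hyperparameters and translate logGamma into gamma generically. The sketch below only builds the keyword arguments and leaves model construction to the caller. The helper name svc_kwargs is hypothetical, and the approach assumes that every remaining name other than logGamma is a valid SVC constructor argument.

```python
def svc_kwargs(kernel, **hyperparameters):
    """Build SVC keyword arguments, skipping hyperparameters set to None."""
    active = {name: value for name, value in hyperparameters.items()
              if value is not None}
    # The search space works with log10(gamma); SVC expects gamma itself.
    if 'logGamma' in active:
        active['gamma'] = 10 ** active.pop('logGamma')
    return dict(kernel=kernel, **active)


poly_kwargs = svc_kwargs('poly', C=1.0, logGamma=None, degree=3, coef0=0.5)
rbf_kwargs = svc_kwargs('rbf', C=3.6, logGamma=-3.0, degree=None, coef0=None)
print(poly_kwargs)
print(rbf_kwargs)

# A model could then be created with, for example:
# model = sklearn.svm.SVC(**rbf_kwargs).fit(x_train, y_train)
```

The explicit chain in the notebook has the advantage of rejecting unknown kernels early; the generic version trades that check for less code per kernel.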

Now we are ready to go and optimize both kernel function and associated hyperparameters!



In [10]:

    
optimal_svm_pars, info, _ = optunity.maximize_structured(svm_tuned_auroc, space, num_evals=150)
print("Optimal parameters: " + str(optimal_svm_pars))
print("AUROC of tuned SVM: %1.3f" % info.optimum)



Optimal parameters: {'kernel': 'rbf', 'C': 3.634209495387873, 'coef0': None, 'degree': None, 'logGamma': -3.6018043228483627}
AUROC of tuned SVM: 0.990

Again, we can have a look at the best sets of hyperparameters based on the call log.



In [11]:

    
df = optunity.call_log2dataframe(info.call_log)
# DataFrame.sort was removed from pandas; sort_values is its replacement
df.sort_values('value', ascending=False)









    Out[11]:






  
    
      
            C  coef0  degree  kernel   logGamma     value
147  3.806445    NaN     NaN     rbf  -3.594290  0.990134
124  3.634209    NaN     NaN     rbf  -3.601804  0.990134
144  4.350397    NaN     NaN     rbf  -3.539531  0.990128
82   5.998112    NaN     NaN     rbf  -3.611495  0.989975
75   2.245622    NaN     NaN     rbf  -3.392871  0.989965
139  4.462613    NaN     NaN     rbf  -3.391728  0.989965
111  2.832370    NaN     NaN     rbf  -3.384538  0.989965
92   5.531445    NaN     NaN     rbf  -3.378162  0.989965
121  3.299037    NaN     NaN     rbf  -3.617871  0.989818
99   2.812451    NaN     NaN     rbf  -3.547038  0.989810
129  4.212451    NaN     NaN     rbf  -3.518478  0.989809
135  3.921212    NaN     NaN     rbf  -3.422389  0.989800
90   3.050174    NaN     NaN     rbf  -3.431659  0.989800
103  3.181445    NaN     NaN     rbf  -3.525796  0.989650
93   2.714779    NaN     NaN     rbf  -3.292463  0.989641
89   2.345784    NaN     NaN     rbf  -3.313704  0.989641
149  3.995946    NaN     NaN     rbf  -3.303042  0.989641
100  3.516840    NaN     NaN     rbf  -3.664992  0.989500
119  3.745784    NaN     NaN     rbf  -3.678403  0.989500
125  4.387879    NaN     NaN     rbf  -3.486348  0.989485
24   1.914779    NaN     NaN     rbf  -3.476204  0.989484
136  5.865572    NaN     NaN     rbf  -3.226204  0.989483
80   2.583507    NaN     NaN     rbf  -3.198326  0.989482
146  5.398905    NaN     NaN     rbf  -3.459538  0.989325
102  5.558878    NaN     NaN     rbf  -3.467218  0.989325
108  2.721828    NaN     NaN     rbf  -3.463704  0.989325
98   2.255162    NaN     NaN     rbf  -3.230371  0.989324
64   1.686680    NaN     NaN     rbf  -3.240209  0.989320
140  3.965939    NaN     NaN     rbf  -3.241095  0.989320
34   2.381445    NaN     NaN     rbf  -3.242871  0.989320
...       ...    ...     ...     ...        ...       ...
68   1.608145    NaN     NaN     rbf  -2.530371  0.979475
106  5.681445    NaN     NaN     rbf  -2.526204  0.979156
50   1.477928    NaN     NaN     rbf  -2.498326  0.977076
35   2.081445    NaN     NaN     rbf  -2.459538  0.974526
15   3.014779    NaN     NaN     rbf  -2.459538  0.974526
71   1.464779    NaN     NaN     rbf  -2.451204  0.973405
49   2.239779    NaN     NaN     rbf  -2.380371  0.969723
9    4.106445    NaN     NaN     rbf  -2.380371  0.969723
53   3.648112    NaN     NaN     rbf  -2.359129  0.968756
17   0.131419    NaN     NaN  linear        NaN  0.967925
6    1.913086    NaN     NaN  linear        NaN  0.967925
26   1.726419    NaN     NaN  linear        NaN  0.967925
7    0.038086    NaN     NaN  linear        NaN  0.967925
27   0.224753    NaN     NaN  linear        NaN  0.967925
16   1.819753    NaN     NaN  linear        NaN  0.967925
37   0.318086    NaN     NaN  linear        NaN  0.967925
58   2.074811    NaN     NaN     rbf  -2.297038  0.964444
61   1.931445    NaN     NaN     rbf  -2.217871  0.960290
19   3.639779    NaN     NaN     rbf  -2.147038  0.958086
39   2.706445    NaN     NaN     rbf  -2.147038  0.958086
43   4.114779    NaN     NaN     rbf  -2.125796  0.957296
48   2.541478    NaN     NaN     rbf  -2.063704  0.954737
51   2.398112    NaN     NaN     rbf  -1.984538  0.951944
29   3.173112    NaN     NaN     rbf  -1.913704  0.942719
41   2.864779    NaN     NaN     rbf  -1.751204  0.634160
11   4.264779    NaN     NaN     rbf  -1.051204  0.500000
31   3.331445    NaN     NaN     rbf  -1.517871  0.500000
1    4.731445    NaN     NaN     rbf  -0.817871  0.500000
8    1.606445    NaN     NaN     rbf  -1.130371  0.500000
21   3.798112    NaN     NaN     rbf  -1.284538  0.500000
    
  

150 rows × 6 columns

Top 10 rows of the RBF-only call log (output of cell In [7]):

            C   logGamma     value
149  3.822811  -3.074680  0.987413
92   3.302600  -3.071680  0.987413
145  3.259690  -3.033531  0.987252
14   3.542839  -3.080013  0.987237
131  3.232732  -3.080968  0.987237
53   7.328411  -3.103471  0.987237
70   3.632562  -3.088346  0.987237
146  3.067660  -3.091143  0.987237
124  2.566381  -3.114649  0.987237
100  3.340268  -3.092535  0.987237
