Deep Learning - Part I

Theory

Two important figures from Chapter 5:

Practical

Training an SVM in scikit-learn and choosing its hyperparameters using cross-validation. We are using a polynomial kernel and are tuning the polynomial degree of the kernel:

$ \kappa(\mathbf{u}, \mathbf{v}) = (\mathbf{u}^T \mathbf{v} + c)^d $

We are using the Iris flower data set first introduced by Ronald Fisher https://en.wikipedia.org/wiki/Iris_flower_data_set which contains:

  • 50 samples
  • 4 features (Sepal length, Sepal width, Petal length, Petal width)
  • 3 classes

In [1]:
import numpy as np
import pandas as pd
from sklearn import svm, datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split

In [2]:
# load iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target

In [3]:
X[:3]


Out[3]:
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2]])

In [4]:
y[:3]


Out[4]:
array([0, 0, 0])

Randomly select 20% of the samples as test set.


In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Using cross-validation, try out $d=1,2,\ldots,20$. Use accuracy to determine the train/test error.


In [6]:
parameters = {'degree':list(range(1, 21))}
svc = svm.SVC(kernel='poly')
clf = GridSearchCV(svc, parameters, scoring='accuracy')
clf.fit(X_train, y_train)


Out[6]:
GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'degree': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring='accuracy', verbose=0)

The cross-validation results can be loaded into a pandas DataFrame. We see that the model starts overfitting for polynomial degrees $>3$.


In [7]:
pd.DataFrame(clf.cv_results_)


Out[7]:
mean_fit_time mean_score_time mean_test_score mean_train_score param_degree params rank_test_score split0_test_score split0_train_score split1_test_score split1_train_score split2_test_score split2_train_score std_fit_time std_score_time std_test_score std_train_score
0 0.000769 0.000468 0.958333 0.974996 1 {'degree': 1} 1 0.975610 0.962025 0.900 1.0 1.000000 0.962963 0.000170 0.000123 0.042432 0.017685
1 0.000608 0.000312 0.950000 0.979216 2 {'degree': 2} 3 0.951220 0.974684 0.900 1.0 1.000000 0.962963 0.000009 0.000016 0.040575 0.015456
2 0.000837 0.000414 0.958333 0.987550 3 {'degree': 3} 1 0.975610 0.987342 0.900 1.0 1.000000 0.975309 0.000204 0.000106 0.042432 0.010081
3 0.012440 0.000452 0.933333 0.991770 4 {'degree': 4} 11 0.951220 1.000000 0.900 1.0 0.948718 0.975309 0.013862 0.000049 0.023592 0.011640
4 0.016245 0.000461 0.933333 1.000000 5 {'degree': 5} 11 0.951220 1.000000 0.900 1.0 0.948718 1.000000 0.011013 0.000054 0.023592 0.000000
5 0.013771 0.000355 0.933333 1.000000 6 {'degree': 6} 11 0.951220 1.000000 0.900 1.0 0.948718 1.000000 0.010346 0.000039 0.023592 0.000000
6 0.012751 0.000335 0.941667 1.000000 7 {'degree': 7} 9 0.975610 1.000000 0.900 1.0 0.948718 1.000000 0.010838 0.000045 0.031441 0.000000
7 0.024844 0.000338 0.950000 1.000000 8 {'degree': 8} 3 0.975610 1.000000 0.925 1.0 0.948718 1.000000 0.026482 0.000057 0.020808 0.000000
8 0.035246 0.000426 0.950000 1.000000 9 {'degree': 9} 3 0.975610 1.000000 0.925 1.0 0.948718 1.000000 0.039927 0.000010 0.020808 0.000000
9 0.047215 0.000335 0.950000 1.000000 10 {'degree': 10} 3 0.975610 1.000000 0.925 1.0 0.948718 1.000000 0.058638 0.000056 0.020808 0.000000
10 0.035092 0.000331 0.950000 1.000000 11 {'degree': 11} 3 0.975610 1.000000 0.925 1.0 0.948718 1.000000 0.041816 0.000055 0.020808 0.000000
11 0.048112 0.000344 0.941667 1.000000 12 {'degree': 12} 9 0.951220 1.000000 0.925 1.0 0.948718 1.000000 0.060093 0.000079 0.011829 0.000000
12 0.082170 0.000320 0.950000 1.000000 13 {'degree': 13} 3 0.951220 1.000000 0.950 1.0 0.948718 1.000000 0.109992 0.000050 0.001021 0.000000
13 0.125413 0.000335 0.933333 1.000000 14 {'degree': 14} 11 0.902439 1.000000 0.950 1.0 0.948718 1.000000 0.170076 0.000063 0.022263 0.000000
14 0.148595 0.000335 0.916667 1.000000 15 {'degree': 15} 15 0.878049 1.000000 0.925 1.0 0.948718 1.000000 0.202966 0.000058 0.029437 0.000000
15 0.177822 0.000345 0.916667 1.000000 16 {'degree': 16} 15 0.878049 1.000000 0.925 1.0 0.948718 1.000000 0.244669 0.000050 0.029437 0.000000
16 0.190446 0.000357 0.916667 1.000000 17 {'degree': 17} 15 0.878049 1.000000 0.925 1.0 0.948718 1.000000 0.262236 0.000068 0.029437 0.000000
17 0.182485 0.000333 0.908333 1.000000 18 {'degree': 18} 18 0.853659 1.000000 0.925 1.0 0.948718 1.000000 0.251188 0.000056 0.040546 0.000000
18 0.311858 0.000340 0.908333 1.000000 19 {'degree': 19} 18 0.853659 1.000000 0.925 1.0 0.948718 1.000000 0.428678 0.000068 0.040546 0.000000
19 0.337167 0.000333 0.900000 1.000000 20 {'degree': 20} 20 0.853659 1.000000 0.900 1.0 0.948718 1.000000 0.468121 0.000058 0.038796 0.000000

Finally, train the model with lowest mean test error in cross-validation on all training data and determine the error on the test set.


In [8]:
e = clf.estimator.fit(X_train, y_train)
e


Out[8]:
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)

In [9]:
y_pred = e.predict(X_test)
accuracy_score(y_test, y_pred)


Out[9]:
1.0