Deep Learning - Part I

Theory

Two important figures from Chapter 5:

Practical

Training an SVM in scikit-learn and choosing its hyperparameters using cross-validation. We are using a polynomial kernel and are tuning the polynomial degree of the kernel:

$ \kappa(\mathbf{u}, \mathbf{v}) = (\mathbf{u}^T \mathbf{v} + c)^d $

We are using the Iris flower data set first introduced by Ronald Fisher https://en.wikipedia.org/wiki/Iris_flower_data_set which contains:

50 samples
4 features (Sepal length, Sepal width, Petal length, Petal width)
3 classes



In [1]:

    
import numpy as np
import pandas as pd
from sklearn import svm, datasets
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split



In [2]:

    
# load iris data
iris = datasets.load_iris()
X = iris.data
y = iris.target



In [3]:

    
X[:3]









    Out[3]:





array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2]])



In [4]:

    
y[:3]









    Out[4]:





array([0, 0, 0])

Randomly select 20% of the samples as test set.



In [5]:

    
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Using cross-validation, try out $d=1,2,\ldots,20$. Use accuracy to determine the train/test error.



In [6]:

    
parameters = {'degree':list(range(1, 21))}
svc = svm.SVC(kernel='poly')
clf = GridSearchCV(svc, parameters, scoring='accuracy')
clf.fit(X_train, y_train)









    Out[6]:





GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'degree': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]},
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring='accuracy', verbose=0)

The cross-validation results can be loaded into a pandas DataFrame. We see that the model starts overfitting for polynomial degrees $>3$.



In [7]:

    
pd.DataFrame(clf.cv_results_)









    Out[7]:







  
    
      
      mean_fit_time
      mean_score_time
      mean_test_score
      mean_train_score
      param_degree
      params
      rank_test_score
      split0_test_score
      split0_train_score
      split1_test_score
      split1_train_score
      split2_test_score
      split2_train_score
      std_fit_time
      std_score_time
      std_test_score
      std_train_score
    
  
  
    
      0
      0.000769
      0.000468
      0.958333
      0.974996
      1
      {'degree': 1}
      1
      0.975610
      0.962025
      0.900
      1.0
      1.000000
      0.962963
      0.000170
      0.000123
      0.042432
      0.017685
    
    
      1
      0.000608
      0.000312
      0.950000
      0.979216
      2
      {'degree': 2}
      3
      0.951220
      0.974684
      0.900
      1.0
      1.000000
      0.962963
      0.000009
      0.000016
      0.040575
      0.015456
    
    
      2
      0.000837
      0.000414
      0.958333
      0.987550
      3
      {'degree': 3}
      1
      0.975610
      0.987342
      0.900
      1.0
      1.000000
      0.975309
      0.000204
      0.000106
      0.042432
      0.010081
    
    
      3
      0.012440
      0.000452
      0.933333
      0.991770
      4
      {'degree': 4}
      11
      0.951220
      1.000000
      0.900
      1.0
      0.948718
      0.975309
      0.013862
      0.000049
      0.023592
      0.011640
    
    
      4
      0.016245
      0.000461
      0.933333
      1.000000
      5
      {'degree': 5}
      11
      0.951220
      1.000000
      0.900
      1.0
      0.948718
      1.000000
      0.011013
      0.000054
      0.023592
      0.000000
    
    
      5
      0.013771
      0.000355
      0.933333
      1.000000
      6
      {'degree': 6}
      11
      0.951220
      1.000000
      0.900
      1.0
      0.948718
      1.000000
      0.010346
      0.000039
      0.023592
      0.000000
    
    
      6
      0.012751
      0.000335
      0.941667
      1.000000
      7
      {'degree': 7}
      9
      0.975610
      1.000000
      0.900
      1.0
      0.948718
      1.000000
      0.010838
      0.000045
      0.031441
      0.000000
    
    
      7
      0.024844
      0.000338
      0.950000
      1.000000
      8
      {'degree': 8}
      3
      0.975610
      1.000000
      0.925
      1.0
      0.948718
      1.000000
      0.026482
      0.000057
      0.020808
      0.000000
    
    
      8
      0.035246
      0.000426
      0.950000
      1.000000
      9
      {'degree': 9}
      3
      0.975610
      1.000000
      0.925
      1.0
      0.948718
      1.000000
      0.039927
      0.000010
      0.020808
      0.000000
    
    
      9
      0.047215
      0.000335
      0.950000
      1.000000
      10
      {'degree': 10}
      3
      0.975610
      1.000000
      0.925
      1.0
      0.948718
      1.000000
      0.058638
      0.000056
      0.020808
      0.000000
    
    
      10
      0.035092
      0.000331
      0.950000
      1.000000
      11
      {'degree': 11}
      3
      0.975610
      1.000000
      0.925
      1.0
      0.948718
      1.000000
      0.041816
      0.000055
      0.020808
      0.000000
    
    
      11
      0.048112
      0.000344
      0.941667
      1.000000
      12
      {'degree': 12}
      9
      0.951220
      1.000000
      0.925
      1.0
      0.948718
      1.000000
      0.060093
      0.000079
      0.011829
      0.000000
    
    
      12
      0.082170
      0.000320
      0.950000
      1.000000
      13
      {'degree': 13}
      3
      0.951220
      1.000000
      0.950
      1.0
      0.948718
      1.000000
      0.109992
      0.000050
      0.001021
      0.000000
    
    
      13
      0.125413
      0.000335
      0.933333
      1.000000
      14
      {'degree': 14}
      11
      0.902439
      1.000000
      0.950
      1.0
      0.948718
      1.000000
      0.170076
      0.000063
      0.022263
      0.000000
    
    
      14
      0.148595
      0.000335
      0.916667
      1.000000
      15
      {'degree': 15}
      15
      0.878049
      1.000000
      0.925
      1.0
      0.948718
      1.000000
      0.202966
      0.000058
      0.029437
      0.000000
    
    
      15
      0.177822
      0.000345
      0.916667
      1.000000
      16
      {'degree': 16}
      15
      0.878049
      1.000000
      0.925
      1.0
      0.948718
      1.000000
      0.244669
      0.000050
      0.029437
      0.000000
    
    
      16
      0.190446
      0.000357
      0.916667
      1.000000
      17
      {'degree': 17}
      15
      0.878049
      1.000000
      0.925
      1.0
      0.948718
      1.000000
      0.262236
      0.000068
      0.029437
      0.000000
    
    
      17
      0.182485
      0.000333
      0.908333
      1.000000
      18
      {'degree': 18}
      18
      0.853659
      1.000000
      0.925
      1.0
      0.948718
      1.000000
      0.251188
      0.000056
      0.040546
      0.000000
    
    
      18
      0.311858
      0.000340
      0.908333
      1.000000
      19
      {'degree': 19}
      18
      0.853659
      1.000000
      0.925
      1.0
      0.948718
      1.000000
      0.428678
      0.000068
      0.040546
      0.000000
    
    
      19
      0.337167
      0.000333
      0.900000
      1.000000
      20
      {'degree': 20}
      20
      0.853659
      1.000000
      0.900
      1.0
      0.948718
      1.000000
      0.468121
      0.000058
      0.038796
      0.000000

Finally, train the model with lowest mean test error in cross-validation on all training data and determine the error on the test set.



In [8]:

    
e = clf.estimator.fit(X_train, y_train)
e









    Out[8]:





SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='poly',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False)



In [9]:

    
y_pred = e.predict(X_test)
accuracy_score(y_test, y_pred)









    Out[9]:





1.0

	mean_fit_time	mean_score_time	mean_test_score	mean_train_score	param_degree	params	rank_test_score	split0_test_score	split0_train_score	split1_test_score	split1_train_score	split2_test_score	split2_train_score	std_fit_time	std_score_time	std_test_score	std_train_score
0	0.000769	0.000468	0.958333	0.974996	1	{'degree': 1}	1	0.975610	0.962025	0.900	1.0	1.000000	0.962963	0.000170	0.000123	0.042432	0.017685
1	0.000608	0.000312	0.950000	0.979216	2	{'degree': 2}	3	0.951220	0.974684	0.900	1.0	1.000000	0.962963	0.000009	0.000016	0.040575	0.015456
2	0.000837	0.000414	0.958333	0.987550	3	{'degree': 3}	1	0.975610	0.987342	0.900	1.0	1.000000	0.975309	0.000204	0.000106	0.042432	0.010081
3	0.012440	0.000452	0.933333	0.991770	4	{'degree': 4}	11	0.951220	1.000000	0.900	1.0	0.948718	0.975309	0.013862	0.000049	0.023592	0.011640
4	0.016245	0.000461	0.933333	1.000000	5	{'degree': 5}	11	0.951220	1.000000	0.900	1.0	0.948718	1.000000	0.011013	0.000054	0.023592	0.000000
5	0.013771	0.000355	0.933333	1.000000	6	{'degree': 6}	11	0.951220	1.000000	0.900	1.0	0.948718	1.000000	0.010346	0.000039	0.023592	0.000000
6	0.012751	0.000335	0.941667	1.000000	7	{'degree': 7}	9	0.975610	1.000000	0.900	1.0	0.948718	1.000000	0.010838	0.000045	0.031441	0.000000
7	0.024844	0.000338	0.950000	1.000000	8	{'degree': 8}	3	0.975610	1.000000	0.925	1.0	0.948718	1.000000	0.026482	0.000057	0.020808	0.000000
8	0.035246	0.000426	0.950000	1.000000	9	{'degree': 9}	3	0.975610	1.000000	0.925	1.0	0.948718	1.000000	0.039927	0.000010	0.020808	0.000000
9	0.047215	0.000335	0.950000	1.000000	10	{'degree': 10}	3	0.975610	1.000000	0.925	1.0	0.948718	1.000000	0.058638	0.000056	0.020808	0.000000
10	0.035092	0.000331	0.950000	1.000000	11	{'degree': 11}	3	0.975610	1.000000	0.925	1.0	0.948718	1.000000	0.041816	0.000055	0.020808	0.000000
11	0.048112	0.000344	0.941667	1.000000	12	{'degree': 12}	9	0.951220	1.000000	0.925	1.0	0.948718	1.000000	0.060093	0.000079	0.011829	0.000000
12	0.082170	0.000320	0.950000	1.000000	13	{'degree': 13}	3	0.951220	1.000000	0.950	1.0	0.948718	1.000000	0.109992	0.000050	0.001021	0.000000
13	0.125413	0.000335	0.933333	1.000000	14	{'degree': 14}	11	0.902439	1.000000	0.950	1.0	0.948718	1.000000	0.170076	0.000063	0.022263	0.000000
14	0.148595	0.000335	0.916667	1.000000	15	{'degree': 15}	15	0.878049	1.000000	0.925	1.0	0.948718	1.000000	0.202966	0.000058	0.029437	0.000000
15	0.177822	0.000345	0.916667	1.000000	16	{'degree': 16}	15	0.878049	1.000000	0.925	1.0	0.948718	1.000000	0.244669	0.000050	0.029437	0.000000
16	0.190446	0.000357	0.916667	1.000000	17	{'degree': 17}	15	0.878049	1.000000	0.925	1.0	0.948718	1.000000	0.262236	0.000068	0.029437	0.000000
17	0.182485	0.000333	0.908333	1.000000	18	{'degree': 18}	18	0.853659	1.000000	0.925	1.0	0.948718	1.000000	0.251188	0.000056	0.040546	0.000000
18	0.311858	0.000340	0.908333	1.000000	19	{'degree': 19}	18	0.853659	1.000000	0.925	1.0	0.948718	1.000000	0.428678	0.000068	0.040546	0.000000
19	0.337167	0.000333	0.900000	1.000000	20	{'degree': 20}	20	0.853659	1.000000	0.900	1.0	0.948718	1.000000	0.468121	0.000058	0.038796	0.000000