超参优化

在机器学习的上下文中,超参数是在开始学习过程之前设置值的参数,而不是通过训练得到的参数数据.通常情况下,需要对超参数进行优化,给学习机选择一组最优超参数,以提高学习的性能和效果.

通过搜索超参数空间以便获得最好交叉验证分数是可能的而且是值得提倡的.

scikit-learn包中提供了两种采样搜索候选的通用方法:

  • 对于给定的值,网格搜索GridSearchCV考虑了所有参数组合

  • 随机搜索RandomizedSearchCV可以从具有指定分布的参数空间中抽取给定数量的候选

sklearn提供了如下相关接口:

  • model_selection.GridSearchCV(estimator, …)|网格搜索
  • model_selection.ParameterGrid(param_grid)|按网格穷举参数
  • model_selection.RandomizedSearchCV(…[, …])|使用随机搜索搜索超参
  • model_selection.ParameterSampler(…[, …])|通过指定的分布产生超参的生成器
  • model_selection.fit_grid_point(X, y, …[, …])|搜索器训练

而且GridSearchCVRandomizedSearchCV 可以通过使用关键字 n_jobs 可以使计算并行运.为-1表示有几个核跑几个进程

GridSearch

简单说就是在范围内穷举超参的组合


In [5]:
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV,ParameterGrid,ParameterSampler,RandomizedSearchCV,fit_grid_point

In [1]:
param_grid = [
  {'C': [1, 10, 100, 1000], 'kernel': ['linear']},
  {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']},
 ]

In [4]:
list(ParameterGrid(param_grid))


Out[4]:
[{'C': 1, 'kernel': 'linear'},
 {'C': 10, 'kernel': 'linear'},
 {'C': 100, 'kernel': 'linear'},
 {'C': 1000, 'kernel': 'linear'},
 {'C': 1, 'gamma': 0.001, 'kernel': 'rbf'},
 {'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'},
 {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'},
 {'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'},
 {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'},
 {'C': 100, 'gamma': 0.0001, 'kernel': 'rbf'},
 {'C': 1000, 'gamma': 0.001, 'kernel': 'rbf'},
 {'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'}]

In [6]:
iris = datasets.load_iris()

In [7]:
svc = svm.SVC()
clf = GridSearchCV(svc, param_grid)
clf.fit(iris.data, iris.target)


Out[7]:
GridSearchCV(cv=None, error_score='raise',
       estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
       fit_params={}, iid=True, n_jobs=1,
       param_grid=[{'C': [1, 10, 100, 1000], 'kernel': ['linear']}, {'C': [1, 10, 100, 1000], 'gamma': [0.001, 0.0001], 'kernel': ['rbf']}],
       pre_dispatch='2*n_jobs', refit=True, return_train_score=True,
       scoring=None, verbose=0)

In [9]:
for i,v in clf.cv_results_.items():
    print(i)
    print(v)


split0_test_score
[ 1.          1.          1.          1.          0.90196078  0.90196078
  0.94117647  0.90196078  0.98039216  0.94117647  1.          0.98039216]
split1_test_score
[ 0.96078431  0.92156863  0.92156863  0.92156863  0.90196078  0.90196078
  0.92156863  0.90196078  0.96078431  0.92156863  0.96078431  0.96078431]
split2_test_score
[ 0.97916667  1.          0.97916667  1.          0.9375      0.9375
  0.97916667  0.9375      0.97916667  0.97916667  1.          0.97916667]
mean_test_score
[ 0.98        0.97333333  0.96666667  0.97333333  0.91333333  0.91333333
  0.94666667  0.91333333  0.97333333  0.94666667  0.98666667  0.97333333]
std_test_score
[ 0.01617914  0.03715363  0.03345566  0.03715363  0.0165782   0.0165782
  0.02371536  0.0165782   0.00902067  0.02371536  0.01857681  0.00902067]
rank_test_score
[ 2  3  7  3 10 10  8 10  3  8  1  3]
split0_train_score
[ 0.97979798  0.95959596  0.96969697  0.97979798  0.91919192  0.91919192
  0.93939394  0.91919192  0.96969697  0.93939394  0.96969697  0.96969697]
split1_train_score
[ 1.          1.          1.          1.          0.94949495  0.94949495
  0.96969697  0.94949495  0.98989899  0.96969697  1.          0.98989899]
split2_train_score
[ 0.99019608  0.98039216  0.98039216  0.98039216  0.89215686  0.89215686
  0.91176471  0.89215686  0.96078431  0.91176471  0.99019608  0.96078431]
mean_train_score
[ 0.98999802  0.97999604  0.98336304  0.98673005  0.92028124  0.92028124
  0.9402852   0.92028124  0.97346009  0.9402852   0.98663102  0.97346009]
std_train_score
[ 0.00824863  0.01649726  0.01254825  0.00938641  0.02342085  0.02342085
  0.02365914  0.02342085  0.01218023  0.02365914  0.01262539  0.01218023]
mean_fit_time
[ 0.00033339  0.          0.00033331  0.0006667   0.0006667   0.00066678
  0.00066678  0.00099993  0.00033347  0.00033331  0.00033331  0.00100017]
std_fit_time
[  4.71482745e-04   0.00000000e+00   4.71370354e-04   4.71426560e-04
   4.71426560e-04   4.71482745e-04   4.71482745e-04   0.00000000e+00
   4.71595137e-04   4.71370354e-04   4.71370354e-04   1.94667955e-07]
mean_score_time
[ 0.          0.00033339  0.          0.          0.          0.00033339
  0.00033331  0.00033339  0.00033339  0.00033331  0.          0.        ]
std_score_time
[ 0.          0.00047148  0.          0.          0.          0.00047148
  0.00047137  0.00047148  0.00047148  0.00047137  0.          0.        ]
param_C
[1 10 100 1000 1 1 10 10 100 100 1000 1000]
param_kernel
['linear' 'linear' 'linear' 'linear' 'rbf' 'rbf' 'rbf' 'rbf' 'rbf' 'rbf'
 'rbf' 'rbf']
param_gamma
[-- -- -- -- 0.001 0.0001 0.001 0.0001 0.001 0.0001 0.001 0.0001]
params
({'C': 1, 'kernel': 'linear'}, {'C': 10, 'kernel': 'linear'}, {'C': 100, 'kernel': 'linear'}, {'C': 1000, 'kernel': 'linear'}, {'C': 1, 'gamma': 0.001, 'kernel': 'rbf'}, {'C': 1, 'gamma': 0.0001, 'kernel': 'rbf'}, {'C': 10, 'gamma': 0.001, 'kernel': 'rbf'}, {'C': 10, 'gamma': 0.0001, 'kernel': 'rbf'}, {'C': 100, 'gamma': 0.001, 'kernel': 'rbf'}, {'C': 100, 'gamma': 0.0001, 'kernel': 'rbf'}, {'C': 1000, 'gamma': 0.001, 'kernel': 'rbf'}, {'C': 1000, 'gamma': 0.0001, 'kernel': 'rbf'})

RandomizedSearch

随机搜索如其名就是在范围内随机的搜索生成参数.它同样需要一个范围,但它也支持参数为一个scipy定义的分布.


In [13]:
import scipy
import numpy as np

In [14]:
np.random.seed(0)
param_grid = {'C': scipy.stats.expon(scale=100), 'gamma': scipy.stats.expon(scale=.1),
  'kernel': ['rbf'], 'class_weight':['balanced', None]}

In [15]:
list(ParameterSampler(param_grid, n_iter=4))


Out[15]:
[{'C': 79.587450816311005,
  'class_weight': None,
  'gamma': 0.18596042409118513,
  'kernel': 'rbf'},
 {'C': 195.1545320209259,
  'class_weight': None,
  'gamma': 0.055104849109549929,
  'kernel': 'rbf'},
 {'C': 103.81592949436094,
  'class_weight': 'balanced',
  'gamma': 0.035315914097266164,
  'kernel': 'rbf'},
 {'C': 5.8384670780703338,
  'class_weight': 'balanced',
  'gamma': 0.048360210090225335,
  'kernel': 'rbf'}]

In [16]:
clf =RandomizedSearchCV(svc, param_grid)
clf.fit(iris.data, iris.target)


Out[16]:
RandomizedSearchCV(cv=None, error_score='raise',
          estimator=SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape=None, degree=3, gamma='auto', kernel='rbf',
  max_iter=-1, probability=False, random_state=None, shrinking=True,
  tol=0.001, verbose=False),
          fit_params={}, iid=True, n_iter=10, n_jobs=1,
          param_distributions={'C': <scipy.stats._distn_infrastructure.rv_frozen object at 0x0000000017362390>, 'gamma': <scipy.stats._distn_infrastructure.rv_frozen object at 0x00000000173624E0>, 'kernel': ['rbf'], 'class_weight': ['balanced', None]},
          pre_dispatch='2*n_jobs', random_state=None, refit=True,
          return_train_score=True, scoring=None, verbose=0)

In [17]:
for i,v in clf.cv_results_.items():
    print(i)
    print(v)


split0_test_score
[ 1.          0.98039216  1.          1.          0.98039216  0.98039216
  0.96078431  1.          1.          0.98039216]
split1_test_score
[ 0.90196078  0.94117647  0.90196078  0.90196078  0.94117647  0.94117647
  0.96078431  0.90196078  0.90196078  0.94117647]
split2_test_score
[ 1.          0.97916667  1.          1.          1.          1.
  0.97916667  0.97916667  1.          1.        ]
mean_test_score
[ 0.96666667  0.96666667  0.96666667  0.96666667  0.97333333  0.97333333
  0.96666667  0.96        0.96666667  0.97333333]
std_test_score
[ 0.04644204  0.01830211  0.04644204  0.04644204  0.02441472  0.02441472
  0.00857493  0.04250721  0.04644204  0.02441472]
rank_test_score
[ 4  4  4  4  1  1  4 10  4  1]
split0_train_score
[ 0.97979798  0.96969697  0.96969697  0.95959596  0.95959596  0.95959596
  0.94949495  0.96969697  0.96969697  0.95959596]
split1_train_score
[ 1.          1.          1.          1.          1.          1.
  0.98989899  1.          1.          1.        ]
split2_train_score
[ 0.97058824  0.97058824  0.98039216  0.98039216  0.98039216  0.98039216
  0.96078431  0.98039216  0.98039216  0.98039216]
mean_train_score
[ 0.98346207  0.98009507  0.98336304  0.97999604  0.97999604  0.97999604
  0.96672608  0.98336304  0.98336304  0.97999604]
std_train_score
[ 0.01228365  0.01407961  0.01254825  0.01649726  0.01649726  0.01649726
  0.01702156  0.01254825  0.01254825  0.01649726]
mean_fit_time
[ 0.00033339  0.0006667   0.00033339  0.00033331  0.0006667   0.00033339
  0.00033339  0.00033339  0.0006667   0.00099993]
std_fit_time
[ 0.00047148  0.00047143  0.00047148  0.00047137  0.00047143  0.00047148
  0.00047148  0.00047148  0.00047143  0.        ]
mean_score_time
[ 0.00033331  0.          0.          0.          0.          0.00033331
  0.00033331  0.          0.          0.        ]
std_score_time
[ 0.00047137  0.          0.          0.          0.          0.00047137
  0.00047137  0.          0.          0.        ]
param_C
[156.88961399691684 7.3685354912847876 150.57842318652155
 61.892945861296852 15.47296827276452 30.728035309045374 1.8968571662383891
 287.79150787410231 119.61078011333701 23.620670533400776]
param_class_weight
[None None None None None None None None None None]
param_gamma
[0.065388256909147333 0.10446124955007757 0.20408922720361231
 0.11359389306783295 0.14213646962011073 0.13352901816120605
 0.016213649344010039 0.059774607745513735 0.35002136562021868
 0.13890426992173752]
param_kernel
['rbf' 'rbf' 'rbf' 'rbf' 'rbf' 'rbf' 'rbf' 'rbf' 'rbf' 'rbf']
params
({'C': 156.88961399691684, 'class_weight': None, 'gamma': 0.065388256909147333, 'kernel': 'rbf'}, {'C': 7.3685354912847876, 'class_weight': None, 'gamma': 0.10446124955007757, 'kernel': 'rbf'}, {'C': 150.57842318652155, 'class_weight': None, 'gamma': 0.20408922720361231, 'kernel': 'rbf'}, {'C': 61.892945861296852, 'class_weight': None, 'gamma': 0.11359389306783295, 'kernel': 'rbf'}, {'C': 15.47296827276452, 'class_weight': None, 'gamma': 0.14213646962011073, 'kernel': 'rbf'}, {'C': 30.728035309045374, 'class_weight': None, 'gamma': 0.13352901816120605, 'kernel': 'rbf'}, {'C': 1.8968571662383891, 'class_weight': None, 'gamma': 0.016213649344010039, 'kernel': 'rbf'}, {'C': 287.79150787410231, 'class_weight': None, 'gamma': 0.059774607745513735, 'kernel': 'rbf'}, {'C': 119.61078011333701, 'class_weight': None, 'gamma': 0.35002136562021868, 'kernel': 'rbf'}, {'C': 23.620670533400776, 'class_weight': None, 'gamma': 0.13890426992173752, 'kernel': 'rbf'})