Hyperopt-Sklearn Intro

Easiest possible thing

As an ML researcher, I want a quick way to do model selection implicitly, in order to get a baseline accuracy score for a new data set.



# Skdata-based code
import skdata.iris.view
from skdata.base import SklearnClassifier
from hpsklearn.estimator import HyperoptEstimatorFactory

view = skdata.iris.view.KfoldClassification(5)
algo = SklearnClassifier(
    HyperoptEstimatorFactory(
        max_iter=25,  # -- consider also a time-based budget
    ))
mean_test_error = view.protocol(algo)
print('mean test error:', mean_test_error)
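
To see what this baseline measures without skdata or hpsklearn, here is a minimal sketch in plain scikit-learn. It assumes scikit-learn and NumPy are installed, and it is not how HyperoptEstimatorFactory works internally. Plain random search stands in for hyperopt's search algorithm. The two-family search space (SVC or k-nearest neighbours) and the helper names `sample_candidate`, `select_model` and `kfold_mean_test_error` are hypothetical. Model selection runs only on each outer training split, so the held-out fold never influences which model is chosen.

```python
import random

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def sample_candidate(rng):
    """Draw one (model family, hyperparameters) point from a toy search space."""
    if rng.random() < 0.5:
        # Log-uniform draws for the SVM's regularisation and kernel width.
        return SVC(C=10 ** rng.uniform(-2, 2), gamma=10 ** rng.uniform(-3, 1))
    return KNeighborsClassifier(n_neighbors=rng.randint(1, 15))


def select_model(X, y, max_iter=25, seed=0):
    """Random search: keep the unfitted candidate with the best inner-CV accuracy."""
    rng = random.Random(seed)
    best, best_score = None, -np.inf
    for _ in range(max_iter):
        candidate = sample_candidate(rng)
        # cross_val_score fits clones, so `candidate` itself stays unfitted.
        score = cross_val_score(candidate, X, y, cv=3).mean()
        if score > best_score:
            best, best_score = candidate, score
    return best


def kfold_mean_test_error(X, y, n_folds=5, max_iter=25):
    """Outer k-fold loop: select on the training folds, score on the held-out fold."""
    errors = []
    outer = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    for train_idx, test_idx in outer.split(X):
        model = select_model(X[train_idx], y[train_idx], max_iter=max_iter)
        model.fit(X[train_idx], y[train_idx])
        errors.append(1.0 - model.score(X[test_idx], y[test_idx]))
    return float(np.mean(errors))


X, y = load_iris(return_X_y=True)
mean_test_error = kfold_mean_test_error(X, y)
print('mean test error:', mean_test_error)
```

Keeping selection inside the outer loop costs roughly 25 × 3 fits per outer fold. In return, the reported error is an honest estimate of the whole select-then-train procedure, which is what a baseline should measure.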

As an ML researcher, I want to evaluate a certain partly-defined model class, in order to do model-family comparisons. For example, PCA followed by SVM.



from hpsklearn.components import svc, pca

algo_pca_svm = SklearnClassifier(
    HyperoptEstimatorFactory(
        max_iter=25,  # -- consider also a time-based budget
        preprocessing=[pca('pca')],
        classifier=svc('svc')))
mean_test_error = view.protocol(algo_pca_svm)
print('mean test error:', mean_test_error)
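
The same idea can be sketched in plain scikit-learn, assuming scikit-learn and SciPy are available. Each model family is a fixed Pipeline, and only its hyperparameters are searched. RandomizedSearchCV stands in for the hyperopt search. The KNN family is an arbitrary second family, added only so there is something to compare against. Neither is part of hpsklearn. Wrapping the search in cross_val_score keeps hyperparameter selection inside each outer training split.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import KFold, RandomizedSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Each family fixes the pipeline structure; only the hyperparameters are searched.
families = {
    'pca+svm': (
        Pipeline([('pca', PCA()), ('svc', SVC())]),
        {'pca__n_components': [1, 2, 3, 4],
         'svc__C': loguniform(1e-2, 1e2),
         'svc__gamma': loguniform(1e-3, 1e1)},
    ),
    'knn': (
        Pipeline([('knn', KNeighborsClassifier())]),
        {'knn__n_neighbors': list(range(1, 16)),
         'knn__weights': ['uniform', 'distance']},
    ),
}

X, y = load_iris(return_X_y=True)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
mean_test_errors = {}
for name, (pipeline, space) in families.items():
    # 25 random draws per outer training split, each scored by 3-fold inner CV.
    search = RandomizedSearchCV(pipeline, space, n_iter=25, cv=3, random_state=0)
    scores = cross_val_score(search, X, y, cv=outer_cv)
    mean_test_errors[name] = 1.0 - scores.mean()
    print(name, 'mean test error:', mean_test_errors[name])
```

Both families are scored on the same outer splits, so their mean errors are directly comparable. On a data set as small as iris, a difference of a few percent may well be fold-to-fold noise.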

As a domain expert, I have a particular pre-processing step that I believe reveals important patterns in my data. I would like to know how good a classifier can be built on top of my preprocessing algorithm.



from hyperopt import hp
from hpsklearn.components import any_classifier


def my_feature_extractor(name, **kwargs):
    # Stub to replace: it should return a pyll graph that evaluates to a
    # sklearn-compatible feature-extraction component (i.e. one with a
    # transform() method).
    raise NotImplementedError()


algo_custom_pp = SklearnClassifier(
    HyperoptEstimatorFactory(
        max_iter=25,
        # -- consider an any_preprocessing() constructor that accepts
        #    lambdas which provide initial and final steps to all the
        #    default pre-processing pipelines.
        preprocessing=hp.choice('pp', [
            [my_feature_extractor('foo-pre-pca'), pca('post-foo-pca')],
            [my_feature_extractor('foo-alone')],
        ]),
        classifier=any_classifier('classif')))
mean_test_error = view.protocol(algo_custom_pp)
print('mean test error:', mean_test_error)
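
A rough scikit-learn analogue of this use case can be sketched, assuming scikit-learn and NumPy. The domain-specific step becomes an ordinary transformer class with fit() and transform(). The hp.choice between "extractor then PCA" and "extractor alone" becomes a grid over the pipeline's `reduce` step, where `'passthrough'` skips it. `RatioFeatures` is a hypothetical stand-in for a real domain feature extractor. A StandardScaler is added so the ratio features share a scale, and GridSearchCV replaces hyperopt's search. None of this is hpsklearn's pyll-based API.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


class RatioFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical domain-specific extractor: appends pairwise column ratios."""

    def __init__(self, eps=1e-6):
        self.eps = eps

    def fit(self, X, y=None):
        return self  # stateless: nothing to learn

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        i, j = np.triu_indices(X.shape[1], k=1)  # every column pair (i < j)
        ratios = X[:, i] / (X[:, j] + self.eps)
        return np.hstack([X, ratios])


# Mirror of hp.choice('pp', ...): the 'reduce' step is either PCA or skipped.
pipeline = Pipeline([
    ('extract', RatioFeatures()),
    ('scale', StandardScaler()),
    ('reduce', 'passthrough'),
    ('clf', SVC()),
])
space = {
    'reduce': ['passthrough', PCA(n_components=2), PCA(n_components=4)],
    'clf__C': [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipeline, space, cv=3)

X, y = load_iris(return_X_y=True)
outer_cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(search, X, y, cv=outer_cv)
mean_test_error = 1.0 - scores.mean()
print('mean test error:', mean_test_error)
```

Because the extractor is just another pipeline step, the search decides whether PCA helps on top of it, which is the question this use case asks.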