Title: Model Selection Using Grid Search
Slug: model_selection_tuning_using_grid_search
Summary: How to conduct grid search for model selection in scikit-learn for machine learning in Python.
Date: 2017-09-18 12:00
Category: Machine Learning
Tags: Model Selection
Authors: Chris Albon
In [1]:
# Load libraries
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
# Set random seed
np.random.seed(0)
In [2]:
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target
In [3]:
# Create a pipeline
pipe = Pipeline([('classifier', RandomForestClassifier())])
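# Create space of candidate learning algorithms and their hyperparameters
# (the default lbfgs solver does not support the 'l1' penalty, so use saga,
# which handles both 'l1' and 'l2'; max_iter is raised to help convergence)
search_space = [{'classifier': [LogisticRegression(solver='saga', max_iter=5000)],
                 'classifier__penalty': ['l1', 'l2'],
                 'classifier__C': np.logspace(0, 4, 10)},
                {'classifier': [RandomForestClassifier()],
                 'classifier__n_estimators': [10, 100, 1000],
                 'classifier__max_features': [1, 2, 3]}]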
In [4]:
# Create grid search
clf = GridSearchCV(pipe, search_space, cv=5, verbose=0)
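Before fitting, it can help to count how many candidates the grid search will try. The sketch below rebuilds the same search space and expands it with scikit-learn's `ParameterGrid`. The names `grid`, `n_candidates`, and `n_fits` are illustrative and not part of the original notebook.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterGrid

# Same search space as above: one dictionary per candidate algorithm
search_space = [{'classifier': [LogisticRegression()],
                 'classifier__penalty': ['l1', 'l2'],
                 'classifier__C': np.logspace(0, 4, 10)},
                {'classifier': [RandomForestClassifier()],
                 'classifier__n_estimators': [10, 100, 1000],
                 'classifier__max_features': [1, 2, 3]}]

# Expand every dictionary into its individual parameter combinations
grid = ParameterGrid(search_space)

# 2 penalties * 10 values of C, plus 3 * 3 random forest settings
n_candidates = len(grid)

# With cv=5, each candidate is fit once per fold
n_fits = n_candidates * 5

print(n_candidates, n_fits)
```

Each dictionary in the list is expanded on its own, so logistic regression hyperparameters are never combined with random forest ones.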
In [5]:
# Fit grid search
best_model = clf.fit(X, y)
In [6]:
# View best model
best_model.best_estimator_.get_params()['classifier']
Out[6]:
In [7]:
# Predict target vector
best_model.predict(X)
Out[7]:
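Predicting on the same data used for the search gives an optimistic picture of accuracy. As a rough sketch, the same search can be run on a training split and scored on held-out data. The 25% test split, the smaller grid, the `saga` solver settings, and the variable names here are assumptions chosen to keep the example quick. They are not taken from the original notebook.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Load data and hold out a test set (the split size is an assumption)
X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The pipeline's classifier step is a placeholder replaced during the search
pipe = Pipeline([('classifier', RandomForestClassifier())])

# Smaller grid than the one above so that the sketch runs quickly
search_space = [{'classifier': [LogisticRegression(solver='saga', max_iter=5000)],
                 'classifier__penalty': ['l1', 'l2'],
                 'classifier__C': np.logspace(0, 4, 5)},
                {'classifier': [RandomForestClassifier(random_state=0)],
                 'classifier__n_estimators': [10, 100],
                 'classifier__max_features': [1, 2, 3]}]

# Search on the training split only
search = GridSearchCV(pipe, search_space, cv=5, verbose=0)
search.fit(X_train, y_train)

# Fitted classifier from the best pipeline and its mean cross-validated accuracy
best_classifier = search.best_estimator_.named_steps['classifier']
cv_accuracy = search.best_score_

# Accuracy on data the search never saw
test_accuracy = search.score(X_test, y_test)

print(type(best_classifier).__name__, cv_accuracy, test_accuracy)
```

`best_score_` is the mean cross-validated score of the winning candidate, while `score(X_test, y_test)` uses the best pipeline refit on the whole training split.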