This notebook contains an excerpt from the book Machine Learning for OpenCV by Michael Beyeler. The code is released under the MIT license, and is available on GitHub.

Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations. If you find this content useful, please consider supporting the work by buying the book!

Tuning Hyperparameters with Grid Search

The most commonly used tool for hyperparameter tuning is grid search, which is basically a fancy term for saying we will try all possible parameter combinations with a for loop.

Let's have a look at how that is done in practice.

Returning to our $k$-NN classifier, we find that we have only one hyperparameter to tune: $k$. Typically, you would have a much larger number of open parameters to mess with, but the $k$-NN algorithm is simple enough for us to manually implement grid search.

Before we get started, we need to split the dataset into training and test sets, as we have done before. Here we choose a 75-25 split:


In [1]:
from sklearn.datasets import load_iris
import numpy as np
iris = load_iris()
X = iris.data.astype(np.float32)
y = iris.target

In [2]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=37
)

Then the goal is to loop over all possible values of $k$. As we do this, we want to keep track of the best accuracy we observed as well as the value for $k$ that gave rise to this result:


In [3]:
best_acc = 0
best_k = 0

Grid search then looks like an outer loop around the entire train and test procedure. After calculating the accuracy on the test set (acc), we compare it to the best accuracy found so far (best_acc). If the new value is better, we update our bookkeeping variables and move on to the next iteration:


In [4]:
import cv2
from sklearn.metrics import accuracy_score
for k in range(1, 20):
    # train a k-NN classifier with the current value of k
    knn = cv2.ml.KNearest_create()
    knn.setDefaultK(k)
    knn.train(X_train, cv2.ml.ROW_SAMPLE, y_train)
    # evaluate it on the test set
    _, y_test_hat = knn.predict(X_test)
    acc = accuracy_score(y_test, y_test_hat)
    # keep the best accuracy and the k that produced it
    if acc > best_acc:
        best_acc = acc
        best_k = k

When we are done, we can have a look at the best accuracy:


In [5]:
best_acc, best_k


Out[5]:
(0.97368421052631582, 1)

Turns out, we can get 97.4% accuracy using $k=1$.

How would you do this when you have more than one hyperparameter? Refer to the book to find the answer to this one (p.318).

Understanding the value of a validation set

Following our best practice of splitting the data into training and test sets, we might be tempted to tell people that we have found a model that performs with 97.4% accuracy on the dataset. However, our result might not necessarily generalize to new data. The argument is the same one we made earlier in the book to justify the train-test split: we need an independent dataset for evaluation.

However, when we implemented grid search in the last section, we used the test set to evaluate the outcome of the grid search and to update the hyperparameter $k$. This means we can no longer use the test set to evaluate the final model! Any model choices made based on the test set accuracy would leak information from the test set into the model.

One way to resolve this dilemma is to split the data again and introduce what is known as a validation set. The validation set is different from the training and test sets and is used exclusively for selecting the best parameters of the model. It is good practice to do all exploratory analysis and model selection on this validation set and to keep a separate test set, which is only used for the final evaluation.

In other words, we should end up splitting the data into three different sets:

  • a training set, which is used to build the model
  • a validation set, which is used to select the parameters of the model
  • a test set, which is used to evaluate the performance of the final model

In practice, the three-way split is achieved in two steps.

First, split the data into two chunks: one that contains training and validation sets and another that contains the test set:


In [6]:
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, random_state=37
)

In [7]:
X_trainval.shape


Out[7]:
(112, 4)

Second, split X_trainval again into proper training and validation sets:


In [8]:
X_train, X_valid, y_train, y_valid = train_test_split(
    X_trainval, y_trainval, random_state=37
)

In [9]:
X_train.shape


Out[9]:
(84, 4)

Then we repeat the manual grid search from the preceding code, but this time, we will use the validation set to find the best $k$:


In [10]:
best_acc = 0.0
best_k = 0
for k in range(1, 20):
    knn = cv2.ml.KNearest_create()
    knn.setDefaultK(k)
    knn.train(X_train, cv2.ml.ROW_SAMPLE, y_train)
    _, y_valid_hat = knn.predict(X_valid)
    acc = accuracy_score(y_valid, y_valid_hat)
    if acc >= best_acc:
        best_acc = acc
        best_k = k
best_acc, best_k


Out[10]:
(1.0, 7)

We now find that a 100% validation score (best_acc) can be achieved with $k=7$ (best_k)! However, recall that this score might be overly optimistic. To find out how well the model really performs, we need to test it on held-out data from the test set.

In order to arrive at our final model, we can use the value for $k$ we found during grid search and re-train the model on both the training and validation data. This way, we used as much data as possible to build the model while still honoring the train-test split principle.

This means we should retrain the model on X_trainval, which contains both the training and validation sets, and score it on the test set:


In [11]:
knn = cv2.ml.KNearest_create()
knn.setDefaultK(best_k)
knn.train(X_trainval, cv2.ml.ROW_SAMPLE, y_trainval)
_, y_test_hat = knn.predict(X_test)
accuracy_score(y_test, y_test_hat), best_k


Out[11]:
(0.94736842105263153, 7)

With this procedure, we find a formidable score of 94.7% accuracy on the test set. Because we honored the train-test split principle, we can now be sure that this is the performance we can expect from the classifier when applied to novel data. It is not as high as the 100% accuracy reported during validation, but it is still a very good score!

Combining grid search with cross-validation

One potential danger of the grid search we just implemented is that the outcome might be relatively sensitive to how exactly we split the data. After all, we might have accidentally chosen a split that put most of the easy-to-classify data points in the test set, resulting in an overly optimistic score. Although we would be happy at first, as soon as we tried the model on some new held-out data, we would find that the actual performance of the classifier is much lower than expected.
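To see this effect for yourself, you can repeat the single-split grid search for a few different splits and watch both the winning $k$ and the validation score move around. Here is a minimal sketch of that experiment (not part of the original notebook; it uses scikit-learn's KNeighborsClassifier instead of the OpenCV classifier for brevity, and the seed values are arbitrary):

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

for seed in (0, 13, 37, 42):
    # re-split the training/validation data with a different random seed
    X_tr, X_va, y_tr, y_va = train_test_split(X_trainval, y_trainval,
                                              random_state=seed)
    best_acc_split, best_k_split = 0.0, 0
    for k in range(1, 20):
        # score the current k on this particular validation split
        acc = KNeighborsClassifier(n_neighbors=k).fit(X_tr, y_tr).score(X_va, y_va)
        if acc > best_acc_split:
            best_acc_split, best_k_split = acc, k
    print(seed, best_k_split, best_acc_split)

Depending on the split, a different $k$ can come out on top, which is exactly the sensitivity we would like to get rid of.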

Instead, we can combine grid search with cross-validation. This way, the data is split multiple times into training and validation sets, and cross-validation is performed at every step of the grid search to evaluate every parameter combination.
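Spelled out by hand, this is roughly what that looks like; the following sketch is not from the book and uses scikit-learn's cross_val_score together with KNeighborsClassifier in place of the manual train/validate loop:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

best_mean_acc, best_k_cv = 0.0, 0
for k in range(1, 20):
    # average the accuracy of the current k over five cross-validation folds
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_trainval, y_trainval, cv=5)
    if np.mean(scores) > best_mean_acc:
        best_mean_acc, best_k_cv = np.mean(scores), k

The choice of $k$ then no longer depends on a single lucky (or unlucky) validation split, at the cost of training the model five times per candidate value.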

Because grid search with cross-validation is such a commonly used method for hyperparameter tuning, scikit-learn provides the GridSearchCV class, which implements it in the form of an estimator.

We can specify all the parameters we want GridSearchCV to search over by using a dictionary. Every entry of the dictionary should be of the form {name: values}, where name is a string that should be equivalent to the parameter name usually passed to the classifier, and values is a list of values to try.

For example, in order to search for the best value of the parameter n_neighbors of the KNeighborsClassifier class, we would design the parameter dictionary as follows:


In [12]:
param_grid = {'n_neighbors': range(1, 20)}

Here, we are searching for the best $k$ in the range [1, 19]. We then need to pass the parameter grid as well as the classifier (KNeighborsClassifier) to the GridSearchCV object:


In [13]:
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)

Then we can train the classifier using the fit method. In return, scikit-learn will inform us about all the parameters used in the grid search:


In [14]:
grid_search.fit(X_trainval, y_trainval)


Out[14]:
GridSearchCV(cv=5, error_score='raise',
       estimator=KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='uniform'),
       fit_params={}, iid=True, n_jobs=1,
       param_grid={'n_neighbors': range(1, 20)}, pre_dispatch='2*n_jobs',
       refit=True, return_train_score=True, scoring=None, verbose=0)

This will allow us to find the best validation score and the corresponding value for $k$:


In [15]:
grid_search.best_score_, grid_search.best_params_


Out[15]:
(0.9642857142857143, {'n_neighbors': 3})

We thus get a validation score of 96.4% for $k=3$. Since grid search with cross-validation is more robust than our earlier procedure, we would expect the validation scores to be more realistic than the 100% accuracy we found before.
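If you are curious how all the other candidate values fared, the fitted GridSearchCV object also stores the full table of results in its cv_results_ attribute. A quick way to inspect it (a sketch, not part of the original notebook):

# mean cross-validated accuracy for every candidate value of k
for k, score in zip(grid_search.cv_results_['param_n_neighbors'],
                    grid_search.cv_results_['mean_test_score']):
    print(k, score)

The best of these mean scores is exactly the 96.4% reported by best_score_.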

However, from the previous section, we know that this score might still be overly optimistic, so we need to score the classifier on the test set instead:


In [16]:
grid_search.score(X_test, y_test)


Out[16]:
0.97368421052631582

And to our surprise, the test score of 97.4% is even better than the cross-validated score of 96.4%. Because GridSearchCV was constructed with refit=True (the default), calling score evaluates a model that has already been retrained on all of X_trainval with the best parameter setting, just as we did by hand in the previous section.
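As a closing aside: if you were wondering how GridSearchCV handles more than one hyperparameter, you simply add more entries to the parameter dictionary, and every combination of values is evaluated. A hypothetical grid (not used above) that also tries the weights option of KNeighborsClassifier would look like this:

param_grid_multi = {'n_neighbors': range(1, 20),
                    'weights': ['uniform', 'distance']}
# GridSearchCV would now evaluate all 19 x 2 = 38 combinations,
# each with 5-fold cross-validation
grid_search_multi = GridSearchCV(KNeighborsClassifier(), param_grid_multi, cv=5)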