Machine learning algorithms often have a number of hyperparameters whose values must be chosen by the practitioner. For example, an optimization algorithm may have a step size, a decay rate, and a regularization coefficient. In a deep network, the network parameterization itself (e.g., the number of layers and the number of units per layer) can be considered a hyperparameter.

Choosing these parameters can be challenging, and so a common practice is to search over the space of hyperparameters. One approach that works surprisingly well is to randomly sample different options.

Problem Setup

Suppose that we want to train a convolutional network, but we aren't sure how to choose the following hyperparameters:

  • the learning rate

  • the batch size

  • the dropout probability

  • the standard deviation of the distribution from which to initialize the network weights

Suppose that we've defined a remote function train_cnn_and_compute_accuracy, which takes values for these hyperparameters as its input (along with the dataset), trains a convolutional network using those hyperparameters, and returns the accuracy of the trained model on a validation set.


In [1]:
import numpy as np
import ray

@ray.remote
def train_cnn_and_compute_accuracy(hyperparameters,
                                   train_images,
                                   train_labels,
                                   validation_images,
                                   validation_labels):
    # Construct a deep network, train it, and return the accuracy on the
    # validation data.
    return np.random.uniform(0, 1)

Basic random search

Something that works surprisingly well is to try random values for the hyperparameters. For example, we can write a function that randomly generates hyperparameter configurations.


In [2]:
def generate_hyperparameters():
    # Randomly choose values for the hyperparameters.
    return {"learning_rate": 10 ** np.random.uniform(-5, 5),
            "batch_size": np.random.randint(1, 100),
            "dropout": np.random.uniform(0, 1),
            "stddev": 10 ** np.random.uniform(-5, 5)}

In addition, let's assume that we've started Ray and loaded some data.


In [3]:
import ray

ray.init()

from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)
train_images = ray.put(mnist.train.images)
train_labels = ray.put(mnist.train.labels)
validation_images = ray.put(mnist.validation.images)
validation_labels = ray.put(mnist.validation.labels)


Waiting for redis server at 127.0.0.1:10959 to respond...
Waiting for redis server at 127.0.0.1:48866 to respond...
Starting local scheduler with 8 CPUs, 0 GPUs

======================================================================
View the web UI at http://localhost:8899/notebooks/ray_ui49161.ipynb?token=048871364b51ded2148b8d14fefe4308670f876157c3af02
======================================================================

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

Basic random hyperparameter search then looks something like this: we launch a number of experiments and retrieve the results.


In [4]:
# Generate a bunch of hyperparameter configurations.
hyperparameter_configurations = [generate_hyperparameters() for _ in 
                                 range(20)]

# Launch some experiments.
results = []
for hyperparameters in hyperparameter_configurations:
    results.append(train_cnn_and_compute_accuracy.remote(
        hyperparameters, train_images, train_labels,
        validation_images, validation_labels))
# Get the results.
accuracies = ray.get(results)

In [5]:
print(accuracies)


[0.171062082542895, 0.3552355790101438, 0.6345706299988285, 0.8511016250887493, 0.23033498089853077, 0.5393103197033947, 0.0421369303859066, 0.23900207909059545, 0.0277995861542496, 0.825001234254735, 0.1501638171059968, 0.3622482800389978, 0.8632430265668725, 0.4304847593621731, 0.8523370390308364, 0.3239056614812661, 0.4237274312998158, 0.7601984768615376, 0.24814550723928352, 0.3733094783161002]

Then we can inspect the contents of accuracies and see which set of hyperparameters worked best. Note that in the above example, the for loop returns almost immediately because each .remote call is asynchronous; the program then blocks in the call to ray.get, which waits until all of the experiments have finished.
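
For example, picking out the best configuration from the results above takes only a couple of lines, using just the accuracies and hyperparameter_configurations lists defined earlier:

# Index of the configuration with the highest validation accuracy.
best_index = int(np.argmax(accuracies))
print("Best accuracy: {}".format(accuracies[best_index]))
print("Best hyperparameters: {}".format(
    hyperparameter_configurations[best_index]))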

Processing results as they become available

One problem with the above approach is that you have to wait for all of the experiments to finish before you can process the results. Instead, you may want to process the results as they become available, perhaps in order to adaptively choose new experiments to run, or perhaps simply so you know how well the experiments are doing. To process the results as they become available, we can use the ray.wait primitive, which blocks until a requested number of object IDs are ready and returns those IDs along with the remaining ones.
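
As a toy illustration of these semantics, the sketch below submits three tasks that finish at different times and retrieves the first one to complete. The slow_task function is hypothetical and exists only for this illustration.

import time

@ray.remote
def slow_task(i):
    # Hypothetical task used only to illustrate ray.wait. It sleeps for a
    # random amount of time and then returns its argument.
    time.sleep(np.random.uniform(0, 1))
    return i

task_ids = [slow_task.remote(i) for i in range(3)]
# Block until one object ID is ready. ready_ids contains that ID, and
# remaining_ids contains the other two (whether or not they have finished).
ready_ids, remaining_ids = ray.wait(task_ids, num_returns=1)
print(ray.get(ready_ids[0]))  # The first result to become available.
print(len(remaining_ids))     # 2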

Applied to our hyperparameter search, the pattern looks like the following. This example is implemented in more detail in driver.py.


In [6]:
# Launch some experiments.
remaining_ids = []
for hyperparameters in hyperparameter_configurations:
    remaining_ids.append(train_cnn_and_compute_accuracy.remote(
        hyperparameters, train_images, train_labels,
        validation_images, validation_labels))
# Whenever a new experiment finishes, print the value and start a new
# experiment.
for i in range(10):
    ready_ids, remaining_ids = ray.wait(remaining_ids, num_returns=1)
    accuracy = ray.get(ready_ids[0])
    print("Accuracy is {}".format(accuracy))
    # Start a new experiment.
    new_hyperparameters = generate_hyperparameters()
    remaining_ids.append(train_cnn_and_compute_accuracy.remote(
        new_hyperparameters, train_images, train_labels,
        validation_images, validation_labels))


Accuracy is 0.8157981306122437
Accuracy is 0.8409265676418705
Accuracy is 0.7009304799956888
Accuracy is 0.05666791962265616
Accuracy is 0.856091538630789
Accuracy is 0.5127068811444101
Accuracy is 0.6275496824674608
Accuracy is 0.7059641762120641
Accuracy is 0.053909642467126595
Accuracy is 0.2970829854952617

More sophisticated hyperparameter search

Hyperparameter search algorithms can get much more sophisticated. So far, we've been treating the function train_cnn_and_compute_accuracy as a black box: we can choose its inputs and inspect its outputs, but once we decide to run it, we have to let it run to completion.

However, there is often more structure to exploit. For example, if the training procedure is going poorly, we can stop it early and invest more resources in the more promising hyperparameter experiments. And if we've saved the state of the training procedure, we can always restart it later.

This is one of the ideas behind the Hyperband algorithm: start with a large number of hyperparameter configurations, aggressively stop the bad ones, and invest more resources in the promising experiments.

To implement this, we can first adapt our training method to optionally take a model and to return the updated model.


In [7]:
@ray.remote
def train_cnn_and_compute_accuracy(hyperparameters, model=None):
    # Construct a deep network, train it, and return the accuracy on the 
    # validation data as well as the latest version of the model. If the 
    # model argument is not None, this will continue training an existing
    # model.
    validation_accuracy = np.random.uniform(0, 1)
    new_model = model
    return validation_accuracy, new_model
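
As a sketch of how this adapted function could be driven, here is a minimal successive-halving loop (the core subroutine of Hyperband), built only on the functions defined above and not a full Hyperband implementation: in each round, every surviving configuration is trained for one more segment, and only the better-performing half is kept.

# Minimal successive-halving sketch using the adapted training function.
configurations = [generate_hyperparameters() for _ in range(16)]
models = {i: None for i in range(len(configurations))}
surviving = list(models.keys())

for _ in range(3):
    # Train each surviving configuration for one more segment (in parallel).
    result_ids = {}
    for i in surviving:
        result_ids[i] = train_cnn_and_compute_accuracy.remote(
            configurations[i], model=models[i])
    results = {i: ray.get(result_id) for i, result_id in result_ids.items()}
    # Save the updated models so that training can be resumed next round.
    for i, (accuracy, model) in results.items():
        models[i] = model
    # Keep only the better-performing half of the configurations.
    surviving.sort(key=lambda i: results[i][0], reverse=True)
    surviving = surviving[:max(1, len(surviving) // 2)]

print("Best hyperparameters: {}".format(configurations[surviving[0]]))

A full Hyperband implementation would run this kind of loop several times with different trade-offs between the number of configurations and the training budget given to each one.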

Here’s a different variant that uses the same principles. Divide each training session into a series of shorter training sessions. Whenever a short session finishes, if it still looks promising, then continue running it. If it isn’t doing well, then terminate it and start a new experiment.


In [8]:
import numpy as np

def is_promising(model):
    # Return True if the model is doing well and False otherwise. In
    # practice, this function will want more information than just the
    # model.
    return np.random.choice([True, False])

# Start 10 experiments, remembering which hyperparameters each one uses.
remaining_ids = []
hyperparameters_mapping = {}
for _ in range(10):
    hyperparameters = generate_hyperparameters()
    experiment_id = train_cnn_and_compute_accuracy.remote(
        hyperparameters, model=None)
    remaining_ids.append(experiment_id)
    hyperparameters_mapping[experiment_id] = hyperparameters

accuracies = []
for i in range(100):
    # Whenever a segment of an experiment finishes, decide if it looks
    # promising or not.
    ready_ids, remaining_ids = ray.wait(remaining_ids, num_returns=1)
    experiment_id = ready_ids[0]
    current_accuracy, current_model = ray.get(experiment_id)
    accuracies.append(current_accuracy)
    hyperparameters = hyperparameters_mapping.pop(experiment_id)

    if is_promising(current_model):
        # Continue running the experiment with the same hyperparameters.
        experiment_id = train_cnn_and_compute_accuracy.remote(
            hyperparameters, model=current_model)
    else:
        # Start a new experiment with freshly generated hyperparameters.
        hyperparameters = generate_hyperparameters()
        experiment_id = train_cnn_and_compute_accuracy.remote(hyperparameters)

    remaining_ids.append(experiment_id)
    hyperparameters_mapping[experiment_id] = hyperparameters

In [11]:
print(accuracies)


[0.0032888629991347784, 0.3394687511913411, 0.06441167430794892, 0.29650494448578724, 0.40546441265999056, 0.08033897243135579, 0.008924158551022465, 0.7011348463246787, 0.6608524029887255, 0.25983477069636784, 0.32089895370824295, 0.19033076145659478, 0.8095892843979615, 0.9759169054487475, 0.25587053766267864, 0.5484864265701449, 0.5050613969068954, 0.87108194178955, 0.7134665058484346, 0.5431590908554597, 0.08598294666235629, 0.034126030265245966, 0.27878649516778997, 0.29646251064178586, 0.4572107201345531, 0.518211705741424, 0.12088708409510696, 0.5030399219900685, 0.9312998030305782, 0.6525765035487331, 0.9219901994849555, 0.3318921128281893, 0.8239138896150974, 0.2633683157283965, 0.9329793293493681, 0.7197972116739015, 0.8563713085897978, 0.009857136358348173, 0.684006255258068, 0.2197462441951048, 0.17813855683062907, 0.6585219818888212, 0.9800859972131809, 0.6710364884525326, 0.6048388458532494, 0.4001376972783529, 0.5581675879927963, 0.18893122959434439, 0.8586797850884813, 0.3901987516893547, 0.48022124896663, 0.32016701066545017, 0.7536710586333473, 0.1808736452256562, 0.6204999486093425, 0.49266532336805446, 0.8906041189928601, 0.47726982267802787, 0.8681599491486434, 0.18440020630350062, 0.7404179774413728, 0.9479355448581708, 0.3751652647867322, 0.05892482114319042, 0.4721918273503133, 0.787076328425683, 0.9567513020359438, 0.562623119975451, 0.6338582262318234, 0.5190796216246442, 0.8655110127555093, 0.49697780158133287, 0.44417043789149624, 0.9181773359952201, 0.057765932179698054, 0.7107537077794506, 0.21908554318569418, 0.5925939009584585, 0.41908922164908835, 0.24010027048235782, 0.3839720143111187, 0.4491487202921055, 0.025010492652711158, 0.1742353370692049, 0.8188843028177933, 0.24638327186888098, 0.44521347821044965, 0.4556094094695422, 0.4419579989097051, 0.6796525066735303, 0.1669234760870586, 0.7121955968771096, 0.5259970585169582, 0.3462188275499346, 0.9155025111715138, 0.22965393506410292, 0.11179281075730496, 0.8845708746182613, 0.6798073935162395, 0.21515724153438776]