Searching through high-dimensional hyperparameter spaces to find the most performant model can get unwieldy very fast. Hyperparameter sweeps provide an organized and efficient way to conduct a battle royale of models and pick the most accurate one. They do this by automatically searching through combinations of hyperparameter values (e.g. learning rate, batch size, number of hidden layers, optimizer type) to find the optimal set.
In this tutorial we'll see how you can run sophisticated hyperparameter sweeps in 3 easy steps using Weights & Biases.
Running a hyperparameter sweep with Weights & Biases is very easy. There are just 3 simple steps:
Define the sweep: we do this by creating a dictionary or a YAML file that specifies the parameters to search through, the search strategy, the optimization metric, and so on.
Initialize the sweep: with one line of code we initialize the sweep and pass in the dictionary of sweep configurations:
sweep_id = wandb.sweep(sweep_config)
Run the sweep agent: also accomplished with one line of code, we call wandb.agent() and pass the sweep_id to run, along with a function that defines your model architecture and trains it:
wandb.agent(sweep_id, function=train)
And voila! That's all there is to running a hyperparameter sweep! In the notebook below, we'll walk through these 3 steps in more detail.
We highly encourage you to fork this notebook, tweak the parameters, or try the model with your own dataset!
In [0]:
# WandB – Install the W&B library
%pip install wandb -q
import wandb
from wandb.keras import WandbCallback
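If this is the first time you're using W&B in this environment, you may also need to authenticate before logging any runs. This cell is not part of the original notebook, but a call like the following will prompt you for an API key:
In [0]:
# Authenticate with W&B (added for this walkthrough – prompts for an API key if you're not already logged in)
wandb.login()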
In [0]:
from keras.datasets import fashion_mnist
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Dense, Flatten
from keras.utils import np_utils
from keras.optimizers import SGD, RMSprop, Adam, Nadam
from keras.callbacks import ReduceLROnPlateau, ModelCheckpoint, Callback, EarlyStopping
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
import wandb
from wandb.keras import WandbCallback
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()
labels=["T-shirt/top","Trouser","Pullover","Dress","Coat",
"Sandal","Shirt","Sneaker","Bag","Ankle boot"]
img_width=28
img_height=28
X_train = X_train.astype('float32') / 255.
X_test = X_test.astype('float32') / 255.
# reshape input data
X_train = X_train.reshape(X_train.shape[0], img_width, img_height, 1)
X_test = X_test.reshape(X_test.shape[0], img_width, img_height, 1)
# one hot encode outputs
y_train = np_utils.to_categorical(y_train)
y_test = np_utils.to_categorical(y_test)
num_classes = y_test.shape[1]
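As a quick sanity check (this cell is an addition to the walkthrough), Fashion-MNIST gives us 60,000 training and 10,000 test images of size 28x28 with a single channel, and the labels are now one-hot vectors over 10 classes:
In [0]:
# Inspect the preprocessed shapes – added as a sanity check
print(X_train.shape, y_train.shape)  # (60000, 28, 28, 1) (60000, 10)
print(X_test.shape, y_test.shape)    # (10000, 28, 28, 1) (10000, 10)
print(num_classes)                   # 10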
Weights & Biases sweeps give you powerful levers to configure your sweeps exactly how you want them, with just a few lines of code. The sweeps config can be defined as a dictionary or a YAML file.
Let's walk through some of the key fields together:
method – the search strategy (e.g. grid or random).
metric – the metric to optimize. It takes a name (this metric should be logged by your training script) and a goal (maximize or minimize).
parameters – the hyperparameters to search through, and the values to try for each one.
You can find a list of all configuration options here.
In [0]:
# Configure the sweep – specify the parameters to search through, the search strategy, the optimization metric, and so on.
sweep_config = {
    'method': 'random',  # grid, random
    'metric': {
        'name': 'accuracy',
        'goal': 'maximize'
    },
    'parameters': {
        'epochs': {
            'values': [2, 5, 10]
        },
        'batch_size': {
            'values': [256, 128, 64, 32]
        },
        'dropout': {
            'values': [0.3, 0.4, 0.5]
        },
        'conv_layer_size': {
            'values': [16, 32, 64]
        },
        'weight_decay': {
            'values': [0.0005, 0.005, 0.05]
        },
        'learning_rate': {
            'values': [1e-2, 1e-3, 1e-4, 3e-4, 3e-5, 1e-5]
        },
        'optimizer': {
            'values': ['adam', 'nadam', 'sgd', 'rmsprop']
        },
        'activation': {
            'values': ['relu', 'elu', 'selu', 'softmax']
        }
    }
}
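As mentioned above, the same configuration can also be written as a YAML file and launched from the command line. Here's a rough sketch of what that might look like; the filename sweep.yaml, the script name train.py, and the shortened parameter list are just for illustration:
# sweep.yaml – an equivalent sweep definition (illustrative sketch)
program: train.py        # your training script (hypothetical filename)
method: random
metric:
  name: accuracy
  goal: maximize
parameters:
  epochs:
    values: [2, 5, 10]
  learning_rate:
    values: [0.01, 0.001, 0.0001]
  optimizer:
    values: [adam, sgd]
You would then create the sweep with wandb sweep sweep.yaml and start a worker with wandb agent <sweep_id>; the training script reads its hyperparameters from wandb.config exactly as in the notebook below.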
In [0]:
# Initialize a new sweep
# Arguments:
# – sweep_config: the sweep config dictionary defined above
# – entity: Set the username for the sweep
# – project: Set the project name for the sweep
sweep_id = wandb.sweep(sweep_config, entity="sweep", project="sweeps-tutorial")
Before we can run the sweep, let's define a function that creates and trains our neural network.
In the function below, we define a simplified version of the VGG19 architecture in Keras, and add just a few lines of code to log model metrics, visualize performance and predictions, and track our experiments: wandb.init() to start a new run with the sweep's hyperparameters, wandb.config to read those hyperparameters, and WandbCallback() to log metrics and sample predictions during training.
In [0]:
# The sweep calls this function with each set of hyperparameters
def train():
    # Default values for hyper-parameters we're going to sweep over
    config_defaults = {
        'epochs': 5,
        'batch_size': 128,
        'weight_decay': 0.0005,
        'learning_rate': 1e-3,
        'activation': 'relu',
        'optimizer': 'nadam',
        'hidden_layer_size': 128,
        'conv_layer_size': 16,
        'dropout': 0.5,
        'momentum': 0.9,
        'seed': 42
    }

    # Initialize a new wandb run
    wandb.init(config=config_defaults)

    # Config is a variable that holds and saves hyperparameters and inputs
    config = wandb.config

    # Define the model architecture - This is a simplified version of the VGG19 architecture
    model = Sequential()

    # Two Conv2D layers (each with config.conv_layer_size filters) followed by MaxPooling2D
    model.add(Conv2D(filters=config.conv_layer_size, kernel_size=(3, 3), padding='same',
                     activation=config.activation, input_shape=(img_width, img_height, 1)))
    model.add(Dropout(config.dropout))
    model.add(Conv2D(filters=config.conv_layer_size, kernel_size=(3, 3),
                     padding='same', activation=config.activation))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    model.add(Flatten())
    # Hidden dense layer uses the activation being swept over
    model.add(Dense(config.hidden_layer_size, activation=config.activation))
    model.add(Dense(num_classes, activation="softmax"))

    # Define the optimizer
    if config.optimizer == 'sgd':
        optimizer = SGD(lr=config.learning_rate, decay=1e-5, momentum=config.momentum, nesterov=True)
    elif config.optimizer == 'rmsprop':
        optimizer = RMSprop(lr=config.learning_rate, decay=1e-5)
    elif config.optimizer == 'adam':
        optimizer = Adam(lr=config.learning_rate, beta_1=0.9, beta_2=0.999, clipnorm=1.0)
    elif config.optimizer == 'nadam':
        optimizer = Nadam(lr=config.learning_rate, beta_1=0.9, beta_2=0.999, clipnorm=1.0)

    model.compile(loss="categorical_crossentropy", optimizer=optimizer, metrics=['accuracy'])

    model.fit(X_train, y_train, batch_size=config.batch_size,
              epochs=config.epochs,
              validation_data=(X_test, y_test),
              callbacks=[WandbCallback(data_type="image", validation_data=(X_test, y_test), labels=labels),
                         EarlyStopping(patience=10, restore_best_weights=True)])
In [0]:
# Run the sweep agent
# Arguments:
# – sweep_id: the sweep_id to run – this was returned above by wandb.sweep()
# – function: function that defines your model architecture and trains it
wandb.agent(sweep_id, train)
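Because this sweep uses random search, the agent will keep launching runs until you stop it. If you'd rather cap the number of runs, wandb.agent() also accepts a count argument; the value 20 below is just an example:
In [0]:
# Run at most 20 trials from this sweep, then stop the agent
wandb.agent(sweep_id, function=train, count=20)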
The parallel coordinates plot maps hyperparameter values to model metrics. It's useful for homing in on the combinations of hyperparameters that led to the best model performance.
The hyperparameter importance plot surfaces which hyperparameters were the best predictors of, and most highly correlated with, desirable values of your metric.
These visualizations can help you save both time and resources on expensive hyperparameter optimization by homing in on the parameters (and value ranges) that matter most, and are therefore worth exploring further.
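If you prefer to pull the winner out programmatically once the sweep has finished, the W&B public API exposes a sweep's runs. A rough sketch, assuming the same entity and project used above (adjust the path to your own):
In [0]:
# Fetch the best run of the sweep via the public API (sketch – path format is "<entity>/<project>/<sweep_id>")
api = wandb.Api()
sweep = api.sweep("sweep/sweeps-tutorial/" + sweep_id)
best_run = sweep.best_run()  # run with the best value of the metric defined in the sweep config
print(best_run.name, best_run.summary.get("accuracy"))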
We created a simple training script and a few flavors of sweep configs for you to play with. We highly encourage you to give these a try. The repo also has examples to help you try more advanced sweep features like Bayesian optimization, Hyperband, and Hyperopt.