Hyper-parameter optimization comes in very handy in deep learning: there are many architectures and many parameters to try out, but we only have a limited amount of time to inspect the results of each modeling choice and decide on the next set of hyper-parameters to try.
By hyper-parameters I refer to any parameter that is not optimized by the learning algorithm (i.e. gradient descent). Some examples are: the learning rate of gradient descent, the batch size, the number of hidden layers, whether to use skip-connections, the particular way of pre-processing the data (say the amount of data augmentation), etc.
In genomics, being able to explore a vast space of modeling choices automatically is important. Only a handful of papers (if any) have been written on the particular task at hand, hence the space of modeling choices has not been explored as extensively as for, say, image classification on ImageNet.
I will guide you through the typical process of hyper-parameter optimization, using the example from PWM_initialization.ipynb.
Hyper-parameter optimization in CONCISE assumes you have defined two functions: data(...) and model(...).
First, define data(...): a function returning the whole training, validation and test dataset.

data() returns a tuple (train, test) or (train, valid, test), where each split consists of:
- X_* : a numpy array, a list of numpy arrays or a dictionary of numpy arrays
- y_* : the response variable

X_train and y_train are fed directly to the .fit method of a Keras model: model.fit(x=X_train, y=y_train, ...)
In [1]:
from concise.preprocessing import encodeDNA
import pandas as pd
import numpy as np
In [2]:
def data(seq_length=101):
    def load(split="train"):
        dt = pd.read_csv("../data/RBP/PUM2_{0}.csv".format(split))
        # DNA/RNA sequence
        xseq = encodeDNA(dt.seq, maxlen=seq_length, seq_align='center')
        # response variable
        y = dt.binding_site.as_matrix().reshape((-1, 1)).astype("float")
        if split == "train":
            from concise.data import attract
            # add also the pwm_list
            pwm_list = attract.get_pwm_list(["129"])
            return {"seq": xseq}, y, pwm_list
        else:
            return {"seq": xseq}, y
    return load("train"), load("valid"), load("test")
In [3]:
train, valid, test = data()
model() returns a compiled Keras model. The only restriction on its arguments is that the function needs a train_data parameter, which allows you to extract the shape of your dataset.
In [101]:
import concise.layers as cl
import keras.layers as kl
import concise.initializers as ci
import concise.regularizers as cr
from keras.callbacks import EarlyStopping
from keras.models import Model, load_model
from keras.optimizers import Adam
def model(train_data, filters=1, kernel_size=9, motif_init=None, lr=0.001):
    seq_length = train_data[0]["seq"].shape[1]
    pwm_list = train_data[2]
    if motif_init is not None:
        # motif_init is a dictionary with field: "stddev"
        kinit = ci.PSSMKernelInitializer(pwm_list,
                                         stddev=motif_init.get("stddev", 0.05),  # if not specified, use 0.05
                                         add_noise_before_Pwm2Pssm=True)
        binit = "zeros"
    else:
        kinit = "glorot_uniform"
        binit = "zeros"

    # sequence
    in_dna = cl.InputDNA(seq_length=seq_length, name="seq")
    x = cl.ConvDNA(filters=filters,
                   kernel_size=kernel_size,
                   activation="relu",
                   kernel_initializer=kinit,
                   bias_initializer=binit,
                   name="conv1")(in_dna)
    x = kl.AveragePooling1D(pool_size=4)(x)
    x = kl.Flatten()(x)
    x = kl.Dense(units=1, activation="sigmoid")(x)  # sigmoid output for binary cross-entropy
    m = Model(in_dna, x)
    m.compile(Adam(lr=lr), loss="binary_crossentropy", metrics=["acc"])
    return m
The hyper-parameters are simply the arguments of the data and model functions: seq_length for data(); filters, kernel_size, motif_init and lr for model().
Let's use a hyperopt pyll graph to define the hyper-parameter search space:
In [18]:
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
hyper_params = {
    "data": {
        "seq_length": hp.choice("d_seq_length", (101, 51, 21))
    },
    "model": {
        "filters": hp.choice("m_filters", (1, 16)),
        "kernel_size": 15,
        "motif_init": hp.choice("m_motif_init", (
            None,
            {"stddev": hp.uniform("m_stddev", 0, 0.2)}
        )),
        "lr": hp.loguniform("m_lr", np.log(1e-4), np.log(1e-2))  # 0.0001 - 0.01
    },
    "fit": {
        "epochs": 1,
        "patience": 5,
        "batch_size": 128,
    }
}
We are saying that:
- seq_length in the data function should be 21, 51 or 101
- lr should lie in [1e-4, 1e-2], sampled on a log-scale

Hyperopt also allows parameter nesting. This is demonstrated with the ["model"]["motif_init"] parameter: we want to try a model with and without motif initialization, and if we decide on motif initialization, we want to explore different values of the "stddev" parameter.
Read the hyperopt documentation about defining the search space for more information.
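If you want to see what a concrete draw from this graph looks like, you can sample a few configurations with hyperopt's stochastic sampler (a small sketch, not part of the original notebook):

# Draw a few example configurations from the search space defined above
from hyperopt.pyll.stochastic import sample

for _ in range(3):
    print(sample(hyper_params))
# each draw is a plain nested dict, e.g.:
# {'data': {'seq_length': 51}, 'model': {'filters': 16, ..., 'motif_init': None}, 'fit': {...}}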
Note that in addition to the data() and model() function arguments, we can also specify hyper-parameters related to the training procedure (say batch_size or early-stopping patience).
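As a rough sketch of how such fit-related hyper-parameters end up being used (illustration only; CompileFN, introduced below, handles this internally and its actual implementation may differ):

# Illustration only: mapping the "fit" hyper-parameters onto Keras training.
# fit_with_params is a hypothetical helper, not part of the CONCISE API.
from keras.callbacks import EarlyStopping

def fit_with_params(model, train, valid, fit_params):
    model.fit(train[0], train[1],
              validation_data=(valid[0], valid[1]),
              epochs=fit_params["epochs"],
              batch_size=fit_params["batch_size"],
              callbacks=[EarlyStopping(patience=fit_params["patience"])])
    return model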
In [13]:
from concise.hyopt import CompileFN, CMongoTrials, test_fn
To run hyperopt, you need to define an objective function to minimize and the space of parameters you want to explore. Here is an example from the hyperopt tutorial:
In [16]:
import pickle
import time
import logging
from hyperopt import fmin, tpe, hp, STATUS_OK, Trials

logger = logging.getLogger()
logger.setLevel(logging.WARN)

def objective(x):
    return {'loss': x ** 2, 'status': STATUS_OK}

best = fmin(objective,
            space=hp.uniform('x', -10, 10),
            algo=tpe.suggest,
            max_evals=100)
print(best)
For supervised learning, the objective function is the evaluation metric on the validation dataset. Concise provides a class to compile the objective function given the data() and the model() function:
In [32]:
import hyopt_example  # module with data.py and model.py
# We are using the data() and model() functions defined in a file.
# The reason is that serialization doesn't work with functions
# defined in the notebook.
objective = CompileFN(db_name="mydb", exp_name="motif_initialization",  # experiment name
                      data_fn=hyopt_example.data.data,
                      model_fn=hyopt_example.model.model,
                      add_eval_metrics=["auprc", "auc"],  # metrics from concise.eval_metrics, you can also use your own
                      optim_metric="auprc",        # which metric to optimize for
                      optim_metric_mode="max",     # maximum should be searched for
                      valid_split=None,            # use valid from the data function
                      save_model='best',           # checkpoint the best model
                      save_results=True,           # save the results as .json (in addition to mongoDB)
                      save_dir="./saved_models/")  # place to store the models
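Conceptually, the compiled objective behaves roughly like the sketch below. This is not CompileFN's actual implementation, just the shape of the function that fmin will call; the sign convention and the use of the validation split are assumptions.

# Rough sketch of what a compiled objective does for a single hyper-parameter draw.
# Not CompileFN's real code; written here only to make the data/model/fit split concrete.
from hyperopt import STATUS_OK
import concise.eval_metrics as cem
import hyopt_example

def objective_sketch(params):
    train, valid, test = hyopt_example.data.data(**params["data"])
    m = hyopt_example.model.model(train, **params["model"])
    m.fit(train[0], train[1],
          validation_data=valid,
          epochs=params["fit"]["epochs"],
          batch_size=params["fit"]["batch_size"])
    auprc = cem.auprc(valid[1], m.predict(valid[0]))
    # hyperopt minimizes the loss, so return the negative of the metric we maximize
    return {"loss": -auprc, "status": STATUS_OK}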
Now, we can use hyperopt to search the hyper-parameters.
Before starting many workers, it's good to try running the objective function once; it's much easier to debug it that way.
In [31]:
# Test if the objective function works on a subset of data for one random hyper-parameter set
test_fn(objective, hyper_params)
Trials() from hyperopt will run the jobs sequentially on a single machine.
In [ ]:
cd hyopt_example
In [28]:
trials = Trials("hyopt_example")
best = fmin(objective,
space=hyper_params,
algo=tpe.suggest,
trials=trials,
max_evals=3)
MongoTrials will defer the work to multiple workers running in parallel. All the results will get stored to the MongoDB database.
In [30]:
# just replaced Trials with CMongoTrials
trials = CMongoTrials(db_name="hyopt_example", exp_name="motif_initialization")
best = fmin(objective,
            space=hyper_params,
            algo=tpe.suggest,
            trials=trials,
            max_evals=300)
This call blocks and waits for the workers to complete the work. Start the workers by running the following bash script many times in parallel:
#!/bin/bash
#
# Slurm parameters
#SBATCH --mem=16G
#SBATCH -c 4
# ------------------
DB=hyopt_example
# path to the repository containing the script
export PYTHONPATH="."
hyperopt-mongo-worker \
--mongo=ouga03:1234/$DB \
--poll-interval=1 \
--reserve-timeout=3600 \
--workdir="."
Run:
$ for i in {1..20}; do sbatch start_worker.bash; done
I typically organize the code in the following files (see the nbs/hyopt_example folder).
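The layout looks roughly like this (hedged: based on the modules imported above and the worker script; the actual folder may contain additional files):

hyopt_example/
    data.py              # defines the data(...) function
    model.py             # defines the model(...) function
    start_worker.bash    # the Slurm worker script shown above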
Hyperopt keeps track of all the trials (parameters + the resulting objective loss) in a MongoDB database. The object hyperopt.MongoTrials provides a handle to the database. Concise extends the class hyperopt.MongoTrials with some additional methods for easier hyper-parameter inspection.
Since the results are stored in a database, you only need to know the database and the experiment name in order to access the database from any folder:
In [173]:
trials = CMongoTrials(db_name="hyopt_example", exp_name="motif_initialization", ip="ouga03")
In [174]:
# Number of completed trials
len(trials)
Out[174]:
In [175]:
df = trials.as_df().sort_values("eval.auprc")
df.sort_values("eval.auprc", ascending=False).head()
Out[175]:
We can see that the best trial achieved auPRC=0.72 on the validation set.
In [176]:
best_loss = np.maximum.accumulate(df.sort_values("tid")["eval.auprc"])
In [178]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots(nrows=2, figsize=(7, 7))
fig.tight_layout()
ax[0].plot(df.sort_values("tid")["eval.auprc"], linestyle="", marker="o")
ax[0].set(ylabel="Validation auPRC")
ax[1].plot(best_loss)
ax[1].set(xlabel="Iteration", ylabel="Best validation auPRC");
We can see that, over time, hyperopt finds more and more good hyper-parameter configurations.
In [181]:
cd hyopt_example
In [186]:
best_tid = trials.best_trial_tid()
params = trials.get_param(best_tid).to_dict()
model = trials.load_model(best_tid)
In [188]:
# Best hyper-parameters were:
params
Out[188]:
In [189]:
import data
In [191]:
# Load the train, valid, test
train, valid, test = data.data(**params["data"])
In [192]:
y_pred_test = model.predict(test[0])
In [193]:
import concise.eval_metrics as cem
In [200]:
# Performance on the test-set:
cem.auprc(test[1], y_pred_test)
Out[200]:
In [250]:
train_hist = trials.train_history(best_tid)
In [251]:
train_hist.head()
Out[251]:
In [261]:
plt.plot(train_hist.epoch + 1, train_hist.loss, label="Training")
plt.plot(train_hist.epoch + 1, train_hist.val_loss, label="Validation")
plt.xlabel("Epoch")
plt.ylabel("Cross-entropy loss")
plt.legend();
We can see that the optimal number of epochs is 10.
To draw conclusions and guide future design, it's useful to inspect which hyper-parameters lead to good results. Typically, this means plotting the evaluation metric against the hyper-parameter of interest. Since we already have a tidy table with the hyper-parameter optimization results, this is pretty straightforward.
In [218]:
plt.scatter(df["param.model.lr"], df["eval.auprc"])
plt.xlim([1e-4, 1e-2])
plt.xscale("log")
plt.xlabel("Learning rate")
plt.ylabel("Validation auPR");
In [219]:
import seaborn as sns
In [267]:
df["motif_init_used"] = ~df["param.model.motif_init.stddev"].isnull()
sns.swarmplot("motif_init_used", "eval.auprc", data=df)
plt.ylabel("validation auPR");
Motif initialization helped achieve better performance. We can also see that the hyper-parameter optimization algorithm drew more samples with motif initialization, since it figured out that it yields better performance.
In [245]:
plt.scatter(df["param.model.motif_init.stddev"], df["eval.auprc"])
plt.ylabel("validation auPR");
plt.xlabel("Added random noise to the motifs (stddev)");
In [247]:
sns.swarmplot("param.model.filters", "eval.auprc", data=df)
plt.ylabel("validation auPR")
plt.xlabel("Number of filters");
In [249]:
sns.swarmplot("param.data.seq_length", "eval.auprc", data=df)
plt.ylabel("validation auPR")
plt.xlabel("Input sequence length");