In this tutorial, we will learn the basics of the Bayesian optimization (BO) framework through a step-by-step example in the context of optimizing the hyperparameters of a binary classifier. But first of all, where is Bayesian optimization useful?
There are many possible scenarios, but one would typically use the BO framework when the objective function is expensive to evaluate, provides no gradient information, and can only be queried as a black box.
The BO framework uses a surrogate model to approximate the objective function and chooses to optimize the surrogate instead, according to a chosen criterion. For an in-depth introduction to the topic, we wholeheartedly recommend reading [@Snoek2012, @Jiménez2017].
Let's start by creating some synthetic data that we will use later for classification.
In [1]:
import numpy as np
from sklearn.datasets import make_moons
np.random.seed(20)
X, y = make_moons(n_samples=200, noise=0.3)  # data and target
Before going any further, let's visualize it!
In [2]:
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
cm_bright = ListedColormap(['#fc4349', '#6dbcdb'])
fig = plt.figure()
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=cm_bright)
plt.show()
Let's say that we want to use a Support Vector Machine (SVM) classifier with a radial basis function (RBF) kernel on this data, which has two usual hyperparameters to optimize, $C$ and $\gamma$. We first need to define a target function that takes these two hyperparameters as input and returns a score or error (e.g., from some form of cross-validation). We also define a dictionary specifying each parameter and its input space.
In [3]:
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
def evaluateModel(C, gamma):
    # hyperparameters are searched in log10-space for better scaling
    clf = SVC(C=10 ** C, gamma=10 ** gamma)
    return np.average(cross_val_score(clf, X, y))  # mean cross-validated accuracy

params = {'C': ('cont', (-4, 5)),      # continuous: log10(C) in [-4, 5]
          'gamma': ('cont', (-4, 5))}  # continuous: log10(gamma) in [-4, 5]
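Before handing the target function to the optimizer, it can be useful to sanity-check it at an arbitrary point. For instance, (0, 0) corresponds to $C = 1$ and $\gamma = 1$ after the log10 transform (the printed value will depend on your data and seed):

# quick sanity check: mean CV accuracy at C = 10**0 = 1, gamma = 10**0 = 1
print(evaluateModel(0, 0))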
Now comes the fun part, where we specify our BO framework using pyGPGO. We are going to use a Gaussian Process (GP) model to approximate our true objective function, together with a covariance function that measures similarity among training examples. An excellent introduction to Gaussian process regression can be found in [@Rasmussen-Williams2004]. We are going to use the squared exponential kernel for this example, which takes the form:

$$k(r) = \exp\left(-\frac{r^2}{2\ell^2}\right),$$

where $r = |x - x'|$ is the distance between two examples and $\ell$ is a characteristic length-scale.
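To make the formula concrete, here is a minimal NumPy sketch of this kernel evaluated on scalar inputs (the length-scale $\ell$ is fixed to 1 purely for illustration; the squaredExponential object used below manages its own hyperparameters):

import numpy as np

def sqexp(x, xprime, l=1.0):
    # k(r) = exp(-r^2 / (2 * l^2)), with r the distance between inputs
    r = np.abs(x - xprime)
    return np.exp(-r ** 2 / (2 * l ** 2))

print(sqexp(0.0, 0.0))  # 1.0: identical inputs have maximal covariance
print(sqexp(0.0, 3.0))  # ~0.011: covariance decays quickly with distance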
In [4]:
from pyGPGO.surrogates.GaussianProcess import GaussianProcess
from pyGPGO.covfunc import squaredExponential
sexp = squaredExponential()
gp = GaussianProcess(sexp)
We now specify an acquisition function, which will determine the behaviour of the BO procedure when selecting a new point. For instance, it is very common to use the Expected Improvement (EI) acquisition, which takes into account both the probability of improvement at a point and its magnitude.
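For reference, EI at a candidate point $x$ is defined as

$$\mathrm{EI}(x) = \mathbb{E}\left[\max\left(f(x) - f(x^+),\, 0\right)\right],$$

where $f(x^+)$ is the best value observed so far. Under a GP surrogate this expectation has a convenient closed form, which is what the acquisition object evaluates under the hood.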
In [5]:
from pyGPGO.acquisition import Acquisition
acq = Acquisition(mode='ExpectedImprovement')
We're almost done! Finally, we call the GPGO class to put everything together. We'll run the procedure for 20 iterations.
In [6]:
from pyGPGO.GPGO import GPGO
gpgo = GPGO(gp, acq, evaluateModel, params)
gpgo.run(max_iter=20)
Once the optimization is done, retrieve your result!
In [7]:
gpgo.getResult()
Out[7]:
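getResult() returns the best configuration evaluated together with its score. As a minimal sketch of how you might deploy the tuned model, assuming a (parameters, score) pair is returned and remembering that the parameters were searched in log10-space:

best_params, best_score = gpgo.getResult()
# transform back from log10-space before refitting on all the data
clf = SVC(C=10 ** best_params['C'], gamma=10 ** best_params['gamma'])
clf.fit(X, y)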
[@Rasmussen-Williams2004]: Rasmussen, C. E., & Williams, C. K. I. (2004). Gaussian Processes for Machine Learning. International Journal of Neural Systems, 14. http://doi.org/10.1142/S0129065704001899
[@Snoek2012]: Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian Optimization of Machine Learning Algorithms. Advances in Neural Information Processing Systems 25, 1–9. https://arxiv.org/abs/1206.2944
[@Jiménez2017]: Jiménez, J., & Ginebra, J. (2017). Bayesian optimization in machine learning. http://hdl.handle.net/2117/105999