Gaussian Process Summer School, 13th September 2017. Written by Javier Gonzalez
The goal of this lab session is to illustrate the concepts seen during the tutorial on Gaussian processes for global optimization. We will focus on two aspects of Bayesian Optimization (BO): (1) the choice of the model and (2) the choice of the acquisition function.
The technical material associated with the methods used in this lab can be found in the slides of the tutorial.
In addition to GPy, this lab uses GPyOpt (http://sheffieldml.github.io/GPyOpt/). Please be sure that it is correctly installed before starting. The easiest way is using pip. On Ubuntu machines you can do:
pip install gpyopt
Some of the options of GPyOpt depend on other external packages: DIRECT, cma, pyDOE. Please be sure that these are installed if you want to use all the options. With everything installed, you are ready to start.
Now, just as in the previous lab, configure the notebook to show plots inline and import the relevant libraries.
In [ ]:
%pylab inline
import GPy
import GPyOpt
import numpy as np
from numpy.random import seed
seed(12345)
Before starting with the lab, remember that Bayesian Optimization (BO) is a heuristic for global optimization of black-box functions. Let $f: {\mathcal X} \to R$ be a 'well behaved' continuous function defined on a compact subset ${\mathcal X} \subseteq R^d$. Our goal is to solve the global optimization problem of finding $$ x_{M} = \arg \min_{x \in {\mathcal X}} f(x). $$
We assume that $f$ is a black-box from which only perturbed evaluations of the type $y_i = f(x_i) + \epsilon_i$, with $\epsilon_i \sim\mathcal{N}(0,\sigma^2)$, are available. The goal is to find $x_M$ using as few evaluations of $f$ as possible. To do this, we need to determine two crucial elements:
A Gaussian process that will capture our beliefs about $f$.
An acquisition function that, based on the model, determines where to collect new evaluations of $f$.
Remember that every time a new data point is collected the model is updated and the acquisition function optimized again.
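To make this loop concrete, here is a minimal, self-contained sketch of the idea using GPy directly, with a simple lower confidence bound acquisition optimized by grid search (f_demo is a made-up toy objective; GPyOpt automates all of these steps with better defaults):
In [ ]:
import numpy as np
import GPy

def f_demo(x):
    # Made-up toy 1-d objective; x is an (n, 1) array, the output is (n, 1).
    return np.sin(3 * np.pi * x) + 0.3 * x**2

X = np.random.rand(3, 1)                    # initial design in [0, 1]
Y = f_demo(X)
grid = np.linspace(0, 1, 500)[:, None]      # candidate locations

for _ in range(5):
    model = GPy.models.GPRegression(X, Y)   # 1. update the GP surrogate with the data so far
    model.optimize()
    mu, var = model.predict(grid)
    lcb = mu - 2 * np.sqrt(var)             # 2. acquisition: lower confidence bound
    x_new = grid[np.argmin(lcb)].reshape(1, 1)
    y_new = f_demo(x_new)                   # 3. evaluate the (expensive) objective
    X = np.vstack([X, x_new])               # 4. augment the dataset and repeat
    Y = np.vstack([Y, y_new])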
We start with a one-dimensional example. Consider here the Forrester function
$$f(x) = (6x-2)^2 \sin(12x-4),$$ defined on the interval $[0, 1]$.
The minimum of this function is located at $x_{min}=0.78$. We assume that the evaluations of $f$ are perturbed by zero-mean Gaussian noise with standard deviation 0.1. The Forrester function is part of GPyOpt's benchmark of functions. To create the true function, its perturbed version and the boundaries of the problem, run the following cell.
In [ ]:
f_true = GPyOpt.objective_examples.experiments1d.forrester() # true function object
f_sim = GPyOpt.objective_examples.experiments1d.forrester(sd=.1) # noisy version
bounds = [{'name': 'var_1', 'type': 'continuous', 'domain': (0,1)}] # problem constraints
f_objective = f_sim.f # objective function
To plot the true $f$, simply write:
In [ ]:
f_true.plot()
f_objective contains the function that we are going to optimize. You can define your own objective, but it should map any numpy array of dimension $n\times d$ (inputs) to a numpy array of dimension $n\times 1$ (outputs). For instance:
In [ ]:
n = 8
x = np.random.rand(n).reshape(n,1)
x
In [ ]:
f_objective(x)
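A custom objective only needs to respect this shape convention. A minimal sketch (my_objective is a made-up name, not part of GPyOpt; here it is a noisy quadratic):
In [ ]:
def my_objective(x):
    # x is an (n, d) numpy array; return an (n, 1) array of noisy objective values.
    return np.sum(np.square(x), axis=1, keepdims=True) + 0.1 * np.random.randn(x.shape[0], 1)

my_objective(np.random.rand(4, 2))   # a (4, 2) input returns a (4, 1) output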
The bounds of the problem should be defined as a list of dictionaries, one per input variable, where the 'domain' entry is a tuple containing the lower and upper limits of the box in which the optimization will be performed. In our example:
In [ ]:
bounds
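Each dictionary in this list describes one input variable of the problem. For problems with several inputs you simply add one dictionary per variable; a hypothetical two-dimensional example (the names and limits are arbitrary):
In [ ]:
# Hypothetical box constraints for a two-dimensional problem.
bounds_2d = [{'name': 'var_1', 'type': 'continuous', 'domain': (-1, 1)},
             {'name': 'var_2', 'type': 'continuous', 'domain': (0, 5)}]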
To use BO to solve this problem, we need to create a GPyOpt object in which we specify the following elements: the function to optimize, the box constraints of the problem, and the acquisition function.
And now we have all the elements to start optimizing $f$. We create the optimization problem instance. Note that you don't need to specify the evaluation budget at this point. This is because at this stage we are not running the optimization; we are just initializing the different elements of the BO algorithm.
In [ ]:
# Creation of the object that we will use to run BO.
seed(123)
myBopt = GPyOpt.methods.BayesianOptimization(f = f_objective, # function to optimize
domain = bounds, # box-constraints of the problem
acquisition_type ='EI') # acquisition = Expected improvement
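You can check the docstring of GPyOpt.methods.BayesianOptimization to see all the available options of the method: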
In [ ]:
GPyOpt.methods.BayesianOptimization?
At this point you can access a number of elements of myBopt, including the GP model and the current dataset (initialized at 5 random locations by default).
In [ ]:
myBopt.X
In [ ]:
myBopt.Y
In [ ]:
myBopt.model.model
If the evaluations of $f$ are exact (noise-free), you only need to include exact_feval=True when creating the BO object as above. Now, to run the optimization for a certain number of iterations you only need to write:
In [ ]:
# Run the optimization (may take a few seconds)
max_iter = 5 # evaluation budget
myBopt.run_optimization(max_iter) # run optimization
And that's it! You should have received a message describing whether the method converged (two equal x's selected in consecutive steps of the optimization) or whether the maximum number of iterations was reached. In one-dimensional examples, you can visualize the model and the acquisition function (normalized between 0 and 1) as follows.
In [ ]:
myBopt.plot_acquisition()
You can only make the previous plot if the dimension of the problem is 1 or 2. However, you can always check how the optimization evolved by running:
In [ ]:
myBopt.plot_convergence()
The first plot shows the distance between the last two collected observations at each iteration; it is useful to evaluate the convergence of the method. The second plot shows the best value found at each iteration and is useful to compare different methods: the faster the curve decreases, the better the method. Check whether the method has converged after the 10 evaluations (the 5 initial points plus the 5 BO iterations). You can also print the updated GP model:
In [ ]:
myBopt.model.model
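You can also read off the best location and best value found so far. A minimal sketch (it assumes the x_opt and fx_opt attributes that GPyOpt's BayesianOptimization objects expose after run_optimization):
In [ ]:
print(myBopt.x_opt)    # location of the best (lowest) observed value
print(myBopt.fx_opt)   # best (lowest) observed value of the objective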
Use Bayesian optimization to find the minimum of the function $f(x)= x^2 + 10 \sin(x)$ in the interval [-10, 10].
1.1 Define the bounds of the problem, the function and check that it admits a numpy array of observations as input.
In [ ]:
# Answer to 1.1 here:
1.2 Create a GPyOpt object for global optimization using a Matern52 kernel and adding a jitter of $0.01$ to the expected improvement acquisition (Hint: when creating the object use the option jitter = 0.01). Also remember to use exact_feval=True to set the Gaussian noise of the model to zero.
In [ ]:
# Answer to 1.2 here:
1.3 Run the optimization for 10 iterations. Make and comment on the convergence plots. Has the method converged?
In [ ]:
# Answer to 1.3 here:
In this section we are going to have a look at different acquisition functions. In GPyOpt you can use the expected improvement ('EI'), the maximum probability of improvement ('MPI') and the lower confidence bound ('LCB'). When using GPyOpt you can simply specify the acquisition that you want when creating the BO object. However, you can also load these acquisitions as separate objects.
In [ ]:
from GPyOpt.acquisitions import AcquisitionEI, AcquisitionLCB, AcquisitionMPI
To access these acquisitions 'externally', we create a GP model using the objective function of Section 1 evaluated at 10 locations.
In [ ]:
seed(1234)
n = 10
X = np.random.rand(n).reshape(n,1)
Y = f_objective(X)
m = GPy.models.GPRegression(X,Y)
m.optimize_restarts(10)
m.plot([0,1])
Now we create three objects, one for each acquisition. For each one, the model, the acquisition optimizer and the parameter that balances exploration and exploitation (the jitter for EI and MPI, the exploration weight for LCB) need to be specified. To work with the acquisitions separately we need to create instances of these objects explicitly.
In [ ]:
# GPyOpt model
model = GPyOpt.models.GPModel(optimize_restarts=5,verbose=False)
model.model = m
# GPyOpt design space
space = GPyOpt.Design_space(bounds)
# GPyOpt acquisition optimizer
acquisition_optimizer = GPyOpt.optimization.AcquisitionOptimizer(space)
Now we are ready to create the acquisitions.
In [ ]:
acq_EI = AcquisitionEI(model, space, acquisition_optimizer, jitter=0)
acq_LCB = AcquisitionLCB(model, space, acquisition_optimizer, exploration_weight=2)
acq_MPI = AcquisitionMPI(model, space, acquisition_optimizer, jitter=0)
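The objects acq_EI, acq_LCB and acq_MPI contain the acquisition functions and their gradients, and each can be evaluated at any set of input locations. As a quick check, here is a minimal sketch (it assumes the acquisition_function and acquisition_function_withGradients methods of the acquisition objects created above):
In [ ]:
x_test = np.array([[0.5]])                                    # a single test location
print(acq_EI.acquisition_function(x_test))                    # (negative) EI value at x = 0.5
value, gradient = acq_EI.acquisition_function_withGradients(x_test)
print(value, gradient)                                        # (negative) value and its gradient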
By running the following piece of code you can visualize the three acquisitions. Note that we add a negative sign before each acquisition: within GPyOpt these functions are minimized (instead of maximized) using gradient optimizers (like BFGS) to select new locations. In this plot, however, the larger the value of the acquisition, the better the point.
In [ ]:
# Plot the three acquisition functions (a factor of 0.1 is applied to the LCB for visualization)
X_grid = np.linspace(0,1,200)[:, None]
plt.figure(figsize=(10,6))
plt.title('Acquisition functions',size=25)
plt.plot(X_grid, - 0.1*acq_LCB.acquisition_function(X_grid),label='LCB')
plt.plot(X_grid, -acq_EI.acquisition_function(X_grid),label='EI')
plt.plot(X_grid, -acq_MPI.acquisition_function(X_grid),label='MPI')
plt.xlabel('x',size=15)
plt.ylabel('a(x)',size=15)
plt.legend()
2.1 According to the previous plot, what areas in the domain are worth exploring and why? How can we interpret the previous plot in terms of the exploration/exploitation trade-off of each of the three acquisitions?
In [ ]:
# Answer to 2.1 here:
2.2 Now make a plot comparing the shape of the LCB acquisition (also known as GP-UCB in the literature) for different values of the exploration parameter. Use the values $[0,0.1,0.25,0.5,1,2,5]$. How does the decision about where to collect the next sample change when we increase the value of the parameter?
In [ ]:
# Answer to 2.2 here:
Consider the sixhumpcamel function defined as $$f(x_1,x_2) = \left(4-2.1x_1^2 + \frac{x_1^4}{3} \right)x_1^2 + x_1x_2 + (-4 +4x_2^2)x_2^2,$$
in $[-2,2]\times [-1,1]$. This function has two global minima, at $(0.0898,-0.7126)$ and $(-0.0898,0.7126)$. It is also implemented in GPyOpt, so to load and visualize it simply run:
In [ ]:
GPyOpt.objective_examples.experiments2d.sixhumpcamel().plot()
f_shc = GPyOpt.objective_examples.experiments2d.sixhumpcamel(sd = 0.1) # simulated version with some noise
f_shc_objective = f_shc.f
bounds_shc =[{'name': 'var_1', 'type': 'continuous', 'domain': f_shc.bounds[0]},
{'name': 'var_2', 'type': 'continuous', 'domain': f_shc.bounds[1]}]
3.1 Create three objects to optimize this function using 'EI' (with jitter equal to zero), 'LCB' (with exploration weight equal to 2) and 'MPI' (with jitter equal to zero). Use the same initial data in the three cases (Hint: use the options 'X' and 'Y' when creating the BO object).
In [ ]:
# Answer to 3.1
3.2 In the three cases, run the optimization for 30 iterations.
In [ ]:
# Answer to 3.2 here:
3.3 Now make a plot comparing the three methods. The x-axis should show the number of iterations and the y-axis the best value found so far (Hint: use .Y_best to extract the best current value at each iteration from the BO objects). Which acquisition has the best performance in this example?
In [ ]:
# Answer to 3.3 here:
3.4 Compare the models and the acquisition functions in the three cases (after the 30 iterations). What do you observe?
In [ ]:
# Answer to 3.4 here: