=====================================================================================================
What is GPyOpt?
How to install GPyOpt?
How to use GPyOpt?
The Basics of Bayesian Optimization
1D optimization example
2D optimization example
=====================================================================================================
GPyOpt (http://sheffieldml.github.io/GPyOpt/) is a tool for optimization (minimization) of black-box functions using Gaussian processes. It is based on GPy (http://sheffieldml.github.io/GPy/) and has been implemented in Python 2.7 (https://www.python.org/download/releases/2.7/) by the Machine Learning group (at SITraN) of the University of Sheffield (http://ml.dcs.shef.ac.uk/sitran/).
These are the main features of GPyOpt:
=====================================================================================================
In [1]:
%pylab inline
import GPy
import GPyOpt
from numpy.random import seed
seed(12345)
GPyOpt is easy to use as a black-box function optimizer. To start you only need three pieces of information, defined in the cells below: the function to optimize, the box constraints of the problem, and a budget of iterations.
In [2]:
def myf(x):
    return (2*x)**2
In [3]:
bounds = [(-1,1)]
In [4]:
max_iter = 15
With these three pieces of information GPyOpt has enough to find the minimum of $f$ in the selected region. GPyOpt solves the problem in two steps. First, you need to create a GPyOpt object that stores the problem ($f$ and the box constraints). You can do it as follows.
In [5]:
myProblem = GPyOpt.methods.BayesianOptimization(myf,bounds)
Next, you need to run the optimization for the given budget of iterations.
In [6]:
myProblem.run_optimization(max_iter)
Out[6]:
Now you can check the best found location $x^*$ by
In [7]:
myProblem.x_opt
Out[7]:
and the predicted value of $f$ at the optimum $x^*$ by
In [11]:
myProblem.fx_opt
Out[11]:
And that's it! Keep reading to learn how GPyOpt uses Bayesian Optimization to solve this and other optimization problems. You will also learn about all the features and options that you can use to solve your problems efficiently.
=====================================================================================================
Bayesian optimization (BO) is a strategy for global optimization of black-box functions (Snoek et al., 2012). Let $f: {\mathcal X} \to R$ be an $L$-Lipschitz continuous function defined on a compact subset ${\mathcal X} \subseteq R^d$. We are interested in solving the global optimization problem of finding $$ x_{M} = \arg \min_{x \in {\mathcal X}} f(x). $$
We assume that $f$ is a black-box from which only perturbed evaluations of the type $y_i = f(x_i) + \epsilon_i$, with $\epsilon_i \sim\mathcal{N}(0,\psi^2)$, are available. The goal is to make a series of evaluations $x_1,\dots,x_N$ of $f$ such that the cumulative regret $$r_N= \sum_{n=1}^N f(x_n) - Nf(x_{M}),$$ is minimized. Essentially, $r_N$ is minimized if we start evaluating $f$ at $x_{M}$ as soon as possible.
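As a quick numerical illustration (not part of the original notebook), the cell below computes the cumulative regret for a made-up sequence of evaluations of the toy quadratic used above; the query points are invented for this example.
In [ ]:
# A minimal sketch: cumulative regret r_N for a made-up sequence of evaluations
# of the toy quadratic. The query points below are invented for illustration.
import numpy as np

def toy_f(x):
    return (2 * x) ** 2                  # minimum at x_M = 0, with f(x_M) = 0

x_M = 0.0                                # known minimizer of the toy function
x_evals = np.array([0.9, -0.5, 0.2, 0.05, 0.01])   # hypothetical query sequence
N = len(x_evals)

r_N = np.sum(toy_f(x_evals)) - N * toy_f(x_M)      # cumulative regret
print(r_N)    # shrinks towards 0 the sooner we start evaluating near x_M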
There are two crucial ingredients in any Bayesian Optimization (BO) procedure:
Define a prior probability measure on $f$: this measure captures our prior beliefs about $f$. The prior will be updated to a 'posterior' using the available data.
Define an acquisition function $acqu(x)$: this is a criterion used to decide where to sample next in order to gain the maximum information about the location of the global minimum of $f$.
Every time a new data point is collected, the model is re-estimated and the acquisition function is optimized again; this loop is repeated until convergence. Given a prior over the function $f$ and an acquisition function, a BO procedure will converge to the optimum of $f$ under some conditions (Bull, 2011).
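The cell below is a schematic, hedged sketch of this loop written with GPy and numpy rather than with GPyOpt's internal machinery; the toy objective, the candidate grid and the LCB parameter are all made up for illustration, and GPyOpt automates all of these steps for you.
In [ ]:
# A schematic sketch of the BO loop described above (not GPyOpt's internals):
# fit a GP to the data seen so far, minimize an LCB acquisition over a grid of
# candidates, evaluate the black-box function there, and repeat.
import numpy as np
import GPy

def f_blackbox(x):
    return (2 * x) ** 2 + np.random.normal(0, 0.05, x.shape)   # noisy toy objective

grid = np.linspace(-1, 1, 200)[:, None]      # candidate locations in the box [-1, 1]
X = np.random.uniform(-1, 1, (3, 1))         # initial design
Y = f_blackbox(X)

for _ in range(10):
    model = GPy.models.GPRegression(X, Y)    # update the GP posterior with the current data
    model.optimize()
    mu, var = model.predict(grid)
    lcb = mu - 2.0 * np.sqrt(var)            # LCB acquisition (eta = 2, made up for this sketch)
    x_next = grid[[np.argmin(lcb)]]          # next query: minimizer of the acquisition
    X = np.vstack([X, x_next])
    Y = np.vstack([Y, f_blackbox(x_next)])

print(X[np.argmin(Y)])                       # best location found so far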
A Gaussian process (GP) is a probability distribution over classes of functions, typically smooth, such that each linear finite-dimensional restriction is multivariate Gaussian (Rasmussen and Williams, 2006). GPs are fully parametrized by a mean function $\mu(x)$ and a covariance function $k(x,x')$. Without loss of generality, $\mu(x)$ is assumed to be zero. The covariance function $k(x,x')$ characterizes the smoothness and other properties of $f$. It is known as the kernel of the process and has to be continuous, symmetric and positive definite. A widely used kernel is the squared exponential, given by
$$ k(x,x') = l \cdot \exp{\left(-\frac{\|x-x'\|^2}{2\sigma^2}\right)} $$ where $\sigma^2$ and $l$ are positive parameters.
To denote that $f$ is a sample from a GP with mean $\mu$ and covariance $k$ we write
$$f(x) \sim \mathcal{GP}(\mu(x),k(x,x')).$$For regression tasks, the most important feature of GPs is that process priors are conjugate to the likelihood from finitely many observations $y= (y_1,\dots,y_n)^T$ and $X =\{x_1,...,x_n\}$, $x_i\in \mathcal{X}$ of the form $y_i = f(x_i) + \epsilon_i $ where $\epsilon_i \sim \mathcal{N} (0,\sigma^2)$. We obtain the Gaussian posterior $f(x^*)|X, y, \theta \sim \mathcal{N}(\mu(x^*),\sigma^2(x^*))$, where $\mu(x^*)$ and $\sigma^2(x^*)$ have closed form. See (Rasmussen and Williams, 2006) for details.
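As a hedged side note, the following cell sketches this machinery directly in GPy (already imported above): it builds a squared exponential kernel, fits a GP to some made-up data and queries the closed-form posterior mean and variance. None of this is required to use GPyOpt.
In [ ]:
# A minimal sketch (not part of the original notebook): GP regression in GPy
# with a squared exponential (RBF) kernel. The training data below are made up.
import numpy as np
import GPy

X = np.random.uniform(-1, 1, (10, 1))                  # made-up inputs
Y = (2 * X) ** 2 + np.random.normal(0, 0.1, (10, 1))   # noisy observations

kernel = GPy.kern.RBF(input_dim=1, variance=1., lengthscale=0.3)   # squared exponential kernel
model = GPy.models.GPRegression(X, Y, kernel)
model.optimize()                                       # fit kernel and noise parameters

Xnew = np.linspace(-1, 1, 5)[:, None]
mu, var = model.predict(Xnew)                          # closed-form posterior mean and variance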
Acquisition functions are designed to represent our beliefs about the location of the minimum of $f(x)$. Denote by $\theta$ the parameters of the GP model and by $\{x_i,y_i\}$ the available sample. Three of the most common acquisition functions, all available in GPyOpt, are the Expected Improvement (EI), the Maximum Probability of Improvement (MPI) and the Upper/Lower Confidence Bound (UCB/LCB).
$\psi$ is a tunable parameter that helps to make the acquisition functions more flexible. Also, in the case of the UCB, the parameter $\eta$ defines the balance between the importance we give to the mean and to the variance of the model. This is known as the exploration/exploitation trade-off.
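To make these ideas concrete, the next cell sketches, with numpy and scipy rather than GPyOpt's internal code, the Expected Improvement (for minimization) with jitter $\psi$ and the Lower Confidence Bound with trade-off parameter $\eta$; the exact expressions GPyOpt uses may differ in detail, so treat this as an illustration only.
In [ ]:
# A hedged sketch (not GPyOpt's implementation) of two acquisition functions
# for minimization. mu and sigma are the GP posterior mean and standard
# deviation at a candidate x; y_best is the best observed value so far.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best, psi=0.0):
    z = (y_best - mu - psi) / sigma
    return (y_best - mu - psi) * norm.cdf(z) + sigma * norm.pdf(z)

def lower_confidence_bound(mu, sigma, eta=2.0):
    return mu - eta * sigma          # trade off the mean against the uncertainty

# Example values, made up for illustration
print(expected_improvement(mu=0.3, sigma=0.2, y_best=0.5))
print(lower_confidence_bound(mu=0.3, sigma=0.2))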
Bayesian Optimization has been applied to solve a wide range of problems. Among many others, some nice applications of Bayesian Optimization include:
Sensor networks (http://www.robots.ox.ac.uk/~parg/pubs/ipsn673-garnett.pdf),
Automatic algorithm configuration (http://www.cs.ubc.ca/labs/beta/Projects/SMAC/papers/11-LION5-SMAC.pdf),
Deep learning (http://www.mlss2014.com/files/defreitas_slides1.pdf),
Gene design (http://bayesopt.github.io/papers/paper5.pdf),
and many more!
In this YouTube video you can see Bayesian Optimization working in real time in a robotics example (Calandra et al., 2008).
In [9]:
from IPython.display import YouTubeVideo
YouTubeVideo('ualnbKfkc3Q')
Out[9]:
In this example we show how GPyOpt works in a one-dimensional problem a bit more difficult than the one we analyzed in Section 3. Let's consider here the Forrester function
$$f(x) =(6x-2)^2 \sin(12x-4)$$defined on the interval $[0, 1]$.
The minimum of this function is located at $x_{min}=0.78$. In this case we assume that the evaluations of $f$ are perturbed by zero-mean Gaussian noise with standard deviation 0.25. The Forrester function is part of GPyOpt's benchmark of functions. To create the true function, the perturbed version and the boundaries of the problem you need to run the following cell.
In [12]:
# Create the true and perturbed Forrester function and the boundaries of the problem
f_true = GPyOpt.fmodels.experiments1d.forrester()        # true function
f_sim = GPyOpt.fmodels.experiments1d.forrester(sd=.25)   # noisy version
bounds = [(0,1)]                                          # problem constraints
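For reference, the Forrester formula above can also be written directly in numpy; this is only an illustrative sketch, since the GPyOpt benchmark objects created in the previous cell are what the tutorial actually uses.
In [ ]:
# A minimal sketch of the Forrester function and a noisy version, written
# directly from the formula above (the GPyOpt benchmark is used instead below).
import numpy as np

def forrester(x):
    return (6 * x - 2) ** 2 * np.sin(12 * x - 4)

def forrester_noisy(x, sd=0.25):
    return forrester(x) + np.random.normal(0, sd, np.shape(x))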
We plot the true Forrester function.
In [13]:
f_true.plot(bounds)
As we did in Section 3, we need to create the GPyOpt object that will run the optimization. We specify the function, the boundaries, and the type of acquisition function to use. The acquisition parameter (acquisition_par in the function input) is the $\psi$ parameter of Section 4.2.
In [16]:
# Creates the GPyOpt object with the model and acquisition function
seed(1234)
myBopt = GPyOpt.methods.BayesianOptimization(f=f_sim.f,           # function to optimize
                                             bounds=bounds,        # box constraints of the problem
                                             acquisition='EI',     # selects the Expected Improvement
                                             acquisition_par=0)    # psi parameter is set to zero
Now we want to run the optimization. Apart from the number of iterations, you can select how you want to optimize the acquisition function: you can run a number of local optimizers (acqu_optimize_restarts) started at random or on a grid (acqu_optimize_method).
In [17]:
# Run the optimization
max_iter = 15   # evaluation budget
myBopt.run_optimization(max_iter,                             # number of iterations
                        acqu_optimize_method='fast_random',   # method to optimize the acq. function
                        acqu_optimize_restarts=30,            # number of local optimizers
                        eps=10e-6)                            # secondary stop criteria
Out[17]:
When the optimization is done you should receive a message describing whether the method converged or the maximum number of iterations was reached. In one-dimensional examples, you can see the result of the optimization as follows.
In [18]:
myBopt.plot_acquisition()
In problems of any dimension, three evaluation plots are available:
The distance between the last two observations.
The value of $f$ at the best location previous to each iteration.
The predicted standard deviation of each new location.
To see these plots just run the following cell.
In [19]:
myBopt.plot_convergence()
Now let's make a video to track what the algorithm is doing in each iteration. Let's use the LCB in this case with parameter equal to 3.
In [56]:
# Creates the GPyOpt object used to track the iterations
seed(0)
iterBopt = GPyOpt.methods.BayesianOptimization(f=f_sim.f,
                                               model_data_init=2,
                                               bounds=bounds,
                                               acquisition='LCB',
                                               acquisition_par=3)
iterBopt.model.kern.variance.constrain_fixed(2)
iterBopt.model.Gaussian_noise.constrain_bounded(0.01,0.5)   # constrained for visualization purposes
from IPython.display import clear_output
N_iter = 20
for i in range(N_iter):
    clear_output()
    seed(1234)
    iterBopt.run_optimization(max_iter=1,
                              acqu_optimize_method='fast_random',  # method to optimize the acquisition function
                              acqu_optimize_restarts=30,           # number of local optimizers
                              eps=10e-6)                           # secondary stop criteria
    iterBopt.plot_acquisition('iteration%.03i.png' % (i + 1))
In [57]:
# Import JSAnimation
from JSAnimation import IPython_display, HTMLWriter
from matplotlib import animation
from PIL import Image
import glob

# Choose location of files to use
filenames = sorted(glob.glob('iteration*.png'))

# Create figure
fig = plt.figure(figsize=(10,5))
ax = fig.add_axes([0, 0, 1, 1])
ax.axis('off')
im = plt.imshow(Image.open(filenames[0]))

# Create initialization function and animation
def init():
    im.set_data(Image.open(filenames[0]))
    return im,

def animate(i):
    image = Image.open(filenames[i])
    im.set_data(image)
    return im,

nframes = N_iter   # we create as many frames as iterations in the BO loop
anim = animation.FuncAnimation(fig, animate, init_func=init,
                               frames=nframes, interval=1000, blit=True)

# Show the animation in the IPython notebook
anim
Out[57]:
Next, we try a two-dimensional example. In this case we minimize the Six-hump camel function
$$f(x_1,x_2) = \left(4-2.1x_1^2 + \frac{x_1^4}{3} \right)x_1^2 + x_1x_2 + (-4 +4x_2^2)x_2^2,$$ in $[-3,3]\times [-2,2]$. This function has two global minima, at $(0.0898,-0.7126)$ and $(-0.0898,0.7126)$. As in the previous case we create the function, which is already included in GPyOpt.
In [59]:
# create the objective function
f_true = GPyOpt.fmodels.experiments2d.sixhumpcamel()
f_sim = GPyOpt.fmodels.experiments2d.sixhumpcamel(sd = 0.1)
f_true.plot()
bounds = f_true.bounds
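For reference, here is a plain numpy version of the Six-hump camel formula above; this is only an illustrative sketch, since the GPyOpt benchmark object created in the previous cell is what we actually optimize.
In [ ]:
# A minimal sketch of the Six-hump camel function, written directly from the
# formula above (the GPyOpt benchmark object is what the tutorial uses).
def sixhumpcamel(x1, x2):
    return ((4 - 2.1 * x1**2 + x1**4 / 3) * x1**2
            + x1 * x2
            + (-4 + 4 * x2**2) * x2**2)

# By symmetry, both global minima give the same value
print(sixhumpcamel(0.0898, -0.7126), sixhumpcamel(-0.0898, 0.7126))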
We create the GPyOpt object. In this case we use the Lower Confidence Bound (LCB) acquisition function to solve the problem.
In [63]:
# Creates the GPyOpt object with the LCB acquisition
myBopt2D = GPyOpt.methods.BayesianOptimization(f_sim.f,
                                               bounds=bounds,
                                               acquisition='LCB',   # Lower Confidence Bound method
                                               acquisition_par=2,   # set parameter psi=2
                                               normalize=True)      # normalize the acquisition function
We run the optimization for 40 iterations and show the evaluation plot and the acquisition function.
In [65]:
# runs the optimization
max_iter = 40
myBopt2D.run_optimization(max_iter,
                          acqu_optimize_method='fast_brute',   # lattice of initial points for local optimizers
                          acqu_optimize_restarts=10,           # partitions per dimension
                          eps=1e-10)                           # stop criterion (distance between consecutive observations)
Out[65]:
Finally, we plot the acquisition function and the convergence plot.
In [66]:
myBopt2D.plot_acquisition()
In [67]:
myBopt2D.plot_convergence()