=====================================================================================================

**What is GPyOpt?**
**How to install GPyOpt?**
**How to use GPyOpt?**
- Gaussian Processes
- Acquisition functions
- Applications of Bayesian Optimization

**The Basics of Bayesian Optimization**
**1D optimization example**
**2D optimization example**

=====================================================================================================

GPyOpt (http://sheffieldml.github.io/GPyOpt/) is a tool for the optimization (minimization) of black-box functions using Gaussian processes. It is based on GPy (http://sheffieldml.github.io/GPy/) and has been implemented in Python 2.7 (https://www.python.org/download/releases/2.7/) by the Machine Learning group (at SITraN) of the University of Sheffield (http://ml.dcs.shef.ac.uk/sitran/).

These are the main features of GPyOpt:

- You can perform Bayesian optimization with the **most common acquisition functions**, such as Expected improvement (EI), Maximum probability of improvement (MPI) or the Upper confidence bound (GP-UCB).

- **Any model available in GPy can be used in GPyOpt** as a surrogate of the function to optimize. This includes a wide variety of kernel functions (and kernel combinations) and models such as sparse Gaussian processes, useful to reduce the computational cost in large-scale scenarios. See the GPy documentation (https://gpy.readthedocs.org/en/latest/) for details.

- GPyOpt allows you to run **parallel Bayesian optimization**. This is helpful when you want to perform several evaluations of the function you want to optimize simultaneously.

- You can use GPyOpt to **design physical experiments**. If you are, or work with, a wet-lab scientist, you can use GPyOpt to determine optimal strategies for sequential experimental design.

- GPyOpt contains a repository of **test functions for optimization**.

=====================================================================================================

In [1]:
```
%pylab inline
import GPy
import GPyOpt
from numpy.random import seed
seed(12345)
```

GPyOpt is easy to use as a black-box function optimizer. To get started you only need:

- Your favorite function $f$ to minimize. We use $f(x)=(2x)^2$ in this toy example, whose global minimum is at $x=0$.

In [2]:
```
def myf(x):
    return (2*x)**2
```

- A set of box constraints, the interval $[-1,1]$ in our case.

In [3]:
```
bounds = [(-1,1)]
```

- A budget, or number of allowed evaluations of $f$.

In [4]:
```
max_iter = 15
```

With these three pieces in place, you can create the optimization problem:

In [5]:
```
myProblem = GPyOpt.methods.BayesianOptimization(myf, bounds)
```

Next you need to run the optimization for the given budget of iterations

In [6]:
```
myProblem.run_optimization(max_iter)
```

The location of the minimum found, and the value of $f$ there, are stored in:

In [7]:
```
myProblem.x_opt
```

In [8]:
```
myProblem.fx_opt
```

You can also visualize the fitted model and the acquisition function at the optimum $x^*$ by running

In [9]:
```
myProblem.plot_acquisition()
```

And that's it! Keep reading to learn how GPyOpt uses Bayesian optimization to solve this and other optimization problems, and about all the features and options that you can use to solve your problems efficiently.

Bayesian optimization (BO) is a strategy for the global optimization of black-box functions (Snoek et al., 2012). Let $f: {\mathcal X} \to R$ be an L-Lipschitz continuous function defined on a compact subset ${\mathcal X} \subseteq R^d$. We are interested in solving the global optimization problem of finding $$ x_{M} = \arg \min_{x \in {\mathcal X}} f(x). $$

We assume that $f$ is a *black-box* from which only perturbed evaluations of the type $y_i = f(x_i) + \epsilon_i$, with $\epsilon_i \sim \mathcal{N}(0,\psi^2)$, are available. The goal is to make a series of evaluations of $f$ at $x_1,\dots,x_N$ such that the *cumulative regret*
$$r_N = \sum_{n=1}^N f(x_n) - Nf(x_{M}),$$
is minimised. Essentially, $r_N$ is minimised if we start evaluating $f$ at $x_{M}$ as soon as possible.
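
As a quick numerical illustration, here is a minimal sketch of the regret computation for the toy function $f(x)=(2x)^2$ from above, using a hypothetical evaluation sequence (the sequence and helper name are illustrative, not part of GPyOpt):

```python
import numpy as np

def cumulative_regret(evals, f_min):
    """Cumulative regret for minimization: sum_n f(x_n) - N * f(x_M)."""
    evals = np.asarray(evals, dtype=float)
    return float(np.sum(evals - f_min))

f = lambda x: (2 * x) ** 2               # toy objective, minimum f(0) = 0
xs = np.array([0.9, -0.5, 0.1, 0.0])     # hypothetical evaluation sequence
r_N = cumulative_regret(f(xs), f_min=0.0)
print(r_N)                               # total shortfall of the sequence
```

Once the sequence reaches $x_M$ (here $x=0$), further evaluations add nothing to the regret.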

There are two crucial ingredients in any Bayesian Optimization (BO) procedure:

1. A **prior probability measure** on $f$: this captures our prior beliefs about $f$. The prior is updated to a 'posterior' using the available data.

2. An **acquisition function** $acqu(x)$: this is a criterion used to decide where to sample next in order to gain the maximum information about the location of the global optimum of $f$.

Every time a new data point is collected, the model is re-estimated and the acquisition function is optimized again, until convergence. Given a prior over the function $f$ and an acquisition function, a BO procedure will converge to the optimum of $f$ under some conditions (Bull, 2011).
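
The loop just described can be sketched in a few lines. The following is a minimal, self-contained illustration (not GPyOpt's implementation): a small NumPy GP surrogate with a lower-confidence-bound rule, optimized over a grid, applied to the toy function from Section 3. All names and parameter values are illustrative.

```python
import numpy as np

def sq_exp(X1, X2, l=1.0, sigma=0.3):
    # squared exponential kernel on 1-D inputs
    return l * np.exp(-(X1[:, None] - X2[None, :]) ** 2 / (2 * sigma ** 2))

def gp_posterior(X, y, Xs, noise=1e-4):
    # closed-form GP posterior mean and variance on the candidate grid Xs
    K = sq_exp(X, X) + noise * np.eye(len(X))
    Ks = sq_exp(X, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.clip(sq_exp(Xs, Xs).diagonal() - np.sum(v ** 2, axis=0), 1e-12, None)
    return mu, var

f = lambda x: (2 * x) ** 2              # toy objective from Section 3
grid = np.linspace(-1, 1, 201)          # candidate locations in the box [-1, 1]
X = np.array([-0.8, 0.7])               # initial design
y = f(X)
for _ in range(15):                     # evaluation budget
    mu, var = gp_posterior(X, y, grid)
    lcb = mu - 2.0 * np.sqrt(var)       # lower confidence bound (minimization)
    x_next = grid[np.argmin(lcb)]       # 'optimize' the acquisition on the grid
    X, y = np.append(X, x_next), np.append(y, f(x_next))
print(X[np.argmin(y)])                  # best location found, close to 0
```

Each pass re-estimates the surrogate with the enlarged sample and re-optimizes the acquisition, exactly the alternation described above.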

A Gaussian process (GP) is a probability distribution over classes of functions, typically smooth, such that each linear finite-dimensional restriction is multivariate Gaussian (Rasmussen and Williams, 2006). GPs are fully parametrised by a mean function $\mu(x)$ and a covariance function $k(x,x')$. Without loss of generality, $\mu(x)$ is assumed to be zero. The covariance function $k(x,x')$ characterizes the smoothness and other properties of $f$. It is known as the kernel of the process and has to be continuous, symmetric and positive definite. A widely used kernel is the squared exponential, given by

$$ k(x,x') = l \cdot \exp\left(-\frac{\|x-x'\|^2}{2\sigma^2}\right), $$ where $\sigma^2$ and $l$ are positive parameters.
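
This kernel is easy to evaluate directly. Here is a small NumPy sketch of the formula above (the function name is illustrative; GPy provides its own kernel classes):

```python
import numpy as np

def sq_exp_kernel(X1, X2, l=1.0, sigma=1.0):
    """Squared exponential kernel: k(x, x') = l * exp(-||x - x'||^2 / (2 sigma^2))."""
    X1, X2 = np.atleast_2d(X1), np.atleast_2d(X2)
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return l * np.exp(-d2 / (2 * sigma ** 2))

K = sq_exp_kernel(np.array([[0.0], [1.0]]), np.array([[0.0], [1.0]]))
print(K)  # diagonal entries equal l; off-diagonal equals exp(-0.5)
```

Note that points close together are highly correlated, which is what encodes the smoothness of $f$.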

To denote that $f$ is a sample from a GP with mean $\mu$ and covariance $k$ we write

$$f(x) \sim \mathcal{GP}(\mu(x),k(x,x')).$$

For regression tasks, the most important feature of GPs is that process priors are conjugate to the likelihood from finitely many observations $y = (y_1,\dots,y_n)^T$ at $X = \{x_1,\dots,x_n\}$, $x_i \in \mathcal{X}$, of the form $y_i = f(x_i) + \epsilon_i$ where $\epsilon_i \sim \mathcal{N}(0,\sigma^2)$. We obtain the Gaussian posterior $f(x^*)|X, y, \theta \sim \mathcal{N}(\mu(x^*),\sigma^2(x^*))$, where $\mu(x^*)$ and $\sigma^2(x^*)$ have closed form. See (Rasmussen and Williams, 2006) for details.
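
For reference, writing $K$ for the $n \times n$ matrix with entries $k(x_i,x_j)$ and $k_*$ for the vector with entries $k(x^*,x_i)$, these closed forms are the standard GP regression equations:

$$\mu(x^*) = k_*^T (K + \sigma^2 I)^{-1} y, \qquad \sigma^2(x^*) = k(x^*,x^*) - k_*^T (K + \sigma^2 I)^{-1} k_*.$$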

Acquisition functions are designed to represent our beliefs about the location of the optimum of $f(x)$. Denote by $\theta$ the parameters of the GP model and by $\{x_i,y_i\}$ the available sample. Three of the most common acquisition functions, all available in GPyOpt, are:

**Maximum probability of improvement (MPI)**:

$$acqu_{MPI}(x) = \Phi(\gamma(x)), \quad \text{where} \quad \gamma(x) = \frac{f(x_{best}) - \psi - \mu(x)}{\sigma(x)}.$$

**Expected improvement (EI)**:

$$acqu_{EI}(x) = \sigma(x)\left(\gamma(x)\Phi(\gamma(x)) + \phi(\gamma(x))\right).$$

**Upper confidence bound (UCB)**:

$$acqu_{UCB}(x) = -\mu(x) + \eta\,\sigma(x).$$

Here $\mu(x)$ and $\sigma(x)$ are the posterior mean and standard deviation of the GP, $f(x_{best})$ is the best value observed so far, and $\Phi$ and $\phi$ are the standard normal cdf and pdf.

$\psi$ is a tunable parameter that makes the acquisition functions more flexible. Also, in the case of the UCB, the parameter $\eta$ controls the balance between the importance we give to the mean and to the variance of the model. This is known as the **exploration/exploitation trade-off**.
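
To make the trade-off concrete, here is a small sketch of the three criteria for minimization, written so that larger values are better (assuming SciPy is available; the function names and example numbers are illustrative, not GPyOpt's API):

```python
import numpy as np
from scipy.stats import norm

# mu, sd: GP posterior mean and standard deviation at candidate points;
# f_best: best (lowest) value observed so far.
def acqu_mpi(mu, sd, f_best, psi=0.01):
    return norm.cdf((f_best - mu - psi) / sd)

def acqu_ei(mu, sd, f_best, psi=0.01):
    gamma = (f_best - mu - psi) / sd
    return sd * (gamma * norm.cdf(gamma) + norm.pdf(gamma))

def acqu_ucb(mu, sd, eta=2.0):
    return -mu + eta * sd   # eta balances exploitation (mean) vs. exploration (variance)

# A near-certain point with a decent mean vs. an uncertain point with a worse mean:
mu = np.array([0.35, 0.5])
sd = np.array([0.05, 0.4])
ei_vals = acqu_ei(mu, sd, f_best=0.3)
print(ei_vals)  # the high-variance point wins even though its mean is worse
```

This is exploration at work: a large predictive variance can outweigh a mediocre predictive mean.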

Bayesian optimization has been applied to solve a wide range of problems. Among many others, some nice applications of Bayesian optimization include:

- Sensor networks (http://www.robots.ox.ac.uk/~parg/pubs/ipsn673-garnett.pdf),
- Automatic algorithm configuration (http://www.cs.ubc.ca/labs/beta/Projects/SMAC/papers/11-LION5-SMAC.pdf),
- Deep learning (http://www.mlss2014.com/files/defreitas_slides1.pdf),
- Gene design (http://bayesopt.github.io/papers/paper5.pdf),
- and many more!

In this YouTube video you can see Bayesian optimization working in real time in a robotics example (Calandra et al., 2008).

In [9]:
```
from IPython.display import YouTubeVideo
YouTubeVideo('ualnbKfkc3Q')
```

In this example we show how GPyOpt works in a one-dimensional example a bit more difficult than the one we analyzed in Section 3. Let's consider here the Forrester function

$$f(x) = (6x-2)^2 \sin(12x-4),$$

defined on the interval $[0, 1]$.

The minimum of this function is located at $x_{min} \approx 0.757$. In this case we assume that the evaluations of $f$ are perturbed by zero-mean Gaussian noise with standard deviation 0.25. The Forrester function is part of the benchmark of test functions of GPyOpt. To create the true function, its perturbed version and the boundaries of the problem, run the following cell.
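
You can check the location of the minimum yourself by evaluating the formula directly. This is a hypothetical stand-in for GPyOpt's built-in version, written out in plain NumPy:

```python
import numpy as np

def forrester(x):
    """Forrester function f(x) = (6x - 2)^2 * sin(12x - 4) on [0, 1]."""
    return (6 * x - 2) ** 2 * np.sin(12 * x - 4)

grid = np.linspace(0, 1, 100001)
x_min = grid[np.argmin(forrester(grid))]
print(x_min)   # the minimizer, near 0.757
```

The function is multimodal on $[0,1]$, which is what makes it a harder target than the quadratic of Section 3.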

In [8]:
```
# Create the true and perturbed Forrester function and the boundaries of the problem
f_true = GPyOpt.fmodels.experiments1d.forrester()        # true function
f_sim = GPyOpt.fmodels.experiments1d.forrester(sd=.25)   # noisy version
bounds = [(0,1)]                                         # problem constraints
```

We plot the true Forrester function.

In [9]:
```
f_true.plot(bounds)
```

In [21]:
```
# Creates the GPyOpt object with the model and acquisition function
seed(1234)
myBopt = GPyOpt.methods.BayesianOptimization(f=f_sim.f,          # function to optimize
                                             bounds=bounds,      # box-constraints of the problem
                                             acquisition='EI',   # selects Expected Improvement
                                             acquisition_par=0)  # psi parameter is set to zero
```

In [22]:
```
# Run the optimization
max_iter = 15                                               # evaluation budget
myBopt.run_optimization(max_iter,                           # number of iterations
                        acqu_optimize_method='fast_random', # method to optimize the acq. function
                        acqu_optimize_restarts=30,          # number of local optimizers
                        stop_criteria=10e-6)                # secondary stopping criterion
```

In [23]:
```
myBopt.plot_acquisition()
```

For problems of any dimension, three evaluation plots are available:

- The distance between the last two observations.
- The value of $f$ at the best location previous to each iteration.
- The predicted standard deviation at each new location.

To see these plots, just run the following cell.

In [24]:
```
myBopt.plot_convergence()
```

Now let's make a video to track what the algorithm is doing in each iteration.

In [29]:
```
# starts the optimization
seed(1234)
iterBopt = GPyOpt.methods.BayesianOptimization(f=f_sim.f,
                                               bounds=bounds,
                                               acquisition='EI',
                                               acquisition_par=0.01)
iterBopt.model.kern.variance.constrain_fixed(1)
iterBopt.model.Gaussian_noise.constrain_bounded(0.01, 0.1)  # fixed for visualization purposes

from IPython.display import clear_output
N_iter = 20
for i in range(N_iter):
    clear_output()
    seed(1234)
    iterBopt.run_optimization(max_iter=1,
                              acqu_optimize_method='fast_random', # method to optimize the acquisition function
                              acqu_optimize_restarts=30,          # number of local optimizers
                              stop_criteria=10e-6)                # secondary stopping criterion
    iterBopt.plot_acquisition('iteration%.03i.png' % (i + 1))
```