In [1]:
from bayes_opt import BayesianOptimization
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
%matplotlib inline
Let's create a target 1-D function with multiple local maxima to test and visualize how the BayesianOptimization package works. The target function we will try to maximize is the following:
$$f(x) = e^{-(x - 2)^2} + e^{-\frac{(x - 6)^2}{10}} + \frac{1}{x^2 + 1},$$ whose maximum is near $x = 2$; we will restrict the interval of interest to $x \in (-2, 10)$.
In [2]:
def target(x):
    return np.exp(-(x - 2)**2) + np.exp(-(x - 6)**2 / 10) + 1 / (x**2 + 1)
In [3]:
x = np.linspace(-2, 10, 1000)
y = target(x)
plt.plot(x, y)
Out[3]: [line plot of the target function over $x \in (-2, 10)$]
Enter the target function to be maximized, its variable(s), and their corresponding ranges (a multi-variable case is sketched below). A minimum of 2 initial guesses is necessary to kick-start the algorithm; these can be either random or user defined.
In [4]:
bo = BayesianOptimization(target, {'x': (-2, 10)})
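For a multi-variable problem the setup is the same, with one bounds entry per parameter. The function and bounds below are hypothetical, shown only to illustrate the constructor:
# Hypothetical two-variable target, for illustration only.
def target2(x, y):
    return -x**2 - (y - 1)**2 + 1

bo2 = BayesianOptimization(target2, {'x': (-4, 4), 'y': (-3, 3)})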
In this example we will use the Upper Confidence Bound (UCB) as our utility function. It has a free parameter $\kappa$ which controls the balance between exploration and exploitation; we will set $\kappa=5$ which, in this case, makes the algorithm quite bold. Additionally, we will use the cubic correlation in our Gaussian Process.
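For reference, the UCB acquisition has the standard form $$\mathrm{UCB}(x) = \mu(x) + \kappa \, \sigma(x),$$ where $\mu$ and $\sigma$ are the posterior mean and standard deviation of the Gaussian Process. Larger values of $\kappa$ weight the uncertainty term more heavily, pushing the search toward unexplored regions; smaller values favor exploitation of the current best estimate.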
In [5]:
gp_params = {'corr': 'cubic'}
bo.maximize(init_points=2, n_iter=0, acq='ucb', kappa=5, **gp_params)
In [6]:
def posterior(bo, xmin=-2, xmax=10):
    # Fit the GP to the observed points and evaluate the posterior
    # mean and standard deviation on a dense grid.
    bo.gp.fit(bo.X, bo.Y)
    mu, sigma2 = bo.gp.predict(np.linspace(xmin, xmax, 1000).reshape(-1, 1), eval_MSE=True)
    return mu, np.sqrt(sigma2)
def plot_gp(bo, x, y):
    fig = plt.figure(figsize=(16, 10))
    fig.suptitle('Gaussian Process and Utility Function After {} Steps'.format(len(bo.X)), fontdict={'size':30})

    gs = gridspec.GridSpec(2, 1, height_ratios=[3, 1])
    axis = plt.subplot(gs[0])
    acq = plt.subplot(gs[1])

    # Top panel: target function, observations, and the GP posterior.
    mu, sigma = posterior(bo)
    axis.plot(x, y, linewidth=3, label='Target')
    axis.plot(bo.X.flatten(), bo.Y, 'D', markersize=8, label=u'Observations', color='r')
    axis.plot(x, mu, '--', color='k', label='Prediction')
    axis.fill(np.concatenate([x, x[::-1]]),
              np.concatenate([mu - 1.9600 * sigma, (mu + 1.9600 * sigma)[::-1]]),
              alpha=.6, fc='c', ec='None', label='95% confidence interval')
    axis.set_xlim((-2, 10))
    axis.set_ylim((None, None))
    axis.set_ylabel('f(x)', fontdict={'size':20})
    axis.set_xlabel('x', fontdict={'size':20})

    # Bottom panel: utility (acquisition) function; its maximizer is
    # the point the algorithm will probe next.
    utility = bo.util.utility(x.reshape((-1, 1)), bo.gp, 0)
    acq.plot(x, utility, label='Utility Function', color='purple')
    acq.plot(x[np.argmax(utility)], np.max(utility), '*', markersize=15,
             label=u'Next Best Guess', markerfacecolor='gold', markeredgecolor='k', markeredgewidth=1)
    acq.set_xlim((-2, 10))
    acq.set_ylim((0, np.max(utility) + 0.5))
    acq.set_ylabel('Utility', fontdict={'size':20})
    acq.set_xlabel('x', fontdict={'size':20})

    axis.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
    acq.legend(loc=2, bbox_to_anchor=(1.01, 1), borderaxespad=0.)
In [7]:
plot_gp(bo, x, y)
In [8]:
bo.maximize(init_points=0, n_iter=1, kappa=5)
plot_gp(bo, x, y)
In [9]:
bo.maximize(init_points=0, n_iter=1, kappa=5)
plot_gp(bo, x, y)
In [10]:
bo.maximize(init_points=0, n_iter=1, kappa=5)
plot_gp(bo, x, y)
In [11]:
bo.maximize(init_points=0, n_iter=1, kappa=5)
plot_gp(bo, x, y)
In [12]:
bo.maximize(init_points=0, n_iter=1, kappa=5)
plot_gp(bo, x, y)
In [13]:
bo.maximize(init_points=0, n_iter=1, kappa=5)
plot_gp(bo, x, y)
In [14]:
bo.maximize(init_points=0, n_iter=1, kappa=5)
plot_gp(bo, x, y)
After just a few points the algorithm was able to get pretty close to the true maximum. It is important to note that the trade-off between exploration (exploring the parameter space) and exploitation (probing points near the current known maximum) is fundamental to a successful Bayesian optimization procedure. The utility function used here (Upper Confidence Bound, UCB) has a free parameter $\kappa$ that allows the user to make the algorithm more or less conservative. Additionally, the larger the initial set of random points explored, the less likely the algorithm is to get stuck in a local optimum due to being too conservative.
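As a quick experiment, a more conservative (exploitative) run can be sketched with the same API by lowering $\kappa$; the value below is an arbitrary choice for illustration:
# Fresh optimizer over the same interval. A small kappa (chosen
# arbitrarily here) weights the posterior mean more than the
# uncertainty, so the search exploits known good regions.
bo_conservative = BayesianOptimization(target, {'x': (-2, 10)})
bo_conservative.maximize(init_points=2, n_iter=5, acq='ucb', kappa=1)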