The structure of GPflowOpt

Joachim van der Herten

In this document, the structure of the GPflowOpt library is explained, including some small examples. First, the Domain and Optimizer interfaces are briefly illustrated, followed by a description of the BayesianOptimizer. At the end, a step-by-step walkthrough of the BayesianOptimizer is given.

Optimization

The underlying design principles of GPflowOpt were chosen to address the following task: optimizing problems of the form $$\underset{\boldsymbol{x} \in \mathcal{X}}{\operatorname{argmin}} f(\boldsymbol{x}).$$ The objective function $f: \mathcal{X} \rightarrow \mathbb{R}^p$ maps a candidate solution to one or more scores. Here $\mathcal{X}$ represents the input domain. This domain encloses all candidate solutions to the optimization problem and can be entirely continuous (e.g., a $d$-dimensional hypercube), but may also consist of discrete and categorical parameters.

In GPflowOpt, the Domain and Optimizer interfaces and corresponding subclasses are used to explicitly represent the optimization problem.

Objective function

The objective function itself can be any Python callable (a function, or an object implementing the call operator). It accepts a two-dimensional numpy array of shape $(n, d)$ as input, with $n$ the batch size and $d$ the dimensionality of the domain, and returns a tuple of numpy arrays: the first element has shape $(n, p)$ and contains the objective scores for each evaluated point; the second element contains the gradients at each point, shaped $(n, d)$ for a single-objective optimization problem or $(n, d, p)$ for a multi-objective function. If the objective function is passed to a gradient-free optimization method, the gradients are automatically discarded. GPflowOpt also provides decorators which handle batch application of a function along the $n$ rows of the input matrix, or which adapt functions accepting each feature as a separate argument.

Here, we define a simple quadratic objective function:


In [5]:
import numpy as np

def fx(X):
    X = np.atleast_2d(X)
    # Return objective & gradient
    return np.sum(np.square(X), axis=1)[:,None], 2*X
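
The decorators mentioned above are not applied here. As a rough illustration of what such batch handling amounts to, the hypothetical wrapper below turns a function that scores a single point into one accepting an $(n, d)$ input matrix; it is a plain numpy sketch, not the GPflowOpt decorator itself.

# Hypothetical helper (not the GPflowOpt decorator): apply a single-point
# function to every row of an (n, d) batch and stack the results into (n, p).
def batch_apply_sketch(single_point_fx):
    def wrapper(X):
        X = np.atleast_2d(X)
        scores = [single_point_fx(x) for x in X]
        return np.array(scores).reshape(len(X), -1)
    return wrapper

fx_batch = batch_apply_sketch(lambda x: np.sum(np.square(x)))
fx_batch(np.array([[1., 2.], [0., 3.]]))  # array([[ 5.], [ 9.]])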

Domain

Then, we represent $\mathcal{X}$ by composing parameters. This is how a simple continuous square domain is defined:


In [6]:
from gpflowopt.domain import ContinuousParameter
domain = ContinuousParameter('x1', -2, 2) + ContinuousParameter('x2', -1, 2)
domain


Out[6]:
Name    Type        Values
x1      Continuous  [-2. 2.]
x2      Continuous  [-1. 2.]
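
Since parameters and domains compose with the + operator, larger domains can be built up incrementally. Purely as an illustration (not used in the remainder of this document), a third continuous parameter could be appended to obtain a three-dimensional domain:

# Illustrative only: extend the domain with a third continuous parameter
domain3 = domain + ContinuousParameter('x3', 0, 1)
domain3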

Optimize

Based on the domain and a valid objective function, we can now apply one of the included optimizers. GPflowOpt defines an intuitive Optimizer interface which can be used to specify the domain, the initial point(s), constraints (to be implemented), etc. Implementations of some popular optimization approaches are provided. Here is how our function is optimized using one of the methods available through SciPy's minimize:


In [7]:
from gpflowopt.optim import SciPyOptimizer

optimizer = SciPyOptimizer(domain, method='SLSQP')
optimizer.set_initial([-1,-1])
optimizer.optimize(fx)


Out[7]:
     fun: 0.0
     jac: array([ 0.,  0.])
 message: 'Optimization terminated successfully.'
    nfev: 3
     nit: 2
    njev: 2
  status: 0
 success: True
       x: array([[ 0.,  0.]])

And here is how we optimize it with Monte Carlo sampling. We can pass the same function, as the gradients are automatically discarded.


In [8]:
from gpflowopt.optim import MCOptimizer
optimizer = MCOptimizer(domain, 200)
optimizer.optimize(fx)


Out[8]:
     fun: array([[ 0.02115951]])
 message: 'OK'
    nfev: 200
 success: True
       x: array([[ 0.05731395, -0.13369599]])

Bayesian Optimization

In Bayesian Optimization (BO), the typical assumption is that $f$ is expensive to evaluate and no gradients are available. The usual approach is to sequentially select a limited set of decisions $\boldsymbol{x}_0, \boldsymbol{x}_1, \ldots, \boldsymbol{x}_{n-1}$ using a sampling policy. Hence, each decision $\boldsymbol{x}_i \in \mathcal{X}$ is itself the result of an optimization problem $$\boldsymbol{x}_i = \underset{\boldsymbol{x}}{\operatorname{argmax}} \alpha_i(\boldsymbol{x})$$

Each iteration, a function $\alpha_i$ which is cheap to evaluate acts as a surrogate for the expensive function. It is typically a mapping of the predictive distribution of a (Bayesian) model built on all previous decisions and their corresponding (noisy) evaluations. The mapping introduces an order in $\mathcal{X}$ to obtain a certain goal. The typical goal within the context of BO is the search for optimality or feasibility while keeping the number of required evaluations ($n$) small. As we can have several functions $f$ representing objectives and constraints, BO may invoke several models and mappings $\alpha$. These mappings are typically referred to as acquisition functions (or infill criteria). GPflowOpt defines an Acquisition interface to implement these mappings and provides implementations of some default choices. In combination with a special Optimizer implementation for BO, the following steps are required for a typical workflow:

1) Define the problem domain, as in normal optimization. Its dimensionality matches the input to the objective and constraint functions.

2) Specify the (GP) models for the constraints and objectives. This involves the choice of kernels, priors, fixes, transforms, etc., and follows the standard way of setting up GPflow models. GPflowOpt does not further wrap models, hence it is possible to implement custom models in GPflow and use them directly in GPflowOpt.

3) Set up the acquisition function(s) using the available built-in implementations in GPflowOpt, or design your own by implementing the Acquisition interface.

4) Set up an optimizer for the acquisition function.

5) Run the high-level BayesianOptimizer, which implements a typical BO flow. The BayesianOptimizer itself is compliant with the Optimizer interface. Unlike the other optimizers, however, the BayesianOptimizer requires that the objective function returns no gradients.

Alternatively, advanced users requiring finer control can easily implement their own flow based on the low-level interfaces of GPflowOpt, as the coupling between these objects was intentionally kept loose.
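
A well-known example of such a mapping, used further below, is Expected Improvement. For a minimization problem it scores a candidate by the expected amount by which it improves upon the best evaluation observed so far, $$\alpha_{EI}(\boldsymbol{x}) = \mathbb{E}\left[\max\left(f_{\min} - f(\boldsymbol{x}),\, 0\right)\right],$$ with $f_{\min}$ the lowest objective value observed so far and the expectation taken over the predictive distribution of the model.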

As an illustration of the described flow, the previous example is now optimized using Bayesian optimization instead, with the well-known Expected Improvement acquisition function:


In [9]:
from gpflowopt.bo import BayesianOptimizer
from gpflowopt.design import FactorialDesign
from gpflowopt.acquisition import ExpectedImprovement
import gpflow

# The Bayesian Optimizer does not expect gradients to be returned
def fx(X):
    X = np.atleast_2d(X)
    # Return only the objective (no gradients)
    return np.sum(np.square(X), axis=1)[:,None]

X = FactorialDesign(2, domain).generate()
Y = fx(X)

# initializing a standard BO model, Gaussian Process Regression with
# Matern52 ARD Kernel
model = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True))
alpha = ExpectedImprovement(model)

# Now we must specify an optimization algorithm to optimize the acquisition 
# function, each iteration. 
acqopt = SciPyOptimizer(domain)

# Now create the Bayesian Optimizer
optimizer = BayesianOptimizer(domain, alpha, optimizer=acqopt)
with optimizer.silent():
    r = optimizer.optimize(fx, n_iter=15)
print(r)


     fun: array([ 0.17684308])
 message: 'OK'
    nfev: 15
 success: True
       x: array([[ 0.        ,  0.42052714]])

This brief code snippet starts from a two-level factorial design (the corner points of the domain) and uses a GP model to model the response surface of the objective function. The BayesianOptimizer follows the same interface as the other optimizers and is initialized with the domain, the acquisition function and an additional optimization method to optimize the acquisition function each iteration. Finally, the optimizer performs 15 iterations to optimize fx.

The code to evaluate the acquisition function on the model is written in TensorFlow, allowing gradient-based optimization without additional effort due to automatic differentiation.

Step-by-step description of the BayesianOptimizer

Prior to running BayesianOptimizer.optimize(), the acquisition function is initialized with an underlying model. Any data previously included in the model (through the GPModel.__init__ constructor in GPflow) is used as initial data. When optimize(function, n_iter) is called:

1) Any data points returned by get_initial() are evaluated. Afterwards, the evaluated points are added to the models by calling _update_model_data().

2) n_iter iterations are performed. Each iteration, the acquisition function is optimized and the models are updated by calling _update_model_data().

Updating the models through _update_model_data() calls set_data(X, Y) on the acquisition function. This covers the following aspects (a schematic sketch of the overall flow is given after the list):

  • GPModel.X and GPModel.Y are updated

  • Each of the contained models is returned to the state it had when the acquisition function was initialized, and is then optimized. If the optimize_restarts parameter of Acquisition.__init__() was set to $n>1$, the model state is additionally randomized and optimized $n-1$ times. Finally, the state resulting in the best log_likelihood() becomes the new model state.

  • Acquisition.setup() is called to perform any pre-calculation of quantities independent of the candidate points, which can then be used in build_acquisition().
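
The sketch below summarizes this flow in plain Python. It is only an illustration of the steps described above, not the actual BayesianOptimizer source: evaluate_objective and optimize_acquisition are hypothetical stand-ins for the internal machinery that evaluates the expensive function and maximizes the acquisition function over the domain.

# Illustrative sketch of the BayesianOptimizer flow, not the actual implementation.
# `get_initial`, `evaluate_objective` and `optimize_acquisition` are hypothetical stand-ins.
def bo_flow_sketch(acquisition, get_initial, evaluate_objective,
                   optimize_acquisition, n_iter):
    # Step 1: evaluate the initial points and feed them to the models
    # (corresponds to the first call of _update_model_data / set_data).
    X = get_initial()
    Y = evaluate_objective(X)
    acquisition.set_data(X, Y)

    # Step 2: perform n_iter BO iterations.
    for _ in range(n_iter):
        x_next = optimize_acquisition(acquisition)  # inner optimization of alpha_i
        y_next = evaluate_objective(x_next)
        X = np.vstack((X, x_next))
        Y = np.vstack((Y, y_next))
        acquisition.set_data(X, Y)  # models re-optimized, setup() re-run
    return X, Y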

The GPflow tree

The Acquisition interface, mapping the belief of the model(s) to a score indicating areas of optimality/feasibility, is implemented as part of the GPflow tree structure. More specifically, it implements the Parameterized interface, permitting the use of the convenient AutoFlow decorator. The build_acquisition() method to be implemented by subclasses is a TensorFlow method, allowing automatic differentiation of the acquisition function, which enables gradient-based optimization thereof (not of the objective!). It may directly access the graph for computing the predictive distribution of a model by calling build_predict().
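
As a rough illustration of what implementing this interface involves, the sketch below defines a simple lower confidence bound criterion. It assumes that the Acquisition base class accepts the model in its constructor and exposes it via self.models, and that build_acquisition() receives a tensor of candidate points; consult the built-in acquisition functions for the authoritative pattern.

import tensorflow as tf
from gpflowopt.acquisition import Acquisition

class LowerConfidenceBoundSketch(Acquisition):
    """Illustrative sketch of a custom acquisition: score = -(mean - beta * stddev),
    so maximizing it favours low predicted means and high predictive uncertainty."""

    def __init__(self, model, beta=2.0):
        super(LowerConfidenceBoundSketch, self).__init__(model)
        self.beta = beta  # plain attribute; assumed not to need a GPflow Param

    def build_acquisition(self, Xcand):
        # Predictive distribution of the (single) underlying model, built in TensorFlow
        mean, var = self.models[0].build_predict(Xcand)
        return -(mean - self.beta * tf.sqrt(var))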