GPflowOpt Documentation

Introduction

GPflowOpt is a library for Bayesian Optimization with GPflow. It uses TensorFlow to compute the acquisition functions, which offers scalability and avoids manual implementation of gradients. The package was created, and is currently maintained, by Joachim van der Herten and Ivo Couckuyt.

The project is open source: if you feel you have some relevant skills and are interested in contributing then please contact us on GitHub by opening an issue or pull request.

Install

  1. Install package

A straightforward way to install GPflowOpt is to clone its repository and run

pip install . --process-dependency-links

in the root folder. This also installs required dependencies including TensorFlow. For alternative TensorFlow installations (e.g., gpu), please see the instructions on the main TensorFlow webpage.

  2. Development

GPflowOpt is a pure Python library, so you could simply add it to your Python path. For a development (editable) install, we use

pip install -e . --process-dependency-links

  3. Testing

For testing, GPflowOpt uses nox to automatically create a virtualenv and install the additional test dependencies. To install nox:

pip install nox-automation

followed by

nox

to run all test sessions.

  4. Documentation

To build the documentation, first install the extra dependencies with pip install -e .[docs]. Then proceed with python setup.py build_sphinx.

Getting started

A simple example of Bayesian optimization to get up and running is provided by the first steps into Bayesian optimization notebook.

For more advanced use cases have a look at the other tutorial notebooks and the API and architecture.

Citing GPflowOpt

To cite GPflowOpt, please reference the preliminary arXiv paper. Sample BibTeX is given below:

@ARTICLE{GPflowOpt2017,
author = {Knudde, Nicolas and {van der Herten}, Joachim and Dhaene, Tom and Couckuyt, Ivo},
title = "{{GP}flow{O}pt: A {B}ayesian Optimization Library using {T}ensor{F}low}",
journal = {arXiv preprint -- arXiv:1711.03845},
year = {2017},
}

Acknowledgements

Joachim van der Herten and Ivo Couckuyt are Ghent University - imec postdoctoral fellows. Ivo Couckuyt is supported by FWO Vlaanderen.

Tutorials and examples

First steps into Bayesian optimization

Ivo Couckuyt, Joachim van der Herten

Introduction

Bayesian optimization is particularly useful for expensive optimization problems. This includes optimization problems where the objective (and constraints) are time-consuming to evaluate: measurements, engineering simulations, hyperparameter optimization of deep learning models, etc. Another area where Bayesian optimization may provide a benefit is in the presence of (a lot of) noise. If your problem does not satisfy these requirements other optimization algorithms might be better suited.

To set up a Bayesian optimization scheme with GPflowOpt you have to:

  • define your objective and specify the optimization domain
  • setup a GPflow model and choose an acquisition function
  • create a BayesianOptimizer

Objective function

In [1]:
import numpy as np
from gpflowopt.domain import ContinuousParameter

def branin(x):
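    # Branin benchmark function, evaluated row-wise on an (n, 2) input array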
    x = np.atleast_2d(x)
    x1 = x[:, 0]
    x2 = x[:, 1]
    a = 1.
    b = 5.1 / (4. * np.pi ** 2)
    c = 5. / np.pi
    r = 6.
    s = 10.
    t = 1. / (8. * np.pi)
    ret = a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1 - t) * np.cos(x1) + s
    return ret[:, None]

domain = ContinuousParameter('x1', -5, 10) + \
         ContinuousParameter('x2', 0, 15)
domain
Out[1]:
Name    Type        Values
x1      Continuous  [-5. 10.]
x2      Continuous  [ 0. 15.]

Bayesian optimizer

In [8]:
import gpflow
from gpflowopt.bo import BayesianOptimizer
from gpflowopt.design import LatinHyperCube
from gpflowopt.acquisition import ExpectedImprovement
from gpflowopt.optim import SciPyOptimizer, StagedOptimizer, MCOptimizer

# Use standard Gaussian process Regression
lhd = LatinHyperCube(21, domain)
X = lhd.generate()
Y = branin(X)
model = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True))
model.kern.lengthscales.transform = gpflow.transforms.Log1pe(1e-3)

# Now create the Bayesian Optimizer
alpha = ExpectedImprovement(model)

acquisition_opt = StagedOptimizer([MCOptimizer(domain, 200),
                                   SciPyOptimizer(domain)])

optimizer = BayesianOptimizer(domain, alpha, optimizer=acquisition_opt, verbose=True)

# Run the Bayesian optimization
r = optimizer.optimize(branin, n_iter=10)
print(r)
iter #  0 - MLL [-13.1] - fmin [4.42]
iter #  1 - MLL [-13.4] - fmin [4.42]
iter #  2 - MLL [-10.6] - fmin [0.723]
iter #  3 - MLL [-9.09] - fmin [0.486]
iter #  4 - MLL [-7.01] - fmin [0.486]
iter #  5 - MLL [-2.69] - fmin [0.446]
iter #  6 - MLL [1.96] - fmin [0.446]
iter #  7 - MLL [4.6] - fmin [0.446]
iter #  8 - MLL [7.37] - fmin [0.4]
iter #  9 - MLL [12.6] - fmin [0.4]
 constraints: array([], dtype=float64)
         fun: array([0.39970302])
     message: 'OK'
        nfev: 10
     success: True
           x: array([[9.40798299, 2.43938799]])

That’s all! Your objective function has now been optimized for 10 iterations.

Bayesian Optimization with black-box constraints

Joachim van der Herten

Introduction

This notebook demonstrates the optimization of an analytical function using the well known Expected Improvement (EI) function. The problem is constrained by a black-box constraint function. The feasible regions are learnt jointly with the optimal regions by considering a second acquisition function known as the Probability of Feasibility (PoF), following the approach of Gardner et al. (2014)

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

import gpflow
import gpflowopt
import numpy as np

Constrained problem

First we set up an objective function (the Townsend function) and a constraint function. We further assume both functions are black-box. We also define the optimization domain (2 continuous parameters).

In [2]:
# Objective & constraint
def townsend(X):
    return -(np.cos((X[:,0]-0.1)*X[:,1])**2 + X[:,0] * np.sin(3*X[:,0]+X[:,1]))[:,None]

def constraint(X):
    return -(-np.cos(1.5*X[:,0]+np.pi)*np.cos(1.5*X[:,1])+np.sin(1.5*X[:,0]+np.pi)*np.sin(1.5*X[:,1]))[:,None]

# Setup input domain
domain = gpflowopt.domain.ContinuousParameter('x1', -2.25, 2.5) + \
         gpflowopt.domain.ContinuousParameter('x2', -2.5, 1.75)

# Plot
def plotfx():
    X = gpflowopt.design.FactorialDesign(101, domain).generate()
    Zo = townsend(X)
    Zc = constraint(X)
    mask = Zc>=0
    Zc[mask] = np.nan
    Zc[np.logical_not(mask)] = 1
    Z = Zo * Zc
    shape = (101, 101)

    f, axes = plt.subplots(1, 1, figsize=(7, 5))
    axes.contourf(X[:,0].reshape(shape), X[:,1].reshape(shape), Z.reshape(shape))
    axes.set_xlabel('x1')
    axes.set_ylabel('x2')
    axes.set_xlim([domain.lower[0], domain.upper[0]])
    axes.set_ylim([domain.lower[1], domain.upper[1]])
    return axes

plotfx();
_images/notebooks_constrained_bo_4_0.png

Modeling and joint acquisition function

We proceed by assigning a GP prior to both the objective and the constraint function. Both functions are evaluated on a space-filling set of points (here, a Latin Hypercube design). Two GPR models are created. The EI is based on the model of the objective function (townsend), whereas the PoF is based on the model of the constraint function. We then define the joint criterion as the product of the EI and the PoF.

In [3]:
# Initial evaluations
design = gpflowopt.design.LatinHyperCube(11, domain)
X = design.generate()
Yo = townsend(X)
Yc = constraint(X)

# Models
objective_model = gpflow.gpr.GPR(X, Yo, gpflow.kernels.Matern52(2, ARD=True))
objective_model.likelihood.variance = 0.01
constraint_model = gpflow.gpr.GPR(np.copy(X), Yc, gpflow.kernels.Matern52(2, ARD=True))
constraint_model.kern.lengthscales.transform = gpflow.transforms.Log1pe(1e-3)
constraint_model.likelihood.variance = 0.01
constraint_model.likelihood.variance.prior =  gpflow.priors.Gamma(1./4.,1.0)

# Setup
ei = gpflowopt.acquisition.ExpectedImprovement(objective_model)
pof = gpflowopt.acquisition.ProbabilityOfFeasibility(constraint_model)
joint = ei * pof

Initial belief

We can now inspect our belief about the optimization problem by plotting the models, the EI, PoF and joint mappings. Both models clearly are not very accurate yet. More specifically, the constraint model does not correctly capture the feasibility yet.

In [4]:
def plot():
    Xeval = gpflowopt.design.FactorialDesign(101, domain).generate()
    Yevala,_ = joint.operands[0].models[0].predict_f(Xeval)
    Yevalb,_ = joint.operands[1].models[0].predict_f(Xeval)
    Yevalc = np.maximum(ei.evaluate(Xeval), 0)
    Yevald = pof.evaluate(Xeval)
    Yevale = np.maximum(joint.evaluate(Xeval), 0)
    shape = (101, 101)
    plots = [('Objective model', Yevala), ('Constraint model', Yevalb),
             ('EI', Yevalc), ('PoF', Yevald),
             ('EI * PoF', Yevale)]

    plt.figure(figsize=(10,10))
    for i, plot in enumerate(plots):
        if i == 4:
            ax = plt.subplot2grid((3, 4), (2, 1), colspan=2)
        else:
            ax = plt.subplot2grid((3, 2), (int(i/2), i % 2))

        ax.contourf(Xeval[:,0].reshape(shape), Xeval[:,1].reshape(shape), plot[1].reshape(shape))
        ax.scatter(joint.data[0][:,0], joint.data[0][:,1], c='w')
        ax.set_title(plot[0])
        ax.set_xlabel('x1')
        ax.set_ylabel('x2')
        ax.set_xlim([domain.lower[0], domain.upper[0]])
        ax.set_ylim([domain.lower[1], domain.upper[1]])
    plt.tight_layout()

# Plot representing the model belief, and the belief mapped to EI and PoF
plot()
print(constraint_model)
WARNING:tensorflow:From c:\users\icouckuy\documents\projecten\gpflowopt\gpflowopt\acquisition\acquisition.py:362: calling reduce_prod (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
name.kern.lengthscales transform:+ve prior:None
[0.18194828 0.14835351]
name.kern.variance transform:+ve prior:None
[0.63086542]
name.likelihood.variance transform:+ve prior:Ga([0.25],[1.])
[0.16823107]

_images/notebooks_constrained_bo_8_1.png

Running Bayesian Optimizer

Running the Bayesian optimization is the next step. For this, we must set up an appropriate strategy to optimize the joint acquisition function. This can be challenging, as the acquisition surface often contains large non-varying regions. A typical strategy is to apply a Monte Carlo optimization step first, then start a gradient-based optimization from the best point found (several variations exist). This approach is followed here. We then run the Bayesian optimization and allow it to select up to 50 additional decisions.

The joint acquisition function ensures that feasibility (w.r.t. the constraint) is taken into account while selecting decisions for optimality.

In [5]:
# First setup the optimization strategy for the acquisition function
# Combining MC step followed by L-BFGS-B
acquisition_opt = gpflowopt.optim.StagedOptimizer([gpflowopt.optim.MCOptimizer(domain, 200),
                                                   gpflowopt.optim.SciPyOptimizer(domain)])

# Then run the BayesianOptimizer for 50 iterations
optimizer = gpflowopt.BayesianOptimizer(domain, joint, optimizer=acquisition_opt, verbose=True)
result = optimizer.optimize([townsend, constraint], n_iter=50)

print(result)
iter #  0 - MLL [-15.4, -16.4] - fmin [-1.21] - constraints [-0.916]
iter #  1 - MLL [-17.3, -17.7] - fmin [-1.21] - constraints [-0.916]
iter #  2 - MLL [-14.4, -12.7] - fmin [-1.21] - constraints [-0.916]
iter #  3 - MLL [-11.2, -13.4] - fmin [-1.54] - constraints [-0.778]
iter #  4 - MLL [-15.2, -14.5] - fmin [-1.54] - constraints [-0.778]
iter #  5 - MLL [-15.8, -14.6] - fmin [-1.63] - constraints [-0.615]
iter #  6 - MLL [-17.4, -14.8] - fmin [-1.63] - constraints [-0.615]
iter #  7 - MLL [-18.4, -14.4] - fmin [-1.63] - constraints [-0.615]
iter #  8 - MLL [-19.3, -15.1] - fmin [-1.63] - constraints [-0.615]
iter #  9 - MLL [-21.0, -15.2] - fmin [-1.63] - constraints [-0.615]
iter # 10 - MLL [-22.4, -15.1] - fmin [-1.63] - constraints [-0.615]
iter # 11 - MLL [-23.5, -15.2] - fmin [-1.63] - constraints [-0.615]
iter # 12 - MLL [-22.8, -14.3] - fmin [-1.63] - constraints [-0.615]
iter # 13 - MLL [-23.6, -14.9] - fmin [-1.88] - constraints [-0.921]
iter # 14 - MLL [-24.7, -15.7] - fmin [-1.88] - constraints [-0.921]
iter # 15 - MLL [-25.8, -16.2] - fmin [-1.88] - constraints [-0.921]
iter # 16 - MLL [-25.5, -15.3] - fmin [-1.88] - constraints [-0.921]
iter # 17 - MLL [-26.4, -15.1] - fmin [-2.3] - constraints [-0.889]
iter # 18 - MLL [-26.5, -15.2] - fmin [-2.65] - constraints [-0.567]
iter # 19 - MLL [-25.5, -15.3] - fmin [-2.85] - constraints [-0.393]
iter # 20 - MLL [-24.7, -15.4] - fmin [-2.96] - constraints [-0.299]
iter # 21 - MLL [-26.2, -18.3] - fmin [-2.96] - constraints [-0.299]
iter # 22 - MLL [-26.5, -17.1] - fmin [-3.17] - constraints [-0.493]
iter # 23 - MLL [-29.2, -18.2] - fmin [-3.17] - constraints [-0.493]
iter # 24 - MLL [-26.6, -15.9] - fmin [-3.18] - constraints [-0.64]
iter # 25 - MLL [-27.7, -16.5] - fmin [-3.18] - constraints [-0.64]
iter # 26 - MLL [-29.1, -18.1] - fmin [-3.18] - constraints [-0.64]
iter # 27 - MLL [-29.9, -18.7] - fmin [-3.18] - constraints [-0.64]
iter # 28 - MLL [-31.0, -18.6] - fmin [-3.18] - constraints [-0.64]
iter # 29 - MLL [-32.3, -19.6] - fmin [-3.18] - constraints [-0.64]
iter # 30 - MLL [-34.1, -18.8] - fmin [-3.18] - constraints [-0.64]
iter # 31 - MLL [-28.8, -14.1] - fmin [-3.2] - constraints [-0.571]
iter # 32 - MLL [-29.7, -14.2] - fmin [-3.2] - constraints [-0.571]
iter # 33 - MLL [-26.8, -10.7] - fmin [-3.2] - constraints [-0.571]
iter # 34 - MLL [-27.2, -11.0] - fmin [-3.2] - constraints [-0.571]
iter # 35 - MLL [-27.2, -11.9] - fmin [-3.2] - constraints [-0.571]
iter # 36 - MLL [-24.9, -9.19] - fmin [-3.2] - constraints [-0.571]
iter # 37 - MLL [-19.0, -3.94] - fmin [-3.2] - constraints [-0.571]
iter # 38 - MLL [-18.5, -3.12] - fmin [-3.2] - constraints [-0.571]
iter # 39 - MLL [-19.8, -3.26] - fmin [-3.2] - constraints [-0.571]
iter # 40 - MLL [-21.5, -2.7] - fmin [-3.2] - constraints [-0.571]
iter # 41 - MLL [-22.1, -1.64] - fmin [-3.2] - constraints [-0.571]
iter # 42 - MLL [-22.6, -0.981] - fmin [-3.2] - constraints [-0.571]
iter # 43 - MLL [-26.5, -4.06] - fmin [-3.2] - constraints [-0.571]
iter # 44 - MLL [-26.5, -3.61] - fmin [-3.2] - constraints [-0.571]
iter # 45 - MLL [-25.5, -1.62] - fmin [-3.2] - constraints [-0.571]
iter # 46 - MLL [-26.3, -1.42] - fmin [-3.2] - constraints [-0.571]
iter # 47 - MLL [-27.8, 0.0245] - fmin [-3.2] - constraints [-0.571]
iter # 48 - MLL [-28.4, 0.894] - fmin [-3.2] - constraints [-0.571]
iter # 49 - MLL [-25.9, 3.7] - fmin [-3.2] - constraints [-0.571]
 constraints: array([-0.57055951])
         fun: array([-3.19946737])
     message: 'OK'
        nfev: 50
     success: True
           x: array([[-2.25      , -1.29638397]])

Results

If we now plot the belief, we clearly see the constraint model has improved significantly. More specifically, its PoF mapping is an accurate representation of the true constraint function. By multiplying the EI by the PoF, the search is restricted to the feasible regions.

In [6]:
# Plotting belief again
print(constraint_model)
plot()
name.kern.lengthscales transform:+ve prior:None
[0.44676004 0.4866224 ]
name.kern.variance transform:+ve prior:None
[8.70831234]
name.likelihood.variance transform:+ve prior:Ga([0.25],[1.])
[1.96119947e-06]

_images/notebooks_constrained_bo_12_1.png

If we inspect the sampling distribution, we can see that the number of samples in the infeasible regions is limited. The optimization has focused on the feasible areas. In addition, it has been active mostly in two optimal regions.

In [7]:
# Plot function, overlayed by the constraint. Also plot the samples
axes = plotfx()
valid = joint.feasible_data_index()
axes.scatter(joint.data[0][valid,0], joint.data[0][valid,1], label='feasible data', c='w')
axes.scatter(joint.data[0][np.logical_not(valid),0], joint.data[0][np.logical_not(valid),1], label='data', c='r');
axes.legend()
Out[7]:
<matplotlib.legend.Legend at 0x1b941465dd8>
_images/notebooks_constrained_bo_14_1.png

Finally, the evolution of the best value over the number of iterations clearly shows a very good solution is already found after only a few evaluations.

In [8]:
f, axes = plt.subplots(1, 1, figsize=(7, 5))
f = joint.data[1][:,0]
f[joint.data[1][:,1] > 0] = np.inf
axes.plot(np.arange(0, joint.data[0].shape[0]), np.minimum.accumulate(f))
axes.set_ylabel('fmin')
axes.set_xlabel('Number of evaluated points');
_images/notebooks_constrained_bo_16_0.png
In [ ]:

Defining new acquisition functions

Joachim van der Herten

Introduction

GPflowOpt provides implementations of some acquisition functions for common scenarios, such as EI and PoF. However, it is straightforward to implement your own strategy. For most strategies, it is sufficient to implement the Acquisition interface. In case a more sophisticated model is needed, this can easily be achieved with GPflow.

In [1]:
%matplotlib inline
import numpy as np

import matplotlib.pyplot as plt
import copy
import gpflow
import gpflowopt
import tensorflow as tf
In [2]:
rng = np.random.RandomState(5)
def camelback(X):
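    # Six-hump camelback function with small random observation noise added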
    f = (4. - 2.1*X[:,0]**2 + 0.3* X[:,0]**4) * X[:,0]**2 + np.prod(X,axis=1) + 4 * (X[:,1]**2-1) * X[:,1]**2
    return f[:,None] + rng.rand(X.shape[0], 1) * 0.25

# Setup input domain
domain = gpflowopt.domain.ContinuousParameter('x1', -3, 3) + \
         gpflowopt.domain.ContinuousParameter('x2', -2, 2)

Augmented expected improvement

As an example of how to implement a custom acquisition function, we illustrate the Augmented EI (Huang et al. 2006), a modification of Expected Improvement for the optimization of noisy functions. It is defined as

\[\alpha_{\text{aEI}}(\mathbf x_{\star}) = \alpha_{\text{EI}}(\mathbf x_{\star}) \left( 1 - \frac{\sigma_n}{\sqrt{\text{Var}\left[ f_{\star}\,|\, \mathbf x, \mathbf y, \mathbf x_{\star} \right] + \sigma_n^2}}\right)\]

This definition can be interpreted as rescaling of the EI score, with respect to the noise variance. For \(\sigma^2_n=0\), the rescale term equals 1 and normal EI is recovered. For \(\sigma^2_n > 0\), small prediction variances are punished, which decreases concentration of sampling and enhances exploration.

To implement this acquisition function, we could override the build_acquisition method of Acquisition; however, in this case it is easier to override ExpectedImprovement and obtain the EI score through a call to build_acquisition of the parent class.

In [3]:
class AugmentedEI(gpflowopt.acquisition.ExpectedImprovement):
    def __init__(self, model):
        super(AugmentedEI, self).__init__(model)

    def build_acquisition(self, Xcand):
        ei = super(AugmentedEI, self).build_acquisition(Xcand)
        X = self.models[0].input_transform.build_forward(Xcand)
        _, pvar = self.models[0].wrapped.build_predict(X)
        return tf.multiply(ei, 1 - tf.sqrt(self.models[0].likelihood.variance)
                           / (tf.sqrt(pvar + self.models[0].likelihood.variance)))

Results

This small experiment on the six-hump camelback illustrates the impact of the penalty term.

In [4]:
design = gpflowopt.design.LatinHyperCube(9, domain)
X = design.generate()
Y = camelback(X)
m = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True, lengthscales=[10,10], variance=10000))
m.likelihood.variance = 1
m.likelihood.variance.fixed = True
ei = gpflowopt.acquisition.ExpectedImprovement(m)
m = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True, lengthscales=[10,10], variance=10000))
m.likelihood.variance = 1
m.likelihood.variance.fixed = False
aei = AugmentedEI(m)

opt = gpflowopt.optim.StagedOptimizer([gpflowopt.optim.MCOptimizer(domain, 200),
                                       gpflowopt.optim.SciPyOptimizer(domain)])

bopt1 = gpflowopt.BayesianOptimizer(domain, ei, optimizer=opt)
with bopt1.silent():
    bopt1.optimize(camelback, n_iter=50)

bopt2 = gpflowopt.BayesianOptimizer(domain, aei, optimizer=opt)
with bopt2.silent():
    bopt2.optimize(camelback, n_iter=50)

f, axes = plt.subplots(1,2, figsize=(14,7))

Xeval = gpflowopt.design.FactorialDesign(101, domain).generate()
Yeval = camelback(Xeval)
titles = ['EI', 'AEI']
shape = (101, 101)

for ax, t, acq in zip(axes, titles, [ei, aei]):
    pred = acq.models[0].predict_f(Xeval)[0]
    ax.contourf(Xeval[:,0].reshape(shape), Xeval[:,1].reshape(shape),
                pred.reshape(shape))
    ax.set_xlabel('x1')
    ax.set_ylabel('x2')
    ax.set_title(t)
    ax.scatter(acq.data[0][:,0], acq.data[0][:,1], c='w')
_images/notebooks_new_acquisition_7_0.png

Bayesian Optimization of Hyperparameters

Vincent Dutordoir, Joachim van der Herten

Introduction

The paper Practical Bayesian Optimization of Machine Learning algorithms by Snoek et al. 2012 describes the use of Bayesian optimization for hyperparameter optimization. In this paper, the (at the time) state-of-the-art test error for convolutional neural networks on the CIFAR-10 dataset was improved significantly by optimizing the parameters of the training process with Bayesian optimization.

In this notebook we demonstrate the principle by optimizing the starting point of the maximum likelihood estimation of a GP. Note that we use a GP to optimize the initial hyperparameter values of another GP.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

# Loading airline data
import numpy as np
data = np.load('airline.npz')
X_train, Y_train = data['X_train'], data['Y_train']
D = Y_train.shape[1];

Data set

The data set used to illustrate hyperparameter optimization is the well-known airline passenger volume data: a one-dimensional time series of airline passenger volumes over the years. We wish to use it to make forecasts. Plotting the data below, it is clear that it contains a pattern.

In [2]:
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xlabel('Time (years)')
ax.set_ylabel('Airline passengers ($10^3$)')
ax.plot(X_train.flatten(),Y_train.flatten(), c='b')
ax.set_xticklabels([1949, 1952, 1955, 1958, 1961, 1964])
plt.tight_layout()
_images/notebooks_hyperopt_4_0.png

Modeling

To forecast this time series, we will pick up its pattern using a Spectral Mixture kernel (Wilson et al., 2013). Essentially, this kernel is a sum of \(Q\) products of Cosine and RBF kernels. For this one-dimensional problem each term of the sum has 4 hyperparameters. To account for the upward trend, we also add a Linear and a Bias kernel.

In [3]:
from gpflow.kernels import RBF, Cosine, Linear, Bias, Matern52
from gpflow import transforms
from gpflow.gpr import GPR

Q = 10 # nr of terms in the sum
max_iters = 1000

# Creates a model with a spectral mixture kernel, given an ndarray of 2Q frequencies and lengthscales
def create_model(hypers):
    f = np.clip(hypers[:Q], 0, 5)
    weights = np.ones(Q) / Q
    lengths = hypers[Q:]

    kterms = []
    for i in range(Q):
        rbf = RBF(D, lengthscales=lengths[i], variance=1./Q)
        rbf.lengthscales.transform = transforms.Exp()
        cos = Cosine(D, lengthscales=f[i])
        kterms.append(rbf * cos)

    k = np.sum(kterms) + Linear(D) + Bias(D)
    m = GPR(X_train, Y_train, kern=k)
    return m

X_test, X_complete = data['X_test'], data['X_complete']
def plotprediction(m):
    # Perform prediction
    mu, var = m.predict_f(X_complete)

    # Plot
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.set_xlabel('Time (years)')
    ax.set_ylabel('Airline passengers ($10^3$)')
    ax.set_xticklabels([1949, 1952, 1955, 1958, 1961, 1964, 1967, 1970, 1973])
    ax.plot(X_train.flatten(),Y_train.flatten(), c='b')
    ax.plot(X_complete.flatten(), mu.flatten(), c='g')
    lower = mu - 2*np.sqrt(var)
    upper = mu + 2*np.sqrt(var)
    ax.plot(X_complete, upper, 'g--', X_complete, lower, 'g--', lw=1.2)
    ax.fill_between(X_complete.flatten(), lower.flatten(), upper.flatten(),
                    color='g', alpha=.1)
    plt.tight_layout()

m = create_model(np.ones((2*Q,)))
m.optimize(maxiter=max_iters)
plotprediction(m)
_images/notebooks_hyperopt_6_0.png

In total, a lot of hyperparameters must be optimized. Furthermore, the optimization surface of the spectral mixture is highly multimodal. Starting from the default hyperparameter values the optimized GP is able to pick up the linear trend, and the RBF kernels perform local interpolation. However, the kernel is not able to extrapolate away from the data. In sum, with this starting point, the likelihood optimization ends in a local minimum.

Hyperparameter optimization

This issue is a known problem of the spectral mixture kernel, and several recommendations exist on how to improve the starting point. Here, we will use GPflowOpt to optimize the initial values for the lengthscales of the RBF and the Cosine kernel (i.e., the frequencies of the latter kernel). The other hyperparameters (RBF and Cosine variances, the likelihood variance, and the linear and bias terms) are kept at their defaults and will be optimized by the standard likelihood optimization.

First, we set up the objective function, which accepts proposed starting points. The objective function returns the negative log likelihood obtained by optimizing the hyperparameters from the given starting point. Then, we set up the 20D optimization domain for the frequencies and lengthscales.

In [4]:
from gpflowopt.domain import ContinuousParameter
from gpflowopt.objective import batch_apply

# Objective function for our optimization
# Input: N x 2Q ndarray, output: N x 1.
# returns the negative log likelihood obtained by training with given frequencies and rbf lengthscales
# Applies some tricks for stability similar to GPy's jitchol
@batch_apply
def objectivefx(freq):
    m = create_model(freq)
    for i in [0] + [10**exponent for exponent in range(6,1,-1)]:
        try:
            mean_diag = np.mean(np.diag(m.kern.compute_K_symm(X_train)))
            m.likelihood.variance = 1 + mean_diag * i
            m.optimize(maxiter=max_iters)
            return -m.compute_log_likelihood()
        except:
            pass
    raise RuntimeError("Frequency combination failed indefinitely.")

# Setting up optimization domain.
lower = [0.]*Q
upper = [5.]*int(Q)
df = np.sum([ContinuousParameter('freq{0}'.format(i), l, u) for i, l, u in zip(range(Q), lower, upper)])

lower = [1e-5]*Q
upper = [300]*int(Q)
dl = np.sum([ContinuousParameter('l{0}'.format(i), l, u) for i, l, u in zip(range(Q), lower, upper)])
domain = df + dl
domain
Out[4]:
Name     Type        Values
freq0    Continuous  [ 0. 5.]
freq1    Continuous  [ 0. 5.]
freq2    Continuous  [ 0. 5.]
freq3    Continuous  [ 0. 5.]
freq4    Continuous  [ 0. 5.]
freq5    Continuous  [ 0. 5.]
freq6    Continuous  [ 0. 5.]
freq7    Continuous  [ 0. 5.]
freq8    Continuous  [ 0. 5.]
freq9    Continuous  [ 0. 5.]
l0       Continuous  [ 1.00000000e-05 3.00000000e+02]
l1       Continuous  [ 1.00000000e-05 3.00000000e+02]
l2       Continuous  [ 1.00000000e-05 3.00000000e+02]
l3       Continuous  [ 1.00000000e-05 3.00000000e+02]
l4       Continuous  [ 1.00000000e-05 3.00000000e+02]
l5       Continuous  [ 1.00000000e-05 3.00000000e+02]
l6       Continuous  [ 1.00000000e-05 3.00000000e+02]
l7       Continuous  [ 1.00000000e-05 3.00000000e+02]
l8       Continuous  [ 1.00000000e-05 3.00000000e+02]
l9       Continuous  [ 1.00000000e-05 3.00000000e+02]

High-dimensional Bayesian optimization is tricky, although the complexity of the problem is significantly reduced here due to symmetry in the optimization domain (interchanging frequencies does not make a difference) and because we still optimize the likelihood given the starting point; getting near a mode is therefore sufficient. Furthermore, we disable ARD for the kernel of the model approximating the objective function, to avoid optimizing a large number of lengthscales with little data. We then use EI to pick new candidate starting points and evaluate our objective.

In [5]:
from gpflowopt.design import LatinHyperCube
from gpflowopt.acquisition import ExpectedImprovement
from gpflowopt import optim, BayesianOptimizer
design = LatinHyperCube(6, domain)
X = design.generate()

Y = objectivefx(X)
m = GPR(X, Y, kern=Matern52(domain.size, ARD=False))
ei = ExpectedImprovement(m)
opt = optim.StagedOptimizer([optim.MCOptimizer(domain, 5000), optim.SciPyOptimizer(domain)])
optimizer = BayesianOptimizer(domain, ei, optimizer=opt)
with optimizer.silent():
    result = optimizer.optimize(objectivefx, n_iter=24)
Warning: inf or nan in gradient: replacing with zeros
Warning: inf or nan in gradient: replacing with zeros
Warning: inf or nan in gradient: replacing with zeros
Warning: inf or nan in gradient: replacing with zeros
In [6]:
m = create_model(result.x[0,:])
m.optimize()
plotprediction(m)
_images/notebooks_hyperopt_11_0.png

Clearly, the starting point identified with BO is a lot better than the default values: we now obtain a proper forecast. By inspecting the evolution of the best likelihood value obtained so far, we see the solution is identified quickly.

In [7]:
f, axes = plt.subplots(1, 1, figsize=(7, 5))
f = ei.data[1][:,0]
axes.plot(np.arange(0, ei.data[0].shape[0]), np.minimum.accumulate(f))
axes.set_ylabel('fmin')
axes.set_xlabel('Number of evaluated points');
_images/notebooks_hyperopt_13_0.png

Bayesian Multiobjective Optimization

Joachim van der Herten, Ivo Couckuyt

Introduction

This notebook demonstrates the multiobjective optimization of an analytical function using the hypervolume-based probability of improvement function.

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt

import gpflow
import gpflowopt
import numpy as np

We set up the Van Veldhuizen and Lamont multiobjective optimization problem 2 (vlmop2). The objectives of vlmop2 are very easy to model, which makes it ideal for illustrating Bayesian multiobjective optimization.

In [2]:
# Objective
def vlmop2(x):
    transl = 1 / np.sqrt(2)
    part1 = (x[:, [0]] - transl) ** 2 + (x[:, [1]] - transl) ** 2
    part2 = (x[:, [0]] + transl) ** 2 + (x[:, [1]] + transl) ** 2
    y1 = 1 - np.exp(-1 * part1)
    y2 = 1 - np.exp(-1 * part2)
    return np.hstack((y1, y2))

# Setup input domain
domain = gpflowopt.domain.ContinuousParameter('x1', -2, 2) + \
         gpflowopt.domain.ContinuousParameter('x2', -2, 2)

# Plot
def plotfx():
    X = gpflowopt.design.FactorialDesign(101, domain).generate()
    Z = vlmop2(X)
    shape = (101, 101)

    axes = []
    plt.figure(figsize=(15, 5))
    for i in range(Z.shape[1]):
        axes = axes + [plt.subplot2grid((1, 2), (0, i))]

        axes[-1].contourf(X[:,0].reshape(shape), X[:,1].reshape(shape), Z[:,i].reshape(shape))
        axes[-1].set_title('Objective {}'.format(i+1))
        axes[-1].set_xlabel('x1')
        axes[-1].set_ylabel('x2')
        axes[-1].set_xlim([domain.lower[0], domain.upper[0]])
        axes[-1].set_ylim([domain.lower[1], domain.upper[1]])

    return axes

plotfx();
_images/notebooks_multiobjective_4_0.png

Multiobjective acquisition function

We can model all objectives jointly with a single GP prior, or model each objective separately with its own GP prior. We illustrate the latter approach here. A set of data points arranged in a Latin Hypercube is evaluated on the vlmop2 function.

In multiobjective optimization the definition of improvement is ambiguous. Here we define improvement using the contributing hypervolume, which determines the focus on density and uniformity of the Pareto set during sampling. For instance, due to the nature of the contributing hypervolume, steep slopes in the Pareto front will be sampled less densely. The hypervolume-based probability of improvement is based on the model(s) of the objective functions (vlmop2) and aggregates all the information into one criterion which is a balance between:

  • improving our belief of the objectives (high uncertainty)
  • favoring points improving the Pareto set (large contributing hypervolume with a likely higher uncertainty)
  • focussing on augmenting the Pareto set (small contributing hypervolume but with low uncertainty).
In [3]:
# Initial evaluations
design = gpflowopt.design.LatinHyperCube(11, domain)
X = design.generate()
Y = vlmop2(X)

# One model for each objective
objective_models = [gpflow.gpr.GPR(X.copy(), Y[:,[i]].copy(), gpflow.kernels.Matern52(2, ARD=True)) for i in range(Y.shape[1])]
for model in objective_models:
    model.likelihood.variance = 0.01

hvpoi = gpflowopt.acquisition.HVProbabilityOfImprovement(objective_models)

Running the Bayesian optimizer

The optimization surface of multiobjective acquisition functions can be even more challenging than, e.g., standard Expected Improvement. Hence, a hybrid optimization scheme is preferred: a Monte Carlo optimization step first, followed by a gradient-based optimization starting from the best point found.

We then run the Bayesian Optimization and allow it to select up to 20 additional decisions.

In [4]:
# First setup the optimization strategy for the acquisition function
# Combining MC step followed by L-BFGS-B
acquisition_opt = gpflowopt.optim.StagedOptimizer([gpflowopt.optim.MCOptimizer(domain, 1000),
                                                   gpflowopt.optim.SciPyOptimizer(domain)])

# Then run the BayesianOptimizer for 20 iterations
optimizer = gpflowopt.BayesianOptimizer(domain, hvpoi, optimizer=acquisition_opt, verbose=True)
result = optimizer.optimize([vlmop2], n_iter=20)

print(result)
print(optimizer.acquisition.pareto.front.value)
WARNING:tensorflow:From c:\users\icouckuy\documents\projecten\gpflowopt\gpflowopt\acquisition\hvpoi.py:124: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
iter #  0 - MLL [-16.1, -15.6] - fmin [0.385, 0.0171] (size 5)
iter #  1 - MLL [-17.1, -15.7] - fmin [0.385, 0.0171] (size 4)
iter #  2 - MLL [-17.2, -15.6] - fmin [0.385, 0.0171] (size 5)
iter #  3 - MLL [-16.9, -15.4] - fmin [0.104, 0.0171] (size 5)
iter #  4 - MLL [-15.6, -14.2] - fmin [0.104, 0.0171] (size 6)
iter #  5 - MLL [-12.8, -12.9] - fmin [0.0173, 0.0171] (size 7)
iter #  6 - MLL [-10.4, -11.6] - fmin [0.0173, 0.0171] (size 8)
iter #  7 - MLL [-9.08, -8.63] - fmin [0.0173, 0.0171] (size 9)
iter #  8 - MLL [-6.04, -5.29] - fmin [0.0173, 0.0171] (size 10)
iter #  9 - MLL [-3.42, -2.42] - fmin [0.0173, 0.0171] (size 11)
iter # 10 - MLL [0.634, 1.23] - fmin [0.0173, 0.0171] (size 12)
iter # 11 - MLL [3.85, 5.12] - fmin [0.0173, 0.0171] (size 13)
iter # 12 - MLL [6.22, 9.49] - fmin [0.0173, 0.0075] (size 13)
iter # 13 - MLL [9.52, 13.4] - fmin [0.0173, 0.0075] (size 14)
iter # 14 - MLL [13.1, 18.1] - fmin [0.0173, 0.0075] (size 15)
iter # 15 - MLL [16.8, 22.0] - fmin [0.0173, 0.0075] (size 16)
iter # 16 - MLL [20.5, 26.1] - fmin [0.0173, 0.0075] (size 17)
iter # 17 - MLL [25.5, 29.4] - fmin [0.000251, 0.0075] (size 18)
iter # 18 - MLL [30.7, 33.5] - fmin [0.000251, 0.0075] (size 19)
iter # 19 - MLL [35.4, 38.3] - fmin [0.000251, 0.0075] (size 20)
 constraints: array([], shape=(20, 0), dtype=float64)
         fun: array([[8.76019513e-01, 2.95088501e-01],
       [6.18804120e-01, 6.49890805e-01],
       [4.18727630e-01, 7.97346364e-01],
       [1.03929952e-01, 9.38953409e-01],
       [7.80270729e-01, 4.47691202e-01],
       [1.72680317e-02, 9.69541151e-01],
       [2.37219162e-01, 8.95331698e-01],
       [9.22995807e-01, 1.49785271e-01],
       [5.06696960e-01, 7.43243621e-01],
       [6.92444402e-01, 5.68593315e-01],
       [3.15497625e-01, 8.53859475e-01],
       [8.36671433e-01, 3.53351848e-01],
       [9.78800658e-01, 7.50373941e-03],
       [7.46198817e-01, 5.04272230e-01],
       [9.44585237e-01, 8.72091101e-02],
       [5.45778434e-01, 7.10310655e-01],
       [8.83126953e-01, 2.48920466e-01],
       [2.50661555e-04, 9.81376183e-01],
       [1.78561229e-01, 9.12201485e-01],
       [6.54328288e-01, 6.09221800e-01]])
     message: 'OK'
        nfev: 20
     success: True
           x: array([[-0.18500334, -0.42945409],
       [ 0.07215097, -0.0420748 ],
       [ 0.1856047 ,  0.18694199],
       [ 0.50560643,  0.44417276],
       [-0.18945651, -0.13641757],
       [ 0.62198831,  0.60624197],
       [ 0.25806047,  0.44415823],
       [-0.46055195, -0.38855214],
       [ 0.17235267,  0.05851627],
       [-0.09686607, -0.02277475],
       [ 0.24052228,  0.3054078 ],
       [-0.29630595, -0.19019709],
       [-0.73494906, -0.62490677],
       [-0.0533329 , -0.18336251],
       [-0.52263076, -0.46790591],
       [ 0.05291566,  0.10610416],
       [-0.33700177, -0.32075719],
       [ 0.69332471,  0.71490084],
       [ 0.36196052,  0.42858939],
       [-0.02203825, -0.02132529]])
[[2.52652535e-04 9.81379457e-01]
 [1.72742593e-02 9.69536115e-01]
 [1.03922043e-01 9.38960356e-01]
 [1.78575115e-01 9.12193438e-01]
 [2.37205200e-01 8.95338579e-01]
 [3.15507528e-01 8.53853294e-01]
 [4.18722475e-01 7.97352438e-01]
 [5.06692411e-01 7.43241571e-01]
 [5.45781765e-01 7.10308458e-01]
 [6.18806827e-01 6.49893408e-01]
 [6.92443661e-01 5.68586537e-01]
 [7.46198193e-01 5.04265827e-01]
 [7.80263435e-01 4.47719401e-01]
 [8.36683447e-01 3.53311767e-01]
 [8.76024836e-01 2.95069786e-01]
 [8.83110550e-01 2.48989866e-01]
 [9.23006049e-01 1.49705150e-01]
 [9.44580266e-01 8.72705698e-02]
 [9.78801698e-01 7.49811797e-03]]

For multiple objectives the returned OptimizeResult object contains the identified Pareto set instead of just a single optimum. Note that this is computed on the raw data Y.

The hypervolume-based probability of improvement operates on the Pareto set derived from the model predictions of the training data (to handle noise). This latter Pareto set can be found as optimizer.acquisition.pareto.front.value.
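
For example, using the result and optimizer objects from this notebook, both Pareto sets can be printed directly:

# Pareto set computed on the raw data Y, as returned in the OptimizeResult
print(result.fun)
# Pareto front derived from the model predictions, as used by the acquisition
print(optimizer.acquisition.pareto.front.value)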

Plotting the results

Let's plot the belief of the final models and the acquisition function.

In [5]:
def plot():
    grid_size = 51  # 101
    shape = (grid_size, grid_size)

    Xeval = gpflowopt.design.FactorialDesign(grid_size, domain).generate()

    Yeval_1, _ = hvpoi.models[0].predict_f(Xeval)
    Yeval_2, _ = hvpoi.models[1].predict_f(Xeval)

    Yevalc = hvpoi.evaluate(Xeval)

    plots = [((0,0), 1, 1, 'Objective 1 model', Yeval_1[:,0]),
             ((0,1), 1, 1, 'Objective 2 model', Yeval_2[:,0]),
             ((1,0), 2, 2, 'hypervolume-based PoI', Yevalc)]

    plt.figure(figsize=(7,7))
    for i, (plot_pos, plot_rowspan, plot_colspan, plot_title, plot_data) in enumerate(plots):
        data = hvpoi.data[0]

        ax = plt.subplot2grid((3, 2), plot_pos, rowspan=plot_rowspan, colspan=plot_colspan)
        ax.contourf(Xeval[:,0].reshape(shape), Xeval[:,1].reshape(shape), plot_data.reshape(shape))
        ax.scatter(data[:,0], data[:,1], c='w')
        ax.set_title(plot_title)
        ax.set_xlabel('x1')
        ax.set_ylabel('x2')
        ax.set_xlim([domain.lower[0], domain.upper[0]])
        ax.set_ylim([domain.lower[1], domain.upper[1]])
    plt.tight_layout()

# Plot representing the model belief, and the belief mapped to EI and PoF
plot()

for model in objective_models:
    print(model)
name.kern.lengthscales transform:+ve prior:None
[0.5838501  0.59262748]
name.kern.variance transform:+ve prior:None
[3.88573726]
name.likelihood.variance transform:+ve prior:None
[1.00000005e-06]

name.kern.lengthscales transform:+ve prior:None
[0.60790817 0.58374295]
name.kern.variance transform:+ve prior:None
[3.34354369]
name.likelihood.variance transform:+ve prior:None
[1.00000004e-06]

_images/notebooks_multiobjective_10_1.png

Finally, we can extract and plot the Pareto front ourselves using the pareto.non_dominated_sort function on the final data matrix Y.

The non-dominated sort returns the Pareto set (the non-dominated solutions) as well as a dominance vector holding, for each point in Y, the number of points it is dominated by. For example, we could select only the points with dom == 2, or dom == 0 (the latter retrieves the non-dominated solutions). Here we choose to use the dominance vector to color the points.

In [6]:
# plot pareto front
plt.figure(figsize=(9, 4))

R = np.array([1.5, 1.5])
print('R:', R)
hv = hvpoi.pareto.hypervolume(R)
print('Hypervolume indicator:', hv)

plt.figure(figsize=(7, 7))

pf, dom = gpflowopt.pareto.non_dominated_sort(hvpoi.data[1])

plt.scatter(hvpoi.data[1][:,0], hvpoi.data[1][:,1], c=dom)
plt.title('Pareto set')
plt.xlabel('Objective 1')
plt.ylabel('Objective 2')
R: [1.5 1.5]
WARNING:tensorflow:From c:\users\icouckuy\documents\projecten\gpflowopt\gpflowopt\pareto.py:264: calling reduce_min (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
Hypervolume indicator: 1.5606701453857534
Out[6]:
Text(0,0.5,'Objective 2')
<Figure size 648x288 with 0 Axes>
_images/notebooks_multiobjective_12_3.png

API and architecture

The structure of GPflowOpt

Joachim van der Herten

In this document, the structure of the GPflowOpt library is explained, including some small examples. First the Domain and Optimizer interfaces are briefly illustrated, followed by a description of the BayesianOptimizer. At the end, a step-by-step walkthrough of the BayesianOptimizer is given.

Optimization

The underlying design principles of GPflowOpt were chosen to address the following task: optimizing problems of the form

\[\underset{\boldsymbol{x} \in \mathcal{X}}{\operatorname{argmin}} f(\boldsymbol{x}).\]

The objective function \(f: \mathcal{X} \rightarrow \mathbb{R}^p\) maps a candidate optimum to a score (or multiple scores). Here \(\mathcal{X}\) represents the input domain. This domain encloses all candidate solutions to the optimization problem and can be entirely continuous (i.e., a \(d\)-dimensional hypercube) but may also consist of discrete and categorical parameters.

In GPflowOpt, the Domain and Optimizer interfaces and corresponding subclasses are used to explicitly represent the optimization problem.

Objective function

The objective function itself must be implemented as a Python callable (a function, or an object implementing the call operator). It accepts a two-dimensional numpy array with shape \((n, d)\) as input, with \(n\) the batch size, and returns a tuple of numpy arrays: the first element has shape \((n, p)\) and contains the objective scores for each evaluated point. The second element is the gradient in every point, shaped \((n, d)\) for a single-objective optimization problem, or \((n, d, p)\) in case of a multi-objective function. If the objective function is passed on to a gradient-free optimization method, the gradient is automatically discarded. GPflowOpt provides decorators which handle batch application of a function along the n points of the input matrix, or deal with functions which accept each feature as a separate argument.

Here, we define a simple quadratic objective function:

In [5]:
import numpy as np

def fx(X):
    X = np.atleast_2d(X)
    # Return objective & gradient
    return np.sum(np.square(X), axis=1)[:,None], 2*X
Domain

Then, we represent \(\mathcal{X}\) by composing parameters. This is how a simple two-dimensional continuous domain is defined:

In [6]:
from gpflowopt.domain import ContinuousParameter
domain = ContinuousParameter('x1', -2, 2) + ContinuousParameter('x2', -1, 2)
domain
Out[6]:
Name    Type        Values
x1      Continuous  [-2. 2.]
x2      Continuous  [-1. 2.]
Optimize

Based on the domain and a valid objective function, we can now easily apply one of the included optimizers to optimize objective functions. GPflowOpt defines an intuitive Optimizer interface which can be used to specify the domain, the initial point(s), constraints (to be implemented) etc. Some popular optimization approaches are provided. Here is how our function is optimized using one of the available methods of SciPy’s minimize:

In [7]:
from gpflowopt.optim import SciPyOptimizer

optimizer = SciPyOptimizer(domain, method='SLSQP')
optimizer.set_initial([-1,-1])
optimizer.optimize(fx)
Out[7]:
fun: 0.0
     jac: array([ 0.,  0.])
 message: 'Optimization terminated successfully.'
    nfev: 3
     nit: 2
    njev: 2
  status: 0
 success: True
       x: array([[ 0.,  0.]])

And here is how we optimize it with Monte Carlo sampling. We can pass the same function, as the gradients are automatically discarded.

In [8]:
from gpflowopt.optim import MCOptimizer
optimizer = MCOptimizer(domain, 200)
optimizer.optimize(fx)
Out[8]:
fun: array([[ 0.02115951]])
 message: 'OK'
    nfev: 200
 success: True
       x: array([[ 0.05731395, -0.13369599]])

Bayesian Optimization

In Bayesian Optimization (BO), the typical assumption is that \(f\) is expensive to evaluate and no gradients are available. The typical approach is to sequentially select a limited set of decisions \(\boldsymbol{x}_0, \boldsymbol{x}_1, ... \boldsymbol{x}_{n-1}\) using a sampling policy. Hence each decision \(\boldsymbol{x}_i \in \mathcal{X}\) itself is the result of an optimization problem

\[\boldsymbol{x}_i = \underset{\boldsymbol{x}}{\operatorname{argmax}} \alpha_i(\boldsymbol{x})\]

Each iteration, a cheap-to-evaluate function \(\alpha_i\) acts as a surrogate for the expensive function. It is typically a mapping of the predictive distribution of a (Bayesian) model built on all decisions and their corresponding (noisy) evaluations. The mapping introduces an order in \(\mathcal{X}\) to obtain a certain goal. The typical goal within the context of BO is the search for optimality or feasibility while keeping the number of required evaluations (\(n\)) small. As we can have several functions \(f\) representing objectives and constraints, BO may invoke several models and mappings \(\alpha\). These mappings are typically referred to as acquisition functions (or infill criteria). GPflowOpt defines an Acquisition interface to implement these mappings and provides implementations of some default choices. In combination with a special Optimizer implementation for BO, the following steps form a typical workflow:

  1. Define the problem domain. Its dimensionality matches the input to the objective and constraint functions. (like normal optimization)
  2. Specify the (GP) models for the constraints and objectives. This involves the choice of kernels, priors, fixes, transforms… this step follows the standard way of setting up GPflow models. GPflowOpt does not further wrap models, hence it is possible to implement custom models in GPflow and use them directly in GPflowOpt.
  3. Set up the acquisition function(s) using the available built-in implementations in GPflowOpt, or design your own by implementing the Acquisition interface.
  4. Set up an optimizer for the acquisition function.
  5. Run the high-level BayesianOptimizer which implements a typical BO flow. BayesianOptimizer itself is compliant with the Optimizer interface. Unlike the other optimizers, the BayesianOptimizer requires that the objective function returns no gradients.

Alternatively, advanced users requiring finer control can easily implement their own flow based on the low-level interfaces of GPflowOpt, as the coupling between these objects was intentionally kept loose.

As illustration of the described flow, the previous example is optimized using Bayesian optimization instead, with the well-known Expected Improvement acquisition function:

In [9]:
from gpflowopt.bo import BayesianOptimizer
from gpflowopt.design import FactorialDesign
from gpflowopt.acquisition import ExpectedImprovement
import gpflow

# The Bayesian Optimizer does not expect gradients to be returned
def fx(X):
    X = np.atleast_2d(X)
    # Return objective & gradient
    return np.sum(np.square(X), axis=1)[:,None]


X = FactorialDesign(2, domain).generate()
Y = fx(X)

# initializing a standard BO model, Gaussian Process Regression with
# Matern52 ARD Kernel
model = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(2, ARD=True))
alpha = ExpectedImprovement(model)

# Now we must specify an optimization algorithm to optimize the acquisition
# function, each iteration.
acqopt = SciPyOptimizer(domain)

# Now create the Bayesian Optimizer
optimizer = BayesianOptimizer(domain, alpha, optimizer=acqopt)
with optimizer.silent():
    r = optimizer.optimize(fx, n_iter=15)
print(r)
     fun: array([ 0.17684308])
 message: 'OK'
    nfev: 15
 success: True
       x: array([[ 0.        ,  0.42052714]])

This brief code snippet starts from a 2-level grid (the corner points of the domain) and uses a GP model to model the response surface of the objective function. The BayesianOptimizer follows the same interface as the other optimizers and is initialized with a domain, the acquisition function and an additional optimization method to optimize the acquisition function each iteration. Finally, the optimizer performs 15 iterations to optimize fx.

The code to evaluate the acquisition function on the model is written in TensorFlow, allowing gradient-based optimization without additional effort due to automatic differentiation.

Step-by-step description of the BayesianOptimizer

Prior to running BayesianOptimizer.optimize(), the acquisition function is initialized with an underlying model. Any data previously included in the model (through the GPModel.__init__ constructor in GPflow) is used as initial data. When optimize(function, n_iter) is called:

  1. Any data points returned by get_initial() are evaluated. Afterwards the evaluated points are added to the models by calling _update_model_data().
  2. n_iter iterations are performed. Each iteration, the acquisition function is optimized, and the models are updated by calling _update_model_data().

The updating of a model through _update_model_data() calls set_data(X, Y) on the acquisition function. This covers the following aspects (a conceptual sketch of the full flow is given after this list):

  • GPModel.X and GPModel.Y are updated
  • Each of the contained models is returned to the state it was in when the acquisition function was initialized, and then optimized. If the optimize_restarts parameter of Acquisition.__init__() was set to \(n>1\), the state of the model is additionally randomized and optimized \(n-1\) times. Finally, the state resulting in the best log_likelihood() becomes the new model state.
  • Call Acquisition.setup() to perform any pre-calculation of quantities independent of candidate points, which can be used in build_acquisition().
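
The following sketch summarizes this flow in plain Python. It is purely illustrative (the helper name bo_loop_sketch and its structure are not part of GPflowOpt); it only uses the public calls discussed above:

import numpy as np

def bo_loop_sketch(acquisition, acq_optimizer, objectivefx, X, Y, n_iter):
    # Conceptual sketch of the BayesianOptimizer flow described above
    acquisition.set_data(X, Y)              # initial design -> models (cf. _update_model_data())
    for _ in range(n_iter):
        # Maximize the acquisition over the domain; the optimizers minimize,
        # so score and gradient are negated
        def negated_score(Xcand):
            score, grad = acquisition.evaluate_with_gradients(Xcand)
            return -score, -grad
        x_new = acq_optimizer.optimize(negated_score).x
        y_new = objectivefx(x_new)          # evaluate the expensive function(s)
        X, Y = np.vstack((X, x_new)), np.vstack((Y, y_new))
        acquisition.set_data(X, Y)          # models re-optimized, setup() called
    return X, Y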

The GPflow tree

The Acquisition interface, mapping the belief of the model(s) to a score indicating areas of optimality/feasibility, is implemented as part of the GPflow tree structure. More specifically, it implements the Parameterized interface, permitting the use of the AutoFlow decorator. The build_acquisition() method to be implemented by subclasses is a TensorFlow method, allowing automatic differentiation of the acquisition function, which enables gradient-based optimization thereof (not of the objective!). It may directly access the graph for computing the predictive distribution of a model by calling build_predict().
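
As a minimal sketch of this mechanism (GPflowOpt already ships a LowerConfidenceBound; the class below is purely illustrative and assumes the GPflow/GPflowOpt versions used throughout this document):

import tensorflow as tf
from gpflowopt.acquisition import Acquisition

class SimpleLCB(Acquisition):
    # Illustrative acquisition: sigma * sqrt(var) - mean, to be maximized
    def __init__(self, model, sigma=2.0):
        super(SimpleLCB, self).__init__(model)
        self.sigma = sigma

    def build_acquisition(self, Xcand):
        # build_predict() gives access to the TensorFlow graph of the model's
        # predictive distribution; gradients w.r.t. Xcand follow automatically
        mean, var = self.models[0].build_predict(Xcand)
        return self.sigma * tf.sqrt(var) - mean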

Bayesian Optimizer

class gpflowopt.BayesianOptimizer(domain, acquisition, optimizer=None, initial=None, scaling=True, hyper_draws=None, callback=<function jitchol_callback>, verbose=False)

A traditional Bayesian optimization framework implementation.

Like other optimizers, this optimizer is constructed for optimization over a domain. Additionally, it is configured with a separate optimizer for the acquisition function.

Attributes:
domain

The current domain the optimizer operates on.

Methods

failsafe() Context to provide a safe way for optimization.
get_initial() Return the initial set of points.
gradient_enabled() Returns if the optimizer is a gradient-based algorithm or not.
optimize(objectivefx[, n_iter]) Run Bayesian optimization for a number of iterations.
set_initial(initial) Set the initial set of points.
silent() Context for performing actions on an optimizer (such as optimize) with all stdout discarded.
__init__(domain, acquisition, optimizer=None, initial=None, scaling=True, hyper_draws=None, callback=<function jitchol_callback>, verbose=False)
Parameters:
  • domain (Domain) – The optimization space.
  • acquisition (Acquisition) – The acquisition function to optimize over the domain.
  • optimizer (Optimizer) – (optional) optimization approach for the acquisition function. If not specified, SciPyOptimizer is used. This optimizer will run on the same domain as the BayesianOptimizer object.
  • initial (Design) – (optional) The initial design of candidates to evaluate before the optimization loop runs. Note that if the underlying model already contains some data from an initial design, it is augmented with the evaluations obtained by evaluating the points specified by the design.
  • scaling (bool) – (boolean, default true) if set to true, the outputs are normalized, and the inputs are scaled to a unit cube. This only affects model training: calls to acquisition.data, as well as returned optima are unscaled (see DataScaler for more details.). Note, the models contained by acquisition are modified directly, and so the references to the model outside of BayesianOptimizer now point to scaled models.
  • hyper_draws (int) – (optional) Enable marginalization of model hyperparameters. By default, point estimates are used. If this parameter is set to n, n hyperparameter draws from the likelihood distribution are obtained for each model using Hamiltonian MC (see the GPflow documentation for details). The acquisition score is computed for each draw, and averaged.
  • callback (callable) – (optional) this function or object will be called after the data of all models has been updated, with all models (as retrieved by acquisition.models) as argument, without the wrapping model handling any scaling. This allows custom model optimization strategies to be implemented. All manipulations of GPflow models are permitted. Combined with the optimize_restarts parameter of Acquisition this allows several scenarios: do the optimization manually from the callback (optimize_restarts equals 0), or choose the starting point + some random restarts (optimize_restarts > 0).
domain

The current domain the optimizer operates on.

Returns: :class:`~.domain.Domain` object
failsafe()

Context to provide a safe way for optimization.

If a RuntimeError is generated, the data of the acquisition object is saved to disk in the current directory. This allows the data to be re-used (which makes sense for expensive data).

The data can be used to experiment with fitting a GPflow model first (analyse the data, set sensible initial hyperparameter values and hyperpriors) before retrying Bayesian optimization.
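
For example, assuming an optimizer constructed as in the earlier snippets, a run can be wrapped as follows:

with optimizer.failsafe():
    result = optimizer.optimize(fx, n_iter=20)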

optimize(objectivefx, n_iter=20)

Run Bayesian optimization for a number of iterations.

Before the loop is initiated, first all points retrieved by get_initial() are evaluated on the objective and black-box constraints. These points are then added to the acquisition function by calling set_data() (and hence, the underlying models).

Each iteration a new data point is selected for evaluation by optimizing an acquisition function. This point updates the models.

Parameters:
  • objectivefx – (list of) expensive black-box objective and constraint functions. For evaluation, the responses of all the expensive functions are aggregated column-wise. Unlike the typical Optimizer interface, these functions should not return gradients.
  • n_iter – number of iterations to run
Returns:

OptimizeResult object

Acquisition functions

The gpflowopt package currently supports a limited number of popular acquisition functions. These are summarized in the table below; a detailed description of each follows.

Method                                             Objective   Constraint   # Outputs
gpflowopt.acquisition.ExpectedImprovement          ✔                        1
gpflowopt.acquisition.ProbabilityOfFeasibility                 ✔            1
gpflowopt.acquisition.ProbabilityOfImprovement     ✔                        1
gpflowopt.acquisition.LowerConfidenceBound         ✔                        1
gpflowopt.acquisition.MinValueEntropySearch        ✔                        1
gpflowopt.acquisition.HVProbabilityOfImprovement   ✔                        > 1

Single-objective

Expected Improvement
class gpflowopt.acquisition.ExpectedImprovement(model)

Expected Improvement acquisition function for single-objective global optimization. Introduced by Mockus et al. (1975).

Key reference:

@article{Jones:1998,
     title={Efficient global optimization of expensive black-box functions},
     author={Jones, Donald R and Schonlau, Matthias and Welch, William J},
     journal={Journal of Global optimization},
     volume={13},
     number={4},
     pages={455--492},
     year={1998},
     publisher={Springer}
}

This acquisition function is the expectation of the improvement over the current best observation w.r.t. the predictive distribution. The definition is closely related to the ProbabilityOfImprovement, but adds a multiplication with the improvement w.r.t the current best observation to the integral.

\[\alpha(\mathbf x_{\star}) = \int \max(f_{\min} - f_{\star}, 0) \, p( f_{\star}\,|\, \mathbf x, \mathbf y, \mathbf x_{\star} ) \, d f_{\star}\]
Attributes:
data

The training data of the models.

data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

models

The GPflow models representing our beliefs of the optimization problem.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

build_prior() Build a tf expression for the prior by summing all child-parameter priors.
constraint_indices() Method returning the indices of the model outputs which correspond to the (expensive) constraint functions.
enable_scaling(domain) Enables and configures the DataScaler objects wrapping the GP models.
evaluate(Xcand) AutoFlow method to compute the acquisition scores for candidates, without returning the gradients.
evaluate_with_gradients(Xcand) AutoFlow method to compute the acquisition scores for candidates, also returns the gradients.
feasible_data_index() Returns a boolean array indicating which data points are considered feasible (according to the acquisition function(s) ) and which not.
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
objective_indices() Method returning the indices of the model outputs which are objective functions.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_data(X, Y) Update the training data of the contained models
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
build_acquisition  
get_parameter_dict  
set_parameter_dict  
update_feed_dict  
__init__(model)
Parameters:model – GPflow model (single output) representing our belief of the objective
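
A short sketch of scoring candidate points with EI, assuming a GPR model fitted on initial data as in the earlier example:

from gpflowopt.acquisition import ExpectedImprovement
from gpflowopt.design import RandomDesign

ei = ExpectedImprovement(model)
Xcand = RandomDesign(100, domain).generate()
scores = ei.evaluate(Xcand)                         # acquisition scores, shape (100, 1)
scores, grads = ei.evaluate_with_gradients(Xcand)   # scores plus gradients w.r.t. Xcand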
Probability of Feasibility
class gpflowopt.acquisition.ProbabilityOfFeasibility(model, threshold=0.0, minimum_pof=0.5)

Probability of Feasibility acquisition function for sampling feasible regions. Standard acquisition function for Bayesian Optimization with black-box expensive constraints.

Key reference:

@article{Schonlau:1997,
    title={Computer experiments and global optimization},
    author={Schonlau, Matthias},
    year={1997},
    publisher={University of Waterloo}
}

The acquisition function measures the probability of the latent function being smaller than a threshold for a candidate point.

\[\alpha(\mathbf x_{\star}) = \int_{-\infty}^{0} \, p(f_{\star}\,|\, \mathbf x, \mathbf y, \mathbf x_{\star} ) \, d f_{\star}\]
Attributes:
data

The training data of the models.

data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

models

The GPflow models representing our beliefs of the optimization problem.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

build_prior() Build a tf expression for the prior by summing all child-parameter priors.
constraint_indices() Method returning the indices of the model outputs which correspond to the (expensive) constraint functions.
enable_scaling(domain) Enables and configures the DataScaler objects wrapping the GP models.
evaluate(Xcand) AutoFlow method to compute the acquisition scores for candidates, without returning the gradients.
evaluate_with_gradients(Xcand) AutoFlow method to compute the acquisition scores for candidates, also returns the gradients.
feasible_data_index() Returns a boolean array indicating which points are feasible (True) and which are not (False).
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
objective_indices() Method returning the indices of the model outputs which are objective functions.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_data(X, Y) Update the training data of the contained models
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
build_acquisition  
get_parameter_dict  
set_parameter_dict  
update_feed_dict  
__init__(model, threshold=0.0, minimum_pof=0.5)
Parameters:
  • model – GPflow model (single output) representing our belief of the constraint
  • threshold – Observed values lower than the threshold are considered valid
  • minimum_pof – minimum pof score required for a point to be valid. For more information, see docstring of feasible_data_index
constraint_indices()

Method returning the indices of the model outputs which correspond to the (expensive) constraint functions. By default there are no constraint functions

feasible_data_index()

Returns a boolean array indicating which points are feasible (True) and which are not (False).

Answering the question 'which points are feasible?' is slightly troublesome when noise is present: directly relying on the noisy data and comparing it to self.threshold does not make much sense.

Instead, we rely on the model belief using the PoF (a probability between 0 and 1). As the implementation of the PoF corresponds to the cdf of the (normal) predictive distribution at a point, evaluated at the threshold, requiring a minimum PoF of 0.5 implies a point is marked feasible only if the mean of its predictive distribution lies below the threshold. A minimum PoF of 0 marks all points as valid, while setting it to 1 marks all points as invalid.

Returns:boolean ndarray (size N)
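
A hedged sketch of combining PoF with EI for constrained Bayesian optimization, assuming separate single-output GPR models fitted on objective values Yo and constraint values Yc (feasible when at or below the default threshold of 0):

import gpflow
from gpflowopt.bo import BayesianOptimizer
from gpflowopt.acquisition import ExpectedImprovement, ProbabilityOfFeasibility

objective_model = gpflow.gpr.GPR(X, Yo, gpflow.kernels.Matern52(2, ARD=True))
constraint_model = gpflow.gpr.GPR(X, Yc, gpflow.kernels.Matern52(2, ARD=True))

# multiplication builds a joint criterion: EI weighted by the probability of feasibility
joint = ExpectedImprovement(objective_model) * ProbabilityOfFeasibility(constraint_model)
optimizer = BayesianOptimizer(domain, joint)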
Probability of Improvement
class gpflowopt.acquisition.ProbabilityOfImprovement(model)

Probability of Improvement acquisition function for single-objective global optimization.

Key reference:

@article{Kushner:1964,
    author = "Kushner, Harold J",
    journal = "Journal of Basic Engineering",
    number = "1",
    pages = "97--106",
    publisher = "American Society of Mechanical Engineers",
    title = "{A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise}",
    volume = "86",
    year = "1964"
}
\[\alpha(\mathbf x_{\star}) = \int_{-\infty}^{f_{\min}} \, p( f_{\star}\,|\, \mathbf x, \mathbf y, \mathbf x_{\star} ) \, d f_{\star}\]
Attributes:
data

The training data of the models.

data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

models

The GPflow models representing our beliefs of the optimization problem.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

build_prior() Build a tf expression for the prior by summing all child-parameter priors.
constraint_indices() Method returning the indices of the model outputs which correspond to the (expensive) constraint functions.
enable_scaling(domain) Enables and configures the DataScaler objects wrapping the GP models.
evaluate(Xcand) AutoFlow method to compute the acquisition scores for candidates, without returning the gradients.
evaluate_with_gradients(Xcand) AutoFlow method to compute the acquisition scores for candidates, also returns the gradients.
feasible_data_index() Returns a boolean array indicating which data points are considered feasible (according to the acquisition function(s) ) and which not.
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
objective_indices() Method returning the indices of the model outputs which are objective functions.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_data(X, Y) Update the training data of the contained models
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
build_acquisition  
get_parameter_dict  
set_parameter_dict  
update_feed_dict  
__init__(model)
Parameters:model – GPflow model (single output) representing our belief of the objective
Lower Confidence Bound
class gpflowopt.acquisition.LowerConfidenceBound(model, sigma=2.0)

Lower confidence bound acquisition function for single-objective global optimization.

Key reference:

@inproceedings{Srinivas:2010,
    author = "Srinivas, Niranjan and Krause, Andreas and Seeger, Matthias and Kakade, Sham M.",
    booktitle = "{Proceedings of the 27th International Conference on Machine Learning (ICML-10)}",
    editor = "F{"u}rnkranz, Johannes and Joachims, Thorsten",
    pages = "1015--1022",
    publisher = "Omnipress",
    title = "{Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design}",
    year = "2010"
}
\[\alpha(\mathbf x_{\star}) =\mathbb{E} \left[ f_{\star}\,|\, \mathbf x, \mathbf y, \mathbf x_{\star} \right] - \sigma \mbox{Var} \left[ f_{\star}\,|\, \mathbf x, \mathbf y, \mathbf x_{\star} \right]\]
Attributes:
data

The training data of the models.

data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

models

The GPflow models representing our beliefs of the optimization problem.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

build_prior() Build a tf expression for the prior by summing all child-parameter priors.
constraint_indices() Method returning the indices of the model outputs which correspond to the (expensive) constraint functions.
enable_scaling(domain) Enables and configures the DataScaler objects wrapping the GP models.
evaluate(Xcand) AutoFlow method to compute the acquisition scores for candidates, without returning the gradients.
evaluate_with_gradients(Xcand) AutoFlow method to compute the acquisition scores for candidates, also returns the gradients.
feasible_data_index() Returns a boolean array indicating which data points are considered feasible (according to the acquisition function(s) ) and which not.
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
objective_indices() Method returning the indices of the model outputs which are objective functions.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_data(X, Y) Update the training data of the contained models
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
build_acquisition  
get_parameter_dict  
set_parameter_dict  
update_feed_dict  
__init__(model, sigma=2.0)
Parameters:
  • model – GPflow model (single output) representing our belief of the objective
  • sigma – exploration parameter (see the formula above): higher values result in more exploration. A short usage sketch follows.
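
A brief sketch, assuming a fitted single-output GPR model as in the earlier examples; the sigma value is illustrative:

from gpflowopt.acquisition import LowerConfidenceBound

lcb = LowerConfidenceBound(model, sigma=3.0)   # larger sigma favours exploration
optimizer = BayesianOptimizer(domain, lcb)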

Multi-objective

Hypervolume-based Probability of Improvement
class gpflowopt.acquisition.HVProbabilityOfImprovement(models)

Hypervolume-based Probability of Improvement.

A multiobjective acquisition function for multiobjective optimization. It is used to identify a complete Pareto set of non-dominated solutions.

Key reference:

@article{Couckuyt:2014,
    title={Fast calculation of multiobjective probability of improvement and expected improvement criteria for Pareto optimization},
    author={Couckuyt, Ivo and Deschrijver, Dirk and Dhaene, Tom},
    journal={Journal of Global Optimization},
    volume={60},
    number={3},
    pages={575--594},
    year={2014},
    publisher={Springer}
}

For a Pareto set \(\mathcal{P}\), the non-dominated section of the objective space is denoted by \(A\). The hypervolume() of the dominated part of the space is denoted by \(\mathcal{H}\) and can be used as indicator for the optimality of the Pareto set (the higher the better).

\[\begin{split}\boldsymbol{\mu} &= \left[ \mathbb{E} \left[ f^{(1)}_{\star}\,|\, \mathbf x, \mathbf y, \mathbf x_{\star} \right], ..., \mathbb{E} \left[ f^{(p)}_{\star}\,|\, \mathbf x, \mathbf y, \mathbf x_{\star} \right]\right] \\ I\left(\boldsymbol{\mu}, \mathcal{P}\right) &= \begin{cases} \mathcal{H} \left( \mathcal{P} \cup \boldsymbol{\mu} \right) - \mathcal{H} \left( \mathcal{P} \right) & \mbox{if } \boldsymbol{\mu} \in A \\ 0 & \mbox{otherwise} \end{cases} \\ \alpha(\mathbf x_{\star}) &= I\left(\boldsymbol{\mu}, \mathcal{P}\right) p\left(\mathbf x_{\star} \in A \right)\end{split}\]
Attributes:
pareto: An instance of Pareto.
Attributes:
data

The training data of the models.

data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

models

The GPflow models representing our beliefs of the optimization problem.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

build_prior() Build a tf expression for the prior by summing all child-parameter priors.
constraint_indices() Method returning the indices of the model outputs which correspond to the (expensive) constraint functions.
enable_scaling(domain) Enables and configures the DataScaler objects wrapping the GP models.
evaluate(Xcand) AutoFlow method to compute the acquisition scores for candidates, without returning the gradients.
evaluate_with_gradients(Xcand) AutoFlow method to compute the acquisition scores for candidates, also returns the gradients.
feasible_data_index() Returns a boolean array indicating which data points are considered feasible (according to the acquisition function(s) ) and which not.
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
objective_indices() Method returning the indices of the model outputs which are objective functions.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_data(X, Y) Update the training data of the contained models
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
build_acquisition  
get_parameter_dict  
set_parameter_dict  
update_feed_dict  
__init__(models)
Parameters:models – A list of (possibly multi-output) GPflow models representing our belief of the objectives.
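
A hedged sketch of bi-objective optimization with HVPoI. Here f is a hypothetical expensive function returning an N x 2 matrix of objective values, and one single-output GPR is built per objective:

import gpflow
from gpflowopt.bo import BayesianOptimizer
from gpflowopt.acquisition import HVProbabilityOfImprovement

Y = f(X)   # hypothetical bi-objective responses for the initial design X
models = [gpflow.gpr.GPR(X, Y[:, [i]], gpflow.kernels.Matern52(2, ARD=True))
          for i in range(Y.shape[1])]

hvpoi = HVProbabilityOfImprovement(models)
result = BayesianOptimizer(domain, hvpoi).optimize(f, n_iter=20)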
Pareto module
class gpflowopt.pareto.BoundedVolumes(lb, ub)
Attributes:
data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

append(lb, ub) Add new bounded volumes.
build_prior() Build a tf expression for the prior by summing all child-parameter priors.
clear() Clears all stored bounded volumes
empty(dim, dtype) Returns an empty bounded volume (hypercube).
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_state(x) Set the values of all the parameters by recursion
size()
Returns:volume of each bounded volume
tf_mode() A context for building models.
get_parameter_dict  
set_parameter_dict  
update_feed_dict  
append(lb, ub)

Add new bounded volumes.

Parameters:
  • lb – the lowerbounds of the volumes
  • ub – the upperbounds of the volumes
clear()

Clears all stored bounded volumes

classmethod empty(dim, dtype)

Returns an empty bounded volume (hypercube).

Parameters:
  • dim – dimension of the volume
  • dtype – dtype of the coordinates
Returns:

an empty BoundedVolumes

size()
Returns:volume of each bounded volume
class gpflowopt.pareto.Pareto(Y, threshold=0)
Attributes:
data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

bounds_2d() Computes the cells covering the non-dominated region for the specific case of only two objectives.
build_prior() Build a tf expression for the prior by summing all child-parameter priors.
divide_conquer_nd() Divide and conquer strategy to compute the cells covering the non-dominated region.
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
hypervolume(reference) Autoflow method to calculate the hypervolume indicator
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
update([Y, generic_strategy]) Update with new output data.
get_parameter_dict  
set_parameter_dict  
update_feed_dict  
bounds_2d()

Computes the cells covering the non-dominated region for the specific case of only two objectives.

Assumes the Pareto set has been sorted in ascending order on the first objective. This implies the second objective is sorted in descending order.

divide_conquer_nd()

Divide and conquer strategy to compute the cells covering the non-dominated region.

Generic version: works for an arbitrary number of objectives.

hypervolume(reference)

Autoflow method to calculate the hypervolume indicator

The hypervolume indicator is the volume of the dominated region.

Parameters:reference – reference point to use. Should be equal to or larger than the anti-ideal point of the Pareto set. For comparing results across runs, the same reference point must be used.
Returns:hypervolume indicator (the higher the better)
update(Y=None, generic_strategy=False)

Update with new output data.

Computes the Pareto set and if it has changed recalculates the cell bounds covering the non-dominated region. For the latter, a direct algorithm is used for two objectives, otherwise a generic divide and conquer strategy is employed.

Parameters:
  • Y – output data points
  • generic_strategy – Force the generic divide and conquer strategy regardless of the number of objectives (default False)
gpflowopt.pareto.non_dominated_sort(objectives)

Computes the non-dominated set for a set of data points

Parameters:objectives – data points
Returns:tuple of the non-dominated set and the degree of dominance, dominances gives the number of dominating points for each data point
Pareto.hypervolume(reference)

Autoflow method to calculate the hypervolume indicator

The hypervolume indicator is the volume of the dominated region.

Parameters:reference – reference point to use. Should be equal to or larger than the anti-ideal point of the Pareto set. For comparing results across runs, the same reference point must be used.
Returns:hypervolume indicator (the higher the better)
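
A small numeric sketch of the Pareto utilities; the objective values are made up, and both objectives are minimized:

import numpy as np
from gpflowopt.pareto import Pareto, non_dominated_sort

Y = np.array([[1.0, 4.0], [2.0, 2.0], [4.0, 1.0], [3.0, 3.0]])
front, dominance = non_dominated_sort(Y)    # [3., 3.] is dominated by [2., 2.]

pareto = Pareto(Y)
hv = pareto.hypervolume(np.array([5.0, 5.0]))   # reference point worse than every point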

Initial Designs

Latin Hypercube design

class gpflowopt.design.LatinHyperCube(size, domain, max_seed_size=None)

Latin hypercube with optimized minimal distance between points.

Created with the Translational Propagation algorithm to avoid lengthy generation procedures. For dimensions smaller than or equal to 6, this algorithm finds the quasi-optimal LHD with overwhelming probability. To increase this probability, if a design for a domain with dimensionality D is requested, D different designs are generated using seed sizes 1, 2, ..., D (unless a maximum seed size 1 <= S <= D is specified). The seeds themselves are small Latin hypercubes generated with the same algorithm.

Beyond 6D, the probability of finding the optimal LHD fades, although the resulting designs are still acceptable. Beyond roughly 15D the algorithm tends to slow down considerably and becomes very memory demanding. Key reference:

@article{Viana:2010,
     title={An algorithm for fast optimal Latin hypercube design of experiments},
     author={Viana, Felipe AC and Venter, Gerhard and Balabanov, Vladimir},
     journal={International Journal for Numerical Methods in Engineering},
     volume={82},
     number={2},
     pages={135--156},
     year={2010},
     publisher={John Wiley & Sons, Ltd.}
}

For pre-generated LHDs please see the following website.

Attributes:
generative_domain
Returns:Domain object representing [1, N]^D, the generative domain for the TPLHD algorithm.

Methods

create_design() Generate several LHDs with increasing seed.
generate() Creates the design in the domain specified during construction.
__init__(size, domain, max_seed_size=None)
Parameters:
  • size – requested size N for the LHD
  • domain – domain to generate the LHD for, must be continuous
  • max_seed_size – the maximum seed size 1 <= S <= D. If unspecified, it equals the dimensionality D of the domain. During generation, S different designs are generated using seeds of sizes 1, 2, ..., S; each seed is itself a small LHD.
create_design()

Generate several LHDs with increasing seed.

The maximum seed size is S = min(dimensionality, max_seed_size). From the S candidate designs, the one with the best intersite distance is returned.

Returns:data matrix, size N x D.
generative_domain
Returns:Domain object representing [1, N]^D, the generative domain for the TPLHD algorithm.
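
A short usage sketch (the domain and design size are illustrative):

from gpflowopt.domain import ContinuousParameter
from gpflowopt.design import LatinHyperCube

design_domain = ContinuousParameter('a', 0, 1) + ContinuousParameter('b', -2, 2)
X = LatinHyperCube(11, design_domain).generate()   # (11, 2) matrix, scaled to design_domain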

Factorial design

class gpflowopt.design.FactorialDesign(levels, domain)

A k-level grid-based design.

A simple grid design with the optimal minimal distance between points; however, it risks collapsing points when parameters are removed. Its size equals the number of levels raised to the power of the domain dimensionality (k^D points for k levels in D dimensions).

Attributes:
generative_domain
Returns:Domain object representing the domain associated with the points generated in create_design().

Methods

create_design() Returns a design generated in the generative domain.
generate() Creates the design in the domain specified during construction.
__init__(levels, domain)
Parameters:
  • levels – number of levels k in each dimension
  • domain – domain to generate data points in.
create_design()

Returns a design generated in the generative domain.

This method should be implemented in the subclasses.

Returns:data matrix, N x D
generative_domain
Returns:Domain object representing the domain associated with the points generated in create_design().

Defaults to [0,1]^D, can be overwritten by subclasses

Random design

class gpflowopt.design.RandomDesign(size, domain)

Random space-filling design.

Generates points drawn from the standard uniform distribution U(0,1).

Attributes:
generative_domain
Returns:Domain object representing the domain associated with the points generated in create_design().

Methods

create_design() Returns a design generated in the generative domain.
generate() Creates the design in the domain specified during construction.
__init__(size, domain)
Parameters:
  • size – number of data points to generate
  • domain – domain to generate data points in.
create_design()

Returns a design generated in the generative domain.

This method should be implemented in the subclasses.

Returns:data matrix, N x D
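
A short sketch comparing the factorial and random designs on a hypothetical 2D domain:

from gpflowopt.domain import ContinuousParameter
from gpflowopt.design import FactorialDesign, RandomDesign

d = ContinuousParameter('a', 0, 1) + ContinuousParameter('b', -2, 2)
grid = FactorialDesign(4, d).generate()   # 4 levels per dimension: 4**2 = 16 points
rand = RandomDesign(50, d).generate()     # 50 uniform random points within d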

Data Transformations

Transforms

class gpflowopt.transforms.LinearTransform(A, b)

A simple linear transform of the form

\[\mathbf Y = (\mathbf A \mathbf X^{T})^{T} + \mathbf b \otimes \mathbf 1_{N}^{T}\]
Attributes:
data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

assign(other) Assign the parameters of another LinearTransform.
backward(Y) Overwrites the default backward approach, to avoid an explicit matrix inversion.
build_backward(Y) TensorFlow implementation of the inverse mapping
build_backward_variance(Yvar) Additional method for scaling variance backward (used in Normalizer).
build_forward(X) Tensorflow graph for the transformation of U -> V
build_prior() Build a tf expression for the prior by summing all child-parameter priors.
forward(X) Performs the transformation of U -> V
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
get_parameter_dict  
set_parameter_dict  
update_feed_dict  
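
A small sketch, assuming A may be passed as a per-dimension scale vector (a diagonal transform), as used internally for domain scaling:

import numpy as np
from gpflowopt.transforms import LinearTransform

t = LinearTransform(np.array([2.0, 0.5]), np.array([-1.0, 1.0]))
X = np.random.rand(5, 2)
V = t.forward(X)         # apply Y = (A X^T)^T + b
X_back = t.backward(V)   # inverse mapping, avoiding an explicit matrix inversion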

Normalizer

class gpflowopt.scaling.DataScaler(model, domain=None, normalize_Y=False)

Model-wrapping class, primarily intended to assure the data in GPflow models is scaled.

One DataScaler wraps one GPflow model and can scale both the input and the output data. By default, if an attribute is not found on the DataScaler object, the lookup is forwarded to the wrapped model.

The DataScaler supports both input and output scaling, although the two scalings are set up differently:

  • For input, the transform is not automatically generated. By default, the input transform is the identity transform. The input transform can be set through the setter property, or by specifying a domain in the constructor. For the latter, the input transform will be initialized as the transform from the specified domain to a unit cube. When X is updated, the transform does not change.
  • For output, if enabled, the data is always scaled to zero mean and unit variance. This means that when the Y property is set, the output transform is first recalculated, then the data is scaled.

By default, Acquisition objects will always wrap each model received. However, the input and output transforms will be the identity transforms, and output normalization is switched off. It is up to the user (or specialized classes such as the BayesianOptimizer) to correctly configure the datascalers involved.

By carrying out the scaling at such a deep level in the framework, it is possible to keep the scaling hidden throughout the rest of GPflowOpt. This means that, when implementing acquisition functions, it is safe to assume the data is not scaled and lies within the configured optimization domain. There is one exception: the hyperparameters are determined on the scaled data and are NOT automatically unscaled by this class, because the DataScaler does not know what model is wrapped and what kernels are used. Should the hyperparameters of the model be required, it is the responsibility of the implementation to rescale them. Additionally, any hyperpriors applied should take the scaled data into account.
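
A hedged sketch of wrapping a GPR model manually (the data arrays X, Y, Xcand and the kernel are illustrative; normally the BayesianOptimizer configures this for you):

import gpflow
from gpflowopt.scaling import DataScaler

model = gpflow.gpr.GPR(X, Y, gpflow.kernels.Matern52(domain.size, ARD=True))
scaled = DataScaler(model, domain=domain, normalize_Y=True)

# predictions are returned in the original, unscaled units
mean, var = scaled.predict_f(Xcand)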

Attributes:
X

Returns the input data of the model, unscaled.

Y

Returns the output data of the wrapped model, unscaled.

data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

Returns an instance of the ParentHook instead of the usual reference to a Parentable.

input_transform

Get the current input transform

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

name

An automatically generated name, given by the reference of the _parent to this instance.

normalize_output
Returns:boolean, indicating if the output is automatically scaled to zero mean and unit variance.
output_transform

Get the current output transform

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

build_predict(Xnew[, full_cov]) build_predict builds the TensorFlow graph for prediction.
build_prior() Build a tf expression for the prior by summing all child-parameter priors.
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
predict_density(Xnew, Ynew) Compute the (log) density of the data Ynew at the points Xnew
predict_f(Xnew) Compute the mean and variance of held-out data at the points Xnew
predict_f_full_cov(Xnew) Compute the mean and variance of held-out data at the points Xnew
predict_y(Xnew) Compute the mean and variance of held-out data at the points Xnew
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
get_parameter_dict  
set_parameter_dict  
update_feed_dict  

Interfaces

Domain

class gpflowopt.domain.Domain(parameters)

A domain representing the mathematical space over which is optimized.

Attributes:
highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

lower

Lower bound of the domain, corresponding to a numpy array with the lower value of each parameter

name

An automatically generated name, given by the reference of the _parent to this instance.

size

Returns the dimensionality of the domain

upper

Upper bound of the domain, corresponding to a numpy array with the upper value of each parameter

value
class gpflowopt.domain.Parameter(label, xinit)

Abstract class representing a parameter, which corresponds to a one-dimensional domain. This class can be subclassed for continuous, discrete and categorical parameters.

Attributes:
highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

lower

Lower bound of the domain, corresponding to a numpy array with the lower value of each parameter

name

An automatically generated name, given by the reference of the _parent to this instance.

size

One parameter has a dimensionality of 1

upper

Upper bound of the domain, corresponding to a numpy array with the upper value of each parameter

value

Optimizer

class gpflowopt.optim.Optimizer(domain, exclude_gradient=False)

An optimization algorithm.

Starting from an initial (set of) point(s), it performs an optimization over a domain. May be gradient-based or gradient-free.

Attributes:
domain

The current domain the optimizer operates on.

Methods

get_initial() Return the initial set of points.
gradient_enabled() Returns if the optimizer is a gradient-based algorithm or not.
optimize(objectivefx, **kwargs) Optimize a given function f over a domain.
set_initial(initial) Set the initial set of points.
silent() Context for performing actions on an optimizer (such as optimize) with all stdout discarded.
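
A hedged sketch of composing optimizers for the acquisition search, assuming the MCOptimizer, SciPyOptimizer and StagedOptimizer classes of gpflowopt.optim and the optimizer keyword of BayesianOptimizer:

from gpflowopt.bo import BayesianOptimizer
from gpflowopt.optim import MCOptimizer, SciPyOptimizer, StagedOptimizer

# Monte Carlo pre-selection of a promising starting point, followed by
# gradient-based refinement with SciPy
acq_opt = StagedOptimizer([MCOptimizer(domain, 200), SciPyOptimizer(domain)])
optimizer = BayesianOptimizer(domain, acquisition, optimizer=acq_opt)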

Acquisition

class gpflowopt.acquisition.Acquisition(models=[], optimize_restarts=5)

An acquisition function maps the belief represented by a Bayesian model into a score indicating how promising a point is for evaluation.

In Bayesian Optimization this function is typically optimized over the optimization domain to determine the next point for evaluation. An object of this class holds a list of GPflow models. Subclasses implement a build_acquisition function which computes the acquisition function (usually from the predictive distribution) using TensorFlow. Optionally, a method setup can be implemented which computes some quantities which are used to compute the acquisition, but do not depend on candidate points.

Acquisition functions can be combined through addition or multiplication to construct joint criteria, for instance for constrained optimization. The combined objects then form a tree hierarchy.

Acquisition models implement a lazy strategy to optimize models and run setup. This is implemented by a _needs_setup attribute (similar to _needs_recompile in GPflow). Calling set_data() sets this flag to True. Calling methods marked with the setup_required decorator (such as evaluate) optimizes all models, then calls setup if this flag is set. In hierarchies, acquisition objects handling constraint objectives are set up first, followed by the objects handling objectives.
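
A minimal sketch of a custom acquisition subclass; the criterion itself (the negated predictive mean) is purely illustrative:

from gpflowopt.acquisition import Acquisition

class NegativePredictiveMean(Acquisition):
    """Hypothetical acquisition preferring points with a low predictive mean."""
    def __init__(self, model):
        super(NegativePredictiveMean, self).__init__(model)

    def build_acquisition(self, Xcand):
        # scores are maximized, so negate the mean of the (single) wrapped model
        mean, _ = self.models[0].build_predict(Xcand)
        return -mean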

Attributes:
data

The training data of the models.

data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

models

The GPflow models representing our beliefs of the optimization problem.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

build_prior() Build a tf expression for the prior by summing all child-parameter priors.
constraint_indices() Method returning the indices of the model outputs which correspond to the (expensive) constraint functions.
enable_scaling(domain) Enables and configures the DataScaler objects wrapping the GP models.
evaluate(Xcand) AutoFlow method to compute the acquisition scores for candidates, without returning the gradients.
evaluate_with_gradients(Xcand) AutoFlow method to compute the acquisition scores for candidates, also returns the gradients.
feasible_data_index() Returns a boolean array indicating which data points are considered feasible (according to the acquisition function(s) ) and which not.
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
objective_indices() Method returning the indices of the model outputs which are objective functions.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_data(X, Y) Update the training data of the contained models
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
build_acquisition  
get_parameter_dict  
set_parameter_dict  
update_feed_dict  

Design

class gpflowopt.design.Design(size, domain)

Design of size N (number of points) generated within a D-dimensional domain.

Users should call generate() which auto-scales the design to the domain specified in the constructor. To implement new design methodologies subclasses should implement create_design(), which returns the design on the domain specified by the generative_domain method (which defaults to a unit cube).
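
A minimal sketch of such a subclass, assuming size and domain are stored as attributes by the base constructor:

import numpy as np
from gpflowopt.design import Design

class DiagonalDesign(Design):
    """Hypothetical design placing N points along the diagonal of the unit cube."""
    def create_design(self):
        # points in the default generative domain [0, 1]^D;
        # generate() rescales them to the domain passed to the constructor
        t = np.linspace(0, 1, self.size)[:, None]
        return np.tile(t, (1, self.domain.size))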

Attributes:
generative_domain
Returns:Domain object representing the domain associated with the points generated in create_design().

Methods

create_design() Returns a design generated in the generative domain.
generate() Creates the design in the domain specified during construction.

Transform

class gpflowopt.transforms.DataTransform

Maps data in Domain U to Domain V.

Useful for scaling of data between domains.

Attributes:
data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

A reference to the top of the tree, usually a Model instance

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

backward(Y) Performs the transformation of V -> U.
build_forward(X) Tensorflow graph for the transformation of U -> V
build_prior() Build a tf expression for the prior by summing all child-parameter priors.
forward(X) Performs the transformation of U -> V
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
assign  
get_parameter_dict  
set_parameter_dict  
update_feed_dict  

ModelWrapper

class gpflowopt.models.ModelWrapper(model)

Class for fast implementation of a wrapper for models defined in GPflow.

Once wrapped, all lookups for attributes which are not found in the wrapper class are automatically forwarded to the wrapped model. To influence the I/O of methods on the wrapped class, simply implement the method in the wrapper and call the appropriate methods on the wrapped class. Specific logic is included to make sure that if AutoFlow methods are influenced following this pattern, the original AF storage (if existing) is unaffected and a new storage is added to the subclass.

Attributes:
data_holders

Return a list of all the child DataHolders

fixed

A boolean attribute to determine if all the child parameters of this node are fixed

highest_parent

Returns an instance of the ParentHook instead of the usual reference to a Parentable.

long_name

This is a unique identifier for a param object within a structure, made by concatenating the names through the tree.

name

An automatically generated name, given by the reference of the _parent to this instance.

sorted_params

Return a list of all the child parameters, sorted by id.

Methods

build_prior() Build a tf expression for the prior by summing all child-parameter priors.
get_feed_dict_keys() Recursively generate a dictionary of {object: _tf_array} pairs that can be used in update_feed_dict
get_free_state() Recurse get_free_state on all child parameters, and hstack them.
get_param_index(param_to_index) Given a parameter, compute the position of that parameter on the free-state vector.
get_samples_df(samples) Given a numpy array where each row is a valid free-state vector, return a pandas.DataFrame which contains the parameter name and associated samples in the correct form (e.g.
make_tf_array(X) Distribute a flat tensorflow array amongst all the child parameter of this instance.
randomize([distributions, skipfixed]) Calls randomize on all parameters in model hierarchy.
set_state(x) Set the values of all the parameters by recursion
tf_mode() A context for building models.
get_parameter_dict  
set_parameter_dict  
update_feed_dict  
__eq__(other)

Return self==value.

__getattr__(item)

If an attribute is not found in this class, it is searched in the wrapped model

__init__(model)
Parameters:model – model to be wrapped
__setattr__(key, value)
  1. If setting the wrapped attribute, point its parent to this object (the ModelWrapper).
  2. Setting attributes in the right objects. The following rules are processed in order: (a) If the attribute exists in the wrapper, set it in the wrapper. (b) If no object has been wrapped (wrapped is None), set the attribute in the wrapper. (c) If the attribute is found in the wrapped object, set it there. This rule is ignored for AF storages. (d) Set the attribute in the wrapper.
highest_parent

Returns an instance of the ParentHook instead of the usual reference to a Parentable.

name

An automatically generated name, given by the reference of the _parent to this instance.