This example shows the variational compression effect of so-called 'sparse' Gaussian processes. In particular, we show how the variational free energy framework of Titsias (2009) can be used to compress a Gaussian process fit. First we set up the notebook with a fixed random seed and import GPy.
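For reference, the objective maximized by the sparse models below is the variational lower bound of Titsias (2009). Written here in a common notation (assuming a Gaussian likelihood with noise variance $\sigma^2$, $n$ training inputs and $m$ inducing inputs), it is

$$\mathcal{F} = \log \mathcal{N}\!\left(\mathbf{y}\,\middle|\,\mathbf{0},\ \mathbf{Q}_{nn} + \sigma^2\mathbf{I}\right) - \frac{1}{2\sigma^2}\operatorname{tr}\!\left(\mathbf{K}_{nn} - \mathbf{Q}_{nn}\right), \qquad \mathbf{Q}_{nn} = \mathbf{K}_{nm}\mathbf{K}_{mm}^{-1}\mathbf{K}_{mn},$$

where $\mathbf{K}_{nn}$ is the kernel matrix on the training inputs and $\mathbf{K}_{nm}$, $\mathbf{K}_{mm}$ involve the inducing inputs $Z$. Maximizing $\mathcal{F}$ over $Z$ and the hyperparameters simultaneously fits the model and tightens the bound.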
In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
import GPy
import numpy as np
np.random.seed(101)
In [2]:
N = 50
noise_var = 0.05

# Sample a function from a GP with an RBF kernel and add Gaussian noise of variance noise_var
X = np.linspace(0,10,N)[:,None]
k = GPy.kern.RBF(1)
y = np.random.multivariate_normal(np.zeros(N),k.K(X)+np.eye(N)*noise_var).reshape(-1,1)
In [3]:
# Fit a full (non-sparse) GP regression model as a reference
m_full = GPy.models.GPRegression(X,y)
m_full.optimize('bfgs')
m_full.plot()
print(m_full)
In [4]:
# Construct a sparse GP model with six inducing inputs placed in two clusters
Z = np.hstack((np.linspace(2.5,4.,3),np.linspace(7,8.5,3)))[:,None]
m = GPy.models.SparseGPRegression(X,y,Z=Z)
m.likelihood.variance = noise_var
m.plot()
print(m)
In [5]:
# Keep the inducing inputs fixed and optimize the remaining hyperparameters
m.inducing_inputs.fix()
m.optimize('bfgs')
m.plot()
print(m)
The poor placement of the inducing inputs causes the model to 'underfit' the data: the lengthscale is much longer than that of the full GP, and the noise variance is larger. This happens because the Kullback-Leibler term in the variational free energy dominates the objective, and with such badly placed inducing inputs a longer lengthscale is needed to improve the quality of the approximation.
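As a quick sanity check we can compare the fitted hyperparameters of the sparse model with those of the full GP (a sketch; the attribute paths kern.lengthscale and likelihood.variance assume the default RBF kernel and may differ slightly between GPy versions):

# Underfitting shows up as a longer lengthscale and a larger noise variance
# in the sparse model than in the full GP fit.
print(m.kern.lengthscale, m_full.kern.lengthscale)
print(m.likelihood.variance, m_full.likelihood.variance)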
First we try optimizing the locations of the inducing inputs to fix the problem; however, we still recover a longer lengthscale than that of the Gaussian process we sampled from (or of the full GP fit we did at the beginning).
In [6]:
# Randomize the hyperparameters and free the inducing input locations before reoptimizing
m.randomize()
m.Z.unconstrain()
m.optimize('bfgs')
m.plot()
In [7]:
# Try again with more inducing inputs (12), randomly initialized, and optimize everything
Z = np.random.rand(12,1)*12
m = GPy.models.SparseGPRegression(X,y,Z=Z)
m.optimize('bfgs')
m.plot()
m_full.plot()
print(m.log_likelihood(), m_full.log_likelihood())
This time, with enough inducing points, the fit closely resembles that of the full GP. This is confirmed by the bound on the marginal likelihood being tight: since the gap between the bound and the true log marginal likelihood is the Kullback-Leibler divergence between the variational approximation and the true posterior, a tight bound means the approximation must be good.
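As a rough numerical check (a sketch; the two models' hyperparameters are optimized separately, so this gap only approximates the KL divergence when they end up close), we can look at the difference between the full GP's log marginal likelihood and the sparse bound:

# A small gap indicates a tight bound and therefore a small KL divergence
# between the approximate and true posteriors.
gap = float(m_full.log_likelihood()) - float(m.log_likelihood())
print(gap)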