James Hensman, 2015-16
Converted to candlegp by Thomas Viehmann
It's possible to construct very flexible models with Gaussian processes by combining them with different likelihoods (sometimes called 'families' in the GLM literature). This makes inference of the GP intractable, since the likelihood is not generally conjugate to the Gaussian process. The general form of the model is $$\theta \sim p(\theta)\\f \sim \mathcal{GP}(m(x; \theta),\, k(x, x'; \theta))\\y_i \sim p(y_i \,|\, g(f(x_i)))\,.$$
To perform inference in this model, we'll run MCMC using Hamiltonian Monte Carlo (HMC) over the function-values and the parameters $\theta$ jointly. Key to an effective scheme is rotation of the field using the Cholesky decomposition. We write
$$\theta \sim p(\theta)\\v \sim \mathcal{N}(0,\, I)\\LL^\top = K\\f = m + Lv\\y_i \sim p(y_i \,|\, g(f(x_i)))\,.$$Joint HMC over $v$ and the parameters $\theta$ is not widely adopted in the literature because of the difficulty of differentiating $LL^\top=K$. This derivative is available in PyTorch, so application of HMC is relatively straightforward.
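To make the whitening concrete, here is a rough standalone sketch in plain PyTorch (not candlegp's internal code; the squared-exponential kernel helper and the use of torch.linalg.cholesky are assumptions for illustration, and older PyTorch versions spell the factorization torch.potrf or torch.cholesky):

import torch

def rbf_kernel(x, lengthscale=1.0, variance=1.0, jitter=1e-6):
    # simple squared-exponential kernel matrix, with jitter for numerical stability
    d = x.unsqueeze(1) - x.unsqueeze(0)
    K = variance * torch.exp(-0.5 * (d / lengthscale) ** 2)
    return K + jitter * torch.eye(len(x), dtype=x.dtype)

x = torch.linspace(-3, 3, 20, dtype=torch.double)
K = rbf_kernel(x)
L = torch.linalg.cholesky(K)              # L L^T = K
v = torch.randn(20, dtype=torch.double)   # v ~ N(0, I): the variable HMC actually moves
f = L @ v                                 # f = Lv is a draw from GP(0, K) at the inputs x

HMC then explores $v$ and the hyperparameters jointly, and $f$ is recovered deterministically from $v$.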
The first illustration in this notebook is 'Exponential Regression'. The model is $$\theta \sim p(\theta)\\f \sim \mathcal {GP}(0, k(x, x'; \theta))\\f_i = f(x_i)\\y_i \sim \mathcal {Exp} (e^{f_i})$$
We'll use MCMC to deal with both the kernel parameters $\theta$ and the latent function values $f$. First, we generate a data set.
In [1]:
import sys, os
sys.path.append(os.path.join(os.getcwd(),'..'))
import candlegp
import candlegp.training.hmc
import numpy
import torch
from torch.autograd import Variable
from matplotlib import pyplot
pyplot.style.use('ggplot')
%matplotlib inline
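# toy data set: exponential observations whose scale is sin(x)^2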
X = Variable(torch.linspace(-3,3,20,out=torch.DoubleTensor()))
Y = Variable(torch.from_numpy(numpy.random.exponential(((X.data.sin())**2).numpy())))
The model for fully Bayesian MCMC is called GPMC (as in GPflow). It's constructed like any other model, but contains a parameter V which represents the centered (whitened) values $v$ of the function.
In [2]:
#build the model
k = candlegp.kernels.Matern32(1,ARD=False).double() + candlegp.kernels.Bias(1).double()
l = candlegp.likelihoods.Exponential()
m = candlegp.models.GPMC(X[:,None], Y[:,None], k, l)
m
Out[2]:
The V parameter already has a prior applied. We'll add priors to the parameters also (these are rather arbitrary, for illustration).
In [3]:
m.kern.kern_list[0].lengthscales.prior = candlegp.priors.Gamma(1., 1., ttype=torch.DoubleTensor)
m.kern.kern_list[0].variance.prior = candlegp.priors.Gamma(1.,1., ttype=torch.DoubleTensor)
m.kern.kern_list[1].variance.prior = candlegp.priors.Gamma(1.,1., ttype=torch.DoubleTensor)
m.V.prior = candlegp.priors.Gaussian(0.,1., ttype=torch.DoubleTensor)
m
Out[3]:
Running HMC is as easy as calling candlegp.training.hmc.hmc_sample on the model. candlegp only has HMC sampling for the moment, and it's a relatively vanilla implementation (no NUTS, for example). There are two settings to tune: the step size (epsilon) and the maximum number of steps Lmax. Each proposal will take a random number of steps between 1 and Lmax, each of length epsilon.
We'll use the verbose setting so that we can see the acceptance rate.
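For orientation, here is a hedged sketch of what a single proposal with these settings looks like (illustrative numpy code, not candlegp's hmc_sample implementation; logp_and_grad is a hypothetical helper returning the log density and its gradient at a point):

import numpy

def hmc_proposal(x, logp_and_grad, epsilon, Lmax, rng=numpy.random):
    n_steps = rng.randint(1, Lmax + 1)        # random number of leapfrog steps in [1, Lmax]
    p = rng.randn(*x.shape)                   # resample the momentum
    logp0, grad = logp_and_grad(x)
    x_new = x.copy()
    p_new = p + 0.5 * epsilon * grad          # initial half step for the momentum
    for step in range(n_steps):
        x_new = x_new + epsilon * p_new       # full step for the position
        logp_new, grad = logp_and_grad(x_new)
        if step < n_steps - 1:
            p_new = p_new + epsilon * grad    # full step for the momentum
    p_new = p_new + 0.5 * epsilon * grad      # final half step for the momentum
    # Metropolis accept/reject on the joint (position, momentum) energy
    log_alpha = (logp_new - 0.5 * (p_new ** 2).sum()) - (logp0 - 0.5 * (p ** 2).sum())
    if numpy.log(rng.rand()) < log_alpha:
        return x_new, True
    return x, False

In candlegp the sampling loop, burn-in and thinning are handled by hmc_sample, as used below.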
In [4]:
# start near MAP
opt = torch.optim.LBFGS(m.parameters(), lr=1e-2, max_iter=40)

def eval_model():
    obj = m()
    opt.zero_grad()
    obj.backward()
    return obj

for i in range(50):
    obj = m()
    opt.zero_grad()
    obj.backward()
    opt.step(eval_model)
    if i % 5 == 0:
        print(i, ':', obj.data[0])
m
Out[4]:
In [5]:
res = candlegp.training.hmc.hmc_sample(m,500,0.2,burn=50, thin=10)
In [6]:
xtest = torch.linspace(-4, 4, 100).double().unsqueeze(1)
f_samples = []
for i in range(len(res[0])):
    for j, mp in enumerate(m.parameters()):
        mp.set(res[j+1][i])
    f_samples.append(m.predict_f_samples(Variable(xtest), 5).squeeze(0).t())
f_samples = torch.cat(f_samples, dim=0)
In [7]:
rate_samples = torch.exp(f_samples)
pyplot.figure(figsize=(12, 6))
line, = pyplot.plot(xtest.numpy(), rate_samples.data.mean(0).numpy(), lw=2)
pyplot.fill_between(xtest[:,0], numpy.percentile(rate_samples.data.numpy(), 5, axis=0), numpy.percentile(rate_samples.data.numpy(), 95, axis=0), color=line.get_color(), alpha = 0.2)
pyplot.plot(X.data.numpy(), Y.data.numpy(), 'kx', mew=2)
pyplot.ylim(-0.1, numpy.max(numpy.percentile(rate_samples.data.numpy(), 95, axis=0)))
Out[7]:
In [8]:
import pandas
df = pandas.DataFrame(res[1:],index=[n for n,p in m.named_parameters()]).transpose()
df[:10]
Out[8]:
In [9]:
df["kern.kern_list.1.variance"].apply(lambda x: x[0]).hist(bins=20)
Out[9]:
We now repeat the exercise with the sparse model, SGPMC, which represents the GP through its values at a small set of inducing inputs Z.
In [64]:
Z = torch.linspace(-3,3,5).double().unsqueeze(1)
k2 = candlegp.kernels.Matern32(1,ARD=False).double() + candlegp.kernels.Bias(1).double()
l2 = candlegp.likelihoods.Exponential()
m2 = candlegp.models.SGPMC(X[:,None], Y[:,None], k2, l2, Z)
m2.kern.kern_list[0].lengthscales.prior = candlegp.priors.Gamma(1., 1., ttype=torch.DoubleTensor)
m2.kern.kern_list[0].variance.prior = candlegp.priors.Gamma(1.,1., ttype=torch.DoubleTensor)
m2.kern.kern_list[1].variance.prior = candlegp.priors.Gamma(1.,1., ttype=torch.DoubleTensor)
m2.V.prior = candlegp.priors.Gaussian(0.,1., ttype=torch.DoubleTensor)
m2
Out[64]:
In [66]:
# start near MAP
opt = torch.optim.LBFGS(m2.parameters(), lr=1e-2, max_iter=40)

def eval_model():
    obj = m2()
    opt.zero_grad()
    obj.backward()
    return obj

for i in range(50):
    obj = m2()
    opt.zero_grad()
    obj.backward()
    opt.step(eval_model)
    if i % 5 == 0:
        print(i, ':', obj.data[0])
m2
Out[66]:
In [67]:
res = candlegp.training.hmc.hmc_sample(m2, 500, 0.2, burn=50, thin=10)
In [68]:
xtest = torch.linspace(-4, 4, 100).double().unsqueeze(1)
f_samples = []
for i in range(len(res[0])):
    for j, mp in enumerate(m2.parameters()):
        mp.set(res[j+1][i])
    f_samples.append(m2.predict_f_samples(Variable(xtest), 5).squeeze(0).t())
f_samples = torch.cat(f_samples, dim=0)
In [74]:
rate_samples = torch.exp(f_samples)
pyplot.figure(figsize=(12, 6))
line, = pyplot.plot(xtest.numpy(), rate_samples.data.mean(0).numpy(), lw=2)
pyplot.fill_between(xtest[:,0], numpy.percentile(rate_samples.data.numpy(), 5, axis=0), numpy.percentile(rate_samples.data.numpy(), 95, axis=0), color=line.get_color(), alpha = 0.2)
pyplot.plot(X.data.numpy(), Y.data.numpy(), 'kx', mew=2)
pyplot.plot(m2.Z.get().data.numpy(),numpy.zeros(m2.num_inducing),'o')
pyplot.ylim(-0.1, numpy.max(numpy.percentile(rate_samples.data.numpy(), 95, axis=0)))
Out[74]: