This example shows the simplest way to use an RBF kernel in an AbstractVariationalGP module for classification. This basic model is usable when there is not much training data and no advanced techniques are required.

In this example, we're modeling a unit square wave with period 1/2 whose positive values are centered at x=0, and we classify the points as either +1 or -1.
Variational inference uses the assumption that the posterior distribution factors multiplicatively over the input variables. This assumption makes it possible to obtain a fast approximation to the posterior by minimizing the KL divergence between the approximate and true posteriors. For a good explanation of variational techniques, sections 4-6 of the following may be useful: https://www.cs.princeton.edu/courses/archive/fall11/cos597C/lectures/variational-inference-i.pdf
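Concretely, variational methods fit the approximate posterior by maximizing the evidence lower bound (ELBO). For the kind of model used below, the objective takes roughly the form

$$\mathcal{L} = \mathbb{E}_{q(f)}\left[\log p(y \mid f)\right] - \mathrm{KL}\left(q(u)\,\|\,p(u)\right),$$

where $q(u)$ is the variational distribution over the inducing values and $p(u)$ is the GP prior over them; maximizing this bound is equivalent to minimizing the KL divergence between the approximate posterior and the true posterior.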
In [1]:
import math
import torch
import gpytorch
from matplotlib import pyplot as plt
%matplotlib inline
In [2]:
train_x = torch.linspace(0, 1, 10)
train_y = torch.sign(torch.cos(train_x * (4 * math.pi)))
The next cell demonstrates the simplest way to define a classification Gaussian process model in GPyTorch. If you have already done the GP regression tutorial, you have already seen how GPyTorch model construction differs from other GP packages. In particular, the GP model expects a user to write out a forward method in a way analogous to PyTorch models. This gives the user the most possible flexibility.
Since exact inference is intractable for GP classification, GPyTorch approximates the classification posterior using variational inference. We believe that variational inference is ideal for a number of reasons. Firstly, variational inference commonly relies on gradient descent techniques, which take full advantage of PyTorch's autograd. This reduces the amount of code needed to develop complex variational models. Additionally, variational inference can be performed with stochastic gradient descent, which can be extremely scalable for large datasets.
If you are unfamiliar with variational inference, the lecture notes linked above are a good starting point.
For most variational GP models, you will need to construct the following GPyTorch objects:

- A GP model (gpytorch.models.AbstractVariationalGP) - This handles basic variational inference.
- A variational distribution (gpytorch.variational.VariationalDistribution) - This tells us what form the variational distribution q(u) should take.
- A variational strategy (gpytorch.variational.VariationalStrategy) - This tells us how to transform a distribution q(u) over the inducing point values to a distribution q(f) over the latent function values for some input x.
- A likelihood (gpytorch.likelihoods.BernoulliLikelihood) - This is a good likelihood for binary classification.
- A mean - This defines the prior mean of the GP. If you don't know which mean to use, gpytorch.means.ConstantMean() is a good place to start.
- A kernel - This defines the prior covariance of the GP. If you don't know which kernel to use, gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()) is a good place to start.
- A MultivariateNormal distribution (gpytorch.distributions.MultivariateNormal) - This is the object used to represent multivariate normal distributions.

The AbstractVariationalGP model is GPyTorch's simplest approximate inference model. It approximates the true posterior with a distribution specified by a VariationalDistribution, which is most commonly some form of MultivariateNormal distribution. The model defines all the variational parameters that are needed, and keeps all of this information under the hood.
The components of a user-built AbstractVariationalGP model in GPyTorch are:

- An __init__ method that constructs a mean module, a kernel module, a variational distribution object, and a variational strategy object. This method should also be responsible for constructing whatever other modules might be necessary.
- A forward method that takes in some $n \times d$ data x and returns a MultivariateNormal with the prior mean and covariance evaluated at x. In other words, we return the vector $\mu(x)$ and the $n \times n$ matrix $K_{xx}$ representing the prior mean and covariance matrix of the GP.
(For those who are unfamiliar with GP classification: even though we are performing classification, the GP model still returns a MultivariateNormal
. The likelihood transforms this latent Gaussian variable into a Bernoulli variable)
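To make that concrete: GPyTorch's BernoulliLikelihood uses a probit-style link (the standard normal CDF), so that, roughly, $p(y = 1 \mid f(x)) = \Phi(f(x))$. Large positive latent values map to probabilities near 1, and large negative values map to probabilities near 0.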
Here we present a simple classification model, but it is possible to construct more complex models. See the scalable classification or deep kernel learning examples for other approaches.
In [3]:
from gpytorch.models import AbstractVariationalGP
from gpytorch.variational import CholeskyVariationalDistribution
from gpytorch.variational import VariationalStrategy
class GPClassificationModel(AbstractVariationalGP):
    def __init__(self, train_x):
        variational_distribution = CholeskyVariationalDistribution(train_x.size(0))
        variational_strategy = VariationalStrategy(self, train_x, variational_distribution)
        super(GPClassificationModel, self).__init__(variational_strategy)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        latent_pred = gpytorch.distributions.MultivariateNormal(mean_x, covar_x)
        return latent_pred
# Initialize model and likelihood
model = GPClassificationModel(train_x)
likelihood = gpytorch.likelihoods.BernoulliLikelihood()
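As a quick sanity check (a minimal sketch, not part of the original tutorial), you can list the registered parameters: alongside the mean constant and kernel hyperparameters you should see the variational parameters that the variational distribution registered for us (exact parameter names depend on the GPyTorch version).

# Inspect everything the optimizer will see; the variational parameters are
# included automatically, with no extra bookkeeping on our part.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))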
In the next cell, we optimize the variational parameters of our Gaussian process. In addition, this optimization loop also performs Type-II MLE to train the hyperparameters of the Gaussian process.
The most obvious difference here compared to many other GP implementations is that, as in standard PyTorch, the core training loop is written by the user. In GPyTorch, we make use of the standard PyTorch optimizers from torch.optim
, and all trainable parameters of the model should be of type torch.nn.Parameter
. The variational parameters are predefined as part of the AbstractVariationalGP
model.
In most cases, the boilerplate code below will work well. It has the same basic components as the standard PyTorch training loop: zero the parameter gradients, call the model and compute the loss, call backward() to fill in the gradients, and take a step with the optimizer.
However, defining custom training loops allows for greater flexibility. For example, it is possible to learn the variational parameters and kernel hyperparameters with different learning rates.
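For instance, here is a minimal sketch (not part of the original tutorial) of one way to do that with PyTorch parameter groups, using only the mean_module and covar_module attributes defined in the model above:

# Hyperparameters of the mean and kernel get a slower learning rate;
# everything else (including the variational parameters) gets a faster one.
hyper_params = list(model.mean_module.parameters()) + list(model.covar_module.parameters())
variational_params = [p for p in model.parameters()
                      if not any(p is hp for hp in hyper_params)]
optimizer = torch.optim.Adam([
    {'params': variational_params, 'lr': 0.1},
    {'params': hyper_params, 'lr': 0.01},
])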
In [4]:
from gpytorch.mlls.variational_elbo import VariationalELBO
# Find optimal model hyperparameters
model.train()
likelihood.train()
# Use the adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
# "Loss" for GPs - the marginal log likelihood
# num_data refers to the amount of training data
mll = VariationalELBO(likelihood, model, train_y.numel())
training_iter = 50
for i in range(training_iter):
    # Zero backpropped gradients from previous iteration
    optimizer.zero_grad()
    # Get predictive output
    output = model(train_x)
    # Calc loss and backprop gradients
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iter, loss.item()))
    optimizer.step()
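The loop above evaluates the model on the full training set at every step. For larger datasets, the same objective supports stochastic optimization over minibatches. Here is a minimal sketch of that idea, reusing the model, likelihood, and optimizer from above; big_train_x and big_train_y are hypothetical placeholders for a large dataset, and num_data must still be the size of the full dataset so the ELBO is scaled correctly.

from torch.utils.data import TensorDataset, DataLoader

# Hypothetical larger dataset; big_train_x / big_train_y are placeholders.
loader = DataLoader(TensorDataset(big_train_x, big_train_y), batch_size=256, shuffle=True)
mll = VariationalELBO(likelihood, model, big_train_y.numel())

for x_batch, y_batch in loader:
    optimizer.zero_grad()
    # Each batch gives a stochastic estimate of the negative ELBO
    loss = -mll(model(x_batch), y_batch)
    loss.backward()
    optimizer.step()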
In the next cell, we make predictions with the model. To do this, we simply put the model and likelihood in eval mode, and call both modules on the test data.
In .eval()
mode, when we call model()
we get the GP's latent posterior predictions. These will be MultivariateNormal distributions. But since we are performing binary classification, we want to transform these outputs to classification probabilities using our likelihood.
When we call likelihood(model())
, we get a torch.distributions.Bernoulli
distribution, which represents our posterior probability that the data points belong to the positive class.
f_preds = model(test_x)
y_preds = likelihood(model(test_x))
f_mean = f_preds.mean
f_samples = f_preds.sample(sample_shape=torch.Size((1000,)))
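From these objects you can read off whatever summary you need; for example (a small sketch using the same names as above):

# P(y = 1) at each test point, taken from the Bernoulli predictive distribution
class_probs = y_preds.mean
# Approximate 2-standard-deviation bounds on the latent function
f_lower, f_upper = f_preds.confidence_region()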
In [5]:
# Go into eval mode
model.eval()
likelihood.eval()
with torch.no_grad():
    # Test x are regularly spaced points in [0, 1] (spacing 0.01)
    test_x = torch.linspace(0, 1, 101)
    # Get classification predictions
    observed_pred = likelihood(model(test_x))

    # Initialize fig and axes for plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # Get the predicted labels (probabilities of belonging to the positive class)
    # Transform these probabilities into -1/+1 labels
    pred_labels = observed_pred.mean.ge(0.5).float().mul(2).sub(1)
    ax.plot(test_x.numpy(), pred_labels.numpy(), 'b')
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Predicted Labels'])
In [ ]: