For 1D functions, SKI (or KISS-GP) is a great way to scale a GP up to very large datasets (100,000+ data points). Kernel interpolation for scalable structured Gaussian processes (KISS-GP) was introduced in this paper: http://proceedings.mlr.press/v37/wilson15.pdf
SKI is asymptotically very fast (nearly linear), very precise (error decays cubically), and easy to use in GPyTorch!
As you will see in this tutorial, it's really easy to apply SKI to an existing model. All you have to do is wrap your kernel module with a GridInterpolationKernel.
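For instance, the change relative to the simple regression example amounts to one extra wrapper around the kernel (a minimal sketch; the grid size of 400 here is an arbitrary illustrative value, not the setting used later in this notebook):

import gpytorch

# A plain kernel, as in the simple regression example
base_covar = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

# The same kernel, wrapped for SKI
ski_covar = gpytorch.kernels.GridInterpolationKernel(base_covar, grid_size=400, num_dims=1)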
In [1]:
import math
import torch
import gpytorch
from matplotlib import pyplot as plt
# Make plots inline
%matplotlib inline
In [2]:
# Training data: 1,000 points of a sine curve with some noise
train_x = torch.linspace(0, 1, 1000)
train_y = torch.sin(train_x * (4 * math.pi) + torch.randn(train_x.size()) * 0.2)
The model should be somewhat similar to the ExactGP model in the simple regression example. The only difference: we're wrapping our kernel module in a GridInterpolationKernel. This signals to GPyTorch that you want to approximate this kernel matrix with SKI.
SKI has only one hyperparameter that you need to worry about: the grid size. For 1D functions, a good starting place is to use as many grid points as training points. (Don't worry - the grid points are really cheap to use!) You can use the gpytorch.utils.grid.choose_grid_size helper to get a good starting point.

If you want, you can also explicitly determine the grid bounds of the SKI approximation using the grid_bounds argument. However, it's easier if you don't use this argument - then GPyTorch automatically chooses the best bounds for you.
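If you did want to fix the bounds yourself, it might look like this (a minimal sketch; the grid size of 500 and the unit-interval bounds are just illustrative values, not part of the model defined below):

import gpytorch

covar_module = gpytorch.kernels.GridInterpolationKernel(
    gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
    grid_size=500, num_dims=1,
    grid_bounds=[(0.0, 1.0)],  # one (min, max) pair per input dimension
)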
In [3]:
class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)

        # SKI requires a grid size hyperparameter. This util can help with that.
        # Here we are using a grid that has the same number of points as the training data
        # (a ratio of 1.0). Performance can be sensitive to this parameter, so you may
        # want to adjust it for your own problem on a validation set.
        grid_size = gpytorch.utils.grid.choose_grid_size(train_x, 1.0)

        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.GridInterpolationKernel(
            gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
            grid_size=grid_size, num_dims=1,
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = GPRegressionModel(train_x, train_y, likelihood)
In [4]:
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iterations = 30
for i in range(training_iterations):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
    optimizer.step()
SKI is especially well-suited for predictions. It can compute predictive means in constant time, and with LOVE enabled (see this notebook), predictive variances are also constant time (a rough timing sketch appears after the plot below).
In [5]:
# Put model & likelihood into eval mode
model.eval()
likelihood.eval()

# Initialize plot
f, ax = plt.subplots(1, 1, figsize=(4, 3))

# The gpytorch.settings.fast_pred_var flag activates LOVE (for fast variances)
# See https://arxiv.org/abs/1803.06058
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(0, 1, 51)
    prediction = likelihood(model(test_x))
    mean = prediction.mean
    # Get lower and upper predictive bounds
    lower, upper = prediction.confidence_region()

# Plot the training data as black stars
ax.plot(train_x.detach().numpy(), train_y.detach().numpy(), 'k*')
# Plot predictive means as blue line
ax.plot(test_x.detach().numpy(), mean.detach().numpy(), 'b')
# Plot confidence bounds as lightly shaded region
ax.fill_between(test_x.detach().numpy(), lower.detach().numpy(), upper.detach().numpy(), alpha=0.5)
ax.set_ylim([-3, 3])
ax.legend(['Observed Data', 'Mean', 'Confidence'])
None
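To get a feel for the constant-time prediction claim above, you could time repeated predictions once the caches have been built (a rough sketch; the loop count is arbitrary and absolute timings will depend on your hardware):

import time

with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(0, 1, 51)
    # The first call pays the one-time cost of building the prediction caches
    likelihood(model(test_x))

    start = time.time()
    for _ in range(10):
        likelihood(model(test_x)).mean
    print('Avg. prediction time: %.4f s' % ((time.time() - start) / 10))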
In [ ]: