For 1D functions, SKI (or KISS-GP) is a great way to scale a GP up to very large datasets (100,000+ data points). Kernel interpolation for scalable structured Gaussian processes (KISS-GP) was introduced in this paper: http://proceedings.mlr.press/v37/wilson15.pdf
SKI is asymptotically very fast (nearly linear), very precise (error decays cubically), and easy to use in GPyTorch!
As you will see in this tutorial, it's really easy to apply SKI to an existing model. All you have to do is wrap your kernel module with a GridInterpolationKernel.
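For instance, if your model's covariance is a ScaleKernel around an RBFKernel, the SKI version simply wraps that kernel (a minimal sketch; the grid_size value here is an arbitrary placeholder, not a recommendation):

# Before: a standard exact-GP kernel
covar_module = gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel())

# After: the same kernel, wrapped for SKI
covar_module = gpytorch.kernels.GridInterpolationKernel(
    gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
    grid_size=100, num_dims=1,
)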
This is the same as the standard KISS-GP 1D notebook, but with everything super-charged with CUDA. SKI is especially fast on the GPU. The only things required for GPU acceleration are train_x.cuda(), train_y.cuda(), and model.cuda().
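A minimal sketch of that change (the torch.cuda.is_available() guard is an extra safeguard for machines without a GPU; the rest of this notebook assumes CUDA is available):

# Move the data and the model onto the GPU, if one is present
if torch.cuda.is_available():
    train_x, train_y = train_x.cuda(), train_y.cuda()
    model = model.cuda()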
In [1]:
import math
import torch
import gpytorch
from matplotlib import pyplot as plt
# Make plots inline
%matplotlib inline
In [2]:
# Training data: 1,000 evenly spaced points in [0, 1]
train_x = torch.linspace(0, 1, 1000).cuda()
# Noisy sine; the 0.2 * randn noise enters through the phase of the sine
train_y = torch.sin(train_x * (4 * math.pi) + torch.randn(train_x.size()).cuda() * 0.2)
In [3]:
class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)

        # SKI requires a grid size hyperparameter. This util can help with that.
        # Here we use as many grid points as training points (a ratio of 1.0).
        # Performance can be sensitive to the number of grid points, so you may
        # want to tune it for your problem on a validation set.
        grid_size = gpytorch.utils.grid.choose_grid_size(train_x, 1.0)

        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.GridInterpolationKernel(
            gpytorch.kernels.ScaleKernel(gpytorch.kernels.RBFKernel()),
            grid_size=grid_size, num_dims=1,
        )

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)


likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = GPRegressionModel(train_x, train_y, likelihood).cuda()
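If you want to experiment with the grid-size/accuracy trade-off, the same utility accepts other ratios; the 0.5 ratio below is purely an illustrative value:

# With 1,000 1D training points these should return roughly 1000 and 500 grid points
print(gpytorch.utils.grid.choose_grid_size(train_x, ratio=1.0))
print(gpytorch.utils.grid.choose_grid_size(train_x, ratio=0.5))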
In [4]:
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the Adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

def train():
    training_iterations = 30
    for i in range(training_iterations):
        optimizer.zero_grad()
        output = model(train_x)
        loss = -mll(output, train_y)
        loss.backward()
        print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
        optimizer.step()

%time train()
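For 1D SKI the grid kernel matrix has Toeplitz structure, which GPyTorch can optionally exploit via the gpytorch.settings.use_toeplitz context manager. Whether this actually speeds up training depends on your GPyTorch version and problem size, so treat this as an optional variant of the timing call above:

# Optional: exploit the Toeplitz structure of the 1D interpolation grid
with gpytorch.settings.use_toeplitz(True):
    %time train()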
In [5]:
# Put model & likelihood into eval mode
model.eval()
likelihood.eval()

# Initialize plot
f, ax = plt.subplots(1, 1, figsize=(4, 3))

# The gpytorch.settings.fast_pred_var flag activates LOVE (for fast predictive variances)
# See https://arxiv.org/abs/1803.06058
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    test_x = torch.linspace(0, 1, 51).cuda()
    prediction = likelihood(model(test_x))
    mean = prediction.mean
    # Get lower and upper predictive bounds
    lower, upper = prediction.confidence_region()

# Plot the training data as black stars
ax.plot(train_x.detach().cpu().numpy(), train_y.detach().cpu().numpy(), 'k*')
# Plot predictive means as blue line
ax.plot(test_x.detach().cpu().numpy(), mean.detach().cpu().numpy(), 'b')
# Plot confidence bounds as lightly shaded region
ax.fill_between(test_x.detach().cpu().numpy(), lower.detach().cpu().numpy(), upper.detach().cpu().numpy(), alpha=0.5)
ax.set_ylim([-3, 3])
ax.legend(['Observed Data', 'Mean', 'Confidence'])
None
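As an extra sanity check, you can compare the predictive mean against the noise-free sine that generated the data; since the noise enters through the phase, sin(4πx) is only an approximate ground truth:

# Compare the predictive mean to the (approximately) noise-free generating function
test_y = torch.sin(test_x * (4 * math.pi))
print('Test MAE: %.3f' % torch.mean(torch.abs(mean - test_y)).item())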