Sparse Gaussian Process Regression (SGPR)


In this notebook, we'll give an overview of how to use SGPR, a method in which the inducing point locations are learned.

import math
import torch
import gpytorch
from matplotlib import pyplot as plt

# Make plots inline
%matplotlib inline

Loading Data

For this example notebook, we'll be using the elevators UCI dataset used in the paper. Running the next cell downloads a copy of the dataset that has already been scaled and normalized appropriately. We split the data without shuffling, using the first 80% for training and the last 20% for testing.

Note: Running the next cell will attempt to download a ~400 KB dataset file to the current directory.

import urllib.request
import os.path
from scipy.io import loadmat
from math import floor

if not os.path.isfile('elevators.mat'):
    print('Downloading \'elevators\' UCI dataset...')
    urllib.request.urlretrieve('', 'elevators.mat')
data = torch.Tensor(loadmat('elevators.mat')['data'])
X = data[:, :-1]
X = X - X.min(0)[0]
X = 2 * (X / X.max(0)[0]) - 1
y = data[:, -1]

train_n = int(floor(0.8*len(X)))

train_x = X[:train_n, :].contiguous().cuda()
train_y = y[:train_n].contiguous().cuda()

test_x = X[train_n:, :].contiguous().cuda()
test_y = y[train_n:].contiguous().cuda()

torch.Size([16599, 18])
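
To see what the preprocessing above does, here is a minimal pure-Python sketch on a toy table. The helper names `rescale_columns` and `split_first_fraction` and the toy values are illustrative assumptions, not part of the notebook. Each column is mapped linearly onto [-1, 1], and the first 80% of rows are then taken for training without shuffling.

```python
from math import floor

def rescale_columns(rows):
    """Linearly map every column of `rows` onto [-1, 1] (min -> -1, max -> 1)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [max(col) - low for col, low in zip(columns, lows)]
    # Assumes no column is constant; a zero span would divide by zero.
    return [
        [2 * (value - low) / span - 1 for value, low, span in zip(row, lows, spans)]
        for row in rows
    ]

def split_first_fraction(rows, fraction=0.8):
    """Return (first `fraction` of rows, remaining rows) without shuffling."""
    cut = int(floor(fraction * len(rows)))
    return rows[:cut], rows[cut:]

# Hypothetical toy data: 5 rows, 2 feature columns.
toy = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0], [2.5, 15.0], [7.5, 25.0]]
scaled = rescale_columns(toy)
train_rows, test_rows = split_first_fraction(scaled)
print(len(train_rows), "training rows,", len(test_rows), "test rows")
```

Because the split is positional rather than random, any ordering in the data file carries over into the train/test partition.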

Defining the GP Model

We now define the GP model. For more details on the use of GP models, see our simpler examples. This model constructs a base scaled RBF kernel, and then simply wraps it in an InducingPointKernel. Other than this, everything should look the same as in the simple GP models.

from gpytorch.means import ConstantMean
from gpytorch.kernels import ScaleKernel, RBFKernel, InducingPointKernel
from gpytorch.distributions import MultivariateNormal

class GPRegressionModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(GPRegressionModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = ConstantMean()
        self.base_covar_module = ScaleKernel(RBFKernel())
        self.covar_module = InducingPointKernel(self.base_covar_module, inducing_points=train_x[:500, :], likelihood=likelihood)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return MultivariateNormal(mean_x, covar_x)

likelihood = gpytorch.likelihoods.GaussianLikelihood().cuda()
model = GPRegressionModel(train_x, train_y, likelihood).cuda()
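
As background for the InducingPointKernel used above: SGPR replaces the exact kernel matrix with a low-rank, Nyström-style approximation built from the inducing points Z, namely Q(x, x') = k(x, Z) k(Z, Z)^-1 k(Z, x'). The sketch below illustrates this in plain Python for the simplest possible case, a single inducing point z, so the matrix inverse reduces to a scalar division. The functions `rbf` and `nystrom_one_point`, the lengthscale, and the inputs are assumptions chosen for illustration; this is not GPyTorch's implementation.

```python
import math

def rbf(a, b, lengthscale=1.0):
    """RBF kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def nystrom_one_point(a, b, z, lengthscale=1.0):
    """Approximate k(a, b) through a single inducing point z:
    Q(a, b) = k(a, z) * k(z, z)^-1 * k(z, b)."""
    return rbf(a, z, lengthscale) * rbf(z, b, lengthscale) / rbf(z, z, lengthscale)

z = 0.0  # hypothetical inducing point location
for x in (0.0, 0.5, 2.0):
    exact = rbf(x, x)
    approx = nystrom_one_point(x, x, z)
    # The gap exact - approx is the prior variance the inducing point fails to explain.
    print(f"x={x:.1f}  exact={exact:.3f}  approx={approx:.3f}  gap={exact - approx:.3f}")
```

The approximation is exact at the inducing point and degrades as inputs move away from it, which is why the inducing point locations are worth learning alongside the kernel hyperparameters.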

Training the Model

# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the Adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iterations = 25
def train():
    for i in range(training_iterations):
        # Zero backprop gradients
        optimizer.zero_grad()
        # Get output from model
        output = model(train_x)
        # Calc loss and backprop derivatives
        loss = -mll(output, train_y)
        loss.backward()
        print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iterations, loss.item()))
        # Update the hyperparameters and inducing point locations
        optimizer.step()

# See dkl_mnist.ipynb for explanation of this flag
with gpytorch.settings.use_toeplitz(True):
    %time train()

Iter 1/25 - Loss: 0.796
Iter 2/25 - Loss: 0.786
Iter 3/25 - Loss: 0.773
Iter 4/25 - Loss: 0.762
Iter 5/25 - Loss: 0.748
Iter 6/25 - Loss: 0.735
Iter 7/25 - Loss: 0.724
Iter 8/25 - Loss: 0.711
Iter 9/25 - Loss: 0.698
Iter 10/25 - Loss: 0.685
Iter 11/25 - Loss: 0.670
Iter 12/25 - Loss: 0.657
Iter 13/25 - Loss: 0.645
Iter 14/25 - Loss: 0.631
Iter 15/25 - Loss: 0.617
Iter 16/25 - Loss: 0.602
Iter 17/25 - Loss: 0.588
Iter 18/25 - Loss: 0.574
Iter 19/25 - Loss: 0.561
Iter 20/25 - Loss: 0.545
Iter 21/25 - Loss: 0.529
Iter 22/25 - Loss: 0.515
Iter 23/25 - Loss: 0.500
Iter 24/25 - Loss: 0.484
Iter 25/25 - Loss: 0.470
CPU times: user 10.2 s, sys: 13.2 s, total: 23.4 s
Wall time: 3.29 s
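
For context, the loss printed above can be read against the collapsed variational lower bound from the SGPR paper. The notation here is an assumption introduced for illustration: $K_{nn}$ is the exact kernel matrix on the $n$ training points, $Z$ the inducing points, $\mu$ the constant prior mean, and $\sigma^2$ the likelihood noise.

$$
Q_{nn} = K_{nZ} K_{ZZ}^{-1} K_{Zn}, \qquad
\mathcal{L} = \log \mathcal{N}\!\left(y \mid \mu,\; Q_{nn} + \sigma^2 I\right) - \frac{1}{2\sigma^2} \operatorname{tr}\!\left(K_{nn} - Q_{nn}\right)
$$

The first term is an ordinary Gaussian marginal log likelihood under the low-rank covariance, and the trace term penalizes the variance that the inducing points fail to capture. Assuming ExactMarginalLogLikelihood normalizes by the number of training points, the printed values would correspond to roughly $-\mathcal{L}/n$.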

Making Predictions

The next cell makes predictions with SGPR. We use a max_root_decomposition_size of 30, and we also demonstrate increasing the max preconditioner size. Increasing the preconditioner size is not necessary on this dataset, but it can make a big difference in final test performance, and it is often preferable to increasing the number of CG iterations if you can afford the memory.

# Switch to evaluation (posterior predictive) mode
model.eval()
likelihood.eval()

with gpytorch.settings.max_preconditioner_size(10), torch.no_grad():
    with gpytorch.settings.use_toeplitz(False), gpytorch.settings.max_root_decomposition_size(30), gpytorch.settings.fast_pred_var():
        preds = model(test_x)

print('Test MAE: {}'.format(torch.mean(torch.abs(preds.mean - test_y))))

Test MAE: 0.0909833088517189
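The test MAE above is the average absolute gap between the predictive mean and the (already normalized) targets. As a rough way to put such a number in context, the sketch below computes the MAE in plain Python on made-up values and compares it with a constant predictor that always outputs the target mean. The function name `mean_absolute_error` and all toy numbers are illustrative assumptions, not results from the elevators dataset.

```python
def mean_absolute_error(predictions, targets):
    """Average of |prediction - target| over paired values."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)

# Hypothetical toy values, not taken from the notebook's data.
toy_targets = [0.25, -0.5, 1.0, 0.0]
toy_preds = [0.5, -0.5, 0.5, 0.25]

# Baseline: always predict the mean of the targets.
constant_preds = [sum(toy_targets) / len(toy_targets)] * len(toy_targets)

print("model MAE:   ", mean_absolute_error(toy_preds, toy_targets))
print("constant MAE:", mean_absolute_error(constant_preds, toy_targets))
```

A model MAE well below the constant-predictor MAE indicates that the predictions carry real information about the targets.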

