GP Regression with a Spectral Mixture Kernel

Introduction

This example shows how to use a SpectralMixtureKernel module on an ExactGP model. This module is a good choice when:

  • you want to use exact inference (e.g. for regression), and
  • you want a more sophisticated kernel than RBF.

The function to be modeled is $\sin(2\pi x)$.

The Spectral Mixture (SM) kernel was introduced by Wilson and Adams (2013): https://arxiv.org/pdf/1302.4245.pdf
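
Following the paper, in one dimension the SM kernel with $Q$ mixture components has the form

$$k(\tau) = \sum_{q=1}^{Q} w_q \exp(-2\pi^2 \tau^2 v_q) \cos(2\pi \tau \mu_q),$$

where $\tau = x - x'$, and the weights $w_q$, means $\mu_q$, and variances $v_q$ parameterize a Gaussian mixture over the kernel's spectral density. Learning these hyperparameters amounts to discovering the dominant frequencies in the data, which is what makes the SM kernel well suited to extrapolation.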


In [1]:
import math
import torch
import gpytorch
from matplotlib import pyplot as plt

%matplotlib inline
%load_ext autoreload
%autoreload 2

Set up training data

In the next cell, we set up the training data for this example. We'll be using 15 regularly spaced points on [0,1], which we evaluate the function on to get the training labels. (No observation noise is added here; the GaussianLikelihood will still learn a noise level during training.)


In [2]:
train_x = torch.linspace(0, 1, 15)
train_y = torch.sin(train_x * (2 * math.pi))

Set up the model

The model should be very similar to the ExactGP model in the simple regression example.

The only difference is that here we're using a more complex kernel (the SpectralMixtureKernel). This kernel requires careful initialization to work properly. To that end, in the model's __init__ function, we call

self.covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4)
self.covar_module.initialize_from_data(train_x, train_y)

This ensures that, when we perform optimization to learn kernel hyperparameters, we will be starting from a reasonable initialization.


In [3]:
class SpectralMixtureGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(SpectralMixtureGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.SpectralMixtureKernel(num_mixtures=4)
        self.covar_module.initialize_from_data(train_x, train_y)

    def forward(self, x):
        mean_x = self.mean_module(x)
        covar_x = self.covar_module(x)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar_x)

    
likelihood = gpytorch.likelihoods.GaussianLikelihood()
model = SpectralMixtureGPModel(train_x, train_y, likelihood)
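
If you want to sanity-check what initialize_from_data produced, you can print the kernel's mixture parameters right after building the model. This is a minimal sketch, assuming the mixture_weights, mixture_means, and mixture_scales properties of SpectralMixtureKernel in recent GPyTorch versions:

# Inspect the data-driven initialization (shapes depend on num_mixtures and the input dimension)
print(model.covar_module.mixture_weights)  # relative weight of each mixture component
print(model.covar_module.mixture_means)    # initial frequency of each component
print(model.covar_module.mixture_scales)   # initial bandwidth of each component in the frequency domain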

Training the model

In the next cell, we use Type-II MLE to train the hyperparameters of the Gaussian process. The spectral mixture kernel's hyperparameters start from the values set by initialize_from_data.

See the simple regression example for more info on this step.


In [4]:
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

training_iter = 100
for i in range(training_iter):
    optimizer.zero_grad()
    output = model(train_x)
    loss = -mll(output, train_y)
    loss.backward()
    print('Iter %d/%d - Loss: %.3f' % (i + 1, training_iter, loss.item()))
    optimizer.step()


Iter 1/100 - Loss: 1.339
Iter 2/100 - Loss: 1.303
Iter 3/100 - Loss: 1.248
Iter 4/100 - Loss: 1.219
Iter 5/100 - Loss: 1.170
Iter 6/100 - Loss: 1.109
Iter 7/100 - Loss: 1.088
Iter 8/100 - Loss: 1.059
Iter 9/100 - Loss: 1.008
Iter 10/100 - Loss: 0.953
Iter 11/100 - Loss: 0.906
Iter 12/100 - Loss: 0.887
Iter 13/100 - Loss: 0.811
Iter 14/100 - Loss: 0.762
Iter 15/100 - Loss: 0.715
Iter 16/100 - Loss: 0.683
Iter 17/100 - Loss: 0.639
Iter 18/100 - Loss: 0.585
Iter 19/100 - Loss: 0.542
Iter 20/100 - Loss: 0.498
Iter 21/100 - Loss: 0.467
Iter 22/100 - Loss: 0.402
Iter 23/100 - Loss: 0.349
Iter 24/100 - Loss: 0.313
Iter 25/100 - Loss: 0.251
Iter 26/100 - Loss: 0.214
Iter 27/100 - Loss: 0.172
Iter 28/100 - Loss: 0.148
Iter 29/100 - Loss: 0.122
Iter 30/100 - Loss: 0.036
Iter 31/100 - Loss: -0.020
Iter 32/100 - Loss: -0.073
Iter 33/100 - Loss: -0.102
Iter 34/100 - Loss: -0.163
Iter 35/100 - Loss: -0.146
Iter 36/100 - Loss: -0.174
Iter 37/100 - Loss: -0.216
Iter 38/100 - Loss: -0.289
Iter 39/100 - Loss: -0.393
Iter 40/100 - Loss: -0.430
Iter 41/100 - Loss: -0.331
Iter 42/100 - Loss: -0.388
Iter 43/100 - Loss: -0.504
Iter 44/100 - Loss: -0.629
Iter 45/100 - Loss: -0.570
Iter 46/100 - Loss: -0.578
Iter 47/100 - Loss: -0.728
Iter 48/100 - Loss: -0.787
Iter 49/100 - Loss: -0.186
Iter 50/100 - Loss: -0.532
Iter 51/100 - Loss: -0.850
Iter 52/100 - Loss: -0.914
Iter 53/100 - Loss: -0.879
Iter 54/100 - Loss: -0.815
Iter 55/100 - Loss: -0.804
Iter 56/100 - Loss: -0.808
Iter 57/100 - Loss: -0.850
Iter 58/100 - Loss: -0.939
Iter 59/100 - Loss: -1.047
Iter 60/100 - Loss: -1.128
Iter 61/100 - Loss: -1.181
Iter 62/100 - Loss: -1.210
Iter 63/100 - Loss: -1.181
Iter 64/100 - Loss: -1.044
Iter 65/100 - Loss: -0.988
Iter 66/100 - Loss: -1.113
Iter 67/100 - Loss: -1.085
Iter 68/100 - Loss: -1.284
Iter 69/100 - Loss: -1.252
Iter 70/100 - Loss: -1.305
Iter 71/100 - Loss: -1.318
Iter 72/100 - Loss: -1.300
Iter 73/100 - Loss: -1.312
Iter 74/100 - Loss: -1.365
Iter 75/100 - Loss: -1.418
Iter 76/100 - Loss: -1.484
Iter 77/100 - Loss: -1.564
Iter 78/100 - Loss: -1.599
Iter 79/100 - Loss: -1.678
Iter 80/100 - Loss: -1.731
Iter 81/100 - Loss: -1.793
Iter 82/100 - Loss: -1.790
Iter 83/100 - Loss: -1.801
Iter 84/100 - Loss: -1.832
Iter 85/100 - Loss: -1.916
Iter 86/100 - Loss: -1.974
Iter 87/100 - Loss: -2.030
Iter 88/100 - Loss: -2.108
Iter 89/100 - Loss: -2.166
Iter 90/100 - Loss: -2.152
Iter 91/100 - Loss: -2.119
Iter 92/100 - Loss: -2.088
Iter 93/100 - Loss: -2.101
Iter 94/100 - Loss: -2.174
Iter 95/100 - Loss: -2.247
Iter 96/100 - Loss: -2.223
Iter 97/100 - Loss: -1.789
Iter 98/100 - Loss: -2.284
Iter 99/100 - Loss: -2.083
Iter 100/100 - Loss: -2.164
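
Since the true function is $\sin(2\pi x)$, we'd expect the learned spectral density to put most of its mass near a frequency of 1. As a hedged sketch (same assumed attribute names as above; exact values will vary from run to run), you can inspect the learned hyperparameters like this:

# Inspect the learned hyperparameters after training
print('Learned noise:', likelihood.noise.item())
print('Mixture weights:', model.covar_module.mixture_weights.detach())
print('Mixture means (frequencies):', model.covar_module.mixture_means.detach().squeeze())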

Make predictions, and plot the model fit

Now that we've learned good hyperparameters, it's time to use our model to make predictions. The spectral mixture kernel is especially good at extrapolation. To that end, we'll see how well the model extrapolates past the interval [0, 1].

In the next cell, we plot the mean and confidence region of the Gaussian process model. The confidence_region method is a helper method that returns 2 standard deviations above and below the mean.


In [5]:
# Test points every 0.1 between 0 and 5
test_x = torch.linspace(0, 5, 51)

# Get into evaluation (predictive posterior) mode
model.eval()
likelihood.eval()

# The gpytorch.settings.fast_pred_var flag activates LOVE (for fast variances)
# See https://arxiv.org/abs/1803.06058
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    # Make predictions
    observed_pred = likelihood(model(test_x))

    # Initialize plot
    f, ax = plt.subplots(1, 1, figsize=(4, 3))

    # Get upper and lower confidence bounds
    lower, upper = observed_pred.confidence_region()
    # Plot training data as black stars
    ax.plot(train_x.numpy(), train_y.numpy(), 'k*')
    # Plot predictive means as blue line
    ax.plot(test_x.numpy(), observed_pred.mean.numpy(), 'b')
    # Shade between the lower and upper confidence bounds
    ax.fill_between(test_x.numpy(), lower.numpy(), upper.numpy(), alpha=0.5)
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])
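
For reference, confidence_region() above should be equivalent to taking the predictive mean plus or minus two predictive standard deviations, which you could also compute yourself (a small sketch using the same observed_pred object):

# Roughly equivalent to observed_pred.confidence_region()
lower = observed_pred.mean - 2 * observed_pred.stddev
upper = observed_pred.mean + 2 * observed_pred.stddev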


