This notebook demonstrates how to perform "Hadamard" multitask regression with kernels.IndexKernel.
This differs from the multitask GP regression example notebook in one key way: here, every training point carries its own task index, so the model handles the case where not every input has an observation for every task (in the other notebook, every input has an output for every task).
Multitask regression, first introduced in this paper, learns similarities in the outputs simultaneously. It's useful when you are performing regression on multiple functions that share the same inputs, especially if they have similarities (such as being sinusoidal).
Given inputs $x$ and $x'$, and task indices $i$ and $j$, the covariance between the two datapoints is given by
\begin{equation*}
k([x, i], [x', j]) = k_\text{inputs}(x, x') * k_\text{tasks}(i, j)
\end{equation*}
where $k_\text{inputs}$ is a standard kernel (e.g. RBF) that operates on the inputs, and $k_\text{tasks}$ is a special kernel - the IndexKernel - which is a lookup table containing the inter-task covariances.
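Concretely, the IndexKernel parameterizes the task covariance matrix as $B B^\top + \text{diag}(\mathbf{v})$, where $B$ is a low-rank factor (its width is set by the rank argument) and $\mathbf{v}$ is a learned per-task variance. The short sketch below builds that lookup table by hand; the attribute names covar_factor and var follow the current GPyTorch IndexKernel implementation and may differ in other versions.
import torch
import gpytorch

# A standalone IndexKernel over 2 tasks with a rank-1 factor
task_kernel = gpytorch.kernels.IndexKernel(num_tasks=2, rank=1)

# The task covariance lookup table is B B^T + diag(v), where
# B is the (num_tasks x rank) factor and v is a per-task variance
B = task_kernel.covar_factor        # shape (2, 1)
v = task_kernel.var                 # shape (2,)
task_covar = B @ B.T + torch.diag(v)

# task_covar[i, j] is exactly the k_tasks(i, j) term in the Hadamard
# product above (task_kernel.covar_matrix exposes the same matrix lazily)
print(task_covar)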
In [1]:
import math
import torch
import gpytorch
from matplotlib import pyplot as plt
%matplotlib inline
%load_ext autoreload
%autoreload 2
In [2]:
# Training data: 100 regularly spaced points in [0, 1]
train_x = torch.linspace(0, 1, 100)
# Two tasks: a noisy sine and a noisy cosine of the same inputs
train_y1 = torch.sin(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2
train_y2 = torch.cos(train_x * (2 * math.pi)) + torch.randn(train_x.size()) * 0.2
The model should be somewhat similar to the ExactGP model in the simple regression example.
The differences:
- The forward method takes two inputs: the data x and a tensor of task indices i.
- In addition to the usual RBFKernel over the inputs, the model learns an IndexKernel over the task indices.
- The two covariance matrices are multiplied elementwise (a Hadamard product) to form the full covariance.
In [3]:
class MultitaskGPModel(gpytorch.models.ExactGP):
    def __init__(self, train_x, train_y, likelihood):
        super(MultitaskGPModel, self).__init__(train_x, train_y, likelihood)
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.RBFKernel()
        # We learn an IndexKernel for 2 tasks
        # (so we learn a full 2x2 matrix of inter-task covariances)
        self.task_covar_module = gpytorch.kernels.IndexKernel(num_tasks=2, rank=1)

    def forward(self, x, i):
        mean_x = self.mean_module(x)
        # Get input-input covariance
        covar_x = self.covar_module(x)
        # Get task-task covariance
        covar_i = self.task_covar_module(i)
        # Multiply the two together (elementwise) to get the covariance we want
        covar = covar_x.mul(covar_i)
        return gpytorch.distributions.MultivariateNormal(mean_x, covar)
likelihood = gpytorch.likelihoods.GaussianLikelihood()

# Here we want outputs for every input and task.
# This is not the most efficient model for that: it's better to use the model in the ./Multitask_GP_Regression.ipynb notebook
# Since we are learning two tasks, we feed in the x data twice (once per task),
# together with the y data and the corresponding task indices
train_i_task1 = torch.full_like(train_x, dtype=torch.long, fill_value=0)
train_i_task2 = torch.full_like(train_x, dtype=torch.long, fill_value=1)

full_train_x = torch.cat([train_x, train_x])
full_train_i = torch.cat([train_i_task1, train_i_task2])
full_train_y = torch.cat([train_y1, train_y2])

# Here we have two items that we're passing in as train_inputs
model = MultitaskGPModel((full_train_x, full_train_i), full_train_y, likelihood)
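To make the "Hadamard" data layout explicit, here is a quick optional sanity check: after the concatenation every tensor has one entry per (input, task) pair, so the lengths double from 100 to 200, and full_train_i stores the integer task index of each datapoint.
print(full_train_x.shape, full_train_i.shape, full_train_y.shape)
# torch.Size([200]) torch.Size([200]) torch.Size([200])
print(full_train_i.unique())
# tensor([0, 1])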
In the next cell, we handle using Type-II MLE to train the hyperparameters of the Gaussian process.
See the simple regression example for more info on this step.
In [4]:
# Find optimal model hyperparameters
model.train()
likelihood.train()

# Use the adam optimizer
optimizer = torch.optim.Adam([
    {'params': model.parameters()},  # Includes GaussianLikelihood parameters
], lr=0.1)

# "Loss" for GPs - the marginal log likelihood
mll = gpytorch.mlls.ExactMarginalLogLikelihood(likelihood, model)

for i in range(50):
    optimizer.zero_grad()
    output = model(full_train_x, full_train_i)
    loss = -mll(output, full_train_y)
    loss.backward()
    print('Iter %d/50 - Loss: %.3f' % (i + 1, loss.item()))
    optimizer.step()
In [5]:
# Set into eval mode
model.eval()
likelihood.eval()
# Initialize plots
f, (y1_ax, y2_ax) = plt.subplots(1, 2, figsize=(8, 3))
# Test points every 0.02 in [0,1]
test_x = torch.linspace(0, 1, 51)
test_i_task1 = torch.full_like(test_x, dtype=torch.long, fill_value=0)
test_i_task2 = torch.full_like(test_x, dtype=torch.long, fill_value=1)

# Make predictions - one task at a time
# We control the task we care about using the indices
# The gpytorch.settings.fast_pred_var flag activates LOVE (for fast variances)
# See https://arxiv.org/abs/1803.06058
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    observed_pred_y1 = likelihood(model(test_x, test_i_task1))
    observed_pred_y2 = likelihood(model(test_x, test_i_task2))
# Define plotting function
def ax_plot(ax, train_y, rand_var, title):
    # Get lower and upper confidence bounds
    lower, upper = rand_var.confidence_region()
    # Plot training data as black stars
    ax.plot(train_x.detach().numpy(), train_y.detach().numpy(), 'k*')
    # Predictive mean as blue line
    ax.plot(test_x.detach().numpy(), rand_var.mean.detach().numpy(), 'b')
    # Shade in confidence
    ax.fill_between(test_x.detach().numpy(), lower.detach().numpy(), upper.detach().numpy(), alpha=0.5)
    ax.set_ylim([-3, 3])
    ax.legend(['Observed Data', 'Mean', 'Confidence'])
    ax.set_title(title)
# Plot both tasks
ax_plot(y1_ax, train_y1, observed_pred_y1, 'Observed Values (Likelihood)')
ax_plot(y2_ax, train_y2, observed_pred_y2, 'Observed Values (Likelihood)')
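If you instead want a single joint prediction covering both tasks at once, you can stack the test inputs and indices exactly like the training data. A minimal sketch reusing the tensors defined above:
with torch.no_grad(), gpytorch.settings.fast_pred_var():
    full_test_x = torch.cat([test_x, test_x])
    full_test_i = torch.cat([test_i_task1, test_i_task2])
    joint_pred = likelihood(model(full_test_x, full_test_i))

# The first 51 entries of the mean belong to task 0, the last 51 to task 1
print(joint_pred.mean.shape)  # torch.Size([102])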
In [ ]: