Valentine Svensson 2015
(with small edits by James Hensman November 2015)
This notebook illustrates use of the OMGP model described in:
"Overlapping Mixtures of Gaussian Processes for the data association problem", Miguel Lázaro-Gredilla, Steven Van Vaerenbergh and Neil D. Lawrence, Pattern Recognition, 2012.
The GPclust implementation makes use of the collapsed variational mixture model for GP assignment.
In [33]:
%matplotlib inline
import numpy as np
import GPy
from GPclust import OMGP
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12,6)
from matplotlib import pyplot as plt
In [2]:
XY = np.loadtxt('../data/split_data_test.csv', delimiter=',', skiprows=1, usecols=[1, 2])
X = XY[:, 0, None]
Y = XY[:, 1, None]
In [3]:
plt.scatter(X, Y);
We define a model assuming K = 2 trends. By default the model will be populated by K RBF kernels. The OMGP implementation is compatible with most kernels in GPy, so that you for example can encode periodicity in the model.
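For example, a sketch of passing a custom kernel list (one kernel per component) with a periodic component, using GPy's StdPeriodic kernel; the kernels argument is demonstrated again later in this notebook:
# Sketch: one kernel per mixture component; here a periodic
# component alongside a smooth RBF trend.
kernels = [GPy.kern.StdPeriodic(1), GPy.kern.RBF(1)]
m_periodic = OMGP(X, Y, K=2, variance=0.01, prior_Z='DP', kernels=kernels)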
In [10]:
m = OMGP(X, Y, K=2, variance=0.01, prior_Z='DP')
m.log_likelihood()
Out[10]:
A simple plot function is included which illustrates the assignment probability for each data point. It also shows the posterior mean and confidence intervals of each Gaussian process.
In [11]:
m.plot()
There is also a function for plotting the assignment probability for a given GP directly. Since we haven't optimized the mixture parameters yet, the assignment probability is just a random draw from the prior.
In [12]:
m.plot_probs(gp_num=0)
First we perform a quick optimization to find the rough trends.
In [13]:
m.optimize(step_length=0.01, maxiter=20)
In [14]:
m.plot()
In [15]:
m.plot_probs()
The model identifies the branches of the time series; in particular, the non-branched region has ambiguous GP assignment. In this region the two trends share information for prediction.
As with any GPy model, the hyperparameters can be inspected.
In [16]:
m
Out[16]:
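Individual parameters can also be read and set by name, as is done later in this notebook; for example:
print(m.variance)  # the model's noise variance parameter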
We continue by letting the model optimize some more, now also allowing it to optimize the hyperparameters. Hyperparameter optimization works best once the mixture parameters have converged or are close to convergence.
In [17]:
m.optimize(step_length=0.01, maxiter=200)
In [18]:
m
Out[18]:
In [19]:
m.plot()
In [20]:
m.plot_probs()
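The fitted trends can also be queried numerically. A sketch, assuming the predict(Xnew, i) method of the GPclust OMGP class, which returns the posterior mean and variance of component i at new inputs:
Xnew = np.linspace(X.min(), X.max(), 200)[:, None]
mu0, var0 = m.predict(Xnew, 0)  # posterior mean and variance of trend 0
mu1, var1 = m.predict(Xnew, 1)  # posterior mean and variance of trend 1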
As a second example, we generate a synthetic dataset mixing pure noise with a noisy sinusoidal signal.
In [21]:
x1 = np.random.uniform(0, 10, (100, 1))
x2 = np.random.uniform(0, 10, (100, 1))
y1 = 4 * np.random.randn(*x1.shape)                      # pure noise
y2 = 3 * np.sin(x2) + 0.5 * np.random.randn(*x2.shape)   # noisy sinusoidal signal
x = np.vstack((x1, x2))
y = np.vstack((y1, y2))
In [23]:
plt.scatter(x, y);
First we make a model with only one mixture component and kernel. This is equivalent to standard GP regression.
In [24]:
kernels = [GPy.kern.RBF(1)]
m = OMGP(x, y, K=1, prior_Z='DP', kernels=kernels)
m.variance = 3
m.hyperparam_interval = 100
m.rbf.lengthscale = 2
In [25]:
m.optimize(verbose=False)
In [27]:
m.plot()
Now we instead view this as a mixture problem and consider two different kinds of kernels for the GP components: one encoding white noise, and another which can encode a trend over time (an RBF kernel in this case).
In [28]:
kernels = [GPy.kern.White(1, name='Noise'), GPy.kern.RBF(1, name='Signal')]
m = OMGP(x, y, K=2, prior_Z='DP', kernels=kernels)
m.variance = 3
m.hyperparam_interval = 250
m.Signal.lengthscale = 2
In [29]:
m.plot(0)
In [30]:
m.optimize(step_length=0.01, verbose=False)
In [31]:
m
Out[31]:
In [32]:
m.plot()
The trend over time is much more noticeable, and the confidence intervals are smaller.
Noisy points will have a high assignment probability for the 'noise GP', while the assignment of the sinusoidal points is ambiguous. We can use this to separate the points which are more likely to be noise from the remaining points.
In [34]:
m.plot_probs(0)
plt.axhline(0.75);
In [36]:
thr = 0.75
idx = np.where(m.phi[:, 0] < thr)[0]    # unlikely to belong to the noise GP: signal
nidx = np.where(m.phi[:, 0] >= thr)[0]  # likely to belong to the noise GP: noise
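As a quick check of the split, using the index arrays just defined:
print('signal points:', len(idx), ' noise points:', len(nidx))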
In [37]:
plt.figure(figsize=(12,10))
plt.subplot(211)
plt.scatter(x[idx], y[idx]);
plt.title('Signal')
plt.subplot(212, sharey=plt.gca())
plt.scatter(x[nidx], y[nidx]);
plt.title('Noise');
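The separated subsets can then be used downstream; for example, a sketch that writes the signal points to a hypothetical CSV file:
# Hypothetical output path; x[idx] and y[idx] are the points
# classified as signal above.
np.savetxt('signal_points.csv', np.hstack((x[idx], y[idx])), delimiter=',')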