Overlapping Mixtures of Gaussian Processes

Valentine Svensson 2015
(with small edits by James Hensman November 2015)

This illustrates the use of the OMGP model described in

Overlapping Mixtures of Gaussian Processes for the data association problem
Miguel Lázaro-Gredilla, Steven Van Vaerenbergh, Neil D. Lawrence
Pattern Recognition 2012

The GPclust implementation makes use of the collapsed variational mixture model for GP assignment.
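
In brief, the OMGP model assumes that each observation is generated by exactly one of K independent latent Gaussian processes. Writing Z for the binary assignment matrix (Z_nk = 1 if point n belongs to process k), the likelihood factorises, paraphrasing the notation of the paper above, as

$$ p(\mathbf{Y} \mid \{\mathbf{f}^{(k)}\}_{k=1}^{K}, \mathbf{Z}) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mathcal{N}\!\left(\mathbf{y}_n \mid \mathbf{f}^{(k)}(\mathbf{x}_n),\, \sigma^2\right)^{Z_{nk}}, $$

and the variational approximation to the posterior over Z provides the per-point assignment probabilities used throughout this notebook (exposed as m.phi).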


In [33]:
%matplotlib inline
import numpy as np
import GPy
from GPclust import OMGP
import matplotlib
matplotlib.rcParams['figure.figsize'] = (12,6)
from matplotlib import pyplot as plt

Diverging trend separation

One application of the OMGP model is to find diverging trends among populations over time. Imagine, for example, two species evolving from a common ancestor.

We load some pre-generated data which diverge over time.


In [2]:
XY = np.loadtxt('../data/split_data_test.csv', delimiter=',', skiprows=1, usecols=[1, 2])
X = XY[:, 0, None]
Y = XY[:, 1, None]

In [3]:
plt.scatter(X, Y);


We define a model assuming K = 2 trends. By default the model is populated with K RBF kernels. The OMGP implementation is compatible with most kernels in GPy, so you can, for example, encode periodicity in the model, as in the sketch below.
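
For instance, two periodic components could be specified by passing a list of kernels. This is a minimal sketch, not run here; it assumes GPy.kern.StdPeriodic and the kernels argument that is used later in this notebook:

# Hypothetical variant: replace the default RBF kernels with periodic ones
periodic_kernels = [GPy.kern.StdPeriodic(1), GPy.kern.StdPeriodic(1)]
m_periodic = OMGP(X, Y, K=2, variance=0.01, prior_Z='DP', kernels=periodic_kernels)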


In [10]:
m = OMGP(X, Y, K=2, variance=0.01, prior_Z='DP')
m.log_likelihood()


Out[10]:
921.94730839800775

A simple plot function is included which illustrates the assignment probability for each data point. It also shows the posterior mean and confidence intervals for each Gaussian process.


In [11]:
m.plot()


There is also a function for plotting the assignment probability for a given GP directly. Since we haven't optimized the mixture parameters yet, the assignment probabilities are just a random draw from the prior.


In [12]:
m.plot_probs(gp_num=0)


We first perform a quick optimization to find the rough trends.


In [13]:
m.optimize(step_length=0.01, maxiter=20)


iteration 20 bound=1106.26056208 grad=179.192598532, beta=0.631240148582 maxiter exceeded

In [14]:
m.plot()



In [15]:
m.plot_probs()


The model identifies the branches of the time series; in particular, the non-branched region has ambiguous GP assignment. In this region the two trends share information for prediction.

Like any GPy model, the hyperparameters can be inspected.


In [16]:
m


Out[16]:

Model: OMGP
Log-likelihood: 1106.26056208
Number of Parameters: 5
Number of Optimization Parameters: 5
Updates: True

  OMGP.              |            Value  |  Constraint  |  Prior  |  Tied to
  variance           |             0.01  |     +ve      |         |
  rbf_1.variance     |              1.0  |     +ve      |         |
  rbf_1.lengthscale  |              1.0  |     +ve      |         |
  rbf.variance       |              1.0  |     +ve      |         |
  rbf.lengthscale    |              1.0  |     +ve      |         |
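
The entries above are ordinary GPy parameters, so they can also be read and set individually by attribute, as is done later in this notebook. A minimal sketch, using the attribute names printed above:

print(m.rbf.lengthscale)   # read a single kernel parameter
print(m.variance)          # the observation noise variance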

We continue by letting the model optimize some more, this time also allowing it to optimize the hyperparameters. Hyperparameter optimization works best if the mixture parameters have converged or are close to converging.


In [17]:
m.optimize(step_length=0.01, maxiter=200)


iteration 200 bound=1588.80940759 grad=197.3883808, beta=0.0 maxiter exceeded

In [18]:
m


Out[18]:

Model: OMGP
Log-likelihood: 1588.80940759
Number of Parameters: 5
Number of Optimization Parameters: 5
Updates: True

  OMGP.              |            Value  |  Constraint  |  Prior  |  Tied to
  variance           |  0.0014606243382  |     +ve      |         |
  rbf_1.variance     |    1.02213521375  |     +ve      |         |
  rbf_1.lengthscale  |   0.595341949099  |     +ve      |         |
  rbf.variance       |    1.16851232834  |     +ve      |         |
  rbf.lengthscale    |   0.448887117217  |     +ve      |         |

In [19]:
m.plot()



In [20]:
m.plot_probs()


Separating signal from noise

An interesting application of the OMGP model, pointed out in the original publication, is robust GP regression.

Let's illustrate this by creating sinusoidal test data with background noise.


In [21]:
x1 = np.random.uniform(0, 10, (100, 1))
x2 = np.random.uniform(0, 10, (100, 1))

y1 = 4 * np.random.randn(*x1.shape)
y2 = 3 * np.sin(x2) + 0.5 * np.random.randn(*x2.shape)
x = np.vstack((x1, x2))
y = np.vstack((y1, y2))

In [23]:
plt.scatter(x, y);


First we make a model with only one mixture component / kernel. This is equivalent to normal GP regression.


In [24]:
kernels = [GPy.kern.RBF(1)]   # a single RBF kernel: one GP component
m = OMGP(x, y, K=1, prior_Z='DP', kernels=kernels)
m.variance = 3                # initial noise variance
m.hyperparam_interval = 100   # iterations between hyperparameter updates
m.rbf.lengthscale = 2         # initial lengthscale

In [25]:
m.optimize(verbose=False)

In [27]:
m.plot()
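
For comparison, an ordinary GP regression on the same data should give essentially the same fit as this K = 1 mixture. A minimal sketch using only standard GPy calls (not run here):

m_gp = GPy.models.GPRegression(x, y, GPy.kern.RBF(1))
m_gp.optimize()
m_gp.plot()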


Now we instead view this as a mixture problem, and consider two different kinds of kernels for the GP components: one encoding white noise, and another which can encode a trend over time (an RBF kernel in this case).


In [28]:
# One white-noise kernel to absorb the background points, one RBF kernel for the trend
kernels = [GPy.kern.White(1, name='Noise'), GPy.kern.RBF(1, name='Signal')]
m = OMGP(x, y, K=2, prior_Z='DP', kernels=kernels)
m.variance = 3
m.hyperparam_interval = 250
m.Signal.lengthscale = 2

In [29]:
m.plot(0)



In [30]:
m.optimize(step_length=0.01, verbose=False)

In [31]:
m


Out[31]:

Model: OMGP
Log-likelihood: -644.812245454
Number of Parameters: 4
Number of Optimization Parameters: 4
Updates: True

  OMGP.               |           Value  |  Constraint  |  Prior  |  Tied to
  variance            |  0.322524350505  |     +ve      |         |
  Noise.variance      |   8.84138047261  |     +ve      |         |
  Signal.variance     |   3.02502159938  |     +ve      |         |
  Signal.lengthscale  |   1.50887475231  |     +ve      |         |

In [32]:
m.plot()


The trend over time is much more noticeable, and the confidence intervals are smaller.

Noisy points will have a high assignment probability to the 'noise GP', while the assignment of the sinusoidal points is more ambiguous. We can use this to separate the points which are more likely to be noise from the remaining points.


In [34]:
m.plot_probs(0)
plt.axhline(0.75);



In [36]:
thr = 0.75
idx = np.where(m.phi[:, 0] < thr)[0]    # low probability of belonging to the noise GP: signal points
nidx = np.where(m.phi[:, 0] >= thr)[0]  # high probability of belonging to the noise GP: noise points

In [37]:
plt.figure(figsize=(12,10))
plt.subplot(211)
plt.scatter(x[idx], y[idx]);
plt.title('Signal')

plt.subplot(212, sharey=plt.gca())
plt.scatter(x[nidx], y[nidx]);
plt.title('Noise');


