Alexis Boukouvalas, 2017
Branching GP regression with Gaussian noise on the hematopoiesis data described in the paper "BGP: Gaussian processes for identifying branching dynamics in single cell data".
This notebook shows how to build a BGP model and plot the posterior model fit and posterior branching times.
In [1]:
import pandas as pd
import numpy as np
import pickle
from matplotlib import pyplot as plt
from BranchedGP import VBHelperFunctions as bplot
plt.style.use('ggplot')
%matplotlib inline
In [2]:
datafile='syntheticdata/synthetic20.csv'
data = pd.read_csv(datafile, index_col=[0])
G = data.shape[1] - 2 # all data - time columns - state column
Y = data.iloc[:, 2:]
trueBranchingTimes = np.array([float(Y.columns[i][-3:]) for i in range(G)])
In [3]:
data.head()
Out[3]:
In [4]:
f, ax = plt.subplots(5, 8, figsize=(10, 8))
ax = ax.flatten()
for i in range(G):
for s in np.unique(data['MonocleState']):
idxs = (s == data['MonocleState'].values)
ax[i].scatter(data['Time'].loc[idxs], Y.iloc[:, i].loc[idxs])
ax[i].set_title(Y.columns[i])
ax[i].set_yticklabels([])
ax[i].set_xticklabels([])
f.suptitle('Branching genes, location=1.1 indicates no branching')
Out[4]:
In [5]:
r = pickle.load(open('syntheticdata/syntheticDataRun.p', "rb"))
In [6]:
r.keys()
Out[6]:
In [7]:
# plot fit for a gene
g = 0
GPy = Y.iloc[:, g][:, None]
GPt = data['Time'].values
globalBranching = data['MonocleState'].values.astype(int)
bmode = r['Bsearch'][np.argmax(r['gpmodels'][g]['loglik'])]
print('True branching time', trueBranchingTimes[g], 'BGP Maximum at b=%.2f' % bmode)
_=bplot.PlotBGPFit(GPy, GPt, r['Bsearch'], r['gpmodels'][g])
We can also plot with the predictive uncertainty of the GP. The dashed lines are the 95% confidence intervals.
In [8]:
g = 0
bmode = r['Bsearch'][np.argmax(r['gpmodels'][g]['loglik'])]
pred = r['gpmodels'][g]['prediction'] # prediction object from GP
_=bplot.plotBranchModel(bmode, GPt, GPy, pred['xtest'], pred['mu'], pred['var'],
r['gpmodels'][g]['Phi'], fPlotPhi=True, fColorBar=True,
fPlotVar = True)
In [9]:
fs, ax = plt.subplots(1, 1, figsize=(5, 5))
for g in range(G):
bmode = r['Bsearch'][np.argmax(r['gpmodels'][g]['loglik'])]
ax.scatter(bmode, g, s=100, color='b') # BGP mode
ax.scatter(trueBranchingTimes[g]+0.05, g, s=100, color='k') # True
In [ ]: