In [1]:
%matplotlib inline
In [2]:
%config InlineBackend.figure_format = 'retina'
In [3]:
from matplotlib import rcParams
rcParams["savefig.dpi"] = 100
rcParams["figure.dpi"] = 100
rcParams["figure.figsize"] = 12, 4
rcParams["font.size"] = 16
rcParams["text.usetex"] = False
rcParams["font.family"] = ["sans-serif"]
rcParams["font.sans-serif"] = ["cmss10"]
rcParams["axes.unicode_minus"] = False
In [4]:
# https://github.com/matplotlib/matplotlib/issues/12039
try:
old_get_unicode_index
except NameError:
print('Patching matplotlib.mathtext.get_unicode_index')
import matplotlib.mathtext as mathtext
old_get_unicode_index = mathtext.get_unicode_index
mathtext.get_unicode_index = lambda symbol, math=True:\
ord('-') if symbol == '-' else old_get_unicode_index(symbol, math)
Let's time how long our custom implementation of a GP takes on a rather long dataset. Create a time array of 10,000 points between 0 and 10 and time how long it takes to sample the prior of the GP for the default kernel parameters (unit amplitude and timescale). Add a bit of noise to the sample and then time how long it takes to evaluate the log likelihood for the dataset. Make sure to store the value of the log likelihood for later.
In [5]:
import numpy as np
from scipy.linalg import cho_factor
def ExpSquaredKernel(t1, t2=None, A=1.0, l=1.0):
"""
Return the ``N x M`` exponential squared
covariance matrix between time vectors `t1`
and `t2`. The kernel has amplitude `A` and
lengthscale `l`.
"""
if t2 is None:
t2 = t1
T2, T1 = np.meshgrid(t2, t1)
return A ** 2 * np.exp(-0.5 * (T1 - T2) ** 2 / l ** 2)
def ln_gp_likelihood(t, y, sigma=0, A=1.0, l=1.0):
"""
Return the log of the GP likelihood of the
data `y(t)` given uncertainty `sigma` and
an Exponential Squared Kernel with amplitude `A`
    and length scale `l`.
"""
# The covariance and its determinant
npts = len(t)
kernel = ExpSquaredKernel
K = kernel(t, A=A, l=l) + sigma ** 2 * np.eye(npts)
# The marginal log likelihood
log_like = -0.5 * np.dot(y.T, np.linalg.solve(K, y))
log_like -= 0.5 * np.linalg.slogdet(K)[1]
log_like -= 0.5 * npts * np.log(2 * np.pi)
return log_like
def draw_from_gaussian(mu, S, ndraws=1, eps=1e-12):
"""
Generate samples from a multivariate gaussian
specified by covariance ``S`` and mean ``mu``.
(We derived these equations in Day 1, Notebook 01, Exercise 7.)
"""
npts = S.shape[0]
L, _ = cho_factor(S + eps * np.eye(npts), lower=True)
L = np.tril(L)
u = np.random.randn(npts, ndraws)
x = np.dot(L, u) + mu[:, None]
return x.T
def compute_gp(t_train, y_train, t_test, sigma=0, A=1.0, l=1.0):
"""
Compute the mean vector and covariance matrix of a GP
at times `t_test` given training points `y_train(t_train)`.
The training points have uncertainty `sigma` and the
kernel is assumed to be an Exponential Squared Kernel
with amplitude `A` and lengthscale `l`.
"""
# Compute the required matrices
kernel = ExpSquaredKernel
    Stt = kernel(t_train, A=A, l=l)
    Stt += sigma ** 2 * np.eye(Stt.shape[0])
    Spp = kernel(t_test, A=A, l=l)
    Spt = kernel(t_test, t_train, A=A, l=l)
# Compute the mean and covariance of the GP
mu = np.dot(Spt, np.linalg.solve(Stt, y_train))
S = Spp - np.dot(Spt, np.linalg.solve(Stt, Spt.T))
return mu, S
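A quick aside before timing anything: since the GP prior has zero mean, ln_gp_likelihood should agree with an ordinary multivariate normal log density. Here is a minimal sanity check along those lines (the small arrays t_small, y_small, and sig are made up purely for illustration):
# Sanity check (illustrative, not part of the exercise): for zero mean,
# the GP marginal likelihood is just a multivariate normal log density.
from scipy.stats import multivariate_normal
t_small = np.linspace(0, 10, 50)
y_small = np.sin(t_small)
sig = 0.1
K_small = ExpSquaredKernel(t_small) + sig ** 2 * np.eye(len(t_small))
print(ln_gp_likelihood(t_small, y_small, sig))
print(multivariate_normal(np.zeros(len(t_small)), K_small).logpdf(y_small))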
In [6]:
%%time
np.random.seed(3)
t = np.linspace(0, 10, 10000)
sigma = np.ones_like(t) * 0.05
gp_mu, gp_S = compute_gp([], [], t, A=1.0, l=1.0)
y = draw_from_gaussian(gp_mu, gp_S)[0] + sigma * np.random.randn(len(t))
In [7]:
%%time
ln_gp_likelihood(t, y, sigma)
Out[7]:
Let's time how long it takes to do the same operations using the george package (pip install george).
The kernel we'll use is
kernel = amp ** 2 * george.kernels.ExpSquaredKernel(tau ** 2)
where amp = 1 and tau = 1 in this case.
To instantiate a GP using george, simply run
gp = george.GP(kernel)
The george package pre-computes a lot of matrices that are re-used in different operations, so before anything else, ask it to compute the GP model for your timeseries:
gp.compute(t, sigma)
Note that we've only given it the time array and the uncertainties, so as long as those remain the same, you don't have to re-compute anything. This will save you a lot of time in the long run!
Finally, the log likelihood is given by gp.log_likelihood(y) and a sample can be drawn by calling gp.sample().
How do the speeds compare? Did you get the same value of the likelihood (assuming you computed it for the same sample in both cases)?
In [8]:
import george
In [9]:
%%time
kernel = george.kernels.ExpSquaredKernel(1.0)
gp = george.GP(kernel)
gp.compute(t, sigma)
In [10]:
%%time
print(gp.log_likelihood(y))
In [11]:
%%time
gp.sample()
Out[11]:
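To see what that precomputation buys you: once gp.compute(t, sigma) has run, evaluating the likelihood of any other dataset with the same time array and uncertainties reuses the cached factorization. A small illustrative sketch (the rescaled datasets here are made up):
# The factorization was done once in gp.compute(t, sigma); each call
# below only performs cheap solves against the cached factorization.
for scale in (1.0, 2.0, 0.5):
    print(scale, gp.log_likelihood(scale * y))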
george offers a fancy GP solver called the HODLR solver, which makes some approximations that dramatically speed up the matrix algebra. Instantiate the GP object again by passing the keyword solver=george.HODLRSolver and re-compute the log likelihood. How long did that take?
(I wasn't able to draw samples using the HODLR solver; unfortunately this may not be implemented.)
In [12]:
%%time
gp = george.GP(kernel, solver=george.HODLRSolver)
gp.compute(t, sigma)
In [13]:
%%time
gp.log_likelihood(y)
Out[13]:
The george package is super useful for GP modeling, and I recommend you read over the docs and examples. It implements several different kernels that come in handy in different situations, and it has support for multi-dimensional GPs. But if all you care about are GPs in one dimension (in this case, we're only doing GPs in the time domain, so we're good), then celerite is what it's all about:
pip install celerite
Check out the docs here, as well as several tutorials. There is also a paper that discusses the math behind celerite. The basic idea is that for certain families of kernels, there exist extremely efficient methods of factorizing the covariance matrices. Whereas GP fitting typically scales with the number of datapoints $N$ as $N^3$, celerite is able to do everything in order $N$ (!!!). This is a huge advantage, especially for datasets with tens or hundreds of thousands of data points. Using george or any homebuilt GP model for datasets larger than about 10,000 points is simply intractable, but with celerite you can do it in a breeze.
Repeat the timing tests, but this time using celerite. Note that the Exponential Squared Kernel is not available in celerite, because it doesn't have the special form needed to make its factorization fast. Instead, use the Matern 3/2 kernel, which is qualitatively similar and can be approximated quite well in terms of the celerite basis functions:
kernel = celerite.terms.Matern32Term(np.log(1), np.log(1))
Note that celerite accepts the log of the amplitude and the log of the timescale. Other than this, you should be able to compute the likelihood and draw a sample with the same syntax as in george.
How much faster did it run?
In [14]:
import celerite
from celerite import terms
In [15]:
%%time
kernel = terms.Matern32Term(np.log(1), np.log(1))
gp = celerite.GP(kernel)
gp.compute(t, sigma)
In [16]:
%%time
gp.log_likelihood(y)
Out[16]:
In [17]:
%%time
gp.sample()
Out[17]:
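If you want to convince yourself of the order-$N$ scaling, one option (beyond the original exercise) is to time celerite's compute and log-likelihood steps for a few dataset sizes. A rough sketch, with made-up data and timings that will depend on your machine:
import time
# Rough scaling check: cost should grow roughly linearly with N.
for N in (1000, 10000, 100000):
    tN = np.linspace(0, 10, N)
    sigN = np.full(N, 0.05)
    yN = np.sin(tN) + sigN * np.random.randn(N)
    gpN = celerite.GP(terms.Matern32Term(np.log(1), np.log(1)))
    start = time.perf_counter()
    gpN.compute(tN, sigN)
    gpN.log_likelihood(yN)
    print(N, time.perf_counter() - start, "seconds")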
Let's use celerite for a real application: fitting an exoplanet transit model in the presence of correlated noise.
Below is a (fictitious) light curve for a star with a transiting planet. There is a transit visible to the eye at $t = 0$, which (say) is when you'd expect the planet to transit if its orbit were perfectly periodic. However, a recent paper claims that the planet shows transit timing variations, which are indicative of a second, perturbing planet in the system, and that a transit at $t = 0$ can be ruled out at 3 $\sigma$. Your task is to verify this claim.
Assume you have no prior information on the planet other than the transit occurs in the observation window, the depth of the transit is somewhere in the range $(0, 1)$, and the transit duration is somewhere between $0.1$ and $1$ day. You don't know the exact process generating the noise, but you are certain that there's correlated noise in the dataset, so you'll have to pick a reasonable kernel and estimate its hyperparameters.
Fit the transit with a simple inverted Gaussian with three free parameters:
def transit_shape(depth, t0, dur):
return -depth * np.exp(-0.5 * (t - t0) ** 2 / (0.2 * dur) ** 2)
Read the celerite docs to figure out how to solve this problem efficiently.
HINT: I borrowed heavily from this tutorial, so you might want to take a look at it...
In [18]:
import matplotlib.pyplot as plt
from celerite.modeling import Model
import os
# Define the model
class MeanModel(Model):
parameter_names = ("depth", "t0", "dur")
def get_value(self, t):
return -self.depth * np.exp(-0.5 * (t - self.t0) ** 2 / (0.2 * self.dur) ** 2)
mean_model = MeanModel(depth=0.5, t0=0.05, dur=0.7)
mean_model.parameter_bounds = [(0, 1.0), (-0.1, 0.4), (0.1, 1.0)]
true_params = mean_model.get_parameter_vector()
# Simulate the data
np.random.seed(71)
x = np.sort(np.random.uniform(-1, 1, 70))
yerr = np.random.uniform(0.075, 0.1, len(x))
K = 0.2 * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 10.5)
K[np.diag_indices(len(x))] += yerr ** 2
y = np.random.multivariate_normal(mean_model.get_value(x), K)
y -= np.nanmedian(y)
# Plot the data
plt.errorbar(x, y, yerr=yerr, fmt=".k", capsize=0)
t = np.linspace(-1, 1, 1000)
plt.plot(t, mean_model.get_value(t))
plt.ylabel(r"$y$")
plt.xlabel(r"$t$")
plt.xlim(-1, 1)
plt.gca().yaxis.set_major_locator(plt.MaxNLocator(5))
plt.title("simulated data");
# Save it
X = np.hstack((x.reshape(-1, 1), y.reshape(-1, 1), yerr.reshape(-1, 1)))
if not (os.path.exists("data")):
os.mkdir("data")
np.savetxt("data/sample_transit.txt", X)
In [19]:
import matplotlib.pyplot as plt
# Load the simulated dataset back in (time, flux, uncertainty)
x, y, yerr = np.loadtxt("data/sample_transit.txt", unpack=True)
plt.errorbar(x, y, yerr=yerr, fmt=".k", capsize=0)
plt.xlabel("time")
plt.ylabel("relative flux");
In [20]:
from celerite.modeling import Model
from scipy.optimize import minimize
# Define the transit model as a celerite `Model`
class MeanModel(Model):
parameter_names = ("depth", "t0", "dur")
def get_value(self, t):
return -self.depth * np.exp(-0.5 * (t - self.t0) ** 2 / (0.2 * self.dur) ** 2)
# Instantiate it with some guesses (which are actually the true values in this case!)
mean_model = MeanModel(depth=0.5, t0=0.05, dur=0.7)
mean_model.parameter_bounds = [(0, 1.0), (-0.1, 0.4), (0.1, 1.0)]
true_params = mean_model.get_parameter_vector()
# Set up the GP model
kernel = terms.RealTerm(log_a=np.log(np.var(y)), log_c=0)
gp = celerite.GP(kernel, mean=mean_model, fit_mean=True)
gp.compute(x, yerr)
print("Initial log-likelihood: {0}".format(gp.log_likelihood(y)))
# Define a cost function
def neg_log_like(params, y, gp):
gp.set_parameter_vector(params)
return -gp.log_likelihood(y)
def grad_neg_log_like(params, y, gp):
gp.set_parameter_vector(params)
return -gp.grad_log_likelihood(y)[1]
# Fit for the maximum likelihood parameters
initial_params = gp.get_parameter_vector()
bounds = gp.get_parameter_bounds()
soln = minimize(neg_log_like, initial_params,
method="L-BFGS-B", bounds=bounds, args=(y, gp))
gp.set_parameter_vector(soln.x)
print("Final log-likelihood: {0}".format(-soln.fun))
# Make the maximum likelihood prediction
t = np.linspace(-1, 1, 500)
mu, var = gp.predict(y, t, return_var=True)
std = np.sqrt(var)
# Plot the data
color = "#ff7f0e"
plt.errorbar(x, y, yerr=yerr, fmt=".k", capsize=0)
plt.plot(t, mu, color=color)
plt.fill_between(t, mu+std, mu-std, color=color, alpha=0.3, edgecolor="none")
plt.ylabel(r"$y$")
plt.xlabel(r"$t$")
plt.xlim(-1, 1)
plt.gca().yaxis.set_major_locator(plt.MaxNLocator(5))
plt.title("maximum likelihood prediction");
In [21]:
def log_probability(params):
gp.set_parameter_vector(params)
lp = gp.log_prior()
if not np.isfinite(lp):
return -np.inf
try:
return gp.log_likelihood(y) + lp
except celerite.solver.LinAlgError:
return -np.inf
In [22]:
import emcee
initial = np.array(soln.x)
ndim, nwalkers = len(initial), 32
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_probability)
print("Running burn-in...")
p0 = initial + 1e-8 * np.random.randn(nwalkers, ndim)
p0, lp, _ = sampler.run_mcmc(p0, 1000)
print("Running production...")
sampler.reset()
sampler.run_mcmc(p0, 2000);
In [23]:
# Plot the data.
plt.errorbar(x, y, yerr=yerr, fmt=".k", capsize=0)
# Plot 24 posterior samples.
samples = sampler.flatchain
for s in samples[np.random.randint(len(samples), size=24)]:
gp.set_parameter_vector(s)
mu = gp.predict(y, t, return_cov=False)
plt.plot(t, mu, color=color, alpha=0.3)
plt.ylabel(r"$y$")
plt.xlabel(r"$t$")
plt.xlim(-1, 1)
plt.gca().yaxis.set_major_locator(plt.MaxNLocator(5))
plt.title("posterior predictions");
In [24]:
import corner
names = gp.get_parameter_names()
cols = mean_model.get_parameter_names()
inds = np.array([names.index("mean:"+k) for k in cols])
corner.corner(sampler.flatchain[:, inds], truths=true_params,
labels=[r"depth", r"$t_0$", r"dur"]);
The transit time is inconsistent with 0 at about 3 sigma, so the claim that TTVs are present is probably true. But remember we're making strong assumptions about the nature of the correlated noise (we assumed a specific kernel), so in reality our uncertainty should be a bit higher.
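To put a rough number on that statement, one can look at the marginal posterior of t0 directly; a minimal sketch reusing names and sampler from the cells above:
# Distance of t0 = 0 from the posterior mean, in units of the posterior
# standard deviation (a rough stand-in for the quoted significance).
t0_samples = sampler.flatchain[:, names.index("mean:t0")]
print("t0 = {:.4f} +/- {:.4f}".format(t0_samples.mean(), t0_samples.std()))
print("offset from zero: {:.1f} sigma".format(abs(t0_samples.mean()) / t0_samples.std()))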