This is the first part of a set of tutorials covering Bayesian exoplanet characterisation using wide-band photometry (transit light curves), radial velocities and, later, transmission spectroscopy. The tutorials use freely available open source tools built around the scientific Python ecosystem, and also demonstrate the use of PyTransit and LDTk.
The tutorials are mainly targeted at graduate students working in exoplanet characterisation who already have some experience with Python and Bayesian statistics.
This tutorial requires the basic Python packages for scientific computing and data analysis.
The MCMC sampling requires the emcee and acor packages by D. Foreman-Mackey. These can be installed either from GitHub or from PyPI.
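For example, installation from PyPI would be something along the lines of the following (assuming the PyPI package names match the import names; check each package's documentation for the current recommended route):
pip install emcee acor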
The transit modelling is carried out with PyTransit, and global optimisation with PyDE. PyTransit can be installed easily from GitHub. First cd into the directory you want to clone the code into, and then
git clone https://github.com/hpparvi/PyTransit.git; cd PyTransit
python setup.py install; cd -
As for assumed prior knowledge: I assume you already know a bit about Bayesian statistics (although I'll start with a very rough overview of the basics of Bayesian parameter estimation), know Python (especially the scientific packages), have a rough idea of how MCMC sampling and emcee work, and, especially, have a grasp of the basic concepts of exoplanets, transits, and photometry.
This first tutorial covers the simple case of an exoplanet system characterisation based on a single photometric timeseries of an exoplanet transit (transit light curve). The system characterisation is a parameter estimation problem, where we assume we have an adequate model to describe the observations, and we want to infer the model parameters with their uncertainties.
We take a Bayesian approach to the parameter estimation, where we want to estimate the posterior probability for the model parameters given their prior probabilities and a set of observations. The posterior probability density for a parameter vector $\theta$ given observational data $D$ is given by Bayes' theorem as
$$ P(\theta|D) = \frac{P(\theta) P(D|\theta)}{P(D)}, \qquad P(D|\theta) = \prod P(D_i|\theta), $$where $P(\theta)$ is the prior, $P(D|\theta)$ is the likelihood for the data, and $P(D)$ is a normalising factor we don't need to bother with during MCMC-based parameter estimation.
The likelihood is a product of individual observation probabilities, and has the unfortunate tendency to end up being either very small or very big. This causes computational headaches, and it is better to work with log probabilities instead, so that
$$ \log P(\theta|D) = \log P(\theta) + \log P(D|\theta), \qquad \log P(D|\theta) = \sum \log P(D_i|\theta) $$where we have omitted the $P(D)$ term from the posterior density.
Now we still need to decide on our likelihood density. If we can assume normally distributed white noise, that is, errors in the observations that are independent and identically distributed, we end up with the log likelihood function
$$ \log P(D|\theta) = -N\log(\sigma) -\frac{N\log 2\pi}{2} - \sum_{i=1}^N \frac{(o_i-m_i)^2}{2\sigma^2}, $$where $N$ is the number of datapoints, $\sigma$ is the white noise standard deviation, $o$ is the observed data, and $m$ is the model.
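As a quick sanity check of the expression above, here is a minimal standalone sketch (assuming scipy is available; the mock observation and model values are made up) comparing the closed-form log likelihood with the sum of normal log densities from scipy.stats:
import numpy as np
from scipy.stats import norm

o = np.array([1.00, 0.99, 1.01, 0.98])   # mock observations
m = np.ones_like(o)                      # mock model
sigma = 0.01                             # white noise standard deviation

lnl_closed = (-o.size*np.log(sigma) - 0.5*o.size*np.log(2*np.pi)
              - np.sum((o - m)**2) / (2*sigma**2))
lnl_scipy = norm.logpdf(o, m, sigma).sum()
print(np.isclose(lnl_closed, lnl_scipy))  # True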
Note: Unfortunately, the noise is rarely white, but contains systematic components from instrumental and astrophysical sources that should be accounted for by the noise model for robust parameter estimation. This, however, goes beyond a basic tutorial.
In [1]:
%pylab inline
In [55]:
import math as mt
import pandas as pd
import seaborn as sb

from pathlib import Path
from IPython.display import display, HTML
from numba import njit, prange
from astropy.io import fits as pf
from emcee import EnsembleSampler
from tqdm.auto import tqdm
from corner import corner

from pytransit import QuadraticModel
from pytransit.utils.de import DiffEvol
from pytransit.orbits.orbits_py import as_from_rhop, i_from_ba
from pytransit.param.parameter import (ParameterSet, GParameter, PParameter, LParameter,
                                       NormalPrior as NP, UniformPrior as UP)

cp = sb.color_palette()

seterr('ignore')
random.seed(0)
In [3]:
@njit(parallel=True, cache=False, fastmath=True)
def lnlike_normal_v(o, m, e):
    m = atleast_2d(m)
    npv = m.shape[0]
    npt = o.size
    lnl = zeros(npv)
    for i in prange(npv):
        for j in range(npt):
            lnl[i] += -log(e[i]) - 0.5*log(2*pi) - 0.5*((o[j]-m[i,j])/e[i])**2
    return lnl
The log posterior function is the workhorse of the analysis. I implement it as a class that stores the observation data and the priors, contains the methods to calculate the model and evaluate the log posterior probability density, and encapsulates the optimisation and MCMC sampling routines.
In [44]:
class LPFunction:
    def __init__(self, name: str, times: ndarray = None, fluxes: ndarray = None):
        self.tm = QuadraticModel(klims=(0.05, 0.25), nk=512, nz=512)

        # LPF name
        # --------
        self.name = name

        # Declare high-level objects
        # --------------------------
        self.ps = None       # Parametrisation
        self.de = None       # Differential evolution optimiser
        self.sampler = None  # MCMC sampler

        # Initialize data
        # ---------------
        self.times = asarray(times)
        self.fluxes = asarray(fluxes)
        self.tm.set_data(self.times)

        # Define the parametrisation
        # --------------------------
        self.ps = ParameterSet([
            GParameter('tc',   'zero_epoch',       'd',      NP(0.0, 0.1),         (-inf, inf)),
            GParameter('pr',   'period',           'd',      NP(1.0, 1e-5),        (0, inf)),
            GParameter('rho',  'stellar_density',  'g/cm^3', UP(0.1, 25.0),        (0, inf)),
            GParameter('b',    'impact_parameter', 'R_s',    UP(0.0, 1.0),         (0, 1)),
            GParameter('k2',   'area_ratio',       'A_s',    UP(0.05**2, 0.25**2), (0.05**2, 0.25**2)),
            GParameter('q1',   'q1_coefficient',   '',       UP(0, 1),             bounds=(0, 1)),
            GParameter('q2',   'q2_coefficient',   '',       UP(0, 1),             bounds=(0, 1)),
            GParameter('loge', 'log10_error',      '',       UP(-4, 0),            bounds=(-4, 0))])
        self.ps.freeze()

    def create_pv_population(self, npop=50):
        return self.ps.sample_from_prior(npop)

    def baseline(self, pv):
        """Multiplicative baseline"""
        return 1.

    def transit_model(self, pv, copy=True):
        pv = atleast_2d(pv)

        # Map from the sampling parametrisation to the transit model parametrisation
        # ---------------------------------------------------------------------------
        pvt = zeros((pv.shape[0], 7))
        pvt[:, 0]   = sqrt(pv[:, 4])
        pvt[:, 1:3] = pv[:, 0:2]
        pvt[:, 3]   = as_from_rhop(pv[:, 2], pv[:, 1])
        pvt[:, 4]   = i_from_ba(pv[:, 3], pvt[:, 3])

        # Map the limb darkening
        # ----------------------
        ldc = zeros((pv.shape[0], 2))
        a, b = sqrt(pv[:, 5]), 2.*pv[:, 6]
        ldc[:, 0] = a * b
        ldc[:, 1] = a * (1. - b)

        return self.tm.evaluate_pv(pvt, ldc)

    def flux_model(self, pv):
        return self.transit_model(pv) * self.baseline(pv)

    def residuals(self, pv):
        return self.fluxes - self.flux_model(pv)

    def set_prior(self, pid: int, prior) -> None:
        self.ps[pid].prior = prior

    def lnprior(self, pv):
        return self.ps.lnprior(pv)

    def lnlikelihood(self, pv):
        flux_m = self.flux_model(pv)
        wn = 10**(atleast_2d(pv)[:, 7])
        return lnlike_normal_v(self.fluxes, flux_m, wn)

    def lnposterior(self, pv):
        lnp = self.lnprior(pv) + self.lnlikelihood(pv)
        return where(isfinite(lnp), lnp, -inf)

    def __call__(self, pv):
        return self.lnposterior(pv)

    def optimize(self, niter=200, npop=50, population=None, label='Global optimisation', leave=False):
        if self.de is None:
            self.de = DiffEvol(self.lnposterior, clip(self.ps.bounds, -1, 1), npop, maximize=True, vectorize=True)
            if population is None:
                self.de._population[:, :] = self.create_pv_population(npop)
            else:
                self.de._population[:, :] = population
        for _ in tqdm(self.de(niter), total=niter, desc=label, leave=leave):
            pass

    def sample(self, niter=500, thin=5, label='MCMC sampling', reset=True, leave=True):
        if self.sampler is None:
            self.sampler = EnsembleSampler(self.de.n_pop, self.de.n_par, self.lnposterior, vectorize=True)
            pop0 = self.de.population
        else:
            pop0 = self.sampler.chain[:, -1, :].copy()
        if reset:
            self.sampler.reset()
        for _ in tqdm(self.sampler.sample(pop0, iterations=niter, thin=thin), total=niter, desc=label, leave=False):
            pass

    def posterior_samples(self, burn: int = 0, thin: int = 1):
        fc = self.sampler.chain[:, burn::thin, :].reshape([-1, self.de.n_par])
        return pd.DataFrame(fc, columns=self.ps.names)

    def plot_mcmc_chains(self, pid: int = 0, alpha: float = 0.1, thin: int = 1, ax=None):
        fig, ax = (None, ax) if ax is not None else subplots()
        ax.plot(self.sampler.chain[:, ::thin, pid].T, 'k', alpha=alpha)
        if fig is not None:
            fig.tight_layout()
        return fig

    def plot_light_curve(self, model: str = 'de', figsize: tuple = (13, 4)):
        fig, ax = subplots(figsize=figsize, constrained_layout=True)
        cp = sb.color_palette()
        if model == 'de':
            pv = self.de.minimum_location
            err = 10**pv[7]
        elif model == 'mc':
            fc = array(self.posterior_samples())
            pv = permutation(fc)[:300]
            err = 10**median(pv[:, 7], 0)
        ax.errorbar(self.times, self.fluxes, err, fmt='.', c=cp[4], alpha=0.75)
        if model == 'de':
            ax.plot(self.times, self.flux_model(pv), c=cp[0])
        if model == 'mc':
            flux_pr = self.flux_model(fc[permutation(fc.shape[0])[:1000]])
            flux_pc = array(percentile(flux_pr, [50, 0.15, 99.85, 2.5, 97.5, 16, 84], 0))
            [ax.fill_between(self.times, *flux_pc[i:i+2, :], alpha=0.2, facecolor=cp[0]) for i in range(1, 6, 2)]
            ax.plot(self.times, flux_pc[0], c=cp[0])
        setp(ax, xlim=self.times[[0, -1]], xlabel='Time', ylabel='Normalised flux')
        return fig, ax
The priors are contained in a ParameterSet object from pytransit.param.parameter. ParameterSet is a utility class containing a function for calculating the joint prior, etc. We're using only two basic priors: a normal prior NP, for which $x \sim N(\mu,\sigma)$, and a uniform prior UP, for which $x \sim U(a,b)$.
We could use an informative prior on the planet-star area ratio (squared radius ratio) based on the observed NIR transit depth (see below). This is justified since the limb darkening, which affects the observed transit depth, is sufficiently weak in the NIR. If we didn't have NIR data, we would need to use either a significantly wider informative prior or an uninformative one.
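As an illustrative sketch (not part of the analysis below), such a prior could be set with the set_prior method of the LPFunction class defined above, after the lpf instance has been created later in the tutorial; the index 4 refers to the 'k2' parameter in the parametrisation, and the depth and width values here are made up:
lpf.set_prior(4, NP(0.01, 5e-4))  # hypothetical NIR-based area-ratio prior: mean 1%, sigma 5e-4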
The model has two components: a multiplicative constant baseline, and a transit shape modelled using the quadratic Mandel & Agol transit model implemented in PyTransit. The sampling parameterisation is different from the parameterisation used by the transit model, so we need to map the parameters from the sampling space to the model space. Also, we're keeping things simple and assuming a circular orbit. Eccentric orbits will be considered in later tutorials.
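As a quick, standalone illustration of the orbit-parameter mapping (the values below are made up; as_from_rhop and i_from_ba are the same PyTransit helpers used inside the class above):
rho, period, b = 1.4, 1.0, 0.2       # stellar density [g/cm^3], period [d], impact parameter
a = as_from_rhop(rho, period)        # scaled semi-major axis a / R_star
i = i_from_ba(b, a)                  # orbital inclination [rad] for a circular orbit
print(a, degrees(i))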
The limb darkening uses the parameterisation by Kipping (2013, MNRAS, 435(3), 2152–2160), where the quadratic limb darkening coefficients $u$ and $v$ are mapped from sampling parameters $q_1$ and $q_2$ as
$$ u = 2\sqrt{q_1}q_2, $$$$ v = \sqrt{q_1}(1-2q_2). $$This parameterisation allows us to use uniform priors from 0 to 1 to cover the whole physically sensible $(u,v)$-space.
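A small standalone sketch of this mapping and its inverse (the q1 and q2 values are arbitrary):
q1, q2 = 0.36, 0.4
u = 2*sqrt(q1)*q2                 # forward mapping
v = sqrt(q1)*(1 - 2*q2)
q1_back = (u + v)**2              # inverse mapping
q2_back = u / (2*(u + v))
print(u, v, q1_back, q2_back)     # 0.48 0.12 0.36 0.4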
The log likelihood calculation is carried out by the lnlike_normal_v function defined above, which evaluates the normal log likelihood for a set of parameter vectors, each with a single error value.
First we need to read in the (mock) observation data stored in obs_data.fits. The data corresponds to a single transit observed simultaneously in eight passbands (filters). The photometry is saved in extension 1 as a binary table, and we want to read the mid-exposure times and flux values corresponding to different passbands. The time is stored in the time column, and fluxes are stored in the f_wn_* columns, where * is the filter name.
In [36]:
dfile = Path('data').joinpath('obs_data.fits')
data = pf.getdata(dfile, ext=1)
flux_keys = [n for n in data.names if 'f_wn' in n]
filter_names = [k.split('_')[-1] for k in flux_keys]
time = data['time'].astype('d')
fluxes = [data[k].astype('d') for k in flux_keys]
print('Filter names: ' + ', '.join(filter_names))
First, let's have a quick look at our data, and plot the blue- and redmost passbands.
In [6]:
with sb.axes_style('white'):
    fig, axs = subplots(1, 2, figsize=(13, 5), sharey=True)
    axs[0].plot(time, fluxes[0], drawstyle='steps-mid', c=cp[0])
    axs[1].plot(time, fluxes[-1], drawstyle='steps-mid', c=cp[2])
    setp(axs, xlim=time[[0, -1]])
    fig.tight_layout()
Here we see what we'd expect to see. The stronger limb darkening in blue makes the bluemost transit round, while we can spot the end of ingress and the beginning of egress directly by eye from the redmost light curve. Also, the transit is deeper in u' than in Ks, which tells us that the impact parameter b is smallish (the transit would be deeper in red than in blue for large b).
First, we create an instance of the log posterior function with the redmost light curve data.
Next, we run the DE optimiser for de_iter iterations to clump the parameter vector population close to the global posterior maximum, use the DE population to initialise the emcee sampler, and run the sampler for mc_iter iterations to obtain a posterior sample.
In [56]:
npop, de_iter, mc_reps, mc_iter, thin = 100, 200, 3, 500, 10
lpf = LPFunction('Ks', time, fluxes[-1])
In [57]:
lpf.optimize(de_iter, npop)
In [58]:
lpf.plot_light_curve();
In [59]:
for i in range(mc_reps):
    lpf.sample(mc_iter, thin=thin, reset=True, label='MCMC sampling')
In [60]:
lpf.plot_light_curve('mc');
In [50]:
with sb.axes_style('white'):
    fig, axs = subplots(2, 4, figsize=(13, 5), sharex=True)
    ls, lc = ['-', '--', '--'], ['k', '0.5', '0.5']
    percs = [percentile(lpf.sampler.chain[:, :, i], [50, 16, 84], 0) for i in range(8)]
    [axs.flat[i].plot(lpf.sampler.chain[:, :, i].T, 'k', alpha=0.01) for i in range(8)]
    [[axs.flat[i].plot(percs[i][j], c=lc[j], ls=ls[j]) for j in range(3)] for i in range(8)]
    setp(axs, yticks=[], xlim=[0, mc_iter//10])
    fig.tight_layout()
Ok, everything looks good. The 16th, 50th and 84th percentiles of the parameter vector population are stable and don't show any significant long-term trends. Now we can flatten the individual chains into one long chain fc and calculate the median parameter vector.
In [51]:
fc = lpf.sampler.chain.reshape([-1,lpf.sampler.chain.shape[-1]])
mp = median(fc, 0)
Let's also plot the model and the data to see if this all makes sense. To do this, we calculate the conditional distribution of flux using the posterior samples (here, we're using a random subset of samples, although this isn't really necessary), and plot the distribution median and its median-centred 68%, 95%, and 99.7% central posterior intervals (corresponding approximately to 1, 2, and 3$\sigma$ intervals if the distribution is normal).
In [52]:
flux_pr = lpf.flux_model(fc[permutation(fc.shape[0])[:1000]])
flux_pc = array(percentile(flux_pr, [50, 0.15,99.85, 2.5,97.5, 16,84], 0))
In [53]:
with sb.axes_style('white'):
    zx1, zx2, zy1, zy2 = 0.958, 0.98, 0.9892, 0.992
    fig, ax = subplots(1, 1, figsize=(13, 4))
    cp = sb.color_palette()
    ax.errorbar(lpf.times, lpf.fluxes, 10**mp[7], fmt='.', c=cp[4], alpha=0.75)
    [ax.fill_between(lpf.times, *flux_pc[i:i+2, :], alpha=0.2, facecolor=cp[0]) for i in range(1, 6, 2)]
    ax.plot(lpf.times, flux_pc[0], c=cp[0])
    setp(ax, xlim=lpf.times[[0, -1]], xlabel='Time', ylabel='Normalised flux')
    fig.tight_layout()

    az = fig.add_axes([0.075, 0.18, 0.20, 0.46])
    ax.add_patch(Rectangle((zx1, zy1), zx2-zx1, zy2-zy1, fill=False, edgecolor='k', lw=1, ls='dashed'))
    [az.fill_between(lpf.times, *flux_pc[i:i+2, :], alpha=0.2, facecolor=cp[0]) for i in range(1, 6, 2)]
    az.plot(lpf.times, flux_pc[0], c=cp[0])
    setp(az, xlim=(zx1, zx2), ylim=(zy1, zy2), yticks=[], xticks=[])
We could (should) also plot the residuals, but I've left them out from the plot for clarity. The plot looks fine, and we can continue to have a look at the parameter estimates.
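For completeness, a minimal residual-plot sketch (using the median parameter vector mp and the plotting conventions from the cells above) could look something like this:
with sb.axes_style('white'):
    fig, ax = subplots(figsize=(13, 3))
    ax.plot(lpf.times, lpf.residuals(mp).ravel(), '.', c=cp[4], alpha=0.75)
    ax.axhline(0.0, c=cp[0], lw=1)
    setp(ax, xlim=lpf.times[[0, -1]], xlabel='Time', ylabel='Residual flux')
    fig.tight_layout()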
We start the analysis by making a Pandas data frame df, using df.describe to get an overview of the estimates, and plotting the posteriors for the most interesting parameters as violin plots.
In [66]:
pd.set_option('display.precision',4)
df = pd.DataFrame(data=fc.copy(), columns=lpf.ps.names)
df['k'] = sqrt(df.k2)
df['u'] = 2*sqrt(df.q1)*df.q2
df['v'] = sqrt(df.q1)*(1-2*df.q2)
df = df.drop('k2', axis=1)
df.describe()
In [69]:
with sb.axes_style('white'):
    fig, axs = subplots(2, 3, figsize=(13, 5))
    pars = 'tc rho b k u v'.split()
    [sb.violinplot(y=df[p], inner='quartile', ax=axs.flat[i]) for i, p in enumerate(pars)]
    [axs.flat[i].text(0.05, 0.9, p, transform=axs.flat[i].transAxes) for i, p in enumerate(pars)]
    setp(axs, xticks=[], ylabel='')
    fig.tight_layout()
While we're at it, let's plot some correlation plots. The limb darkening coefficients are correlated, and we'd also expect to see a correlation between the impact parameter and radius ratio.
In [70]:
corner(df[['k', 'rho', 'b', 'q1', 'q2']]);
Ok, now let's do the parameter estimation for all the filters. We wouldn't be doing separate per-filter parameter estimation in real life, since it's a much better use of the data to model all the light curves jointly (something that will be shown in a later tutorial). This will take some time...
In [17]:
chains = []
npop, de_iter, mc_iter, mc_burn, thin = 100, 200, 1500, 1000, 10

for name, flux in zip(filter_names, fluxes):
    lpf = LPFunction(name, time, flux)
    lpf.optimize(de_iter, npop)
    lpf.sample(mc_burn, thin=thin)
    lpf.sample(mc_iter, thin=thin, reset=True)
    chains.append(lpf.sampler.chain.reshape([-1, lpf.sampler.chain.shape[-1]]))
chains = array(chains)
In [18]:
ids = [list(repeat(filter_names, chains.shape[1])), 8*list(range(chains.shape[1]))]
dft = pd.DataFrame(data=concatenate([chains[i, :, :] for i in range(chains.shape[0])]),
                   index=ids, columns=lpf.ps.names)
dft['k'] = sqrt(dft.k2)
dft['u'] = 2*sqrt(dft.q1)*dft.q2
dft['v'] = sqrt(dft.q1)*(1-2*dft.q2)
dft = dft.drop('k2', axis=1)
The dataframe creation can probably be done in a nicer way, but we don't need to bother with that. The results are now in a multi-index dataframe, from where we can easily get the per-filter point estimates.
In [19]:
dft.loc['u'].describe()
In [20]:
with sb.axes_style('white'):
    fig, axs = subplots(2, 3, figsize=(13, 6), sharex=True)
    pars = 'tc rho u b k v'.split()
    for i, p in enumerate(pars):
        sb.violinplot(data=dft[p].unstack().T, inner='quartile', scale='width',
                      ax=axs.flat[i], order=filter_names)
        axs.flat[i].text(0.95, 0.9, p, transform=axs.flat[i].transAxes, ha='right')
    fig.tight_layout()
As it is, the posterior distributions for different filters agree well with each other. However, the uncertainty in the radius ratio estimate decreases towards redder wavelengths. This is due to the reduced limb darkening, which allows us to estimate the true geometric radius ratio more accurately.
Finally, let's print the parameter estimates for each filter. We'll print the posterior medians with uncertainty estimates based on the central 68% posterior intervals. This matches the posterior mean and its 1-$\sigma$ uncertainty if the posterior is normal (which isn't really the case for many of the posteriors here). In real life, you'd want to report separate + and - uncertainties for the asymmetric posteriors, etc.
In [21]:
def ms(df, p, f):
    p = array(percentile(df[p][f], [50, 16, 84]))
    return p[0], abs(p[1:] - p[0]).mean()

def create_row(df, f, pars):
    return ('<tr><td>{:}</td>'.format(f) +
            ''.join(['<td>{:5.4f} ± {:5.4f}</td>'.format(*ms(df, p, f)) for p in pars]) +
            '</tr>')

def create_table(df):
    pars = 'tc rho b k u v'.split()
    return ('<table style="width:100%"><th>Filter</th>' +
            ''.join(['<th>{:}</th>'.format(p) for p in pars]) +
            ''.join([create_row(df, f, pars) for f in filter_names]) +
            '</table>')
In [22]:
display(HTML(create_table(dft)))
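If we wanted the separate + and - uncertainties mentioned above, a minimal sketch using the same dft dataframe (here for the radius ratio in the Ks band, assuming 'Ks' is one of the filter names read from the data) would be:
med, lo, hi = percentile(dft['k']['Ks'], [50, 16, 84])
print('k = {:.4f} +{:.4f} -{:.4f}'.format(med, hi - med, med - lo))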