In [1]:
%matplotlib inline
%config InlineBackend.figure_format = "retina"

from matplotlib import rcParams
rcParams["savefig.dpi"] = 100
rcParams["font.size"] = 20

In their 2010 paper, Hogg, Myers & Bovy introduced an importance sampling method for hierarchical probabilistic inference. The motivation for their paper was the fact that some exoplanet researchers were willing to share posterior samplings for the parameters (period, eccentricity, mass, etc.) of their planetary systems even if they wouldn't share the raw data. Under these conditions, their method enables hierarchical inference of the exoplanet population using only the catalog of posterior samples.

It turns out that this method can be very useful for solving a lot of other problems in hierarchical inference. For example, I used it in my recent paper where we made inferences about the population of exoplanets orbiting Sun-like stars based on a noisy and incomplete catalog. The raw data (all of the light curves downloaded from the Kepler satellite) are publicly available, but the procedures used to mine the data for exoplanet signals and measure their properties are extremely computationally expensive. Therefore, in practice, it would be intractable to build a hierarchical model that speaks directly to the data; instead, we must rely on catalogs to ease the computational load of the inference, and the importance sampling approximation is a perfect tool for the task.

In this post, I'll present the basic idea of the importance sampling method and work through the toy model from Hogg et al. (2010): inferring the eccentricity distribution of exoplanets from a catalog of posterior samples.

The problem

The question that we're trying to answer in this inference problem is:

What can we say about the eccentricity distribution of exoplanets given noisy radial velocity measurements of $K$ targets?

And the graphical model for this problem is pretty simple:


In [2]:
import daft
pgm = daft.PGM([3, 1], node_unit=2, grid_unit=3.5, observed_style="inner")
pgm.add_node(daft.Node("theta", r"$\theta$", 0.5, 0.5, offset=[0, -2.5]))
pgm.add_node(daft.Node("w", r"$\{e_k\}$", 1.5, 0.5, offset=[0, -2.5]))
pgm.add_node(daft.Node("x", r"$\{x_k\}$", 2.5, 0.5, observed=True, offset=[0, -2.5]))
pgm.add_edge("theta", "w")
pgm.add_edge("w", "x")
pgm.render();


Notation   For the rest of this post, I'll be discussing constraints on the rate density $\Gamma_\theta (e) = \mathrm{d}N/\mathrm{d}e$, parameterized by $\theta$, given a set of observations $\{ x_k \}_{k=1}^K$. Each observation $x_k$ is the radial velocity (RV) curve (with an arbitrary number of elements) for the $k$-th system.

The importance sampling method

Instead of working with the observations directly, we'll take advantage of the fact that we have a catalog of measurements. Each element of the catalog is a description of the posterior probability:

$$ p(e_k\,|\,x_k,\,\alpha) = \frac{p(x_k\,|\,e_k)\,p(e_k\,|\,\alpha)}{p(x_k\,|\,\alpha)} $$

where $p(e_k\,|\,\alpha)$ is some interim prior that was chosen when the catalog was made. In particular, we'll assume that each posterior density is represented by a set of samples $\{ e_k^{(n)} \}_{n=1}^{N_k}$ where

$$ e_k^{(n)} \sim p(e_k\,|\,x_k,\,\alpha) \quad . $$
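
The key result from Hogg et al. (2010), restated here in the notation of this post (so treat it as a sketch of their argument rather than a quote), is that the likelihood for system $k$, with the eccentricity marginalized out under the population model, can be approximated using only these posterior samples by reweighting them from the interim prior to the population:

$$ p(x_k\,|\,\theta) = \int p(x_k\,|\,e_k)\,p(e_k\,|\,\theta)\,\mathrm{d}e_k \approx \frac{p(x_k\,|\,\alpha)}{N_k} \sum_{n=1}^{N_k} \frac{p(e_k^{(n)}\,|\,\theta)}{p(e_k^{(n)}\,|\,\alpha)} $$

where $p(e\,|\,\theta)$ is the normalized population density (proportional to $\Gamma_\theta(e)$) and the prefactor $p(x_k\,|\,\alpha)$ doesn't depend on $\theta$, so it can be dropped when sampling. This sum is exactly what we'll compute with logsumexp at the end of the post.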

Generating a synthetic catalog

Let's start by generating a synthetic catalog of eccentricities from a known underlying distribution. For the true distribution, I'll use a Beta distribution with the shape parameters suggested by Kipping (2013):


In [3]:
from scipy import stats
true_dist = stats.beta(0.867, 3.03)

Now let's generate 150 eccentricities from this distribution and compare a histogram of these noiseless values to the true density:


In [5]:
import numpy as np
import matplotlib.pyplot as pl

# Generate the true list of eccentricities.
np.random.seed(1234)
true_e = true_dist.rvs(size=150)

# And plot a histogram of their distribution.
bin_edges = np.linspace(0, 1, 11)
pl.hist(true_e, bin_edges, color="k", histtype="step", density=True)

# And here's a plot of the true population.
x = np.linspace(0, 1, 5000)
pl.plot(x, true_dist.pdf(x), "k", lw=2)
pl.xlabel("$e$")
pl.ylabel("$p(e)$")
pl.gca().axhline(0, color="k", alpha=0.2)
pl.xlim(0, 1)
pl.ylim(-0.3, 4.3)
pl.title("true values");


Then, we'll simulate noisy measurements by perturbing the true eccentricities (keeping them in the physical range $0 < e < 1$) and generate synthetic posterior samples under an interim prior that is uniform between 0 and 1:


In [6]:
sigma = np.random.uniform(0.1, 0.2, size=len(true_e))
ml_e = np.empty_like(true_e)
post_e = np.empty((len(true_e), 1000))
for i, e in enumerate(true_e):
    # Generate the "maximum likelihood" value, rejecting values outside
    # the physical range (0, 1) so the Beta likelihood below stays finite.
    while True:
        ml_e[i] = e + sigma[i] * np.random.randn()
        if 0.0 < ml_e[i] < 1.0:
            break

    # Generate fake posterior samples: a Gaussian centered on the maximum
    # likelihood value, rejection sampled to stay in (0, 1), consistent
    # with the uniform interim prior.
    m = np.ones_like(post_e[i], dtype=bool)
    while np.any(m):
        post_e[i, m] = ml_e[i] + sigma[i] * np.random.randn(m.sum())
        m = (post_e[i] <= 0) | (post_e[i] >= 1)
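
As an aside, the rejection loop above is just drawing from a normal distribution truncated to the unit interval, so the same kind of samples could be generated in one shot with scipy.stats.truncnorm. This is only a sketch of that equivalent approach (the name post_e_alt is illustrative), not the code used above:


In [ ]:
# Equivalent sketch: draw the truncated-normal samples directly.
# The first two arguments of truncnorm are the truncation bounds
# measured in units of sigma away from the mean.
lo = (0.0 - ml_e) / sigma
hi = (1.0 - ml_e) / sigma
post_e_alt = stats.truncnorm.rvs(lo[:, None], hi[:, None],
                                 loc=ml_e[:, None], scale=sigma[:, None],
                                 size=(len(true_e), 1000))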

And here's a plot of the maximum likelihood measurements compared to the true density:


In [8]:
pl.hist(ml_e, bin_edges, color="k", histtype="step", density=True)
pl.plot(x, true_dist.pdf(x), "--k")
pl.xlabel("$e$")
pl.ylabel("$p(e)$")
pl.gca().axhline(0, color="k", alpha=0.2)
pl.xlim(0, 1)
pl.ylim(-0.3, 4.3)
pl.title("maximum likelihood values");



In [32]:
import emcee
import triangle
from scipy.special import logsumexp


Ignoring uncertainties

As a baseline, let's first fit the Beta distribution directly to the maximum likelihood eccentricities, pretending that these values have no uncertainties. The parameters $\theta$ are the logarithms of the two Beta shape parameters:

In [38]:
def lnprob(theta):
    # Log-probability of the catalog, treating the maximum likelihood
    # eccentricities as exact measurements.
    a, b = np.exp(theta)
    lnp = stats.beta.logpdf(ml_e, a, b)
    return lnp.sum()
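
As a quick sanity check (this cell isn't in the original draft), the log probability should be finite when evaluated at the true shape parameters:


In [ ]:
# The log-probability at the true Beta parameters; should be finite.
print(lnprob(np.log(true_dist.args)))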

In [39]:
ndim, nwalkers = 2, 32
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob)

In [40]:
p0 = [np.log(true_dist.args) + 0.1 * np.random.randn(ndim) for i in range(nwalkers)]

In [41]:
p0, _, _ = sampler.run_mcmc(p0, 100)

In [42]:
sampler.reset()

In [43]:
p0, _, _ = sampler.run_mcmc(p0, 1000)

In [44]:
fig, axes = pl.subplots(2, 1, sharex=True)
axes[0].plot(sampler.chain[:, :, 0].T, color="k", alpha=0.3)
axes[1].plot(sampler.chain[:, :, 1].T, color="k", alpha=0.3);



In [45]:
triangle.corner(sampler.flatchain, truths=np.log(true_dist.args));



In [46]:
samples = sampler.flatchain
for sample in samples[np.random.randint(len(samples), size=50)]:
    pl.plot(x, stats.beta.pdf(x, *np.exp(sample)), "k", alpha=0.1)
pl.plot(x, true_dist.pdf(x), "--k")
pl.xlabel("$e$")
pl.ylabel("$p(e)$")
pl.xlim(0, 1)
pl.ylim(0, 5);


Including uncertainties

To include the measurement uncertainties, we use the importance sampling approximation written down above. The interim prior is uniform on $[0,\,1]$, so the weights are just the population density evaluated at the posterior samples and, dropping constants that don't depend on $\theta$, the marginalized likelihood for each system becomes

$$ p(x_k\,|\,\theta) \propto \frac{1}{N_k} \sum_{n=1}^{N_k} p(e_k^{(n)}\,|\,\theta) \quad . $$

The full log probability is the sum over systems of the logs of these terms, which we can compute stably using logsumexp:


In [65]:
def lnprob(theta):
    # For each system, average the population density over its posterior
    # samples (the uniform interim prior cancels), then sum the logs over
    # systems. The constant 1/N_k factor is dropped.
    a, b = np.exp(theta)
    lnp = stats.beta.logpdf(post_e, a, b)
    return logsumexp(lnp, axis=1).sum()

In [66]:
ndim, nwalkers = 2, 32
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob)

In [67]:
p0 = np.log(true_dist.args)[None, :] + 0.1 * np.random.randn(nwalkers, ndim)

In [68]:
p0, _, _ = sampler.run_mcmc(p0, 100)

In [69]:
sampler.reset()

In [70]:
p0, _, _ = sampler.run_mcmc(p0, 1000)

In [71]:
fig, axes = pl.subplots(2, 1, sharex=True)
axes[0].plot(sampler.chain[:, :, 0].T, color="k", alpha=0.3)
axes[1].plot(sampler.chain[:, :, 1].T, color="k", alpha=0.3);