In [1]:
%run notebook_setup

Reproducing the black hole discovery in Thompson et al. 2019

In this science demo tutorial, we will reproduce the results in Thompson et al. 2019, who found and followed-up a candidate stellar-mass black hole companion to a giant star in the Milky Way. We will first use The Joker to constrain the orbit of the system using the TRES follow-up radial velocity data released in their paper and show that we get consistent period and companion mass constraints from modeling these data. We will then do a joint analysis of the TRES and APOGEE data for this source by simultaneously fitting for and marginalizing over an unknown constant velocity offset between the two surveys.

A bunch of imports we will need later:


In [2]:
from astropy.io import ascii
from astropy.time import Time
import astropy.units as u
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np

import pymc3 as pm
import exoplanet.units as xu
import exoplanet as xo
import corner

import thejoker as tj
from twobody.transforms import get_m2_min

In [3]:
# set up a random number generator to ensure reproducibility
seed = 42
rnd = np.random.default_rng(seed=seed)

Load the data

We will start by loading data, copy-pasted from Table S2 in Thompson et al. 2019):


In [4]:
tres_tbl = ascii.read(
    """8006.97517 0.000 0.075
    8023.98151 -43.313 0.075
    8039.89955 -27.963 0.045
    8051.98423 10.928 0.118
    8070.99556 43.782 0.075
    8099.80651 -30.033 0.054
    8106.91698 -42.872 0.135
    8112.81800 -44.863 0.088
    8123.79627 -25.810 0.115
    8136.59960 15.691 0.146
    8143.78352 34.281 0.087""", 
    names=['HJD', 'rv', 'rv_err'])
tres_tbl['rv'].unit = u.km/u.s
tres_tbl['rv_err'].unit = u.km/u.s

In [5]:
apogee_tbl = ascii.read(
    """6204.95544 -37.417 0.011
    6229.92499 34.846 0.010
    6233.87715 42.567 0.010""", 
    names=['HJD', 'rv', 'rv_err'])
apogee_tbl['rv'].unit = u.km/u.s
apogee_tbl['rv_err'].unit = u.km/u.s

In [6]:
tres_data = tj.RVData(
    t=Time(tres_tbl['HJD'] + 2450000, format='jd', scale='tcb'),
    rv=u.Quantity(tres_tbl['rv']), 
    rv_err=u.Quantity(tres_tbl['rv_err']))

apogee_data = tj.RVData(
    t=Time(apogee_tbl['HJD'] + 2450000, format='jd', scale='tcb'),
    rv=u.Quantity(apogee_tbl['rv']), 
    rv_err=u.Quantity(apogee_tbl['rv_err']))

Let's now plot the data from these two instruments:


In [7]:
for d, name in zip([tres_data, apogee_data], ['TRES', 'APOGEE']):
    d.plot(color=None, label=name)
plt.legend(fontsize=18)


Out[7]:
<matplotlib.legend.Legend at 0x112479670>
findfont: Font family ['serif'] not found. Falling back to DejaVu Sans.
findfont: Font family ['cursive'] not found. Falling back to DejaVu Sans.
findfont: Font family ['serif'] not found. Falling back to DejaVu Sans.

Run The Joker with just the TRES data

The two data sets are separated by a large gap in observations between the end of APOGEE and the start of the RV follow-up with TRES. Since there are more observations with TRES, we will start by running The Joker with just data from TRES before using all of the data. Let's plot the TRES data alone:


In [8]:
_ = tres_data.plot()


It is pretty clear that there is a periodic signal in the data, with a period between 10s to ~100 days (from eyeballing the plot above), so this limits the range of periods we need to sample over with The Joker below. The reported uncertainties on the individual RV measurements (plotted above, I swear) are all very small (typically smaller than the markers). So, we may want to allow for the fact that these could be under-estimated. With The Joker, we support this by accepting an additional nonlinear parameter, s, that specifies a global, extra uncertainty that is added in quadrature to the data uncertainties while running the sampler. That is, the uncertainties used for computing the likelihood in The Joker are computed as: $$ \sigma_n = \sqrt{\sigma_{n,0}^2 + s^2} $$ where $\sigma_{n,0}$ are the values reported for each $n$ data point in the tables above. We'll use a log-normal prior on this extra error term, but will otherwise use the default prior form for The Joker:


In [9]:
with pm.Model() as model:
    # Allow extra error to account for under-estimated error bars
    s = xu.with_unit(pm.Lognormal('s', -2, 1),
                     u.km/u.s)
    
    prior = tj.JokerPrior.default(
        P_min=16*u.day, P_max=128*u.day,  # Range of periods to consider
        sigma_K0=30*u.km/u.s, P0=1*u.year,  # scale of the prior on semiamplitude, K
        sigma_v=25*u.km/u.s,  # std dev of the prior on the systemic velocity, v0
        s=s
    )

With the prior set up, we can now generate prior samples, and run the rejection sampling step of The Joker:


In [10]:
# Generate a large number of prior samples:
prior_samples = prior.sample(size=1_000_000,
                             random_state=rnd)

In [11]:
# Run rejection sampling with The Joker:
joker = tj.TheJoker(prior, random_state=rnd)
samples = joker.rejection_sample(tres_data, prior_samples,
                                 max_posterior_samples=256)
samples


Out[11]:
<JokerSamples [P, e, omega, M0, s, K, v0] (1 samples)>

Only 1 sample is returned from the rejection sampling step - let's see how well it matches the data:


In [12]:
_ = tj.plot_rv_curves(samples, data=tres_data)


Let's look at the values of the sample that was returned, and compare that to the values reported in Thompson et al. 2019, included below for convenience: $$ P = 83.205 \pm 0.064\\ e = 0.00476 \pm 0.00255\\ K = 44.615 \pm 0.123 $$


In [13]:
samples.tbl['P', 'e', 'K']


Out[13]:
QTable length=1
PeK
dkm / s
float64float64float64
82.828112349573940.007054391624082691-44.374468266292205

Already these look very consistent with the values inferred in the paper!

Let's now also plot the data phase-folded on the period returned in the one sample we got from The Joker:


In [14]:
_ = tres_data.plot(phase_fold=samples[0]['P'])


WARNING: AstropyDeprecationWarning: The truth value of a Quantity is ambiguous. In the future this will raise a ValueError. [astropy.units.quantity]

At this point, since the data are very constraining, we could use this one Joker sample to initialize standard MCMC to generate posterior samplings in the orbital parameters for this system. We will do that below, but first let's see how things look if we include both TRES and APOGEE data in our modeling.

Run The Joker with TRES+APOGEE data

One of the challenges with incorporating data from the two surveys is that they were taken with two different spectrographs, and there could be instrumental offsets that manifest as shifts in the absolute radial velocities measured between the two instruments. The Joker now supports simultaneously sampling over additional parameters that represent instrumental or calibratrion offsets, so let's take a look at how to run The Joker in this mode.

To start, we can pack the two datasets into a single list that contains data from both surveys:


In [15]:
data = [apogee_data, tres_data]

Before we run anything, let's try phase-folding both datasets on the period value we got from running on the TRES data alone:


In [16]:
tres_data.plot(color=None, phase_fold=np.mean(samples['P']))
apogee_data.plot(color=None, phase_fold=np.mean(samples['P']))


Out[16]:
<matplotlib.axes._subplots.AxesSubplot at 0x1122e4e80>

That looks pretty good, but the period is clearly slightly off and there seems to be a constant velocity offset between the two surveys, given that the APOGEE RV points don't seem to lie in the RV curve. So, let's now try running The Joker on the joined dataset!

To allow for an unknown constant velocity offset between TRES and APOGEE, we have to define a new parameter for this offset and specify a prior. We'll put a Gaussian prior on this offset parameter (named dv0_1 below), with a mean of 0 and a standard deviation of 10 km/s, because it doesn't look like the surveys have a huge offset.


In [17]:
with pm.Model() as model:
    # The parameter that represents the constant velocity offset between
    # APOGEE and TRES:
    dv0_1 = xu.with_unit(pm.Normal('dv0_1', 0, 5.),
                         u.km/u.s)
    
    # The same extra uncertainty parameter as previously defined
    s = xu.with_unit(pm.Lognormal('s', -2, 1),
                     u.km/u.s)
    
    # We can restrict the prior on prior now, using the above
    prior_joint = tj.JokerPrior.default(
        # P_min=16*u.day, P_max=128*u.day,
        P_min=75*u.day, P_max=90*u.day,
        sigma_K0=30*u.km/u.s, P0=1*u.year,
        sigma_v=25*u.km/u.s,
        v0_offsets=[dv0_1],
        s=s
    )
    
prior_samples_joint = prior_joint.sample(size=10_000_000, 
                                         random_state=rnd)

In [18]:
# Run rejection sampling with The Joker:
joker_joint = tj.TheJoker(prior_joint, random_state=rnd)
samples_joint = joker_joint.rejection_sample(data, 
                                             prior_samples_joint,
                                             max_posterior_samples=256)
samples_joint


Out[18]:
<JokerSamples [P, e, omega, M0, s, K, v0, dv0_1] (1 samples)>

Here we again only get one sample back from The Joker, because these ata are so constraining:


In [19]:
_ = tj.plot_rv_curves(samples_joint, data=data)


Now, let's fire up standard MCMC, using the one Joker sample to initialize. We will use the NUTS sampler in pymc3 to run here. When running MCMC to model radial velocities with Keplerian orbits, it is typically important to think about the parametrization. There are several angle parameters in the two-body problem (e.g., argument of pericenter, phase, inclination, etc.) that can be especially hard to sample over naïvely. Here, for running MCMC, we will instead sample over $M_0 - \omega, \omega$ instead of $M_0, \omega$, and we will define these angles as exoplanet.distributions.Angle distributions, which internally transform and sample in $\cos{x}, \sin{x}$ instead:


In [20]:
from exoplanet.distributions import Angle

with pm.Model():
    
    # See note above: when running MCMC, we will sample in the parameters
    # (M0 - omega, omega) instead of (M0, omega)
    M0_m_omega = xu.with_unit(Angle('M0_m_omega'), u.radian)
    omega = xu.with_unit(Angle('omega'), u.radian)
    # M0 = xu.with_unit(Angle('M0'), u.radian)
    M0 = xu.with_unit(pm.Deterministic('M0', M0_m_omega + omega),
                      u.radian)
    
    # The same offset and extra uncertainty parameters as above:
    dv0_1 = xu.with_unit(pm.Normal('dv0_1', 0, 5.), u.km/u.s)
    s = xu.with_unit(pm.Lognormal('s', -2, 0.5),
                     u.km/u.s)
    
    prior_mcmc = tj.JokerPrior.default(
        P_min=16*u.day, P_max=128*u.day,
        sigma_K0=30*u.km/u.s, P0=1*u.year,
        sigma_v=25*u.km/u.s,
        v0_offsets=[dv0_1],
        s=s,
        pars={'M0': M0, 'omega': omega}
    )
    
    joker_mcmc = tj.TheJoker(prior_mcmc, random_state=rnd)
    mcmc_init = joker_mcmc.setup_mcmc(data, samples_joint)
    
    trace = pm.sample(
        tune=1000, draws=1000,
        start=mcmc_init,
        step=xo.get_dense_nuts_step(target_accept=0.95),
        random_seed=seed)


Multiprocess sampling (4 chains in 4 jobs)
NUTS: [v0, K, P, e, s, dv0_1, omega, M0_m_omega]
Sampling 4 chains, 0 divergences: 100%|██████████| 8000/8000 [01:39<00:00, 80.32draws/s] 

We can now use pymc3 to look at some statistics of the MC chains to assess convergence:


In [21]:
pm.summary(trace, var_names=prior_mcmc.par_names)


Out[21]:
mean sd hpd_3% hpd_97% mcse_mean mcse_sd ess_mean ess_sd ess_bulk ess_tail r_hat
P 83.160 0.014 83.133 83.185 0.000 0.000 1356.0 1356.0 1353.0 1983.0 1.0
e 0.003 0.002 0.000 0.007 0.000 0.000 1325.0 1325.0 1050.0 1027.0 1.0
omega 0.509 1.032 -1.231 3.124 0.025 0.028 1713.0 667.0 1895.0 1064.0 1.0
M0 0.008 1.034 -1.731 2.622 0.025 0.029 1699.0 637.0 1895.0 1011.0 1.0
s 0.219 0.055 0.129 0.322 0.001 0.001 1785.0 1785.0 1829.0 2308.0 1.0
K -44.720 0.102 -44.908 -44.529 0.002 0.002 1998.0 1998.0 2017.0 2045.0 1.0
v0 2.178 0.573 1.182 3.346 0.015 0.011 1382.0 1382.0 1374.0 2133.0 1.0
dv0_1 -2.632 0.590 -3.775 -1.538 0.016 0.011 1355.0 1355.0 1341.0 2047.0 1.0

We can then transform the MCMC samples back into a JokerSamples instance so we can manipulate and visualize the samples:


In [22]:
mcmc_samples = joker_mcmc.trace_to_samples(trace, data=data)
mcmc_samples.wrap_K()


Out[22]:
<JokerSamples [P, e, omega, M0, s, K, v0, dv0_1] (4000 samples)>

For example, we can make a corner plot of the orbital parameters (note the strong degenceracy between M0 and omega! But also note that we don't sample in these parameters explicitly, so this shouldn't affect convergence):


In [23]:
df = mcmc_samples.tbl.to_pandas()
_ = corner.corner(df)


We can also use the median MCMC sample to fold the data and plot residuals relative to our inferred RV model:


In [24]:
fig, axes = plt.subplots(2, 1, figsize=(6, 8), sharex=True)

_ = tj.plot_phase_fold(mcmc_samples.median(), data, ax=axes[0], add_labels=False)
_ = tj.plot_phase_fold(mcmc_samples.median(), data, ax=axes[1], residual=True)

for ax in axes:
    ax.set_ylabel(f'RV [{apogee_data.rv.unit:latex_inline}]')
    
axes[1].axhline(0, zorder=-10, color='tab:green', alpha=0.5)
axes[1].set_ylim(-1, 1)


Out[24]:
(-1.0, 1.0)

Finally, let's convert our orbit samples into binary mass function, $f(M)$, values to compare with one of the main conclusions of the Thompson et al. paper. We can do this by first converting the samples to KeplerOrbit objects, and then using the .m_f attribute to get the binary mass function values:


In [ ]:
mfs = u.Quantity([mcmc_samples.get_orbit(i).m_f
                  for i in np.random.choice(len(mcmc_samples), 1024)])
plt.hist(mfs.to_value(u.Msun), bins=32);
plt.xlabel(rf'$f(M)$ [{u.Msun:latex_inline}]');

# Values from Thompson et al., showing 1-sigma region
plt.axvline(0.766, zorder=100, color='tab:orange')
plt.axvspan(0.766 - 0.00637,
            0.766 + 0.00637,
            zorder=10, color='tab:orange', 
            alpha=0.4, lw=0)

In the end, using both the APOGEE and TRES data, we confirm the results from the paper, and find that the binary mass function value suggests a large mass companion. A success for reproducible science!