In [ ]:
import numpy as np
import pandas as pd
from scipy.optimize import minimize
import emcee
import george
from george import kernels
import corner
import matplotlib.pyplot as plt
%matplotlib notebook
In this lecture we will examine alternative methods to search for periodic signals in astronomical time series. The problems focus in particular on a relatively new technique: modeling the periodic behavior as a Gaussian Process (GP) and then sampling the posterior via Markov Chain Monte Carlo (MCMC) to identify the optimal period. Much of this work has been pioneered by previous DSFP lecturer Suzanne Aigrain.
The string length method (Dworetsky 1983) phase folds the data at trial periods and then minimizes the distance needed to connect the phase-ordered observations.
Phase Dispersion Minimization (PDM; Jurkevich 1971, Stellingwerf 1978), like LS, folds the data at a large number of trial frequencies $f$.
The phased data are then binned, and the variance is calculated in each bin, combined, and compared to the overall variance of the signal. No functional form of the signal is assumed, and thus, non-sinusoidal signals can be found.
Challenge: how to select the number of bins?
Analysis of Variance (AOV; Schwarzenberg-Czerny 1989) is similar to PDM. Optimal periods are defined via hypothesis testing, and these methods are found to perform best for certain types of astronomical signals.
Supersmoother (Reimann) is a least-squares approach wherein a flexible, non-parametric model is fit to the folded observations at many trial frequencies. The flexible model reduces aliasing issues relative to models that assume a sinusoidal shape; however, this comes at the cost of considerable computational time.
Conditional Entropy (CE; Graham et al. 2013), and other entropy-based methods, aim to minimize the entropy in binned (normalized magnitude, phase) space. CE, in particular, is good at suppressing signals due to the window function.
When tested on real observations, CE outperforms most of the alternatives (e.g., LS, PDM, etc.).
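To make one of these classical approaches concrete, the cell below sketches a minimal phase dispersion minimization statistic in numpy. The function name pdm_theta, the default bin count, and the commented trial-period grid are illustrative choices, not taken from any of the implementations cited above.
In [ ]:
def pdm_theta(t, y, period, nbins=10):
    """Ratio of the pooled within-bin variance to the total variance.

    Smaller values indicate less scatter in the folded light curve,
    so the best trial period minimizes this statistic.
    """
    phase = (t / period) % 1
    bin_idx = np.floor(phase * nbins).astype(int)
    total_var = np.var(y, ddof=1)
    bin_vars, bin_dof = [], []
    for b in range(nbins):
        in_bin = y[bin_idx == b]
        if len(in_bin) > 1:
            bin_vars.append(np.var(in_bin, ddof=1))
            bin_dof.append(len(in_bin) - 1)
    pooled_var = np.average(bin_vars, weights=bin_dof)
    return pooled_var / total_var

# usage sketch (t, y = observation times and magnitudes):
# trial_periods = np.linspace(0.1, 10, 100000)
# theta = [pdm_theta(t, y, p) for p in trial_periods]
# best_period = trial_periods[np.argmin(theta)]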
There have been some efforts to frame the period-finding problem in a Bayesian framework. Bretthorst 1988 developed Bayesian generalized LS models, while Gregory & Loredo 1992 applied Bayesian techniques to phase-binned models.
More recently, efforts to use Gaussian processes (GPs) to model and extract a period from the light curve have been developed (Wang et al. 2012). These methods have proved to be especially useful for detecting stellar rotation in Kepler light curves (Angus et al. 2018).
[Think of Suzanne's lectures during session 4]
For this lecture we will focus on the use of GPs, combined with an MCMC analysis (and we will take some shortcuts in the interest of time), to identify periodic signals in astronomical data.
Problem 1a
Adjust the variable ncores
to match the number of CPUs on your machine.
In [ ]:
ncores = # adjust to number of CPUs on your machine
np.random.seed(23)
Problem 1b
Create a function gen_periodic_data
that returns
$$y = C + A\cos\left(\frac{2\pi x}{P}\right) + \sigma_y$$
where $C$, $A$, and $P$ are constants, $x$ is the input data, and $\sigma_y$ represents Gaussian noise.
Hint - this should only require a minor adjustment to your function from lecture 1.
In [ ]:
def gen_periodic_data( # complete
y = # complete
return y
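One possible completion of the cell above (a sketch: the keyword names period, amplitude, offset, and noise, and the cosine form written in the problem statement, are assumptions - adapt them to match your function from lecture 1):
In [ ]:
def gen_periodic_data(x, period=1, amplitude=1, offset=0, noise=0):
    '''Return y = C + A*cos(2 pi x / P) plus Gaussian noise with variance `noise`'''
    y = (offset + amplitude * np.cos(2 * np.pi * x / period)
         + np.random.normal(0, np.sqrt(noise), size=len(x)))
    return y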
Problem 1c
Later, we will be using MCMC. Execute the following cell which will plot the chains from emcee
to follow the MCMC walkers.
In [ ]:
def plot_chains(sampler, nburn, paramsNames):
Nparams = len(paramsNames) # + 1
fig, ax = plt.subplots(Nparams,1, figsize = (8,2*Nparams), sharex = True)
fig.subplots_adjust(hspace = 0)
ax[0].set_title('Chains')
xplot = range(len(sampler.chain[0,:,0]))
for i,p in enumerate(paramsNames):
for w in range(sampler.chain.shape[0]):
ax[i].plot(xplot[:nburn], sampler.chain[w,:nburn,i], color="0.5", alpha = 0.4, lw = 0.7, zorder = 1)
ax[i].plot(xplot[nburn:], sampler.chain[w,nburn:,i], color="k", alpha = 0.4, lw = 0.7, zorder = 1)
ax[i].set_ylabel(p)
fig.tight_layout()
return ax
Problem 1d
Using gen_periodic_data
generate 250 observations taken at random times between 0 and 10, with $C = 10$, $A = 2$, $P = 0.4$, and variance of the noise = 0.1. Create an uncertainty array dy
with the same length as y
and each value equal to $\sqrt{0.1}$.
Plot the resulting data over the exact (noise-free) signal.
In [ ]:
x = # complete
y = # complete
dy = # complete
# complete
fig, ax = plt.subplots()
ax.errorbar( # complete
ax.plot( # complete
# complete
# complete
fig.tight_layout()
A common approach$^\dagger$ in the literature for problems where there is good reason to place a strong prior on the signal (i.e., to fit only a single model) is maximum likelihood optimization [this is sometimes also called $\chi^2$ minimization].
$^\dagger$The fact that this approach is commonly used does not mean it should be commonly used.
In this case, where we are fitting for a known signal in simulated data, we are justified in assuming an extremely strong prior and fitting a sinusoidal model to the data.
Problem 2a
Write a function, correct_model
, that returns the expected signal for our data given input time $t$:
$$f(t) = a + b\cos\left(\frac{2\pi t}{c}\right)$$
where $a$, $b$, and $c$ are model parameters.
Hint - store the model parameters in a single variable (this will make things easier later).
In [ ]:
def correct_model( # complete
# complete
return # complete
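A sketch of one way to complete the cell above, with theta packed as (a, b, c) = (offset, amplitude, period) to match the initial guesses used below:
In [ ]:
def correct_model(theta, t):
    '''Sinusoidal model f(t) = a + b*cos(2 pi t / c)'''
    a, b, c = theta
    return a + b * np.cos(2 * np.pi * t / c)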
For these data the log likelihood of the data can be written (up to an additive constant) as:
$$\ln \mathcal{L} = -\frac{1}{2} \sum \left(\frac{y - f(t)}{\sigma_y}\right)^2$$
Ultimately, it is easier to minimize the negative log likelihood, so we will do that.
Problem 2b
Write a function, lnlike1
, that returns the log likelihood for the data given model parameters $\theta$, and $t, y, \sigma_y$.
Write a second function, nll
, that returns the negative log likelihood.
In [ ]:
def lnlike1( # complete
return # complete
def nll( # complete
return # complete
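A possible completion of the cell above (a sketch that assumes the correct_model signature from problem 2a):
In [ ]:
def lnlike1(theta, t, y, dy):
    '''Log likelihood (up to an additive constant) for the sinusoidal model'''
    return -0.5 * np.sum(((y - correct_model(theta, t)) / dy)**2)

def nll(theta, t, y, dy):
    '''Negative log likelihood, in the form scipy.optimize.minimize expects'''
    return -1 * lnlike1(theta, t, y, dy)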
Problem 2c
Use the minimize
function from scipy.optimize
to determine maximum likelihood estimates for the model parameters for the data simulated in problem 1d. What is the best fit period?
The optimization routine requires an initial guess for the model parameters, use 10 for the offset, 1 for the amplitude of variations, and 0.39 for the period.
Hint - as arguments, minimize
takes the function, nll
, the initial guess, and optional keyword args
, which should be (x, y, dy)
in this case.
In [ ]:
initial_theta = # complete
res = minimize( # complete
print("The maximum likelihood estimate for the period is: {:.5f}".format( # complete
Problem 2d
Plot the input model, the noisy data, and the maximum likelihood model.
How does the model fit look?
In [ ]:
fig, ax = plt.subplots()
ax.errorbar( # complete
ax.plot( # complete
ax.plot( # complete
ax.set_xlabel('x')
ax.set_ylabel('y')
fig.tight_layout()
Problem 2e
Repeat the maximum likelihood optimization, but this time use an initial guess of 10 for the offset, 1 for the amplitude of variations, and 0.393 for the period.
In [ ]:
initial_theta = # complete
res = minimize( # complete
print("The ML estimate for a, b, c is: {:.5f}, {:.5f}, {:.5f}".format( # complete
Given the lecture order this is a little late, but we have now identified the fundamental challenge in identifying periodic signals in astrophysical observations:
periodic models are highly non-linear!
This can easily be seen in the LS periodograms from the previous lecture: period estimates essentially need to be perfect to properly identify the signal. Take, for instance, the previous example, where adjusting the initial guess for the period by less than 1% made the difference between a correct estimate and a catastrophic error.
This also means that classic optimization procedures (e.g., gradient descent) are helpless for this problem. If you guess the wrong period, there is no obvious way to know whether the subsequent guess should use a larger or smaller period.
Given our lack of success with maximum likelihood techniques, we will now attempt a Bayesian approach. As a brief reminder, Bayes theorem tells us that:
$$P(\theta|X) \propto P(X|\theta) P(\theta).$$
In words, the posterior probability is proportional to the likelihood multiplied by the prior. We will use sampling techniques, MCMC, to estimate the posterior.
Remember - we already calculated the likelihood above.
Problem 3a
Write a function lnprior1
to calculate the log of the prior on $\theta$. Use a reasonable, wide and flat prior for all the model parameters.
Hint - for emcee
the log prior should return 0 within the prior and $-\infty$ otherwise.
In [ ]:
def lnprior1( # complete
a, b, c = # complete
if # complete
return 0.0
return -np.inf
Problem 3b
Write a function lnprob1
to calculate the log of the posterior probability. This function should take $\theta$ and x, y, dy
as inputs.
In [ ]:
def lnprob1( # complete
lp = lnprior1(theta)
if np.isfinite(lp):
return # complete
return -np.inf
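For reference, sketches of one way to complete the two cells above. The prior bounds are arbitrary choices that are simply wide enough to contain the parameters used to simulate the data:
In [ ]:
def lnprior1(theta):
    a, b, c = theta
    # wide, flat prior on the offset, amplitude, and period
    if 0 < a < 20 and 0 < b < 10 and 0.01 < c < 10:
        return 0.0
    return -np.inf

def lnprob1(theta, x, y, dy):
    # log posterior = log prior + log likelihood (up to a constant)
    lp = lnprior1(theta)
    if np.isfinite(lp):
        return lp + lnlike1(theta, x, y, dy)
    return -np.inf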
Problem 3c
Initialize the walkers for emcee
, which we will use to draw samples from the posterior. Like before, we need to include an initial guess (the parameters of which don't matter much beyond the period). Start with a guess of 0.6 for the period.
As a quick reminder, emcee
is a pure Python implementation of Goodman & Weare's affine-invariant Markov Chain Monte Carlo (MCMC) ensemble sampler. emcee
seeds several "walkers", which are members of the ensemble. You can think of each walker as its own Metropolis-Hastings chain, but the key detail is that the chains are not independent: the proposal distribution for each new step depends upon the positions of all the other walkers in the ensemble.
Choosing the initial position for each of the walkers does not significantly affect the final results (though it will affect the burn-in time). Standard procedure is to create the walkers in a small ball around a reasonable guess [they will quickly explore beyond the extent of the initial ball].
In [ ]:
guess = [10, 1, 0.6]
ndim = len(guess)
nwalkers = 100
p0 = [np.array(guess) + 1e-8 * np.random.randn(ndim)
for i in range(nwalkers)]
sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob1, args=(x, y, dy), threads = ncores)
Problem 3d
Run the walkers through 1000 steps.
Hint - The run_mcmc
method on the sampler
object may be useful.
In [ ]:
sampler.run_mcmc( # complete
Problem 3e
Use the previously created plot_chains
helper function to plot the chains from the MCMC sampling. Note - you may need to adjust nburn
after examining the chains.
Have your chains converged? Will extending the chains improve this?
In [ ]:
params_names = # complete
nburn = # complete
plot_chains( # complete
Problem 3f
Make a corner plot (use corner
) to examine the post burn-in samples from the MCMC chains.
In [ ]:
samples = sampler.chain[:, nburn:, :].reshape((-1, ndim))
fig = # complete
As you can see - force-feeding this problem into a Bayesian framework does not automatically generate more reasonable answers. While some of the chains appear to have identified periods close to the correct value, most of them are stuck in local maxima of the posterior far from the true period.
There are sampling techniques designed to handle multimodal posteriors, but the non-linear nature of this problem makes it difficult for the various walkers to explore the full parameter space in the way that we would like.
A GP is fully specified by a mean function and a covariance matrix $K$. In this case, we wish to model the simulated data from problem 1. If we specify a cosine kernel for the covariance:
$$K_{ij} = k(x_i - x_j) = \cos\left(\frac{2\pi \left|x_i - x_j\right|}{P}\right)$$
then the mean function is simply the offset, b
.
Problem 4a
Write a function model2
that returns the mean function for the GP given input parameters $\theta$.
Hint - no significant computation is required to complete this task.
In [ ]:
def model2( # complete
# complete
return # complete
To model the GP in this problem we will use the george
package (first introduced during session 4) written by Dan Foreman-Mackey. george
is a fast and flexible tool for GP regression in python. It includes several built-in kernel functions, which we will take advantage of.
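If you have not used george recently, the short example below illustrates the basic workflow on throwaway toy data: build a kernel, compute (factorize) the covariance for a set of input coordinates and uncertainties, and evaluate the marginal likelihood. The toy numbers are arbitrary; note that CosineKernel is parameterized by $\ln P$.
In [ ]:
# toy data purely to illustrate the george workflow
t_demo = np.sort(10 * np.random.rand(50))
y_demo = np.cos(2 * np.pi * t_demo / 0.4) + np.random.normal(0, 0.1, size=50)

kernel = 2.0 * kernels.CosineKernel(np.log(0.4))  # amplitude times a cosine kernel; argument is ln(P)
gp = george.GP(kernel)
gp.compute(t_demo, 0.1)            # factorize the covariance given the uncertainties
print(gp.lnlikelihood(y_demo))     # GP marginal log likelihood of the toy data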
Problem 4b
Write a function lnlike2
to calculate the likelihood for the GP model assuming a cosine kernel, and mean model defined by model2
.
Note - george
takes $\ln P$ as an argument and not $P$. We will see why this is useful later.
Hint - there isn't a lot you need to do for this one! But pay attention to the functional form of the model.
In [ ]:
def lnlike2(theta, t, y, yerr):
lnper, lna = theta[:2]
gp = george.GP(np.exp(lna) * kernels.CosineKernel(lnper))
gp.compute(t, yerr)
return gp.lnlikelihood(y - model2(theta, t), quiet=True)
Problem 4c
Write a function lnprior2
to calculate $\ln P(\theta)$, the log prior for the model parameters. Use a wide, flat prior for the parameters.
Note - a flat prior in log space is not flat in the parameters.
In [ ]:
def lnprior2( # complete
# complete
# complete
# complete
# complete
# complete
# complete
Problem 4d
Write a function lnprob2
to calculate the log posterior given the model parameters and data.
In [ ]:
def lnprob2(# complete
# complete
# complete
# complete
# complete
# complete
# complete
Problem 4e
Initialize 100 walkers in an emcee.EnsembleSampler
variable called sampler
. For your initial guess at the parameter values set $\ln a = 1$, $\ln P = 1$, and $b = 8$.
Note - this is very similar to what you did previously.
In [ ]:
initial = # complete
ndim = len(initial)
p0 = [np.array(initial) + 1e-4 * np.random.randn(ndim)
for i in range(nwalkers)]
sampler = emcee.EnsembleSampler( # complete
Problem 4f
Run the chains for 200 steps.
Hint - you'll notice these are shorter chains than we previously used. That is because the computational time is longer, as will be the case for this and all the remaining problems.
In [ ]:
p0, _, _ = sampler.run_mcmc( # complete
Problem 4g
Plot the chains from the MCMC.
In [ ]:
params_names = ['ln(P)', 'ln(a)', 'b']
nburn = # complete
plot_chains( # complete
It should be clear that the chains have not, in this case, converged. This will be true even if you were to continue to run them for a very long time.
Nevertheless, if we treat this entire run as a burn in, we can actually extract some useful information from this initial run. In particular, we will look at the posterior values for the different walkers at the end of their chains. From there we will re-initialize our walkers.
We are actually free to initialize the walkers at any location we choose, so this approach is not cheating. However, one thing that should make you a bit uneasy about the way in which we are re-initializing the walkers is that we have no guarantee that the initial run we just performed found the global maximum of the posterior. Thus, it may be the case that our continued analysis is not "right."
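The next two problems implement exactly this re-initialization. As a sketch of the idea (the 1e-5 ball size is an arbitrary choice; sampler, nwalkers, and ndim are the variables defined earlier):
In [ ]:
# walker that ended the burn-in run with the largest log posterior
best_walker = np.argmax(sampler.lnprobability[:, -1])
p_best = sampler.chain[best_walker, -1, :]

# then, when ready to restart (problem 4i):
# sampler.reset()   # discard the stored burn-in samples, keep the sampler configuration
# p0_new = [p_best + 1e-5 * np.random.randn(ndim) for _ in range(nwalkers)]
# p0_new, _, _ = sampler.run_mcmc(p0_new, 200)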
Problem 4h
Below you are given two arrays, chain_lnp_end
and chain_lnprob_end
, that contain the final $\ln P$ and log posterior, respectively, for each of the walkers.
Plot these two arrays against each other, to get a sense of what period is "best."
In [ ]:
chain_lnp_end = sampler.chain[:,-1,0]
chain_lnprob_end = sampler.lnprobability[:,-1]
fig, ax = plt.subplots()
ax.scatter( # complete
# complete
# complete
fig.tight_layout()
Problem 4i
Reinitialize the walkers in a ball around the maximum log posterior value from the walkers in the previous burn in. Then run the MCMC sampler for 200 steps.
Hint - you'll want to run sampler.reset()
prior to running the MCMC, but after selecting the new starting point for the walkers.
In [ ]:
p = # complete
sampler.reset()
p0 = # complete
p0, _, _ = sampler.run_mcmc( # complete
Problem 4j
Plot the chains. Have they converged?
In [ ]:
paramsNames = ['ln(P)', 'ln(a)', 'b']
nburn = # complete
plot_chains( # complete
Problem 4k
Make a corner plot of the samples. Does the marginalized distribution on $P$ make sense?
In [ ]:
fig = # complete
If you run the cell below, you will see random samples from the posterior overplotted on the data. Do the posterior samples seem reasonable in this case?
In [ ]:
fig, ax = plt.subplots()
ax.errorbar(x, y, dy, fmt='o')
ax.set_xlabel('x')
ax.set_ylabel('y')
for s in samples[np.random.randint(len(samples), size=5)]:
# Set up the GP for this sample.
lnper, lna = s[:2]
gp = george.GP(np.exp(lna) * kernels.CosineKernel(lnper))
gp.compute(x, dy)
# Compute the prediction conditioned on the observations and plot it.
m = gp.sample_conditional(y - model2(s, x), x_grid) + model2(s, x_grid)
ax.plot(x_grid, m, color="0.2", alpha=0.3)
fig.tight_layout()
Problem 4l
What is the marginalized best period estimate, including uncertainties?
In [ ]:
# complete
print('ln(P) = {:.6f} +{:.6f} -{:.6f}'.format( # complete
print('True period = 0.4, GP Period = {:.4f}'.format( # complete
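A sketch of one way to complete the cell above, summarizing the marginalized posterior on $\ln P$ via its 16th, 50th, and 84th percentiles (samples is assumed to be the flattened, post burn-in array built for the corner plot):
In [ ]:
# percentiles of ln(P) from the flattened, post burn-in samples
lnp_16, lnp_50, lnp_84 = np.percentile(samples[:, 0], [16, 50, 84])
print('ln(P) = {:.6f} +{:.6f} -{:.6f}'.format(lnp_50, lnp_84 - lnp_50, lnp_50 - lnp_16))
print('True period = 0.4, GP Period = {:.4f}'.format(np.exp(lnp_50)))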
In this way - it is possible to use GPs + MCMC to determine the period in noisy, irregularly sampled data. Furthermore, unlike with LS, we actually have a direct estimate of the uncertainty on that period.
As I previously alluded to, however, the solution does depend on how we initialize the walkers. Because this is simulated data, we know that the correct period has been estimated in this case, but there's no guarantee of that once we start working with astronomical sources. This is something to keep in mind if you plan on using GPs to search for periodic signals...
As we saw in the first lecture, there are many sources with periodic light curves that are not strictly sinusoidal. Thus, the use of the cosine kernel (on its own) may not be sufficient to model the signal. As Suzanne told us during session 4, the quasi-periodic kernel:
$$K_{ij} = k(x_i - x_j) = \exp \left(-\Gamma \sin^2\left[\frac{\pi}{P} \left|x_i - x_j\right|\right]\right)$$
is useful for non-sinusoidal signals. We will now use this kernel to model the variations in the simulated data.
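In george this kernel is available as kernels.ExpSine2Kernel, which takes $\Gamma$ and $\ln P$ as arguments. As a sketch of how the corresponding GP likelihood might look (the helper name lnlike3 and the parameter packing (ln P, ln a, b, ln gamma) are illustrative choices that match the chain labels used below; the mean model is again just the offset b):
In [ ]:
def lnlike3(theta, t, y, yerr):
    '''GP likelihood with an amplitude times an exp-sine-squared kernel'''
    lnper, lna, b, lngamma = theta
    gp = george.GP(np.exp(lna) * kernels.ExpSine2Kernel(np.exp(lngamma), lnper))
    gp.compute(t, yerr)
    return gp.lnlikelihood(y - b, quiet=True)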
Problem 5a
Write a function lnprob3
to calculate log posterior given model parameters $\theta$ and data x, y, dy
.
Hint - it may be useful to write this out as multiple functions.
In [ ]:
# complete
# complete
# complete
def lnprob3( # complete
# complete
# complete
Problem 5b
Initialize 100 walkers around a reasonable starting point. Be sure that $\ln P = 0$ in this initialization.
Run the MCMC for 200 steps.
Hint - it may be helpful to run this second step in a separate cell.
In [ ]:
# complete
# complete
# complete
sampler = emcee.EnsembleSampler( # complete
In [ ]:
p0, _, _ = sampler.run_mcmc( # complete
Problem 5c
Plot the chains from the MCMC. Did the chains converge?
In [ ]:
paramsNames = ['ln(P)', 'ln(a)', 'b', r'$\ln(\gamma)$']
nburn = # complete
plot_chains( # complete
Problem 5d
Plot the final $\ln P$ vs. log posterior for each of the walkers. Do you notice anything interesting?
Hint - recall that you are plotting the log posterior, and not the posterior.
In [ ]:
# complete
# complete
# complete
# complete
# complete
# complete
Problem 5e
Re-initialize the walkers around the chain with the maximum log posterior value.
Run the MCMC for 500 steps.
In [ ]:
p = # complete
sampler.reset()
# complete
sampler.run_mcmc( # complete
Problem 5f
Plot the chains for the MCMC.
Hint - you may need to adjust the length of the burn in.
In [ ]:
paramsNames = ['ln(P)', 'ln(a)', 'b', r'$\ln(\gamma)$']
nburn = # complete
plot_chains( # complete
Problem 5g
Make a corner plot for the samples.
Is the marginalized estimate for the period reasonable?
In [ ]:
# complete
fig = # complete
We will now apply this model to the same light curve that we studied in the LS lecture.
In this case we do not know the actual period (that's only sorta true), so we will have to be even more careful about initializing the walkers and performing burn in than we were previously.
Problem 6a
Read in the data for the light curve stored in example_asas_lc.dat
.
In [ ]:
# complete
Problem 6b
Adjust the prior from problem 5 to be appropriate for this data set.
In [ ]:
def lnprior3( # complete
# complete
# complete
# complete
# complete
# complete
# complete
# complete
Because we have no idea where to initialize our walkers in this case, we are going to use an ad hoc common sense + brute force approach.
Problem 6c
Run LombScargle
on the data and determine the top three peaks in the periodogram. Set nterms
= 2, and the maximum frequency to 5 (this is arbitrary but sufficient in this case).
Hint - you may need to search more than the top 3 periodogram values to find the 3 peaks.
In [ ]:
from astropy.stats import LombScargle
frequency, power = # complete
print('Top LS period is {}'.format(# complete
print( # complete
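A sketch of one way to complete the cell above: compute the multi-term periodogram, then walk down the sorted powers, keeping only frequencies that are not near-duplicates of peaks already found. The 2% duplicate tolerance is an arbitrary choice, and lc is assumed to be the light curve table from problem 6a with columns hjd, mag, and mag_unc:
In [ ]:
frequency, power = LombScargle(lc['hjd'], lc['mag'], lc['mag_unc'],
                               nterms=2).autopower(maximum_frequency=5)

# walk down the periodogram, keeping only distinct peaks
peak_freqs = []
for idx in np.argsort(power)[::-1]:
    f_cand = frequency[idx]
    if all(abs(f_cand - f0) / f0 > 0.02 for f0 in peak_freqs):
        peak_freqs.append(f_cand)
    if len(peak_freqs) == 3:
        break

print('Top LS period is {}'.format(1 / peak_freqs[0]))
print('Top 3 distinct periods: {}'.format(1 / np.array(peak_freqs)))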
Problem 6d
Initialize one third of your 100 walkers around each of the periods identified in the previous problem (note - the total number of walkers must be an even number, so use 34 walkers around one of the top 3 frequency peaks).
Run the MCMC for 500 steps following this initialization.
In [ ]:
initial1 = # complete
# complete
# complete
initial2 = # complete
# complete
# complete
initial3 = # complete
# complete
# complete
# complete
sampler = emcee.EnsembleSampler( # complete
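A sketch of one way to set up the split initialization (the guesses for $\ln a$, $b$, and $\ln \gamma$, as well as the 1e-4 ball size, are placeholders; lnprob3 is your log posterior from 6b and peak_freqs holds the three periodogram peaks from 6c):
In [ ]:
nwalkers, ndim = 100, 4
p0 = []
for f_peak, n_w in zip(peak_freqs, [34, 33, 33]):
    # center each group of walkers on one periodogram peak
    center = np.array([np.log(1 / f_peak), 0.0, np.mean(lc['mag']), 0.0])
    p0 += [center + 1e-4 * np.random.randn(ndim) for _ in range(n_w)]

sampler = emcee.EnsembleSampler(nwalkers, ndim, lnprob3,
                                args=(lc['hjd'], lc['mag'], lc['mag_unc']),
                                threads=ncores)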
In [ ]:
p0, _, _ = sampler.run_mcmc( # complete
Problem 6e
Plot the chains.
In [ ]:
paramsNames = ['ln(P)', 'ln(a)', 'b', r'$\ln(\gamma)$']
nburn = # complete
plot_chains( # complete
Problem 6f
Plot $\ln P$ vs. log posterior.
In [ ]:
# complete
# complete
# complete
# complete
# complete
# complete
Problem 6g
Reinitialize the walkers around the previous walker with the maximum posterior value.
Run the MCMC for 500 steps. Plot the chains. Have they converged?
In [ ]:
# complete
sampler.reset()
# complete
# complete
sampler.run_mcmc( # complete
In [ ]:
paramsNames = ['ln(P)', 'ln(a)', 'b', r'$\ln(\gamma)$']
nburn = # complete
plot_chains( # complete
Problem 6h
Make a corner plot of the samples. What is the marginalized estimate for the period of this source?
How does this estimate compare to LS?
In [ ]:
# complete
fig = corner.corner( # complete
In [ ]:
# complete
print('ln(P) = {:.6f} +{:.6f} -{:.6f}'.format( # complete
print('GP Period = {:.6f}'.format( # complete
The cell below shows random posterior samples overplotted on the actual data. How well does the model perform?
In [ ]:
fig, ax = plt.subplots()
ax.errorbar(lc['hjd'], lc['mag'], lc['mag_unc'], fmt='o')
ax.set_xlabel('HJD (d)')
ax.set_ylabel('mag')
hjd_grid = np.linspace(2800, 3000,3000)
for s in samples[np.random.randint(len(samples), size=5)]:
# Set up the GP for this sample.
lnper, lna, b, lngamma = s
gp = george.GP(np.exp(lna) * kernels.ExpSine2Kernel(np.exp(lngamma), lnper))
gp.compute(lc['hjd'], lc['mag_unc'])
# Compute the prediction conditioned on the observations and plot it.
m = gp.sample_conditional(lc['mag'] - model3(s, lc['hjd']), hjd_grid) + model3(s, hjd_grid)
ax.plot(hjd_grid, m, color="0.2", alpha=0.3)
fig.tight_layout()
Now you have the tools to fit a GP to a light curve and get an estimate of the best-fit period (and an estimate of the uncertainty on that period to boot!).
As previously noted, you should be a bit worried about "burn in" and how the walkers were initialized throughout. If you plan to use GPs to search for periods in your own work, I highly recommend you read Angus et al. 2018 on the GP periodogram. Angus et al. provide far more intelligent methods for initializing the MCMC than what is presented here.