In [ ]:
# imports
from __future__ import division, print_function, absolute_import, unicode_literals
import numpy as np
import matplotlib.pyplot as plt
import astropy.coordinates as coords
import astropy.units as u
from astropy.time import Time
from astroML.time_series import \
    lomb_scargle, lomb_scargle_bootstrap

from ztf_summerschool import source_lightcurve, barycenter_times
%matplotlib inline

Hands-On Exercise 3: Period Finding


One of the fundamental tasks of time-domain astronomy is determining if a source is periodic, and if so, measuring the period. Period measurements are a vital first step for more detailed scientific study, which may include source classification (e.g., RR Lyrae, W Uma), lightcurve modeling (binaries), or luminosity estimation (Cepheids).

Binary stars in particular have lightcurves which may show a wide variety of shapes, depending on the nature of the stars and the system geometry.

In this workbook we will develop a basic toolset for the generic problem of finding periodic sources.


by Eric Bellm (2014-2016)

Let's use the relative-photometry corrected light curves we built in Exercise 2. We'll use the utility function source_lightcurve to load the columns MJD, magnitude, and magnitude error. Note that we will use days as our time coordinate throughout the homework.


In [ ]:
# point to our previously-saved data
reference_catalog = '../data/PTF_Refims_Files/PTF_d022683_f02_c06_u000114210_p12_sexcat.ctlg'
outfile = reference_catalog.split('/')[-1].replace('ctlg','shlv')

We'll start by loading the data from our favorite star, which has coordinates $\alpha_\mathrm{J2000}, \delta_\mathrm{J2000} = (312.503802, -0.706603)$.


In [ ]:
ra_fav, dec_fav = (312.503802, -0.706603)
mjds, mags, magerrs = source_lightcurve('../data/'+outfile, ra_fav, dec_fav)

Barycentering

Our times are Modified Julian Date on earth. We need to correct them for Earth's motion around the sun (this is called heliocentering or barycentering). The largest this timing error can be if we do not make this correction is about the light travel time over one AU. We can use astropy constants to calculate this easily:


In [ ]:
import astropy.constants as const
(const.au / const.c).to(u.minute)

We have provided a script to barycenter the data--note that it assumes that the data come from the P48. Use the bjds (barycentered modified julian date) variable through the remainder of this notebook.


In [ ]:
bjds = barycenter_times(mjds,ra_fav,dec_fav)

Optional exercise: plot a histogram of the time differences between the barycentered and non-barycentered data.

Exercise 1: Getting started plotting

Complete this function for plotting the lightcurve:


In [ ]:
# define plot function
def plot_data( # COMPLETE THIS LINE
    plt.errorbar( # COMPLETE THIS LINE
        fmt = '_', capsize=0)
    plt.xlabel('Date (MJD)')
    plt.ylabel('Magnitude')
    plt.gca().invert_yaxis()

In [ ]:
# run plot function
plot_data(bjds, mags, magerrs)

The Lomb Scargle Periodogram

The Lomb-Scarge Periodogram provides a method for searching for periodicities in time-series data. It is comparable to the discrete Fourier Transform, but may be applied to irregularly sampled data. The periodogram gives as output the relative significance of a least-squares sinusoidal fit to the data as a function of frequency.

Much of this presentation follows Ch. 10 of Ivezic et al..

We use the "generalized" LS version implemented in astroML rather than the "standard" version implemented in scipy: the generalized version accounts better for cases of poor sampling.


In [ ]:
# documentation for the astroML lomb_scargle function
help(lomb_scargle)

Exercise 2: Determining the frequency grid

One of the challenges of using the LS periodogram is determining the appropriate frequency grid to search. We have to select the minimum and maximum frequencies as well as the bin size.

If we don't include the true frequency in our search range, we can't find the period!

If the bins are too coarse, true peaks may be lost. If the bins are too fine, the periodogram becomes very slow to compute.

The first question to ask is what range of frequencies our data is sensitive to.

Exercise 2.1

What is the smallest angular frequency $\omega_{\rm min}$ our data is sensitive to? (Hint: smallest frequency => largest time)


In [ ]:
freq_min =  # COMPLETE
print('The minimum frequency our data is sensitive to is {:.3f} radian/days, corresponding to a period of {:.3f} days'.format(freq_min, 2*np.pi/freq_min)

Exercise 2.2

Determining the highest frequency we are sensitive to turns out to be complicated.

if $\Delta t$ is the difference between consecutive observations, $\pi$/median($\Delta t$) is a good starting point, although in practice we may be sensitive to frequencies even higher than $2 \pi$/min($\Delta t$) depending on the details of the sampling.

What is the largest angular frequency $\omega_{\rm max}$ our data is sensitive to?


In [ ]:
freq_max =  # COMPLETE
print('The maximum frequency our data is sensitive to is APPROXIMATELY {:.3f} radian/days, corresponding to a period of {:.3f} days'.format(freq_max, 2*np.pi/freq_max)

Exercise 2.3

We need enough bins to resolve the periodogram peaks, which have frequency width $\Delta f \sim 2\pi/ (t_{\rm max} - t_{\rm min}) = \omega_{\rm min}$. If we want to have 5 samples of $\Delta f$, how many bins will be in our periodogram? Is this computationally feasible?


In [ ]:
n_bins =  # COMPLETE
print(n_bins)

Exercise 2.4

Let's wrap this work up in a convenience function that takes as input a list of observation times and returns a frequency grid with decent defaults.


In [ ]:
# define frequency function
def frequency_grid(times):
    freq_min = # COMPLETE
    freq_max = # COMPLETE
    n_bins =   # COMPLETE
    print('Using {} bins'.format(n_bins))
    return np.linspace(freq_min, freq_max, n_bins)

In [ ]:
# run frequency function
omegas = frequency_grid(bjds)

In some cases you'll want to generate the frequency grid by hand, either to extend to higher frequencies (shorter periods) than found by default, to avoid generating too many bins, or to get a more precise estimate of the period. In that case use the following code. We'll use a large fixed number of bins to smoothly sample the periodogram as we zoom in.


In [ ]:
# provided alternate frequency function
def alt_frequency_grid(Pmin, Pmax, n_bins = 5000):
    """Generate an angular frequency grid between Pmin and Pmax (assumed to be in days)"""
    freq_max = 2*np.pi / Pmin
    freq_min = 2*np.pi / Pmax
    return np.linspace(freq_min, freq_max, n_bins)

Exercise 3: Computing the Periodogram

Calculate the LS periodiogram and plot the power.


In [ ]:
# calculate and plot LS periodogram
P_LS = lomb_scargle( # COMPLETE
plt.plot(omegas, P_LS)
plt.xlabel('$\omega$')
plt.ylabel('$P_{LS}$')

In [ ]:
# provided: define function to find best period
def LS_peak_to_period(omegas, P_LS):
    """find the highest peak in the LS periodogram and return the corresponding period."""
    max_freq = omegas[np.argmax(P_LS)]
    return 2*np.pi/max_freq

In [ ]:
# run function to find best period
best_period = LS_peak_to_period(omegas, P_LS)
print("Best period: {} days".format(best_period))

Exercise 4: Phase Calculation

Complete this function that returns the phase of an observation (in the range 0-1) given its period. For simplicity set the zero of the phase to be the time of the initial observation.

Hint: Consider the python modulus operator, %.

Add a keyword that allows your function to have an optional user-settable time of zero phase.


In [ ]:
# define function to phase lightcurves
def phase(time, period, t0 = None):
    """ Given an input array of times and a period, return the corresponding phase."""
    if t0 is None:
        t0 = time[0]
    return # COMPLETE

Exercise 5: Phase Plotting

Plot the phased lightcurve at the best-fit period.


In [ ]:
# define function to plot phased lc
def plot_phased_lc(mjds, mags, magerrs, period, t0=None):
    phases = # COMPLETE
    plt.errorbar( #COMPLETE
        fmt = '_', capsize=0)
    plt.xlabel('Phase')
    plt.ylabel('Magnitude')
    plt.gca().invert_yaxis()

In [ ]:
# run function to plot phased lc
plot_phased_lc(bjds, mags, magerrs, best_period)

How does that look? Do you think you are close to the right period?

Try re-running your analysis using the alt_frequency_grid command, searching a narrower period range around the best-fit period.


In [ ]:
omegas = alt_frequency_grid( # COMPLETE
P_LS = lomb_scargle( # COMPLETE
plt.plot(omegas, P_LS)
plt.xlabel('$\omega$')
plt.ylabel('$P_{LS}$')

In [ ]:
best_period = # COMPLETE
print("Best period: {} days".format(best_period))
plot_phased_lc(bjds, mags, magerrs, best_period)

Exercise 6: Calculating significance of the period detection

Real data may have aliases--frequency components that appear because of the sampling of the data, such as once per night. Bootstrap significance tests, which shuffle the data values around but keep the times the same, can help rule these out.

Calculate the chance probability of finding a LS peak higher than the observed value in random data observed at the specified intervals: use lomb_scargle_bootstrap and np.percentile to find the 95 and 99 percent significance levels and plot them over the LS power.


In [ ]:
D = lomb_scargle_bootstrap( # COMPLETE
sig99, sig95 = np.percentile( # COMPLETE
plt.plot(omegas, P_LS)
plt.plot([omegas[0],omegas[-1]], sig99*np.ones(2),'--')
plt.plot([omegas[0],omegas[-1]], sig95*np.ones(2),'--')
plt.xlabel('$\omega$')
plt.ylabel('$P_{LS}$')

Exercise 7: Find periods of other sources

Now find the periods of these sources, plot their phased lightcurves, and evaluate the significance of the period you find:

  • 312.066287628 -0.983790357518
  • 311.967177518 -0.886275170839
  • 312.263445107 -0.342008023626
  • 312.050877142 -1.0632849268
  • 312.293550866 -0.783896411315

Suggestion: wrap the code you used above in a function that takes ra & dec as input.


In [ ]:

[Challenge] Exercise 8: gatspy

Try using the gatspy package to search for periods. It uses a slightly different interface but has several nice features, such as automatic zooming on candidate frequency peaks.

You'll need to read the online documentation or call help(gatspy.periodic.LombScargleFast()) to learn how to which commands to use.


In [ ]:
import gatspy
ls = gatspy.periodic.LombScargleFast()
ls.optimizer.period_range = ( # COMPLETE
# we have to subtract the t0 time so the model plotting has the correct phase origin
ls.fit(bjds-bjds[0],mags,magerrs)
gatspy_period = ls. # COMPLETE
print(gatspy_period)
plot_phased_lc(bjds, mags, magerrs, gatspy_period)
p = np.linspace(0,gatspy_period,100)
plt.plot(p/gatspy_period,ls.predict(p,period=gatspy_period))

[Challenge] Exercise 9: Alternate Algorithms

Lomb-Scargle is equivalent to fitting a sinusoid to the phased data, but many kinds of variable stars do not have phased lightcurves that are well-represented by a sinusoid. Other algorithms, such as those that attempt to minimize the dispersion within phase bins over a grid of trial phases, may be more successful in such cases. See Graham et al (2013) for a review.


In [ ]:
ss = gatspy.periodic.SuperSmoother(fit_period=True)
ss.optimizer.period_range = ( #COMPLETE
ss.fit( # COMPLETE
gatspy_period = ss. # COMPLETE
print(gatspy_period)
plot_phased_lc(bjds, mags, magerrs, gatspy_period)
p = np.linspace(0,gatspy_period,100)
plt.plot(p/gatspy_period,ss.predict(p,period=gatspy_period))

[Challenge] Exercise 10: Multi-harmonic fitting

Both AstroML and gatspy include code for including multiple Fourier components in the fit, which can better fit lightcurves that don't have a simple sinusoidal shape (like RR Lyrae).


In [ ]:
from astroML.time_series import multiterm_periodogram
omegas = alt_frequency_grid(.2,1.2)
P_mt = multiterm_periodogram( #COMPLETE
plt.plot(omegas, P_mt)
plt.xlabel('$\omega$')
plt.ylabel('$P_{mt}$')

In [ ]:
best_period = # COMPLETE
print("Best period: {} days".format(best_period))
plot_phased_lc(bjds, mags, magerrs, best_period)

In [ ]:
ls = gatspy.periodic.LombScargle(Nterms=4)
ls.optimizer.period_range = ( # COMPLETE
ls.fit( # COMPLETE
gatspy_period = ls. # COMPLETE
print(gatspy_period)
plot_phased_lc(bjds, mags, magerrs, gatspy_period)
p = np.linspace(0,gatspy_period,100)
plt.plot(p/gatspy_period,ls.predict(p,period=gatspy_period))

[Challenge] Exercise 11: Compute all periods

This is a big one: can you compute periods for all of the sources in our database with showing evidence of variability? How will you compute variablity? How can you tell which sources are likely to have good periods?

We're giving you less guidance here than before--see how you can do!


In [ ]:
# open the stored data
import shelve
import astropy
shelf = shelve.open('../data/'+outfile)
all_mags = shelf['mags']
all_mjds = shelf['mjds']
all_errs = shelf['magerrs']
all_coords = shelf['ref_coords']
shelf.close()

In [ ]:
# loop over stars
variable_inds = []
best_periods = []
best_power = []

with astropy.utils.console.ProgressBar(all_mags.shape[0],ipython_widget=True) as bar:
    for i in range(all_mags.shape[0]):
        # make sure there's real data
        wgood = (all_mags[i,:].mask == False)
        n_obs = np.sum(wgood)
        # if we don't have many observations, don't bother computing periods
        if n_obs < 40:
            continue # the "continue" instruction tells python to skip the rest of the loop for this element and continue with the next one
            
        # COMPLETE: make a cut so you only complete periods on variabile sources
        if # source is not variable: 
            continue
        variable_inds.append(i)
        bjds = barycenter_times(all_mjds[wgood],all_coords[i].ra.degree,all_coords[i].dec.degree)
        # COMPLETE: calculate best period
        best_periods.append( # COMPLETE
        best_power.append( # COMPLETE: add the LS power here
        bar.update()

In [ ]:
# COMPLETE: now find the most promising periods and plot them!

Other effects to consider

Many eclipsing binaries have primary and secondary eclipses, often with comparable depths. The period found by LS (which fits a single sinusoid) will thus often be only half of the true period. Plotting the phased lightcurve at double the LS period is often the easiest way to determine the true period.

References and Further Reading

Scargle, J. 1982, ApJ 263, 835

Zechmeister, M., and Kürster, M. 2009, A&A 496, 577

Graham, M. et al. 2013, MNRAS 434, 3423

Statistics, Data Mining, and Machine Learning in Astronomy (Ivezic, Connolly, VanderPlas, & Gray)

gatspy documentation