In [ ]:
# imports
from __future__ import division, print_function, absolute_import, unicode_literals
import numpy as np
import matplotlib.pyplot as plt
import astropy.coordinates as coords
import astropy.units as u
from astropy.time import Time
from astroML.time_series import \
lomb_scargle, lomb_scargle_bootstrap
from ztf_summerschool import source_lightcurve, barycenter_times
%matplotlib inline
One of the fundamental tasks of time-domain astronomy is determining if a source is periodic, and if so, measuring the period. Period measurements are a vital first step for more detailed scientific study, which may include source classification (e.g., RR Lyrae, W Uma), lightcurve modeling (binaries), or luminosity estimation (Cepheids).
Binary stars in particular have lightcurves which may show a wide variety of shapes, depending on the nature of the stars and the system geometry.
In this workbook we will develop a basic toolset for the generic problem of finding periodic sources.
by Eric Bellm (2014-2016)
Let's use the relative-photometry corrected light curves we built in Exercise 2. We'll use the utility function source_lightcurve
to load the columns MJD, magnitude, and magnitude error. Note that we will use days as our time coordinate throughout the homework.
In [ ]:
# point to our previously-saved data
reference_catalog = '../data/PTF_Refims_Files/PTF_d022683_f02_c06_u000114210_p12_sexcat.ctlg'
outfile = reference_catalog.split('/')[-1].replace('ctlg','shlv')
We'll start by loading the data from our favorite star, which has coordinates $\alpha_\mathrm{J2000}, \delta_\mathrm{J2000} = (312.503802, -0.706603)$.
In [ ]:
ra_fav, dec_fav = (312.503802, -0.706603)
mjds, mags, magerrs = source_lightcurve('../data/'+outfile, ra_fav, dec_fav)
Our times are Modified Julian Date on earth. We need to correct them for Earth's motion around the sun (this is called heliocentering or barycentering). The largest this timing error can be if we do not make this correction is about the light travel time over one AU. We can use astropy
constants to calculate this easily:
In [ ]:
import astropy.constants as const
(const.au / const.c).to(u.minute)
We have provided a script to barycenter the data--note that it assumes that the data come from the P48. Use the bjds
(barycentered modified julian date) variable through the remainder of this notebook.
In [ ]:
bjds = barycenter_times(mjds,ra_fav,dec_fav)
Optional exercise: plot a histogram of the time differences between the barycentered and non-barycentered data.
In [ ]:
# define plot function
def plot_data( # COMPLETE THIS LINE
plt.errorbar( # COMPLETE THIS LINE
fmt = '_', capsize=0)
plt.xlabel('Date (MJD)')
plt.ylabel('Magnitude')
plt.gca().invert_yaxis()
In [ ]:
# run plot function
plot_data(bjds, mags, magerrs)
The Lomb-Scarge Periodogram provides a method for searching for periodicities in time-series data. It is comparable to the discrete Fourier Transform, but may be applied to irregularly sampled data. The periodogram gives as output the relative significance of a least-squares sinusoidal fit to the data as a function of frequency.
Much of this presentation follows Ch. 10 of Ivezic et al..
We use the "generalized" LS version implemented in astroML
rather than the "standard" version implemented in scipy
: the generalized version accounts better for cases of poor sampling.
In [ ]:
# documentation for the astroML lomb_scargle function
help(lomb_scargle)
One of the challenges of using the LS periodogram is determining the appropriate frequency grid to search. We have to select the minimum and maximum frequencies as well as the bin size.
If we don't include the true frequency in our search range, we can't find the period!
If the bins are too coarse, true peaks may be lost. If the bins are too fine, the periodogram becomes very slow to compute.
The first question to ask is what range of frequencies our data is sensitive to.
In [ ]:
freq_min = # COMPLETE
print('The minimum frequency our data is sensitive to is {:.3f} radian/days, corresponding to a period of {:.3f} days'.format(freq_min, 2*np.pi/freq_min)
Determining the highest frequency we are sensitive to turns out to be complicated.
if $\Delta t$ is the difference between consecutive observations, $\pi$/median($\Delta t$) is a good starting point, although in practice we may be sensitive to frequencies even higher than $2 \pi$/min($\Delta t$) depending on the details of the sampling.
What is the largest angular frequency $\omega_{\rm max}$ our data is sensitive to?
In [ ]:
freq_max = # COMPLETE
print('The maximum frequency our data is sensitive to is APPROXIMATELY {:.3f} radian/days, corresponding to a period of {:.3f} days'.format(freq_max, 2*np.pi/freq_max)
We need enough bins to resolve the periodogram peaks, which have frequency width $\Delta f \sim 2\pi/ (t_{\rm max} - t_{\rm min}) = \omega_{\rm min}$. If we want to have 5 samples of $\Delta f$, how many bins will be in our periodogram? Is this computationally feasible?
In [ ]:
n_bins = # COMPLETE
print(n_bins)
In [ ]:
# define frequency function
def frequency_grid(times):
freq_min = # COMPLETE
freq_max = # COMPLETE
n_bins = # COMPLETE
print('Using {} bins'.format(n_bins))
return np.linspace(freq_min, freq_max, n_bins)
In [ ]:
# run frequency function
omegas = frequency_grid(bjds)
In some cases you'll want to generate the frequency grid by hand, either to extend to higher frequencies (shorter periods) than found by default, to avoid generating too many bins, or to get a more precise estimate of the period. In that case use the following code. We'll use a large fixed number of bins to smoothly sample the periodogram as we zoom in.
In [ ]:
# provided alternate frequency function
def alt_frequency_grid(Pmin, Pmax, n_bins = 5000):
"""Generate an angular frequency grid between Pmin and Pmax (assumed to be in days)"""
freq_max = 2*np.pi / Pmin
freq_min = 2*np.pi / Pmax
return np.linspace(freq_min, freq_max, n_bins)
In [ ]:
# calculate and plot LS periodogram
P_LS = lomb_scargle( # COMPLETE
plt.plot(omegas, P_LS)
plt.xlabel('$\omega$')
plt.ylabel('$P_{LS}$')
In [ ]:
# provided: define function to find best period
def LS_peak_to_period(omegas, P_LS):
"""find the highest peak in the LS periodogram and return the corresponding period."""
max_freq = omegas[np.argmax(P_LS)]
return 2*np.pi/max_freq
In [ ]:
# run function to find best period
best_period = LS_peak_to_period(omegas, P_LS)
print("Best period: {} days".format(best_period))
Complete this function that returns the phase of an observation (in the range 0-1) given its period. For simplicity set the zero of the phase to be the time of the initial observation.
Hint: Consider the python modulus operator, %
.
Add a keyword that allows your function to have an optional user-settable time of zero phase.
In [ ]:
# define function to phase lightcurves
def phase(time, period, t0 = None):
""" Given an input array of times and a period, return the corresponding phase."""
if t0 is None:
t0 = time[0]
return # COMPLETE
In [ ]:
# define function to plot phased lc
def plot_phased_lc(mjds, mags, magerrs, period, t0=None):
phases = # COMPLETE
plt.errorbar( #COMPLETE
fmt = '_', capsize=0)
plt.xlabel('Phase')
plt.ylabel('Magnitude')
plt.gca().invert_yaxis()
In [ ]:
# run function to plot phased lc
plot_phased_lc(bjds, mags, magerrs, best_period)
How does that look? Do you think you are close to the right period?
Try re-running your analysis using the alt_frequency_grid
command, searching a narrower period range around the best-fit period.
In [ ]:
omegas = alt_frequency_grid( # COMPLETE
P_LS = lomb_scargle( # COMPLETE
plt.plot(omegas, P_LS)
plt.xlabel('$\omega$')
plt.ylabel('$P_{LS}$')
In [ ]:
best_period = # COMPLETE
print("Best period: {} days".format(best_period))
plot_phased_lc(bjds, mags, magerrs, best_period)
Real data may have aliases--frequency components that appear because of the sampling of the data, such as once per night. Bootstrap significance tests, which shuffle the data values around but keep the times the same, can help rule these out.
Calculate the chance probability of finding a LS peak higher than the observed value in random data observed at the specified intervals: use lomb_scargle_bootstrap
and np.percentile
to find the 95 and 99 percent significance levels and plot them over the LS power.
In [ ]:
D = lomb_scargle_bootstrap( # COMPLETE
sig99, sig95 = np.percentile( # COMPLETE
plt.plot(omegas, P_LS)
plt.plot([omegas[0],omegas[-1]], sig99*np.ones(2),'--')
plt.plot([omegas[0],omegas[-1]], sig95*np.ones(2),'--')
plt.xlabel('$\omega$')
plt.ylabel('$P_{LS}$')
Now find the periods of these sources, plot their phased lightcurves, and evaluate the significance of the period you find:
Suggestion: wrap the code you used above in a function that takes ra & dec as input.
In [ ]:
Try using the gatspy
package to search for periods. It uses a slightly different interface but has several nice features, such as automatic zooming on candidate frequency peaks.
You'll need to read the online documentation or call help(gatspy.periodic.LombScargleFast())
to learn how to which commands to use.
In [ ]:
import gatspy
ls = gatspy.periodic.LombScargleFast()
ls.optimizer.period_range = ( # COMPLETE
# we have to subtract the t0 time so the model plotting has the correct phase origin
ls.fit(bjds-bjds[0],mags,magerrs)
gatspy_period = ls. # COMPLETE
print(gatspy_period)
plot_phased_lc(bjds, mags, magerrs, gatspy_period)
p = np.linspace(0,gatspy_period,100)
plt.plot(p/gatspy_period,ls.predict(p,period=gatspy_period))
Lomb-Scargle is equivalent to fitting a sinusoid to the phased data, but many kinds of variable stars do not have phased lightcurves that are well-represented by a sinusoid. Other algorithms, such as those that attempt to minimize the dispersion within phase bins over a grid of trial phases, may be more successful in such cases. See Graham et al (2013) for a review.
In [ ]:
ss = gatspy.periodic.SuperSmoother(fit_period=True)
ss.optimizer.period_range = ( #COMPLETE
ss.fit( # COMPLETE
gatspy_period = ss. # COMPLETE
print(gatspy_period)
plot_phased_lc(bjds, mags, magerrs, gatspy_period)
p = np.linspace(0,gatspy_period,100)
plt.plot(p/gatspy_period,ss.predict(p,period=gatspy_period))
In [ ]:
from astroML.time_series import multiterm_periodogram
omegas = alt_frequency_grid(.2,1.2)
P_mt = multiterm_periodogram( #COMPLETE
plt.plot(omegas, P_mt)
plt.xlabel('$\omega$')
plt.ylabel('$P_{mt}$')
In [ ]:
best_period = # COMPLETE
print("Best period: {} days".format(best_period))
plot_phased_lc(bjds, mags, magerrs, best_period)
In [ ]:
ls = gatspy.periodic.LombScargle(Nterms=4)
ls.optimizer.period_range = ( # COMPLETE
ls.fit( # COMPLETE
gatspy_period = ls. # COMPLETE
print(gatspy_period)
plot_phased_lc(bjds, mags, magerrs, gatspy_period)
p = np.linspace(0,gatspy_period,100)
plt.plot(p/gatspy_period,ls.predict(p,period=gatspy_period))
This is a big one: can you compute periods for all of the sources in our database with showing evidence of variability? How will you compute variablity? How can you tell which sources are likely to have good periods?
We're giving you less guidance here than before--see how you can do!
In [ ]:
# open the stored data
import shelve
import astropy
shelf = shelve.open('../data/'+outfile)
all_mags = shelf['mags']
all_mjds = shelf['mjds']
all_errs = shelf['magerrs']
all_coords = shelf['ref_coords']
shelf.close()
In [ ]:
# loop over stars
variable_inds = []
best_periods = []
best_power = []
with astropy.utils.console.ProgressBar(all_mags.shape[0],ipython_widget=True) as bar:
for i in range(all_mags.shape[0]):
# make sure there's real data
wgood = (all_mags[i,:].mask == False)
n_obs = np.sum(wgood)
# if we don't have many observations, don't bother computing periods
if n_obs < 40:
continue # the "continue" instruction tells python to skip the rest of the loop for this element and continue with the next one
# COMPLETE: make a cut so you only complete periods on variabile sources
if # source is not variable:
continue
variable_inds.append(i)
bjds = barycenter_times(all_mjds[wgood],all_coords[i].ra.degree,all_coords[i].dec.degree)
# COMPLETE: calculate best period
best_periods.append( # COMPLETE
best_power.append( # COMPLETE: add the LS power here
bar.update()
In [ ]:
# COMPLETE: now find the most promising periods and plot them!
Other effects to consider
Many eclipsing binaries have primary and secondary eclipses, often with comparable depths. The period found by LS (which fits a single sinusoid) will thus often be only half of the true period. Plotting the phased lightcurve at double the LS period is often the easiest way to determine the true period.
Scargle, J. 1982, ApJ 263, 835
Zechmeister, M., and Kürster, M. 2009, A&A 496, 577
Graham, M. et al. 2013, MNRAS 434, 3423
Statistics, Data Mining, and Machine Learning in Astronomy (Ivezic, Connolly, VanderPlas, & Gray)