In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

In [2]:
%matplotlib notebook

Conditional Entropy: Can Information Theory Beat the L-S Periodogram?

Version 0.1


By AA Miller 5 June 2019

In this lecture we will examine alternative methods to search for periodic signals in astronomical time series. Earlier, we spent a great deal of time discussing the Lomb-Scargle periodogram. This is the "standard" in astronomy, in part because it was the first (good) method developed for noisy and sparse data.

We will now explore some alternatives to the L-S periodogram. This is worthwhile because LS (i) does not handle outliers well, and (ii) works best on purely sinusoidal signals.

An Incomplete Whirlwind Tour

In addition to LS, the following techniques are employed to search for periodic signals:

String Length

The string length method (Dworetsky 1983) phase folds the data at trial periods and then minimizes the total length of the line segments needed to connect the phase-ordered observations (a minimal sketch is given below).

(credit: Gaveen Freer - http://slideplayer.com/slide/4212629/#)
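
A minimal sketch of the string length statistic (the function name and the brightness normalization are our choices, not from the source; smaller values indicate better trial periods):

In [ ]:
def string_length(t, y, p):
    '''Total length of segments connecting phase-ordered points at trial period p'''
    phases = (t/p) % 1
    order = np.argsort(phases)
    phi, m = phases[order], y[order]
    # normalize brightness so phase and magnitude contribute on similar scales
    m = (m - np.min(m))/(np.max(m) - np.min(m))
    # include the segment wrapping from the last phase point back to the first
    dphi = np.diff(np.append(phi, phi[0] + 1))
    dm = np.diff(np.append(m, m[0]))
    return np.sum(np.hypot(dphi, dm))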

Phase Dispersion Minimization

Phase Dispersion Minimization (PDM; Jurkevich 1971, Stellingwerf 1978), like LS, folds the data at a large number of trial frequencies $f$.

The phased data are then binned, and the variance is calculated in each bin, combined, and compared to the overall variance of the signal. No functional form of the signal is assumed, and thus, non-sinusoidal signals can be found.

Challenge: how to select the number of bins?

(credit: Gaveen Freer - http://slideplayer.com/slide/4212629/#)
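
A minimal sketch of the PDM statistic, assuming 10 equal-width phase bins (the bin count is exactly the free parameter flagged in the challenge above):

In [ ]:
def pdm_theta(t, y, p, n_bins=10):
    '''PDM statistic: pooled within-bin variance over total variance (minimize)'''
    phases = (t/p) % 1
    bin_num = np.floor(phases*n_bins).astype(int)
    total_var = np.var(y, ddof=1)
    pooled, dof = 0.0, 0
    for b in range(n_bins):
        in_bin = y[bin_num == b]
        if len(in_bin) > 1:
            pooled += np.var(in_bin, ddof=1)*(len(in_bin) - 1)
            dof += len(in_bin) - 1
    return (pooled/dof)/total_var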

Analysis of Variance

Analysis of Variance (AOV; Schwarzenberg-Czerny 1989) is similar to PDM. Optimal periods are defined via hypothesis testing, and these methods are found to perform best for certain types of astronomical signals.

Supersmoother

Supersmoother (Reimann 1994) is a least-squares approach wherein a flexible, non-parametric model is fit to the folded observations at many trial frequencies. The flexible model reduces aliasing relative to models that assume a sinusoidal shape; however, this comes at the cost of considerable computational time.

Bayesian Methods

There have been some efforts to frame the period-finding problem in a Bayesian framework. Bretthorst 1988 developed Bayesian generalized LS models, while Gregory & Loredo 1992 applied Bayesian techniques to phase-binned models.

More recently, efforts to use Gaussian processes (GPs) to model and extract a period from the light curve have been developed (Wang et al. 2012). These methods have proved to be especially useful for detecting stellar rotation in Kepler light curves (Angus et al. 2018).

(There will be more on GPs later during this session)

Conditional Entropy

Conditional Entropy (CE; Graham et al. 2013), and other entropy-based methods, aim to minimize the entropy in binned (normalized magnitude, phase) space. CE, in particular, is good at suppressing spurious signals due to the window function.

When tested on real observations, CE outperforms most of the alternatives (e.g., LS and PDM).

(credit: Graham et al. 2013)

Conditional Entropy

The focus of today's exercise is conditional entropy (CE), which uses information theory and thus, in principle, works better in the presence of noise and outliers. Furthermore, CE does not make any assumptions about the underlying shape of the signal, which is useful when looking for non-sinusoidal patterns (such as transiting planets or eclipsing binaries).

For full details on the CE algorithm, see Graham et al. (2013).

Briefly, CE builds on the Shannon entropy (Cincotta et al. 1995), which is determined as follows:

  • Normalize the time series data $m(t_i)$ to occupy a uniform square over phase, $\phi$, and magnitude, $m$, at a given trial period, $p$.
  • Calculate the Shannon entropy, $H_0$, over the $k$ partitions in $(\phi, m)$:

    $$H_0 = - \sum_{i=1}^{k} \mu_i \ln{(\mu_i)}\;\; \forall \mu_i \ne 0,$$ where $\mu_i$ is the occupation probability for the $i^{th}$ partition, which is just the number of data points in that partition divided by the total number of points in the data set.

  • Iterate over multiple periods, and identify the period that minimizes the entropy (recall that entropy measures a lack of information)

As discussed in Graham et al. (2013), minimizing the Shannon entropy can be influenced by the window function, so they introduce the conditional entropy, $H_c(m|\phi)$, to help mitigate these effects. The CE can be calculated as:

$$H_c = \sum_{i,j} p(m_i, \phi_j) \ln \left( \frac{p(\phi_j)}{p(m_i, \phi_j)} \right), $$

where $p(m_i, \phi_j)$ is the occupation probability for the $i^{th}$ partition in normalized magnitude and the $j^{th}$ partition in phase, and $p(\phi_j)$ is the occupation probability of the $j^{th}$ phase partition.
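
For implementation purposes (and for the vectorization hint in Problem 3a), it is worth noting that this definition rearranges into a difference of two Shannon entropies:

$$H_c = -\sum_{i,j} p(m_i, \phi_j) \ln p(m_i, \phi_j) + \sum_{j} p(\phi_j) \ln p(\phi_j) = H(m, \phi) - H(\phi),$$

where $p(\phi_j) = \sum_i p(m_i, \phi_j)$ follows from summing the joint occupation probabilities over the magnitude partitions.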

In this problem we will first calculate the Shannon entropy, then the CE to find the best-fit period of the eclipsing binary from the LS lecture.

Problem 1) Helper Functions

Problem 1a

Create a function, gen_periodic_data, that creates simulated data (including noise) over a grid of user supplied positions:

$$ y = A \sin\left(\frac{2\pi x}{P} - \phi\right) + \sigma_y$$

where $A, P, \phi$ are inputs to the function. gen_periodic_data should include Gaussian noise, $\sigma_y$, for each output $y_i$.


In [3]:
def gen_periodic_data(x, period=1, amplitude=1, phase=0, noise=0):
    '''Generate periodic data given the function inputs
    
    y = A*sin(2*pi*x/P - phase) + noise
    
    Parameters
    ----------
    x : array-like
        input values at which to evaluate the signal
    
    period : float (default=1)
        period of the periodic signal
    
    amplitude : float (default=1)
        amplitude of the periodic signal
    
    phase : float (default=0)
        phase offset of the periodic signal
    
    noise : float (default=0)
        variance of the noise term added to the periodic signal
    
    Returns
    -------
    y : array-like
        Periodic signal evaluated at all points x
    '''
    
    y = amplitude*np.sin(2*np.pi*x/(period) - phase) + np.random.normal(0, np.sqrt(noise), size=len(x))
    return y

Problem 1b

Create a function, phase_plot, that takes x, y, and $P$ as inputs to create a phase-folded light curve (i.e., plot the data at their respective phase values given the period $P$).

Include an optional argument, y_unc, to include uncertainties on the y values, when available.


In [29]:
def phase_plot(x, y, period, y_unc=0.0):
    '''Create phase-folded plot of input data x, y
    
    Parameters
    ----------
    x : array-like
        data values along abscissa

    y : array-like
        data values along ordinate

    period : float
        period to fold the data
        
    y_unc : array-like or float (default=0.0)
        uncertainty of the brightness measurements, y
    '''    
    phases = (x/period) % 1
    if isinstance(y_unc, float):
        y_unc = np.zeros_like(x)
        
    plot_order = np.argsort(phases)
    norm_y = (y - np.min(y))/(np.max(y) - np.min(y))
    norm_y_unc = (y_unc)/(np.max(y) - np.min(y))
    
    plt.rc('grid', linestyle=":", color='0.8')
    fig, ax = plt.subplots()
    ax.errorbar(phases[plot_order], norm_y[plot_order], norm_y_unc[plot_order],
                 fmt='o', mec="0.2", mew=0.1)
    ax.set_xlabel("phase")
    ax.set_ylabel("signal")
    ax.set_yticks(np.linspace(0,1,11))
    ax.set_xticks(np.linspace(0,1,11))
    ax.grid()
    fig.tight_layout()

Problem 1c

Generate a signal with $A = 2$, $p = \pi$, and Gaussian noise with variance = 0.01 over a regular grid between 0 and 10. Plot the phase-folded results (and make sure the results behave as you would expect).

Hint - your simulated signal should have at least 100 data points.


In [30]:
x = np.linspace( # complete
y = # complete

# complete plot
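
One possible completion (a sketch; the 200-point grid is our choice, satisfying the 100-point minimum in the hint):

In [ ]:
x = np.linspace(0, 10, 200)
y = gen_periodic_data(x, period=np.pi, amplitude=2, noise=0.01)

phase_plot(x, y, np.pi)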


Note a couple of changes from the previous helper function: we have added a grid to the plot (this will be useful for visualizing the entropy), and we have normalized the brightness measurements to run from 0 to 1.

Problem 2) The Shannon entropy

As noted above, to calculate the Shannon entropy we need to sum the data over partitions in the normalized $(\phi, m)$ plane.

This is straightforward using histogram2d from numpy.

Problem 2a

Write a function shannon_entropy to calculate the Shannon entropy, $H_0$, for a time series, $m(t_i)$, at a given period, p.

Hint - use histogram2d and a 10 x 10 grid (as plotted above).


In [35]:
def shannon_entropy(m, t, p):
    '''Calculate the Shannon entropy
    
    Parameters
    ----------
    m : array-like
        brightness measurements of the time-series data
    
    t : array-like
        timestamps corresponding to the brightness measurements
    
    p : float
        period of the periodic signal
        
    Returns
    -------
    H0 : float
        Shannon entropy for m(t) at period p
    '''
    
    m_norm = # complete
    phases = # complete
    H, _, _ = np.histogram2d( # complete
    
    occupied = np.where(H > 0)
    H0 = # complete
    
    return H0
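
One way to fill in the blanks, assuming the 10 x 10 grid from the hint (a sketch, not the only valid completion):

In [ ]:
def shannon_entropy(m, t, p):
    '''Shannon entropy of the binned (phase, normalized magnitude) data at period p'''
    m_norm = (m - np.min(m))/(np.max(m) - np.min(m))   # normalize magnitudes to [0, 1]
    phases = (t/p) % 1                                 # phase fold at the trial period
    H, _, _ = np.histogram2d(phases, m_norm, bins=10, range=[[0, 1], [0, 1]])
    
    occupied = np.where(H > 0)
    mu = H[occupied]/np.sum(H)                         # occupation probabilities
    H0 = -np.sum(mu*np.log(mu))
    
    return H0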

Problem 2b

What is the Shannon entropy for the simulated signal at periods = 1, $\pi$-0.05, and $\pi$?

Do these results make sense given your understanding of the Shannon entropy?


In [46]:
print('For p = 1, \t\tH_0 = {:.5f}'.format( # complete
print('For p = pi - 0.05, \tH_0 = {:.5f}'.format( # complete
print('For p = pi, \t\tH_0 = {:.5f}'.format( # complete


For p = 1, 		H_0 = 4.24923
For p = pi - 0.05, 	H_0 = 3.28186
For p = pi, 		H_0 = 3.19802

We know the correct period of the simulated data is $\pi$, so it makes sense that this period minimizes the Shannon entropy.

Problem 2c

Write a function, se_periodogram, to calculate the Shannon entropy for observations $m$, $t$ over a frequency grid f_grid.


In [147]:
def se_periodogram(m, t, f_grid):
    '''Calculate the Shannon entropy at every freq in f_grid
    
    Parameters
    ----------
    m : array-like
        brightness measurements of the time-series data
    
    t : array-like
        timestamps corresponding to the brightness measurements
    
    f_grid : array-like
        trial frequencies for the periodic signal
        
    Returns
    -------
    se_p : array-like
        Shannon entropy for m(t) at every trial freq
    '''
    
    # complete
    for # complete in # complete
        # complete
    
    return se_p
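
A possible completion (a sketch; note the conversion from trial frequency to period):

In [ ]:
def se_periodogram(m, t, f_grid):
    '''Shannon entropy at every trial frequency in f_grid'''
    se_p = np.empty_like(f_grid)
    for f_num, f in enumerate(f_grid):
        se_p[f_num] = shannon_entropy(m, t, 1/f)   # period = 1/frequency
    return se_p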

Problem 2d

Plot the Shannon entropy periodogram, and return the best-fit period from the periodogram.

Hint - recall what we learned about frequency grids earlier today.


In [148]:
f_grid = # complete
se_p = # complete

fig,ax = plt.subplots()
# complete
# complete
# complete

print("The best fit period is: {:.4f}".format( # complete


The best fit period is: 3.1445
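
For reference, one way to build the grid and locate that minimum (the frequency limits and the x100 oversampling are conventional choices, not from the source):

In [ ]:
f_min = 1/(np.max(x) - np.min(x))    # resolve structure on the scale of the baseline
f_max = 5                            # comfortably above the true frequency, 1/pi
delta_f = f_min/100                  # oversample; finer grids sharpen the minimum
f_grid = np.arange(f_min, f_max, delta_f)

se_p = se_periodogram(y, x, f_grid)

fig, ax = plt.subplots()
ax.plot(f_grid, se_p)
ax.set_xlabel('freq')
ax.set_ylabel('H_0')
fig.tight_layout()

print("The best fit period is: {:.4f}".format(1/f_grid[np.argmin(se_p)]))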

Problem 3) The Conditional Entropy

The CE is very similar to the Shannon entropy, though we need to condition the calculation on the occupation probability of the partitions in phase.

Problem 3a

Write a function conditional_entropy to calculate the CE, $H_c$, for a time series, $m(t_i)$, at a given period, p.

Hint - if you use histogram2d, be sure to sum along the correct axes.

Hint 2 - recall from Session 8 that we want to avoid for loops; try to vectorize your calculation.


In [198]:
def conditional_entropy(m, t, p):
    '''Calculate the conditional entropy
    
    Parameters
    ----------
    m : array-like
        brightness measurements of the time-series data
    
    t : array-like
        timestamps corresponding to the brightness measurements
    
    p : float
        period of the periodic signal
        
    Returns
    -------
    Hc : float
        Conditional entropy for m(t) at period p
    '''
    
    m_norm = # complete
    phases = # complete
    # complete
    # complete
    # complete
    Hc = # complete
    
    return Hc
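
A vectorized completion using the identity $H_c = H(m, \phi) - H(\phi)$ noted earlier (a sketch; the 10 x 10 grid is again an assumption):

In [ ]:
def conditional_entropy(m, t, p):
    '''Conditional entropy of normalized magnitude given phase at period p'''
    m_norm = (m - np.min(m))/(np.max(m) - np.min(m))
    phases = (t/p) % 1
    H, _, _ = np.histogram2d(phases, m_norm, bins=10, range=[[0, 1], [0, 1]])
    p_joint = H/np.sum(H)              # p(m_i, phi_j); axis 0 indexes phase bins
    p_phi = np.sum(p_joint, axis=1)    # p(phi_j): sum over the magnitude bins
    occupied = p_joint > 0
    # broadcast p(phi_j) across the magnitude axis to pair with p(m_i, phi_j)
    p_phi_2d = np.broadcast_to(p_phi[:, np.newaxis], p_joint.shape)
    Hc = np.sum(p_joint[occupied]*np.log(p_phi_2d[occupied]/p_joint[occupied]))
    return Hc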

Problem 3b

What is the conditional entropy for the simulated signal at periods = 1, $\pi$-0.05, and $\pi$?

Do these results make sense given your understanding of CE?


In [199]:
print('For p = 1, \t\tH_c = {:.5f}'.format( # complete
print('For p = pi - 0.05, \tH_c = {:.5f}'.format( # complete
print('For p = pi, \t\tH_c = {:.5f}'.format( # complete


For p = 1, 		H_c = 1.94668
For p = pi - 0.05, 	H_c = 0.98672
For p = pi, 		H_c = 0.90229

Problem 3c

Write a function, ce_periodogram, to calculate the conditional entropy for observations $m$, $t$ over a frequency grid f_grid.


In [200]:
def ce_periodogram(m, t, f_grid):
    '''Calculate the conditional entropy at every freq in f_grid
    
    Parameters
    ----------
    m : array-like
        brightness measurements of the time-series data
    
    t : array-like
        timestamps corresponding to the brightness measurements
    
    f_grid : array-like
        trial frequencies for the periodic signal
        
    Returns
    -------
    ce_p : array-like
        conditional entropy for m(t) at every trial freq
    '''
    
    # complete
    for # complete in # complete
        # complete
    
    return ce_p

Problem 3d

Plot the conditional entropy periodogram, and return the best-fit period from the periodogram.


In [202]:
f_grid = # complete
ce_p = # complete

fig,ax = plt.subplots()
# complete
# complete
# complete

print("The best fit period is: {:.4f}".format( # complete


The best fit period is: 3.1445

The Shannon entropy and CE return nearly identical results for a simulated sinusoidal signal. Now we will examine how each performs with actual astronomical observations.

Problem 4) SE vs. CE for real observations

Problem 4a

Load the data from our favorite eclipsing binary from this morning's LS exercise. Plot the light curve.

Hint - if you haven't already, download the example light curve.


In [3]:
data = pd.read_csv("example_asas_lc.dat")

fig, ax = plt.subplots()
ax.errorbar( # complete
ax.set_xlabel('HJD (d)')
ax.set_ylabel('V (mag)')
ax.set_ylim(ax.get_ylim()[::-1])
fig.tight_layout()


Problem 4b

Using the Shannon entropy, determine the best period for this light curve.

Hint - recall this morning's discussion about the optimal grid for a period search.


In [4]:
f_min = # complete
f_max = # complete
delta_f = # complete

f_grid = # complete
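
A possible grid, assuming the time column in the DataFrame is named hjd (an assumption consistent with the axis label above; the 10 d$^{-1}$ ceiling and x10 oversampling are choices, not from the source):

In [ ]:
f_min = 1/(np.max(data['hjd']) - np.min(data['hjd']))  # resolve the full baseline
f_max = 10                                             # search periods down to 0.1 d
delta_f = f_min/10
f_grid = np.arange(f_min, f_max, delta_f)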

In [208]:
se_p = # complete

print("The best fit period is: {:.9f}".format( # complete


The best fit period is: 0.997269191

Problem 4c

Plot the Shannon entropy periodogram.


In [210]:
fig, ax = plt.subplots()
# complete
# complete
# complete


Problem 4d

Plot the light curve phase-folded on the best-fit period, as measured by the Shannon entropy periodogram.

Does this look reasonable? Why or why not?

Hint - it may be helpful to zoom in on the periodogram.


In [212]:
phase_plot(# complete


Problem 4e

Using the conditional entropy, determine the best period for this light curve.


In [204]:
ce_p = # complete

print("The best fit period is: {:.9f}".format( # complete


The best fit period is: 0.367542842

Problem 4f

Plot the CE periodogram.


In [216]:
fig, ax = plt.subplots()
# complete
# complete
# complete


Problem 4g

Plot the light curve phase-folded on the best-fit period, as measured by the CE periodogram.

Does this look reasonable? If not - can you make it look better?


In [214]:
phase_plot( # complete


This example demonstrates the primary strength of CE over the Shannon entropy.

If you zoom in on the CE periodogram, there is no power at $p \approx 1\,\mathrm{d}$, unlike the LS periodogram or the Shannon entropy method. This will not be the case for every single light curve, but it is a very nice feature of the CE method, and one reason why it may be preferred to something like LS when analyzing every light curve in LSST.

Challenge Problem) Overlapping Bins

In the previous example we used a simple uniform grid to identify the best-fit period for the eclipsing binary. However, the "best fit" was an estimate of the half period. One way to improve upon this estimate is to build a grid with overlapping phase bins, which enforces better continuity in the phase-folded light curve (K. Burdge, private communication).

Challenge Problem

Build a function conditional_entropy_overlap that utilizes overlapping bins in the CE calculation.

Can you use this function to identify the correct period of the binary?
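
As a starting point, here is one sketch of the idea (the shift-and-average scheme below is our construction, not necessarily the implementation referenced above): evaluating the CE on several binnings whose edges are offset by fractions of a bin width, then combining the results, mimics overlapping bins.

In [ ]:
def conditional_entropy_overlap(m, t, p, n_shifts=2):
    '''CE averaged over phase-shifted binnings (approximates overlapping bins)'''
    ces = []
    for k in range(n_shifts):
        shift = k/(10*n_shifts)    # fraction of one 1/10-wide phase bin
        # adding shift*p to the times slides every phase by `shift`,
        # which is equivalent to sliding the bin edges
        ces.append(conditional_entropy(m, t + shift*p, p))
    return np.mean(ces)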