Which type of cycle could we detect?

Maurice Wilson extracted X-ray lightcurves for stars in the Chandra Deep Field South (CDFS). Unfortunately, the number of stars in this field is small and most of them are faint (after all, the CDFS was specifically chosen to contain few stars). In the end, four lightcurves proved to be of sufficient quality to actually search for cycles. Looking at those lightcurves, we do not see any cyclic behaviour. In this notebook, we use Maurice's results to

  • quantify that we don't see a cycle
  • run Monte-Carlo (MC) simulations to see what type of cycle we would have been able to detect, given the time sampling and the relative errors of the dataset.

In [60]:
import numpy as np
from numpy.random import randn, rand
import matplotlib.pyplot as plt
from astropy.table import Table, Column

from lombscargle import lombscargle

%matplotlib inline

In [35]:
def vary_flux(flux, error):
    '''Vary flux in a lightcurve using the error values
    
    The Lomb-Scargle formalism by itself does not make use of the errors of a lightcurve.
    One way to account for this is to generate many different realizations of the observed
    lightcurve and run each of them through the Lomb-Scargle periodogram.
    Each realization starts with the measured fluxes and varies each point according to the 
    errors.
    
    Parameters
    ----------
    flux : np.ndarray
        Array of flux values
    error : float or np.ndarray
        Absolute (not relative) error in the same units as flux. If this is a float number, the same error is applied
        for every data point. Otherwise, error and flux must have the same shape.
        
    Returns
    -------
    new_flux : np.ndarray
        Flux varied according to the errors
    '''
    return flux + error * randn(len(flux))
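As a quick sanity check (a standalone sketch; `vary_flux` is repeated here so the snippet runs on its own): perturbing a constant lightcurve with a scalar error should give back a sample whose standard deviation is close to that error.

```python
import numpy as np
from numpy.random import randn

def vary_flux(flux, error):
    # same one-liner as above: add Gaussian noise scaled by the error
    return flux + error * randn(len(flux))

np.random.seed(42)                 # reproducible demo
flux = np.ones(100000)             # constant lightcurve
perturbed = vary_flux(flux, 0.05)
print(round(perturbed.mean(), 3), round(perturbed.std(), 3))
```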

In [38]:
def run_simulations(time, flux, error, n, **kwargs):
    '''Simulate n lightcurves based on time and flux and run Lomb-Scargle periodogram
    
    Parameters
    ----------
    time : np.ndarray
        Time of measurement
    flux : np.ndarray
        Flux of each data point
    error : np.ndarray or float
        Absolute error for each datapoint in the same units as flux. This can be a single number if
        the error is the same for every datapoint or must have the same dimension as ``flux``.
    n : integer
        number of lightcurves to run
        
    Additional parameters
    ---------------------
    All other parameters are passed to :meth:`lombscargle.lombscargle`.
    
    Returns
    -------
    periods : np.ndarray
        Array of best-fit periods
    faps : np.ndarray
        Array of the FAP for the highest peak in the periodogram
    '''
    if time.shape != flux.shape:
        raise ValueError('time and flux must have the same shape.')
    if np.isscalar(error) or (time.shape == error.shape):
        pass
    else:
        raise ValueError('Error must be scalar or have the same shape as time and flux.')
    
    faps = np.ones(n)
    periods = np.ones(n)
    
    for i in range(n):
        fsim = vary_flux(flux, error)
        period, sig, fap = lombscargle(time, fsim, **kwargs)
        faps[i] = fap
        periods[i] = period

    return periods, faps

In [23]:
# This is copied directly from Maurice's file.
# If we want to make this look nicer, we can easily reduce it to two digits; that's totally
# sufficient given the errors.

source_id=['033222.56-275805.3','033242.02-275702.4','033249.66-275454.1','033258.54-275007.7']

# XMM results
time_xmm=[round(2001.0+(7*30+27)/365.0,1), round(2002.0+(1*30+14)/365.0,1), round(2008.0+(7*30+8)/365.0,1), round(2009.0+(1*30+15)/365.0,1),round(2009.0+(7*30+17)/365.0,1), round(2010.0+(1*30+25)/365.0,1)]

# source 033222.56-275805.3
flux_1=[1.02093263948e-15, 1.4023656976e-15, 1.21796919022e-15, 7.61527573465e-16, 1.15050930915e-15, 8.02283879609e-16]
temp_1=[0.18240965615889917, 0.25480632438450962, 0.25973552298767516, 0.2560681575962182, 0.3088640073826231, 0.25257464796209983]
temp_error_1=[0.15513387660171216, 0.026314085335443049, 0.053685422768912749, 0.033057363672080398, 0.095853150025202893, 0.030654220228148787]
norm_1=[6.2345984945516415e-07, 6.8004369454762676e-07, 5.8602144135358495e-07, 3.6854343208592376e-07, 5.2324336899917542e-07, 3.9043696536480364e-07]
norm_error_1=[4.9521526982736924e-07, 1.490072296180406e-07, 3.4486018790239541e-07, 1.1515946397126324e-07, 3.5670420167338167e-07, 1.1803071891999829e-07]

# source 033242.02-275702.4  *XMM detects this source in all 6 epochs (unlike Chandra with only 5 epochs)
flux_2=[3.11892857175e-15, 5.97536861782e-15, 1.2724327666e-14,5.43247474047e-15, 7.16908958545e-15, 6.62256808764e-15]
temp_2=[0.49281818082422901, 0.28479756190785194, 0.32043468229563948, 0.29938692076433321, 0.31464315231523438, 0.30508074774485849]
temp_error_2=[0.44642130943900638, 0.0081175682309818087, 0.0090444933964847762, 0.0067682319363868704, 0.011799940576909196, 0.007490241489250149]
norm_2=[1.082345594922226e-06, 2.7833002903899436e-06, 5.7219492340666588e-06, 2.4938477888162987e-06, 3.2420603146963446e-06, 3.0231218926118239e-06]
norm_error_2=[5.4636633915219948e-07, 1.2227957144281039e-07, 3.884164358304641e-07, 1.2075590557350081e-07, 3.0242181768301826e-07, 1.3311037084304318e-07]

# source 033249.66-275454.1
flux_3=[2.13895397873e-15, 1.8120254364e-15, 1.83421508989e-15, 1.23572182734e-15, 1.19988372937e-15, 1.61193859009e-15]
temp_3=[0.26813120590789091, 1.0310822710109184, 0.33303486319044778, 0.45356362785143461, 0.49932939080846028, 0.94500205087698386]
temp_error_3=[0.093813167222834948, 0.041543003786878696, 0.060462946154865915, 0.62055761664359022, 0.24808595944785039, 0.063643454172429004]
norm_3=[1.0156660039728918e-06, 6.6963511622204855e-07, 8.1485607117359254e-07, 4.5225745015492949e-07, 4.1284161987758255e-07, 5.4609391223995934e-07]
norm_error_3=[3.8840379378773392e-07, 1.0634150123539027e-07, 2.3544330724461885e-07, 1.1813340171659508e-07, 2.601044195361493e-07, 1.3986160269961403e-07]

# source 033258.54-275007.7
flux_4=[6.43665812836e-16, 4.2499913015e-16, 8.71124883149e-16, 4.88660551856e-16, 6.0718871376e-16, 5.09176758637e-16]
temp_4=[0.33610582053661525, 0.52845329064127111, 0.26609731135896814, 0.33137904656105449, 0.74393271687184515, 0.4984112863661071]
temp_error_4=[0.39573502850862302, 0.45759106887381518, 0.05412402503616881, 0.082287290514230838, 0.49098992077954934, 0.28948114915506651]
norm_4=[2.8511117512644155e-07, 1.4086024387973206e-07, 4.1496412490059993e-07, 2.1743423372177997e-07, 1.8707287732417253e-07, 1.7540214420559911e-07]
norm_error_4=[3.0604223431090917e-07, 1.0035172501079886e-07, 1.2958759446285109e-07, 1.0463641561818527e-07, 1.2590796400861138e-07, 1.056270073022812e-07]


# Chandra results
time_chandra=[round(1999.0+(11*30+4)/365.0,1), round(2000.0+(9*30+15)/365.0,1),round( 2007.0+(10*30+20)/365.0,1), round(2010.0+(4*30+0)/365.0,1), round(2010.0+(6*30+3)/365.0,1), round(2010.0+(7*30+13)/365.0,1)]


# source 033222.56-275805.3
Flux_1=[3.00101993037e-15, 5.24503736502e-16, 4.94720821958e-16, 4.39139871296e-16, 3.25217402226e-16, 6.78572836864e-16]
Temp_1=[13.65390935444954, 3.5854186062483189, 0.56196607015933298, 0.76977776453011193, 0.54370572511920301,4.0690442061129959]
Temp_error_1=[0.00,3.5577619864983121, 3.005680883610081, 0.59998569860597506, 0.00, 3.8864614469855852]
Norm_1=[1.7446886952901496e-06, 3.1595992114431002e-07, 1.5902610834473347e-07, 1.3547714146209467e-07, 1.0540972206823163e-07, 4.012421597485776e-07]
Norm_error_1=[7.8328510559146982e-06, 4.944662358251794e-07, 6.2940214009182143e-07, 6.4465686903276242e-08, 7.0e-06, 9.1854837556664324e-07]  # the second from last is actually 0.00
time_chandra_source1=[round(1999.0+(11*30+4)/365.0,1), round(2000.0+(9*30+15)/365.0,1), round(2004.0+(11*30+16)/365.0,1), round(2007.0+(10*30+20)/365.0,1),round(2010.0+(4*30+0)/365.0,1), round(2010.0+(7*30+13)/365.0,1)]

# source 033242.02-275702.4  *both 2004 and the last 2010 epoch do not have this source.
Flux_2=[1.78490773266e-15,2.83737940738e-15, 3.58919994119e-15, 3.32334178409e-15, 2.17106078572e-15]
Temp_2=[6.845034991474563, 0.90307985416997161, 0.93023527976750076, 0.88996804034911858, 1.0862051356302032]
Temp_error_2=[6.81047310697992, 0.14034289681170031, 0.062558171033882504, 0.18530962973098164, 0.52551080157691066]
Norm_2=[9.9469908568859262e-07,9.2176347968674455e-07, 1.1962190913131731e-06, 1.066626286325222e-06,8.4980820977232287e-07]
Norm_error_2=[1.7238846740737122e-06, 9.7868616936726455e-08, 1.3326933992274849e-07, 1.6561873000557657e-07, 3.1741947198744341e-07]
time_chandra_source2=[round(1999.0+(11*30+4)/365.0,1), round(2000.0+(9*30+15)/365.0,1),round( 2007.0+(10*30+20)/365.0,1), round(2010.0+(4*30+0)/365.0,1), round(2010.0+(6*30+3)/365.0,1)]

# source 033249.66-275454.1
Flux_3=[8.20916658257e-16,1.33616525953e-15, 1.54522434222e-15, 1.1930170241e-15, 2.19288222097e-15,2.71553681637e-16]
Temp_3=[1.0639210796276415, 1.8205718501781496, 0.5406498186708222, 1.3830770181895267, 2.1483764962747349, 1.4828429263098994]
Temp_error_3=[1.0516918541598299, 2.9725409562500307, 0.4144803318491721, 0.71261263813427544, 3.2948693002958356, 0.00]
Norm_3=[3.1360563532138893e-07,8.1578481658679114e-07, 8.5759018039556895e-07,6.1901898045499227e-07, 1.3986970141689128e-06, 1.4695709654681712e-07]
Norm_error_3=[3.2891029485899888e-07, 7.0283401455265046e-07, 2.9333685810923749e-07, 3.4984826055541969e-07, 7.9038794992726062e-07, 2.3016436993617791e-06]


# source 033258.54-275007.7
Flux_4=[8.86136211942e-16, 6.27207045296e-16, 7.7821516777e-16,1.05071981113e-15, 9.21822407188e-16, 9.80483839542e-16]
Temp_4=[13.652986339916701,2.2092587062333187, 1.2163869588618761, 13.657233247937798, 5.1374081326645396, 3.2230852920542752]
Temp_error_4=[0.00, 3.2994706894631656, 0.4768261471372377, 9.6516177856482095, 3.178248557816489, 2.4811275573934664]
Norm_4=[5.1516271629850311e-07, 4.003349418732643e-07, 3.4298099600532035e-07,6.1087790417612043e-07, 5.2666744423975356e-07, 6.0020602072084644e-07]
Norm_error_4=[1.3560250417294774e-06,3.3935375357415921e-07, 1.5190329598684497e-07, 6.2090734643988562e-07, 4.9039359468458521e-07, 7.363572129809017e-07]

In [26]:
source1 = Table({'time': time_xmm + time_chandra_source1, 'norm': norm_1 + Norm_1, 
                 'e_norm': norm_error_1 + Norm_error_1, 'flux': flux_1 + Flux_1}, 
                meta={'name': '033222.56-275805.3'})
source2 = Table({'time': time_xmm + time_chandra_source2, 'norm': norm_2 + Norm_2, 
                 'e_norm': norm_error_2 + Norm_error_2, 'flux': flux_2 + Flux_2}, 
                meta={'name': '033242.02-275702.4'})
source3 = Table({'time': time_xmm + time_chandra, 'norm': norm_3 + Norm_3, 
                 'e_norm': norm_error_3 + Norm_error_3, 'flux': flux_3 + Flux_3}, 
                meta={'name': '033249.66-275454.1'})
source4 = Table({'time': time_xmm + time_chandra, 'norm': norm_4 + Norm_4, 
                 'e_norm': norm_error_4 + Norm_error_4, 'flux': flux_4 + Flux_4},
                meta={'name': '033258.54-275007.7'})

for s in [source1, source2, source3, source4]:
    s.sort('time')
    s.add_column(Column(name='e_flux', data=s['flux'] * s['e_norm']/s['norm']))

In [28]:
# Check that I read in all the right arrays and that the lightcurves look like they do in Maurice's report.
# I don't bother to put labels on this plot. It's only here to compare with Maurice's pdf to make sure I did not
# do anything stupid and mix up the Chandra and XMM times.

fig = plt.figure(figsize=(8,6))
for i, s in enumerate([source1, source2, source3, source4]):
    ax = fig.add_subplot(2, 2, i+1)  # subplot indices start at 1
    ax.errorbar(s['time'], s['flux']*1e15, yerr=s['e_flux']*1e15)
    ax.set_title(s.meta['name'])
    ax.set_ylim([0, None])


Quantify that we don't see a cycle

Now, let's make many realisations of each lightcurve and see in how many cases we detect periodicity. If the lightcurve really is periodic, then we would expect to detect the periodicity in most cases, even when we vary the flux within the errors. If there is no real periodicity (e.g. the lightcurve is constant), then we might still occasionally find periodic lightcurves by chance, but this should be rare.

In this paragraph, I've used words like rare and most cases. We can put that in mathematical terms and say: we expect to find the periodicity with a false-alarm probability (FAP) below 30% in at least 68% of all realizations. This is a very loose definition compared to what most people use, and we can discuss the exact numbers, but I used them here because that's what we used in Hoffmann et al. (2012).
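That criterion is easy to state in code. The helper below is a hypothetical sketch (not part of Maurice's pipeline): it takes an array of FAP values from the MC runs and applies exactly this 30%/68% rule.

```python
import numpy as np

def passes_detection(faps, fap_limit=0.3, fraction=0.68):
    '''True if at least `fraction` of MC runs have a FAP below `fap_limit`.'''
    faps = np.asarray(faps)
    return np.mean(faps <= fap_limit) >= fraction

# toy example: 75 of 100 simulated runs have a low FAP -> detection
toy_faps = np.concatenate([np.full(75, 0.1), np.full(25, 0.9)])
print(passes_detection(toy_faps))   # True

# only 30 of 100 runs pass -> no detection
toy_faps2 = np.concatenate([np.full(30, 0.1), np.full(70, 0.9)])
print(passes_detection(toy_faps2))  # False
```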


In [43]:
n = 1000

all_periods = np.zeros((4, n))
all_faps = np.zeros((4, n))

for i, s in enumerate([source1, source2, source3, source4]):
    periods, faps = run_simulations(s['time'], s['flux'], s['e_flux'], n=n,
                                    maxper = 15, oversamp=4, maxfreq=1)
    all_periods[i,:] = periods
    all_faps[i,:] = faps

In [58]:
fig = plt.figure(figsize=(8,6))
for i, s in enumerate([source1, source2, source3, source4]):
    ax = fig.add_subplot(2, 2, i+1)
    ax.hist(all_faps[i,:], cumulative=True, range=[0,1], normed=True)
    ax.set_title(s.meta['name'])
    ax.set_ylim([0, None])
    ax.barh(0.7, 0.3, left=0, height=0.3, color='r')
    
text = fig.axes[2].set_xlabel('FAP')
text = fig.axes[3].set_xlabel('FAP')
text = fig.axes[0].set_ylabel('fraction of MC runs that pass the FAP threshold')


The plots above show in blue a cumulative, normed histogram (i.e. the last bin is always 1) of the FAP for the MC runs. The first bin shows the fraction of runs with FAP < 0.1, the second bin the fraction of runs with FAP < 0.2, and so on. If the histogram reaches the red area, then we have detected a significant cycle according to the definition above (at least 68% of all runs have FAP < 0.3).
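The same numbers can be read off without a plot: for each FAP threshold, simply count the fraction of MC runs below it (the FAP values below are made up, purely for illustration).

```python
import numpy as np

# ten made-up FAP values standing in for the MC results
faps = np.array([0.05, 0.15, 0.25, 0.45, 0.65, 0.85, 0.95, 0.99, 0.5, 0.2])
thresholds = np.arange(0.1, 1.01, 0.1)
fractions = [(faps < t).mean() for t in thresholds]
print(fractions[2])   # fraction of runs with FAP < 0.3 -> 0.4 here
```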

Not surprisingly, that's not the case for any of our four sources. We knew that before, but I wanted to show here that we can demonstrate it in a quantitative way. You may notice that the plots look very much like those in Hoffmann et al. (2012). That's no accident: although I don't have the original plotting code, I tried to make them look similar because that helps us compare. While we don't have any periodic star right now, when we write up our paper two years from now, we will probably want to incorporate every X-ray observation ever done and run the same code on every object, so it's good to have this machinery ready already.

What type of cycles could we detect?

Again, I'll follow closely along the lines of Hoffmann et al. (2012). We can think about how to improve on that, but for now I think it's good enough. And again, I don't have the original code we used in that paper, but I'll write up something similar.


In [59]:
def find_cycle(time, flux, error, n, **kwargs):
    '''Wrap `run_simulations` into a simple True/False question
    
    The :meth:`run_simulations` function returns a whole lot of information:
    A list of FAPs and periods found for every single realization of a lightcurve. 
    Here, we wrap that in a simple yes/no condition, which allows us to mark the area
    on a grid where we would have detected something.
    '''
    periods, faps = run_simulations(time, flux, error, n, **kwargs)
    return (faps <= 0.3).sum() > 0.68*n

In [61]:
def lc_sinussoidal(time, s, p):
    '''General sinusoidal lightcurve
    
    The lightcurve has a constant plus a periodic component.
    This function adds a random phase, because we don't know
    what phase ``t=0`` corresponds to. The lightcurve is always normalized to a mean of 1.
    
    Parameters
    ----------
    time : np.ndarray
        Times at which the lightcurve is sampled
    s : float
        Ratio between constant and periodic component. ``s=0`` corresponds to a constant
        lightcurve, ``s=1`` to a fully periodic one.
    p : float
        Period of the periodic component, in the same units as ``time``.
    
    Returns
    -------
    lc : np.ndarray
        fluxes sampled at times `time`.
    '''    
    delta = rand()*2*np.pi
    return 1. + s * np.sin(2. * np.pi * time / p + delta)
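A quick check of the two properties the docstring promises (the function is repeated here so the cell stands alone): the mean stays at 1 for any random phase, and ``s`` sets the excursion around it.

```python
import numpy as np
from numpy.random import rand

def lc_sinussoidal(time, s, p):
    # constant level of 1 plus a sinusoid with amplitude s and random phase
    delta = rand() * 2 * np.pi
    return 1. + s * np.sin(2. * np.pi * time / p + delta)

# sample densely over exactly two periods (p=5, t from 0 to 10)
t = np.linspace(0, 10, 100001)
lc = lc_sinussoidal(t, s=0.8, p=5)
print(round(lc.mean(), 2), round(lc.min(), 2), round(lc.max(), 2))  # -> 1.0 0.2 1.8
```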

In [66]:
def analyze_sim_lcs(time, fluxfunc, error, n, fluxfuncargs={}, **kwargs):
    '''Analyse simulated lightcurves with Lomb-Scargle periodogram
    
    This function simulates ``n`` lightcurves with the same underlying form, adds errors to them
    and runs the Lomb-Scargle periodogram. It returns the period and FAP found for each lightcurve.
    This can be used to test if a certain periodic signature is detectable, given the cadence
    and observational errors.
    
    Parameters
    ----------
    time : np.ndarray
        Time of measurement
    fluxfunc : function
        Function that generates a simulated lightcurve with the following signature::
        
            flux = fluxfunc(time, **fluxfuncargs)
        
    error : np.ndarray or float
        Absolute error for each datapoint in the same units as flux. This can be a single number if
        the error is the same for every datapoint or must have the same dimension as ``flux``.
    n : integer
        number of lightcurves to run
    fluxfuncargs : dictionary
        named arguments for ``fluxfunc``
        
    Additional parameters
    ---------------------
    All other parameters are passed to :meth:`lombscargle.lombscargle`.
    
    Returns
    -------
    periods : np.ndarray
        Array of best-fit periods
    faps : np.ndarray
        Array of the FAP for the highest peak in the periodogram
    '''
    if np.isscalar(error) or (time.shape == error.shape):
        pass
    else:
        raise ValueError('Error must be scalar or have the same shape as time and flux.')
    
    faps = np.ones(n)
    periods = np.ones(n)
    
    for i in range(n):
        fsim = vary_flux(fluxfunc(time, **fluxfuncargs), error)
        period, sig, fap = lombscargle(time, fsim, **kwargs)
        faps[i] = fap
        periods[i] = period

    return periods, faps

Below, I try a few different values of period and amplitude with the time cadence and relative errors of our real sources. We could of course run a systematic grid, e.g. periods from 1 to 15 years and relative amplitudes from 0 to 1. However, as you will see when looking at the plots, finding anything in the data we have is almost impossible. That's the bad news.
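For reference, such a grid would look roughly like the sketch below. It is deliberately self-contained: since the notebook's `lombscargle` wrapper (which also returns a FAP) is not available here, I substitute `scipy.signal.lombscargle` and judge detection by whether the highest peak lands near the input period, which is a cruder criterion than the FAP rule above, but enough to show the shape of the loop.

```python
import numpy as np
from scipy.signal import lombscargle

def recovered_fraction(time, s, p, rel_err, n=100, seed=0):
    '''Fraction of noisy sinusoidal lightcurves whose LS peak is within 10% of p.'''
    rng = np.random.RandomState(seed)
    trial_periods = np.linspace(1., 15., 500)
    freqs = 2. * np.pi / trial_periods        # scipy expects angular frequencies
    hits = 0
    for _ in range(n):
        delta = rng.rand() * 2. * np.pi
        flux = 1. + s * np.sin(2. * np.pi * time / p + delta)
        flux = flux + rel_err * rng.randn(len(time))
        pgram = lombscargle(time, flux - flux.mean(), freqs)
        best = trial_periods[np.argmax(pgram)]
        hits += abs(best - p) / p < 0.1
    return hits / float(n)

# cadence of the real data (rounded epochs from above)
time = np.array([1999.9, 2000.8, 2001.6, 2002.1, 2007.9, 2008.6,
                 2009.1, 2009.6, 2010.2, 2010.3, 2010.5])
for trial_p in [3., 5., 10.]:
    print(trial_p, recovered_fraction(time, s=1., p=trial_p, rel_err=0.2))
```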


In [101]:
# These are just examples. You can try out other numbers, but it's going to look similar (I tried it).
fig = plt.figure(figsize=(8,6))
for i, (s, per, amp) in enumerate(zip([source1, source2, source3, source4], [3,4,5,6], [0.8,1.,0.9,0.5])):
    ax = fig.add_subplot(2, 2, i+1)
    p, fap = analyze_sim_lcs(s['time'], lc_sinussoidal, s['e_flux']/s['flux'], 1000, 
                             {'s': amp, 'p': per})
    ax.hist(fap, cumulative=True, range=[0,1], normed=True)
    ax.set_title(s.meta['name'])
    ax.set_ylim([0, None])
    ax.barh(0.7, 0.3, left=0, height=0.3, color='r')
    
text = fig.axes[2].set_xlabel('FAP')
text = fig.axes[3].set_xlabel('FAP')
text = fig.axes[0].set_ylabel('fraction of MC runs that pass the FAP threshold')


However, here is the good news: below I run a similar experiment, but I add three points to the timeline in 2014 and 2015 for the new data from the next 3 Ms of Chandra time in the CDFS. Since we don't know yet how good that data will be, I simply assume that all errors equal the median error in the existing data. As you can see, this increases our chances to find something dramatically.


In [106]:
period = np.arange(1,16)
amplitude = np.arange(0,1.01, .1)

#p, fap = analyze_sim_lcs(source1['time'], lc_sinussoidal, source1['e_flux']/source1['flux'], 1000, {'s': 1, 'p': 5})

p, fap = analyze_sim_lcs(np.array([ 1999.9,  2000.8,  2001.6,  2002.1,  2007.9,  2008.6,  2009.1,
        2009.6,  2010.2,  2010.3,  2010.5, 2014.5, 2015., 2015.5]), 
                         lc_sinussoidal, np.median(source2['e_flux']/source2['flux']), 1000, {'s': 1, 'p': 3})
plt.hist(fap, cumulative=True, range=[0,1], normed=True)
plt.title('Example incl. 3 Ms in 2014 and 2015')
plt.ylim([0, None])
plt.barh(0.7, 0.3, left=0, height=0.3, color='r')


Out[106]:
<Container object of 1 artists>

So, in a very favourable case (low errors, a short period of only 3 years, source 2 as a template, which is the best source in our current sample, maximum possible amplitude), we will be able to detect periodicity (the blue histogram intersects the red "significant detection" area). I also tried slightly less favourable conditions (e.g. a 5-year period), and unfortunately that would fall just outside the detection area. So, it's still not ideal, but having the extra data will certainly push us much closer to detecting something. We won't find a solar-like cycle, but we might be sensitive to shorter periods. We don't know yet how the 3 Ms will be scheduled. In the simulation above, I have just added three more datapoints, but it's not unlikely that the 3 Ms will be split over a longer time, and obviously each additional datapoint pushes us further into the regime where we can either find or confidently reject stellar cycles.

Looking at this figure, I am absolutely convinced that we made the right decision when we said that, for the moment, an AAS poster is good enough, and we will wait with a full article until the new data is available.