# Kālī

Kālī is a software package written to model and study determinstic and stochastic astronomical light curves, i.e. ordered lists of flux as a function of time. Kālī provides two models for astronomical light curves:

1. Continuous-time Autoregressive Moving Average (C-ARMA) processes (stochastic).
2. Massive Black Hole Binaries (MBHBs) with relativistic doppler beaming (deterministic).
3. Massive Black hole Binaries (MBHBs) with relativistic doppler beaming (stochastic, CARMA process).

## Concepts

CARMA processes are defined as solutions to the differential equation $$\mathrm{d}^{p}x + \alpha_{1} \mathrm{d}^{p-1}x + \ldots + \alpha_{p-1} \mathrm{d}x + \alpha_{p}x = \beta_{0} (\mathrm{d}W) + \ldots + \beta_{q} \mathrm{d}^{q}(\mathrm{d}W).$$ Realize that this differential equation is just a vanilla linear differential equation that happens to be driven by a noise process. For example, for a CARMA(2,1) process, the defining differential equation is $$\mathrm{d}^{2}x + \alpha_{1} \mathrm{d}x + \alpha_{2}x = \beta_{0} (\mathrm{d}W) + \beta_{1} \mathrm{d}^{1}(\mathrm{d}W),$$ which we can cast into slightly friendlier form by re-defining the coefficients on the LHS as $$\mathrm{d}^{2}x + 2 \zeta \omega \mathrm{d}x + \omega^{2} x = \beta_{0} (\mathrm{d}W) + \beta_{1} \mathrm{d}^{1}(\mathrm{d}W).$$ Astute readers will notice that the LHS is just a damped harmonic oscillator making the whole equation that of a forced damped harmonic oscillator! The forcing term is a 'blue'-noise process.

Kālī consists of multiple Python sub-modules that call underlying c++ libraries using Cython. All the sub-modules are structured around two concepts: lc objects that represent light curves and task objects that take lc objects as inputs and perform useful work such as fitting light curves to various models, both stochastic and non-stochastic. The lc aobjects are abstract objects that must be inherited from to make concrete instantiable sub-classes such as the mockLC, externalLC, sdssLC, k2LC and the crtsLC objects. The abstract lc and the concrete mockLC & externalLC classes are defined in the lc.py file. The concrete sdssLC, k2LC, and crtsLC objects are all defined in thier respective s82.py, k2.py, and crts.py files.

The two concrete classes mockLC and externalLC are designed for slightly different purposes. mockLCs are to be used for simulated data generated by various task objects while externalLCs are to be used when performing one-off analysis of a real light curve. Typically, if multiple real light curves are to be studied with Kālī, it is better to define custom lc classes such as the sdssLC defined in s82.py and the k2LC defined in the k2.py files. A quick look at these files shows how the two classes deal in very different ways with getting SDSS and K2 light curves. The SDSS lightcurves are hosted on multiple author-run servers and the class uses zeromq to talk to these servers and fetch the required light curves from them. The K2 data are fetched from the NASA MAST servers where they are hosted on-demand. The idea is to let data-set specific sub-classes of lc specialize for the data-set being used and make the remaining package agnostic to the source of the data. Later on in this document, we will provide guidelines for creating inherited sub-classes of lc.

We shall begin by introducing the reader to the CARMATask object.

## CARMATasks and mockLCs

The CARMATask object allows us to perform C-ARMA analysis on light curves. In order to use a CARMATask, it is necessary to import the carma module. After this, we can create a new basic task object called firstNewTask to study the C-ARMA(2,1) model as follows:



In [1]:

import kali.carma



All task objects contain one or more models i.e. each task object is capable of being used to study multiple models simultaneously. The only restriction is that the models held by each task object have to have the order i.e. in the case of CARMATasks that are designed for C-ARMA models, all the C-ARMA models held by a single CARMATask have to have the the same p & q order. Of course, this does not prevent us from having multiple task objects declared at once. By default, the task object initializer will query your system to determine how many hardware threads (including Intel Hyperthreading) your CPU supports and will hold that number of models. However, this behavior can be changed by explicitly instructing the task constructor to create a specific number of models in the task as follows:



In [2]:



anotherNewTask is a CARMATask that holds two C-ARMA(3,1) models for use. In the case of CARMATasks, we can specify how models will burn in light curves when generating simulated light curves by initializing the yetAnotherNewTask object as follows:



In [3]:



Here yetAnotherNewTask is a CARMATask that holds a single C-ARMA(4,2) model. Any simulated light curves generated by yetAnotherNewTask will automatically be burnt in for 100,000 steps. Other parameters that can be passed to the CARMATask constructor include nwalkers, nsteps etc... Please refer to the code documentation for the exact list of parameters accepted by the constructor. For the rest of this section, we will make a new CARMATask object newTask and set the number of steps taken by the MCMC walkers to 500 to properly sample our posterior space.



In [4]:



At this point, our CARMATask objects are hollow objects that lack numerical meaning. We must specify C-ARMA model parameters to be able to do useful things with our tasks. Two model parameters are required - a double precision value for the time-step dt to be used for simulations and a numpy ndarray of doubles Theta for the C-ARMA model coefficients. Mathematically, an individual C-ARMA model can be specified using either the co-efficients representation or the roots representation. The roots representation is much more convenient to use when providing model parametrs because it is trivial to pick valid values. However, Kālī requires that the C-ARMA model be specified purely in terms of the C-ARMA model coefficients and provides helper functions to convert the roots representation to the coeffiecients represenation. Let us set the first C-ARMA model in the newTask object to have an autocorrelation function that decays with two timescales - 107.8 d and 43.2 d. We shall set the MA timescale to be 5.5 d and set the amplitude of the light curve (i.e. the square root of the asymptotic autocorrelation function) to be 1.0



In [5]:

r_1 = (1.0/107.8) + 0j
r_2 = (-1.0/33.2) + 0j
m_1 = (-1.0/5.5) + 0j
amp = 1.0



Note that we have purposely made the first root positive i.e. the C-ARMA model corresponding to this set of roots will be unstable. We can use the helper function kali.carma.coeffs(p, q, Rho) to compute the polynomial coefficients representation corresponding to this roots representation as follows:



In [6]:

import numpy as np
Rho = np.array([r_1, r_2, m_1, amp])
Theta = kali.carma.coeffs(2, 1, Rho)
print Theta




[  2.08440441e-02  -2.79410779e-04   1.81818182e-01   1.00000000e+00]



We can check the returned Theta for validity using the check(Theta) method of the CARMATask object newTask as follows:



In [7]:




False



After fixing the bad root, we can rerun the check as follows



In [8]:

r_1 = (-1.0/107.8) + 0j
Rho = np.require(np.array([r_1, r_2, m_1, amp]), requirements=['F', 'A', 'W', 'O', 'E'])
Theta = kali.carma.coeffs(2, 1, Rho)
print Theta




[ 0.03939692  0.00027941  0.0046724   0.0256982 ]
True



Note that we wrapped the definition of Rho in a np.require call that ensures that the underlying data in Rho is contiguous in memory and can thus be accesses by simple pointer arithmetic in the underlying c++ library. Although it is not necesary to wrap such small arrays in the np.require call, it is good practice and can help avoid mysterious memeory related errors as well as silent garbage output caused by an incorrect interpretation of a pointer as data.

After having established that we have a good C-ARMA model coefficients vector, let us set the first C-ARMA model in newTask to use this vector. We also pick the time increment dt between points in the light curve to be dt = 0.1 d. We can set the first C-ARMA model held by newTask to the specified dt and Theta as follows:



In [9]:

dt = 0.1




Out[9]:

0



The return value of 0 indicates that the method call completed without error. We can check to see which of the C-ARMA models held by newTask has been set as follows:



In [10]:




Out[10]:

array([ True, False, False, False, False, False, False, False], dtype=bool)



As can be seen above, only the first model has been set with values so far. We can see the resulting values of the C-ARMA model matrices as follows:



In [11]:



As can be seen by the way we issued the two previous commands, we do not have to specify tnum = 0 because the default for tnum is 0 i.e. the first C-ARMA model held by the task. tnum was not required for the call to set(dt, Theta) either. Having set the model, we can also obtain the asymptotic standard deviation of light curves made with this model, i.e. the variability of the light curve, as follows:



In [12]:

import math




Out[12]:

1.0000000000000002



It turns out that the variability amplitude is 1.0 flux units which should come as no surprise given that we picked amp in the roots representation Rho to be 1.0 to begin with. We can plot the power spectral density of the C-ARMA as follows:



In [13]:




Out[13]:



The black line above is the final PSD. It is the sum of the red (log PSD numerator) and blue (-log PSD denominator) lines. They in turn are generated by evaluating the respective numerator and denominator polynomial functions of frequency at each frequency. Consider the numerator polynomial (blue) first. At low frequencies, i.e. on long timescales, it is completely dominated by the freq^0 term. At a frequency of about 1/day, the freq^2 term begins to dominate. The same way, in the case of the denominator, the chosen coefficients lead to the freq^0 term dominating on long timescales. On shorter timescales, the freq^2 term would have begun to dominate, except that the freq^4 term becomes important just a little later. So the PSD is dominated by the freq^2 denominator term over a very small range of timescales and we should expect to find that model parameter recovery will be hard (i.e. we won't do a good job unless our light curve is very long). The overall PSD does show a flattening of the slope at high frequencies, but this is driven purtely by the numerator freq^2 term. At this point, we are ready to simulate a light curve using this C-ARMA model. The simulate(T) method of CARMATask requires the duration of the desired light curve and returns a mockLC object. We can create a new light curve of length 1000 d as follows:



In [14]:



This creates a new lightcurve object of class mockLC that has been regularly sampled with sampling interval dt and of duration 1000 d. Alternatively, we could have obtained a mockLC with irregular sampling by passing in a numpy array or python list tIn of timestamps. Obviously, we cannot pas in both duration and tIn at the same time. Having made a mock light curve, we can view the simulated light curve newLC using the plot() method that all lc objects support as follows:



In [15]:

newLC.plot()




/home/vish/anaconda2/lib/python2.7/site-packages/matplotlib/axes/_axes.py:519: UserWarning: No labelled objects found. Use label='...' kwarg on individual plots.
warnings.warn("No labelled objects found. "

Out[15]:



This light curve has no noise. We can simulate noise by fixing two parameters - the fractional level of variability (fracIntrinsicVar = 0.15 by default) and the fractional level of the noise to the signal (fracNoiseToSignal = 0.001 by default). Once we have fixed these two parameters, we can simulate noise as follows:



In [16]:



Once again, we can plot the resulting light curve using the plot() method of all lc objects as follows:



In [17]:

newLC.plot()




Out[17]:



Lets change the noise-to-signal ratio and re-observe the light curve.



In [19]:

newLC.fracNoiseToSignal = 1.0
newLC.plot()




Out[19]:



Before proceeding further, let's drop the noise to signal back down to a realistic level (~ 0.1% photometry).



In [21]:

newLC.fracNoiseToSignal = 0.001
newLC.plot()




Out[21]:



Lets look at the periodogram of this mock light curve.



In [22]:

newLC.plotperiodogram()




/home/vish/anaconda2/lib/python2.7/site-packages/matplotlib/scale.py:93: RuntimeWarning: invalid value encountered in less_equal