Kālī is a software package written to model and study determinstic and stochastic astronomical light curves, i.e. ordered lists of flux as a function of time. Kālī provides two models for astronomical light curves:
CARMA processes are defined as solutions to the differential equation $$\mathrm{d}^{p}x + \alpha_{1} \mathrm{d}^{p-1}x + \ldots + \alpha_{p-1} \mathrm{d}x + \alpha_{p}x = \beta_{0} (\mathrm{d}W) + \ldots + \beta_{q} \mathrm{d}^{q}(\mathrm{d}W).$$ Realize that this differential equation is just a vanilla linear differential equation that happens to be driven by a noise process. For example, for a CARMA(2,1) process, the defining differential equation is $$\mathrm{d}^{2}x + \alpha_{1} \mathrm{d}x + \alpha_{2}x = \beta_{0} (\mathrm{d}W) + \beta_{1} \mathrm{d}^{1}(\mathrm{d}W),$$ which we can cast into slightly friendlier form by re-defining the coefficients on the LHS as $$\mathrm{d}^{2}x + 2 \zeta \omega \mathrm{d}x + \omega^{2} x = \beta_{0} (\mathrm{d}W) + \beta_{1} \mathrm{d}^{1}(\mathrm{d}W).$$ Astute readers will notice that the LHS is just a damped harmonic oscillator making the whole equation that of a forced damped harmonic oscillator! The forcing term is a 'blue'-noise process.
Kālī consists of multiple Python sub-modules that call underlying c++ libraries using Cython. All the sub-modules are structured around two concepts: lc
objects that represent light curves and task
objects that take lc
objects as inputs and perform useful work such as fitting light curves to various models, both stochastic and non-stochastic. The lc
aobjects are abstract objects that must be inherited from to make concrete instantiable sub-classes such as the mockLC
, externalLC
, sdssLC
, k2LC
and the crtsLC
objects. The abstract lc
and the concrete mockLC
& externalLC
classes are defined in the lc.py
file. The concrete sdssLC
, k2LC
, and crtsLC
objects are all defined in thier respective s82.py
, k2.py
, and crts.py
files.
The two concrete classes mockLC
and externalLC
are designed for slightly different purposes. mockLC
s are to be used for simulated data generated by various task
objects while externalLC
s are to be used when performing one-off analysis of a real light curve. Typically, if multiple real light curves are to be studied with Kālī, it is better to define custom lc
classes such as the sdssLC
defined in s82.py
and the k2LC
defined in the k2.py
files. A quick look at these files shows how the two classes deal in very different ways with getting SDSS and K2 light curves. The SDSS lightcurves are hosted on multiple author-run servers and the class uses zeromq
to talk to these servers and fetch the required light curves from them. The K2 data are fetched from the NASA MAST servers where they are hosted on-demand. The idea is to let data-set specific sub-classes of lc
specialize for the data-set being used and make the remaining package agnostic to the source of the data. Later on in this document, we will provide guidelines for creating inherited sub-classes of lc
.
We shall begin by introducing the reader to the CARMATask
object.
The CARMATask
object allows us to perform C-ARMA analysis on light curves. In order to use a CARMATask
, it is necessary to import the carma
module. After this, we can create a new basic task object called firstNewTask
to study the C-ARMA(2,1) model as follows:
In [1]:
import kali.carma
firstNewTask = kali.carma.CARMATask(2,1)
All task
objects contain one or more models i.e. each task object is capable of being used to study multiple models simultaneously. The only restriction is that the models held by each task object have to have the order i.e. in the case of CARMATask
s that are designed for C-ARMA models, all the C-ARMA models held by a single CARMATask
have to have the the same p
& q
order. Of course, this does not prevent us from having multiple task objects declared at once. By default, the task
object initializer will query your system to determine how many hardware threads (including Intel Hyperthreading) your CPU supports and will hold that number of models. However, this behavior can be changed by explicitly instructing the task constructor to create a specific number of models in the task as follows:
In [2]:
anotherNewTask = kali.carma.CARMATask(3, 1, nthreads = 2)
anotherNewTask
is a CARMATask
that holds two C-ARMA(3,1) models for use. In the case of CARMATask
s, we can specify how models will burn in light curves when generating simulated light curves by initializing the yetAnotherNewTask
object as follows:
In [3]:
yetAnotherNewTask = kali.carma.CARMATask(4, 2, nthreads = 1, nburn = 100000)
Here yetAnotherNewTask
is a CARMATask
that holds a single C-ARMA(4,2) model. Any simulated light curves generated by yetAnotherNewTask
will automatically be burnt in for 100,000 steps. Other parameters that can be passed to the CARMATask
constructor include nwalkers
, nsteps
etc... Please refer to the code documentation for the exact list of parameters accepted by the constructor. For the rest of this section, we will make a new CARMATask
object newTask
and set the number of steps taken by the MCMC walkers to 500 to properly sample our posterior space.
In [4]:
newTask = kali.carma.CARMATask(2, 1, nsteps = 500)
At this point, our CARMATask
objects are hollow objects that lack numerical meaning. We must specify C-ARMA model parameters to be able to do useful things with our tasks. Two model parameters are required - a double precision value for the time-step dt
to be used for simulations and a numpy ndarray of doubles Theta
for the C-ARMA model coefficients. Mathematically, an individual C-ARMA model can be specified using either the co-efficients representation or the roots representation. The roots representation is much more convenient to use when providing model parametrs because it is trivial to pick valid values. However, Kālī requires that the C-ARMA model be specified purely in terms of the C-ARMA model coefficients and provides helper functions to convert the roots representation to the coeffiecients represenation. Let us set the first C-ARMA model in the newTask
object to have an autocorrelation function that decays with two timescales - 107.8
d and 43.2
d. We shall set the MA timescale to be 5.5
d and set the amplitude of the light curve (i.e. the square root of the asymptotic autocorrelation function) to be 1.0
In [5]:
r_1 = (1.0/107.8) + 0j
r_2 = (-1.0/33.2) + 0j
m_1 = (-1.0/5.5) + 0j
amp = 1.0
Note that we have purposely made the first root positive i.e. the C-ARMA model corresponding to this set of roots will be unstable. We can use the helper function kali.carma.coeffs(p, q, Rho)
to compute the polynomial coefficients representation corresponding to this roots representation as follows:
In [6]:
import numpy as np
Rho = np.array([r_1, r_2, m_1, amp])
Theta = kali.carma.coeffs(2, 1, Rho)
print Theta
We can check the returned Theta
for validity using the check(Theta)
method of the CARMATask
object newTask
as follows:
In [7]:
print newTask.check(Theta)
After fixing the bad root, we can rerun the check as follows
In [8]:
r_1 = (-1.0/107.8) + 0j
Rho = np.require(np.array([r_1, r_2, m_1, amp]), requirements=['F', 'A', 'W', 'O', 'E'])
Theta = kali.carma.coeffs(2, 1, Rho)
print Theta
print newTask.check(Theta)
Note that we wrapped the definition of Rho in a np.require
call that ensures that the underlying data in Rho
is contiguous in memory and can thus be accesses by simple pointer arithmetic in the underlying c++
library. Although it is not necesary to wrap such small arrays in the np.require
call, it is good practice and can help avoid mysterious memeory related errors as well as silent garbage output caused by an incorrect interpretation of a pointer as data.
After having established that we have a good C-ARMA model coefficients vector, let us set the first C-ARMA model in newTask
to use this vector. We also pick the time increment dt
between points in the light curve to be dt = 0.1
d. We can set the first C-ARMA model held by newTask
to the specified dt
and Theta
as follows:
In [9]:
dt = 0.1
newTask.set(dt, Theta, tnum = 0)
Out[9]:
The return value of 0
indicates that the method call completed without error. We can check to see which of the C-ARMA models held by newTask
has been set as follows:
In [10]:
newTask.list()
Out[10]:
As can be seen above, only the first model has been set with values so far. We can see the resulting values of the C-ARMA model matrices as follows:
In [11]:
newTask.show()
As can be seen by the way we issued the two previous commands, we do not have to specify tnum = 0
because the default for tnum
is 0
i.e. the first C-ARMA model held by the task. tnum
was not required for the call to set(dt, Theta)
either. Having set the model, we can also obtain the asymptotic standard deviation of light curves made with this model, i.e. the variability of the light curve, as follows:
In [12]:
import math
math.sqrt(newTask.Sigma()[0,0])
Out[12]:
It turns out that the variability amplitude is 1.0
flux units which should come as no surprise given that we picked amp
in the roots representation Rho
to be 1.0
to begin with. We can plot the power spectral density of the C-ARMA as follows:
In [13]:
newTask.plotpsd(doShow = False)
Out[13]:
The black line above is the final PSD. It is the sum of the red (log PSD numerator) and blue (-log PSD denominator) lines. They in turn are generated by evaluating the respective numerator and denominator polynomial functions of frequency at each frequency. Consider the numerator polynomial (blue) first. At low frequencies, i.e. on long timescales, it is completely dominated by the freq^0 term. At a frequency of about 1/day, the freq^2 term begins to dominate. The same way, in the case of the denominator, the chosen coefficients lead to the freq^0 term dominating on long timescales. On shorter timescales, the freq^2 term would have begun to dominate, except that the freq^4 term becomes important just a little later. So the PSD is dominated by the freq^2 denominator term over a very small range of timescales and we should expect to find that model parameter recovery will be hard (i.e. we won't do a good job unless our light curve is very long). The overall PSD does show a flattening of the slope at high frequencies, but this is driven purtely by the numerator freq^2 term.
At this point, we are ready to simulate a light curve using this C-ARMA model. The simulate(T)
method of CARMATask
requires the duration of the desired light curve and returns a mockLC
object. We can create a new light curve of length 1000
d as follows:
In [14]:
newLC = newTask.simulate(duration=1000.0)
This creates a new lightcurve object of class mockLC
that has been regularly sampled with sampling interval dt
and of duration 1000
d. Alternatively, we could have obtained a mockLC
with irregular sampling by passing in a numpy array or python list tIn
of timestamps. Obviously, we cannot pas in both duration
and tIn
at the same time. Having made a mock light curve, we can view the simulated light curve newLC
using the plot()
method that all lc
objects support as follows:
In [15]:
newLC.plot()
Out[15]:
This light curve has no noise. We can simulate noise by fixing two parameters - the fractional level of variability (fracIntrinsicVar = 0.15 by default) and the fractional level of the noise to the signal (fracNoiseToSignal = 0.001 by default). Once we have fixed these two parameters, we can simulate noise as follows:
In [16]:
newTask.observe(newLC)
Once again, we can plot the resulting light curve using the plot()
method of all lc
objects as follows:
In [17]:
newLC.plot()
Out[17]:
Lets change the noise-to-signal ratio and re-observe the light curve.
In [19]:
newLC.fracNoiseToSignal = 1.0
newTask.observe(newLC)
newLC.plot()
Out[19]:
Before proceeding further, let's drop the noise to signal back down to a realistic level (~ 0.1% photometry).
In [21]:
newLC.fracNoiseToSignal = 0.001
newTask.observe(newLC)
newLC.plot()
Out[21]:
Lets look at the periodogram of this mock light curve.
In [22]:
newLC.plotperiodogram()
Out[22]: