Multi-band Lomb-Scargle

Standard Lomb-Scargle Algorithm

In standard Lomb-Scargle, we start with our data

$$ D = \{t_i,y_i,\sigma_i\}_{i=1}^N $$

We assume here that the true values of $y$ are centered at zero (we'll relax this momentarily)

The model is a simple 1-term sinusoid given by

$$ M(t,\omega~|~\theta) = \theta_0 \sin(\omega t) + \theta_1\cos(\omega t) $$

The likelihood for the dataset is

$$ L(\{t,y,dy\},\omega~|~\theta) = \sum_i \frac{1}{\sqrt{2\pi \sigma_i^2}} \exp\left[ \frac{-(y_i - M(t_i,\omega~|~\theta)^2}{2\sigma_i^2} \right] $$

Which leads to the chi-squared function (derived from the log-likelihood)

$$ \chi^2(\omega, \theta) = \sum_i\frac{[y_i - M(t_i,\omega~|~\theta)]^2}{2\sigma_i^2} $$

If we re-express the model by defining

$$ X_\omega = \left[ \begin{array}{ll} \sin(\omega t_1) & \cos(\omega t_1)\\ \sin(\omega t_2) & \cos(\omega t_2)\\ \vdots & \vdots \\ \sin(\omega t_N) & \cos(\omega t_N)\\ \end{array} \right],~~~~ y = \left[ \begin{array}{l} y_1 \\ y_2\\ \vdots \\ y_N\\ \end{array} \right] $$

And create the error matrix

$$ \Sigma_y = \left[ \begin{array}{lllll} \sigma_1^2 & 0 & 0 & \cdots & 0\\ 0 & \sigma_2^2 & 0 & \cdots & 0\\ 0 & 0 & \sigma_3^2 & \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & 0 & \cdots & \sigma_N^2 \end{array} \right] $$

then our model is given by

$$ M(\omega, \theta) = X_\omega\theta $$

and our $\chi^2$ can be written

$$ \chi^2(\omega, \theta) = (y - X_\omega\theta)^T\Sigma_y^{-1}(y - X_\omega\theta) $$

Minimizing this cost funciton with respect to $\theta$ gives the maximum likelihood parameters:

$$ \hat{\theta} = (X_\omega^T\Sigma_y^{-1}X_\omega)^{-1}X_\omega^T\Sigma_y^{-1}y $$

We can simplify this a bit by defining

$$ X_{\omega,\ast} = \Sigma_y^{-1/2}X_\omega \\ y_\ast = \Sigma_y^{-1/2}y $$

And the above becomes

$$ \hat{\theta} = (X_{\omega,\ast}^TX_{\omega,\ast})^{-1}X_{\omega,\ast}^Ty_\ast $$

Now the $\chi^2$ of the model fit is given by

$$ \chi^2(\omega,\hat{\theta}) = \left[ y_\ast^Ty_\ast - y_\ast^TX_{\omega,\ast} (X_{\omega,\ast}^TX_{\omega,\ast})^{-1}X_{\omega,\ast}^Ty_\ast \right] $$

The reference $\chi^2$ is

$$ \chi_0^2 = \bar{y}_\ast^T\bar{y}_\ast $$

So the power $P_{LS} = 1 - \chi^2/\chi_0^2$ is given by

$$ P_{LS}(\omega) = \frac{y_\ast^TX_{\omega,\ast} (X_{\omega,\ast}^TX_{\omega,\ast})^{-1}X_{\omega,\ast}^T\bar{y}_\ast}{\bar{y}_\ast^T\bar{y}_\ast} $$

Generalized Lomb-Scargle

The generalized lomb-scargle fits for the mean of $y$ as part of the model, rather than as a separete step.

So what changes is that the $X_\omega$ becomes:

$$ X_\omega = \left[ \begin{array}{lll} 1 & \sin(\omega t_1) & \cos(\omega t_1)\\ 1 & \sin(\omega t_2) & \cos(\omega t_2)\\ \vdots & \vdots & \vdots \\ 1 & \sin(\omega t_N) & \cos(\omega t_N)\\ \end{array} \right] $$

With this, we can relax the requirement that $y$ is centered: it will be centered as part of the model. Everything else carries through, and we have

$$ P_{LS}(\omega) = \frac{y_\ast^TX_{\omega,\ast} (X_{\omega,\ast}^TX_{\omega,\ast})^{-1}X_{\omega,\ast}^T\bar{y}_\ast}{\bar{y}_\ast^T\bar{y}_\ast} $$

Where the quantities $y_\ast$ and $X_{\omega,\ast}$ are the noise-corrected matrices as used above.

Best-fit Parameters

Note that for either of these models, the best-fit parameters are given by

$$ \hat{\theta} = (X_{\omega,\ast}^T X_{\omega,\ast})^{-1}X_{\omega, \ast}^T y_\ast $$

and that these best-fit values consist of a step within the computation of $P_{LS}$.

Generalizing to Multiple Bands

For multiple bands, we'll assume that there exists some fundamental model

$$ \theta_{0} = \{\omega, y_0, A_0, B_0\} $$

which defines an underlying oscillation

$$ M_0(t~|~\theta_0) = y_0 + A_0\sin(\omega t) + B_0\cos(\omega_0 t) $$

We'll assume that each band indexed by $b \in \{1, 2, 3...\}$ has a periodic offset function $Q_b$ parametrized by some $\theta_b$, such that the model for that band is

$$ M_b(t~|~\theta_0,\omega_0) = M_0(t~|~\theta_0) + Q_b(t~|~\omega, \theta_b) $$

Where, in general, $Q_b$ can be any periodic function.

This will give us a single global underlying model $\theta_0$, plus a $\theta_f^\ast$ for each of the have $N_f$ filters.

But this problem is over-specified: one set of parameters here is redundant; note that we can easily reparametrize the model for each filter as

$$ \theta_f = \{y_f, A_f, B_f\} $$

where we've defined

$$ y_f = y_f^\ast + y_0\\ A_f = A_f^\ast + A_0\\ B_f = B_f^\ast + B_0 $$

and we've thus eliminated the ability to solve explicitly for $\{y_0, A_0, B_0\}$.

Wave hands a bit

The result is that we can peform a very efficient lomb-scargle algorithm for each band independently, and then manipulate the results into a single global power $P$ which takes into account all the bands!

What about regularization?

Say we go back to our $\chi^2$ expression:

$$ \chi^2(\omega, \theta) = (y - X_\omega\theta)^T\Sigma_y^{-1}(y - X_\omega\theta) $$

We using a Tikhonov regularization, we can penalize the $\theta$ values:

$$ \chi^2(\omega, \theta) = (y - X_\omega\theta)^T\Sigma_y^{-1}(y - X_\omega\theta) + \theta^T \Gamma^T\Gamma \theta $$

(Note that while this form is often assumed within a frequentist context, it can be derived within a Bayesian context where the priors on $\theta$ are Gaussian, centered at zero, with a covariance matrix $\Sigma_\theta = [\Gamma^T\Gamma]^{-1}$)

Minimizing this with respect to theta gives:

$$ X_\omega^T\Sigma_y^{-1}(X_\omega\theta - y) + \Gamma^T\Gamma\theta = 0 $$

Or

$$ \hat{\theta} = \left(X_\omega^T\Sigma_y^{-1}X_\omega + \Gamma^T\Gamma\right)^{-1}\left(X_\omega^T\Sigma_y^{-1}y\right) $$

Plugging this in to the expression for $\chi^2$, we get the following result:

$$ \chi^2 = y^Ty - y^TX\left(X^TX + \Gamma^T\Gamma\right)^{-1}X^Ty $$

(TODO: fill in the missing sigma in the above expression)

That is, it's the same expression as above with the regularization term added to the pseudoinverse!!

So the regularized lomb-scargle power is given by:

$$ P_{LS} = \frac{y^TX\left(X^TX + \Gamma^T\Gamma\right)^{-1}X^Ty}{y^Ty} $$

In [ ]: