Analysis of Stochastic Processes ($\S$ 10.5)

If a system is always variable, but the variability is not deterministic (i.e., not perfectly predictable), then we have a stochastic process. Counter to what you might think, such processes can still be characterized.

Take a (stochastically varying) quasar which has both line and continuum emission, and where the line emission is stimulated by the continuum. Since there is a physical separation between the regions that produce each type of emission, we get a time delay between the two light curves, as can be seen here:

To understand stochastic processes, let's first talk about correlation functions. A correlation function ($\S$ 6.5) gives us information about the time delay between two processes. If one time series is derived from another simply by shifting the time axis by $t_{\rm lag}$, then their correlation function will have a peak at $\Delta t = t_{\rm lag}$.

The correlation function between $f(t)$, and $g(t)$ is defined as $${\rm CF}(\Delta t) = \frac{\lim_{T\rightarrow \infty}\frac{1}{T}\int_T f(t)g(t+\Delta t)dt }{\sigma_f \sigma_g}$$

Computing the correlation function is basically the mathematical process of sliding the two curves over each other and computing the degree of similarity at each step in time. The peak of the correlation function reveals the time delay between the processes. Below we have the correlation function of the line and continuum emission from a quasar, which reveals a $\sim$ 15 day delay between the two.
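
To make the sliding-and-comparing picture concrete, here is a minimal toy sketch (fabricated signals with a made-up 15-sample lag, not the actual quasar analysis) that recovers a known lag by brute force:


In [ ]:
import numpy as np

rng = np.random.default_rng(0)
dt = 1.0                             # sampling interval (e.g., days)
t = dt * np.arange(500)

# toy "continuum": smoothed white noise, so that a lag is detectable
f = rng.normal(size=t.size)
f = np.convolve(f, np.ones(20) / 20, mode='same')

t_lag = 15                           # true delay, in samples
g = np.roll(f, t_lag)                # toy "line": the same signal, shifted

# slide one curve over the other and compute the similarity at each trial lag
lags = np.arange(-50, 51)
cf = [np.corrcoef(f, np.roll(g, -k))[0, 1] for k in lags]

print("recovered lag:", lags[np.argmax(cf)] * dt)   # should recover ~15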

In an autocorrelation function (ACF), $f(t) = g(t)$, and we are instead revealing information about the variability timescales present in a process. If the values of $y$ are uncorrelated, then ${\rm ACF}(\Delta t)=0$ for $\Delta t \neq 0$.

The Fourier Transform of an ACF is the Power Spectral Density (PSD). So the PSD is an analysis in frequency space, while the ACF lives in time space. For example, for a sinusoidal signal with period $T$, the ACF is also periodic with period $T$, and the PSD in frequency space is a $\delta$ function centered on $f = 1/T$ (i.e., $\omega = 2\pi/T$).
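
As a quick check of that statement, here is a small sketch (using numpy's FFT directly rather than astroML, purely as an illustration) showing that the power spectrum of a sinusoid with period $T$ peaks at $f = 1/T$:


In [ ]:
import numpy as np

T = 5.0                          # period of the sinusoid
dt = 0.01
t = dt * np.arange(4096)
y = np.sin(2 * np.pi * t / T)

# power spectrum via the FFT; for a pure sinusoid this approximates a delta function
f = np.fft.rfftfreq(t.size, dt)
power = np.abs(np.fft.rfft(y)) ** 2

print("peak at f = %.3f; expect 1/T = %.3f" % (f[np.argmax(power)], 1. / T))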

The structure function is another quantity that is frequently used in astronomy and is related to the ACF:

$${\rm SF}(\Delta t) = {\rm SF}_\infty[1 - {\rm ACF}(\Delta t)]^{1/2},$$

where ${\rm SF}_\infty$ is the standard deviation of the time series as evaluated on timescales much larger than any characteristic timescale.

If ${\rm SF} \propto t^{\alpha}$, then ${\rm PSD} \propto \frac{1}{f^{1+2\alpha}}$.
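
As a toy check of this relation (using a plain random walk and a simple pairwise-difference estimate of the SF; normalization conventions vary), the sketch below recovers $\alpha \approx 1/2$, consistent with the $1/f^2$ PSD of a random walk:


In [ ]:
import numpy as np

rng = np.random.default_rng(42)
y = np.cumsum(rng.normal(size=5000))    # random walk: we expect SF ~ t^(1/2)

# empirical structure function: rms difference between points separated by each lag
lags = np.arange(1, 100)
sf = np.array([np.sqrt(np.mean((y[k:] - y[:-k]) ** 2)) for k in lags])

# slope of log SF vs log lag gives alpha; expect ~0.5, i.e., PSD ~ 1/f^2
alpha = np.polyfit(np.log(lags), np.log(sf), 1)[0]
print("measured alpha =", round(alpha, 2))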

So an analysis of a stochastic system can be done with either the ACF, SF, or PSD.

AstroML has time series and Fourier tools for generating light curves drawn from a power law in frequency space. Note that these tools define $\beta = 1+2\alpha$. Complete the cell below to make a plot of counts vs. time and of the PSD vs. frequency for both a $1/f$ and a $1/f^2$ process (the latter is known as Brownian motion, or a random walk).


In [ ]:
import numpy as np
from matplotlib import pyplot as plt
from astroML.time_series import generate_power_law
from astroML.fourier import PSD_continuous

N = 1024
dt = 0.01
beta = 2  # try both beta = 1 and beta = 2, per the exercise above

t = dt * np.arange(N)
y = generate_power_law(N, dt, beta)  # light curve drawn from a P(f) ~ 1/f^beta process
f, PSD = PSD_continuous(t, y)        # frequencies and PSD of the generated light curve
    
fig = plt.figure(figsize=(8, 4))
ax1 = fig.add_subplot(121)
ax1.plot(t, y, '-k')
ax1.set_xlim(0, 10)

ax2 = fig.add_subplot(122, xscale='log', yscale='log')
ax2.plot(f, PSD, '-k')   
ax2.set_xlim(1E-1, 60)
ax2.set_ylim(1E-11, 1E-3)

plt.show()

You should find that, because there is relatively more power at high frequencies for the $1/f$ process, its light curve looks noisier.

We can even hear the difference: https://www.youtube.com/watch?v=3vEDZ-_iLNU


In [ ]:
# Ivezic, Figure 10.29
# Author: Jake VanderPlas
# License: BSD
#   The figure produced by this code is published in the textbook
#   "Statistics, Data Mining, and Machine Learning in Astronomy" (2013)
#   For more information, see http://astroML.github.com
#   To report a bug or issue, use the following forum:
#    https://groups.google.com/forum/#!forum/astroml-general
import numpy as np
from matplotlib import pyplot as plt
from astroML.time_series import generate_power_law
from astroML.fourier import PSD_continuous

N = 1024
dt = 0.01
factor = 100

t = dt * np.arange(N)
random_state = np.random.RandomState(1)

fig = plt.figure(figsize=(5, 3.75))
fig.subplots_adjust(wspace=0.05)

for i, beta in enumerate([1.0, 2.0]):
    # Generate the light curve and compute the PSD
    x = factor * generate_power_law(N, dt, beta, random_state=random_state)
    f, PSD = PSD_continuous(t, x)

    # First axes: plot the time series
    ax1 = fig.add_subplot(221 + i)
    ax1.plot(t, x, '-k')

    ax1.text(0.95, 0.05, r"$P(f) \propto f^{-%i}$" % beta,
             ha='right', va='bottom', transform=ax1.transAxes)

    ax1.set_xlim(0, 10.24)
    ax1.set_ylim(-1.5, 1.5)

    ax1.set_xlabel(r'$t$')

    # Second axes: plot the PSD
    ax2 = fig.add_subplot(223 + i, xscale='log', yscale='log')
    ax2.plot(f, PSD, '-k')
    ax2.plot(f[1:], (factor * dt) ** 2 * (2 * np.pi * f[1:]) ** -beta, '--k')

    ax2.set_xlim(1E-1, 60)
    ax2.set_ylim(1E-6, 1E1)

    ax2.set_xlabel(r'$f$')

    if i == 1:
        ax1.yaxis.set_major_formatter(plt.NullFormatter())
        ax2.yaxis.set_major_formatter(plt.NullFormatter())
    else:
        ax1.set_ylabel(r'${\rm counts}$')
        ax2.set_ylabel(r'$PSD(f)$')

plt.show()

ACF for Unevenly Sampled Data

astroML also has tools for computing the ACF of unevenly sampled data, using two different methods (Scargle, and Edelson & Krolik): http://www.astroml.org/modules/classes.html#module-astroML.time_series

One of the tools is for generating a damped random walk (DRW). Above we found that a random walk had a $1/f^2$ PSD. A damped random walk is a process that "remembers" its history only for a characteristic time, $\tau$; the ACF vanishes for $\Delta t \gg \tau$.


In [2]:
# Syntax for EK and Scargle ACF computation
import numpy as np
from astroML.time_series import generate_damped_RW
from astroML.time_series import ACF_scargle, ACF_EK

t = np.arange(0,1000)
y = generate_damped_RW(t, tau=300)
dy = 0.1
y = np.random.normal(y,dy)

# store the results under new names so the imported functions aren't shadowed
acf_s, bins_s = ACF_scargle(t, y, dy)
acf_ek, acf_err_ek, bins_ek = ACF_EK(t, y, dy)

Figure 10.30 below gives an example of an ACF for a DRW, which mimics the variability that we might see from a quasar. (Note that the Scargle method doesn't seem to be working.)


In [ ]:
# Ivezic, Figure 10.30
# Author: Jake VanderPlas
# License: BSD
#   The figure produced by this code is published in the textbook
#   "Statistics, Data Mining, and Machine Learning in Astronomy" (2013)
#   For more information, see http://astroML.github.com
#   To report a bug or issue, use the following forum:
#    https://groups.google.com/forum/#!forum/astroml-general
import numpy as np
from matplotlib import pyplot as plt

from astroML.time_series import lomb_scargle, generate_damped_RW
from astroML.time_series import ACF_scargle, ACF_EK

#------------------------------------------------------------
# Generate time-series data:
#  we'll do 1000 days worth of magnitudes

t = np.arange(0, 1E3)
z = 2.0
tau = 300
tau_obs = tau / (1. + z)

np.random.seed(6)
y = generate_damped_RW(t, tau=tau, z=z, xmean=20)

# randomly sample 100 of these
ind = np.arange(len(t))
np.random.shuffle(ind)
ind = ind[:100]
ind.sort()
t = t[ind]
y = y[ind]

# add errors
dy = 0.1
y_obs = np.random.normal(y, dy)

#------------------------------------------------------------
# compute ACF via scargle method
C_S, t_S = ACF_scargle(t, y_obs, dy, n_omega=2 ** 12, omega_max=np.pi / 5.0)

ind = (t_S >= 0) & (t_S <= 500)
t_S = t_S[ind]
C_S = C_S[ind]

#------------------------------------------------------------
# compute ACF via E-K method
C_EK, C_EK_err, bins = ACF_EK(t, y_obs, dy, bins=np.linspace(0, 500, 51))
t_EK = 0.5 * (bins[1:] + bins[:-1])

#------------------------------------------------------------
# Plot the results
fig = plt.figure(figsize=(8, 8))

# plot the input data
ax = fig.add_subplot(211)
ax.errorbar(t, y_obs, dy, fmt='.k', lw=1)
ax.set_xlabel('t (days)')
ax.set_ylabel('observed flux')

# plot the ACF
ax = fig.add_subplot(212)
ax.plot(t_S, C_S, '-', c='gray', lw=1, label='Scargle')
ax.errorbar(t_EK, C_EK, C_EK_err, fmt='.k', lw=1, label='Edelson-Krolik')
ax.plot(t_S, np.exp(-abs(t_S) / tau_obs), '-k', label='True')
ax.legend(loc=3)

ax.plot(t_S, 0 * t_S, ':', lw=1, c='gray')

ax.set_xlim(0, 500)
ax.set_ylim(-1.0, 1.1)

ax.set_xlabel('t (days)')
ax.set_ylabel('ACF(t)')

plt.show()

Autoregressive Models

For processes like these that are not periodic, but that "retain memory" of previous states, we can use autoregressive models.

A random walk is an example of such a process; every new value is given by the preceding value plus some noise:

$$y_i = y_{i-1} + \epsilon_i.$$

If the coefficient of $y_{i-1}$ is $>1$ then it is known as a geometric random walk, which is typical of the stock market. (So, when you interview for a quant position on Wall Street, you tell them that you are an expert in using autoregressive geometric random walks to model stochastic processes.)

In the random walk case above, each new value depends only on the immediately preceding value. But we can generalize this to include the previous $p$ values:

$$y_i = \sum_{j=1}^pa_jy_{i-j} + \epsilon_i$$

We refer to this as an autoregressive (AR) process of order $p$: $AR(p)$. For a random walk, we have $p=1$, and $a_1=1$.

If the data are drawn from a "stationary" process (one where it doesn't matter what region of the light curve you sample, so long as it is representative), the $a_j$ must satisfy certain conditions; for example, an AR(1) process is stationary only if $|a_1| < 1$.

One thing that we might do then is ask whether a system is more consistent with $a_1=0$ or $a_1=1$ (noise vs. a random walk).

Below are some example light curves for specific $AR(p)$ processes (a rough sketch for generating similar realizations follows the list):

- In the first example, $AR(0)$, the light curve is simply responding to noise fluctuations.
- In the second example, $AR(1)$, the responses to noise fluctuations persist slightly longer, since each time step depends positively on the one before it.
- In the third example, nearly the full effect of the noise spike from the previous time step is applied again, giving particularly long and high chains of peaks and valleys.
- In the fourth example, $AR(2)$, we get long but low chains of peaks and valleys, as a spike persists for an extra time step.
- Finally, in the fifth example, the response to a spike two time steps back has the opposite sign from the response one time step back, and both coefficients are large, so the peaks and valleys are both quite high and quite narrowly separated.
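
As promised above, here is a rough sketch for generating $AR(p)$ realizations with a simple loop; the coefficients are made up for illustration and are not those used in the example figures:


In [ ]:
import numpy as np
from matplotlib import pyplot as plt

def generate_AR(a, N=500, random_state=0):
    """Generate an AR(p) series: y_i = a_1 y_{i-1} + ... + a_p y_{i-p} + eps_i."""
    a = np.asarray(a, dtype=float)
    rng = np.random.RandomState(random_state)
    eps = rng.normal(size=N)
    y = np.zeros(N)
    p = len(a)
    for i in range(p, N):
        # dot the coefficients with the previous p values (most recent first)
        y[i] = np.dot(a, y[i-p:i][::-1]) + eps[i]
    return y

# illustrative coefficient choices (arbitrary, not those in the figures above)
examples = [("AR(0)", []),
            ("AR(1), a1=0.5", [0.5]),
            ("AR(1), a1=0.95", [0.95]),
            ("AR(2), a1=0.6, a2=0.3", [0.6, 0.3])]

fig, axes = plt.subplots(len(examples), 1, figsize=(6, 8), sharex=True)
for ax, (label, a) in zip(axes, examples):
    ax.plot(generate_AR(a), '-k', lw=0.7)
    ax.set_ylabel(label, fontsize=8)
plt.show()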

A moving average (MA) process is similar in some ways to an AR process, but is different in other ways. It is defined as

$$y_i = \epsilon_i + \sum_{j=1}^qb_j\epsilon_{i-j}.$$

So, for example, an MA(q=1) process would look like $$y_i = \epsilon_{i} + b_1\epsilon_{i-1},$$

whereas an AR(p=2) process would look like

$$y_i = a_1y_{i-1} + a_2y_{i-2} + \epsilon_i$$

Thus the $MA$ process is similar to an $AR$ process in that the current value depends on what happened at previous time steps, but they differ in how they respond to a shock: in an $MA$ process a shock affects only the current value and $q$ values into the future, whereas in an $AR$ process a shock affects all future values.

Below is some code and a plot that illustrates this.


In [ ]:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt
from matplotlib.ticker import MultipleLocator

N=10

#epsilon = np.array([0,0,0,1,0,0,0,0,0,0,0,0])
epsilon = np.zeros(N+2)
epsilon[3] = 1
yAR=np.zeros(N+2)
yMA=np.zeros(N+2)
yARMA=np.zeros(N+2)

for i in np.arange(N)+2:
    # Coefficients match the legend text below
    yAR[i] = 0.5 * yAR[i-1] + 0.5 * yAR[i-2] + epsilon[i]                              # AR(2)
    yMA[i] = epsilon[i] + 0.5 * epsilon[i-1] + 0.5 * epsilon[i-2]                      # MA(2)
    yARMA[i] = 0.5 * yARMA[i-1] + 0.25 * yARMA[i-2] + epsilon[i] + 0.5 * epsilon[i-1]  # ARMA(2,1)
    # print(i, yAR[i], yMA[i])

fig = plt.figure(figsize=(6, 6))
t = np.arange(len(yAR))
plt.plot(t,yAR,label="AR(2), a_1=0.5, a_2=0.5")
plt.plot(t,yMA,label="MA(2), b_1=0.5, b_2=0.5")
plt.plot(t,yARMA,label="ARMA(2,1), a_1=0.5, a_2=0.25, b_1=0.5",zorder=0)
plt.xlabel("t")
plt.ylabel("y")
plt.legend(loc="upper right",prop={'size':8})
plt.ylim([0,1.1])
ax = plt.gca()
ax.xaxis.set_major_locator(MultipleLocator(1.0))

plt.show()

Note that I also included an ARMA(2,1) model, which combines AR(2) and MA(1): $$y_i = a_1y_{i-1} + a_2y_{i-2} + \epsilon_i + b_1 \epsilon_{i-1}.$$

I found these videos particularly helpful in trying to understand these processes.

MA(1)

AR(1)

ARMA(1,1)

CARMA Models

$AR$ and $ARMA$ models assume evenly sampled time-series data. However, we can extend this to unevenly sampled data with $CAR$ or $CARMA$ processes, where the $C$ stands for continuous.

A $CAR(1)$ process is described by a stochastic differential equation which includes a damping term that pushes $y(t)$ back towards the mean, so it is also a damped random walk (DRW). For data sampled evenly with spacing $\Delta t$, a CAR(1) process is equivalent to an AR(1) process with $a_1=\exp(-\Delta t/\tau)$; that is, the next value is the previous value times the damping factor (plus noise).
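
Here is a minimal sketch of that equivalence (toy numbers, nothing fitted to data): building a DRW on an evenly spaced grid directly from the AR(1) recursion and checking that the correlation dies away on the damping timescale. astroML's generate_damped_RW, used above, implements a similar recursion (with redshift and uneven sampling handled as well).


In [ ]:
import numpy as np

tau = 300.0        # damping timescale (days)
dt = 1.0           # sampling interval (days)
N = 200000

a1 = np.exp(-dt / tau)                 # AR(1) coefficient equivalent to a CAR(1)/DRW
sigma = 0.3 * np.sqrt(1 - a1 ** 2)     # noise scale chosen so the long-term rms is ~0.3

rng = np.random.default_rng(0)
eps = rng.normal(0, sigma, size=N)
y = np.zeros(N)
for i in range(1, N):
    y[i] = a1 * y[i - 1] + eps[i]

# the process "forgets" its history on timescales much longer than tau
print("ACF at lag tau   :", np.corrcoef(y[:-300], y[300:])[0, 1])    # expect ~exp(-1) ~ 0.37
print("ACF at lag 5*tau :", np.corrcoef(y[:-1500], y[1500:])[0, 1])  # expect ~0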

The ACF for a DRW is given by $$ ACF(t) = \exp(-t/\tau),$$ where $\tau$ is the characteristic timescale (i.e., the damping timescale).

The structure function can be written as $$ SF(t) = SF_{\infty}[1-\exp(-t/\tau)]^{1/2}.$$

The PSD is then $$ PSD(f) = \frac{\tau^2 SF_{\infty}^2}{1+(2\pi f \tau)^2},$$ which means that a DRW is a $1/f^2$ process at high frequency. The "damped" part shows up as the flattening of the PSD at low frequency ($f \ll 1/(2\pi\tau)$).

The ACF example above (Figure 10.30) was an example of a DRW: the light curve is strongly correlated at short timescales, but uncorrelated at long timescales.

Regression and Classification

Regression and classification for these stochastic models work just as before. With our model and model parameters, we can predict future values via regression, or look for similarities and differences as a function of the model parameters via clustering (unsupervised) or classification (supervised). Similarly, we can apply dimensionality-reduction techniques to help visualize results from high-dimensional models.