We first look at the specific characteristics of Time Series (TS), and will then work through, in a second part, a concrete example of TS analysis (smoothing + modelling + forecasting).
Stochastic vs. deterministic model
In a deterministic process you know exactly where the next point is; in a stochastic process there is some randomness at every point.
For example, the sine wave function is deterministic:
In [1]:
import pandas as pd
import numpy as np
np.random.seed(2018)
In [2]:
n = 100
x = np.linspace(0, 2 * np.pi, n)
y = np.sin(x)
In [3]:
import matplotlib.pyplot as plt
%matplotlib inline
In [4]:
fig,ax = plt.subplots()
ax.plot(x,y)
fig.suptitle("A deterministic process");
In [5]:
whiteNoise = np.random.standard_normal(100) # normal distribution mean=0, std=1
In [6]:
fig,ax = plt.subplots()
ax.plot(whiteNoise)
fig.suptitle("A discrete white noise");
As you see, it is basically impossible to predict the value of the next step.
Assuming that our time series are a realisation of a stochastic process (a sequence of random variables) will allow us to predict the next data points of the time series. That is, we are going to assume that there is some underlying generating process for our time series.
Let's see another example: we inject some randomness into the above sine wave:
In [7]:
n = 100
x = np.linspace(0, 2 * np.pi, n)
noisyWave = np.sin(x) + 0.3 * np.random.randn(n)
In [8]:
fig,ax = plt.subplots()
ax.plot(x,noisyWave)
fig.suptitle("A stochastic process: a noisy sine wave");
There is randomness, so we cannot say exactly where the next point will be, but we can also see that there is an underlying generating process: the sine wave.
In a stationary time series, the statistical properties do not change over time: the mean and variance are constant, and the autocovariance depends only on the lag, not on the point in time at which it is computed.
For a non-stationary time series, we will apply some transformations to obtain a stationary one.
But why is this important? Most TS models work on the assumption that the TS is stationary. Intuitively, we can say that if a TS has a particular behaviour over time, there is a very high probability that it will behave the same way in the future.
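As an aside (not part of the original cells), a common way to check stationarity more formally is the Augmented Dickey-Fuller test from statsmodels; here is a minimal sketch applied to the white noise series defined above:
from statsmodels.tsa.stattools import adfuller
# Augmented Dickey-Fuller test: the null hypothesis is that the series
# has a unit root, i.e. that it is non-stationary.
adf_stat, p_value = adfuller(whiteNoise)[:2]
print("ADF statistic: %.3f, p-value: %.3f" % (adf_stat, p_value))
# a small p-value (e.g. < 0.05) rejects the null, so we would
# treat the white noise series as stationary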
One of the challenges is then to extract the signal from the noise. The autocorrelation will help us. Let's look at it in detail.
The covariance of two random variables x and y tells us how the two variables move together; it measures the linear dependence between them: $$\mathrm{Cov}(x, y) = E\big[(x - E[x])(y - E[y])\big]$$
In [9]:
def getCovariance(x, y):
# returns the covariance of two data sets
# Subtract the subseries means
return np.dot((x-np.mean(x)), (y-np.mean(y))) / len(x)
In [10]:
getCovariance([1,2,3], [1,2,3])
Out[10]:
0.6666666666666666
"Auto" means self: the autocovariance is the covariance of X with a lagged version of itself:
In [11]:
def getAutoCovariance(x, lag):
# return the covariance of one data set by
# slicing it based on the given lag
# slice into two sub sets
x1 = x[:(len(x)-lag)] # from begin till (end-lag)
x2 = x[lag:] # from lag till the end
# get the covariance of the two sub-sets
covariance = getCovariance(x1, x2)
# return the covariance but also the
# standard deviations of the two subsets
return (covariance, np.std(x1), np.std(x2))
In [12]:
getAutoCovariance([1,2,3,4], lag=2)
Out[12]:
(0.25, 0.5, 0.5)
If the signal is weakly stationary, its autocorrelation structure is constant over time (it depends only on the lag). Because of that, we can use the data to build a sample version of the autocorrelation: we just take subsequent slices (the samples) of the data series.
In [13]:
def autocorrelation_var(x, lag):
# get the covariance by slicing
(covariance, _, _) = getAutoCovariance(x, lag)
# Normalize with variance of whole series
return covariance / np.var(x)
def autocorrelation_std(x, lag):
# get the covariance by slicing
(covariance, std_x1, std_x2) = getAutoCovariance(x, lag)
# Normalize with the subseries standard deviations
return covariance / (std_x1 * std_x2)
The statsmodels package also has an autocorrelation function (acf) which calculates the autocorrelation for each lag, and pandas has one too: autocorr.
Let's see them all together, applied to the white noise process:
In [14]:
import statsmodels.tsa.stattools as smt
In [15]:
results = {}
nlags=20
results["Slices_variance"] = [autocorrelation_var(whiteNoise, lag) for lag in range(nlags)]
results["Slices_Std"] = [autocorrelation_std(whiteNoise, lag) for lag in range(nlags)]
results["Pandas"] = [pd.Series(whiteNoise).autocorr(lag) for lag in range(nlags)]
results["Statmodels"] = smt.acf(whiteNoise, unbiased=True, nlags=nlags-1)
In [16]:
pd.DataFrame(results).plot(kind="bar", figsize=(10,5), grid=True)
plt.xlabel("Lag")
plt.ylim([-0.3, 1.2])
plt.ylabel("Autocorrelation")
plt.suptitle("Different ways to compute autocorrelation")
plt.show()
Pandas normalises by the standard deviations of the two sub-series (like autocorrelation_std above), while statsmodels normalises by the variance of the whole series (like autocorrelation_var).
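As a quick added check: since pandas' autocorr is a Pearson correlation between the series and its shifted self, it should match the std-normalised slice estimate numerically:
# pandas' autocorr = Pearson correlation of the series with its shifted
# self, which is exactly the std-normalised slice estimate above
print(np.allclose(results["Pandas"], results["Slices_Std"]))  # expected: True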
Statsmodels returns more information:
In [17]:
data_acf, ci, Q, pValue = smt.acf(whiteNoise, qstat=True, alpha=0.05)
and it provides useful plots:
In [18]:
import statsmodels.graphics.tsaplots as smpl
In [19]:
smpl.plot_acf(whiteNoise, lags=20);
Let's see an example where every point is correlated with the previous one: a sequence.
In [20]:
sequence1 = np.arange(0, 1, 1/100) # a sequence from 0 to 1 by 1/100
In [21]:
smpl.plot_acf(sequence1, lags=20);
In [22]:
smpl.plot_pacf(sequence1, lags=20);
In [23]:
sequence2 = np.array([1,2,3])
sequence2 = np.resize(sequence2, 21) # repeat it 7 times
In [24]:
smpl.plot_acf(sequence2, lags=20);
In [25]:
smpl.plot_pacf(sequence2, lags=20);
In [26]:
smpl.plot_acf(whiteNoise, lags=30);
In [27]:
smpl.plot_pacf(whiteNoise, lags=20);
In [28]:
from pandas.plotting import autocorrelation_plot
wnS = pd.Series(whiteNoise)
autocorrelation_plot(wnS)
Out[28]:
A random walk is a simple stochastic process; as we will see, it is not stationary.
Its model relates the current value of the time series to the previous value by adding some random deviation to it.
A random walk model: $$X_t = X_{t-1} + Z_t$$
where $Z_t$ is the white noise (residual) term.
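A quick added note on why this process cannot be stationary: with $X_0 = 0$, the random walk is just a cumulative sum of the noise terms, so $$X_t = \sum_{i=1}^{t} Z_i, \qquad \mathrm{Var}(X_t) = t\,\sigma_Z^2,$$ and the variance grows with time instead of staying constant.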
In [29]:
def plotTS(y, lags=None, style='bmh'):
if not isinstance(y, pd.Series):
y = pd.Series(y)
with plt.style.context(style):
fig = plt.figure()
layout = (2, 2)
ts_ax = plt.subplot2grid(layout, (0, 0), colspan=2)
acf_ax = plt.subplot2grid(layout, (1, 0))
pacf_ax = plt.subplot2grid(layout, (1, 1))
y.plot(ax=ts_ax)
ts_ax.set_title('Time Series Analysis Plots')
smpl.plot_acf(y, lags=lags, ax=acf_ax, alpha=0.5)
smpl.plot_pacf(y, lags=lags, ax=pacf_ax, alpha=0.5)
plt.tight_layout()
return
In [30]:
SAMPLES = 1000
Z = np.random.normal(size = SAMPLES)
rw = np.zeros(SAMPLES)
for t in range(SAMPLES):
rw[t] = rw[t-1] + Z[t]
In [31]:
plotTS(rw, lags=20);
Clearly our random walk is not stationary.
Recall that a random walk is $X_t = X_{t-1} + Z_t$. Rearranging, $X_t - X_{t-1} = Z_t$. Thus the first differences of our random walk series should equal a white noise process!
We can use the "np.diff()" function on our TS and see if this holds.
In [32]:
plotTS(np.diff(rw), lags=20);
Our definition holds as this looks exactly like a white noise process.
Differencing a time series is one way to remove a trend; for a random walk, it yields a stationary series.
Let's see more ways to extract the trend from a Time Series.
In [33]:
fig,ax = plt.subplots()
ax.plot(x,noisyWave)
ax.set_xlabel("Time")
fig.suptitle("Noisy sine wave");
We know that there is a trend (because of how the series has been built), but how do we extract it?
One way to get the trend is to use regression, basically a local regression.
You define a window of width m and average the previous m values to get a smooth estimate for the current time; then you slide the window along and use the next m values to get the estimate for the next point, and so on.
This is called a moving average and is used to get a fine trend (it is basically a low-pass filter).
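As an aside (not used further below), pandas can compute such a trailing moving average directly with its rolling window; a minimal sketch:
# trailing moving average over 10 points; min_periods=1 lets the window
# shrink at the start of the series instead of producing NaN values
trendRolling = pd.Series(noisyWave).rolling(window=10, min_periods=1).mean()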
To get a higher-level trend, a method called Loess can be used. It still defines a window and uses the nearest neighbours within that window to do the local regression, but the Loess regression fits a low-degree polynomial and weights each neighbour with the tricube function $$w = \big(1 - |d(t, t_0)|^3\big)^3,$$ where $d(t, t_0)$ is the scaled distance from the point being estimated: nearer points get higher weights.
In [34]:
def movingAverage(values, order):
    # trailing moving average of "values" using windows of width "order"
    # (the window is truncated at the beginning of the series)
    end = len(values)
    out = np.zeros(end)
    out[0] = values[0]
    for i in range(1, end):
        if (i - order <= 1):
            j = 1
        else:
            j += 1
        out[i] = sum(values[j-1:i]) / (i-j+1)
    return out
Let's try with a window of width 10:
In [35]:
trendMA = movingAverage(noisyWave,10)
In [36]:
fig,ax = plt.subplots()
ax.plot(x,noisyWave)
ax.plot(x,trendMA)
ax.set_xlabel("Time")
fig.suptitle("Trend extracted from noisy sine wave");
The Loess regression is more complicated but there is a function for it in the statsmodels package:
In [37]:
import statsmodels.nonparametric.smoothers_lowess as smL
trendL = smL.lowess(noisyWave,x,frac=0.25,return_sorted=False)
In [38]:
fig,ax = plt.subplots()
ax.plot(x,noisyWave)
ax.plot(x,trendMA, label='Trend from Moving Average')
ax.plot(x,trendL, label="Trend from Loess")
ax.set_xlabel("Time")
ax.legend()
fig.suptitle("Trends extracted from noisy sine wave");
You can see that the trend is smoother with Loess.
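The next cells simulate a moving average process. Following the same notation used below for the order-2 case, an MA(1) model is $$X_t = Z_t + \theta_1 Z_{t-1}$$ and the code below uses $\theta_1 = 0.6$.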
In [39]:
SIZE = 100
ma1 = np.zeros(SIZE)
# Generate noise
noise = np.random.standard_normal((SIZE,)) # normal distribution
# Loop for generating MA(1) process
for i in range(0, SIZE):
ma1[i] = noise[i] + 0.6*noise[i-1]
In [40]:
fig,ax = plt.subplots()
ax.plot(ma1)
ax.set_xlabel("Steps")
fig.suptitle("A moving average process of order 1");
In [41]:
smpl.plot_acf(ma1, lags=20);
Moving average model, order 2: $$X_t = Z_t + \theta_1 Z_{t-1} + \theta_2 Z_{t-2}$$
In [42]:
SIZE = 500
ORDER = 2
ma2 = np.zeros(SIZE)
# Generate noise
noise = np.random.standard_normal((SIZE,)) # normal distribution
# Loop for generating MA(2) process
for i in range(ORDER, SIZE):
ma2[i] = noise[i] + 0.7*noise[i-1] + 0.2*noise[i-2]
In the moving average process of order two, $X_t$ depends on the noise up to two steps back: on $Z_t$, $Z_{t-1}$ and $Z_{t-2}$. For clarity, I started the loop from 2 (the order), therefore the resulting series needs to be shifted left by two values:
In [43]:
# Shift data to left by 2 units
moving_average_process_2 = np.roll(ma2, -ORDER)
In [44]:
fig,ax = plt.subplots()
ax.plot(moving_average_process_2)
ax.set_xlabel("Steps")
fig.suptitle("A moving average process of order 2");
Here I have a moving average process of order 2. Without analyzing it, it would be impossible to tell this is actually a moving average process.
In [45]:
smpl.plot_acf(moving_average_process_2, lags=40);
As expected, the MA(2) process has two significant lags in the ACF plot.
This can be used the other way around: if a time series has q significant lags in its ACF, then it can be modelled by an MA process of order q.
So this is one of the ways we are going to model our time series later on: you look at the ACF and, if it cuts off after some lag, that gives a reason to model the data using a moving average process.
In [46]:
SIZE = 500
ORDER = 4
ma4 = np.zeros(SIZE)
# Generate noise
noise = np.random.standard_normal((SIZE,)) # normal distribution
# Yet another loop for generating an MA(4) process :) no shift needed!
# (a fourth lagged-noise term is required for a true MA(4);
#  the 0.5 coefficient below is an arbitrary choice)
for i in range(ORDER, SIZE):
    ma4[i-4] = noise[i] + 0.2*noise[i-1] + 0.3*noise[i-2] + 0.4*noise[i-3] + 0.5*noise[i-4]
In [47]:
smpl.plot_acf(ma4, lags=40);
When the dependent variable is regressed against one or more lagged values of itself the model is called autoregressive. The formula looks like this: $$ x_{t} = w_{t} + \beta_{1} x_{t-1} + ... + \beta_{p} x_{t-p}$$
When you describe the "order" of the model, as in an AR model of order "p", p represents the number of lagged variables used within the model. For example, an AR(2) model or second-order autoregressive model looks like this: $$ x_{t} = w_{t} + \beta_{1} x_{t-1} + \beta_{2} x_{t-2}$$
Note that an AR(1) model with $\beta_{1}$ set equal to 1 is a random walk and therefore not stationary.
In [48]:
SIZE = 100
ORDER = 1
beta = 1
X = np.zeros(SIZE)
# Generate noise
W = np.random.standard_normal((SIZE,)) # noise as a normal distribution
# Loop for generating AR1 process
#X[1] = W[1]
for t in range(ORDER, SIZE):
X[t] = beta * X[t-1] + W[t]
In [49]:
fig,ax = plt.subplots()
ax.plot(X)
ax.set_xlabel("Steps")
fig.suptitle("AR(1) Time Series == random walk");
In [50]:
smpl.plot_acf(X, lags=20);
It looks like a random walk.
An AR(1) process is only stationary when -1 < beta < 1.
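As a quick added check (ArmaProcess is introduced properly later in this notebook), statsmodels can confirm this condition; a minimal sketch:
import statsmodels.tsa.arima_process as ap
# statsmodels uses lag-polynomial coefficients, so an AR(1) with
# coefficient beta is written as ar=[1, -beta]
print(ap.ArmaProcess(ar=[1, -1.0], ma=[1]).isstationary)  # beta = 1   -> False
print(ap.ArmaProcess(ar=[1, -0.6], ma=[1]).isstationary)  # beta = 0.6 -> True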
In [51]:
SAMPLES = 1000
beta = 0.6
X = W = np.random.normal(size = SAMPLES)
for t in range(SAMPLES):
X[t] = beta * X[t-1] + W[t]
In [52]:
plotTS(X, lags=20);
As expected, the distribution of our simulated AR(1) model is normal. There is significant serial correlation between lagged values, but the PACF plot shows only one significant lag, which is what we would expect from an AR(1) process.
Let's try an AR(2) process.
In [53]:
SAMPLES = 1000
ORDER = 2
X = W = np.random.normal(size = SAMPLES)
for t in range(SAMPLES):
X[t] = 0.666 * X[t-1] - 0.333 * X[t-2] + W[t]
In [54]:
plotTS(X, lags=20);
As before we can see that the ACF correlogram differs significantly from that of white noise, as we'd expect.
The PACF plot shows two statistically significant peaks at k=1, k=2.
Similarly to the MA model and the ACF plot, here you look at the PACF: if the PACF cuts off after some lag p, that gives a reason to model the data using an autoregressive process of order p.
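To make the rule of thumb concrete, here is a rough added heuristic (the countSignificantLags helper is only illustrative, not part of the original notebook): count how many initial lags fall outside an approximate 95% band of +/- 1.96/sqrt(N), stopping at the first non-significant lag.
def countSignificantLags(values, n):
    # values: ACF or PACF values, with lag 0 first; n: series length
    threshold = 1.96 / np.sqrt(n)
    count = 0
    for v in values[1:]:  # skip lag 0, which is always 1
        if abs(v) > threshold:
            count += 1
        else:
            break
    return count

# the ACF of the MA(2) series suggests q, the PACF of the AR(2) series suggests p
print(countSignificantLags(smt.acf(moving_average_process_2, nlags=20),
                           len(moving_average_process_2)))
print(countSignificantLags(smt.pacf(X, nlags=20), len(X)))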
The AR and MA models that we have just seen are special cases of a more general class of time series models known as ARIMA models.
ARIMA stands for Auto-Regressive Integrated Moving Average.
Let's see some examples.
The statsmodels package offers a function to generate ARIMA models:
In [55]:
import statsmodels.tsa.arima_process as ap
In [56]:
# the AR and MA coefficient lists always need a leading 1 (the zero-lag coefficient)
simulated = ap.arma_generate_sample(ar=[1], ma=[1], nsample=100)
In [57]:
fig,ax = plt.subplots()
ax.plot(simulated)
fig.suptitle("Generated ARIMA(0,0,0)");
In [58]:
ar_coef = [1, -0.75, 0.25]
ma_coef = [1, .65, .35]
n = 100
In [59]:
arma_process = ap.ArmaProcess(ar_coef, ma_coef)
arma_process.isstationary
Out[59]:
True
In [60]:
simulated = ap.arma_generate_sample(ar_coef, ma_coef, n)
In [61]:
fig,ax = plt.subplots()
ax.plot(simulated)
fig.suptitle("Generated ARIMA(2,0,2)");
As a rule of thumb, you use the ACF for confirming the trend and inferring possible values of the moving average part, and the PACF for the autoregressive part.
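For instance (an added illustration, not in the original cells), before fitting a model to the simulated series you could look at its correlograms:
# ACF to suggest the MA order (q), PACF to suggest the AR order (p)
smpl.plot_acf(simulated, lags=20);
smpl.plot_pacf(simulated, lags=20);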
In [62]:
from statsmodels.tsa.arima_model import ARIMA
In [63]:
modelFitted = ARIMA(simulated, order=(2,0,2))
In [64]:
results = modelFitted.fit(disp=0) # disp=0: do not display convergence info
In [65]:
results.summary()
Out[65]:
In [66]:
fig,ax = plt.subplots()
ax.plot(simulated)
ax.plot(results.fittedvalues, color='red')
ax.set_xlabel("Time")
fig.suptitle("A contrived example of ARIMA modeling");
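As a final added sanity check, one would typically verify that the residuals of the fitted model look like white noise, for example by inspecting their ACF:
# if the ARIMA(2,0,2) fit captured the structure of the simulated
# series, its residuals should resemble the white noise seen earlier
smpl.plot_acf(results.resid, lags=20);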