In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
airline = pd.read_csv('airline_passengers.csv',
index_col = "Month")
In [3]:
airline.dropna(inplace = True)
airline.index = pd.to_datetime(airline.index)
In [4]:
airline.head()
Out[4]:
In [5]:
airline['6-month-SMA'] = airline['Thousands of Passengers'].rolling(window = 6).mean()
airline['12-month-SMA'] = airline['Thousands of Passengers'].rolling(window = 12).mean()
In [6]:
airline.head()
Out[6]:
In [7]:
airline.plot(figsize = (12, 8))
Out[7]:
We just showed how to calculate the SMA based on some window. However, the basic SMA has some weaknesses: it weights every observation in the window equally, so it lags behind recent changes in the data.
To help fix some of these issues, we can use an EWMA (exponentially weighted moving average).
EWMA reduces the lag effect of the SMA by putting more weight on values that occurred more recently; the weights decay exponentially for older values, hence the name. The amount of weight applied to the most recent values depends on the parameters passed to the EWMA and on the number of periods in the window. The full mathematics is more involved; here is the shorter version of the explanation behind EWMA.
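To make the lag effect concrete, here is a small sketch on a synthetic step series (an assumption for illustration, not the airline data). It compares a 12-period SMA with an EWMA of span 12; `adjust=False` is used only to keep the arithmetic easy to follow.

```python
import pandas as pd

# Synthetic step series (hypothetical data, not the airline CSV):
# 50 periods at level 0, followed by 50 periods at level 1.
step = pd.Series([0.0] * 50 + [1.0] * 50)

# Simple moving average over 12 periods.
sma12 = step.rolling(window=12).mean()

# Exponentially weighted average with span 12, i.e. alpha = 2 / (12 + 1).
ewma12 = step.ewm(span=12, adjust=False).mean()

comparison = pd.DataFrame({"step": step, "SMA12": sma12, "EWMA12": ewma12})
print(comparison.iloc[48:58])
```

In the first period after the jump, the EWMA has moved to 2/13 (about 0.154) while the SMA has only moved to 1/12 (about 0.083). The SMA reaches the new level exactly after 12 periods, whereas the EWMA keeps approaching it without ever fully forgetting the old values.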
The formula for EWMA is:
$ y_t = \frac{\sum\limits_{i=0}^t w_i x_{t-i}}{\sum\limits_{i=0}^t w_i} $
Where $x_t$ is the input value, $w_i$ is the applied weight (note how it can change from $i=0$ to $t$), and $y_t$ is the output.
Now the question is: how do we define the weight term $w_i$?
This depends on the adjust parameter you provide to the .ewm() method.
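When adjust is True (default), weighted averages are calculated using weights $ w_i = (1 - \alpha)^i $, which gives:
$ y_t = \frac{x_t + (1 - \alpha) x_{t-1} + (1 - \alpha)^2 x_{t-2} + \dots + (1 - \alpha)^t x_0}{1 + (1 - \alpha) + (1 - \alpha)^2 + \dots + (1 - \alpha)^t} $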
When adjust=False is specified, moving averages are calculated as:
$ \begin{split}y_0 &= x_0 \\ y_t &= (1 - \alpha) y_{t-1} + \alpha x_t,\end{split} $
which is equivalent to using weights:
$ \begin{split}w_i = \begin{cases} \alpha (1 - \alpha)^i & \text{if } i < t \\ (1 - \alpha)^i & \text{if } i = t. \end{cases}\end{split} $
When adjust=False we have $y_0 = x_0$ and, from the last representation above, $y_t = \alpha x_t + (1 - \alpha) y_{t-1}$. There is therefore an assumption that $x_0$ is not an ordinary value but rather an exponentially weighted moment of the infinite series up to that point.
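As a sanity check on the two variants, the sketch below implements them by hand with NumPy and compares the results against `.ewm()`. It assumes the adjust=True weights are $w_i = (1 - \alpha)^i$ and that the adjust=False variant follows the recursion $y_0 = x_0$, $y_t = (1 - \alpha) y_{t-1} + \alpha x_t$. The helper names `ewma_adjusted` and `ewma_recursive` and the input values are made up for illustration.

```python
import numpy as np
import pandas as pd

def ewma_adjusted(x, alpha):
    """Weighted average with weights w_i = (1 - alpha)**i (adjust=True)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for t in range(len(x)):
        w = (1 - alpha) ** np.arange(t + 1)         # w_0, w_1, ..., w_t
        out[t] = np.sum(w * x[t::-1]) / np.sum(w)   # pairs w_i with x_{t-i}
    return out

def ewma_recursive(x, alpha):
    """Recursion y_0 = x_0, y_t = (1 - alpha) * y_{t-1} + alpha * x_t (adjust=False)."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    out[0] = x[0]
    for t in range(1, len(x)):
        out[t] = (1 - alpha) * out[t - 1] + alpha * x[t]
    return out

# Illustrative values (hypothetical, not read from the CSV).
values = [112.0, 118.0, 132.0, 129.0, 121.0, 135.0]
alpha = 2 / (12 + 1)  # same alpha as span=12
s = pd.Series(values)

adjusted_matches = np.allclose(
    ewma_adjusted(values, alpha), s.ewm(alpha=alpha, adjust=True).mean()
)
recursive_matches = np.allclose(
    ewma_recursive(values, alpha), s.ewm(alpha=alpha, adjust=False).mean()
)
print(adjusted_matches, recursive_matches)
```

Both checks should agree with pandas. The two variants give the same first value and then differ slightly, because adjust=True renormalizes the weights over the finite history while adjust=False treats $x_0$ as if it summarized an infinite past.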
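One must have $0 < \alpha \leq 1$, and while since pandas version 0.18.0 it has been possible to pass $\alpha$ directly, it's often easier to think about either the span, center of mass (com) or half-life of an EW moment:
$ \alpha = \begin{cases} \frac{2}{s + 1}, & \text{for span } s \geq 1 \\ \frac{1}{1 + c}, & \text{for center of mass } c \geq 0 \\ 1 - \exp\left(\frac{\log 0.5}{h}\right), & \text{for half-life } h > 0 \end{cases} $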
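The relationship between these parameters can be checked numerically. The sketch below assumes the usual conversions $\alpha = 2/(s+1)$ for span $s$, $\alpha = 1/(1+c)$ for center of mass $c$, and $\alpha = 1 - \exp(\log 0.5 / h)$ for half-life $h$. The helper functions and the toy series are hypothetical and only serve the comparison.

```python
import numpy as np
import pandas as pd

def alpha_from_span(span):
    return 2.0 / (span + 1.0)

def alpha_from_com(com):
    return 1.0 / (1.0 + com)

def alpha_from_halflife(halflife):
    return 1.0 - np.exp(np.log(0.5) / halflife)

# Toy series (hypothetical data): the numbers 1.0 through 24.0.
s = pd.Series(np.arange(1.0, 25.0))

span = 12
com = (span - 1) / 2           # center of mass giving the same alpha as span=12
alpha = alpha_from_span(span)  # 2 / 13

via_span = s.ewm(span=span).mean()
via_com = s.ewm(com=com).mean()
via_alpha = s.ewm(alpha=alpha).mean()

halflife = 3
via_halflife = s.ewm(halflife=halflife).mean()
via_halflife_alpha = s.ewm(alpha=alpha_from_halflife(halflife)).mean()

print(np.allclose(via_span, via_com), np.allclose(via_span, via_alpha))
print(np.allclose(via_halflife, via_halflife_alpha))
```

Passing `span=12`, `com=5.5` or `alpha=2/13` describes the same smoothing, so the choice is mostly about which quantity is easiest to reason about for the data at hand.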
In [8]:
airline['EWMA12'] = airline['Thousands of Passengers'].ewm(span = 12).mean()
In [9]:
airline[['Thousands of Passengers','EWMA12']].plot(figsize = (12, 8))
Out[9]:
Great! That is all for now. Let's move on to ARIMA modeling!