Time series forecasting algorithms in PyAF

This document describes the algorithmic aspects of time series forecasting in PyAF. It covers:

  1. The overall algorithm
  2. The details of the signal decomposition
  3. The machine learning aspects
  4. Advanced usage/control of the algorithms
  5. Hierarchical forecasting

Warning: This document is intended for advanced users of PyAF. The aspects described here are not needed in a typical forecasting use case.

Generic Algorithmic Choices

PyAF uses a machine learning approach to forecasting. A large number of time series models are generated and their forecasting quality is compared on a validation dataset (the most recent part of the whole signal). To summarize, PyAF runs a competition between a large set of candidate models/hypotheses and selects the best one to produce the final forecast.
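
For reference, a minimal usage sketch (the CSV file, the column names 'Date' and 'Signal', and the horizon are placeholders); the model competition happens inside `train`:

```python
import pandas as pd
import pyaf.ForecastEngine as autof

# a DataFrame with one time column and one signal column (placeholder names)
df = pd.read_csv('my_signal.csv', parse_dates=['Date'])

lEngine = autof.cForecastEngine()
lEngine.train(iInputDS=df, iTime='Date', iSignal='Signal', iHorizon=7)

# inspect the winning decomposition and its validation performance
lEngine.getModelInfo()

# produce the forecast for the next 7 points
lForecastDF = lEngine.forecast(iInputDS=df, iHorizon=7)
```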

The models used/tested in PyAF are signal decompositions generated on the fly internally. An additive signal decomposition is the sum of a trend (long-term) component, a periodic component, and an irregular component, as described in http://en.wikipedia.org/wiki/Decomposition_of_time_series

PyAF generates tens of possible decompositions for the input signal and outputs the best one. One can control the number/types of decompositions, enable/disable each type of component, and review the performance of each decomposition internally.

In addition to the decomposition, PyAF supports a whole set of possible signal transformations, applied in a pre-processing phase (before decomposition) and inverted in a post-processing step (after forecasting).

PyAF forecasts a signal $X_t$ in the three steps described below:

  1. Signal Transformation : $$ Y_t = \phi(X_t) $$
  2. Decomposition of the transformed signal : $$ \hat{Y_t} = T_t + C_t + I_t $$
  3. Back transformation of the forecast : $$ \hat{X_t} = \phi^{-1}(\hat{Y_t}) $$
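
As a toy illustration of these three steps (not PyAF's internal code), with the Difference transformation and a constant model for the transformed signal:

```python
import numpy as np

X = np.array([10.0, 12.0, 13.0, 15.0, 18.0])

# step 1: transformation, here Difference: Y_t = X_t - X_{t-1}
Y = np.diff(X)

# step 2: model the transformed signal; here a constant trend only,
# with no cycle and no irregular component
Y_hat = Y.mean()

# step 3: back transformation of the one-step-ahead forecast
X_hat = X[-1] + Y_hat
print(X_hat)  # 20.0 : last value plus the mean increment
```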

Signal Decompositions with PyAF

PyAF supports the following operations:

lKnownTransformations = ['None', 'Difference', 'RelativeDifference','Integration', 'BoxCox', 'Quantization', 'Logit', 'Fisher', 'Anscombe'];

lKnownTrends = ['ConstantTrend', 'Lag1Trend', 'LinearTrend', 'PolyTrend','MovingAverage', 'MovingMedian'];

lKnownPeriodics = ['NoCycle', 'BestCycle', 'Seasonal_MonthOfYear' , 'Seasonal_Second' ,'Seasonal_Minute' ,'Seasonal_Hour' ,'Seasonal_DayOfWeek' , 'Seasonal_DayOfMonth', 'Seasonal_WeekOfYear'];

lKnownAutoRegressions = ['NoAR' , 'AR' , 'ARX' , 'SVR' , 'MLP' , 'LSTM'];
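
These lists can be narrowed to restrict the model search space before training. A sketch assuming the `set_active_*` setters on the engine options (option names may differ across PyAF versions):

```python
import pyaf.ForecastEngine as autof

lEngine = autof.cForecastEngine()

# keep only a subset of the lists above (set_active_* setters are assumed here)
lEngine.mOptions.set_active_transformations(['None', 'Difference'])
lEngine.mOptions.set_active_trends(['ConstantTrend', 'LinearTrend'])
lEngine.mOptions.set_active_periodics(['NoCycle', 'BestCycle'])
lEngine.mOptions.set_active_autoregressions(['NoAR', 'AR'])
```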

Transformations

  1. None : $$Y_t = \phi(X_t) = X_t $$
  2. Difference : $$Y_t = \phi(X_t) = X_t - X_{t-1} $$
  3. Relative Difference : $$Y_t = \phi(X_t) = \frac{X_t - X_{t-1}}{X_{t-1}} $$
  4. Integration : $$Y_t = \phi(X_t) = \sum_{s=0}^{s=t} X_s $$
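
Note that Difference and Integration are inverse operations (up to the handling of the initial value); a quick numpy check:

```python
import numpy as np

X = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# Difference, keeping X_0 as the first value so no information is lost
Y = np.diff(X, prepend=0.0)

# Integration (cumulative sum) recovers the original signal
assert np.allclose(np.cumsum(Y), X)
```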

Optional Transformations:

  1. BoxCox : $$Y_t = \phi(X_t) = \frac{X_t^\lambda - 1}{\lambda} $$
  2. Quantization : $$Y_t = \phi(X_t) = quantile(X_t) $$

Trends

  1. ConstantTrend : $$T_t = a $$
  2. LinearTrend : $$T_t = a t + b$$
  3. PolyTrend : $$T_t = a t^2 + b t + c$$
  4. Lag1Trend : $$T_t = Y_{t-1} $$
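
The trend coefficients ($a$, $b$, $c$) are estimated on the training part of the signal, for instance by an ordinary least squares fit; a small numpy illustration (not PyAF's internal code):

```python
import numpy as np

Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
t = np.arange(len(Y))

# LinearTrend: T_t = a*t + b, fitted by least squares
a, b = np.polyfit(t, Y, deg=1)

# PolyTrend: T_t = a*t^2 + b*t + c
a2, b2, c2 = np.polyfit(t, Y, deg=2)
```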

Optional Trends

  1. MovingAverage : $$T_t = \frac{1}{k} \sum_{s=t-k}^{s=t-1} Y_s$$
  2. MovingMedian : $$T_t = median(Y_{t-k}, \dots, Y_{t-1})$$
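
Both statistics use a window that ends at $t-1$, so only past values are involved; the same idea in pandas:

```python
import pandas as pd

Y = pd.Series([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])
k = 3

# shift by one so the window covers Y_{t-k}, ..., Y_{t-1}
moving_average = Y.shift(1).rolling(k).mean()
moving_median = Y.shift(1).rolling(k).median()
```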

Periodicities

  1. None: $$C_t = 0 $$
  2. BestCycle (the period $p$ is estimated automatically) : $$ C_t = C_{t-p}$$
  3. Seasonality (based on a date part of $t$) : $$ C_t = f(minute(t)), f(hour(t)), etc.$$
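
A rough sketch of the BestCycle idea (PyAF's exact selection criterion may differ): for each candidate period $p$, average the signal over the positions $t \bmod p$ and keep the period with the smallest residual error:

```python
import numpy as np

def best_cycle(Z, candidate_periods):
    """Pick the period whose per-position means fit Z best (a sketch).

    Candidate periods are assumed to be smaller than len(Z)."""
    best_p, best_err, best_means = None, np.inf, None
    for p in candidate_periods:
        # mean of the signal at each position inside the cycle
        means = np.array([Z[pos::p].mean() for pos in range(p)])
        C = means[np.arange(len(Z)) % p]
        err = np.mean((Z - C) ** 2)
        if err < best_err:
            best_p, best_err, best_means = p, err, means
    return best_p, best_means
```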

Irregular Components

These models are built on the residuals of the trend and cycle components:

$$Z_t = Y_t - T_t - C_t $$

We consider here some models based on the lagged residuals ($Lag_k(Z)_t = Z_{t-k}$).

The models described here are implemented using external libraries, either scikit-learn (AR, ARX, SVR) or Keras (MLP, LSTM).

  1. None : $$I_t = 0$$
  2. Autoregressive (AR) model : $$I_t = a_0 + \sum_{k=1}^{k=p} a_k Z_{t-k}$$
  3. Autoregressive with Exogenous data (ARX) model : $$I_t = a_0 + \sum_{k=1}^{k=p} a_k Z_{t-k} + \sum_{k=1}^{k=p} b_k Exog_{t-k} $$
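
Conceptually, the AR model is a linear regression on a lag matrix of the residuals; a scikit-learn sketch (PyAF's internal implementation may differ):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_ar(Z, p):
    """Fit I_t = a_0 + sum_k a_k * Z_{t-k} on the residual signal Z."""
    # row t of the lag matrix holds (Z_{t-1}, ..., Z_{t-p})
    lags = np.array([Z[t - p:t][::-1] for t in range(p, len(Z))])
    return LinearRegression().fit(lags, Z[p:])
```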

Experimental Models:

  1. Support Vector Regression (SVR) : $$I_t = SVR(target = Z_t , inputs = \{Z_{t-1} , \dots, Z_{t-p}\})$$
  2. MultiLayer Perceptron (MLP) : $$I_t = MLP(target = Z_t , inputs = \{Z_{t-1} , \dots, Z_{t-p}\})$$
  3. LSTM : $$I_t = LSTM(target = Z_t , inputs = \{Z_{t-1} , \dots, Z_{t-p}\})$$
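
These experimental models use the same lagged inputs as AR; for example, with scikit-learn's SVR (again a sketch under the same lag-matrix construction, not PyAF's exact code):

```python
import numpy as np
from sklearn.svm import SVR

def fit_svr(Z, p):
    """Fit I_t = SVR(Z_{t-1}, ..., Z_{t-p}) on the residual signal Z."""
    lags = np.array([Z[t - p:t][::-1] for t in range(p, len(Z))])
    return SVR(kernel='rbf').fit(lags, Z[p:])
```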

The parameter $p$ represents the dependency on the past. It can be customized.
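
In PyAF this bound is set through the engine options; a sketch assuming the `mMaxAROrder` attribute (check the option names in your PyAF version):

```python
import pyaf.ForecastEngine as autof

lEngine = autof.cForecastEngine()
# cap the number of residual lags used by the autoregressive models
# (mMaxAROrder is assumed here)
lEngine.mOptions.mMaxAROrder = 16
```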


