In [1]:
# remove the notebook root logger.
import logging
logger = logging.getLogger()
logger.handlers = []
PyAF allows using some external sources to improve its forecasts.
In addition to the training dataset, the user can provide an external table containing exogenous variable data. This table is merged with the signal dataset whenever exogenous variable values are needed (either when training the model or when producing forecasts).
The signal and exogenous variables can come from the same table (self-join).
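The merge described above can be pictured with plain pandas. This is only a minimal sketch: the table contents and the column names `Signal`, `Exog_A` and `Exog_B` are hypothetical, and the left join on the time column is an assumption about the kind of merge involved, not a description of PyAF's internal code.

```python
import pandas as pd

# Hypothetical signal table: one observation per month.
signal_df = pd.DataFrame({
    "Date": pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01"]),
    "Signal": [2.7, 2.0, 3.6],
})

# Hypothetical exogenous table. It also covers a future date (2000-04),
# which is what makes the exogenous values usable at forecast time.
exog_df = pd.DataFrame({
    "Date": pd.to_datetime(["2000-01-01", "2000-02-01", "2000-03-01", "2000-04-01"]),
    "Exog_A": [10.0, 12.0, 11.0, 13.0],
    "Exog_B": ["x", "y", "x", "y"],
})

# Left join on the time column (an assumption): every signal row is kept
# and receives the exogenous values observed at the same date.
merged_df = signal_df.merge(exog_df, on="Date", how="left")
print(merged_df)
```

In the self-join case, the signal and the exogenous columns simply live in the same dataframe, so the same table is passed in both roles.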
All PyAF models take the form of a linear decomposition (Trend + Periodic + AR). The exogenous variables are introduced in the AR component through their past values.
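To make "through their past values" concrete, the sketch below builds lagged copies of a signal and of one exogenous column and fits an ordinary least-squares regression on them. Everything here is illustrative: the data, the column names, the choice of two lags and the use of `numpy.linalg.lstsq` are assumptions, and PyAF's own AR/ARX estimation may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical series; in a full decomposition the AR part would be fitted
# on what remains after the trend and periodic components are removed.
df = pd.DataFrame({
    "Signal": [2.7, 2.0, 3.6, 5.0, 6.5, 6.1, 5.9, 5.0],
    "Exog": [0.1, 0.4, 0.3, 0.8, 0.9, 0.7, 0.6, 0.5],
})

# Lagged predictors: past values of the signal (AR) and of the
# exogenous variable (the "X" in ARX).
lags = [1, 2]
for lag in lags:
    df[f"Signal_Lag{lag}"] = df["Signal"].shift(lag)
    df[f"Exog_Lag{lag}"] = df["Exog"].shift(lag)

# The first rows have no complete history and are dropped.
features = df.dropna()
predictor_names = [c for c in features.columns if "_Lag" in c]

# Least-squares fit with an intercept column.
X = np.column_stack([np.ones(len(features)), features[predictor_names].to_numpy()])
y = features["Signal"].to_numpy()
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(dict(zip(["Intercept"] + predictor_names, coef)))
```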
Before exogenous variables can be used, they must first be transformed into a numerical form (encoded). PyAF uses standard encoding procedures: non-numerical exogenous variables are dummified (a binary column is created for each distinct value of the column) and numerical columns are standardized (Y = (X - m) / s, where m is the mean and s is the standard deviation).
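The two encodings can be reproduced with pandas as a rough illustration. The data and the column names `Num` and `Cat` are hypothetical. The `Name=Value` naming of the binary columns mirrors the encoded columns listed later in this notebook, while the use of the sample standard deviation (pandas' default) is an assumption.

```python
import pandas as pd

# Hypothetical exogenous table: one numerical and one character column.
exog = pd.DataFrame({
    "Num": [1.0, 2.0, 3.0, 4.0],
    "Cat": ["A", "B", "A", "C"],
})

# Standardization: Y = (X - m) / s
m = exog["Num"].mean()
s = exog["Num"].std()  # sample standard deviation (ddof=1), an assumption
num_std = ((exog["Num"] - m) / s).rename("Num_std")

# Dummification: one binary column per distinct value, named 'Cat=<value>'.
dummies = pd.get_dummies(exog["Cat"], prefix="Cat", prefix_sep="=").astype(int)

encoded = pd.concat([num_std, dummies], axis=1)
print(encoded)
```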
For demonstration purposes, we transform the Los Angeles ozone dataset so that it contains some exogenous variables. In a real case, these variables could provide some information about Los Angeles (population, temperature, ...) over the same period.
Here, 4 variables have been created artificially.
In [2]:
import numpy as np
import pandas as pd
import datetime
csvfile_link = "https://raw.githubusercontent.com/antoinecarme/pyaf/master/data/ozone-la-exogenous-2.csv"
exog_dataframe = pd.read_csv(csvfile_link);
# pd.to_datetime is used instead of astype(np.datetime64), which recent pandas versions reject
exog_dataframe['Date'] = pd.to_datetime(exog_dataframe['Date'])
print(exog_dataframe.info())
exog_dataframe.head()
Out[2]:
For each 'Date' value, this table contains two numerical exogenous variables ('Exog2' and 'Exog3') and two character (object) variables ('Exog4' and 'Exog5').
This is what the encoded dataset looks like internally:
In [3]:
encoded_csvfile_link = "https://raw.githubusercontent.com/antoinecarme/pyaf/master/data/ozone_exogenous_encoded.csv"
encoded_ozone_dataframe = pd.read_csv(encoded_csvfile_link);
# print(encoded_ozone_dataframe.columns)
interesting_Columns = ['Date',
'Ozone',
'Exog2',
'Exog3',
'Exog4=E', 'Exog4=F', 'Exog4=C', 'Exog4=D', 'Exog4=B',
'Exog5=K', 'Exog5=L', 'Exog5=M', 'Exog5=N'];
encoded_ozone_dataframe = encoded_ozone_dataframe[interesting_Columns]
print(encoded_ozone_dataframe.info())
encoded_ozone_dataframe.head()
Out[3]:
In [4]:
import pyaf.ForecastEngine as autof
lEngine = autof.cForecastEngine()
csvfile_link = "https://raw.githubusercontent.com/antoinecarme/TimeSeriesData/master/ozone-la.csv"
ozone_dataframe = pd.read_csv(csvfile_link);
ozone_dataframe['Date'] = ozone_dataframe['Month'].apply(lambda x : datetime.datetime.strptime(x, "%Y-%m"))
ozone_dataframe.info()
lExogenousData = (exog_dataframe , ['Exog2' , 'Exog3' , 'Exog4', 'Exog5'])
lEngine.train(ozone_dataframe , 'Date' , 'Ozone', 12 , lExogenousData);
In [5]:
lEngine.getModelInfo()
In this specific model, the ARX component shows that the most important predictors are (in this order):
In [6]:
lEngine_Without_Exogenous = autof.cForecastEngine()
lEngine_Without_Exogenous.train(ozone_dataframe , 'Date' , 'Ozone', 12);
In [7]:
lEngine_Without_Exogenous.getModelInfo()
In [8]:
ozone_forecast_without_exog = lEngine_Without_Exogenous.forecast(ozone_dataframe, 12);
ozone_forecast_with_exog = lEngine.forecast(ozone_dataframe, 12);
In [9]:
%matplotlib inline
ozone_forecast_without_exog.plot.line('Date', ['Ozone' , 'Ozone_Forecast',
'Ozone_Forecast_Lower_Bound',
'Ozone_Forecast_Upper_Bound'], grid = True, figsize=(12, 8))
ozone_forecast_with_exog.plot.line('Date', ['Ozone' , 'Ozone_Forecast',
'Ozone_Forecast_Lower_Bound',
'Ozone_Forecast_Upper_Bound'], grid = True, figsize=(12, 8))
Out[9]: