auto_arima
Pmdarima brings R's auto.arima functionality to Python by wrapping statsmodels' ARIMA and SARIMAX models into a single scikit-learn-esque estimator (pmdarima.arima.ARIMA) and adding several layers of differencing and seasonal differencing tests to identify the optimal model parameters.
Pmdarima ARIMA models:
In [1]:
import numpy as np
import pmdarima as pm
print('numpy version: %r' % np.__version__)
print('pmdarima version: %r' % pm.__version__)
We'll start by defining an array of data from an R time-series, wineind:
> forecast::wineind
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1980 15136 16733 20016 17708 18019 19227 22893 23739 21133 22591 26786 29740
1981 15028 17977 20008 21354 19498 22125 25817 28779 20960 22254 27392 29945
1982 16933 17892 20533 23569 22417 22084 26580 27454 24081 23451 28991 31386
1983 16896 20045 23471 21747 25621 23859 25500 30998 24475 23145 29701 34365
1984 17556 22077 25702 22214 26886 23191 27831 35406 23195 25110 30009 36242
1985 18450 21845 26488 22394 28057 25451 24872 33424 24052 28449 33533 37351
1986 19969 21701 26249 24493 24603 26485 30723 34569 26689 26157 32064 38870
1987 21337 19419 23166 28286 24570 24001 33151 24878 26804 28967 33311 40226
1988 20504 23060 23562 27562 23940 24584 34303 25517 23494 29095 32903 34379
1989 16991 21109 23740 25552 21752 20294 29009 25500 24166 26960 31222 38641
1990 14672 17543 25453 32683 22449 22316 27595 25451 25421 25288 32568 35110
1991 16052 22146 21198 19543 22084 23816 29961 26773 26635 26972 30207 38687
1992 16974 21697 24179 23757 25013 24019 30345 24488 25156 25650 30923 37240
1993 17466 19463 24352 26805 25236 24735 29356 31234 22724 28496 32857 37198
1994 13652 22784 23565 26323 23779 27549 29660 23356
Note that the frequency of the data is 12:
> frequency(forecast::wineind)
[1] 12
In [2]:
from pmdarima.datasets import load_wineind
# this is a dataset from R
wineind = load_wineind().astype(np.float64)
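The differencing tests mentioned in the intro are also exposed directly. Below is a minimal sketch (assuming ndiffs and nsdiffs are imported from pmdarima.arima, where they live) of estimating the differencing orders we'll later pass as d and D:

from pmdarima.arima import ndiffs, nsdiffs

# estimate how many non-seasonal / seasonal differences the series needs
d_est = ndiffs(wineind, test='kpss')       # KPSS test for non-seasonal differencing
D_est = nsdiffs(wineind, m=12, test='ch')  # Canova-Hansen test for seasonal differencing
print(d_est, D_est)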
In [3]:
from pmdarima.arima import ARIMA
fit = ARIMA(order=(1, 1, 1), seasonal_order=(0, 1, 1, 12)).fit(y=wineind)
Also note that your data does not have to exhibit seasonality to work with an ARIMA. We could fit an ARIMA against the same data with no seasonal terms whatsoever (though it is unlikely to perform better; in fact, quite the opposite).
In [4]:
fit = ARIMA(order=(1, 1, 1), seasonal_order=None).fit(y=wineind)
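One quick way to check that intuition is to compare the two fits by AIC. This is a sketch, assuming the aic() method available on fitted pmdarima ARIMA estimators; the variable names are illustrative:

seasonal_fit = ARIMA(order=(1, 1, 1), seasonal_order=(0, 1, 1, 12)).fit(y=wineind)
non_seasonal_fit = ARIMA(order=(1, 1, 1), seasonal_order=None).fit(y=wineind)

# lower AIC indicates a better trade-off between fit and model complexity;
# on this monthly series the seasonal model is expected to come out ahead
print('seasonal AIC:     %.3f' % seasonal_fit.aic())
print('non-seasonal AIC: %.3f' % non_seasonal_fit.aic())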
auto_arima
If you are unsure (as is common) of the best parameters for your model, let auto_arima figure it out for you. auto_arima is similar to an ARIMA-specific grid search but, by default, uses a more intelligent stepwise algorithm laid out in a paper by Hyndman and Khandakar (2008). If stepwise is False, the models will be fit in a manner similar to a grid search. Note that it is possible for auto_arima not to find a model that converges; if this is the case, it will raise a ValueError.
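For instance, here is a minimal sketch of catching that ValueError when error_action='raise' is used (the call below is illustrative; the cells that follow use error_action='ignore' instead):

try:
    risky_fit = pm.auto_arima(wineind, seasonal=True, m=12, error_action='raise')
except ValueError as e:
    # raised if no candidate model could be fit successfully
    print('auto_arima could not find a viable model: %s' % e)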
In [5]:
# fitting a stepwise model:
stepwise_fit = pm.auto_arima(wineind, start_p=1, start_q=1, max_p=3, max_q=3, m=12,
                             start_P=0, seasonal=True, d=1, D=1, trace=True,
                             error_action='ignore',  # don't want to know if an order does not work
                             suppress_warnings=True,  # don't want convergence warnings
                             stepwise=True)  # set to stepwise
stepwise_fit.summary()
Out[5]:
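Beyond the summary, the fitted estimator exposes the selected configuration directly. A quick sketch of inspecting it, using attributes and the aic() method of the fitted pmdarima estimator:

# the (p, d, q) and (P, D, Q, m) terms auto_arima settled on, plus its AIC
print(stepwise_fit.order)
print(stepwise_fit.seasonal_order)
print(stepwise_fit.aic())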
In [6]:
rs_fit = pm.auto_arima(wineind, start_p=1, start_q=1, max_p=3, max_q=3, m=12,
                       start_P=0, seasonal=True, d=1, D=1, trace=True,
                       n_jobs=-1,  # we can run this in parallel by controlling this option
                       error_action='ignore',  # don't want to know if an order does not work
                       suppress_warnings=True,  # don't want convergence warnings
                       stepwise=False, random=True, random_state=42,  # we can fit a random search (not exhaustive)
                       n_fits=25)
rs_fit.summary()
Out[6]:
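To compare the two searches, a simple sketch is to look at each winner's AIC (lower is better); the stepwise search typically finds a competitive model in far fewer fits:

print('stepwise AIC:      %.3f' % stepwise_fit.aic())
print('random search AIC: %.3f' % rs_fit.aic())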
In [7]:
from bokeh.plotting import figure, show, output_notebook
import pandas as pd
# init bokeh
output_notebook()
def plot_arima(truth, forecasts, title="ARIMA", xaxis_label='Time',
               yaxis_label='Value', c1='#A6CEE3', c2='#B2DF8A',
               forecast_start=None, **kwargs):
    # make truth and forecasts into pandas series
    n_truth = truth.shape[0]
    n_forecasts = forecasts.shape[0]

    # always plot truth the same
    truth = pd.Series(truth, index=np.arange(truth.shape[0]))

    # if no defined forecast start, start at the end
    if forecast_start is None:
        idx = np.arange(n_truth, n_truth + n_forecasts)
    else:
        idx = np.arange(forecast_start, n_forecasts)
    forecasts = pd.Series(forecasts, index=idx)

    # set up the plot
    p = figure(title=title, plot_height=400, **kwargs)
    p.grid.grid_line_alpha = 0.3
    p.xaxis.axis_label = xaxis_label
    p.yaxis.axis_label = yaxis_label

    # add the lines
    p.line(truth.index, truth.values, color=c1, legend='Observed')
    p.line(forecasts.index, forecasts.values, color=c2, legend='Forecasted')
    return p
In [8]:
in_sample_preds = stepwise_fit.predict_in_sample()
in_sample_preds[:10]
Out[8]:
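As a rough numeric check of fit quality, you could score the in-sample predictions against the observed series. This sketch assumes scikit-learn is installed (pmdarima depends on it) and that the in-sample predictions cover the full series, as the plot in the next cell suggests; bear in mind in-sample error is optimistic and no substitute for out-of-sample validation:

from sklearn.metrics import mean_squared_error

# mean squared error of the in-sample (one-step-ahead) predictions
print(mean_squared_error(wineind, in_sample_preds))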
In [9]:
show(plot_arima(wineind, in_sample_preds,
                title="Original Series & In-sample Predictions",
                c2='#FF0000', forecast_start=0))
In [10]:
next_25 = stepwise_fit.predict(n_periods=25)
next_25
Out[10]:
In [11]:
# call the plotting func
show(plot_arima(wineind, next_25))
ARIMAs create forecasts by using the latest observations. Over time, your forecasts will drift, and you'll need to update the model with the observed values. There are several solutions to this problem:

1. Re-fit with the auto_arima function, or re-run auto_arima altogether.
2. Use the update method (preferred). This will allow your model to update its parameters by taking several more MLE steps on new observations (controlled by the maxiter arg), starting from the parameters it's already discovered. This approach will help you avoid over-fitting.

For this example, let's update our existing model with the next_25 we just computed, as if they were actually observed values.
In [12]:
stepwise_fit.update(next_25, maxiter=10) # take 10 more steps
stepwise_fit.summary()
Out[12]:
In [14]:
updated_data = np.concatenate([wineind, next_25])
In [15]:
# visualize new forecasts
show(plot_arima(updated_data, stepwise_fit.predict(n_periods=10)))
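In a longer-running setting you would repeat this observe-then-update cycle as new data arrives. A minimal sketch of that loop follows, reusing values from next_25 purely for illustration (in practice these would be genuinely observed values, and the batch size and maxiter are arbitrary choices here):

# feed "new" observations to the model one at a time, taking a few extra
# MLE steps per update so the parameters track the incoming data
for new_obs in next_25[:5]:
    stepwise_fit.update([new_obs], maxiter=5)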