In [1]:
    
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
%matplotlib inline
# the main library has a small set of functionality
from stldecompose import decompose, forecast
from stldecompose.forecast_funcs import (naive,
                                         drift, 
                                         mean, 
                                         seasonal_naive)
%load_ext autoreload
%autoreload 2
    
We'll use some of the data that comes pre-packaged with statsmodels to demonstrate the library functionality. The data set below comprises incomplete, daily measurements of CO2 levels in Hawaii.
Note: at the time of this writing, the current release of statsmodels includes a utility method for loading these datasets as a pandas.DataFrame which appears to be broken. Below is a short hack inspired by the current master branch on the statsmodels GitHub page.
In [2]:
    
def get_statsmodels_df():
    """Return packaged data in a pandas.DataFrame"""
    # some hijinks to get around outdated statsmodels code
    dataset = sm.datasets.co2.load()
    start = dataset.data['date'][0].decode('utf-8')
    index = pd.date_range(start=start, periods=len(dataset.data), freq='W-SAT')
    obs = pd.DataFrame(dataset.data['co2'], index=index, columns=['co2'])
    return obs
    
In [3]:
    
obs = get_statsmodels_df()
obs.head()
    
    Out[3]:
Because it's based on some existing statsmodels functionality, STLDecompose requires two things of the input dataframe:
pandas DateTimeIndexSince these are both very situation-dependent, we leave it to the user to define how they want to acheive these goals - pandas provides a number of ways to work with missing data. In particular, the functions shown below make these steps relatively straightforward. Below, we add use linear interpolation, and resample to daily observations. The resulting frame meets both of our criteria.
In [4]:
    
obs = (obs
       .resample('D')
       .mean()
       .interpolate('linear'))
obs.head(10)
    
    Out[4]:
In [5]:
    
obs.index
    
    Out[5]:
In [6]:
    
obs.head(1000).plot()
    
    Out[6]:
    
One of the primary pieces of functionality is the STL decomposition. The associated method requires the observation frame, and the primary (largest) period of seasonality. This period is specified in terms of index positions, and so care is needed for the user to correctly specify the periodicity in terms of their observations.
For example, with daily observations and large annual cycles, period=365. For hourly observations with large daily cycles, period=24. Some inspection, and trial and error may be helpful.
In [7]:
    
decomp = decompose(obs, period=365)
decomp
    
    Out[7]:
The resulting object is an extended version of the statsmodels.tsa.seasonal.DecomposeResult. Like the statsmodels object, the arrays of values are available on the object (the observations; and the trend, seasonal, and residual components). An extra attribute (the average seasonal cycle) has been added for the purpose of forecasting.
We inherit the built-in .plot() method on the object.
In [8]:
    
decomp.plot();
    
    
While the STL decomposition is interesting on its own, STLDecompose also provides some relatively naive capabilities for using the decomposition to forecast based on our observations.
We'll use the same data set, but pretend that we only had the first two third of observations. Then we can compare our forecast to the real observation data.
In [9]:
    
len(obs)
    
    Out[9]:
In [10]:
    
short_obs = obs.head(10000)
    
In [11]:
    
# apply the decomp to the truncated observation
short_decomp = decompose(short_obs, period=365)
short_decomp
    
    Out[11]:
The forecast() method requires the following arguments:
DecomposeResultThere are a handful of predefined functions that can be imported from the stldecompose.forecast_funcs module. These implementations are based on Hyndman's online textbook. The user can also define their own forecast function, following the patterns demonstrated in the predefined functions.
The return type of the forecast() method is a pandas.Dataframe with a column name that represents the forecast function and an appropriate DatetimeIndex.
In [12]:
    
fcast = forecast(short_decomp, steps=8000, fc_func=drift)
fcast.head()
    
    Out[12]:
If desired, we can then plot the corresponding components of the observation and forecast to check and verify the results.
In [13]:
    
plt.plot(obs, '--', label='truth')
plt.plot(short_obs, '--', label='obs')
plt.plot(short_decomp.trend, ':', label='decomp.trend')
plt.plot(fcast, '-', label=fcast.columns[0])
plt.xlim('1970','2004'); plt.ylim(330,380);
plt.legend();
    
    
To include the estimated seasonal component in the forecast, use the boolean seasonal keyword.
In [14]:
    
fcast = forecast(short_decomp, steps=8000, fc_func=drift, seasonal=True)
plt.plot(obs, '--', label='truth')
plt.plot(short_obs, '--', label='obs')
plt.plot(short_decomp.trend, ':', label='decomp.trend')
plt.plot(fcast, '-', label=fcast.columns[0])
plt.xlim('1970','2004'); plt.ylim(330,380);
plt.legend();
    
    
In [15]:
    
fcast.head()
    
    Out[15]:
Enjoy.
In [ ]: