This notebook covers the use of 'holidays' in the Prophet forecasting library. In this notebook, we will extend the previous example (http://pythondata.com/forecasting-time-series-data-prophet-jupyter-notebook/) to use holidays in the forecasting.
In [1]:
import pandas as pd
import numpy as np
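try:
    from prophet import Prophet  # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet  # older releases used the fbprophet name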
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize']=(20,10)
plt.style.use('ggplot')
In [2]:
sales_df = pd.read_csv('../examples/retail_sales.csv', index_col='date', parse_dates=True)
In [3]:
sales_df.head()
Out[3]:
In [4]:
df = sales_df.reset_index()
In [5]:
df.head()
Out[5]:
Let's rename the columns as required by fbprophet. Additionally, fbprophet doesn't want the dates as the index: it expects them in a regular (non-index) column named 'ds', with the values in a column named 'y'. So we'll keep the default integer index rather than setting a date index.
In [6]:
df=df.rename(columns={'date':'ds', 'sales':'y'})
In [7]:
df.head()
Out[7]:
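To make the reshaping explicit, here is a minimal sketch with a hypothetical helper, to_prophet_frame (not part of pandas or Prophet), that takes a date-indexed DataFrame and returns the two columns Prophet expects, ds and y, on a plain integer index. The sample values are made up for illustration.

```python
import pandas as pd

def to_prophet_frame(frame: pd.DataFrame, value_col: str) -> pd.DataFrame:
    """Hypothetical helper: move the datetime index into a 'ds' column
    and rename the value column to 'y', keeping a default integer index."""
    out = frame.reset_index()
    date_col = out.columns[0]  # reset_index puts the former index first
    out = out.rename(columns={date_col: 'ds', value_col: 'y'})
    return out[['ds', 'y']]

# Made-up monthly values indexed by date, similar in shape to sales_df
sample = pd.DataFrame(
    {'sales': [100.0, 110.0, 180.0]},
    index=pd.DatetimeIndex(['2009-10-01', '2009-11-01', '2009-12-01'], name='date'),
)
prophet_df = to_prophet_frame(sample, 'sales')
print(prophet_df)
```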
Now's a good time to take a look at your data. Plot it using pandas' plot() function:
In [8]:
df.set_index('ds').y.plot()
Out[8]:
We can see from this data that there is a spike in the same month each year. While this spike could be due to many different reasons, let's assume it's because there's a major promotion that this company runs every year at that time, which is in December for this dataset.
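One quick way to confirm which month spikes is to average the series by calendar month. This sketch uses made-up numbers rather than the retail data; with the real df you could group on df['ds'].dt.month in the same way.

```python
import pandas as pd

# Made-up monthly sales over two years, with a bump every December
dates = pd.date_range('2010-01-01', periods=24, freq='MS')
sales = [100 + (80 if d.month == 12 else 0) + i for i, d in enumerate(dates)]
frame = pd.DataFrame({'ds': dates, 'y': sales})

# Average value per calendar month; the largest entry is the peak month
monthly_mean = frame.groupby(frame['ds'].dt.month)['y'].mean()
peak_month = int(monthly_mean.idxmax())
print(peak_month)  # 12
```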
Because we know this promotion occurs every December, we want to use this knowledge to help prophet better forecast those months, so we'll use prophet's holiday construct (explained here https://facebookincubator.github.io/prophet/docs/holiday_effects.html).
The holiday construct is a pandas dataframe with the name of the holiday and the date of each occurrence, plus optional lower_window and upper_window columns. For this example, the construct would look like this:
promotions = pd.DataFrame({
    'holiday': 'december_promotion',
    'ds': pd.to_datetime(['2009-12-01', '2010-12-01', '2011-12-01', '2012-12-01',
                          '2013-12-01', '2014-12-01', '2015-12-01']),
    'lower_window': 0,
    'upper_window': 0,
})
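This promotions dataframe consists of promotion dates for December in 2009 through 2015. The lower_window and upper_window values are offsets in days around each date; setting both to zero means the holiday effect applies only on the listed dates themselves, not on any days before or after them. Prophet applies a holiday effect only on dates that appear in this dataframe, so any December inside the forecast period must be listed too if the forecast is to reflect the promotion.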
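The window semantics can be illustrated with a small helper. expand_holiday_windows is hypothetical (it is not part of Prophet) and simply lists the days each row covers under the documented meaning of lower_window and upper_window; it is a sketch, not Prophet's internal code.

```python
import pandas as pd

def expand_holiday_windows(holidays: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical illustration: list every day a holiday row covers,
    from ds + lower_window days to ds + upper_window days (both inclusive).
    The windows are measured in days; lower_window is usually <= 0."""
    rows = []
    for _, row in holidays.iterrows():
        for offset in range(int(row['lower_window']), int(row['upper_window']) + 1):
            rows.append({'holiday': row['holiday'],
                         'ds': row['ds'] + pd.Timedelta(days=offset)})
    return pd.DataFrame(rows)

example = pd.DataFrame({
    'holiday': 'december_promotion',
    'ds': pd.to_datetime(['2014-12-01']),
    'lower_window': [0],
    'upper_window': [0],
})
print(expand_holiday_windows(example))       # only 2014-12-01

example_wide = example.assign(lower_window=-1, upper_window=1)
print(expand_holiday_windows(example_wide))  # 2014-11-30 through 2014-12-02
```

With both windows at zero, only the listed date is covered; widening them to -1 and 1 adds the day before and the day after.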
In [9]:
promotions = pd.DataFrame({
    'holiday': 'december_promotion',
    'ds': pd.to_datetime(['2009-12-01', '2010-12-01', '2011-12-01', '2012-12-01',
                          '2013-12-01', '2014-12-01', '2015-12-01']),
    'lower_window': 0,
    'upper_window': 0,
})
In [10]:
promotions
Out[10]:
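Prophet's documentation asks for the holidays dataframe to include every occurrence of the holiday, both over the historical data and over the period being forecast. Rather than typing each date, the same frame can be generated from a range of years. A sketch, assuming the promotion always falls on December 1; the end year 2017 is an illustrative assumption and should be chosen so the list covers your forecast horizon.

```python
import pandas as pd

# Assumption: the promotion always lands on December 1.
# The year range is illustrative; extend it to cover history and forecast.
years = range(2009, 2018)
promotions_ext = pd.DataFrame({
    'holiday': 'december_promotion',
    'ds': pd.to_datetime([f'{year}-12-01' for year in years]),
    'lower_window': 0,
    'upper_window': 0,
})
print(promotions_ext)
```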
To continue, we log-transform the sales values (the forecasts are converted back to the original scale with np.exp later on):
In [11]:
df['y'] = np.log(df['y'])
In [12]:
df.tail()
Out[12]:
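A log transform helps here because a proportional change (for example, December running 50% above November) becomes a fixed additive offset, which suits Prophet's default additive model. A short sketch with made-up numbers:

```python
import numpy as np
import pandas as pd

# Made-up values where the third month is 1.5x the second
y = pd.Series([200.0, 210.0, 315.0, 220.0])
log_y = np.log(y)

# A constant multiplicative factor becomes a constant additive shift in log space
print(log_y[2] - log_y[1])  # equals log(315 / 210) = log(1.5)

# np.exp undoes np.log, which is how forecasts are converted back later
restored = np.exp(log_y)
print(np.allclose(restored, y))  # True
```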
Now, let's set prophet up to begin modeling our data, using our promotions dataframe as part of the forecast.
Note: Since we are using monthly data, you'll see a message from Prophet saying "Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this."
This is expected with monthly data. If you'd rather not see the message, pass weekly_seasonality=False explicitly when instantiating Prophet.
In [13]:
model = Prophet(holidays=promotions)
model.fit(df);
We've instantiated the model, now we need to build some future dates to forecast into.
In [14]:
# 'MS' (month start) keeps future dates on the 1st of each month, like the history
future = model.make_future_dataframe(periods=24, freq='MS')
future.tail()
Out[14]:
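Because the history is recorded on the first day of each month, a month-start frequency keeps future dates on that same day, so a future December row lands on December 1, the date used in the promotions dataframe. This pandas-only sketch roughly mirrors how a monthly future frame is extended from the last observed date; the dates are made up.

```python
import pandas as pd

# Made-up history on first-of-month dates, like the retail data
history = pd.Series(pd.date_range('2014-10-01', periods=12, freq='MS'))
last_date = history.max()

# Next N month starts after the last observed date ('MS' = month start)
periods = 3
future_dates = pd.date_range(start=last_date, periods=periods + 1, freq='MS')
future_dates = future_dates[future_dates > last_date][:periods]
print(list(future_dates.strftime('%Y-%m-%d')))
# ['2015-10-01', '2015-11-01', '2015-12-01']
```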
To forecast this future data, we need to run it through Prophet's model.
In [15]:
forecast = model.predict(future)
The resulting forecast dataframe contains quite a bit of data, but we really only care about a few columns. First, let's look at the full dataframe:
In [16]:
forecast.tail()
Out[16]:
We really only want to look at yhat, yhat_lower and yhat_upper (plus the ds dates), so let's select just those columns:
In [17]:
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
Out[17]:
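Because the model was fit on np.log(y), yhat, yhat_lower and yhat_upper are all in log units. A small sketch with made-up numbers showing how the three columns can be converted back to sales units together:

```python
import numpy as np
import pandas as pd

# Made-up forecast rows in log space (the model was fit on np.log(sales))
forecast_log = pd.DataFrame({
    'ds': pd.to_datetime(['2016-11-01', '2016-12-01']),
    'yhat': [np.log(400.0), np.log(600.0)],
    'yhat_lower': [np.log(350.0), np.log(520.0)],
    'yhat_upper': [np.log(450.0), np.log(690.0)],
})

cols = ['yhat', 'yhat_lower', 'yhat_upper']
forecast_sales = forecast_log.copy()
# np.exp is monotonically increasing, so lower <= yhat <= upper still holds
forecast_sales[cols] = np.exp(forecast_sales[cols])
print(forecast_sales.round(2))
```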
In [18]:
model.plot(forecast);
Personally, I'm not a fan of this visualization, but I won't build my own here; you can see how I do that here: https://github.com/urgedata/pythondata/blob/master/fbprophet/fbprophet_part_one.ipynb.
Additionally, prophet lets us take a look at the components of our model, including the holidays. This component plot is an important one because it breaks the forecast into its parts: the trend, the holiday effect, and the seasonality (shown in the yearly pane).
In [19]:
model.plot_components(forecast);
In [20]:
model_no_holiday = Prophet()
model_no_holiday.fit(df);
In [21]:
future_no_holiday = model_no_holiday.make_future_dataframe(periods=24, freq='MS')
future_no_holiday.tail()
Out[21]:
In [22]:
forecast_no_holiday = model_no_holiday.predict(future_no_holiday)
Let's compare the two forecasts now. Note: I doubt there will be much difference between these models due to the small amount of data, but it's a good example of the process. We'll set the indexes and then join the forecast dataframes into a new dataframe called 'compared_df'.
In [23]:
forecast.set_index('ds', inplace=True)
forecast_no_holiday.set_index('ds', inplace=True)
compared_df = forecast.join(forecast_no_holiday, rsuffix="_no_holiday")
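Since both forecast frames have identical column names, rsuffix is what keeps them apart after the join: the left frame's columns keep their names and the overlapping right-hand columns get the suffix. A tiny sketch with made-up values:

```python
import pandas as pd

# Two made-up forecast frames indexed by date, sharing a 'yhat' column
idx = pd.to_datetime(['2016-11-01', '2016-12-01'])
with_holiday = pd.DataFrame({'yhat': [5.99, 6.40]}, index=idx)
without_holiday = pd.DataFrame({'yhat': [5.99, 6.35]}, index=idx)

# join aligns rows on the index; rsuffix renames the overlapping right-hand column
joined = with_holiday.join(without_holiday, rsuffix='_no_holiday')
print(joined.columns.tolist())  # ['yhat', 'yhat_no_holiday']
```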
We are only really interested in the yhat values, so let's remove all the rest and convert the logged values back to their original scale.
In [24]:
compared_df= np.exp(compared_df[['yhat', 'yhat_no_holiday']])
Now, let's compute the percentage difference between the model with holidays and the one without at each date, and then the average of that difference.
In [25]:
compared_df['diff_per'] = 100*(compared_df['yhat'] - compared_df['yhat_no_holiday']) / compared_df['yhat_no_holiday']
compared_df.tail()
Out[25]:
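As a worked example of the formula with made-up numbers: if the holiday model predicts 630 for a December and the no-holiday model predicts 600, the difference is 100 × (630 − 600) / 600 = 5%.

```python
import pandas as pd

# Made-up back-transformed forecasts for two dates
toy = pd.DataFrame({'yhat': [400.0, 630.0], 'yhat_no_holiday': [400.0, 600.0]},
                   index=pd.to_datetime(['2016-11-01', '2016-12-01']))

# Same formula as above: difference relative to the no-holiday forecast, in percent
toy['diff_per'] = 100 * (toy['yhat'] - toy['yhat_no_holiday']) / toy['yhat_no_holiday']
print(toy['diff_per'].tolist())  # [0.0, 5.0]
```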
In [26]:
compared_df['diff_per'].mean()
Out[26]:
This isn't an enormous difference (<1%), but there is some difference between using holidays and not using holidays.
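Because diff_per is signed, a plain mean can hide differences that offset each other, and the promotion only applies to December rows. Two other summaries that may be informative are the mean absolute difference and the mean over December rows only, sketched here with made-up numbers:

```python
import pandas as pd

# Made-up percentage differences: positive in December, small and negative elsewhere
diff = pd.Series([-1.0, -1.0, 6.0, -1.0],
                 index=pd.to_datetime(['2016-10-01', '2016-11-01',
                                       '2016-12-01', '2017-01-01']))

signed_mean = diff.mean()                            # positives and negatives can cancel
abs_mean = diff.abs().mean()                         # average size of the gap
december_mean = diff[diff.index.month == 12].mean()  # rows where the promotion applies
print(signed_mean, abs_mean, december_mean)          # 0.75 2.25 6.0
```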
If you know there are holidays or events happening that might help/hurt your forecasting efforts, prophet allows you to easily incorporate them into your modeling.