This notebook covers the use of 'holidays' in the Prophet forecasting library. In this notebook, we will extend the previous example (http://pythondata.com/forecasting-time-series-data-prophet-jupyter-notebook/) to use holidays in the forecasting.

Import necessary libraries


In [1]:
import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt
 
%matplotlib inline
 
plt.rcParams['figure.figsize']=(20,10)
plt.style.use('ggplot')

Read in the data

Read the data from the retail sales CSV file in the examples folder and set the index to the 'date' column, parsing the dates as the file is read.


In [2]:
sales_df = pd.read_csv('../examples/retail_sales.csv', index_col='date', parse_dates=True)

In [3]:
sales_df.head()


Out[3]:
sales
date
2009-10-01 338630
2009-11-01 339386
2009-12-01 400264
2010-01-01 314640
2010-02-01 311022
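If you don't have the CSV file at hand, the same read_csv call can be tried on an in-memory copy of the rows shown above. This is only a sketch: the header line 'date,sales' is inferred from the index_col argument and the column name in the output, and the real file contains many more rows.

```python
import io
import pandas as pd

# First five rows copied from the output above; the header is an assumption.
csv_text = """date,sales
2009-10-01,338630
2009-11-01,339386
2009-12-01,400264
2010-01-01,314640
2010-02-01,311022
"""

sample_df = pd.read_csv(io.StringIO(csv_text), index_col='date', parse_dates=True)

# parse_dates=True should turn the index into a DatetimeIndex rather than strings.
print(type(sample_df.index).__name__)
print(sample_df.head())
```

Checking the index type right after loading is a cheap way to catch dates that were silently read as strings.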

Prepare for Prophet

As explained in previous Prophet posts, Prophet expects the date column to be named 'ds' and the value column to be named 'y', so we need to rename our columns.


In [4]:
df = sales_df.reset_index()

In [5]:
df.head()


Out[5]:
date sales
0 2009-10-01 338630
1 2009-11-01 339386
2 2009-12-01 400264
3 2010-01-01 314640
4 2010-02-01 311022

Let's rename the columns as required by fbprophet. Additionally, fbprophet doesn't like the index to be a datetime. It wants to see 'ds' as a non-index column, so we'll keep the default integer index.


In [6]:
df=df.rename(columns={'date':'ds', 'sales':'y'})

In [7]:
df.head()


Out[7]:
ds y
0 2009-10-01 338630
1 2009-11-01 339386
2 2009-12-01 400264
3 2010-01-01 314640
4 2010-02-01 311022
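The reset and rename steps can also be chained. The sketch below rebuilds a three-row stand-in for sales_df (the name sales_sample is made up for illustration) and checks that the result has the layout described above: an integer index with 'ds' and 'y' as ordinary columns.

```python
import pandas as pd

# Small stand-in for sales_df, using the first three rows shown earlier.
sales_sample = pd.DataFrame(
    {'sales': [338630, 339386, 400264]},
    index=pd.to_datetime(['2009-10-01', '2009-11-01', '2009-12-01']),
)
sales_sample.index.name = 'date'

# Move the date index back into a column, then rename both columns in one chain.
prophet_ready = sales_sample.reset_index().rename(columns={'date': 'ds', 'sales': 'y'})

print(prophet_ready.columns.tolist())
print(prophet_ready.dtypes)
```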

Now's a good time to take a look at your data. Plot the data using pandas' plot function.


In [8]:
df.set_index('ds').y.plot()


Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x7fa42a82ef90>

Reviewing the Data

We can see from this data that there is a spike in the same month each year. While this spike could be due to many different reasons, let's assume it's because there's a major promotion that this company runs every year at that time, which is in December for this dataset.
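The plot makes the spike easy to see, but it can also be checked numerically by averaging sales per calendar month. A rough sketch on just the five rows displayed earlier (so the averages here are illustrative, not the full-dataset figures):

```python
import pandas as pd

# The five rows shown in the df.head() output above.
sample = pd.DataFrame({
    'ds': pd.to_datetime(['2009-10-01', '2009-11-01', '2009-12-01',
                          '2010-01-01', '2010-02-01']),
    'y': [338630, 339386, 400264, 314640, 311022],
})

# Average sales per calendar month (1 = January, ..., 12 = December).
monthly_mean = sample.groupby(sample['ds'].dt.month)['y'].mean()
peak_month = int(monthly_mean.idxmax())

print(monthly_mean)
print('Month with the highest average sales:', peak_month)
```

Running the same grouping on the full dataframe would show whether December stands out in every year or only on average.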

Because we know this promotion occurs every December, we want to use this knowledge to help Prophet better forecast those months, so we'll use Prophet's holiday construct (explained here https://facebookincubator.github.io/prophet/docs/holiday_effects.html).

The holiday construct is a pandas dataframe with the holiday and date of the holiday. For this example, the construct would look like this:

promotions = pd.DataFrame({
  'holiday': 'december_promotion',
  'ds': pd.to_datetime(['2009-12-01', '2010-12-01', '2011-12-01', '2012-12-01',
                        '2013-12-01', '2014-12-01', '2015-12-01']),
  'lower_window': 0,
  'upper_window': 0,
})

This promotions dataframe consists of promotion dates for December in 2009 through 2015. The lower_window and upper_window values, which extend a holiday's effect to the days before and after its date, are set to zero to indicate that we don't want Prophet to consider any dates other than the ones listed.


In [9]:
promotions = pd.DataFrame({
  'holiday': 'december_promotion',
  'ds': pd.to_datetime(['2009-12-01', '2010-12-01', '2011-12-01', '2012-12-01',
                        '2013-12-01', '2014-12-01', '2015-12-01']),
  'lower_window': 0,
  'upper_window': 0,
})

In [10]:
promotions


Out[10]:
ds holiday lower_window upper_window
0 2009-12-01 december_promotion 0 0
1 2010-12-01 december_promotion 0 0
2 2011-12-01 december_promotion 0 0
3 2012-12-01 december_promotion 0 0
4 2013-12-01 december_promotion 0 0
5 2014-12-01 december_promotion 0 0
6 2015-12-01 december_promotion 0 0
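Typing the dates by hand works for seven rows, but the same frame can be generated from a year range. This is just a sketch; the name promotions_generated is made up here so it doesn't clash with the promotions dataframe above.

```python
import pandas as pd

# One promotion date per year, on the first of December, 2009 through 2015.
promo_dates = pd.to_datetime([f'{year}-12-01' for year in range(2009, 2016)])

promotions_generated = pd.DataFrame({
    'holiday': 'december_promotion',  # scalar values are repeated for every row
    'ds': promo_dates,
    'lower_window': 0,
    'upper_window': 0,
})

print(promotions_generated)
```

Extending the range is then a one-character change if you want the promotion to cover later Decembers as well.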

To continue, we need to log-transform our data:


In [11]:
df['y'] = np.log(df['y'])

In [12]:
df.tail()


Out[12]:
ds y
67 2015-05-01 13.044650
68 2015-06-01 13.013060
69 2015-07-01 13.033991
70 2015-08-01 13.030993
71 2015-09-01 12.973671
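The forecasts will come back on this log scale, and np.exp is what undoes the transform. A quick round-trip check on the first three sales figures shown earlier (a sketch only, not part of the modeling):

```python
import numpy as np
import pandas as pd

# First three raw sales values from the data shown earlier.
raw = pd.Series([338630, 339386, 400264], name='y')

logged = np.log(raw)       # what the model will be fit on
restored = np.exp(logged)  # back to the original sales scale

print(logged)
print(restored)
```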

Running Prophet

Now, let's set Prophet up to model our data, using our promotions dataframe as part of the forecast.

Note: Since we are using monthly data, you'll see a message from Prophet saying "Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this." This is OK since we are working with monthly data. Passing weekly_seasonality=True when instantiating Prophet would turn weekly seasonality on, which makes little sense for monthly observations; setting weekly_seasonality=False explicitly should silence the message instead.


In [13]:
model = Prophet(holidays=promotions)
model.fit(df);


Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.

We've instantiated and fit the model; now we need to build some future dates to forecast into.


In [14]:
future = model.make_future_dataframe(periods=24, freq = 'm')
future.tail()


Out[14]:
ds
91 2017-04-30
92 2017-05-31
93 2017-06-30
94 2017-07-31
95 2017-08-31
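One thing worth checking: the future dates above fall on the last day of each month, while the history and the promotions dataframe use the first day of the month. With both windows set to zero, a promotion dated 2015-12-01 would presumably only line up with a row dated exactly 2015-12-01. The sketch below uses pandas alone (no Prophet) to compare the two calendars. Using pd.offsets.MonthEnd and pd.offsets.MonthBegin as stand-ins for freq='m' and a month-start frequency is my assumption about how one might align the dates; the original notebook does not do this.

```python
import pandas as pd

promo = pd.Timestamp('2015-12-01')  # last date in the promotions dataframe

# Month-end calendar, matching the future dates shown above.
month_end = pd.date_range(start='2015-09-30', periods=24, freq=pd.offsets.MonthEnd())

# Month-start calendar, matching how the historical data is dated.
month_start = pd.date_range(start='2015-10-01', periods=24, freq=pd.offsets.MonthBegin())

print('last month-end date:', month_end[-1].date())
print('promotion date on month-end calendar:', promo in month_end)
print('promotion date on month-start calendar:', promo in month_start)
```

If the promotion date is missing from the future calendar, that would be one possible reason for a holiday column of zeros in the forecast rows.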

To forecast this future data, we need to run it through Prophet's model.


In [15]:
forecast = model.predict(future)

The resulting forecast dataframe contains quite a bit of data, but we really only care about a few columns. First, let's look at the full dataframe:


In [16]:
forecast.tail()


Out[16]:
ds t trend seasonal_lower seasonal_upper trend_lower trend_upper yhat_lower yhat_upper december_promotion december_promotion_lower december_promotion_upper yearly yearly_lower yearly_upper seasonal yhat
91 2017-04-30 1.280888 13.045281 0.017746 0.017746 12.921667 13.164843 12.939614 13.184197 0.0 0.0 0.0 0.017746 0.017746 0.017746 0.017746 13.063027
92 2017-05-31 1.295234 13.046923 0.008715 0.008715 12.917939 13.175036 12.923197 13.182217 0.0 0.0 0.0 0.008715 0.008715 0.008715 0.008715 13.055638
93 2017-06-30 1.309116 13.048513 0.026016 0.026016 12.909993 13.184642 12.932721 13.211859 0.0 0.0 0.0 0.026016 0.026016 0.026016 0.026016 13.074529
94 2017-07-31 1.323461 13.050155 0.000594 0.000594 12.902636 13.196556 12.897856 13.198499 0.0 0.0 0.0 0.000594 0.000594 0.000594 0.000594 13.050750
95 2017-08-31 1.337807 13.051798 -0.027989 -0.027989 12.891882 13.209361 12.855695 13.179740 0.0 0.0 0.0 -0.027989 -0.027989 -0.027989 -0.027989 13.023809

We really only want to look at yhat, yhat_lower and yhat_upper, so we can do that with:


In [17]:
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()


Out[17]:
ds yhat yhat_lower yhat_upper
91 2017-04-30 13.063027 12.939614 13.184197
92 2017-05-31 13.055638 12.923197 13.182217
93 2017-06-30 13.074529 12.932721 13.211859
94 2017-07-31 13.050750 12.897856 13.198499
95 2017-08-31 13.023809 12.855695 13.179740
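These values are still on the log scale. To read them as sales figures, np.exp can be applied to the three value columns. A small sketch using the last two rows printed above (the name forecast_tail is made up for this example):

```python
import numpy as np
import pandas as pd

# Last two rows of the forecast output above, still in log units.
forecast_tail = pd.DataFrame({
    'ds': pd.to_datetime(['2017-07-31', '2017-08-31']),
    'yhat': [13.050750, 13.023809],
    'yhat_lower': [12.897856, 12.855695],
    'yhat_upper': [13.198499, 13.179740],
})

value_cols = ['yhat', 'yhat_lower', 'yhat_upper']

# Exponentiate only the numeric forecast columns; leave the dates untouched.
original_scale = forecast_tail.copy()
original_scale[value_cols] = np.exp(forecast_tail[value_cols])

print(original_scale.round(0))
```

Because exp is monotonic, the lower and upper bounds stay on the correct side of yhat after the conversion.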

Plotting Prophet results

Prophet has a plotting mechanism called plot. This plot functionality draws the original data (black dots), the model (blue line) and the uncertainty interval of the forecast (shaded blue area).


In [18]:
model.plot(forecast);


Personally, I'm not a fan of this visualization, but I'm not going to build my own here. You can see how I do that at https://github.com/urgedata/pythondata/blob/master/fbprophet/fbprophet_part_one.ipynb.

Additionally, Prophet lets us take a look at the components of our model, including the holidays. This component plot is important because it shows the pieces that make up the forecast, including the trend and the seasonality (identified in the yearly pane).


In [19]:
model.plot_components(forecast);


Comparing holidays vs no-holidays forecasts

Let's re-run our prophet model without our promotions/holidays for comparison.


In [20]:
model_no_holiday = Prophet()
model_no_holiday.fit(df);


Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.

In [21]:
future_no_holiday = model_no_holiday.make_future_dataframe(periods=24, freq = 'm')
future_no_holiday.tail()


Out[21]:
ds
91 2017-04-30
92 2017-05-31
93 2017-06-30
94 2017-07-31
95 2017-08-31

In [22]:
forecast_no_holiday = model_no_holiday.predict(future_no_holiday)

Let's compare the two forecasts now. Note: I doubt there will be much difference in these models due to the small amount of data, but it's a good example to see the process. We'll set the indexes and then join the forecast dataframes into a new dataframe called 'compared_df'.


In [23]:
forecast.set_index('ds', inplace=True)
forecast_no_holiday.set_index('ds', inplace=True)
compared_df = forecast.join(forecast_no_holiday, rsuffix="_no_holiday")
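Both forecast dataframes share the same column names, so the rsuffix argument is what keeps them apart after the join. A toy sketch with a single yhat column per frame; the numbers are borrowed from the comparison output further down purely to have something to join, and the frame names are made up.

```python
import pandas as pd

idx = pd.to_datetime(['2017-07-31', '2017-08-31'])
idx.name = 'ds'

# Two frames with an identical column name, indexed by date.
with_holiday = pd.DataFrame({'yhat': [465445.198797, 453073.129385]}, index=idx)
without_holiday = pd.DataFrame({'yhat': [465480.406592, 453067.919980]}, index=idx)

# Overlapping columns from the right-hand frame get the suffix appended.
joined = with_holiday.join(without_holiday, rsuffix='_no_holiday')

print(joined.columns.tolist())
print(joined)
```

The join aligns rows on the index, which is why both frames are indexed by 'ds' first.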

We are only really interested in the yhat values, so let's remove all the rest and convert the logged values back to their original scale.


In [24]:
compared_df= np.exp(compared_df[['yhat', 'yhat_no_holiday']])

Now, let's take the percentage difference and the average difference for the model with holidays vs that without.


In [25]:
compared_df['diff_per'] = 100*(compared_df['yhat'] - compared_df['yhat_no_holiday']) / compared_df['yhat_no_holiday']
compared_df.tail()


Out[25]:
yhat yhat_no_holiday diff_per
ds
2017-04-30 471194.800038 471459.411475 -0.056126
2017-05-31 467726.135806 468116.264965 -0.083340
2017-06-30 476645.649992 476858.563032 -0.044649
2017-07-31 465445.198797 465480.406592 -0.007564
2017-08-31 453073.129385 453067.919980 0.001150

In [26]:
compared_df['diff_per'].mean()


Out[26]:
0.065771316509064426
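A plain mean of signed percentage differences lets positive and negative months cancel each other out, so it can be worth looking at the mean absolute difference next to it. The sketch below recomputes diff_per for the five rows shown above and compares the two summaries; it covers only those five rows, so the numbers will not match the full-frame mean.

```python
import pandas as pd

# The five rows from the comparison output above (original sales scale).
compared_tail = pd.DataFrame({
    'yhat': [471194.800038, 467726.135806, 476645.649992,
             465445.198797, 453073.129385],
    'yhat_no_holiday': [471459.411475, 468116.264965, 476858.563032,
                        465480.406592, 453067.919980],
}, index=pd.to_datetime(['2017-04-30', '2017-05-31', '2017-06-30',
                         '2017-07-31', '2017-08-31']))

diff_per = 100 * (compared_tail['yhat'] - compared_tail['yhat_no_holiday']) \
    / compared_tail['yhat_no_holiday']

signed_mean = diff_per.mean()          # positive and negative months offset each other
absolute_mean = diff_per.abs().mean()  # average size of the gap, ignoring direction

print(diff_per.round(6))
print('signed mean:', signed_mean)
print('mean absolute difference:', absolute_mean)
```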

This isn't an enormous difference (<1%), but there is some difference between using holidays and not using holidays.

If you know there are holidays or events happening that might help/hurt your forecasting efforts, prophet allows you to easily incorporate them into your modeling.

