This notebook covers the use of 'holidays' in the Prophet forecasting library. In this notebook, we will extend the previous example (http://pythondata.com/forecasting-time-series-data-prophet-jupyter-notebook/) to use holidays in the forecasting.

Import necessary libraries



In [1]:

    
import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt
 
%matplotlib inline
 
plt.rcParams['figure.figsize']=(20,10)
plt.style.use('ggplot')

Read in the data

Read the data in from the retail sales CSV file in the examples folder then set the index to the 'date' column. We are also parsing dates in the data file.



In [2]:

    
sales_df = pd.read_csv('../examples/retail_sales.csv', index_col='date', parse_dates=True)



In [3]:

    
sales_df.head()

Prepare for Prophet

As explained in previous prophet posts, for prophet to work, we need to change the names of these columns to 'ds' and 'y'.



In [4]:

    
df = sales_df.reset_index()



In [5]:

    
df.head()

Let's rename the columns as required by fbprophet. Additionally, fbprophet doesn't like the index to be a datetime; it wants to see 'ds' as a non-index column, so we'll keep the default integer index rather than setting a different one.



In [6]:

    
df=df.rename(columns={'date':'ds', 'sales':'y'})
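As a minimal sketch of the reshaping above, the snippet below rebuilds the first three rows of the retail sales data by hand (the values are copied from the sample output in this notebook; the names `sample` and `prophet_ready` are only for illustration) and checks that the result has the two columns Prophet expects, with 'ds' as a regular column rather than the index.

```python
import pandas as pd

# Hand-built stand-in for the first rows of retail_sales.csv (illustrative only)
sample = pd.DataFrame(
    {'sales': [338630, 339386, 400264]},
    index=pd.to_datetime(['2009-10-01', '2009-11-01', '2009-12-01']),
)
sample.index.name = 'date'

# Move the datetime index into a column, then rename to Prophet's names
prophet_ready = sample.reset_index().rename(columns={'date': 'ds', 'sales': 'y'})

print(prophet_ready.columns.tolist())   # column names after renaming
print(prophet_ready['ds'].dtype)        # 'ds' should still be a datetime column
```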



In [7]:

    
df.head()

Now's a good time to take a look at your data. Plot the data using pandas' plot function.



In [8]:

    
df.set_index('ds').y.plot()









    Out[8]:





<matplotlib.axes._subplots.AxesSubplot at 0x7fa42a82ef90>

Reviewing the Data

We can see from this data that there is a spike in the same month each year. While this spike could be due to many different reasons, let's assume it's because there's a major promotion that this company runs every year at that time, which is in December for this dataset.

Because we know this promotion occurs every December, we want to use this knowledge to help Prophet better forecast those months, so we'll use Prophet's holiday construct (explained here https://facebookincubator.github.io/prophet/docs/holiday_effects.html).

The holiday construct is a pandas dataframe with the holiday and date of the holiday. For this example, the construct would look like this:

promotions = pd.DataFrame({ 'holiday': 'december_promotion', 'ds': pd.to_datetime(['2009-12-01', '2010-12-01', '2011-12-01', '2012-12-01', '2013-12-01', '2014-12-01', '2015-12-01']), 'lower_window': 0, 'upper_window': 0, })

This promotions dataframe consists of promotion dates for December in 2009 through 2015. The lower_window and upper_window values are set to zero to indicate that the effect applies only to the listed dates and should not extend to the days before or after them.



In [9]:

    
promotions = pd.DataFrame({
  'holiday': 'december_promotion',
  'ds': pd.to_datetime(['2009-12-01', '2010-12-01', '2011-12-01', '2012-12-01',
                        '2013-12-01', '2014-12-01', '2015-12-01']),
  'lower_window': 0,
  'upper_window': 0,
})



In [10]:

    
promotions









    Out[10]:







  
    
      
	ds	holiday	lower_window	upper_window
0	2009-12-01	december_promotion	0	0
1	2010-12-01	december_promotion	0	0
2	2011-12-01	december_promotion	0	0
3	2012-12-01	december_promotion	0	0
4	2013-12-01	december_promotion	0	0
5	2014-12-01	december_promotion	0	0
6	2015-12-01	december_promotion	0	0
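If you'd rather not type the promotion dates by hand, something like the following sketch could generate the same kind of dataframe. It assumes a monthly series of dates (built here with `pd.date_range` as a stand-in for the real 'ds' column, and extended to December 2015 so the upcoming promotion is included); the names `ds` and `generated` are hypothetical.

```python
import pandas as pd

# Stand-in for the monthly 'ds' dates, extended to cover the December 2015 promotion
ds = pd.Series(pd.date_range('2009-10-01', '2015-12-01', freq='MS'))

# Keep only the December dates
december_dates = ds[ds.dt.month == 12].reset_index(drop=True)

# Same four columns as the hand-written promotions dataframe
generated = pd.DataFrame({
    'holiday': 'december_promotion',
    'ds': december_dates,
    'lower_window': 0,
    'upper_window': 0,
})

print(generated)
```

This is only worth doing when the event really does fall on a predictable date; for irregular promotions, listing the dates explicitly as in the notebook is clearer.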

To continue, we need to log-transform our data:



In [11]:

    
df['y'] = np.log(df['y'])
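A quick sketch of what the log transform does, and how to undo it later, using three sales values from the sample data (the variable names are just for illustration). Since np.exp is the inverse of np.log, the forecast can be converted back to the original scale once modeling is done.

```python
import numpy as np
import pandas as pd

# Three sales values taken from the head of the dataset
y = pd.Series([338630, 339386, 400264], dtype='float64')

y_log = np.log(y)        # natural log, as applied to df['y']
y_back = np.exp(y_log)   # inverse transform back to the original scale

print(y_log.round(6).tolist())
print(np.allclose(y_back, y))
```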



In [12]:

    
df.tail()

Running Prophet

Now, let's set Prophet up to begin modeling our data using our promotions dataframe as part of the forecast.

Note: Since we are using monthly data, you'll see a message from Prophet saying Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. This is OK since we are working with monthly data; you would only pass weekly_seasonality=True in the instantiation of Prophet if you wanted to force weekly seasonality back on.



In [13]:

    
model = Prophet(holidays=promotions)
model.fit(df);









    



Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.

We've instantiated and fit the model; now we need to build some future dates to forecast into.



In [14]:

    
future = model.make_future_dataframe(periods=24, freq = 'm')
future.tail()
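One thing worth checking here: the input dates fall on the first of each month, while 'm' is pandas' month-end alias, which is why the forecast dates in this notebook look like 2017-04-30. The sketch below uses plain pandas offsets (not Prophet itself) to show the difference. As far as I understand, with zero-width windows a promotion dated 2015-12-01 would only line up with a future row that falls on that exact date, so a month-start frequency may be the better fit. Treat this as something to verify against your own data rather than a definitive fix.

```python
import pandas as pd

last_training_date = pd.Timestamp('2015-09-01')

# Month-end dates, which is what the 'm' alias produces
month_ends = pd.date_range(start=last_training_date, periods=4,
                           freq=pd.offsets.MonthEnd())

# Month-start dates, matching how the input data is dated
month_starts = pd.date_range(start=last_training_date, periods=4,
                             freq=pd.offsets.MonthBegin())

promotion_date = pd.Timestamp('2015-12-01')
print(month_ends.tolist())
print(month_starts.tolist())
print(promotion_date in month_ends, promotion_date in month_starts)
```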

To forecast this future data, we need to run it through Prophet's model.



In [15]:

    
forecast = model.predict(future)

The resulting forecast dataframe contains quite a bit of data, but we really only care about a few columns. First, let's look at the full dataframe:



In [16]:

    
forecast.tail()









    Out[16]:







  
    
      
	ds	t	trend	seasonal_lower	seasonal_upper	trend_lower	trend_upper	yhat_lower	yhat_upper	december_promotion	december_promotion_lower	december_promotion_upper	yearly	yearly_lower	yearly_upper	seasonal	yhat
91	2017-04-30	1.280888	13.045281	0.017746	0.017746	12.921667	13.164843	12.939614	13.184197	0.0	0.0	0.0	0.017746	0.017746	0.017746	0.017746	13.063027
92	2017-05-31	1.295234	13.046923	0.008715	0.008715	12.917939	13.175036	12.923197	13.182217	0.0	0.0	0.0	0.008715	0.008715	0.008715	0.008715	13.055638
93	2017-06-30	1.309116	13.048513	0.026016	0.026016	12.909993	13.184642	12.932721	13.211859	0.0	0.0	0.0	0.026016	0.026016	0.026016	0.026016	13.074529
94	2017-07-31	1.323461	13.050155	0.000594	0.000594	12.902636	13.196556	12.897856	13.198499	0.0	0.0	0.0	0.000594	0.000594	0.000594	0.000594	13.050750
95	2017-08-31	1.337807	13.051798	-0.027989	-0.027989	12.891882	13.209361	12.855695	13.179740	0.0	0.0	0.0	-0.027989	-0.027989	-0.027989	-0.027989	13.023809

We really only want to look at yhat, yhat_lower and yhat_upper, so we can do that with:



In [17]:

    
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
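Keep in mind that these values are still on the log scale. The following sketch, which uses two rows copied from the forecast output above in a hand-built dataframe (`forecast_sample` is a hypothetical name), shows one way to view the three columns in the original sales units.

```python
import numpy as np
import pandas as pd

# Two rows copied from the forecast tail (log scale)
forecast_sample = pd.DataFrame({
    'ds': pd.to_datetime(['2017-04-30', '2017-08-31']),
    'yhat': [13.063027, 13.023809],
    'yhat_lower': [12.939614, 12.855695],
    'yhat_upper': [13.184197, 13.179740],
})

cols = ['yhat', 'yhat_lower', 'yhat_upper']

# Exponentiate only the numeric forecast columns, keeping 'ds' untouched
original_scale = forecast_sample[['ds']].join(np.exp(forecast_sample[cols]))

print(original_scale.round(0))
```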

Plotting Prophet results

Prophet has a plotting mechanism called plot. This plot functionality draws the original data (black dots), the model (blue line) and the uncertainty interval of the forecast (shaded blue area).



In [18]:

    
model.plot(forecast);

Personally, I'm not a fan of this visualization, but I'm not going to build my own in this notebook. You can see how I do that here: https://github.com/urgedata/pythondata/blob/master/fbprophet/fbprophet_part_one.ipynb

Additionally, Prophet lets us take a look at the components of our model, including the holidays. This component plot is important because it lets you see the pieces that make up the forecast, including the trend and the seasonality (identified in the yearly pane).



In [19]:

    
model.plot_components(forecast);

Comparing holidays vs no-holidays forecasts

Let's re-run our prophet model without our promotions/holidays for comparison.



In [20]:

    
model_no_holiday = Prophet()
model_no_holiday.fit(df);









    



Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.



In [21]:

    
future_no_holiday = model_no_holiday.make_future_dataframe(periods=24, freq = 'm')
future_no_holiday.tail()



In [22]:

    
forecast_no_holiday = model_no_holiday.predict(future_no_holiday)

Let's compare the two forecasts now. Note: I doubt there will be much difference in these models due to the small amount of data, but it's a good example to see the process. We'll set the indexes and then join the forecast dataframes into a new dataframe called 'compared_df'.



In [23]:

    
forecast.set_index('ds', inplace=True)
forecast_no_holiday.set_index('ds', inplace=True)
compared_df = forecast.join(forecast_no_holiday, rsuffix="_no_holiday")
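To see what rsuffix does in the join above, here is a small sketch with two toy dataframes that share a 'yhat' column (the numbers are made up purely for illustration). The left frame keeps its column names, and the overlapping columns from the right frame get the suffix appended.

```python
import pandas as pd

idx = pd.to_datetime(['2017-07-31', '2017-08-31'])

# Toy values only; these are not real forecast outputs
with_holiday = pd.DataFrame({'yhat': [1.0, 2.0]}, index=idx)
without_holiday = pd.DataFrame({'yhat': [1.5, 2.5]}, index=idx)

# Overlapping column names from the right-hand frame receive the suffix
joined = with_holiday.join(without_holiday, rsuffix='_no_holiday')

print(joined.columns.tolist())
```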

We are only really interested in the yhat values, so let's remove all the rest and convert the logged values back to their original scale.



In [24]:

    
compared_df= np.exp(compared_df[['yhat', 'yhat_no_holiday']])

Now, let's take the percentage difference and the average difference for the model with holidays vs that without.



In [25]:

    
compared_df['diff_per'] = 100*(compared_df['yhat'] - compared_df['yhat_no_holiday']) / compared_df['yhat_no_holiday']
compared_df.tail()









    Out[25]:







  
    
      
	yhat	yhat_no_holiday	diff_per
ds
2017-04-30	471194.800038	471459.411475	-0.056126
2017-05-31	467726.135806	468116.264965	-0.083340
2017-06-30	476645.649992	476858.563032	-0.044649
2017-07-31	465445.198797	465480.406592	-0.007564
2017-08-31	453073.129385	453067.919980	0.001150
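As a sanity check on the diff_per formula, the arithmetic for the 2017-04-30 row of the table above can be reproduced by hand. The two inputs are copied from that row; nothing else is assumed.

```python
# Values from the 2017-04-30 row of the comparison table
yhat = 471194.800038
yhat_no_holiday = 471459.411475

# Percentage difference of the holiday forecast relative to the no-holiday forecast
diff_per = 100 * (yhat - yhat_no_holiday) / yhat_no_holiday

print(round(diff_per, 6))  # -0.056126
```

A negative value means the model with holidays forecasts slightly lower sales for that month than the model without.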



In [26]:

    
compared_df['diff_per'].mean()









    Out[26]:





0.065771316509064426

This isn't an enormous difference (<1%), but there is some difference between using holidays and not using holidays.

If you know there are holidays or events happening that might help/hurt your forecasting efforts, prophet allows you to easily incorporate them into your modeling.




	sales
date
2009-10-01	338630
2009-11-01	339386
2009-12-01	400264
2010-01-01	314640
2010-02-01	311022

	date	sales
0	2009-10-01	338630
1	2009-11-01	339386
2	2009-12-01	400264
3	2010-01-01	314640
4	2010-02-01	311022

	ds	y
0	2009-10-01	338630
1	2009-11-01	339386
2	2009-12-01	400264
3	2010-01-01	314640
4	2010-02-01	311022

	ds	y
67	2015-05-01	13.044650
68	2015-06-01	13.013060
69	2015-07-01	13.033991
70	2015-08-01	13.030993
71	2015-09-01	12.973671

	ds	holiday
0	2009-12-01	december_promotion
1	2010-12-01	december_promotion
2	2011-12-01	december_promotion
3	2012-12-01	december_promotion
4	2013-12-01	december_promotion
5	2014-12-01	december_promotion
6	2015-12-01	december_promotion

	ds
91	2017-04-30
92	2017-05-31
93	2017-06-30
94	2017-07-31
95	2017-08-31

	ds	t	trend	seasonal_lower	seasonal_upper	trend_lower	trend_upper	yhat_lower	yhat_upper	yearly	yearly_lower	yearly_upper	seasonal	yhat
91	2017-04-30	1.280888	13.045281	0.017746	0.017746	12.921667	13.164843	12.939614	13.184197	0.017746	0.017746	0.017746	0.017746	13.063027
92	2017-05-31	1.295234	13.046923	0.008715	0.008715	12.917939	13.175036	12.923197	13.182217	0.008715	0.008715	0.008715	0.008715	13.055638
93	2017-06-30	1.309116	13.048513	0.026016	0.026016	12.909993	13.184642	12.932721	13.211859	0.026016	0.026016	0.026016	0.026016	13.074529
94	2017-07-31	1.323461	13.050155	0.000594	0.000594	12.902636	13.196556	12.897856	13.198499	0.000594	0.000594	0.000594	0.000594	13.050750
95	2017-08-31	1.337807	13.051798	-0.027989	-0.027989	12.891882	13.209361	12.855695	13.179740	-0.027989	-0.027989	-0.027989	-0.027989	13.023809
