Import necessary libraries



In [1]:

    
import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt
 
%matplotlib inline
 
plt.rcParams['figure.figsize']=(20,10)
plt.style.use('ggplot')

Read in the data

Read the data in from the retail sales CSV file in the examples folder then set the index to the 'date' column. We are also parsing dates in the data file.



In [2]:

    
sales_df = pd.read_csv('../examples/retail_sales.csv', index_col='date', parse_dates=True)



In [3]:

    
sales_df.head()

Prepare for Prophet

For prophet to work, we need to change the names of these columns to 'ds' and 'y', so lets just create a new dataframe and keep our old one handy (you'll see why later). The new dataframe will initially be created with an integer index so we can rename the columns



In [4]:

    
df = sales_df.reset_index()



In [5]:

    
df.head()

Let's rename the columns as required by fbprophet. Additioinally, fbprophet doesn't like the index to be a datetime...it wants to see 'ds' as a non-index column, so we won't set an index differnetly than the integer index.



In [6]:

    
df=df.rename(columns={'date':'ds', 'sales':'y'})



In [7]:

    
df.head()

Now's a good time to take a look at your data. Plot the data using pandas' plot function



In [8]:

    
df.set_index('ds').y.plot()









    Out[8]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f33b05bf790>

When working with time-series data, its good to take a look at the data to determine if trends exist, whether it is stationary, has any outliers and/or any other anamolies. Facebook prophet's example uses the log-transform as a way to remove some of these anomolies but it isn't the absolute 'best' way to do this...but given that its the example and a simple data series, I'll follow their lead for now. Taking the log of a number is easily reversible to be able to see your original data.

To log-transform your data, you can use numpy's log() function



In [9]:

    
df['y'] = np.log(df['y'])



In [10]:

    
df.tail()



In [11]:

    
df.set_index('ds').y.plot()









    Out[11]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f3380c76390>

As you can see in the above chart, the plot looks the same as the first one but just at a different scale.

Running Prophet

Now, let's set prophet up to begin modeling our data.

Note: Since we are using monthly data, you'll see a message from Prophet saying Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this. This is OK since we are workign with monthly data but you can disable it by using weekly_seasonality=True in the instantiation of Prophet.



In [12]:

    
model = Prophet()
model.fit(df);









    



Disabling weekly seasonality. Run prophet with weekly_seasonality=True to override this.

Forecasting is fairly useless unless you can look into the future, so we need to add some future dates to our dataframe. For this example, I want to forecast 2 years into the future, so I'll built a future dataframe with 24 periods since we are working with monthly data. Note the freq='m' inclusion to ensure we are adding 24 months of data.

This can be done with the following code:



In [13]:

    
future = model.make_future_dataframe(periods=24, freq = 'm')
future.tail()

To forecast this future data, we need to run it through Prophet's model.



In [14]:

    
forecast = model.predict(future)

The resulting forecast dataframe contains quite a bit of data, but we really only care about a few columns. First, let's look at the full dataframe:



In [15]:

    
forecast.tail()









    Out[15]:







  
    
      
      ds
      t
      trend
      seasonal_lower
      seasonal_upper
      trend_lower
      trend_upper
      yhat_lower
      yhat_upper
      yearly
      yearly_lower
      yearly_upper
      seasonal
      yhat
    
  
  
    
      91
      2017-04-30
      1.280888
      13.042878
      0.020710
      0.020710
      12.933548
      13.143430
      12.952123
      13.165723
      0.020710
      0.020710
      0.020710
      0.020710
      13.063588
    
    
      92
      2017-05-31
      1.295234
      13.044540
      0.011932
      0.011932
      12.923032
      13.152482
      12.938400
      13.164907
      0.011932
      0.011932
      0.011932
      0.011932
      13.056472
    
    
      93
      2017-06-30
      1.309116
      13.046148
      0.028827
      0.028827
      12.915811
      13.162192
      12.944827
      13.193656
      0.028827
      0.028827
      0.028827
      0.028827
      13.074975
    
    
      94
      2017-07-31
      1.323461
      13.047810
      0.003016
      0.003016
      12.910398
      13.171669
      12.915154
      13.178260
      0.003016
      0.003016
      0.003016
      0.003016
      13.050825
    
    
      95
      2017-08-31
      1.337807
      13.049471
      -0.025674
      -0.025674
      12.901938
      13.182222
      12.880747
      13.154949
      -0.025674
      -0.025674
      -0.025674
      -0.025674
      13.023797

We really only want to look at yhat, yhat_lower and yhat_upper, so we can do that with:



In [16]:

    
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()

Plotting Prophet results

Prophet has a plotting mechanism called plot. This plot functionality draws the original data (black dots), the model (blue line) and the error of the forecast (shaded blue area).



In [17]:

    
model.plot(forecast);

Personally, I'm not a fan of this visualization so I like to break the data up and build a chart myself. The next section describes how I build my own visualization for Prophet modeling

Visualizing Prophet models

In order to build a useful dataframe to visualize our model versus our original data, we need to combine the output of the Prophet model with our original data set, then we'll build a new chart manually using pandas and matplotlib.

First, let's set our dataframes to have the same index of ds



In [18]:

    
df.set_index('ds', inplace=True)
forecast.set_index('ds', inplace=True)

Now, we'll combine the original data and our forecast model data



In [19]:

    
viz_df = sales_df.join(forecast[['yhat', 'yhat_lower','yhat_upper']], how = 'outer')

If we look at the head(), we see the data has been joined correctly but the scales of our original data (sales) and our model (yhat) are different. We need to rescale the yhat colums(s) to get the same scale, so we'll use numpy's exp function to do that.



In [20]:

    
viz_df.head()



In [21]:

    
viz_df['yhat_rescaled'] = np.exp(viz_df['yhat'])



In [22]:

    
viz_df.head()









    Out[22]:







  
    
      
      sales
      yhat
      yhat_lower
      yhat_upper
      yhat_rescaled
    
  
  
    
      2009-10-01
      338630.0
      12.730339
      12.719124
      12.741363
      337843.878333
    
    
      2009-11-01
      339386.0
      12.748734
      12.737656
      12.760092
      344115.810258
    
    
      2009-12-01
      400264.0
      12.884803
      12.873969
      12.896787
      394274.522610
    
    
      2010-01-01
      314640.0
      12.660496
      12.649357
      12.671431
      315052.880217
    
    
      2010-02-01
      311022.0
      12.656080
      12.644973
      12.667275
      313664.686287

Let's take a look at the sales and yhat_rescaled data together in a chart.



In [23]:

    
viz_df[['sales', 'yhat_rescaled']].plot()









    Out[23]:





<matplotlib.axes._subplots.AxesSubplot at 0x7f33800fc090>

You can see from the chart that the model (blue) is pretty good when plotted against the actual signal (orange) but I like to make my vizualization's a little better to understand. To build my 'better' visualization, we'll need to go back to our original sales_df and forecast dataframes.

First things first - we need to find the 2nd to last date of the original sales data in sales_df in order to ensure the original sales data and model data charts are connected.



In [24]:

    
sales_df.index = pd.to_datetime(sales_df.index) #make sure our index as a datetime object
connect_date = sales_df.index[-2] #select the 2nd to last date

Using the connect_date we can now grab only the model data that after that date (you'll see why in a minute). To do this, we'll mask the forecast data.



In [25]:

    
mask = (forecast.index > connect_date)
predict_df = forecast.loc[mask]



In [26]:

    
predict_df.head()









    Out[26]:







  
    
      
      t
      trend
      seasonal_lower
      seasonal_upper
      trend_lower
      trend_upper
      yhat_lower
      yhat_upper
      yearly
      yearly_lower
      yearly_upper
      seasonal
      yhat
    
    
      ds
      
      
      
      
      
      
      
      
      
      
      
      
      
    
  
  
    
      2015-09-01
      1.000000
      13.010341
      -0.036137
      -0.036137
      13.010341
      13.010341
      12.962424
      12.985457
      -0.036137
      -0.036137
      -0.036137
      -0.036137
      12.974204
    
    
      2015-09-30
      1.013420
      13.011896
      -0.034123
      -0.034123
      13.011743
      13.012067
      12.966155
      12.988926
      -0.034123
      -0.034123
      -0.034123
      -0.034123
      12.977772
    
    
      2015-10-31
      1.027765
      13.013557
      -0.013423
      -0.013423
      13.011848
      13.015386
      12.988682
      13.012529
      -0.013423
      -0.013423
      -0.013423
      -0.013423
      13.000134
    
    
      2015-11-30
      1.041647
      13.015166
      0.156795
      0.156795
      13.010909
      13.019452
      13.158479
      13.184817
      0.156795
      0.156795
      0.156795
      0.156795
      13.171960
    
    
      2015-12-31
      1.055993
      13.016827
      -0.137438
      -0.137438
      13.008524
      13.024350
      12.863865
      12.893395
      -0.137438
      -0.137438
      -0.137438
      -0.137438
      12.879390

Now, let's build a dataframe to use in our new visualization. We'll follow the same steps we did before.



In [27]:

    
viz_df = sales_df.join(predict_df[['yhat', 'yhat_lower','yhat_upper']], how = 'outer')
viz_df['yhat_scaled']=np.exp(viz_df['yhat'])

Now, if we take a look at the head() of viz_df we'll see 'NaN's everywhere except for our original data rows.



In [28]:

    
viz_df.head()









    Out[28]:







  
    
      
      sales
      yhat
      yhat_lower
      yhat_upper
      yhat_scaled
    
  
  
    
      2009-10-01
      338630.0
      NaN
      NaN
      NaN
      NaN
    
    
      2009-11-01
      339386.0
      NaN
      NaN
      NaN
      NaN
    
    
      2009-12-01
      400264.0
      NaN
      NaN
      NaN
      NaN
    
    
      2010-01-01
      314640.0
      NaN
      NaN
      NaN
      NaN
    
    
      2010-02-01
      311022.0
      NaN
      NaN
      NaN
      NaN

If we take a look at the tail() of the viz_df you'll see we have data for the forecasted data and NaN's for the original data series.



In [29]:

    
viz_df.tail()









    Out[29]:







  
    
      
      sales
      yhat
      yhat_lower
      yhat_upper
      yhat_scaled
    
  
  
    
      2017-04-30
      NaN
      13.063588
      12.952123
      13.165723
      471459.411475
    
    
      2017-05-31
      NaN
      13.056472
      12.938400
      13.164907
      468116.264965
    
    
      2017-06-30
      NaN
      13.074975
      12.944827
      13.193656
      476858.563032
    
    
      2017-07-31
      NaN
      13.050825
      12.915154
      13.178260
      465480.406592
    
    
      2017-08-31
      NaN
      13.023797
      12.880747
      13.154949
      453067.919980

time to plot

Now, let's plot everything to get the 'final' visualization of our sales data and forecast with errors.



In [30]:

    
fig, ax1 = plt.subplots()
ax1.plot(viz_df.sales)
ax1.plot(viz_df.yhat_scaled, color='black', linestyle=':')
ax1.fill_between(viz_df.index, np.exp(viz_df['yhat_upper']), np.exp(viz_df['yhat_lower']), alpha=0.5, color='darkgray')
ax1.set_title('Sales (Orange) vs Sales Forecast (Black)')
ax1.set_ylabel('Dollar Sales')
ax1.set_xlabel('Date')

L=ax1.legend() #get the legend
L.get_texts()[0].set_text('Actual Sales') #change the legend text for 1st plot
L.get_texts()[1].set_text('Forecasted Sales') #change the legend text for 2nd plot

This visualization is much better (in my opinion) than the default fbprophet plot. It is much easier to quickly understand and describe what's happening. The orange line is actual sales data and the black dotted line is the forecast. The gray shaded area is the uncertaintity estimation of the forecast.



In [ ]:

	ds	y
67	2015-05-01	13.044650
68	2015-06-01	13.013060
69	2015-07-01	13.033991
70	2015-08-01	13.030993
71	2015-09-01	12.973671

	sales
date
2009-10-01	338630
2009-11-01	339386
2009-12-01	400264
2010-01-01	314640
2010-02-01	311022

	ds
91	2017-04-30
92	2017-05-31
93	2017-06-30
94	2017-07-31
95	2017-08-31

	ds	t	trend	seasonal_lower	seasonal_upper	trend_lower	trend_upper	yhat_lower	yhat_upper	yearly	yearly_lower	yearly_upper	seasonal	yhat
91	2017-04-30	1.280888	13.042878	0.020710	0.020710	12.933548	13.143430	12.952123	13.165723	0.020710	0.020710	0.020710	0.020710	13.063588
92	2017-05-31	1.295234	13.044540	0.011932	0.011932	12.923032	13.152482	12.938400	13.164907	0.011932	0.011932	0.011932	0.011932	13.056472
93	2017-06-30	1.309116	13.046148	0.028827	0.028827	12.915811	13.162192	12.944827	13.193656	0.028827	0.028827	0.028827	0.028827	13.074975
94	2017-07-31	1.323461	13.047810	0.003016	0.003016	12.910398	13.171669	12.915154	13.178260	0.003016	0.003016	0.003016	0.003016	13.050825
95	2017-08-31	1.337807	13.049471	-0.025674	-0.025674	12.901938	13.182222	12.880747	13.154949	-0.025674	-0.025674	-0.025674	-0.025674	13.023797

	sales	yhat	yhat_lower	yhat_upper
2009-10-01	338630.0	12.730339	12.719124	12.741363
2009-11-01	339386.0	12.748734	12.737656	12.760092
2009-12-01	400264.0	12.884803	12.873969	12.896787
2010-01-01	314640.0	12.660496	12.649357	12.671431
2010-02-01	311022.0	12.656080	12.644973	12.667275

	t	trend	seasonal_lower	seasonal_upper	trend_lower	trend_upper	yhat_lower	yhat_upper	yearly	yearly_lower	yearly_upper	seasonal	yhat
ds
2015-09-01	1.000000	13.010341	-0.036137	-0.036137	13.010341	13.010341	12.962424	12.985457	-0.036137	-0.036137	-0.036137	-0.036137	12.974204
2015-09-30	1.013420	13.011896	-0.034123	-0.034123	13.011743	13.012067	12.966155	12.988926	-0.034123	-0.034123	-0.034123	-0.034123	12.977772
2015-10-31	1.027765	13.013557	-0.013423	-0.013423	13.011848	13.015386	12.988682	13.012529	-0.013423	-0.013423	-0.013423	-0.013423	13.000134
2015-11-30	1.041647	13.015166	0.156795	0.156795	13.010909	13.019452	13.158479	13.184817	0.156795	0.156795	0.156795	0.156795	13.171960
2015-12-31	1.055993	13.016827	-0.137438	-0.137438	13.008524	13.024350	12.863865	12.893395	-0.137438	-0.137438	-0.137438	-0.137438	12.879390