Note: The recommended way to install Prophet is with the following conda command:

conda install -c conda-forge fbprophet


In [1]:
import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt

Use the ggplot style for aesthetic reasons


In [2]:
%matplotlib inline
plt.style.use('ggplot')

Read in the data file and store it as a pandas DataFrame


In [3]:
data_file = "https://github.com/chris1610/pbpython/blob/master/data/All-Web-Site-Data-Audience-Overview.xlsx?raw=True"
df = pd.read_excel(data_file)
df.head()


Out[3]:
Day Index Sessions
0 2014-09-25 1
1 2014-09-26 4
2 2014-09-27 8
3 2014-09-28 42
4 2014-09-29 233

Check the data types to make sure the Day Index is a datetime type


In [4]:
df.dtypes


Out[4]:
Day Index    datetime64[ns]
Sessions              int64
dtype: object
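
Day Index is already a datetime64 column, so no conversion is needed. If the dates had been read in as plain strings instead, a minimal fix (assuming the same column name) would be:

df['Day Index'] = pd.to_datetime(df['Day Index'])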

Do a simple plot


In [5]:
df.set_index('Day Index').plot();


Filter out the outlier traffic spike


In [6]:
df.loc[(df['Sessions'] > 5000), 'Sessions'] = np.nan
df.set_index('Day Index').plot();
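
The 5,000-session cutoff above is hard-coded for this particular dataset. A more general sketch would flag anything above a high quantile instead (the 99th percentile here is an assumption, not part of the original analysis); Prophet handles the resulting NaN values without any interpolation:

# Hypothetical alternative: quantile-based cutoff instead of a fixed 5000
cutoff = df['Sessions'].quantile(0.99)
df.loc[df['Sessions'] > cutoff, 'Sessions'] = np.nan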


Apply a log transform to the Sessions data to stabilize the variance. More info on why the log transform helps here: https://people.duke.edu/~rnau/411log.htm


In [7]:
df['Sessions'] = np.log(df['Sessions'])
df.head()


Out[7]:
Day Index Sessions
0 2014-09-25 0.000000
1 2014-09-26 1.386294
2 2014-09-27 2.079442
3 2014-09-28 3.737670
4 2014-09-29 5.451038

In [8]:
df.set_index('Day Index').plot();
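
One caveat on the log transform: every day in this dataset has at least one session, so np.log never sees a zero. If zero-session days were possible, a safer sketch (an alternative, not what this analysis does) would use np.log1p and invert with np.expm1:

# log(1 + x) is defined at zero, unlike log(x)
df['Sessions'] = np.log1p(df['Sessions'])
# ...and after forecasting, invert with np.expm1(forecast['yhat'])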


Rename the columns to the ds and y names that the Prophet API expects


In [9]:
df.columns = ["ds", "y"]
df.head()


Out[9]:
ds y
0 2014-09-25 0.000000
1 2014-09-26 1.386294
2 2014-09-27 2.079442
3 2014-09-28 3.737670
4 2014-09-29 5.451038

Create a Prophet object and fit it to our data


In [10]:
m1 = Prophet()
m1.fit(df)


Out[10]:
<fbprophet.forecaster.Prophet at 0x7f8590e28a20>

Create the dataframe of future dates we want to predict


In [11]:
future1 = m1.make_future_dataframe(periods=365)
future1.tail()


Out[11]:
ds
1250 2018-02-27
1251 2018-02-28
1252 2018-03-01
1253 2018-03-02
1254 2018-03-03
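
make_future_dataframe defaults to daily frequency. It also accepts a freq argument, so a monthly forecast (not used in this analysis) would look like:

future_monthly = m1.make_future_dataframe(periods=12, freq='M')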

Predict the future!


In [12]:
forecast1 = m1.predict(future1)

Look at the values contained in the forecast dataframe


In [13]:
forecast1.head()


Out[13]:
ds t trend seasonal_lower seasonal_upper trend_lower trend_upper yhat_lower yhat_upper weekly weekly_lower weekly_upper yearly yearly_lower yearly_upper seasonal yhat
0 2014-09-25 0.000000 3.664253 -0.012358 -0.012358 3.664253 3.664253 3.062057 4.292373 0.184423 0.184423 0.184423 -0.196780 -0.196780 -0.196780 -0.012358 3.651895
1 2014-09-26 0.001124 3.688755 -0.228551 -0.228551 3.688755 3.688755 2.841947 4.023345 0.000932 0.000932 0.000932 -0.229483 -0.229483 -0.229483 -0.228551 3.460204
2 2014-09-27 0.002247 3.713258 -0.780730 -0.780730 3.713258 3.713258 2.319354 3.564319 -0.520686 -0.520686 -0.520686 -0.260044 -0.260044 -0.260044 -0.780730 2.932528
3 2014-09-28 0.003371 3.737760 -0.673173 -0.673173 3.737760 3.737760 2.442374 3.660751 -0.385316 -0.385316 -0.385316 -0.287857 -0.287857 -0.287857 -0.673173 3.064587
4 2014-09-29 0.004494 3.762263 -0.028859 -0.028859 3.762263 3.762263 3.158738 4.331406 0.283490 0.283490 0.283490 -0.312349 -0.312349 -0.312349 -0.028859 3.733404

In [14]:
forecast1[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()


Out[14]:
ds yhat yhat_lower yhat_upper
1250 2018-02-27 7.848040 6.569701 9.183382
1251 2018-02-28 7.787314 6.532167 9.077855
1252 2018-03-01 7.755146 6.538959 9.107978
1253 2018-03-02 7.552382 6.357826 8.845762
1254 2018-03-03 7.011651 5.742724 8.390236

In [15]:
np.exp(forecast1[['yhat', 'yhat_lower', 'yhat_upper']].tail())


Out[15]:
yhat yhat_lower yhat_upper
1250 2560.709477 713.156765 9734.015115
1251 2409.836175 686.885011 8759.156056
1252 2333.549138 691.566553 9027.025287
1253 1905.275686 576.990613 6944.894977
1254 1109.484324 311.912946 4403.857201

Plot the forecast for the next year


In [16]:
m1.plot(forecast1);



In [17]:
m1.plot_components(forecast1);


Create a dataframe with the dates of all the past blog posts as well as the new one scheduled for March 6th, 2017.


In [18]:
articles = pd.DataFrame({
  'holiday': 'publish',
  'ds': pd.to_datetime(['2014-09-27', '2014-10-05', '2014-10-14', '2014-10-26', '2014-11-09', 
                        '2014-11-18', '2014-11-30', '2014-12-17', '2014-12-29', '2015-01-06',
                        '2015-01-20', '2015-02-02', '2015-02-16', '2015-03-23', '2015-04-08',
                        '2015-05-04', '2015-05-17', '2015-06-09', '2015-07-02', '2015-07-13',
                        '2015-08-17', '2015-09-14', '2015-10-26', '2015-12-07', '2015-12-30',
                        '2016-01-26', '2016-04-06', '2016-05-16', '2016-06-15', '2016-08-23',
                        '2016-08-29', '2016-09-06', '2016-11-21', '2016-12-19', '2017-01-17',
                        '2017-02-06', '2017-02-21', '2017-03-06']),
  'lower_window': 0,
  'upper_window': 5,
})
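
The lower_window and upper_window values tell Prophet how far each event's effect extends: here, the publication day itself plus the five days after it. If we wanted to model a second kind of event alongside the articles, say a newsletter send (the name and date below are hypothetical), we could build another frame and concatenate:

newsletters = pd.DataFrame({
  'holiday': 'newsletter',
  'ds': pd.to_datetime(['2016-06-01']),  # hypothetical date
  'lower_window': 0,
  'upper_window': 3,
})
events = pd.concat([articles, newsletters])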

In [19]:
articles.head()


Out[19]:
ds holiday lower_window upper_window
0 2014-09-27 publish 0 5
1 2014-10-05 publish 0 5
2 2014-10-14 publish 0 5
3 2014-10-26 publish 0 5
4 2014-11-09 publish 0 5

Create a new model with the holidays defined


In [20]:
m2 = Prophet(holidays=articles).fit(df)
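
Prophet also exposes a holidays_prior_scale parameter (default 10) that controls how much flexibility the holiday effects get. A sketch of strengthening the article effect (the value 20 is illustrative, not tuned):

m2_strong = Prophet(holidays=articles, holidays_prior_scale=20).fit(df)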

In [21]:
future2 = m2.make_future_dataframe(periods=90)

In [22]:
forecast2 = m2.predict(future2)

In [23]:
m2.plot(forecast2);



In [24]:
m2.plot_components(forecast2);


Create the final model using the holidays and MCMC (Bayesian) sampling, so that Prophet can estimate the uncertainty in the seasonality and holiday effects. This will take a couple of minutes to run.


In [25]:
m3 = Prophet(holidays=articles, mcmc_samples=500).fit(df)
future3 = m3.make_future_dataframe(periods=90)
forecast3 = m3.predict(future3)
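
Because every article event was labeled 'publish', the forecast dataframe also gains a publish column holding the estimated event effect. A quick way to inspect just the affected days:

forecast3[forecast3['publish'] != 0][['ds', 'publish']].head()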

Convert the log values back to "real world" session counts.


In [26]:
forecast3["Sessions"] = np.exp(forecast3.yhat).round()
forecast3["Sessions_lower"] = np.exp(forecast3.yhat_lower).round()
forecast3["Sessions_upper"] = np.exp(forecast3.yhat_upper).round()
forecast3[(forecast3.ds > "2017-03-05") & 
          (forecast3.ds < "2017-04-01")][["ds", "yhat", "Sessions_lower", "Sessions", "Sessions_upper"]]


Out[26]:
ds yhat Sessions_lower Sessions Sessions_upper
892 2017-03-06 7.846788 1387.0 2558.0 4578.0
893 2017-03-07 8.087021 1822.0 3252.0 5831.0
894 2017-03-08 7.584845 1138.0 1968.0 3403.0
895 2017-03-09 7.560417 1079.0 1921.0 3497.0
896 2017-03-10 7.422831 943.0 1674.0 2808.0
897 2017-03-11 6.795996 538.0 894.0 1566.0
898 2017-03-12 6.635982 435.0 762.0 1320.0
899 2017-03-13 7.246892 811.0 1404.0 2303.0
900 2017-03-14 7.146214 745.0 1269.0 2176.0
901 2017-03-15 7.107122 726.0 1221.0 2237.0
902 2017-03-16 7.109091 673.0 1223.0 2082.0
903 2017-03-17 6.937251 569.0 1030.0 1839.0
904 2017-03-18 6.431930 356.0 621.0 1054.0
905 2017-03-19 6.582660 400.0 722.0 1310.0
906 2017-03-20 7.193989 770.0 1331.0 2338.0
907 2017-03-21 7.093497 682.0 1204.0 2158.0
908 2017-03-22 7.054459 629.0 1158.0 1979.0
909 2017-03-23 7.056471 639.0 1160.0 2001.0
910 2017-03-24 6.884807 555.0 977.0 1661.0
911 2017-03-25 6.379954 334.0 590.0 1031.0
912 2017-03-26 6.531608 394.0 687.0 1226.0
913 2017-03-27 7.144480 704.0 1267.0 2220.0
914 2017-03-28 7.046306 640.0 1149.0 2077.0
915 2017-03-29 7.010502 625.0 1108.0 1984.0
916 2017-03-30 7.016783 637.0 1115.0 1967.0
917 2017-03-31 6.850514 509.0 944.0 1653.0
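
A quick aggregate sketch on top of this table, summing the point forecasts to get total predicted sessions for March (the total is not part of the original writeup):

march = forecast3[(forecast3.ds >= "2017-03-01") & (forecast3.ds <= "2017-03-31")]
march["Sessions"].sum()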

In [27]:
forecast3.to_excel("March-2017-forecast.xlsx")
