Note: The recommended way to install Prophet is with the following conda command:

conda install -c conda-forge fbprophet


In [1]:
import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt

Use the ggplot style for aesthetic reasons


In [2]:
%matplotlib inline
plt.style.use('ggplot')

Read in the data file and store it as a pandas DataFrame


In [3]:
data_file = "https://github.com/chris1610/pbpython/blob/master/data/All-Web-Site-Data-Audience-Overview.xlsx?raw=True"
df = pd.read_excel(data_file)
df.head()


Out[3]:
Day Index Sessions
0 2014-09-25 1
1 2014-09-26 4
2 2014-09-27 8
3 2014-09-28 42
4 2014-09-29 233

Check the data types to make sure the Day Index is a datetime type


In [4]:
df.dtypes


Out[4]:
Day Index    datetime64[ns]
Sessions              int64
dtype: object
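
Day Index is already a datetime64 column, so no conversion is needed. If the dates had been read in as plain strings instead, a minimal fix (assuming the same column name) would be:

df['Day Index'] = pd.to_datetime(df['Day Index'])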

Do a simple plot


In [5]:
df.set_index('Day Index').plot();


Filter out the outlier traffic spike


In [6]:
df.loc[(df['Sessions'] > 5000), 'Sessions'] = np.nan
df.set_index('Day Index').plot();
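
The 5,000-session cutoff above is hard-coded for this particular dataset. A more general sketch would flag anything above a high quantile instead (the 99th percentile here is an assumption, not part of the original analysis); Prophet handles the resulting NaN values without any interpolation:

# Hypothetical alternative: quantile-based cutoff instead of a fixed 5000
cutoff = df['Sessions'].quantile(0.99)
df.loc[df['Sessions'] > cutoff, 'Sessions'] = np.nan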


Apply a log transform to the Sessions data to stabilize the variance. More info on why the log transform helps here: https://people.duke.edu/~rnau/411log.htm


In [7]:
df['Sessions'] = np.log(df['Sessions'])
df.head()


Out[7]:
Day Index Sessions
0 2014-09-25 0.000000
1 2014-09-26 1.386294
2 2014-09-27 2.079442
3 2014-09-28 3.737670
4 2014-09-29 5.451038

In [8]:
df.set_index('Day Index').plot();
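
One caveat on the log transform: every day in this dataset has at least one session, so np.log never sees a zero. If zero-session days were possible, a safer sketch (an alternative, not what this analysis does) would use np.log1p and invert with np.expm1:

# log(1 + x) is defined at zero, unlike log(x)
df['Sessions'] = np.log1p(df['Sessions'])
# ...and after forecasting, invert with np.expm1(forecast['yhat'])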


Rename the columns to the ds and y names that the Prophet API expects


In [9]:
df.columns = ["ds", "y"]
df.head()


Out[9]:
ds y
0 2014-09-25 0.000000
1 2014-09-26 1.386294
2 2014-09-27 2.079442
3 2014-09-28 3.737670
4 2014-09-29 5.451038

Create a Prophet object and fit it to our data


In [10]:
m1 = Prophet()
m1.fit(df)


Out[10]:
<fbprophet.forecaster.Prophet at 0x7f8590e28a20>

Create the dataframe of future dates we want to predict


In [11]:
future1 = m1.make_future_dataframe(periods=365)
future1.tail()


Out[11]:
ds
1250 2018-02-27
1251 2018-02-28
1252 2018-03-01
1253 2018-03-02
1254 2018-03-03
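
make_future_dataframe defaults to daily frequency. It also accepts a freq argument, so a monthly forecast (not used in this analysis) would look like:

future_monthly = m1.make_future_dataframe(periods=12, freq='M')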

Predict the future!


In [12]:
forecast1 = m1.predict(future1)

Look at the values contained in the forecast dataframe


In [13]:
forecast1.head()


Out[13]:
ds t trend seasonal_lower seasonal_upper trend_lower trend_upper yhat_lower yhat_upper weekly weekly_lower weekly_upper yearly yearly_lower yearly_upper seasonal yhat
0 2014-09-25 0.000000 3.664253 -0.012358 -0.012358 3.664253 3.664253 3.062057 4.292373 0.184423 0.184423 0.184423 -0.196780 -0.196780 -0.196780 -0.012358 3.651895
1 2014-09-26 0.001124 3.688755 -0.228551 -0.228551 3.688755 3.688755 2.841947 4.023345 0.000932 0.000932 0.000932 -0.229483 -0.229483 -0.229483 -0.228551 3.460204
2 2014-09-27 0.002247 3.713258 -0.780730 -0.780730 3.713258 3.713258 2.319354 3.564319 -0.520686 -0.520686 -0.520686 -0.260044 -0.260044 -0.260044 -0.780730 2.932528
3 2014-09-28 0.003371 3.737760 -0.673173 -0.673173 3.737760 3.737760 2.442374 3.660751 -0.385316 -0.385316 -0.385316 -0.287857 -0.287857 -0.287857 -0.673173 3.064587
4 2014-09-29 0.004494 3.762263 -0.028859 -0.028859 3.762263 3.762263 3.158738 4.331406 0.283490 0.283490 0.283490 -0.312349 -0.312349 -0.312349 -0.028859 3.733404

In [14]:
forecast1[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()


Out[14]:
ds yhat yhat_lower yhat_upper
1250 2018-02-27 7.848040 6.569701 9.183382
1251 2018-02-28 7.787314 6.532167 9.077855
1252 2018-03-01 7.755146 6.538959 9.107978
1253 2018-03-02 7.552382 6.357826 8.845762
1254 2018-03-03 7.011651 5.742724 8.390236

In [15]:
np.exp(forecast1[['yhat', 'yhat_lower', 'yhat_upper']].tail())


Out[15]:
yhat yhat_lower yhat_upper
1250 2560.709477 713.156765 9734.015115
1251 2409.836175 686.885011 8759.156056
1252 2333.549138 691.566553 9027.025287
1253 1905.275686 576.990613 6944.894977
1254 1109.484324 311.912946 4403.857201

Plot the forecast for the next year


In [16]:
m1.plot(forecast1);



In [17]:
m1.plot_components(forecast1);


Create a dataframe with the dates of all the past blog posts as well as the new one scheduled for March 6th, 2017.


In [18]:
articles = pd.DataFrame({
  'holiday': 'publish',
  'ds': pd.to_datetime(['2014-09-27', '2014-10-05', '2014-10-14', '2014-10-26', '2014-11-09', 
                        '2014-11-18', '2014-11-30', '2014-12-17', '2014-12-29', '2015-01-06',
                        '2015-01-20', '2015-02-02', '2015-02-16', '2015-03-23', '2015-04-08',
                        '2015-05-04', '2015-05-17', '2015-06-09', '2015-07-02', '2015-07-13',
                        '2015-08-17', '2015-09-14', '2015-10-26', '2015-12-07', '2015-12-30',
                        '2016-01-26', '2016-04-06', '2016-05-16', '2016-06-15', '2016-08-23',
                        '2016-08-29', '2016-09-06', '2016-11-21', '2016-12-19', '2017-01-17',
                        '2017-02-06', '2017-02-21', '2017-03-06']),
  'lower_window': 0,
  'upper_window': 5,
})
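
The lower_window and upper_window values tell Prophet how far each event's effect extends: here, the publication day itself plus the five days after it. If we wanted to model a second kind of event alongside the articles, say a newsletter send (the name and date below are hypothetical), we could build another frame and concatenate:

newsletters = pd.DataFrame({
  'holiday': 'newsletter',
  'ds': pd.to_datetime(['2016-06-01']),  # hypothetical date
  'lower_window': 0,
  'upper_window': 3,
})
events = pd.concat([articles, newsletters])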

In [19]:
articles.head()


Out[19]:
ds holiday lower_window upper_window
0 2014-09-27 publish 0 5
1 2014-10-05 publish 0 5
2 2014-10-14 publish 0 5
3 2014-10-26 publish 0 5
4 2014-11-09 publish 0 5

Create a new model with the holidays defined


In [20]:
m2 = Prophet(holidays=articles).fit(df)
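
Prophet also exposes a holidays_prior_scale parameter (default 10) that controls how much flexibility the holiday effects get. A sketch of strengthening the article effect (the value 20 is illustrative, not tuned):

m2_strong = Prophet(holidays=articles, holidays_prior_scale=20).fit(df)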

In [21]:
future2 = m2.make_future_dataframe(periods=90)

In [22]:
forecast2 = m2.predict(future2)

In [23]:
m2.plot(forecast2);



In [24]:
m2.plot_components(forecast2);


Create the final model using the holidays and MCMC (Bayesian) sampling, so that Prophet can estimate the uncertainty in the seasonality and holiday effects. This will take a couple of minutes to run.


In [25]:
m3 = Prophet(holidays=articles, mcmc_samples=500).fit(df)
future3 = m3.make_future_dataframe(periods=90)
forecast3 = m3.predict(future3)
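
Because every article event was labeled 'publish', the forecast dataframe also gains a publish column holding the estimated event effect. A quick way to inspect just the affected days:

forecast3[forecast3['publish'] != 0][['ds', 'publish']].head()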

Convert the log values back to "real world" session counts.


In [26]:
forecast3["Sessions"] = np.exp(forecast3.yhat).round()
forecast3["Sessions_lower"] = np.exp(forecast3.yhat_lower).round()
forecast3["Sessions_upper"] = np.exp(forecast3.yhat_upper).round()
forecast3[(forecast3.ds > "2017-03-05") & 
          (forecast3.ds < "2017-04-01")][["ds", "yhat", "Sessions_lower", "Sessions", "Sessions_upper"]]


Out[26]:
ds yhat Sessions_lower Sessions Sessions_upper
892 2017-03-06 7.846788 1387.0 2558.0 4578.0
893 2017-03-07 8.087021 1822.0 3252.0 5831.0
894 2017-03-08 7.584845 1138.0 1968.0 3403.0
895 2017-03-09 7.560417 1079.0 1921.0 3497.0
896 2017-03-10 7.422831 943.0 1674.0 2808.0
897 2017-03-11 6.795996 538.0 894.0 1566.0
898 2017-03-12 6.635982 435.0 762.0 1320.0
899 2017-03-13 7.246892 811.0 1404.0 2303.0
900 2017-03-14 7.146214 745.0 1269.0 2176.0
901 2017-03-15 7.107122 726.0 1221.0 2237.0
902 2017-03-16 7.109091 673.0 1223.0 2082.0
903 2017-03-17 6.937251 569.0 1030.0 1839.0
904 2017-03-18 6.431930 356.0 621.0 1054.0
905 2017-03-19 6.582660 400.0 722.0 1310.0
906 2017-03-20 7.193989 770.0 1331.0 2338.0
907 2017-03-21 7.093497 682.0 1204.0 2158.0
908 2017-03-22 7.054459 629.0 1158.0 1979.0
909 2017-03-23 7.056471 639.0 1160.0 2001.0
910 2017-03-24 6.884807 555.0 977.0 1661.0
911 2017-03-25 6.379954 334.0 590.0 1031.0
912 2017-03-26 6.531608 394.0 687.0 1226.0
913 2017-03-27 7.144480 704.0 1267.0 2220.0
914 2017-03-28 7.046306 640.0 1149.0 2077.0
915 2017-03-29 7.010502 625.0 1108.0 1984.0
916 2017-03-30 7.016783 637.0 1115.0 1967.0
917 2017-03-31 6.850514 509.0 944.0 1653.0
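
A quick aggregate sketch on top of this table, summing the point forecasts to get total predicted sessions for March (the total is not part of the original writeup):

march = forecast3[(forecast3.ds >= "2017-03-01") & (forecast3.ds <= "2017-03-31")]
march["Sessions"].sum()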

In [27]:
forecast3.to_excel("March-2017-forecast.xlsx")
