Follow-up notebook describing how accurately Prophet predicted several weeks of website traffic. The full article is here.
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [2]:
%matplotlib inline
plt.style.use('ggplot')
In [3]:
proj = pd.read_excel('https://github.com/chris1610/pbpython/blob/master/data/March-2017-forecast-article.xlsx?raw=True')
Look at the predicted values (yhat) and their lower and upper bounds
In [4]:
proj[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].head()
Out[4]:
Convert the values back from log scale and filter for the target date range
In [5]:
proj["Projected_Sessions"] = np.exp(proj.yhat).round()
proj["Projected_Sessions_lower"] = np.exp(proj.yhat_lower).round()
proj["Projected_Sessions_upper"] = np.exp(proj.yhat_upper).round()
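# Keep the days covered by the Google Analytics export (2017-03-06 to 2017-05-19);
# ISO dates avoid any month/day ambiguity
final_proj = proj[(proj.ds > "2017-03-05") & (proj.ds < "2017-05-20")][
    ["ds", "Projected_Sessions_lower", "Projected_Sessions", "Projected_Sessions_upper"]]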
Read in the Google Analytics file
In [6]:
actual = pd.read_excel('https://github.com/chris1610/pbpython/blob/master/data/Traffic_20170306-20170519.xlsx?raw=True')
actual.columns = ["ds", "Actual_Sessions"]
In [7]:
actual.head()
Out[7]:
Merge the predictions with the actual traffic
In [8]:
df = pd.merge(actual, final_proj, on="ds")
df.head()
Out[8]:
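pd.merge defaults to an inner join, so any day present in only one of the two files is silently dropped. A minimal sketch with hypothetical frames (not the article's data) shows one way to check for that before trusting the merged result, using an outer merge with indicator=True and validate="one_to_one":

```python
import pandas as pd

# Hypothetical actual traffic and projections
actual = pd.DataFrame({
    "ds": pd.to_datetime(["2017-03-06", "2017-03-07", "2017-03-08"]),
    "Actual_Sessions": [1310, 1020, 1405],
})
final_proj = pd.DataFrame({
    "ds": pd.to_datetime(["2017-03-06", "2017-03-07", "2017-03-09"]),
    "Projected_Sessions": [1350.0, 980.0, 1400.0],
})

# Outer merge with indicator=True flags dates found on only one side
check = pd.merge(actual, final_proj, on="ds", how="outer", indicator=True)
unmatched = check[check["_merge"] != "both"]
print(unmatched[["ds", "_merge"]])

# The default inner merge keeps only matching days; validate guards against duplicates
df = pd.merge(actual, final_proj, on="ds", validate="one_to_one")
print(len(df))  # 2
```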
See how big the delta is between the actual and projected sessions
In [9]:
df["Session_Delta"] = df.Actual_Sessions - df.Projected_Sessions
df.Session_Delta.describe()
Out[9]:
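Because positive and negative deltas offset each other in the mean reported by describe(), a few extra summaries can make the accuracy easier to judge. The sketch below uses hypothetical numbers, and the choice of metrics (mean absolute error, mean absolute percentage error, and the share of days inside the forecast interval) is an assumption added here, not something computed in the original notebook:

```python
import pandas as pd

# Hypothetical merged frame shaped like df in the notebook
df = pd.DataFrame({
    "Actual_Sessions": [1310, 1020, 1405, 900],
    "Projected_Sessions": [1350.0, 980.0, 1400.0, 1100.0],
    "Projected_Sessions_lower": [1100.0, 800.0, 1150.0, 950.0],
    "Projected_Sessions_upper": [1600.0, 1200.0, 1650.0, 1300.0],
})

df["Session_Delta"] = df["Actual_Sessions"] - df["Projected_Sessions"]

# Mean absolute error, in sessions per day
mae = df["Session_Delta"].abs().mean()

# Mean absolute percentage error, relative to actual sessions
mape = (df["Session_Delta"].abs() / df["Actual_Sessions"]).mean() * 100

# Share of days where the actual value fell inside the forecast interval
inside = df["Actual_Sessions"].between(df["Projected_Sessions_lower"],
                                       df["Projected_Sessions_upper"])
coverage = inside.mean()

print(f"MAE: {mae:.1f}  MAPE: {mape:.1f}%  coverage: {coverage:.0%}")
```

Here the mean delta is -48.75, while the MAE of 71.25 shows the typical daily miss more directly; one of the four days falls below the lower bound.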
In [10]:
# Need to convert to just a date in order to keep plot from throwing errors
df['ds'] = df['ds'].dt.date
Quick plot of the delta
In [11]:
fig, ax = plt.subplots(figsize=(9, 6))
df.plot("ds", "Session_Delta", ax=ax)
fig.autofmt_xdate(bottom=0.2, rotation=30, ha='right');
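A horizontal line at zero makes it easier to see on which days traffic beat or fell short of the forecast. A minimal sketch with hypothetical deltas, using matplotlib's non-interactive Agg backend only so it runs outside a notebook:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily deltas (actual minus projected sessions)
df = pd.DataFrame({
    "ds": pd.date_range("2017-03-06", periods=4, freq="D"),
    "Session_Delta": [-40.0, 40.0, 5.0, -200.0],
})

fig, ax = plt.subplots(figsize=(9, 6))
ax.plot(df["ds"], df["Session_Delta"], label="Session_Delta")
# Above the zero line: traffic beat the forecast; below it: traffic fell short
ax.axhline(0, color="black", linewidth=1, linestyle=":")
ax.set(ylabel="Actual - Projected Sessions")
fig.autofmt_xdate(bottom=0.2, rotation=30, ha="right")
```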
A more comprehensive plot showing the actual and projected sessions along with the upper and lower bounds
In [12]:
fig, ax = plt.subplots(figsize=(9, 6))
df.plot(kind='line', x='ds', y=['Actual_Sessions', 'Projected_Sessions'], ax=ax, style=['-','--'])
ax.fill_between(df['ds'].values, df['Projected_Sessions_lower'], df['Projected_Sessions_upper'], alpha=0.2)
ax.set(title='Pbpython Traffic Prediction Accuracy', xlabel='', ylabel='Sessions')
fig.autofmt_xdate(bottom=0.2, rotation=30, ha='right')