Look at the time series of daily page views for the Wikipedia page for Peyton Manning. The CSV is available here
In [1]:
#wp_R_dataset_url = 'https://github.com/facebookincubator/prophet/blob/master/examples/example_wp_R.csv'
wp_peyton_manning_filename = '../datasets/example_wp_peyton_manning.csv'
In [2]:
import pandas as pd
import numpy as np
from fbprophet import Prophet
In [3]:
df = pd.read_csv(wp_peyton_manning_filename)
# transform to log scale
df['y'] = np.log(df['y'])
df.head()
Out[3]:
Looking at this dataset, it is clear that real time series frequently have abrupt changes in their trajectories. By default, Prophet will automatically detect these changepoints and will allow the trend to adapt appropriately. However, if you wish to have finer control over this process (e.g., Prophet missed a rate change, or is overfitting rate changes in the history), then there are several input arguments you can use.
Prophet detects changepoints by first specifying a large number of potential changepoints at which the rate is allowed to change. It then puts a sparse prior on the magnitudes of the rate changes. This is equivalent to L1 regularization: Prophet has a large number of possible places where the rate can change, but will use as few of them as possible. Consider the Peyton Manning forecast from the Quickstart example. By default, Prophet specifies 25 potential changepoints, placed uniformly across the first 80% of the time series.
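The default placement can be sketched as follows. This is an illustration of the scheme described above, not Prophet's actual code: `n_changepoints` dates are spaced evenly over the first 80% of the observed history.

```python
import numpy as np
import pandas as pd

# Sketch (assumption: mirrors the placement rule described above) of how the
# default potential changepoints are laid out: n_changepoints dates spaced
# uniformly over the first 80% of the history.
def place_changepoints(ds, n_changepoints=25, changepoint_range=0.8):
    ds = pd.Series(pd.to_datetime(ds)).sort_values().reset_index(drop=True)
    hist_size = int(np.floor(len(ds) * changepoint_range))
    # n_changepoints + 1 evenly spaced indices; drop the first (the start of the series)
    idx = np.linspace(0, hist_size - 1, n_changepoints + 1).round().astype(int)
    return ds.iloc[idx[1:]].reset_index(drop=True)

dates = pd.date_range('2008-01-01', '2016-01-20', freq='D')
cps = place_changepoints(dates)
```

Every potential changepoint lands in the first 80% of the dates, so the trend is not allowed to bend in the most recent 20% of the history, which keeps the extrapolated trend stable.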
In [4]:
m = Prophet()
m.fit(df);
In [5]:
future = m.make_future_dataframe(periods=365)
forecast = m.predict(future)
forecast.tail()
Out[5]:
In [10]:
%matplotlib inline
import matplotlib.pyplot as plt
Below, the dashed lines show where the potential changepoints were placed.
In [21]:
ax = m.plot(forecast)
for ts in m.changepoints.values:
    plt.vlines(x=ts, ymin=5, ymax=13, linestyles='--')
Even though there are many places where the rate is allowed to change, because of the sparse prior most of these changepoints go unused (their rate change is effectively 0). We can see this by plotting the magnitude of the rate change at each changepoint:
In [31]:
deltas = m.params['delta'].mean(0)
fig = plt.figure(facecolor='w', figsize=(10, 6))
ax = fig.add_subplot(111)
ax.bar(range(len(deltas)), deltas, facecolor='#0072B2', edgecolor='#2072B2')
ax.grid(True, which='major', c='gray', ls='-', lw=1, alpha=0.2)
ax.set_ylabel('Rate change')
ax.set_xlabel('Potential changepoint')
fig.tight_layout()
The number of potential changepoints can be set using the argument n_changepoints, but this is better tuned by adjusting the regularization.
If the trend changes are being overfit (too much flexibility) or underfit (not enough flexibility), you can adjust the strength of the sparse prior using the input argument changepoint_prior_scale. By default, this parameter is set to 0.05. Increasing it will make the trend more flexible:
In [32]:
m = Prophet(changepoint_prior_scale=0.5)
forecast = m.fit(df).predict(future)
m.plot(forecast);
In [33]:
deltas = m.params['delta'].mean(0)
fig = plt.figure(facecolor='w', figsize=(10, 6))
ax = fig.add_subplot(111)
ax.bar(range(len(deltas)), deltas, facecolor='#0092B2', edgecolor='#2072B2')
ax.grid(True, which='major', c='gray', ls='-', lw=1, alpha=0.2)
ax.set_ylabel('Rate change')
ax.set_xlabel('Potential changepoint')
fig.tight_layout()
Decreasing it will make the trend less flexible:
In [34]:
m = Prophet(changepoint_prior_scale=0.001)
forecast = m.fit(df).predict(future)
m.plot(forecast);
In [35]:
deltas = m.params['delta'].mean(0)
fig = plt.figure(facecolor='w', figsize=(10, 6))
ax = fig.add_subplot(111)
ax.bar(range(len(deltas)), deltas, facecolor='#0032B2', edgecolor='#2072B2')
ax.grid(True, which='major', c='gray', ls='-', lw=1, alpha=0.2)
ax.set_ylabel('Rate change')
ax.set_xlabel('Potential changepoint')
fig.tight_layout()
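The behavior above can be understood through the prior itself. In the Prophet model each rate change delta gets a Laplace(0, tau) prior, where tau is changepoint_prior_scale; the sketch below (plain NumPy sampling, not Prophet code) shows that a smaller scale concentrates the prior mass near zero, so fewer changepoints receive a meaningful rate change:

```python
import numpy as np

# For each candidate scale, estimate the fraction of Laplace(0, scale)
# draws that fall essentially at zero (|delta| < 0.01).
rng = np.random.default_rng(42)
frac_near_zero = {}
for scale in (0.001, 0.05, 0.5):
    deltas = rng.laplace(loc=0.0, scale=scale, size=100_000)
    frac_near_zero[scale] = np.mean(np.abs(deltas) < 0.01)
```

With scale 0.001 almost all draws are effectively zero, while with scale 0.5 only a small fraction are, which matches the bar charts above: a larger changepoint_prior_scale leaves more changepoints with nonzero rate changes and hence a more flexible trend.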
Rather than using automatic changepoint detection, you can manually specify the locations of potential changepoints with the changepoints argument:
In [36]:
m = Prophet(changepoints=['2014-01-01'])
forecast = m.fit(df).predict(future)
m.plot(forecast);
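What a changepoint does to the trend can be sketched in isolation. The following is an illustration of a piecewise-linear trend with a single changepoint (an assumption-level sketch, not Prophet's implementation): at time s the growth rate changes from k to k + delta, and the offset is adjusted so the trend stays continuous.

```python
import numpy as np

def piecewise_trend(t, k, m, s, delta):
    after = (t >= s).astype(float)   # indicator for t past the changepoint
    rate = k + after * delta         # rate picks up delta after s
    offset = m - after * s * delta   # shift keeps the two segments joined at s
    return rate * t + offset

t = np.linspace(0.0, 10.0, 101)
y = piecewise_trend(t, k=1.0, m=0.0, s=4.0, delta=0.5)
```

Before s the slope is k, after s it is k + delta, and the two segments meet at s. Prophet's trend is a sum of such terms, one per changepoint, with the deltas fit under the sparse prior discussed above.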