Using Prophet to Forecast the Market

In a previous post, I used stock market data to show how prophet detects changepoints in a signal (http://pythondata.com/forecasting-time-series-data-prophet-trend-changepoints/). After publishing that article, I've received a few questions asking how well (or poorly) prophet can forecast the stock market.

This article highlights using prophet for forecasting the markets.


In [24]:
import pandas as pd
import numpy as np
# Newer releases of the library ship as the 'prophet' package (from prophet import Prophet).
from fbprophet import Prophet
import matplotlib.pyplot as plt

from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

%matplotlib inline
 
plt.rcParams['figure.figsize']=(20,10)
plt.style.use('ggplot')

Load data

Let's load our data to analyze. For this example, I'm going to use some stock market data to be able to show some clear trend changes. This data can be downloaded from FRED (https://fred.stlouisfed.org/series/SP500) or just grab it from the examples directory. Note: The data used in this example was not cherry picked...I just grabbed the available data on S&P 500 from FRED.


In [2]:
market_df = pd.read_csv('../examples/SP500.csv', index_col='DATE', parse_dates=True)

In [3]:
market_df.tail()


Out[3]:
SP500
DATE
2008-12-12 879.73
2008-12-11 873.59
2008-12-10 899.24
2008-12-09 888.67
2008-12-08 909.70

In [4]:
market_df.plot()


Out[4]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f2075abe310>

Prepare for Prophet


In [5]:
df = market_df.reset_index().rename(columns={'DATE':'ds', 'SP500':'y'})
df['y'] = np.log(df['y'])
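
Because the model is fit on the log of the price, everything Prophet returns (yhat, yhat_lower, yhat_upper) is in log units and has to go back through the exponential before it can be compared with the index. Here is a minimal sketch of that round trip, using made-up prices and the standard library's math module in place of NumPy:

```python
import math

# Made-up prices, for illustration only (not the FRED data).
prices = [100.0, 200.0, 400.0]

# Forward transform: what np.log(df['y']) does element by element.
log_prices = [math.log(p) for p in prices]

# Inverse transform: what np.exp(...) does to the forecast columns later on.
recovered = [math.exp(lp) for lp in log_prices]

for original, back in zip(prices, recovered):
    print(round(original, 2))
    print(round(back, 2))
```

Equal ratios become equal steps after the log (100 to 200 is the same step as 200 to 400), which is a common reason for log-transforming a price series before fitting a trend.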

Running Prophet

As before, let's instantiate prophet, fit it to our data and then predict over a future dataframe that extends one year past the end of the data. Take a look at http://pythondata.com/forecasting-time-series-data-prophet-jupyter-notebook/ for more information on the basics of Prophet.


In [6]:
model = Prophet()
model.fit(df);
future = model.make_future_dataframe(periods=365) #forecasting for 1 year from now.
forecast = model.predict(future)
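
One thing to keep in mind with periods=365: make_future_dataframe uses a daily frequency by default, so the future dates are calendar days and include weekends when the market is closed. Here is a rough standard-library sketch of what that date range looks like. The last observation date is an assumption for illustration, and the list comprehension stands in for what Prophet builds internally:

```python
from datetime import date, timedelta

# Assumed last date in the history, for illustration only.
last_observed = date(2017, 8, 30)
periods = 365

# One entry per calendar day after the last observation.
future_dates = [last_observed + timedelta(days=i) for i in range(1, periods + 1)]

# weekday() returns 5 for Saturday and 6 for Sunday.
weekend_days = [d for d in future_dates if d.weekday() >= 5]

print(len(future_dates))
print(len(weekend_days))
```

If the weekend predictions are a concern, one option is to filter those rows out of the future dataframe before calling predict.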

Plotting the forecast


In [7]:
figure=model.plot(forecast)


With the data that we have, it is hard to see how good/bad the forecast (blue line) is compared to the actual data (black dots). Let's take a look at the last 800 data points (a little over 3 years of trading days) of forecast vs actual without looking at the future forecast (because we are just interested in getting a visual of the error between actual vs forecast).


In [32]:
two_years = forecast.set_index('ds').join(market_df)
two_years = two_years[['SP500', 'yhat', 'yhat_upper', 'yhat_lower' ]].dropna().tail(800)
two_years['yhat']=np.exp(two_years.yhat)
two_years['yhat_upper']=np.exp(two_years.yhat_upper)
two_years['yhat_lower']=np.exp(two_years.yhat_lower)

In [33]:
two_years.tail()


Out[33]:
SP500 yhat yhat_upper yhat_lower
ds
2017-08-24 2438.97 2446.821521 2518.482362 2367.538010
2017-08-25 2443.05 2446.424253 2525.895457 2372.269734
2017-08-28 2444.24 2448.639709 2528.047411 2371.787731
2017-08-29 2446.30 2451.481344 2531.331576 2375.011177
2017-08-30 2457.59 2454.181239 2534.814490 2372.473553

In [34]:
two_years[['SP500', 'yhat']].plot()


Out[34]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f20719e8dd0>

You can see from the above chart that our forecast follows the trend quite well but doesn't seem to be that great at catching the 'volatility' of the market. Don't fret though...this may be a very good thing for us if we are interested in 'riding the trend' rather than trying to catch peaks and dips perfectly.

Let's take a look at a few measures of accuracy. First, the old fashioned 'average error'.


In [11]:
two_years_AE = (two_years.yhat - two_years.SP500)
print(two_years_AE.describe())  # parentheses work in both Python 2 and Python 3


count    800.000000
mean      -0.540173
std       47.568987
min     -141.265774
25%      -29.383549
50%       -1.548716
75%       25.878416
max      168.898459
dtype: float64

Those really aren't bad numbers but they don't really tell all of the story. Let's take a look at a few more measures of accuracy.
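
Part of the reason the mean above sits so close to zero is that positive and negative errors cancel each other out. A tiny sketch with made-up errors (not the values from this notebook) shows the effect:

```python
# Made-up forecast errors (forecast minus actual), for illustration only.
errors = [-40.0, 35.0, -30.0, 38.0, -3.0]

# Signed errors cancel, so the mean can look great even when every forecast is off.
mean_error = sum(errors) / len(errors)

# Taking absolute values first keeps the size of each miss.
mean_abs_error = sum(abs(e) for e in errors) / len(errors)

print(mean_error)      # 0.0
print(mean_abs_error)  # 29.2
```

The mean of the signed errors is really a measure of bias (does the forecast lean high or low?), not of how far off a typical forecast is.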

First, let's look at R-squared:


In [15]:
r2_score(two_years.SP500, two_years.yhat)


Out[15]:
0.90563333683064451

R-squared looks really good...I'll take a 0.9 value in any first-go-round modeling approach.
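
For reference, R-squared is 1 minus the ratio of the squared forecast errors to the squared deviations of the actuals from their own mean. Here is a hand-rolled sketch on made-up numbers. The r_squared helper is my own illustration, not the scikit-learn source:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / float(len(actual))
    # Squared forecast errors.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared deviations of the actuals from their own mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up trending series with errors of 10 to 30 points, for illustration only.
actual = [2000.0, 2100.0, 2200.0, 2300.0, 2400.0]
predicted = [2030.0, 2080.0, 2210.0, 2270.0, 2420.0]

r2 = r_squared(actual, predicted)
print(round(r2, 3))
```

Because the denominator grows with how far the series travels, a strongly trending series can score a high R-squared even when the individual misses are large in trading terms.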

Now, let's look at mean squared error:


In [16]:
mean_squared_error(two_years.SP500, two_years.yhat)


Out[16]:
2260.2718233576029

And there we have it...the real pointer to this modeling technique being a bit wonky.

An MSE of 2260.27 for a model that is trying to predict the S&P500 with values between 1900 and 2500 isn't that good (remember...for MSE, closer to zero is better) if you are trying to predict exact changes and movements up/down.
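
MSE is measured in squared index points, which makes it awkward to compare directly with the index level. Taking the square root (the RMSE) puts the error back into points. A quick sketch using the MSE value printed above:

```python
import math

mse = 2260.2718233576029  # value reported by mean_squared_error above

# The square root brings the error back to index points.
rmse = math.sqrt(mse)

# Express the RMSE relative to the low and high end of the range mentioned in the text.
pct_of_low = 100.0 * rmse / 1900.0
pct_of_high = 100.0 * rmse / 2500.0

print(round(rmse, 2))
print(round(pct_of_low, 1))
print(round(pct_of_high, 1))
```

That works out to roughly 47.5 points, or somewhere around 2% of the index level over this window.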

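Now, let's look at the mean absolute error (MAE). The MAE is the average of the absolute differences between the forecast and the actual values, so positive and negative errors can't cancel each other out the way they do in the plain mean of the errors. It is also in the same units as the data (index points here).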


In [26]:
mean_absolute_error(two_years.SP500, two_years.yhat)


Out[26]:
36.179476001483771

The MAE continues to tell us that the forecast by prophet isn't ideal for trading: on average it is off by about 36 points.

Another way to look at the usefulness of this forecast is to plot the upper and lower confidence bands of the forecast against the actuals. You can do that by plotting yhat_upper and yhat_lower.


In [59]:
fig, ax1 = plt.subplots()
ax1.plot(two_years.SP500)
ax1.plot(two_years.yhat)
ax1.plot(two_years.yhat_upper, color='black',  linestyle=':', alpha=0.5)
ax1.plot(two_years.yhat_lower, color='black',  linestyle=':', alpha=0.5)

ax1.set_title('Actual S&P 500 (Orange) vs S&P 500 Forecasted Upper & Lower Confidence (Black)')
ax1.set_ylabel('Price')
ax1.set_xlabel('Date')


Out[59]:
<matplotlib.text.Text at 0x7f20711fd550>

In the above chart, we can see the forecast (in blue) vs the actuals (in orange) with the upper and lower confidence bands as dotted black lines.

You can't really tell anything quantifiable from this chart, but you can make a judgement on the value of the forecast. If you are trying to trade short-term (1 day to a few weeks) this forecast is almost useless but if you are investing with a timeframe of months to years, this forecast might provide some value to better understand the trend of the market and the forecasted trend.
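
If you do want one number out of the bands, a simple option is the coverage: the share of actual values that land between the lower and upper band. Here is a small sketch on made-up values. The interval_coverage helper is hypothetical, written just for this illustration:

```python
def interval_coverage(actual, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    inside = sum(1 for a, lo, hi in zip(actual, lower, upper) if lo <= a <= hi)
    return inside / float(len(actual))

# Made-up actuals and bands, for illustration only.
actual = [2438.0, 2443.0, 2380.0, 2446.0, 2540.0]
lower = [2367.0, 2372.0, 2371.0, 2375.0, 2372.0]
upper = [2518.0, 2525.0, 2528.0, 2531.0, 2534.0]

coverage = interval_coverage(actual, lower, upper)
print(coverage)  # 0.8

# With the dataframe from this notebook, the same idea would be something like:
# ((two_years.SP500 >= two_years.yhat_lower) & (two_years.SP500 <= two_years.yhat_upper)).mean()
```

Prophet's default interval_width is 0.8, so a coverage well below 80% on the historical data would suggest the bands are too narrow, and one well above it would suggest they are wider than they need to be.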

Let's go back and look at the actual forecast to see if it might tell us anything different than the forecast vs the actual data.


In [63]:
full_df = forecast.set_index('ds').join(market_df)
full_df['yhat']=np.exp(full_df['yhat'])

In [69]:
fig, ax1 = plt.subplots()
ax1.plot(full_df.SP500)
ax1.plot(full_df.yhat, color='black', linestyle=':')
ax1.fill_between(full_df.index, np.exp(full_df['yhat_upper']), np.exp(full_df['yhat_lower']), alpha=0.5, color='darkgray')
ax1.set_title('Actual S&P 500 (Orange) vs S&P 500 Forecasted (Black) with Confidence Bands')
ax1.set_ylabel('Price')
ax1.set_xlabel('Date')

L=ax1.legend() #get the legend
L.get_texts()[0].set_text('S&P 500 Actual') #change the legend text for 1st plot
L.get_texts()[1].set_text('S&P 500 Forecasted') #change the legend text for 2nd plot


This chart is a bit easier to understand vs the default prophet chart (in my opinion at least). We can see throughout the history of the actuals vs forecast that prophet does an OK job forecasting but has trouble with the areas where the market becomes very volatile.

Looking specifically at the future forecast, prophet is telling us that the market is going to continue rising and should be around 2750 at the end of the forecast period, with confidence bands stretching from 2000-ish to 4000-ish.

If you show this forecast to any serious trader / investor, they'd quickly shrug it off as a terrible forecast. Anything that has a 2000-point confidence interval is worthless in the short- and long-term investing world.
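
To put that interval in perspective, here is the arithmetic using the approximate values read off the chart (these are eyeballed "ish" numbers, not exact model output):

```python
point_forecast = 2750.0  # approximate end-of-period forecast
lower_band = 2000.0      # approximate lower band ("2000-ish")
upper_band = 4000.0      # approximate upper band ("4000-ish")

# Absolute width of the band in index points.
width = upper_band - lower_band

# Width as a fraction of the point forecast.
relative_width = width / point_forecast

print(width)                     # 2000.0
print(round(relative_width, 2))  # 0.73
```

A band that is roughly 73% as wide as the forecast itself doesn't leave much to act on.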

That said, is there some value in prophet's forecasting for the markets? Maybe.

Maybe we can use the forecast on weekly or monthly data with better accuracy. Or...maybe we can use the forecast combined with other forecasts to make a better forecast. I may dig into that a bit more at some point in the future. Stay tuned.

