This tutorial is based on "Time Series Forecasting with the Long Short-Term Memory Network in Python" by Jason Brownlee.
Before we get into the example, let's look at some visitor data from Yellowstone National Park.
In [1]:
# load and plot dataset
from pandas import read_csv
from matplotlib import pyplot
# load dataset; the ISO-formatted dates (YYYY-MM-DD) in the first column become the index
# (pandas.datetime and read_csv's squeeze argument are gone from recent pandas releases)
data = read_csv('../data/yellowstone-visitors.csv', header=0, parse_dates=[0], index_col=0)
# reduce the single-column DataFrame to a Series
series = data.squeeze('columns')
# summarize first few rows
print(series.head())
# line plot
series.plot()
pyplot.show()
The park's recreational visits are highly seasonal, with the peak season in July. The park tracks monthly averages from the last four years on its website. A simple approach to predicting the next year's visitors is to use these averages.
In [2]:
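# the four years (48 months) before the final year form the baseline history
prev_4_years = series[-60:-12]
# the final 12 months are held out as the year to predict
last_year = series[-12:]
# prediction: the average for each calendar month over the previous four years
pred = prev_4_years.groupby(by=prev_4_years.index.month).mean()
pred.plot()
# actuals, indexed by calendar month so they line up with the prediction
act = last_year.groupby(by=last_year.index.month).mean()
act.plot()
pyplot.show()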
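To see what the grouping step produces without the park's CSV, here is a minimal sketch on a synthetic monthly series. The values, the date range, and the names `toy`, `history`, `holdout`, and `baseline` are made up for illustration and are not Yellowstone data.

```python
import numpy as np
import pandas as pd

# Synthetic monthly series covering five full years (hypothetical values, not park data):
# each value is month * 100 plus a small yearly increase of 10
idx = pd.date_range('2012-01-01', periods=60, freq='MS')
values = np.tile(np.arange(1, 13) * 100.0, 5) + np.repeat(np.arange(5) * 10.0, 12)
toy = pd.Series(values, index=idx)

# Hold out the final 12 months and average the 48 months before them by calendar month
history = toy.iloc[-60:-12]
holdout = toy.iloc[-12:]
baseline = history.groupby(history.index.month).mean()

# One averaged value per calendar month, indexed 1 through 12
print(baseline.head(3))
```

The result has one row per calendar month, which is why the held-out year can be compared with it month by month.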
In [3]:
from math import sqrt
from sklearn.metrics import mean_squared_error
rmse = sqrt(mean_squared_error(act, pred))
print('Test RMSE: %.3f' % rmse)
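RMSE is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same units as the series (monthly visits here). The sketch below computes it by hand with NumPy on made-up numbers; `actual` and `predicted` are hypothetical arrays standing in for `act` and `pred`.

```python
import numpy as np

# Hypothetical values for illustration only (not park data)
actual = np.array([100.0, 200.0, 300.0])
predicted = np.array([110.0, 190.0, 320.0])

# Per-point errors, squared, averaged, then square-rooted
errors = actual - predicted
rmse_manual = np.sqrt(np.mean(errors ** 2))
print('RMSE: %.3f' % rmse_manual)
```

Because the errors are squared before averaging, a single large miss, such as one in the peak month, weighs more heavily than several small ones.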