Before we get into the example, let's talk about old-fashioned computer memory. Mercury delay lines were an early form of computer memory. They essentially recycled electrical signals until they were needed, and the signal could also be replaced or reshaped with new information (i.e. forgetting the old information).
Image Source: Delay Line Memory
Note: This tutorial is based on Time Series Forecasting with the Long Short-Term Memory Network in Python by Jason Brownlee.
In [1]:
from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from datetime import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.layers import GRU
from math import sqrt
from matplotlib import pyplot
import numpy
In [2]:
def parser(x):
    return datetime.strptime(x, '%Y-%m-%d')
dataset = read_csv('../data/yellowstone-visitors-ur-weather.csv', header=0, parse_dates=[0], index_col=0, squeeze=True, date_parser=parser)
dataset.head()
Out[2]:
In [3]:
def difference(dataset, interval=1):
    diff = list()
    for i in range(interval, len(dataset)):
        value = dataset[i] - dataset[i - interval]
        diff.append(value)
    return diff
diff_values = difference(dataset.values)
diff_df = DataFrame(diff_values, columns=dataset.columns.values)
diff_df.head()
Out[3]:
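Differencing turns each row into the change from the previous month, which removes the overall level so the network only has to model month-over-month swings. A quick sanity check on a toy list (made-up numbers, not the Yellowstone data):
# Each output value is the change from the previous step, so the result is one element shorter.
print(difference([100, 120, 90, 95]))  # [20, -30, 5]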
In [4]:
labels = diff_df['visitors'].rename('label')
shifted = diff_df.shift(1)
supervised = concat([shifted, labels], axis=1)
supervised.fillna(0, inplace=True)
supervised.head()
Out[4]:
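The shift(1) call turns this into a supervised learning problem: each row's inputs are the previous month's (differenced) values and its label is the current month's change in visitors. A tiny illustration with made-up deltas:
# Illustration only: shift(1) pushes the inputs down one row.
toy = DataFrame({'visitors': [5, -3, 7]})
print(concat([toy.shift(1), toy['visitors'].rename('label')], axis=1))
#    visitors  label
# 0       NaN      5
# 1       5.0     -3
# 2      -3.0      7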
In [5]:
supervised_values = supervised.values
train, test = supervised_values[0:-12], supervised_values[-12:]
scaler = MinMaxScaler(feature_range=(-1, 1))
scaler = scaler.fit(train)
train = train.reshape(train.shape[0], train.shape[1])
train_scaled = scaler.transform(train)
test = test.reshape(test.shape[0], test.shape[1])
test_scaled = scaler.transform(test)
print('training set shape: {}'.format(train_scaled.shape))
print(train_scaled[0])
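Note that the scaler is fit on the training rows only, so nothing about the test months leaks into the scaling. If you want to convince yourself the transform is reversible (it gets inverted later for the forecasts), a quick check:
# Sanity check: inverse_transform should round-trip back to the original training rows.
print(numpy.allclose(scaler.inverse_transform(train_scaled), train))  # True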
The north remembers, and so do LSTMs... until they forget on purpose.
LSTMs maintain state across a sequence. Unlike simple RNNs, LSTMs can also learn what to forget.
Image Source: Christopher Olah, Understanding LSTMs
With a batch size larger than this example uses, a stateless LSTM could update the gradients more efficiently, because its state is only maintained within a single batch.
This example uses a batch size of one, so a stateless LSTM would only remember one month. Not very helpful. (A sketch contrasting the two declarations follows the layer list below.)
The model adds a hidden layer because why not? It also uses a single-neuron output layer to predict next month's visitors.
This network defines the following layers (excluding dropout):
- a stateful LSTM layer with 20 neurons
- a fully connected hidden layer with 6 neurons
- a single-neuron output layer that predicts next month's change in visitors
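For contrast, here is an illustrative sketch (not part of the tutorial's model) of how a stateless and a stateful LSTM layer are declared in Keras; the layer sizes match the next cell:
# Illustrative only: 20 units, 6 input features, batch size of 1.
stateless = Sequential()
stateless.add(LSTM(20, input_shape=(1, 6)))  # state is discarded after every batch

stateful = Sequential()
stateful.add(LSTM(20, batch_input_shape=(1, 1, 6), stateful=True))  # state persists until reset_states() is called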
In [6]:
batch_size = 1 # required for stateful LSTM
neurons = 20
features = 6
labels = 1
model = Sequential()
model.add(LSTM(neurons, batch_input_shape=(batch_size, 1, features), stateful=True))
model.add(Dense(features))
model.add(Dropout(0.5))
model.add(Dense(1))
In [7]:
model.compile(loss='mean_squared_error', optimizer='adam')
In [8]:
nb_epoch = 300
X, y = train_scaled[:, 0:-1], train_scaled[:, -1]
X = X.reshape(X.shape[0], 1, X.shape[1])
for i in range(nb_epoch):
    if i % 50 == 0:
        print(i)
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0, shuffle=False)
    # the stateful LSTM carries state across batches, so reset it manually
    # after each full pass through the training sequence
    model.reset_states()
In [9]:
# inverse scaling for a forecasted value
def invert_scale(scaler, X, value):
    # append the prediction to the input row so the whole row can be passed
    # through the scaler, then return just the un-scaled prediction
    new_row = [x for x in X] + [value]
    array = numpy.array(new_row)
    array = array.reshape(1, len(array))
    inverted = scaler.inverse_transform(array)
    return inverted[0, -1]
In [10]:
visitor_history = dataset['visitors'].values
predictions = list()
for i in range(len(test_scaled)):
    X, y = test_scaled[i, 0:-1], test_scaled[i, -1]
    X = X.reshape(1, 1, len(X))
    scaled_pred = model.predict(X, batch_size=batch_size)
    # undo the scaling, then undo the differencing by adding last month's actual count
    visitor_delta = invert_scale(scaler, X[0, 0], scaled_pred[0, 0])
    prev_mon_visitors = visitor_history[-len(test_scaled) - 1 + i]
    pred = prev_mon_visitors + visitor_delta
    expected = visitor_history[-len(test_scaled) + i]
    print('Month=%d, Predicted=%f, Expected=%f' % (i + 1, pred, expected))
    predictions.append(pred)
Let's plot the results...
In [11]:
pyplot.plot(visitor_history[-12:])
pyplot.plot(predictions)
pyplot.show()
Did the LSTM work any better than the simple monthly average?
In [12]:
rmse = sqrt(mean_squared_error(visitor_history[-12:], predictions))
print('Test RMSE: %.3f' % rmse)
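As a rough point of comparison (not part of the original tutorial, and assuming the dataset has a monthly DatetimeIndex), a monthly-average baseline can be sketched like this: average each calendar month over the training years and use that as the forecast.
# Hypothetical baseline: predict each held-out month as the mean visitor count
# for that calendar month over the earlier years.
train_visitors = dataset['visitors'][:-12]
monthly_avg = train_visitors.groupby(train_visitors.index.month).mean()
baseline = [monthly_avg[m] for m in dataset.index[-12:].month]
print('Baseline RMSE: %.3f' % sqrt(mean_squared_error(visitor_history[-12:], baseline)))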