Time Series / Sequences

Example, some code and a lot of inspiration taken from: https://machinelearningmastery.com/how-to-develop-lstm-models-for-time-series-forecasting/

Univariate Sequences

Just one variable per time step

Challenge

We have a known series of events, possibly ordered in time, and we want to know what the next event is. Like this:

[10, 20, 30, 40, 50, 60, 70, 80, 90]


In [0]:
# univariate data preparation
import numpy as np

# split a univariate sequence into samples
def split_sequence(sequence, n_steps):
	X, y = list(), list()
	for i in range(len(sequence)):
		# find the end of this pattern
		end_ix = i + n_steps
		# check if we are beyond the sequence
		if end_ix > len(sequence)-1:
			break
		# gather input and output parts of the pattern
		seq_x, seq_y = sequence[i:end_ix], sequence[end_ix]
		X.append(seq_x)
		y.append(seq_y)
	return np.array(X), np.array(y)

In [2]:
# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]

# choose a number of time steps
n_steps = 3

# split into samples
X, y = split_sequence(raw_seq, n_steps)

# summarize the data
list(zip(X, y))


Out[2]:
[(array([10, 20, 30]), 40),
 (array([20, 30, 40]), 50),
 (array([30, 40, 50]), 60),
 (array([40, 50, 60]), 70),
 (array([50, 60, 70]), 80),
 (array([60, 70, 80]), 90)]

In [3]:
X


Out[3]:
array([[10, 20, 30],
       [20, 30, 40],
       [30, 40, 50],
       [40, 50, 60],
       [50, 60, 70],
       [60, 70, 80]])

Converting shapes

  • one of the most frequent, yet most tedious steps
  • matching what you have to what an interface needs
  • expected input of an RNN: a 3D tensor with shape (samples, timesteps, input_dim)
  • we have: (samples, timesteps)
  • reshape on np arrays can do all that

In [4]:
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
X


Out[4]:
array([[[10],
        [20],
        [30]],

       [[20],
        [30],
        [40]],

       [[30],
        [40],
        [50]],

       [[40],
        [50],
        [60]],

       [[50],
        [60],
        [70]],

       [[60],
        [70],
        [80]]])
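
As a side note, the same shape conversion can also be done with np.newaxis or np.expand_dims instead of an explicit reshape. A minimal sketch on a small throwaway array (X_2d and X_3d are just illustrative names, not used anywhere else):

In [ ]:
# equivalent ways to add the trailing feature axis
X_2d = np.array([[10, 20, 30], [20, 30, 40]])          # (samples, timesteps)
X_3d = X_2d[:, :, np.newaxis]                          # (samples, timesteps, 1)
print(X_3d.shape)                                      # -> (2, 3, 1)
print(np.array_equal(X_3d, np.expand_dims(X_2d, -1)))  # -> True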

In [0]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense, LSTM, GRU, SimpleRNN, Bidirectional
from tensorflow.keras.models import Sequential, Model

model = Sequential()
model.add(SimpleRNN(units=50, activation='relu', input_shape=(n_steps, n_features), name="RNN_Input"))
model.add(Dense(units=1, name="Linear_Output"))
model.compile(optimizer='adam', loss='mse')

In [6]:
%time history = model.fit(X, y, epochs=500, verbose=0)


CPU times: user 1.58 s, sys: 56.5 ms, total: 1.64 s
Wall time: 1.3 s

In [7]:
import matplotlib.pyplot as plt

plt.plot(history.history['loss'])


Out[7]:
[<matplotlib.lines.Line2D at 0x7f2e151bfba8>]

In [8]:
# this does not look too bad
X_sample = np.array([[10, 20, 30], [70, 80, 90]])
X_sample = X_sample.reshape((X_sample.shape[0], X_sample.shape[1], n_features))
X_sample


Out[8]:
array([[[10],
        [20],
        [30]],

       [[70],
        [80],
        [90]]])

In [9]:
y_pred = model.predict(X_sample)
y_pred


Out[9]:
array([[ 38.85929],
       [100.90913]], dtype=float32)

In [0]:
def predict(model, samples, n_features=1):
  input = np.array(samples)
  input = input.reshape((input.shape[0], input.shape[1], n_features))
  y_pred = model.predict(input)
  return y_pred

In [11]:
# do not look too closely, though (extrapolating far outside the training range does not work well)
predict(model, [[100, 110, 120], [200, 210, 220], [200, 300, 400]])


Out[11]:
array([[133.06836],
       [240.18936],
       [482.26587]], dtype=float32)

Input and output of an RNN layer


In [12]:
# https://keras.io/layers/recurrent/
# input: (samples, timesteps, input_dim)
# output: (samples, units)

# let's have a look at the actual output for an example
rnn_layer = model.get_layer("RNN_Input")
model_stub = Model(inputs = model.input, outputs = rnn_layer.output)
hidden = predict(model_stub, [[10, 20, 30]])
hidden


Out[12]:
array([[ 6.312214  ,  0.        ,  0.        ,  7.8945413 ,  0.        ,
         0.        ,  8.674644  , 10.3216095 ,  0.        ,  0.        ,
        10.498358  , 11.888763  ,  8.759704  ,  0.        , 12.425329  ,
         0.        ,  0.        ,  0.        ,  0.        ,  0.        ,
         0.        , 13.695377  ,  0.        ,  0.        ,  0.71732473,
         0.        ,  0.        ,  0.        ,  9.951456  ,  0.        ,
        12.402657  , 10.01326   , 10.412435  ,  2.1506536 ,  0.        ,
         0.        ,  0.        ,  0.        , 23.847918  ,  0.        ,
         0.        ,  0.        ,  8.327042  ,  0.        ,  0.        ,
        13.307503  ,  0.        ,  6.722261  ,  0.40568876, 13.559036  ]],
      dtype=float32)

What do we see?

  • each of the 50 units produces a single output value
  • as a side note, you can nicely see the ReLU nature of the output (many exact zeros)
  • so the timesteps are lost
  • we are only looking at the final output
  • still, at each timestep the layer does produce an output that we could use (more on return_sequences below)

We need to look into RNNs a bit more deeply now

RNNs - Networks with Loops

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Unrolling the loop

http://colah.github.io/posts/2015-08-Understanding-LSTMs/

Simple RNN internals

$output_t = \tanh(W \cdot input_t + U \cdot output_{t-1} + b)$

From Deep Learning with Python, Chapter 6, François Chollet, Manning: https://livebook.manning.com/#!/book/deep-learning-with-python/chapter-6/129
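
To make the formula concrete, here is a minimal NumPy sketch of this recurrence with made-up random weights (W, U, b are placeholders, not the trained weights from the model above):

In [ ]:
# the SimpleRNN recurrence, written out by hand
units, input_dim = 4, 1
rng = np.random.default_rng(0)
W = rng.normal(size=(input_dim, units))      # input -> hidden weights
U = rng.normal(size=(units, units))          # hidden -> hidden weights (the loop)
b = np.zeros(units)

output = np.zeros(units)                     # initial state
for x_t in np.array([[10.], [20.], [30.]]):  # one sample, three timesteps
    output = np.tanh(x_t @ W + output @ U + b)
print(output)                                # final hidden state, shape (units,)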

Activation functions

Sigmoid: compresses values to between 0 and 1

Hyperbolic tangent: like the sigmoid, but compresses to between -1 and 1, thus allowing for negative values as well
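
A quick numeric illustration of both squashing functions (the sigmoid is written out by hand here, this is just a sketch):

In [ ]:
x = np.array([-5., -1., 0., 1., 5.])
sigmoid = 1 / (1 + np.exp(-x))
print(sigmoid)      # ~ [0.007 0.269 0.5   0.731 0.993] -> squashed into (0, 1)
print(np.tanh(x))   # ~ [-1.   -0.762 0.    0.762 1.  ] -> squashed into (-1, 1)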

Advanced part follows


In [13]:
# https://arxiv.org/ftp/arxiv/papers/1701/1701.05923.pdf
# n = output dimension
# m = input dimension
# Total number of parameters for 
# Simple RNN = n**2 + nm + n
# GRU = 3 × (n**2 + nm + n)
# LSTM = 4 × (n**2 + nm + n)

rnn_units = 1

model = Sequential()
model.add(SimpleRNN(units=rnn_units, activation='relu', input_shape=(n_steps, n_features), name="RNN_Input"))
# model.add(GRU(units=rnn_units, activation='relu', input_shape=(n_steps, n_features), name="RNN_Input"))
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
RNN_Input (SimpleRNN)        (None, 1)                 3         
=================================================================
Total params: 3
Trainable params: 3
Non-trainable params: 0
_________________________________________________________________

In [14]:
output_dimension = rnn_units
input_dimension = n_features
parameters = 1 * (output_dimension ** 2 +  output_dimension * input_dimension + output_dimension) 
parameters


Out[14]:
3

In [15]:
# default (return_sequences=False): only a single output for the final timestep
# ideal for feeding into something that *does not* handle timesteps
rnn_units = 1
model = Sequential([
    SimpleRNN(units=rnn_units, activation='relu', input_shape=(n_steps, n_features))
])
predict(model, [[10, 20, 30]])


Out[15]:
array([[58.223083]], dtype=float32)

Multi-Layer RNNs


In [16]:
# return_sequences=True: one output for each timestep
# ideal for feeding into something that *expects* timesteps
rnn_units = 1
model = Sequential([
    SimpleRNN(units=rnn_units, activation='relu', input_shape=(n_steps, n_features), return_sequences=True)
])

# https://keras.io/layers/recurrent/
# input: (samples, timesteps, input_dim)
# output with return_sequences: (samples, timesteps, units)

predict(model, [[10, 20, 30]])


Out[16]:
array([[[ 7.988285],
        [23.964855],
        [47.92971 ]]], dtype=float32)

In [17]:
rnn_units = 50

model = Sequential([
    SimpleRNN(units=rnn_units, activation='relu', input_shape=(n_steps, n_features), return_sequences=True, name="RNN_Input"),
    SimpleRNN(units=rnn_units, activation='relu', name="RNN_Latent"),
    Dense(units=1, name="Linear_Output")
])
model.compile(optimizer='adam', loss='mse')
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
RNN_Input (SimpleRNN)        (None, 3, 50)             2600      
_________________________________________________________________
RNN_Latent (SimpleRNN)       (None, 50)                5050      
_________________________________________________________________
Linear_Output (Dense)        (None, 1)                 51        
=================================================================
Total params: 7,701
Trainable params: 7,701
Non-trainable params: 0
_________________________________________________________________

In [18]:
%time history = model.fit(X, y, epochs=500, verbose=0)
plt.plot(history.history['loss'])


CPU times: user 2.51 s, sys: 58.4 ms, total: 2.57 s
Wall time: 2.01 s
Out[18]:
[<matplotlib.lines.Line2D at 0x7f2e11aa7048>]

In [19]:
predict(model, [[10, 20, 30], [70, 80, 90], [100, 110, 120], [200, 210, 220], [200, 300, 400]])


Out[19]:
array([[ 40.016705],
       [101.12904 ],
       [134.53055 ],
       [246.12546 ],
       [490.72736 ]], dtype=float32)

Bidirectional RNNs
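
A bidirectional RNN runs over the input sequence both forwards and backwards and, by default, concatenates the two resulting outputs, so the layer output is twice as wide as units. A minimal shape check (a sketch, separate from the trained model below):

In [ ]:
probe = Sequential([
    Bidirectional(SimpleRNN(units=50, activation='relu'), input_shape=(n_steps, n_features))
])
print(probe.output_shape)   # -> (None, 100): forward and backward outputs concatenated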


In [0]:
rnn_units = 50

model = Sequential([
    Bidirectional(SimpleRNN(units=rnn_units, activation='relu', input_shape=(n_steps, n_features), name="RNN_Input")),
    Dense(units=1, name="Linear_Output")
])
model.compile(optimizer='adam', loss='mse')

In [21]:
%time history = model.fit(X, y, epochs=500, verbose=0)
plt.plot(history.history['loss'])


CPU times: user 2.92 s, sys: 76.3 ms, total: 3 s
Wall time: 2.5 s
Out[21]:
[<matplotlib.lines.Line2D at 0x7f2e10f6f940>]

In [22]:
predict(model, [[10, 20, 30], [70, 80, 90], [100, 110, 120], [200, 210, 220], [200, 300, 400]])


Out[22]:
array([[ 39.936214],
       [100.38759 ],
       [131.34027 ],
       [234.24709 ],
       [491.69153 ]], dtype=float32)

LSTMs / GRUs

  • mainly beneficial for long sequences
  • but also 3-4 times more expensive (compare the parameter counts below)
  • might not have better results for short sequences like these
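
A quick side-by-side of the parameter counting rule quoted above (n**2 + n*m + n times a per-architecture factor), for the layer sizes used in this notebook; this just redoes the arithmetic, nothing is read from a model:

In [ ]:
n, m = 50, 1   # units, input features
for name, factor in [("SimpleRNN", 1), ("GRU", 3), ("LSTM", 4)]:
    print(name, factor * (n**2 + n*m + n))
# -> SimpleRNN 2600, GRU 7800, LSTM 10400 (matching the summaries in this notebook)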

In [23]:
rnn_units = 50

model = Sequential([
    LSTM(units=rnn_units, activation='relu', input_shape=(n_steps, n_features), name="RNN_Input"),
    Dense(units=1, name="Linear_Output")
])
model.compile(optimizer='adam', loss='mse')
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
RNN_Input (LSTM)             (None, 50)                10400     
_________________________________________________________________
Linear_Output (Dense)        (None, 1)                 51        
=================================================================
Total params: 10,451
Trainable params: 10,451
Non-trainable params: 0
_________________________________________________________________

In [24]:
output_dimension = rnn_units
input_dimension = n_features
parameters = 4 * (output_dimension ** 2 +  output_dimension * input_dimension + output_dimension) 
parameters


Out[24]:
10400

In [25]:
%time history = model.fit(X, y, epochs=500, verbose=0)
plt.plot(history.history['loss'])


CPU times: user 4.78 s, sys: 154 ms, total: 4.93 s
Wall time: 3.74 s
Out[25]:
[<matplotlib.lines.Line2D at 0x7f2e100e14a8>]

In [26]:
predict(model, [[10, 20, 30], [70, 80, 90], [100, 110, 120], [200, 210, 220], [200, 300, 400]])


Out[26]:
array([[ 40.0106 ],
       [103.86981],
       [144.77574],
       [285.70535],
       [359.83347]], dtype=float32)

In [27]:
rnn_units = 50

model = Sequential([
    GRU(units=rnn_units, activation='relu', input_shape=(n_steps, n_features), name="RNN_Input"),
    Dense(units=1, name="Linear_Output")
])
model.compile(optimizer='adam', loss='mse')
model.summary()


_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
RNN_Input (GRU)              (None, 50)                7800      
_________________________________________________________________
Linear_Output (Dense)        (None, 1)                 51        
=================================================================
Total params: 7,851
Trainable params: 7,851
Non-trainable params: 0
_________________________________________________________________

In [28]:
output_dimension = rnn_units
input_dimension = n_features
parameters = 3 * (output_dimension ** 2 +  output_dimension * input_dimension + output_dimension) 
parameters


Out[28]:
7800

In [29]:
%time history = model.fit(X, y, epochs=500, verbose=0)
plt.plot(history.history['loss'])


CPU times: user 4.49 s, sys: 108 ms, total: 4.6 s
Wall time: 3.6 s
Out[29]:
[<matplotlib.lines.Line2D at 0x7f2e0f4ca668>]

In [30]:
predict(model, [[10, 20, 30], [70, 80, 90], [100, 110, 120], [200, 210, 220], [200, 300, 400]])


Out[30]:
array([[ 39.999657],
       [102.85135 ],
       [141.43193 ],
       [267.96152 ],
       [420.1227  ]], dtype=float32)

Multivariate LSTM Models

Multiple Input Series


In [31]:
in_seq1 = [10, 20, 30, 40, 50, 60, 70, 80, 90]
in_seq2 = [15, 25, 35, 45, 55, 65, 75, 85, 95]
out_seq = [in1 + in2 for in1, in2 in zip(in_seq1, in_seq2)]
out_seq


Out[31]:
[25, 45, 65, 85, 105, 125, 145, 165, 185]

In [32]:
# convert to [rows, columns] structure
in_seq1 = np.array(in_seq1).reshape((len(in_seq1), 1))
in_seq2 = np.array(in_seq2).reshape((len(in_seq2), 1))
out_seq = np.array(out_seq).reshape((len(out_seq), 1))
out_seq


Out[32]:
array([[ 25],
       [ 45],
       [ 65],
       [ 85],
       [105],
       [125],
       [145],
       [165],
       [185]])

In [33]:
# horizontally stack columns
dataset = np.hstack((in_seq1, in_seq2, out_seq))
dataset


Out[33]:
array([[ 10,  15,  25],
       [ 20,  25,  45],
       [ 30,  35,  65],
       [ 40,  45,  85],
       [ 50,  55, 105],
       [ 60,  65, 125],
       [ 70,  75, 145],
       [ 80,  85, 165],
       [ 90,  95, 185]])

In [0]:
# split a multivariate sequence into samples
def split_sequences(sequences, n_steps):
	X, y = list(), list()
	for i in range(len(sequences)):
		# find the end of this pattern
		end_ix = i + n_steps
		# check if we are beyond the dataset
		if end_ix > len(sequences):
			break
		# gather input and output parts of the pattern
		seq_x, seq_y = sequences[i:end_ix, :-1], sequences[end_ix-1, -1]
		X.append(seq_x)
		y.append(seq_y)
	return np.array(X), np.array(y)

In [35]:
# choose a number of time steps
n_steps = 3

# convert into input/output
X, y = split_sequences(dataset, n_steps)

# summarize the data
list(zip(X, y))


Out[35]:
[(array([[10, 15],
         [20, 25],
         [30, 35]]), 65), (array([[20, 25],
         [30, 35],
         [40, 45]]), 85), (array([[30, 35],
         [40, 45],
         [50, 55]]), 105), (array([[40, 45],
         [50, 55],
         [60, 65]]), 125), (array([[50, 55],
         [60, 65],
         [70, 75]]), 145), (array([[60, 65],
         [70, 75],
         [80, 85]]), 165), (array([[70, 75],
         [80, 85],
         [90, 95]]), 185)]

In [0]:
# the dataset knows the number of features, e.g. 2
n_features = X.shape[2]

# define model
model = Sequential()
model.add(GRU(units=50, activation='relu', input_shape=(n_steps, n_features), name="RNN_Input"))
model.add(Dense(units=1, name="Linear_Output"))
model.compile(optimizer='adam', loss='mse')

In [37]:
# fit model
%time history = model.fit(X, y, epochs=500, verbose=0)
import matplotlib.pyplot as plt

plt.yscale('log')
plt.plot(history.history['loss'])


CPU times: user 4.18 s, sys: 122 ms, total: 4.3 s
Wall time: 3.35 s
Out[37]:
[<matplotlib.lines.Line2D at 0x7f2e0e8200f0>]

In [0]:
def predict_multi(model, samples):
  input = np.array(samples)
  input = input.reshape(1, input.shape[0], input.shape[1])
  y_pred = model.predict(input)
  return y_pred

In [39]:
predict_multi(model, [[80, 85], [90, 95], [100, 105]])


Out[39]:
array([[206.78265]], dtype=float32)

In [40]:
predict_multi(model, [[10, 15], [20, 25], [30, 35]])


Out[40]:
array([[64.88253]], dtype=float32)

In [41]:
predict_multi(model, [[180, 185], [190, 195], [200, 205]])


Out[41]:
array([[423.78033]], dtype=float32)

Let's make this a little bit harder

  • so far, the output y could be inferred from the final timestep alone
  • now we try to infer the output of the following timestep instead (shifting all targets by +20)

In [42]:
y += 20
list(zip(X, y))


Out[42]:
[(array([[10, 15],
         [20, 25],
         [30, 35]]), 85), (array([[20, 25],
         [30, 35],
         [40, 45]]), 105), (array([[30, 35],
         [40, 45],
         [50, 55]]), 125), (array([[40, 45],
         [50, 55],
         [60, 65]]), 145), (array([[50, 55],
         [60, 65],
         [70, 75]]), 165), (array([[60, 65],
         [70, 75],
         [80, 85]]), 185), (array([[70, 75],
         [80, 85],
         [90, 95]]), 205)]

In [43]:
model = Sequential()
model.add(GRU(units=50, activation='relu', input_shape=(n_steps, n_features), name="RNN_Input"))
model.add(Dense(units=1, name="Linear_Output"))
model.compile(optimizer='adam', loss='mse')

# train a little bit longer, as this should be harder now
%time history = model.fit(X, y, epochs=2000, verbose=0)
import matplotlib.pyplot as plt

plt.yscale('log')
plt.plot(history.history['loss'])


CPU times: user 11.2 s, sys: 365 ms, total: 11.5 s
Wall time: 7.74 s
Out[43]:
[<matplotlib.lines.Line2D at 0x7f2e0daf2748>]

In [44]:
predict_multi(model, [[80, 85], [90, 95], [100, 105]])


Out[44]:
array([[228.60007]], dtype=float32)

In [45]:
predict_multi(model, [[10, 15], [20, 25], [30, 35]])


Out[45]:
array([[84.943115]], dtype=float32)

In [46]:
predict_multi(model, [[180, 185], [190, 195], [200, 205]])


Out[46]:
array([[461.72055]], dtype=float32)

Multi-Step LSTM Models

  • this might just as well be solved with an encoder / decoder approach (see the sketch below)
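
For reference, a minimal encoder/decoder sketch for the same multi-step task (this is not the model trained below; RepeatVector and TimeDistributed are standard Keras layers, and y would have to be reshaped to (samples, n_steps_out, 1) to fit this variant):

In [ ]:
from tensorflow.keras.layers import RepeatVector, TimeDistributed

encoder_decoder = Sequential([
    GRU(100, activation='relu', input_shape=(3, 1)),    # encoder: compress the input window
    RepeatVector(2),                                     # repeat the encoding once per output step
    GRU(100, activation='relu', return_sequences=True),  # decoder: one state per output step
    TimeDistributed(Dense(1))                            # one prediction per output step
])
encoder_decoder.compile(optimizer='adam', loss='mse')
encoder_decoder.summary()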

In [47]:
# split a univariate sequence into samples
def split_sequence(sequence, n_steps_in, n_steps_out):
	X, y = list(), list()
	for i in range(len(sequence)):
		# find the end of this pattern
		end_ix = i + n_steps_in
		out_end_ix = end_ix + n_steps_out
		# check if we are beyond the sequence
		if out_end_ix > len(sequence):
			break
		# gather input and output parts of the pattern
		seq_x, seq_y = sequence[i:end_ix], sequence[end_ix:out_end_ix]
		X.append(seq_x)
		y.append(seq_y)
	return np.array(X), np.array(y)
 
# define input sequence
raw_seq = [10, 20, 30, 40, 50, 60, 70, 80, 90]
# choose a number of time steps
n_steps_in, n_steps_out = 3, 2
# split into samples
X, y = split_sequence(raw_seq, n_steps_in, n_steps_out)
# summarize the data
for input, output in zip(X, y):
  print (input, output)


[10 20 30] [40 50]
[20 30 40] [50 60]
[30 40 50] [60 70]
[40 50 60] [70 80]
[50 60 70] [80 90]

In [0]:
# reshape from [samples, timesteps] into [samples, timesteps, features]
n_features = 1
X = X.reshape((X.shape[0], X.shape[1], n_features))
# define model
model = Sequential()
model.add(GRU(100, activation='relu', input_shape=(n_steps_in, n_features)))
# model.add(GRU(100, activation='relu', return_sequences=True, input_shape=(n_steps_in, n_features)))
# model.add(GRU(100, activation='relu'))
model.add(Dense(n_steps_out))
model.compile(optimizer='adam', loss='mse')

In [49]:
# fit model
%time history = model.fit(X, y, epochs=500, verbose=0)
import matplotlib.pyplot as plt

plt.yscale('log')
plt.plot(history.history['loss'])


CPU times: user 5.59 s, sys: 143 ms, total: 5.73 s
Wall time: 4.23 s
Out[49]:
[<matplotlib.lines.Line2D at 0x7f2e0cd740f0>]

In [50]:
X_sample = np.array([70, 80, 90]).reshape((1, n_steps_in, n_features))
y_pred = model.predict(X_sample)
print(y_pred)


[[104.24542 116.23607]]

In [51]:
X_sample = np.array([10, 20, 30]).reshape((1, n_steps_in, n_features))
y_pred = model.predict(X_sample)
print(y_pred)


[[39.993275 50.004936]]
