In the sequence of transformations (see fig. above), $x$ is the input vector at the current timestep. Notice that the memory cell state vectors $m$ and $c$ are updated at each timestep.
The goal of this exercise is to demystify LSTMs. In order to do this, you will construct your very own LSTM from the ground up.
Let's get started!
Let's suppose each data point $x$ has 7 features. Let's create a single dummy data point of the appropriate dimensions:
In [238]:
import numpy as np

n_features = 7
# input data (one single data point)
x = np.random.random([1, n_features])
print(x)
In [ ]:
n_hidden = 5
n_output = 3 # can be at most n_hidden
# memory cell *output* (from previous step)
h = np.random.random([1, n_output])
# candidate memory cell *c-state* (from previous timestep)
c = ...
# memory cell *m-state* (from previous timestep)
m = ...
# input gate weights
w_i = ...
u_i = ...
b_i = ...
# candidate memory (c-state) weights
w_c = ...
u_c = ...
b_c = ...
# forget gate weights
w_f = ...
u_f = ...
b_f = ...
# output gate weights
w_o = ...
u_o = ...
v_o = ...
b_o = ...
# output projection weights
w_h = ...
b_h = ...
In [240]:
# %load sol/ex_lstm_init_weights.py
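If you're stuck on the blanks above, here is one possible way to fill them in: a sketch using plain uniform random initialization (the helper `rnd` is our own shorthand, and the provided solution file may initialize things differently). The shapes follow from the equations below: `w_*` act on the input, `u_*` on the m-state, `v_o` connects the c-state to the output gate, and `w_h`, `b_h` project down to `n_output` dimensions.

def rnd(*shape):
    # uniform random values in [0, 1) -- fine for this exercise, not for real training
    return np.random.random(shape)

# candidate memory cell *c-state* and memory cell *m-state* (from previous timestep)
c = rnd(1, n_hidden)
m = rnd(1, n_hidden)
# input gate weights
w_i, u_i, b_i = rnd(n_features, n_hidden), rnd(n_hidden, n_hidden), rnd(1, n_hidden)
# candidate memory (c-state) weights
w_c, u_c, b_c = rnd(n_features, n_hidden), rnd(n_hidden, n_hidden), rnd(1, n_hidden)
# forget gate weights
w_f, u_f, b_f = rnd(n_features, n_hidden), rnd(n_hidden, n_hidden), rnd(1, n_hidden)
# output gate weights (v_o lets the output gate peek at the c-state)
w_o, u_o, v_o, b_o = (rnd(n_features, n_hidden), rnd(n_hidden, n_hidden),
                      rnd(n_hidden, n_hidden), rnd(1, n_hidden))
# output projection weights
w_h, b_h = rnd(n_hidden, n_output), rnd(1, n_output)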
Here is the sequence of operations again:
$ i\ \leftarrow\ \text{sigmoid}\left(x\cdot w_i + m\cdot u_i + b_i\right) $
$ f\ \leftarrow\ \text{sigmoid}\left(x\cdot w_f + m\cdot u_f + b_f\right) $
$ c\ \leftarrow\ f\odot c + i\odot \tanh\left(x\cdot w_c + m\cdot u_c + b_c\right) $
$ o\ \leftarrow\ \text{sigmoid}\left(x\cdot w_o + m\cdot u_o + c\cdot v_o + b_o\right) $
$ m\ \leftarrow\ o\odot\tanh\left(c\right) $
$ h\ \leftarrow\ \tanh(m\cdot w_h + b_h) $
In [ ]:
from scipy.special import expit as sigmoid
# input gate
i = sigmoid(np.dot(x, w_i) + np.dot(m, u_i) + b_i)
# forget gate
f = ...
# candidate memory (c-state)
c = ...
# output gate
o = ...
# memory cell state (m-state)
m = ...
# output
h = ...
In [245]:
# %load sol/ex_lstm_operations.py
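To check your work, here is a sketch of the six operations translated directly from the equations above; `x`, `m`, `c` and the weights are assumed to come from the earlier cells, and the provided solution file may differ in small details.

from scipy.special import expit as sigmoid

# input gate
i = sigmoid(np.dot(x, w_i) + np.dot(m, u_i) + b_i)
# forget gate
f = sigmoid(np.dot(x, w_f) + np.dot(m, u_f) + b_f)
# candidate memory (c-state); note the element-wise products
c = f * c + i * np.tanh(np.dot(x, w_c) + np.dot(m, u_c) + b_c)
# output gate (peeks at the freshly updated c-state through v_o)
o = sigmoid(np.dot(x, w_o) + np.dot(m, u_o) + np.dot(c, v_o) + b_o)
# memory cell state (m-state)
m = o * np.tanh(c)
# output (projection down to n_output dimensions)
h = np.tanh(np.dot(m, w_h) + b_h)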
In [ ]:
class MyLSTMCell:
    def __init__(self, n_hidden, n_output):
        self.n_hidden = n_hidden
        self.n_output = n_output
        raise NotImplementedError('__init__')

    def _init_weights(self, n_features):
        raise NotImplementedError('_init_weights')

    def __call__(self, x):
        raise NotImplementedError('__call__')
        return self.h

lstm_cell = MyLSTMCell(n_hidden, n_output)
lstm_cell(x)
In [ ]:
# %load sol/ex_lstm_class_numpy.py
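Here is one way the class could look, packaging the initialization and the operations from the cells above into `_init_weights` and `__call__`. This is a sketch (the `rnd` helper and the `initialized` flag are our own choices); the provided solution file may be organized differently.

import numpy as np
from scipy.special import expit as sigmoid

class MyLSTMCell:
    def __init__(self, n_hidden, n_output):
        self.n_hidden = n_hidden
        self.n_output = n_output
        self.initialized = False

    def _init_weights(self, n_features):
        rnd = lambda *shape: np.random.random(shape)
        # recurrent state from the "previous" timestep
        self.c = rnd(1, self.n_hidden)
        self.m = rnd(1, self.n_hidden)
        # gate weights: w_* act on the input, u_* on the m-state, b_* are biases
        for gate in ('i', 'f', 'c', 'o'):
            setattr(self, 'w_' + gate, rnd(n_features, self.n_hidden))
            setattr(self, 'u_' + gate, rnd(self.n_hidden, self.n_hidden))
            setattr(self, 'b_' + gate, rnd(1, self.n_hidden))
        self.v_o = rnd(self.n_hidden, self.n_hidden)   # c-state -> output gate
        # output projection weights
        self.w_h = rnd(self.n_hidden, self.n_output)
        self.b_h = rnd(1, self.n_output)
        self.initialized = True

    def __call__(self, x):
        if not self.initialized:
            self._init_weights(x.shape[1])
        i = sigmoid(np.dot(x, self.w_i) + np.dot(self.m, self.u_i) + self.b_i)
        f = sigmoid(np.dot(x, self.w_f) + np.dot(self.m, self.u_f) + self.b_f)
        self.c = f * self.c + i * np.tanh(np.dot(x, self.w_c) + np.dot(self.m, self.u_c) + self.b_c)
        o = sigmoid(np.dot(x, self.w_o) + np.dot(self.m, self.u_o) + np.dot(self.c, self.v_o) + self.b_o)
        self.m = o * np.tanh(self.c)
        self.h = np.tanh(np.dot(self.m, self.w_h) + self.b_h)
        return self.h

lstm_cell = MyLSTMCell(n_hidden, n_output)
lstm_cell(x)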
In [ ]:
import tensorflow as tf

def init_weights(shape):
    return (...)

def get_n_features(x):
    return (...)

sigmoid = tf.sigmoid
tanh = (...)
dot = (...)

class MyLSTMCell:
    "same definition as the numpy version in the solution above"
In [369]:
# %load sol/ex_lstm_class_tensorflow.py
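For the helper definitions above, something along these lines should work with the TensorFlow 1.x API used in this notebook (a sketch; the provided solution file may pick a different initializer or helpers):

import tensorflow as tf

def init_weights(shape):
    # trainable variable with small random initial values
    return tf.Variable(tf.random_normal(shape, stddev=0.1))

def get_n_features(x):
    # number of input features = size of the last dimension of x
    return int(x.get_shape()[-1])

sigmoid = tf.sigmoid
tanh = tf.tanh
dot = tf.matmul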
I hope you're proud of yourself: you just created an LSTM cell very much like the one defined in TensorFlow. In fact, from now on, we'll use TensorFlow's LSTMCell:
from tensorflow.contrib.rnn import LSTMCell
lstm_cell = LSTMCell(num_units=n_hidden, num_proj=n_output)
This class provides some additional functionality compared to our MyLSTMCell class, such as optional peephole connections (use_peepholes) and cell-state clipping (cell_clip).
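As a quick sanity check, here is how the TensorFlow cell could be driven for a single timestep, again using the TF 1.x API (the placeholder name x_t is our own; n_features comes from the cells above):

x_t = tf.placeholder(tf.float32, [1, n_features])              # one input vector
state = lstm_cell.zero_state(batch_size=1, dtype=tf.float32)   # fresh (c, m) state
output, state = lstm_cell(x_t, state)                           # one step of the cell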
In the next notebook, we'll string together the LSTM cells in a simple model that will generate the next character in a sequence of characters, i.e. text.