Exercise: Build an LSTM cell

In the sequence of transformations (see fig. above), $x$ is the input vector at the current timestep. Notice that the memory cell state vectors $m$ and $c$ are updated at each timestep.

The Goal

The goal of this exercise is to demystify LSTMs. In order to do this, you will construct your very own LSTM from the ground up.

Let's get started!

Create some dummy data

Let's suppose each data point $x$ has 7 features. Let's create a single dummy data point of the appropriate dimensions:


In [238]:
import numpy as np

n_features = 7

# input data (one single data point)
x = np.random.random([1, n_features])


print(x)


[[ 0.9187018   0.33828914  0.31343234  0.59327048  0.01080375  0.6280915
   0.39356742]]

a) Initialize the weights

Randomly initialize all the weights using 5 hidden units:


In [ ]:
n_hidden = 5
n_output = 3  # can be at most n_hidden


# memory cell *output* (from previous timestep)
h = np.random.random([1, n_hidden])

# candidate memory cell *c-state* (from previous timestep)
c = ...

# memory cell *m-state* (from previous timestep)
m = ...

# input gate weights
w_i = ...
u_i = ...
b_i = ...

# candidate memory (c-state) weights
w_c = ...
u_c = ...
b_c = ...

# forget gate weights
w_f = ...
u_f = ...
b_f = ...

# output gate weights
w_o = ...
u_o = ...
v_o = ...
b_o = ...

# output projection weights
w_h = ...
b_h = ...

In [240]:
# %load sol/ex_lstm_init_weights.py
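If you get stuck, here is one plausible way to fill in the blanks (the loaded solution file may differ). The shapes follow from the operations in part b: each `w_*` maps the input features to the hidden units, each `u_*` maps hidden to hidden, and the biases are row vectors. The `gate_weights` helper is just a convenience introduced here, not part of the exercise scaffold:

```python
import numpy as np

n_features, n_hidden, n_output = 7, 5, 3

# previous-timestep states
c = np.random.random([1, n_hidden])
m = np.random.random([1, n_hidden])


def gate_weights():
    # hypothetical helper: input weight, recurrent weight, bias for one gate
    return (np.random.random([n_features, n_hidden]),
            np.random.random([n_hidden, n_hidden]),
            np.random.random([1, n_hidden]))


w_i, u_i, b_i = gate_weights()  # input gate
w_c, u_c, b_c = gate_weights()  # candidate memory (c-state)
w_f, u_f, b_f = gate_weights()  # forget gate
w_o, u_o, b_o = gate_weights()  # output gate
v_o = np.random.random([n_hidden, n_hidden])  # peephole from the c-state

# output projection
w_h = np.random.random([n_hidden, n_output])
b_h = np.random.random([1, n_output])
```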

b) Implement the operations

Here is the sequence of operations again:

$ i\ \leftarrow\ \text{sigmoid}\left(x\cdot w_i + m\cdot u_i + b_i\right) $

$ f\ \leftarrow\ \text{sigmoid}\left(x\cdot w_f + m\cdot u_f + b_f\right) $

$ c\ \leftarrow\ f\odot c + i\odot \tanh\left(x\cdot w_c + m\cdot u_c + b_c\right) $

$ o\ \leftarrow\ \text{sigmoid}\left(x\cdot w_o + m\cdot u_o + c\cdot v_o + b_o\right) $

$ m\ \leftarrow\ o\odot\tanh\left(c\right) $

$ h\ \leftarrow\ \tanh(m\cdot w_h + b_h) $


In [ ]:
from scipy.special import expit as sigmoid

# input gate
i = sigmoid(np.dot(x, w_i) + np.dot(m, u_i) + b_i)

# forget gate
f = ...

# candidate memory (c-state)
c = ...

# output gate
o = ...

# memory cell state (m-state)
m = ...

# output
h = ...

In [245]:
# %load sol/ex_lstm_operations.py
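For reference, here is a self-contained sketch of the full forward pass, with its own dummy weights so it runs on its own (the shapes and helper functions are assumptions; the loaded solution file may differ). Each line transcribes one equation above:

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.RandomState(0)
n_features, n_hidden, n_output = 7, 5, 3

# dummy input and previous-timestep states
x = rng.random_sample([1, n_features])
c = rng.random_sample([1, n_hidden])
m = rng.random_sample([1, n_hidden])

# dummy weights: W maps input to hidden, U hidden to hidden, b is a bias
W = lambda: rng.random_sample([n_features, n_hidden])
U = lambda: rng.random_sample([n_hidden, n_hidden])
b = lambda: rng.random_sample([1, n_hidden])

w_i, u_i, b_i = W(), U(), b()
w_f, u_f, b_f = W(), U(), b()
w_c, u_c, b_c = W(), U(), b()
w_o, u_o, v_o, b_o = W(), U(), U(), b()
w_h = rng.random_sample([n_hidden, n_output])
b_h = rng.random_sample([1, n_output])

# the six operations, one per equation
i = sigmoid(x @ w_i + m @ u_i + b_i)                 # input gate
f = sigmoid(x @ w_f + m @ u_f + b_f)                 # forget gate
c = f * c + i * np.tanh(x @ w_c + m @ u_c + b_c)     # new c-state
o = sigmoid(x @ w_o + m @ u_o + c @ v_o + b_o)       # output gate
m = o * np.tanh(c)                                   # new m-state
h = np.tanh(m @ w_h + b_h)                           # projected output
```

Note that `*` is the elementwise product $\odot$ and `@` is the matrix product $\cdot$ from the equations.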



That's it! You've just implemented your own LSTM cell. All that remains now is to organize our code a little bit.

c) Create your own LSTMCell class

This exercise requires a slightly more advanced understanding of Python, so feel free to peek at the solution below.


In [ ]:
class MyLSTMCell:
    def __init__(self, n_hidden, n_output):
        self.n_hidden = n_hidden
        self.n_output = n_output
        raise NotImplementedError('__init__')
        
    def _init_weights(self, n_features):
        raise NotImplementedError('_init_weights')
        
    def __call__(self, x):
        raise NotImplementedError('__call__')
        
        return self.h


lstm_cell = MyLSTMCell(n_hidden, n_output)
lstm_cell(x)

In [ ]:
# %load sol/ex_lstm_class_numpy.py
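Here is one possible numpy implementation of the class, with weights created lazily on the first call once the input dimensionality is known (this is a sketch under those assumptions; the loaded solution file may organize things differently):

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class MyLSTMCell:
    """One possible numpy LSTM cell; the loaded solution may differ."""

    def __init__(self, n_hidden, n_output):
        self.n_hidden = n_hidden
        self.n_output = n_output
        self.c = None  # weights and states are created lazily on the
        self.m = None  # first call, once n_features is known

    def _init_weights(self, n_features):
        init = np.random.random
        self.c = init([1, self.n_hidden])
        self.m = init([1, self.n_hidden])
        # input weight, recurrent weight, and bias for each gate
        for gate in 'ifco':
            setattr(self, 'w_' + gate, init([n_features, self.n_hidden]))
            setattr(self, 'u_' + gate, init([self.n_hidden, self.n_hidden]))
            setattr(self, 'b_' + gate, init([1, self.n_hidden]))
        self.v_o = init([self.n_hidden, self.n_hidden])  # c-state peephole
        self.w_h = init([self.n_hidden, self.n_output])  # output projection
        self.b_h = init([1, self.n_output])

    def __call__(self, x):
        if self.c is None:
            self._init_weights(x.shape[1])
        i = sigmoid(x @ self.w_i + self.m @ self.u_i + self.b_i)
        f = sigmoid(x @ self.w_f + self.m @ self.u_f + self.b_f)
        self.c = f * self.c + i * np.tanh(
            x @ self.w_c + self.m @ self.u_c + self.b_c)
        o = sigmoid(x @ self.w_o + self.m @ self.u_o
                    + self.c @ self.v_o + self.b_o)
        self.m = o * np.tanh(self.c)
        self.h = np.tanh(self.m @ self.w_h + self.b_h)
        return self.h
```

Calling `MyLSTMCell(5, 3)` on a `[1, 7]` input returns an output of shape `(1, 3)`, and repeated calls carry the `c` and `m` states forward from one timestep to the next.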

d) Translate to tensorflow

Now define the MyLSTMCell class again, this time using tensorflow operations instead of numpy.


In [ ]:
import tensorflow as tf


def init_weights(shape):
    return (...)


def get_n_features(x):
    return (...)


sigmoid = tf.sigmoid
tanh = (...)
dot = (...)



class MyLSTMCell:
    "same definition as the numpy version in the solution above"

In [369]:
# %load sol/ex_lstm_class_tensorflow.py

I hope you're proud of yourself: you just created an LSTM cell very much like the one defined in tensorflow. In fact, from now on, we're going to use tensorflow's LSTMCell:

from tensorflow.contrib.rnn import LSTMCell

lstm_cell = LSTMCell(num_units=n_hidden, num_proj=n_output)

This class provides some additional functionality compared to our MyLSTMCell class.

In the next notebook, we'll string together the LSTM cells in a simple model that will generate the next character in a sequence of characters, i.e. text.