Day 6: Sequence Models in Deep Learning

In [ ]:
%load_ext autoreload
%autoreload 2

Exercise 6.1

Convince yourself that an RNN is just a feed-forward (FF) network unfolded in time. Complete the backpropagation() method in the NumpyRNN class in


and compare it with


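The sketch below makes the "unfolded in time" view concrete with a plain Elman RNN, h_t = tanh(W_x x_t + W_h h_{t-1} + b). This architecture is an assumption for illustration only; the lab's NumpyRNN is more elaborate, and its exact layers are defined in the lxmls source. The forward pass is a loop in which every time step acts as one tanh layer of a deep feed-forward network, and all of those layers share W_x, W_h and b. Backpropagation through time is then ordinary backpropagation over the unrolled graph, with the gradients of the shared weights summed over time steps. The toy loss 0.5 * sum_t ||h_t||^2 stands in for the per-word cross-entropy. `forward`, `loss` and `backprop_through_time` are hypothetical names, not lxmls functions.

```python
import numpy as np

def forward(xs, W_x, W_h, b):
    """Unrolled Elman RNN: h_t = tanh(W_x x_t + W_h h_{t-1} + b), with h_0 = 0.

    Each loop iteration is one feed-forward layer; all layers share W_x, W_h and b."""
    hs = [np.zeros(W_h.shape[0])]
    for x in xs:
        hs.append(np.tanh(W_x @ x + W_h @ hs[-1] + b))
    return hs

def loss(xs, W_x, W_h, b):
    """Toy loss 0.5 * sum_t ||h_t||^2 (a stand-in for the per-word cross-entropy)."""
    hs = forward(xs, W_x, W_h, b)
    return 0.5 * sum(float(h @ h) for h in hs[1:])

def backprop_through_time(xs, W_x, W_h, b):
    """Backpropagation over the unrolled graph; shared-weight gradients are summed over t."""
    hs = forward(xs, W_x, W_h, b)
    dW_x, dW_h, db = np.zeros_like(W_x), np.zeros_like(W_h), np.zeros_like(b)
    delta = np.zeros_like(b)                # dL/dh_t arriving from step t+1
    for t in range(len(xs), 0, -1):         # walk the unrolled layers backwards
        delta = delta + hs[t]               # add the local term dL_t/dh_t = h_t
        dz = delta * (1.0 - hs[t] ** 2)     # back through tanh
        dW_x += np.outer(dz, xs[t - 1])     # same weights at every step, so accumulate
        dW_h += np.outer(dz, hs[t - 1])
        db += dz
        delta = W_h.T @ dz                  # send the gradient on to h_{t-1}
    return dW_x, dW_h, db

# Usage: random toy sequence and weights
rng = np.random.default_rng(0)
n_in, n_hidden, T = 3, 4, 5
xs = rng.normal(size=(T, n_in))
W_x = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_h = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
b = rng.normal(scale=0.1, size=n_hidden)
dW_x, dW_h, db = backprop_through_time(xs, W_x, W_h, b)

# Compare one recurrent weight against a central finite difference
eps = 1e-6
W_plus, W_minus = W_h.copy(), W_h.copy()
W_plus[1, 2] += eps
W_minus[1, 2] -= eps
numeric = (loss(xs, W_x, W_plus, b) - loss(xs, W_x, W_minus, b)) / (2 * eps)
print(numeric, dW_h[1, 2])  # the two numbers should agree to several decimals
```

Removing the loop over time and keeping only one step gives back an ordinary one-layer network; the only RNN-specific ingredient is that the same weights appear in every layer, which is why their gradients are accumulated.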
WSJ Data

To work with RNNs we will use the part-of-speech (POS) dataset seen on the sequence models day.

In [ ]:
# Load Part-of-Speech data 
from lxmls.readers.pos_corpus import PostagCorpusData
data = PostagCorpusData()
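The exact batch layout is defined by PostagCorpusData. In the cells below, the model's predictions are compared position by position with batch['output'], which suggests each batch holds a sentence as a sequence of word indices ('input') and the matching sequence of tag indices ('output'). The following sketch shows that kind of encoding on a made-up sentence; the vocabularies and the `encode` helper are hypothetical and are not the lxmls implementation.

```python
import numpy as np

# Toy tagged sentence (made up, not taken from the WSJ corpus)
sentence = [("the", "DET"), ("dog", "NOUN"), ("saw", "VERB"), ("the", "DET"), ("cat", "NOUN")]

# Hypothetical vocabularies, grown as new words and tags are seen
word_to_id = {}
tag_to_id = {}

def encode(tagged_sentence):
    """Map a list of (word, tag) pairs to a batch-like dict of integer index arrays."""
    word_ids = [word_to_id.setdefault(word, len(word_to_id)) for word, _ in tagged_sentence]
    tag_ids = [tag_to_id.setdefault(tag, len(tag_to_id)) for _, tag in tagged_sentence]
    return {'input': np.array(word_ids), 'output': np.array(tag_ids)}

toy_batch = encode(sentence)
print(toy_batch['input'])   # [0 1 2 0 3]
print(toy_batch['output'])  # [0 1 2 0 1]
```

With this encoding, comparing a predicted tag array with `toy_batch['output']` yields one boolean per word, which is the kind of comparison the evaluation loop in Milestone 2 performs.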

Load and configure the NumpyRNN. If you modify the code inside the rnn module, remember to reload it so that your changes take effect.
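`%autoreload 2`, enabled in the first cell, normally picks up edits to imported modules automatically. To reload by hand, the standard-library `importlib.reload()` re-executes a module's code. The sketch below demonstrates this on a throwaway module written to a temporary directory; `toy_rnn` and `ToyRNN` are hypothetical stand-ins for the real rnn module and NumpyRNN.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Skip .pyc caching so a quick rewrite of the same file is never served from a stale cache
sys.dont_write_bytecode = True

tmp_dir = Path(tempfile.mkdtemp())
sys.path.insert(0, str(tmp_dir))
src = tmp_dir / "toy_rnn.py"  # hypothetical stand-in for the rnn module

src.write_text("class ToyRNN:\n    def backpropagation(self):\n        return 'old'\n")
importlib.invalidate_caches()
import toy_rnn
old_model = toy_rnn.ToyRNN()

# Edit the module on disk, as you would while completing backpropagation()
src.write_text("class ToyRNN:\n    def backpropagation(self):\n        return 'new'\n")
importlib.reload(toy_rnn)
new_model = toy_rnn.ToyRNN()

print(old_model.backpropagation(), new_model.backpropagation())  # old new
```

Objects created before the reload keep the old class, and names bound with `from module import ...` still refer to the old objects. After reloading, re-run the import cell and recreate the model.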

In [ ]:
from lxmls.deep_learning.numpy_models.rnn import NumpyRNN
model = NumpyRNN(

Milestone 1:

As with the feed-forward networks, you can use the following setup to test your gradient implementation step by step. First, compute how the cost changes as a single weight is varied around its current value.

In [ ]:
print([x.shape for x in model.parameters])

In [ ]:
from lxmls.deep_learning.rnn import get_rnn_parameter_handlers, get_rnn_loss_range

# Get functions to get and set values of a particular weight of the model
get_parameter, set_parameter = get_rnn_parameter_handlers(

# Get batch of data
batch = data.batches('train', batch_size=1)[0]

# Get loss and weight value
current_loss = model.cross_entropy_loss(batch['input'], batch['output'])
current_weight = get_parameter(model.parameters)

# Get range of values of the weight and loss around current parameters values
weight_range, loss_range = get_rnn_loss_range(model, get_parameter, set_parameter, batch)
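`get_rnn_loss_range` comes from lxmls and its internals are not shown here. The idea behind such a sweep can be sketched on a simpler model: move one weight around its current value, record the loss at each point, and restore the weight afterwards. The sketch uses a linear softmax classifier with cross-entropy loss, a stand-in chosen for brevity; `cross_entropy`, `cross_entropy_grad` and `sweep_weight` are hypothetical helpers, not lxmls functions. Toy variable names avoid overwriting `current_loss`, `loss_range` and friends from the notebook.

```python
import numpy as np

def cross_entropy(W, x, y):
    """Softmax cross-entropy -log p(y | x) for a linear classifier with weights W."""
    z = W @ x
    z = z - z.max()                              # shift for numerical stability
    return -(z[y] - np.log(np.exp(z).sum()))

def cross_entropy_grad(W, x, y):
    """Analytic gradient dL/dW = (p - onehot(y)) x^T."""
    z = W @ x
    p = np.exp(z - z.max())
    p /= p.sum()
    p[y] -= 1.0
    return np.outer(p, x)

def sweep_weight(W, x, y, index, span=0.5, num=21):
    """Evaluate the loss while W[index] moves over [w0 - span, w0 + span].

    W is modified during the sweep and restored to its original value at the end."""
    original = W[index]
    weights = original + np.linspace(-span, span, num)
    losses = []
    for w in weights:
        W[index] = w
        losses.append(cross_entropy(W, x, y))
    W[index] = original
    return weights, np.array(losses)

# Usage on random toy data
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
y = 2
index = (0, 1)

toy_weights, toy_losses = sweep_weight(W, x, y, index)
toy_loss = cross_entropy(W, x, y)
toy_gradient = cross_entropy_grad(W, x, y)[index]

# Slope of the sampled curve around its centre point vs. the analytic gradient
mid = len(toy_weights) // 2
empirical_slope = (toy_losses[mid + 1] - toy_losses[mid - 1]) / (toy_weights[mid + 1] - toy_weights[mid - 1])
print(empirical_slope, toy_gradient)
```

The sampled slope only approximates the gradient (the curve is not linear over the sweep), so expect close but not identical numbers; shrinking `span` tightens the agreement.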

Then compute the corresponding gradient with your implementation.

In [ ]:
# Get the gradient value for that weight
gradients = model.backpropagation(batch['input'], batch['output'])
current_gradient = get_parameter(gradients)

Finally, use matplotlib to plot the loss variation against the linear approximation given by your gradient.
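The straight line in the plot cell is the first-order Taylor approximation of the loss around the current weight value $w_0$, which is what the expression `current_gradient*(weight_range - current_weight) + current_loss` computes:

$$
\mathcal{L}(w) \approx \mathcal{L}(w_0) + \left.\frac{\partial \mathcal{L}}{\partial w}\right|_{w = w_0} (w - w_0)
$$

If backpropagation() is correct, this line is tangent to the empirical loss curve at the red cross; a wrong gradient shows up as a line with a visibly different slope at that point.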

In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
# Plot empirical
plt.plot(weight_range, loss_range)
plt.plot(current_weight, current_loss, 'xr')
plt.ylabel('loss value')
plt.xlabel('weight value')
# Plot real
h = plt.plot(
    weight_range,
    current_gradient*(weight_range - current_weight) + current_loss
)

Milestone 2:

Once you have completed the gradients, you can train and evaluate the model on the POS task.
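The training loop below calls model.update() once per sentence. Its source is not reproduced here; assuming it performs one plain stochastic gradient descent (SGD) step with the gradients from backpropagation(), the core of such an update looks like the following sketch. `sgd_update` is a hypothetical helper, and the quadratic loss in the usage lines is only a toy.

```python
import numpy as np

def sgd_update(parameters, gradients, learning_rate):
    """One in-place SGD step, p <- p - learning_rate * g, for every parameter array."""
    for param, grad in zip(parameters, gradients):
        param -= learning_rate * grad   # modifies the array itself, not a copy

# Usage: minimise the toy loss 0.5 * ||w - target||^2, whose gradient is w - target
target = np.array([1.0, -2.0])
toy_parameters = [np.zeros(2)]
for step in range(100):
    toy_gradients = [toy_parameters[0] - target]
    sgd_update(toy_parameters, toy_gradients, learning_rate=0.1)
print(toy_parameters[0])  # approaches target
```

Updating in place matters when a model keeps its parameters in a list, as `model.parameters` appears to: rebinding `param = param - learning_rate * grad` inside the loop would leave the stored arrays unchanged.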

In [ ]:
import time
import numpy as np

# Hyper-parameters
num_epochs = 20

# Get batches for train, dev and test
train_batches = data.batches('train', batch_size=1)
dev_set = data.batches('dev', batch_size=1)
test_set = data.batches('test', batch_size=1)

# Epoch loop
start = time.time()
for epoch in range(num_epochs):

    # Batch loop
    for batch in train_batches:
        model.update(input=batch['input'], output=batch['output'])

    # Evaluation dev
    is_hit = []
    for batch in dev_set:
        is_hit.extend(model.predict(input=batch['input']) == batch['output'])
    accuracy = 100*np.mean(is_hit)
    print("Epoch %d: dev accuracy %2.2f %%" % (epoch+1, accuracy))

print("Training took %2.2f seconds per epoch" % ((time.time() - start)/num_epochs))    
# Evaluation test
is_hit = []
for batch in test_set:
    is_hit.extend(model.predict(input=batch['input']) == batch['output'])
accuracy = 100*np.mean(is_hit)

# Inform user
print("Test accuracy %2.2f %%" % accuracy)
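Because `is_hit` is extended with one boolean per word, the accuracy reported above is token-level: every word in the dev or test set counts once, so long sentences weigh more than short ones. The sketch below contrasts this with averaging per-sentence accuracies, using made-up predictions.

```python
import numpy as np

# Made-up (prediction, gold) tag-id arrays for two sentences of different length
sentences = [
    (np.array([0, 1, 2, 0, 1, 1, 2, 0]), np.array([0, 1, 2, 0, 1, 1, 2, 0])),  # 8/8 correct
    (np.array([1, 2]), np.array([0, 2])),                                        # 1/2 correct
]

# Token-level accuracy, computed the same way as in the evaluation loops above
hits = []
for predicted, gold in sentences:
    hits.extend(predicted == gold)
token_accuracy = 100 * np.mean(hits)  # 9 of 10 tokens correct

# Sentence-averaged accuracy, for comparison: (1.0 + 0.5) / 2
sentence_accuracy = 100 * np.mean([np.mean(p == g) for p, g in sentences])

print("token-level: %2.2f %%, sentence-averaged: %2.2f %%" % (token_accuracy, sentence_accuracy))
# token-level: 90.00 %, sentence-averaged: 75.00 %
```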