Annotating sentences with a Bidirectional RNN


In [19]:
import numpy as np

import theano
from theano import tensor

from blocks import initialization
from blocks.bricks import Identity, Linear, Tanh, MLP, Softmax
from blocks.bricks.lookup import LookupTable
from blocks.bricks.recurrent import SimpleRecurrent, Bidirectional, BaseRecurrent, recurrent
from blocks.bricks.parallel import Merge
#from blocks.bricks.parallel import Fork

from blocks.bricks.cost import CategoricalCrossEntropy
from blocks.initialization import IsotropicGaussian, Constant

from blocks.graph import ComputationGraph
from blocks.filter import VariableFilter
from blocks.roles import INPUT, WEIGHT, OUTPUT

In [43]:
vocab_size=3
embedding_dim=3
labels_size=10

lookup = LookupTable(vocab_size, embedding_dim,
                     weights_init=IsotropicGaussian(0.01))

encoder = Bidirectional(SimpleRecurrent(dim=embedding_dim, activation=Tanh()),
                        weights_init=IsotropicGaussian(0.01))

# Bidirectional concatenates the forward and backward states,
# so the layer reading from it sees 2 * embedding_dim features
mlp = MLP([Softmax()], [embedding_dim * 2, labels_size],
          weights_init=IsotropicGaussian(0.01),
          biases_init=Constant(0))

lookup.initialize()
encoder.initialize()
mlp.initialize()

#encoder.prototype.apply.sequences
#dir(encoder.prototype.apply.sequences)

#combine = Merge(input_dims=dict(), output_dim=labels_size)
#labelled = Softmax( encoder )


x = tensor.imatrix('features')
y = tensor.imatrix('targets')

embeddings = lookup.apply(x)        # (time, batch, embedding_dim)
hidden = encoder.apply(embeddings)  # (time, batch, 2 * embedding_dim)

# Softmax expects a 2D input, so merge the time and batch axes before the MLP
hidden_flat = hidden.reshape((hidden.shape[0] * hidden.shape[1], hidden.shape[2]))
probs = mlp.apply(hidden_flat)

cost = CategoricalCrossEntropy().apply(y.flatten(), probs)

#cg = ComputationGraph([cost])
cg = ComputationGraph([probs])
cg.variables
#VariableFilter(roles=[OUTPUT])(cg.variables)

#dir(cg.outputs)
#np.shape(cg.outputs)

#mlp = MLP([Softmax()], [embedding_dim*2, labels_size],
#          weights_init=IsotropicGaussian(0.01),
#          biases_init=Constant(0))
#mlp.initialize()

#fork = Fork([name for name in encoder.prototype.apply.sequences if name != 'mask'])
#fork.input_dim = dimension
#fork.output_dims = [dimension for name in fork.input_names]
#print(fork.output_dims)

Now the output layer needs to combine the two hidden states (one from each direction):


In [4]:
#readout = Readout(
#    readout_dim=labels_size,
#    source_names=[encodetransition.apply.states[0], attention.take_glimpses.outputs[0]],
#    emitter=SoftmaxEmitter(name="emitter"),
#    #feedback_brick=LookupFeedback(alphabet_size, dimension),
#    name="readout")
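
A minimal sketch of the same idea (the Readout/SoftmaxEmitter machinery in the commented cell belongs to blocks' sequence-generator stack and is not required here; the name readout is illustrative): since Bidirectional concatenates the forward and backward states, a plain Linear layer reading 2 * embedding_dim features per position is enough.


In [ ]:
# Sketch: a plain Linear readout over the concatenated bidirectional states
# (the MLP defined earlier plays the same role).
readout = Linear(input_dim=2 * embedding_dim, output_dim=labels_size,
                 weights_init=IsotropicGaussian(0.01),
                 biases_init=Constant(0), name='readout')
readout.initialize()

label_scores = readout.apply(encoder.apply(lookup.apply(x)))  # (time, batch, labels_size)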

Note that in order to double the input we had to apply a bricks.Linear brick to x, even though $h_t = f(Vh_{t-1} + Wx_t + b)$ is what is usually thought of as the RNN equation. The reason recurrent bricks work this way is that it allows greater flexibility and modularity: $Wx_t$ can be replaced by a whole neural network if we want.
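
As a concrete sketch of that pattern (the names rnn and doubler are illustrative, following the SimpleRecurrent accumulator example): the $Wx_t$ term is computed by a separate Linear brick whose output is then fed to the recurrent brick as its inputs.


In [ ]:
# Sketch: an accumulator RNN (identity weights and activation) fed by a
# Linear brick whose weight matrix is 2 * I, i.e. it doubles each input.
rnn = SimpleRecurrent(dim=3, activation=Identity(),
                      weights_init=initialization.Identity())
rnn.initialize()

doubler = Linear(input_dim=3, output_dim=3,
                 weights_init=initialization.Identity(2),
                 biases_init=Constant(0))
doubler.initialize()

x = tensor.tensor3('x')
h_doubled = rnn.apply(doubler.apply(x))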

Initial States

Recurrent models all have in common that their initial state has to be specified. However, in constructing our toy examples, we omitted to pass h0 when applying the recurrent brick. What happened?

It turns out that recurrent bricks set that initial state to zero if it’s not passed as an argument, which is a sane default in most cases, but we can just as well set it explicitly.
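
A quick check, reusing the accumulator rnn sketched above: with no states argument the initial state is all zeros, so three steps of ones accumulate to 1, 2, 3 in every unit.


In [ ]:
# Sketch: no `states` argument, so the initial state defaults to zeros
x = tensor.tensor3('x')
h = rnn.apply(x)
f = theano.function([x], h)
print(f(np.ones((3, 1, 3), dtype=theano.config.floatX)))  # values accumulate 1, 2, 3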

We will modify the starting example so that it accumulates the input it receives, but starting from one instead of zero:

$h_t=h_{t−1}+x_t, h_0=1$


In [8]:
# reuse the accumulator `rnn` and the tensor3 input `x` defined above
h0 = tensor.matrix('h0')
h = rnn.apply(inputs=x, states=h0)

f = theano.function([x, h0], h)
print(f(np.ones((3, 1, 3), dtype=theano.config.floatX),
        np.ones((1, 3), dtype=theano.config.floatX)))

Iterate (or not)

The apply method of a recurrent brick accepts an iterate argument, which defaults to True. Setting it to False causes the apply method to compute only one step in the sequence.
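
For instance, a single step of the accumulator rnn from above (a sketch; x_t and h_tm1 are illustrative names for one-time-step variables of shape (batch, dim)):


In [ ]:
# Sketch: one step only, so inputs and states are 2D (batch, dim) variables
x_t = tensor.matrix('x_t')
h_tm1 = tensor.matrix('h_tm1')
h_t = rnn.apply(inputs=x_t, states=h_tm1, iterate=False)

step = theano.function([x_t, h_tm1], h_t)
print(step(np.ones((1, 3), dtype=theano.config.floatX),
           np.zeros((1, 3), dtype=theano.config.floatX)))  # -> [[ 1.  1.  1.]]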

This is very useful when you’re trying to combine multiple recurrent layers in a network.

Imagine you’d like to build a network with two recurrent layers. The second layer accumulates the output of the first layer, while the first layer accumulates the input of the network and the output of the second layer.



Here’s how you can create a recurrent brick that encapsulates the two layers:


In [5]:
class FeedbackRNN(BaseRecurrent):
    def __init__(self, dim, **kwargs):
        super(FeedbackRNN, self).__init__(**kwargs)
        self.dim = dim
        self.first_recurrent_layer = SimpleRecurrent(
            dim=self.dim, activation=Identity(), name='first_recurrent_layer',
            weights_init=initialization.Identity())
        self.second_recurrent_layer = SimpleRecurrent(
            dim=self.dim, activation=Identity(), name='second_recurrent_layer',
            weights_init=initialization.Identity())
        self.children = [self.first_recurrent_layer,
                         self.second_recurrent_layer]

    @recurrent(sequences=['inputs'], contexts=[],
               states=['first_states', 'second_states'],
               outputs=['first_states', 'second_states'])
    def apply(self, inputs, first_states=None, second_states=None):
        first_h = self.first_recurrent_layer.apply(
            inputs=inputs, states=first_states + second_states, iterate=False)
        second_h = self.second_recurrent_layer.apply(
            inputs=first_h, states=second_states, iterate=False)
        return first_h, second_h

    def get_dim(self, name):
        return (self.dim if name in ('inputs', 'first_states', 'second_states')
                else super(FeedbackRNN, self).get_dim(name))

x = tensor.tensor3('x')

feedback = FeedbackRNN(dim=3)
feedback.initialize()
first_h, second_h = feedback.apply(inputs=x)

f = theano.function([x], [first_h, second_h])
for states in f(np.ones((3, 1, 3), dtype=theano.config.floatX)):
    print(states)

There’s a lot going on here!

We defined a recurrent brick class called FeedbackRNN whose constructor initializes two bricks.recurrent.SimpleRecurrent bricks as its children.

The class has a get_dim method whose purpose is to tell the dimensionality of each input to the brick’s apply method.

The core of the class resides in its apply method. The @recurrent decorator is used to specify which of the arguments to the method are sequences to iterate over, what is returned when the method is called and which of those returned values correspond to recurrent states. Its relationship with the inputs and outputs arguments to the @application decorator is as follows:

  • outputs, like in @application, defines everything that’s returned by apply, including recurrent outputs
  • states is a subset of outputs that corresponds to recurrent outputs, which means that the union of sequences and states forms what would be inputs in @application

Notice how no call to theano.scan() is being made. This is because the implementation of apply is responsible for computing one time step of the recurrent application of the brick. It takes states at time $t−1$ and inputs at time $t$ and produces the output for time $t$. The rest is all handled by the @recurrent decorator behind the scenes.

This is why the iterate argument of the apply method is so useful: it allows combining multiple recurrent brick applications within another apply implementation.
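
For example (a sketch, assuming the feedback brick instantiated above; x_t, h1_tm1 and h2_tm1 are illustrative names), the same apply method can compute a single step from externally supplied states:


In [ ]:
# Sketch: one step of the two-layer FeedbackRNN from externally supplied states
x_t = tensor.matrix('x_t')
h1_tm1 = tensor.matrix('h1_tm1')
h2_tm1 = tensor.matrix('h2_tm1')
h1_t, h2_t = feedback.apply(inputs=x_t, first_states=h1_tm1,
                            second_states=h2_tm1, iterate=False)
step = theano.function([x_t, h1_tm1, h2_tm1], [h1_t, h2_t])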

Tip

When looking at a recurrent brick’s documentation, keep in mind that the parameters to its apply method are explained in terms of a single iteration, i.e. with the assumption that iterate = False.