In [1]:
# we assume that the dynet module is in your path.
from dynet import *
A (1-layer) RNN can be thought of as a sequence of cells, $h_1, \ldots, h_k$, where the index $i$ runs over the time dimension.
Each cell $h_i$ has an input $x_i$ and an output $r_i$. In addition to $x_i$, cell $h_i$ also receives $r_{i-1}$ as input.
In a deep (multi-layer) RNN, we don't have a sequence, but a grid. That is, we have several layers of sequences:
Let $r_i^j$ be the output of cell $h_i^j$. Then:
The input to $h_i^1$ is $x_i$ and $r_{i-1}^1$.
The input to $h_i^2$ is $r_i^1$ and $r_{i-1}^2$, and so on.
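To make the wiring concrete, here is a minimal sketch of the two-layer recurrence in plain numpy (the tanh cell and the parameter names are illustrative assumptions, not DyNet's internals):
import numpy as np

def rnn_cell(W_x, W_r, b, x, r_prev):
    # one cell: combine the current input with the previous output of the same layer
    return np.tanh(W_x.dot(x) + W_r.dot(r_prev) + b)

def rnn_step(layer1, layer2, x_i, r_prev_1, r_prev_2):
    # layer 1 reads x_i and its own previous output r_{i-1}^1
    r_1 = rnn_cell(layer1["W_x"], layer1["W_r"], layer1["b"], x_i, r_prev_1)
    # layer 2 reads r_i^1 and its own previous output r_{i-1}^2
    r_2 = rnn_cell(layer2["W_x"], layer2["W_r"], layer2["b"], r_1, r_prev_2)
    return r_1, r_2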
In [2]:
pc = ParameterCollection()
NUM_LAYERS=2
INPUT_DIM=50
HIDDEN_DIM=10
builder = LSTMBuilder(NUM_LAYERS, INPUT_DIM, HIDDEN_DIM, pc)
# or:
# builder = SimpleRNNBuilder(NUM_LAYERS, INPUT_DIM, HIDDEN_DIM, pc)
Note that when we create the builder, it adds the internal RNN parameters to the ParameterCollection. We do not need to care about them, but they will be optimized together with the rest of the network's parameters.
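As a small illustrative sketch (W below is a made-up extra parameter, not part of the notebook): a trainer constructed over the same ParameterCollection will later update the builder's internal parameters together with any parameters we add ourselves.
# hypothetical extra parameter living in the same ParameterCollection as the builder's
W = pc.add_parameters((HIDDEN_DIM, HIDDEN_DIM))
# a single trainer over pc optimizes both W and the LSTM's internal parameters
trainer = SimpleSGDTrainer(pc)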
In [3]:
s0 = builder.initial_state()
In [4]:
x1 = vecInput(INPUT_DIM)
In [5]:
s1=s0.add_input(x1)
y1 = s1.output()
# here, we add x1 to the RNN, and the output we get from the top is y1 (a HIDDEN_DIM-dim vector)
In [6]:
y1.npvalue().shape
Out[6]:
(10,)
In [7]:
s2=s1.add_input(x1) # we can add another input
y2=s2.output()
If our LSTM/RNN were one layer deep, y2 would be equal to the hidden state. However, since it is 2 layers deep, y2 is only the hidden state (= output) of the top layer.
If we want access to all the hidden states (the outputs of both the first and the last layers), we can use the .h() method, which returns a list of expressions, one for each layer:
In [8]:
print s2.h()
The same interface that we have seen so far for the LSTM also holds for the simple RNN:
In [9]:
# create a simple rnn builder
rnnbuilder=SimpleRNNBuilder(NUM_LAYERS, INPUT_DIM, HIDDEN_DIM, pc)
# initialize a new graph, and a new sequence
rs0 = rnnbuilder.initial_state()
# add inputs
rs1 = rs0.add_input(x1)
ry1 = rs1.output()
print "all layers:", s1.h()
In [10]:
print s1.s()
To summarize, when calling .add_input(x) on an RNNState, the state creates a new RNN/LSTM column, passing it: (1) the state of the current RNN column, and (2) the input x.
The new state is then returned, and we can call its output() method to get the output y, which is the output at the top of the column. We can access the outputs of all the layers (not only the last one) using the .h() method of the state.
.s()
The internal state of the RNN may be more involved than just the outputs $h$. This is the case for the LSTM, which keeps an extra "memory" cell that is used when calculating $h$, and which is also passed to the next column. To access the entire hidden state, we use the .s() method.
The output of .s() differs by the type of RNN being used. For the simple RNN, it is the same as .h(). For the LSTM, it is more involved.
In [11]:
rnn_h = rs1.h()
rnn_s = rs1.s()
print "RNN h:", rnn_h
print "RNN s:", rnn_s
lstm_h = s1.h()
lstm_s = s1.s()
print "LSTM h:", lstm_h
print "LSTM s:", lstm_s
As we can see, the LSTM has two extra state expressions (one for each hidden layer) before the outputs h.
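As a small illustration (a sketch based on the printout above, which suggests the memory cells come before the outputs in .s()), we can split the state accordingly:
full_state = s1.s()                     # for our 2-layer LSTM: two memory cells, then two outputs
memory_cells = full_state[:NUM_LAYERS]  # the extra "memory" expressions
outputs_h = full_state[NUM_LAYERS:]     # the same expressions as returned by .h()
print "memory cells:", memory_cells
print "outputs h:", outputs_h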
Stack LSTM
The RNNs are shaped as a stack: we can remove the top and continue from a previous state.
This is done either by remembering the previous state and continuing it with a new .add_input(), or by using the .prev() method of a state to access the state that preceded it.
Initializing a new sequence with a given state
When we call builder.initial_state(), we are assuming the state has random / 0 initialization. If we want, we can specify a list of expressions that will serve as the initial state. The expected format is the same as the results of a call to .final_s(). TODO: this is not supported yet.
In [12]:
s2=s1.add_input(x1)
s3=s2.add_input(x1)
s4=s3.add_input(x1)
# let's continue s3 with a new input.
s5=s3.add_input(x1)
# we now have two different sequences:
# s0,s1,s2,s3,s4
# s0,s1,s2,s3,s5
# the two sequences share parameters.
assert(s5.prev() == s3)
assert(s4.prev() == s3)
s6=s3.prev().add_input(x1)
# we now have an additional sequence:
# s0,s1,s2,s6
In [13]:
s6.h()
Out[13]:
In [14]:
s6.s()
Out[14]:
The RNNState interface is convenient, and allows for incremental input construction. However, sometimes we know the sequence of inputs in advance, and care only about the sequence of output expressions. In this case, we can use the add_inputs(xs) method, where xs is a list of Expressions.
In [15]:
state = rnnbuilder.initial_state()
xs = [x1,x1,x1]
states = state.add_inputs(xs)
outputs = [s.output() for s in states]
hs = [s.h() for s in states]
print outputs, hs
This is convenient.
What if we do not care about .s() and .h(), and do not need access to the previous vectors? In such cases we can use the transduce(xs) method instead of add_inputs(xs). transduce takes in a sequence of Expressions and returns a sequence of Expressions. As a consequence of not returning RNNStates, transduce is much more memory efficient than add_inputs or a series of calls to add_input.
In [16]:
state = rnnbuilder.initial_state()
xs = [x1,x1,x1]
outputs = state.transduce(xs)
print outputs
We now put these pieces together and train a character-level language model over a single sentence, comparing the simple RNN with the LSTM.
In [17]:
import random
from collections import defaultdict
from itertools import count
import sys
LAYERS = 2
INPUT_DIM = 50
HIDDEN_DIM = 50
characters = list("abcdefghijklmnopqrstuvwxyz ")
characters.append("<EOS>")
int2char = list(characters)
char2int = {c:i for i,c in enumerate(characters)}
VOCAB_SIZE = len(characters)
In [18]:
pc = ParameterCollection()
srnn = SimpleRNNBuilder(LAYERS, INPUT_DIM, HIDDEN_DIM, pc)
lstm = LSTMBuilder(LAYERS, INPUT_DIM, HIDDEN_DIM, pc)
params = {}
params["lookup"] = pc.add_lookup_parameters((VOCAB_SIZE, INPUT_DIM))
params["R"] = pc.add_parameters((VOCAB_SIZE, HIDDEN_DIM))
params["bias"] = pc.add_parameters((VOCAB_SIZE))
# compute the loss of the RNN for one sentence
def do_one_sentence(rnn, sentence):
    # setup the sentence
    renew_cg()
    s0 = rnn.initial_state()
    R = parameter(params["R"])
    bias = parameter(params["bias"])
    lookup = params["lookup"]
    sentence = ["<EOS>"] + list(sentence) + ["<EOS>"]
    sentence = [char2int[c] for c in sentence]
    s = s0
    loss = []
    for char,next_char in zip(sentence,sentence[1:]):
        s = s.add_input(lookup[char])
        probs = softmax(R*s.output() + bias)
        loss.append( -log(pick(probs,next_char)) )
    loss = esum(loss)
    return loss

# generate from model:
def generate(rnn):
    def sample(probs):
        rnd = random.random()
        for i,p in enumerate(probs):
            rnd -= p
            if rnd <= 0: break
        return i

    # setup the sentence
    renew_cg()
    s0 = rnn.initial_state()
    R = parameter(params["R"])
    bias = parameter(params["bias"])
    lookup = params["lookup"]
    s = s0.add_input(lookup[char2int["<EOS>"]])
    out=[]
    while True:
        probs = softmax(R*s.output() + bias)
        probs = probs.vec_value()
        next_char = sample(probs)
        out.append(int2char[next_char])
        if out[-1] == "<EOS>": break
        s = s.add_input(lookup[next_char])
    return "".join(out[:-1]) # strip the <EOS>

# train, and generate every 5 samples
def train(rnn, sentence):
    trainer = SimpleSGDTrainer(pc)
    for i in xrange(200):
        loss = do_one_sentence(rnn, sentence)
        loss_value = loss.value()
        loss.backward()
        trainer.update()
        if i % 5 == 0:
            print loss_value,
            print generate(rnn)
Notice that:
1. We pass the same rnn-builder to do_one_sentence over and over again. We must re-use the same rnn-builder, as this is where the shared parameters are kept.
2. We call renew_cg() before each sentence -- because we want to have a new graph (new network) for this sentence. The parameters will be shared through the model and the shared rnn-builder.
In [19]:
sentence = "a quick brown fox jumped over the lazy dog"
train(srnn, sentence)
In [20]:
sentence = "a quick brown fox jumped over the lazy dog"
train(lstm, sentence)
The model seems to learn the sentence quite well.
Somewhat surprisingly, the Simple-RNN model learns quicker than the LSTM!
How can that be?
The answer is that we are cheating a bit. The sentence we are trying to learn has each letter-bigram exactly once. This means a simple trigram model can memorize it very well.
Try it out with more complex sequences.
In [21]:
train(srnn, "these pretzels are making me thirsty")
In [ ]: