Learning to speak like Alice

We create a generative, character-based language model by training a recurrent neural network (RNN) on the text of Alice in Wonderland.

Set up imports


In [1]:
from __future__ import division, print_function
from keras.models import Sequential
from keras.layers import Dense, Activation, SimpleRNN
import numpy as np


Using Theano backend.

Read input


In [2]:
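# Read the book line by line: strip whitespace, lowercase, drop non-ASCII bytes
# and skip blank lines.
lines = []
with open("../data/alice_in_wonderland.txt", "rb") as fin:
    for line in fin:
        line = line.strip().lower().decode("ascii", "ignore")
        if len(line) == 0:
            continue
        lines.append(line)
# Lines are joined without a separator, so the last word of one line runs into
# the first word of the next (e.g. the seed "joythe pep" in the output below);
# " ".join(lines) would keep them apart.
text = "".join(lines)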
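
The effect of the cleaning step, and of joining lines with an empty string, can be checked on a few made-up lines. The byte strings in `raw` are hypothetical stand-ins for the book file (not its real layout), and `clean_lines` is a helper introduced here that mirrors the loop above.

```python
def clean_lines(raw_lines):
    """Mirror the notebook's cleaning: strip, lowercase, drop non-ASCII, skip blanks."""
    lines = []
    for line in raw_lines:
        line = line.strip().lower().decode("ascii", "ignore")
        if len(line) == 0:
            continue
        lines.append(line)
    return lines


# Hypothetical raw lines; the curly quotes are UTF-8 bytes that "ignore" drops.
raw = [
    b"Down the Rabbit-Hole\n",
    b"\n",
    b"\xe2\x80\x98Curiouser and\n",
    b"curiouser!\xe2\x80\x99\n",
]
lines = clean_lines(raw)
print(repr("".join(lines)))   # as in the notebook: words fuse across line breaks
print(repr(" ".join(lines)))  # joining with a space keeps words apart
```

Dropping non-ASCII bytes silently removes characters such as curly quotes rather than failing on them, which keeps the vocabulary small.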

Build vocabulary lookup tables


In [3]:
# Sort the characters so each one gets the same index on every run.
chars = sorted(set(text))
vocab_size = len(chars)
char2index = dict((c, i) for i, c in enumerate(chars))
index2char = dict((i, c) for i, c in enumerate(chars))
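
A tiny sketch of the two lookup tables on a short, hypothetical string, showing that encoding with `char2index` and decoding with `index2char` are inverses. It sorts the character set: iterating over a plain `set` of strings can give a different order in each Python process because string hashing is randomized by default, so sorting keeps the indices stable if the mapping has to be rebuilt later (for example, when reloading a saved model).

```python
text = "alice was beginning to get very tired"

chars = sorted(set(text))                          # deterministic order
char2index = {c: i for i, c in enumerate(chars)}   # character -> integer id
index2char = {i: c for i, c in enumerate(chars)}   # integer id -> character

encoded = [char2index[c] for c in "tired"]
decoded = "".join(index2char[i] for i in encoded)
print(len(chars), chars)
print(encoded, decoded)
```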

Create training data

We want to create fixed-size strings of characters as input sequences, each labeled with the character that immediately follows it. For example, with a window of 10 characters, the input text "the sky was falling" produces the following input sequences and labels:

"the sky wa" => "s"
"he sky was" => " "
"e sky was " => "f"
" sky was f" => "a"
"sky was fa" => "l"

and so on.
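
The windowing rule can be checked directly on the example sentence. `make_windows` is a hypothetical helper wrapping the same loop as the next cell, with `seqlen=10` and `step=1`.

```python
def make_windows(text, seqlen=10, step=1):
    """Split text into (input window, next character) pairs."""
    inputs, labels = [], []
    # The last start index is len(text) - seqlen - 1, so every window has a label.
    for i in range(0, len(text) - seqlen, step):
        inputs.append(text[i:i + seqlen])
        labels.append(text[i + seqlen])
    return inputs, labels


inputs, labels = make_windows("the sky was falling")
for x, y in zip(inputs[:5], labels[:5]):
    print(repr(x), "=>", repr(y))
print(len(inputs), "pairs in total")
```

A 19-character string yields 19 - 10 = 9 pairs, the last being "was fallin" => "g".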


In [4]:
seqlen = 10
step = 1
input_chars = []
label_chars = []
for i in range(0, len(text) - seqlen, step):
    input_chars.append(text[i:i+seqlen])
    label_chars.append(text[i+seqlen])

We now vectorize the input and label characters. Each input row consists of seqlen characters, and each character is one-hot encoded as a vector of size vocab_size. Thus the shape of X is (len(input_chars), seqlen, vocab_size).

Each label is a single character, also one-hot encoded as a vector of size vocab_size. The network's corresponding output is a dense vector of size vocab_size: a softmax probability distribution over the next character. Hence the shape of y is (len(input_chars), vocab_size).


In [5]:
# Use the builtin bool: the np.bool alias was removed in NumPy 1.24.
X = np.zeros((len(input_chars), seqlen, vocab_size), dtype=bool)
y = np.zeros((len(input_chars), vocab_size), dtype=bool)
for i, input_char in enumerate(input_chars):
    for j, ch in enumerate(input_char):
        X[i, j, char2index[ch]] = 1
    y[i, char2index[label_chars[i]]] = 1
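
To make the shapes described above concrete, the same encoding can be run on the example sentence. This is a small self-contained sketch with its own tiny vocabulary (14 characters), not the book's; it also checks that every character position holds exactly one 1 and that the encoding can be decoded back to text.

```python
import numpy as np

text = "the sky was falling"
seqlen = 10
chars = sorted(set(text))
vocab_size = len(chars)
char2index = {c: i for i, c in enumerate(chars)}

input_chars = [text[i:i + seqlen] for i in range(len(text) - seqlen)]
label_chars = [text[i + seqlen] for i in range(len(text) - seqlen)]

X = np.zeros((len(input_chars), seqlen, vocab_size), dtype=bool)
y = np.zeros((len(input_chars), vocab_size), dtype=bool)
for i, window in enumerate(input_chars):
    for j, ch in enumerate(window):
        X[i, j, char2index[ch]] = 1         # one-hot: a single True per position
    y[i, char2index[label_chars[i]]] = 1

print(X.shape, y.shape)
# Decode the first row back to characters via argmax over the vocabulary axis.
print("".join(chars[k] for k in X[0].argmax(axis=1)))
```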

Build the model


In [6]:
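model = Sequential()
# One recurrent layer with 512 units reads the seqlen-character window and
# returns only its final hidden state (return_sequences=False).
model.add(SimpleRNN(512, return_sequences=False, input_shape=(seqlen, vocab_size)))
# Project the hidden state onto the vocabulary and turn it into
# next-character probabilities.
model.add(Dense(vocab_size))
model.add(Activation("softmax"))

model.compile(loss="categorical_crossentropy", optimizer="rmsprop")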
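
The model above can be sketched in plain NumPy for a single input window. This assumes SimpleRNN's default tanh activation and a zero initial state, and uses toy sizes with random (untrained) weights. Keras stores its weights in its own layout; the names `W`, `U`, `b`, `V`, `c` and the functions `rnn_forward` and `param_count` are illustrative only.

```python
import numpy as np


def rnn_forward(x_seq, W, U, b, V, c):
    """SimpleRNN (tanh) over one window, then Dense + softmax on the last state.

    x_seq: (seqlen, vocab_size) one-hot rows; W: (vocab_size, units);
    U: (units, units); b: (units,); V: (units, vocab_size); c: (vocab_size,)
    """
    h = np.zeros(U.shape[0])               # zero initial hidden state
    for x_t in x_seq:                      # one time step per character
        h = np.tanh(x_t @ W + h @ U + b)   # mix current input with previous state
    logits = h @ V + c                     # only the final state reaches Dense
    exp = np.exp(logits - logits.max())    # numerically stable softmax
    return exp / exp.sum()


def param_count(vocab_size, units):
    """Trainable weights: SimpleRNN (W, U, b) plus Dense (V, c)."""
    rnn = units * (vocab_size + units + 1)
    dense = units * vocab_size + vocab_size
    return rnn + dense


rng = np.random.default_rng(0)
seqlen, vocab_size, units = 10, 5, 8       # toy sizes, not the notebook's
W = rng.normal(scale=0.1, size=(vocab_size, units))
U = rng.normal(scale=0.1, size=(units, units))
b = np.zeros(units)
V = rng.normal(scale=0.1, size=(units, vocab_size))
c = np.zeros(vocab_size)

x_seq = np.eye(vocab_size)[rng.integers(0, vocab_size, size=seqlen)]
probs = rnn_forward(x_seq, W, U, b, V, c)
print(probs.round(3), probs.sum())
print(param_count(vocab_size, units))
```

With 512 units, the recurrent matrix U alone holds 512 * 512 = 262,144 weights, regardless of the vocabulary size.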

Train and evaluate the model

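We train the model one epoch at a time and, after each iteration, inspect the text it generates. There is no held-out test set, so evaluation is done manually by reading the generated samples.

In each iteration, we fit the model for a single epoch, pick a random row of input_chars as the seed, and use it to generate the next 100 characters. At each step the most probable character is chosen (greedy decoding) and the input window slides forward by one character.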
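
The generation procedure can be separated from Keras to show how the seed window slides. In this sketch `generate_greedy` and `toy_predict` are hypothetical: `toy_predict` stands in for `model.predict` with a trivial rule (always predict the character after the window's last one in a fixed three-letter alphabet).

```python
import numpy as np


def generate_greedy(predict_fn, seed, n_chars, char2index, index2char):
    """Greedy character generation with a sliding input window.

    predict_fn takes an array of shape (1, seqlen, vocab_size) and returns
    probabilities of shape (1, vocab_size), like a Keras model's predict.
    """
    seqlen, vocab_size = len(seed), len(char2index)
    window, out = seed, []
    for _ in range(n_chars):
        x = np.zeros((1, seqlen, vocab_size))
        for j, ch in enumerate(window):
            x[0, j, char2index[ch]] = 1
        probs = predict_fn(x)[0]
        next_char = index2char[int(np.argmax(probs))]  # most probable character
        out.append(next_char)
        window = window[1:] + next_char                # slide the window by one
    return seed + "".join(out)


chars = ["a", "b", "c"]
char2index = {c: i for i, c in enumerate(chars)}
index2char = {i: c for i, c in enumerate(chars)}


def toy_predict(x):
    """Toy model: predicts the character that follows the window's last one."""
    last = int(np.argmax(x[0, -1]))
    probs = np.zeros((1, len(chars)))
    probs[0, (last + 1) % len(chars)] = 1.0
    return probs


print(generate_greedy(toy_predict, "ab", 4, char2index, index2char))  # abcabc
```

Because argmax decoding is deterministic, the next character depends only on the current window; once a window reappears, the output repeats forever. That is why several samples in the output below (for example "the wase the wase") settle into loops.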


In [7]:
batch_size = 128
for iteration in range(51):
    print("=" * 50)
    print("Iteration #: %d" % (iteration))
    
    # Keras 2+ uses `epochs`; Keras 1 called this argument `nb_epoch`.
    model.fit(X, y, batch_size=batch_size, epochs=1, verbose=0)
    
    # test model: generate 100 characters from a randomly chosen seed
    test_idx = np.random.randint(len(input_chars))
    test_chars = input_chars[test_idx]
    print("Seed: %s" % (test_chars))
    print(test_chars, end="")
    for _ in range(100):
        Xtest = np.zeros((1, seqlen, vocab_size))
        for j, ch in enumerate(test_chars):
            Xtest[0, j, char2index[ch]] = 1
        pred = model.predict(Xtest, verbose=0)[0]
        # greedy decoding: take the most probable next character
        ypred = index2char[np.argmax(pred)]
        print(ypred, end="")
        # move the input one step forward
        test_chars = test_chars[1:] + ypred
    print()


==================================================
Iteration #: 0
Seed: ow are you
ow are you the wase the wase the wase the wase the wase the wase the wase the wase the wase the wase the wase 
==================================================
Iteration #: 1
Seed: l looked s
l looked soute sad the the the the the the the the the the the the the the the the the the the the the the the
==================================================
Iteration #: 2
Seed: or speaker
or speaker and the dor the more the the dor the more the the dor the more the the dor the more the the dor the
==================================================
Iteration #: 3
Seed: out this, 
out this, and the sall and the the she sall the she sall the she sall the she sall the she sall the she sall t
==================================================
Iteration #: 4
Seed: remember,'
remember,' said the crows and all the treperse for and all the treperse for and all the treperse for and all t
==================================================
Iteration #: 5
Seed: rightened 
rightened in the said the cate the was in a the cate the was in a the cate the was in a the cate the was in a 
==================================================
Iteration #: 6
Seed: ly.'that's
ly.'that's the was a dowe the reat the was a dowe the reat the was a dowe the reat the was a dowe the reat the
==================================================
Iteration #: 7
Seed: joythe pep
joythe pepperst the gryphon. 'i don't see the gryphon. 'i don't see the gryphon. 'i don't see the gryphon. 'i 
==================================================
Iteration #: 8
Seed: re them, i
re them, it had the mock turtle the was a down the was a down the was a down the was a down the was a down the
==================================================
Iteration #: 9
Seed: it, she fo
it, she fort as the cater the gryphon it was the began the bext the began the bext the began the bext the bega
==================================================
Iteration #: 10
Seed: ' the gryp
' the gryphon a the was get the was get the was get the was get the was get the was get the was get the was ge
==================================================
Iteration #: 11
Seed: 'why is a 
'why is a a don't be the did the said the caterpillar same and the said the caterpillar same and the said the 
==================================================
Iteration #: 12
Seed: leepy and 
leepy and the was a some thing it alice was a looked at the was a some thing it alice was a looked at the was 
==================================================
Iteration #: 13
Seed: set, and t
set, and the dormouse, and the dormouse, and the dormouse, and the dormouse, and the dormouse, and the dormous
==================================================
Iteration #: 14
Seed: se they le
se they lear the courden the gryphon, and was speaked to her firdor and words it down the said to see in the g
==================================================
Iteration #: 15
Seed:  opened it
 opened it was going out of the said to alice soon she had no such a turred to alice soon she had no such a tu
==================================================
Iteration #: 16
Seed: head first
head first one i'm a rarden she was got to see it was the cats as it was got to see it was the cats as it was 
==================================================
Iteration #: 17
Seed: id the you
id the you sous of the same the said the mock turtle so alice said the mock turtle so alice said the mock turt
==================================================
Iteration #: 18
Seed:  and it se
 and it seally at the caterpillar say she was sat the began surpring of the pupper hardly at all the beautiful
==================================================
Iteration #: 19
Seed: memory, an
memory, and the little good the long and the hatter with the lobst resting to see the moment the mock turtle, 
==================================================
Iteration #: 20
Seed: them about
them about as an the dormouse said to herself, and she don't sous you pered to her sear the dormouse said to h
==================================================
Iteration #: 21
Seed: ed dinah!'
ed dinah!' said the king said to herself, 'i wonder, she said to herself, 'i wonder, she said to herself, 'i w
==================================================
Iteration #: 22
Seed: he stairs.
he stairs.'the rabbit say the way of the some to be never herself and the little to see the little to see the 
==================================================
Iteration #: 23
Seed: il she mad
il she made alice as it as she canerally, 'and then the white rabbit spicked the caterpillar that it was the c
==================================================
Iteration #: 24
Seed: is a very 
is a very care of the top thing the top of her to say and down the mock turtle in a low, that it might as well
==================================================
Iteration #: 25
Seed: nd they al
nd they all as she had been anthe pighous and she had been anthe pighous and she had been anthe pighous and sh
==================================================
Iteration #: 26
Seed:  wereanima
 wereanimade the said the look and the sorpers as it all the party were the ore what it last the said the look
==================================================
Iteration #: 27
Seed: he king: '
he king: 'the door with the white rabbit wander when she was the white rabbit wander when she was the white ra
==================================================
Iteration #: 28
Seed:  out to he
 out to herself as she was a to do alice, as she said the king, and the dormouse into the jury, and the dormou
==================================================
Iteration #: 29
Seed:  in a deep
 in a deep to the gryphon was stielding her hands and the pool of the word to herself and all the pool of the 
==================================================
Iteration #: 30
Seed: hat she wa
hat she wasn't out of the wind of little things a little to see the with on a long as a little to see the with
==================================================
Iteration #: 31
Seed: as he foun
as he found the dormouse shook the first face at once torked round her feel very sing the dormouse shook the f
==================================================
Iteration #: 32
Seed: ht us,' sa
ht us,' said the mouse all the jumped in the pigeon in the dingen the dingen the dingen the dingen the dingen 
==================================================
Iteration #: 33
Seed: tiful soup
tiful soup! the king said to the white rabbit see in the side with the rabbit is the whole she had here with t
==================================================
Iteration #: 34
Seed: l as she c
l as she could not she was so she was so she was so she was so she was so she was so she was so she was so she
==================================================
Iteration #: 35
Seed: ment, spla
ment, splash the gryphon in a very don't believe there was reamed the rest of the more the reat surprised to s
==================================================
Iteration #: 36
Seed: ent on aga
ent on again, and alice as it a little sisters, and seemed to be a stor to do the white rabbit, who sung itsou
==================================================
Iteration #: 37
Seed: he had wep
he had wept the three gardeners, but she took more speaked at the white rabbit spoke, and the white rabbit spo
==================================================
Iteration #: 38
Seed: ome of the
ome of the court.'(alice hastily replied the queen, and looked at the sort of the court.'(alice hastily replie
==================================================
Iteration #: 39
Seed: dle yet?' 
dle yet?' said the king to she had chind alice took like that care so much of expected to herself, 'i wonder w
==================================================
Iteration #: 40
Seed:  down the 
 down the door and her head to find her for as the door and her head to find her for as the door and her head 
==================================================
Iteration #: 41
Seed: , and i do
, and i don't know what a courted to the dire to alice, and the mouse to be a looking the one, 'i don't know w
==================================================
Iteration #: 42
Seed: rpose"?' s
rpose"?' said alice, 'why, i wonder?' said alice, 'why, i wonder?' said alice, 'why, i wonder?' said alice, 'w
==================================================
Iteration #: 43
Seed: to see som
to see some of the table, and the gryphon.'the little door to the caterpillar the caterpillar the caterpillar 
==================================================
Iteration #: 44
Seed:  for a goo
 for a good learne of me like the little did not quite a serpent, and then and looked at the mock turtle said 
==================================================
Iteration #: 45
Seed: hepool, 'a
hepool, 'and the mouse were and the dormouse was speaking to her child alice was not at the startles for a lit
==================================================
Iteration #: 46
Seed: ,' the mou
,' the mouse the mock turtle, 'but it so she went on the same showly to see how she was so she went on the sam
==================================================
Iteration #: 47
Seed: ly takes s
ly takes so langed a little way one of the caterpillar.'you mean in she was an on!' come of the words won't yo
==================================================
Iteration #: 48
Seed: ound as sh
ound as she said to the juryment that it made rever head down about the cook the caterpillar the mock turtle r
==================================================
Iteration #: 49
Seed: oks, and s
oks, and she went on partick and he went so minuted to speak the mouse to alice, 'that's the queen the dormous
==================================================
Iteration #: 50
Seed: ords:'yes,
ords:'yes,' said the gryphon, and the mock turtlerpent to see her very mind as all to long as it was into the 

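By the later iterations, the RNN has learnt to spell most common words and character names (the gryphon, the mock turtle, the dormouse), although it has not learnt much about grammar, and greedy decoding often gets stuck repeating the same phrase.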

