In [1]:
from __future__ import division, print_function
from keras.callbacks import ModelCheckpoint
from keras.layers.core import Dense, Activation
from keras.layers.recurrent import LSTM
from keras.models import Sequential, load_model
from keras.optimizers import RMSprop
from urllib import urlopen
import numpy as np
import matplotlib.pyplot as plt
import operator
import os
import re
%matplotlib inline
In [2]:
DATA_DIR = "../../data"
INPUT_URL = "http://www.gutenberg.org/files/11/11-0.txt"
INPUT_TEXT = os.path.join(DATA_DIR, "alice.txt")
BEST_MODEL_PATH = os.path.join(DATA_DIR, "05-char-lstm-gen-best.h5")
FINAL_MODEL_PATH = os.path.join(DATA_DIR, "05-char-lstm-gen-final.h5")
TRAIN_SEQLEN = 40
PREDICT_SEQLEN = 100
NUM_EPOCHS = 60
BATCH_SIZE = 128
Our data comes from the text of the children's novel Alice's Adventures in Wonderland by Lewis Carroll, hosted on Project Gutenberg. To make the notebook self-sufficient, we download the file from the project site if it is not already available in our data directory.
In [3]:
def maybe_download(url, file):
    if os.path.exists(file):
        return
    fin = urlopen(url)
    fout = open(file, "wb")
    fout.write(fin.read())
    fout.close()
    fin.close()

maybe_download(INPUT_URL, INPUT_TEXT)

text = open(INPUT_TEXT).read().decode("ascii", "ignore").lower()
# text = open(INPUT_TEXT).read().lower()
In [4]:
chars = sorted(list(set([c for c in text])))
vocab_size = len(chars)
print("vocab_size:", vocab_size)
In [5]:
char2idx = dict([(c, i) for i, c in enumerate(chars)])
idx2char = dict([(i, c) for i, c in enumerate(chars)])
In [6]:
xdata, ydata = [], []
for i in range(0, len(text) - TRAIN_SEQLEN):
    xdata.append(text[i : i + TRAIN_SEQLEN])
    ydata.append(text[i + TRAIN_SEQLEN])
assert(len(xdata) == len(ydata))
print("# sequences:", len(xdata))
for i in range(10):
    print("{:d}: {:s}\t{:s}".format(i, xdata[i], ydata[i]))
In [7]:
def vectorize(vocab_size, char2idx, xdata, ydata=None):
    X = np.zeros((len(xdata), TRAIN_SEQLEN, vocab_size), dtype=np.bool)
    Y = None if ydata is None else np.zeros((len(xdata), vocab_size), dtype=np.bool)
    for i, x in enumerate(xdata):
        for j, c in enumerate(x):
            X[i, j, char2idx[c]] = 1
        if ydata is not None:
            Y[i, char2idx[ydata[i]]] = 1
    return X, Y

X, Y = vectorize(vocab_size, char2idx, xdata, ydata)
print(X.shape, Y.shape)
This step takes a while the first time it is run. The model itself is quite simple and is adapted from the Keras example of a character-level LSTM generative model, which is given 40 characters of text and asked to predict the next character. The network architecture and optimizer settings are identical to those in the example. Training takes a while, so we only train the model on the first run; on subsequent runs we simply load the best model saved during training.
In [8]:
history = None
if not os.path.exists(BEST_MODEL_PATH):
    # build model
    model = Sequential()
    model.add(LSTM(128, input_shape=(TRAIN_SEQLEN, vocab_size)))
    model.add(Dense(vocab_size))
    model.add(Activation("softmax"))
    # compile model
    optim = RMSprop(lr=1e-2)
    model.compile(loss="categorical_crossentropy", optimizer=optim)
    # train model, checkpointing the best model (lowest validation loss) seen so far
    checkpoint = ModelCheckpoint(filepath=BEST_MODEL_PATH, save_best_only=True)
    history = model.fit(X, Y, batch_size=BATCH_SIZE, epochs=NUM_EPOCHS,
                        validation_split=0.1, callbacks=[checkpoint])
    model.save(FINAL_MODEL_PATH)
In [9]:
if history is not None:
    plt.title("Loss")
    plt.plot(history.history["loss"], color="r", label="Train")
    plt.plot(history.history["val_loss"], color="b", label="Validation")
    plt.legend(loc="best")
    plt.show()
We want to compare greedy search, i.e., selecting the single best predicted character at each step, with beam search, which keeps the top k predictions at each step and returns the most probable resulting sequence. We will take a set of random 40-character sequences from the text and generate the output of greedy search and beam search for each of them.
From Jason Brownlee's tutorial on How to Implement a Beam Search Decoder for Natural Language Processing:
Instead of greedily choosing the most likely next step as the sequence is constructed, the beam search expands all possible next steps and keeps the k most likely, where k is a user-specified parameter and controls the number of beams or parallel searches through the sequence of probabilities.
We do not need to start with random states; instead, we start with the k most likely words as the first step in the sequence.
Note that greedy search is just a degenerate case of beam search where the beam size k=1.
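To make the mechanics concrete, here is a minimal, self-contained sketch of beam search over a made-up table of step-wise symbol probabilities. The numbers and the toy_beam_search helper are purely illustrative and are not part of the notebook's model:

import numpy as np

# made-up next-symbol probabilities for 5 symbols over 6 steps (each row sums to 1)
probs = np.array([[0.1, 0.2, 0.3, 0.3, 0.1],
                  [0.4, 0.1, 0.1, 0.1, 0.3],
                  [0.1, 0.1, 0.5, 0.2, 0.1],
                  [0.2, 0.2, 0.2, 0.2, 0.2],
                  [0.3, 0.1, 0.1, 0.4, 0.1],
                  [0.1, 0.6, 0.1, 0.1, 0.1]])

def toy_beam_search(probs, k):
    # each beam is (sequence of symbol indices, cumulative log probability)
    beams = [([], 0.0)]
    for row in probs:
        candidates = []
        for seq, log_prob in beams:
            for idx, p in enumerate(row):
                candidates.append((seq + [idx], log_prob + np.log(p)))
        # keep only the k most probable partial sequences
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:k]
    return beams

print(toy_beam_search(probs, k=1))   # greedy search: a single hypothesis
print(toy_beam_search(probs, k=3))   # beam search: three parallel hypotheses

With k=1 this reduces to picking the argmax at every step, which is exactly the greedy decoder; larger k trades more computation for a wider search of the space of candidate sequences.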
In [10]:
model = load_model(BEST_MODEL_PATH)
In [14]:
def predict_using_beam_search(model, seed_text, predict_seqlen, vocab_size,
                              char2idx, idx2char, k):
    # each candidate is (current input window, cumulative log probability,
    # characters predicted so far)
    inputs = [(seed_text, 0.0, "")]
    for ppos in range(predict_seqlen):
        outputs = []
        # predict for each input tuple
        for input_text, log_prob, pred_chars in inputs:
            Xpred, _ = vectorize(vocab_size, char2idx, [input_text])
            Ypred = model.predict(Xpred)[0]
            top_idxs = np.argsort(Ypred)[-k:]
            for top_idx in top_idxs:
                pred_char = idx2char[top_idx]
                new_input_text = input_text[1:] + pred_char
                new_log_prob = log_prob + np.log(Ypred[top_idx])
                new_pred_chars = pred_chars + pred_char
                outputs.append((new_input_text, new_log_prob, new_pred_chars))
        # keep only the k most probable candidates for the next step
        if len(outputs) > k:
            top_outputs = sorted(outputs, key=operator.itemgetter(1),
                                 reverse=True)[0:k]
            inputs = top_outputs
        else:
            inputs = outputs
    # return the predicted characters of the most probable surviving sequence
    predicted_seq = sorted(inputs, key=operator.itemgetter(1), reverse=True)[0][2]
    return predicted_seq

for tid in range(10):
    start_idx = np.random.randint(len(text) - TRAIN_SEQLEN)
    seed_text = text[start_idx : start_idx + TRAIN_SEQLEN]
    print("Example {:d}\n".format(tid + 1))
    for beam_size in [1, 3, 5, 7]:
        predicted_seq = predict_using_beam_search(model, seed_text, PREDICT_SEQLEN,
                                                  vocab_size, char2idx, idx2char,
                                                  beam_size)
        line_output = "---- beam size {:d}: {:s}\n".format(
            beam_size, "____".join([seed_text, predicted_seq])
            .replace("\r", " ")
            .replace("\n", " "))
        line_output = re.sub(r"\s+", " ", line_output)
        print(line_output)
    print("\n")
In the outputs above, the seed text is visually separated from the predicted sequence of characters by 4 underscore characters. Regardless of the beam size chosen, the resulting sequences still seem mostly meaningless, although in some cases larger beam sizes appear to produce less repetitive subsequences.
In [ ]: