Lab 6.1 - Keras for RNN

In this lab we will use the Keras deep learning library to construct a simple recurrent neural network (RNN) that can learn linguistic structure from a piece of text, and use that knowledge to generate new text passages. To review general RNN architecture, specific RNN variants such as the LSTM networks we'll be using here, and other concepts behind this type of machine learning, you should consult the following resources:

This code is an adaptation of these two examples:

You can consult the original sites for more information and documentation.

Let's start by importing some of the libraries we'll be using in this lab:


In [1]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

from time import gmtime, strftime
import os
import re
import pickle
import random
import sys


Using TensorFlow backend.

The first thing we need to do is generate our training data set. In this case we will use a recent article written by Barack Obama for The Economist newspaper. Make sure you have the obama.txt file in the /data folder within the /week-6 folder in your repository.


In [2]:
# load ascii text from file (the with block closes the file for us)
filename = "data/obama.txt"
with open(filename) as f:
    raw_text = f.read()

# get rid of any characters other than letters, numbers, 
# and a few special characters
raw_text = re.sub('[^\nA-Za-z0-9 ,.:;?!-]+', '', raw_text)

# convert all text to lowercase
raw_text = raw_text.lower()

n_chars = len(raw_text)
print "length of text:", n_chars
print "text preview:", raw_text[:500]


length of text: 18312
text preview: wherever i go these days, at home or abroad, people ask me the same question: what is happening in the american political system? how has a country that has benefitedperhaps more than any otherfrom immigration, trade and technological innovation suddenly developed a strain of anti-immigrant, anti-innovation protectionism? why have some on the far left and even more on the far right embraced a crude populism that promises a return to a past that is not possible to restoreand that, for most americ

Next, we use Python's set() function to collect all unique characters in the text, and sort them into a list. This will form our 'vocabulary' of characters, which is similar to the categories found in typical ML classification problems.

Since neural networks work with numerical data, we also need to create a mapping between each character and a unique integer value. To do this we create two dictionaries: one which has characters as keys and the associated integers as the value, and one which has integers as keys and the associated characters as the value. These dictionaries will allow us to do translation both ways.


In [3]:
# extract all unique characters in the text
chars = sorted(list(set(raw_text)))
n_vocab = len(chars)
print "number of unique characters found:", n_vocab

# create mapping of characters to integers and back
char_to_int = dict((c, i) for i, c in enumerate(chars))
int_to_char = dict((i, c) for i, c in enumerate(chars))

# test our mapping
print 'a', "- maps to ->", char_to_int["a"]
print 25, "- maps to ->", int_to_char[25]


number of unique characters found: 44
a - maps to -> 18
25 - maps to -> h
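
As a quick check of the two-way translation described above, here is a minimal sketch that encodes a string to integers and decodes it back. The toy text and the encode() and decode() helpers are hypothetical names introduced for illustration; they are not part of the lab code.

```python
# Hypothetical toy text; the lab builds the same kind of mappings from raw_text.
toy_text = "hello world"

toy_chars = sorted(set(toy_text))
toy_char_to_int = dict((c, i) for i, c in enumerate(toy_chars))
toy_int_to_char = dict((i, c) for i, c in enumerate(toy_chars))

def encode(text, mapping):
    """Translate a string into a list of integer codes."""
    return [mapping[c] for c in text]

def decode(codes, mapping):
    """Translate a list of integer codes back into a string."""
    return "".join(mapping[i] for i in codes)

codes = encode("hello", toy_char_to_int)
restored = decode(codes, toy_int_to_char)
print(codes)
print(restored)
```

If the round trip returns the original string, the two dictionaries are consistent with each other.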

Now we need to define the training data for our network. With RNNs, the training data usually takes the shape of a three-dimensional array, with the size of each dimension representing:

[# of training sequences, # of training samples per sequence, # of features per sample]

  • The training sequences are the sets of data presented to the RNN at each training step. As with all neural networks, these training sequences are fed to the network in small batches during training.
  • Each training sequence is composed of some number of training samples. The number of samples in each sequence dictates how far back in the data stream the algorithm will learn, and sets the depth of the unrolled RNN layer.
  • Each training sample within a sequence is composed of some number of features. This is the data that the RNN layer is learning from at each time step. In our example, the training samples and targets will use one-hot encoding, so each will have one feature per possible character, with the actual character represented by 1, and all others by 0.
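
To make the three dimensions concrete, the following is a small sketch using a made-up vocabulary of five characters and three short sequences. All names and values here are assumptions chosen for illustration, not the lab's data.

```python
import numpy as np

# Hypothetical toy setup: 3 sequences, 4 characters each, vocabulary of 5.
toy_vocab = ['a', 'b', 'c', 'd', 'e']
toy_sequences = ["abcd", "bcde", "cdea"]

# [# of sequences, # of samples per sequence, # of features per sample]
toy_X = np.zeros((len(toy_sequences), 4, len(toy_vocab)), dtype=bool)
for i, seq in enumerate(toy_sequences):
    for t, char in enumerate(seq):
        toy_X[i, t, toy_vocab.index(char)] = True

print(toy_X.shape)
print(toy_X[0].astype(int))  # one-hot rows for the sequence "abcd"
```

Each row of toy_X[0] contains a single 1, marking which character appears at that time step.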

To prepare the data, we first set the length of training sequences we want to use. In this case we will set the sequence length to 100, meaning the RNN layer will be able to predict future characters based on the 100 characters that came before.

We will then slide this 100 character 'window' over the entire text to create input and output arrays. Each entry in the input array contains 100 characters from the text, and each entry in the output array contains the single character that came after.


In [4]:
# prepare the dataset of input to output pairs encoded as integers
seq_length = 100

inputs = []
outputs = []

for i in range(0, n_chars - seq_length, 1):
    inputs.append(raw_text[i:i + seq_length])
    outputs.append(raw_text[i + seq_length])
    
n_sequences = len(inputs)
print "Total sequences: ", n_sequences


Total sequences:  18212
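
The sliding window is easier to see on a very short text. This sketch repeats the same loop on a hypothetical 11-character string with a window of 4; the toy_ names are assumptions for illustration only.

```python
toy_text = "hello world"
toy_seq_length = 4

toy_inputs = []
toy_outputs = []
for i in range(len(toy_text) - toy_seq_length):
    # the window of characters the network sees ...
    toy_inputs.append(toy_text[i:i + toy_seq_length])
    # ... and the single character it should predict next
    toy_outputs.append(toy_text[i + toy_seq_length])

for window, target in zip(toy_inputs, toy_outputs):
    print(repr(window) + " --> " + repr(target))
```

The number of windows is always the text length minus the window length, which is why the 18312-character article yields 18212 sequences with a window of 100.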

Now let's shuffle both the input and output data so that we can later have Keras split it automatically into a training and test set. To make sure the two lists are shuffled the same way (maintaining correspondence between inputs and outputs), we create a separate shuffled list of indices, and use these indices to reorder both lists.


In [5]:
# build an explicit list so it can be shuffled in place
indices = list(range(len(inputs)))
random.shuffle(indices)

inputs = [inputs[x] for x in indices]
outputs = [outputs[x] for x in indices]
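
One way to convince yourself that this kind of shuffle keeps each input paired with its output is to compare the set of pairs before and after. The sketch below does this on a hypothetical four-item example; the toy lists are assumptions, not the lab's data.

```python
import random

# Hypothetical toy pairs (windows of "hello wo" and the character that follows)
toy_inputs = ["hell", "ello", "llo ", "lo w"]
toy_outputs = ["o", " ", "w", "o"]
original_pairs = set(zip(toy_inputs, toy_outputs))

# shuffle a list of positions, then reorder both lists with it
order = list(range(len(toy_inputs)))
random.shuffle(order)

shuffled_inputs = [toy_inputs[k] for k in order]
shuffled_outputs = [toy_outputs[k] for k in order]

pairs_preserved = set(zip(shuffled_inputs, shuffled_outputs)) == original_pairs
print(pairs_preserved)
```

Shuffling the two lists independently would break this pairing, which is why a shared index list is used.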

Let's visualize one of these sequences to make sure we are getting what we expect:


In [6]:
print inputs[0], "-->", outputs[0]


ut energy-sector emissions by 6, even as our economy has grown by 11 see chart 4. progress in americ --> a

Next we will prepare the actual numpy datasets which will be used to train our network. We first initialize two empty numpy arrays in the proper formatting:

  • X --> [# of training sequences, # of training samples, # of features]
  • y --> [# of training sequences, # of features]

We then iterate over the arrays we generated in the previous step and fill the numpy arrays with the proper data. Since all character data is formatted using one-hot encoding, we initialize both data sets with zeros. As we iterate over the data, we use the char_to_int dictionary to map each character to its related position integer, and use that position to change the related value in the data set to 1.


In [7]:
# create two empty numpy arrays with the proper dimensions
# (the builtin bool is used because the np.bool alias was removed in newer NumPy)
X = np.zeros((n_sequences, seq_length, n_vocab), dtype=bool)
y = np.zeros((n_sequences, n_vocab), dtype=bool)

# iterate over the data and build up the X and y data sets
# by setting the appropriate indices to 1 in each one-hot vector
for i, example in enumerate(inputs):
    for t, char in enumerate(example):
        X[i, t, char_to_int[char]] = 1
    y[i, char_to_int[outputs[i]]] = 1
    
print 'X dims -->', X.shape
print 'y dims -->', y.shape


X dims --> (18212, 100, 44)
y dims --> (18212, 44)
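
A simple sanity check on one-hot data is to decode it back into characters by taking the position of the 1 in each row. This is a minimal sketch on a hand-written toy array with a hypothetical four-character vocabulary, not on the lab's X and y.

```python
import numpy as np

# Hypothetical vocabulary and its integer-to-character mapping
toy_chars = [' ', 'a', 'b', 'c']
toy_int_to_char = dict(enumerate(toy_chars))

# One-hot rows for the sequence "cab " (one row per time step)
toy_onehot = np.array([[0, 0, 0, 1],
                       [0, 1, 0, 0],
                       [0, 0, 1, 0],
                       [1, 0, 0, 0]], dtype=bool)

# every row should contain exactly one True value
rows_valid = bool((toy_onehot.sum(axis=1) == 1).all())

# argmax gives the position of the 1 in each row
decoded = "".join(toy_int_to_char[k] for k in toy_onehot.argmax(axis=1))
print(rows_valid)
print(repr(decoded))
```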

Next, we define our RNN model in Keras. This is very similar to how we defined the CNN model, except now we use the LSTM() function to create an LSTM layer with 128 hidden units. LSTM is a special type of RNN layer designed to mitigate the unstable (vanishing) gradient problem seen in basic RNNs. Along with LSTM layers, Keras also supports basic RNN layers and GRU layers, which are similar to LSTM. You can find full details on recurrent layers in the Keras documentation.

As before, we need to explicitly define the input shape for the first layer. We also need to tell Keras whether the LSTM layer should pass on its full sequence of outputs (one per time step) or only its output at the final time step. If you are connecting the LSTM layer to a fully connected layer, as we do in this case, you should set the return_sequences parameter to False so that the layer passes on only the final value of its hidden units. If you are stacking multiple LSTM layers, you should set the parameter to True in all but the last one, so that each subsequent layer receives the full output sequence of the layer before it.
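
The difference between the two output modes is mostly a difference in shape. The sketch below is not Keras and not an LSTM: it is a plain NumPy forward pass of a basic tanh recurrent layer with random weights, written only to illustrate what is returned in each case. The function name simple_rnn_forward and all of its parameters are assumptions made for this example.

```python
import numpy as np

def simple_rnn_forward(x, units, return_sequences=False, seed=0):
    """Forward pass of a basic tanh recurrent layer (not an LSTM).

    x has shape (time_steps, features). The weights are random and
    untrained; only the output shapes are of interest here.
    """
    rng = np.random.RandomState(seed)
    w_in = rng.randn(x.shape[1], units) * 0.1
    w_rec = rng.randn(units, units) * 0.1
    h = np.zeros(units)
    states = []
    for x_t in x:
        # new hidden state depends on the current input and the previous state
        h = np.tanh(x_t.dot(w_in) + h.dot(w_rec))
        states.append(h)
    states = np.array(states)
    return states if return_sequences else states[-1]

# 100 one-hot time steps over a 44-character vocabulary, as in this lab
toy_x = np.eye(44)[np.arange(100) % 44]
last_only = simple_rnn_forward(toy_x, 128, return_sequences=False)
full_seq = simple_rnn_forward(toy_x, 128, return_sequences=True)
print(last_only.shape)
print(full_seq.shape)
```

With return_sequences=False the layer hands on a single vector of 128 values, which is what a fully connected layer expects. With return_sequences=True it hands on one such vector per time step, which is what another recurrent layer expects.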

We will use dropout with a probability of 50% to regularize the network and prevent overfitting on our training data. The output of the network will be a fully connected layer with one neuron for each character in the vocabulary. The softmax function will convert this output to a probability distribution across all characters.
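
For reference, the softmax conversion can be written in a few lines of NumPy. This is a generic sketch with made-up scores, not the Keras implementation; subtracting the maximum is a common trick for numerical stability and does not change the result.

```python
import numpy as np

def softmax(scores):
    """Convert a vector of raw scores into a probability distribution."""
    shifted = scores - np.max(scores)   # avoids overflow in exp()
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical raw outputs for a three-character vocabulary
toy_scores = np.array([2.0, 1.0, 0.1])
toy_probs = softmax(toy_scores)
print(toy_probs)
print(toy_probs.sum())
```

The outputs are all positive and sum to 1, and the largest score receives the largest probability.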


In [8]:
# define the LSTM model
model = Sequential()
model.add(LSTM(128, return_sequences=False, input_shape=(X.shape[1], X.shape[2])))
model.add(Dropout(0.50))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')

Next, we define two helper functions: one to select a character based on a probability distribution, and one to generate a sequence of predicted characters based on an input (or 'seed') list of characters.

The sample() function will take in a probability distribution generated by the softmax() function, and select a character based on the 'temperature' input. The temperature (also often called the 'diversity') affects how strictly the probability distribution is sampled.

  • Lower values (closer to zero) output more confident predictions, but are also more conservative. In our case, if the model has overfit the training data, lower values are likely to give back exactly what is found in the text.
  • Higher values (1 and above) introduce more diversity and randomness into the results. This can lead the model to generate novel information not found in the training data. However, you are also likely to see more errors such as grammatical or spelling mistakes.

In [9]:
def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)
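
To see what the temperature does before any sampling happens, the sketch below applies the same log, divide, and renormalize steps to a small hand-picked distribution. The helper name rescale and the example probabilities are assumptions for illustration.

```python
import numpy as np

def rescale(preds, temperature):
    """Apply the temperature to a distribution and renormalize it."""
    preds = np.asarray(preds).astype('float64')
    logits = np.log(preds) / temperature
    exp_preds = np.exp(logits)
    return exp_preds / np.sum(exp_preds)

# Hypothetical distribution over three characters
toy_preds = [0.5, 0.3, 0.2]
for temperature in [0.2, 1.0, 1.2]:
    rescaled = rescale(toy_preds, temperature)
    print(str(temperature) + " -> " + str(np.round(rescaled, 3)))
```

A temperature of 1.0 leaves the distribution unchanged. Values below 1.0 push more of the probability onto the most likely character, and values above 1.0 flatten the distribution so that unlikely characters are chosen more often.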

The generate() function will take in:

  • input sentence ('seed')
  • number of characters to generate
  • and target diversity or temperature

and print the resulting sequence of characters to the screen.


In [10]:
def generate(sentence, prediction_length=50, diversity=0.35):
    print '----- diversity:', diversity 

    generated = sentence
    sys.stdout.write(generated)

    # iterate over number of characters requested
    for i in range(prediction_length):
        
        # build up sequence data from current sentence
        x = np.zeros((1, X.shape[1], X.shape[2]))
        for t, char in enumerate(sentence):
            x[0, t, char_to_int[char]] = 1.

        # use trained model to return probability distribution
        # for next character based on input sequence
        preds = model.predict(x, verbose=0)[0]
        
        # use sample() function to sample next character 
        # based on probability distribution and desired diversity
        next_index = sample(preds, diversity)
        
        # convert integer to character
        next_char = int_to_char[next_index]

        # add new character to generated text
        generated += next_char
        
        # delete the first character from the beginning of the sentence,
        # and add the new character to the end. This will form the
        # input sequence for the next predicted character.
        sentence = sentence[1:] + next_char

        # print results to screen
        sys.stdout.write(next_char)
        sys.stdout.flush()
    print
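
The rolling input window inside generate() can be isolated from the model. In this sketch the "predicted" characters are simply read from a fixed string, which stands in for characters sampled from the network; the roll_window() helper is a hypothetical name introduced for this example.

```python
def roll_window(window, next_char):
    """Drop the oldest character and append the newest one."""
    return window[1:] + next_char

window = "hello"
generated = window
for next_char in " wor":   # stand-in for characters sampled from the model
    generated += next_char
    window = roll_window(window, next_char)

print(repr(window))
print(repr(generated))
```

The window always keeps the same length, so it can be encoded into an array of the shape the network expects, while the generated text keeps growing.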

Next, we define a system for Keras to save our model's parameters to a local file after each epoch where it achieves an improvement in the overall loss. This will allow us to reuse the trained model at a later time without having to retrain it from scratch. This is useful for recovering models in case your computer crashes, or if you want to stop the training early.


In [11]:
filepath="-basic_LSTM.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='loss', verbose=0, save_best_only=True, mode='min')
callbacks_list = [checkpoint]

Now we are finally ready to train the model. We want to train the model over 50 epochs, but we also want to output some generated text after each epoch to see how our model is doing.

To do this we create our own loop to iterate over each epoch. Within the loop we first train the model for one epoch. Since all parameters are stored within the model, training one epoch at a time has the same effect as training over a longer series of epochs. We also use the validation_split parameter of model.fit() to tell Keras to automatically hold out 20% of the data for validation and train on the remaining 80%. Keras takes the validation samples from the end of the data set, so remember to always shuffle your data if you will be using validation_split!
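
The sizes of the two subsets can be worked out by hand. The sketch below assumes the split point is the sample count multiplied by (1 - validation_split) and truncated to an integer, with the tail of the data held out for validation; the variable names are illustrative.

```python
n_samples = 18212          # total sequences reported earlier in the lab
validation_split = 0.2

# Assumption: the first split_at samples are used for training and
# everything after that point is held out for validation.
split_at = int(n_samples * (1.0 - validation_split))
n_train = split_at
n_val = n_samples - split_at

print(n_train)
print(n_val)
```

These counts agree with the "Train on 14569 samples, validate on 3643 samples" line that Keras prints during training.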

After each epoch is trained, we use the raw_text data to extract a new sequence of 100 characters as the 'seed' for our generated text. Finally, we use our generate() helper function to generate text using two different diversity settings.

Warning: because of their large depth (remember that an RNN trained on a 100-character sequence is effectively unrolled into 100 layers!), these networks typically take much longer to train than traditional multi-layer ANNs and CNNs. You should expect these models to train overnight on the virtual machine, but you should be able to see enough progress after the first few epochs to know if it is worth it to train a model to the end. For more complex RNN models with larger data sets in your own work, you should consider a native installation, along with a dedicated GPU if possible.


In [12]:
epochs = 50
prediction_length = 100

for iteration in range(epochs):
    
    print 'epoch:', iteration + 1, '/', epochs
    model.fit(X, y, validation_split=0.2, batch_size=256, nb_epoch=1, callbacks=callbacks_list)
    
    # get random starting point for seed
    start_index = random.randint(0, len(raw_text) - seq_length - 1)
    # extract seed sequence from raw text
    seed = raw_text[start_index: start_index + seq_length]
    
    print '----- generating with seed:', seed
    
    for diversity in [0.5, 1.2]:
        generate(seed, prediction_length, diversity)


epoch: 1 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 49s - loss: 3.1911 - val_loss: 2.9652
----- generating with seed: d a major role. in the past, differences in pay between corporate executives and their workers were 
----- diversity: 0.5
d a major role. in the past, differences in pay between corporate executives and their workers were  u t ien  ea oaamtl eto  tnnatces nt rh vo etasini nit we eie in or c  duaitan t  iea  tr antetsde e
----- diversity: 1.2
d a major role. in the past, differences in pay between corporate executives and their workers were mt l0s tlsn.eakcg  rm
teugt nti9cd, ttoiooa ak t rla ;c8a  rfnrauercrtgdikeh0k5dshbhssteniw--;nsttto
epoch: 2 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 3.0172 - val_loss: 2.9324
----- generating with seed: uard against systemic failure and ensure fair competition.

post-crisis reforms to wall street have 
----- diversity: 0.5
uard against systemic failure and ensure fair competition.

post-crisis reforms to wall street have e  ancaintit aenesa   eaa  oae t e gsat  s se res i ic i ratt lan nfa  cst e ton    rroaasgso nn ine
----- diversity: 1.2
uard against systemic failure and ensure fair competition.

post-crisis reforms to wall street have ngrpernunivatps6 nesrkpint8 nr u t9igiroe8  ihgmton fko uotnytwdush s beer r sdtfsfhipfbi lfs 
 eyia
epoch: 3 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 49s - loss: 2.9633 - val_loss: 2.8812
----- generating with seed: e presidency is a relay race, requiring each of us to do our part to bring the country closer to its
----- diversity: 0.5
e presidency is a relay race, requiring each of us to do our part to bring the country closer to itsps ala les  aot   re   iliiti tt  ntnet igtor a  ain t roatpe  oce eth  en ec ent es  he oon tarl in
----- diversity: 1.2
e presidency is a relay race, requiring each of us to do our part to bring the country closer to itsod nwec e;exait  rwansr qahgten-  vreygllcivuorscoohk.iel.1eeedie t jt rse,h2tysto a d rrdtshootetcc
epoch: 4 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 2.8845 - val_loss: 2.7754
----- generating with seed: ers see chart 3. in 1953, just 3 of men between 25 and 54 years old were out of the labour force. to
----- diversity: 0.5
ers see chart 3. in 1953, just 3 of men between 25 and 54 years old were out of the labour force. toniu  cate in ctocte onorat t oe tao ce rta tte fin oed rreh orre iho  or yte rorbd te  oane tall eo 
----- diversity: 1.2
ers see chart 3. in 1953, just 3 of men between 25 and 54 years old were out of the labour force. toia  sornirq ioveripaodiuuiuf upxinsintpe r 3eneriresuorlevlis,xobawas .rl orhle. crmsdl h bmwe hpfo 
epoch: 5 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.7754 - val_loss: 2.6615
----- generating with seed: ont-loaded fiscal stimulus than even president roosevelts new deal and oversaw the most comprehensiv
----- diversity: 0.5
ont-loaded fiscal stimulus than even president roosevelts new deal and oversaw the most comprehensivs at enuor hee tore te anes on pas seit an mih fcer con nolenh oas rethe art se toren an nore inure 
----- diversity: 1.2
ont-loaded fiscal stimulus than even president roosevelts new deal and oversaw the most comprehensivthemn rsd sniler tuwcrcaunr howod-sine-retesetmmetyewgiyarny- rgds rlyotc aws:almitcaoraltousmi lhdn
epoch: 6 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 2.6753 - val_loss: 2.5639
----- generating with seed:  development. policies focused on education are critical both for increasing economic growth and for
----- diversity: 0.5
 development. policies focused on education are critical both for increasing economic growth and fore res as re aas tore an e sond pae ion ag the bus get tee tos re oor ao ins er eleane tiat anoute ao
----- diversity: 1.2
 development. policies focused on education are critical both for increasing economic growth and forcof ton  ans
teo dhe oooms cs anpetors ompositre re7 bthuai1w teyadlltinl anrulit i6t p vtiscs baad 
epoch: 7 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.5934 - val_loss: 2.4902
----- generating with seed: bilising our economy. unfortunately, good economics can be overridden by bad politics. my administra
----- diversity: 0.5
bilising our economy. unfortunately, good economics can be overridden by bad politics. my administras noros the antine ind an an on har ble aocide touuican shas to anl the ar ono al ehe toal ind uceit
----- diversity: 1.2
bilising our economy. unfortunately, good economics can be overridden by bad politics. my administrait bwthopt-ing wos lte o te fogtec man yocbed.s vmhoalpbemeomun:e9oacme mverl2d y male, eldasu domi 
epoch: 8 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.5261 - val_loss: 2.4433
----- generating with seed: embraced a crude populism that promises a return to a past that is not possible to restoreand that, 
----- diversity: 0.5
embraced a crude populism that promises a return to a past that is not possible to restoreand that, the toall sonle sade the thar canse inte ins merer the and tina te femer inl the ter fome poines an 
----- diversity: 1.2
embraced a crude populism that promises a return to a past that is not possible to restoreand that, ian  begtdaisr ipte lectse d1s fomas ar unncamtremi aetmict;eccfn lh oi8lenur am olgc?l urteot toed 
epoch: 9 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.4764 - val_loss: 2.4013
----- generating with seed:  states. in 1979, the top 1 of american families received 7 of all after-tax income. by 2007, that s
----- diversity: 0.5
 states. in 1979, the top 1 of american families received 7 of all after-tax income. by 2007, that sare an inthe se the acoremcesting on reecthe tins be wole eand ce the the oar inas and were that ant
----- diversity: 1.2
 states. in 1979, the top 1 of american families received 7 of all after-tax income. by 2007, that snmextirb romem1wis ac korcey ans cesuresoke so?e poruee inry mhinougtacsgrtesicr pusgower asc ug fca
epoch: 10 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 2.4384 - val_loss: 2.3732
----- generating with seed: econd, alongside slowing productivity, inequality has risen in most advanced economies, with that in
----- diversity: 0.5
econd, alongside slowing productivity, inequality has risen in most advanced economies, with that inge the be that an ar aat ees meres one pond the tre the thatle thes iolo tan ancomem the porlete the
----- diversity: 1.2
econd, alongside slowing productivity, inequality has risen in most advanced economies, with that ingdwcinsantonc thl. ;ithaliwyachat oand mheltrereo e7ejlbove goses,dinn a9st in., tose wer orn thor e
epoch: 11 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.4109 - val_loss: 2.3457
----- generating with seed: , at their childrens schools, in civic organisations. thats why ceos took home about 20- to 30-times
----- diversity: 0.5
, at their childrens schools, in civic organisations. thats why ceos took home about 20- to 30-times pand tha s are the in the onoun alis iul eve py tho thes to to the ar the engroat . be oulina se th
----- diversity: 1.2
, at their childrens schools, in civic organisations. thats why ceos took home about 20- to 30-timesim smangy 8erion wot ur ristg iasyb a adac in sanmwnndlatireedabicte thmics yocfelsn bt,wurusgfemeve
epoch: 12 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.3820 - val_loss: 2.3173
----- generating with seed: d a pervasive sense of injustice undermines peoples faith in the system. without trust, capitalism a
----- diversity: 0.5
d a pervasive sense of injustice undermines peoples faith in the system. without trust, capitalism and ane pore the the deald pant ecato th ald crest at ar fer the les wor mere the the the pall ins co
----- diversity: 1.2
d a pervasive sense of injustice undermines peoples faith in the system. without trust, capitalism ay anqudtilst,
inb y9taly ooswbmelbemor
inessartowlhbl jlthint riyenit yhatoani
s
ndoptg le n bcen 3a
epoch: 13 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.3523 - val_loss: 2.2957
----- generating with seed: not continue to deliver the gains they have delivered in the past centuries.

this paradox of progre
----- diversity: 0.5
not continue to deliver the gains they have delivered in the past centuries.

this paradox of progre worus on the gar on bere an toe the the tho acreincentint ans rome the to an the forwe the the and 
----- diversity: 1.2
not continue to deliver the gains they have delivered in the past centuries.

this paradox of progreus bcolanmongiun mra0t kaoka ge ubissdutsiunty thesmpofonmin smorld trec acsi ig thss us, movar dy.i
epoch: 14 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.3282 - val_loss: 2.2792
----- generating with seed: rden of stabilising our economy. unfortunately, good economics can be overridden by bad politics. my
----- diversity: 0.5
rden of stabilising our economy. unfortunately, good economics can be overridden by bad politics. my ehte the and perter an thanle ghan 4re ingredes in pretot on mare an re che the d aod the and are a
----- diversity: 1.2
rden of stabilising our economy. unfortunately, good economics can be overridden by bad politics. mytey tad;tuast hat koty. inycwellis cor 8sett, sadms buredee theom eunane wej
wthupid f3.t porneorime
epoch: 15 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 49s - loss: 2.3039 - val_loss: 2.2613
----- generating with seed: a resilient economy thats primed for future growth.

restoring economic dynamism

first, in recent y
----- diversity: 0.5
a resilient economy thats primed for future growth.

restoring economic dynamism

first, in recent y beas of incers and the onores mase tho n3es and the poreche store te the and the mevereng in rapre 
----- diversity: 1.2
a resilient economy thats primed for future growth.

restoring economic dynamism

first, in recent youd gboutr-stoeses of damunedt, np puriorysaled se thloden?niny th cl;auning pureincltakisgwand fyei
epoch: 16 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.2796 - val_loss: 2.2434
----- generating with seed: t yet substantially boosted measured productivity growth. over the past decade, america has enjoyed 
----- diversity: 0.5
t yet substantially boosted measured productivity growth. over the past decade, america has enjoyed th of the iang constere cor wer ther of the patins for end the for the wore fore the to and in fores
----- diversity: 1.2
t yet substantially boosted measured productivity growth. over the past decade, america has enjoyed mus  e0ssumg fote saorl ar1-attore veat abpedeicmmpowt0psy volm atur theh ls, soog the ofy palles qe
epoch: 17 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.2514 - val_loss: 2.2264
----- generating with seed:  successful when we close the gap between rich and poor and growth is broadly based. this is not jus
----- diversity: 0.5
 successful when we close the gap between rich and poor and growth is broadly based. this is not just mere corest has the ins the the hand withe the the for the patistiris anoum arecand the and hald t
----- diversity: 1.2
 successful when we close the gap between rich and poor and growth is broadly based. this is not just sasshing, rasine oho copuhlideve wain ogioncenes, eoucad west t. then sanst. ale9wplt ooviwonnses,
epoch: 18 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.2271 - val_loss: 2.2092
----- generating with seed: ed a major role. in the past, differences in pay between corporate executives and their workers were
----- diversity: 0.5
ed a major role. in the past, differences in pay between corporate executives and their workers were mhes ee pore thes enangures enomer bate mor worle singe and decint om the the tha ge the be of arl 
----- diversity: 1.2
ed a major role. in the past, differences in pay between corporate executives and their workers were mgr-egfant5 ie omvilacs, ouse buidlscerstwets sante thab fesktamed to,ry onxincom arcamise pnontt b
epoch: 19 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 2.2159 - val_loss: 2.1989
----- generating with seed: e and an associated increase in overdose deaths and suicides among non-college-educated americansthe
----- diversity: 0.5
e and an associated increase in overdose deaths and suicides among non-college-educated americansthe purcon the enould as in wilh ald the enon grome the beatd and withe and un be that ad precone the w
----- diversity: 1.2
e and an associated increase in overdose deaths and suicides among non-college-educated americansthe thwheu?dod ghave, atky.
tuhe int fe waobd sonc-uuds,e,abd graticat foadingm weredecence ufaru pamir
epoch: 20 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.1946 - val_loss: 2.1838
----- generating with seed: s climate agreement, which presents the best opportunity to save the planet for future generations.

----- diversity: 0.5
s climate agreement, which presents the best opportunity to save the planet for future generations.

at on encertions cortereis and in in oul sy sow yel on elomes and mor the porting co are red predit
----- diversity: 1.2
s climate agreement, which presents the best opportunity to save the planet for future generations.

heveutheve, t7e o0s the motgerelg, lesive eigeht py auntiop bedasuntcaled ontorresssess catins inss
epoch: 21 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 2.1750 - val_loss: 2.1690
----- generating with seed: f prime-age women were out of the labour force. today, it is 26. people joining or rejoining the wor
----- diversity: 0.5
f prime-age women were out of the labour force. today, it is 26. people joining or rejoining the worl the at on thes and bevere sor corumes son the and the that the pald tha eroress the pranting are a
----- diversity: 1.2
f prime-age women were out of the labour force. today, it is 26. people joining or rejoining the wornlo6 by oreioutime sabse fofinmeasion m ionreipabge worgemert encolevatt thac pr orvearidy s cantiri
epoch: 22 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.1525 - val_loss: 2.1644
----- generating with seed: he income distribution by 18 by 2017, while raising the average tax rates on households projected to
----- diversity: 0.5
he income distribution by 18 by 2017, while raising the average tax rates on households projected to enout the catrecand the efor ace andaticave tat  hevex the ser of the wor the the bees fardes and t
----- diversity: 1.2
he income distribution by 18 by 2017, while raising the average tax rates on households projected tok he werd cencutsenfdim -palbly ial fhcf the pwfsrt9uitt
 andetacoato dhiss  o bicedes at hupp t wt 
epoch: 23 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.1331 - val_loss: 2.1453
----- generating with seed: es, can fail. this can happen through the tendency towards monopoly and rent-seeking that this newsp
----- diversity: 0.5
es, can fail. this can happen through the tendency towards monopoly and rent-seeking that this newsprest of and resturt the prowend are wore of ore the the on alle tale and the are ard and bicing ol l
----- diversity: 1.2
es, can fail. this can happen through the tendency towards monopoly and rent-seeking that this newspebtepriqt aid focd anidy 1agsouis coptoly io ingrurine thatts npapeonicajs gforte monsiin on wutrome
epoch: 24 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.1223 - val_loss: 2.1453
----- generating with seed: ght embraced a crude populism that promises a return to a past that is not possible to restoreand th
----- diversity: 0.5
ght embraced a crude populism that promises a return to a past that is not possible to restoreand the aredtins and insumeros manl ins oul prolise for buticisy and seconge to ens concessent dowists con
----- diversity: 1.2
ght embraced a crude populism that promises a return to a past that is not possible to restoreand th ceding thit b-enutile o? preses1ng, dicilicat cald coty. the fous thingncen aund;li and iulllyasims
epoch: 25 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.1058 - val_loss: 2.1256
----- generating with seed: ning a historic debt default. my successors should not have to fight for emergency measures in a tim
----- diversity: 0.5
ning a historic debt default. my successors should not have to fight for emergency measures in a time and the anitimaning walle sulit. the the anated more the werbe and the port the that bat oum to pr
----- diversity: 1.2
ning a historic debt default. my successors should not have to fight for emergency measures in a tim 12;, th. lsturom r6s anfretigst ysresticaun cutsoon siwe in cienecegol wifilte in niblive of thesua
epoch: 26 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.0836 - val_loss: 2.1280
----- generating with seed: s we have made, instead choosing to condemn the system as a whole. americans should debate how best 
----- diversity: 0.5
s we have made, instead choosing to condemn the system as a whole. americans should debate how best at an andion eroule cores in the and enoures and wererage the enomeng thes and wore of poldica averi
----- diversity: 1.2
s we have made, instead choosing to condemn the system as a whole. americans should debate how best s0iicetis treas is nos ene ilirliginansy heowh to tae  fonleqdetpis tavides and mand iopmonting in i
epoch: 27 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.0705 - val_loss: 2.1211
----- generating with seed: t vote to leave the european union and the rise of populist parties around the world.

much of this 
----- diversity: 0.5
t vote to leave the european union and the rise of populist parties around the world.

much of this anconens on werk for manis-alles bo work ho the pare se pao the the pors at e hall giso eatiin and a
----- diversity: 1.2
t vote to leave the european union and the rise of populist parties around the world.

much of this igwertemataf tixsulalntr- rim oner-gauty og micoas ud lestn leme. hagmore more farce the avinm an fr
epoch: 28 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.0508 - val_loss: 2.1099
----- generating with seed: aped by the few and unaccountable to the many is a threat to all. economies are more successful when
----- diversity: 0.5
aped by the few and unaccountable to the many is a threat to all. economies are more successful when the res more sound to ping to eat enstores and that that of rewary reserthe so wer shatte res const
----- diversity: 1.2
aped by the few and unaccountable to the many is a threat to all. economies are more successful whend alluden bet calst gzexpardaty fore indyss-amc es2ryetn apsous eri he s.


bnutr th gser move colow
epoch: 29 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.0316 - val_loss: 2.1042
----- generating with seed: ti-muslim and anti-refugee sentiment expressed by some americans today echoes nativist lurches of th
----- diversity: 0.5
ti-muslim and anti-refugee sentiment expressed by some americans today echoes nativist lurches of the one beitimitical hance angerele ans ale consese tho word the soced consticing in chastist and mare
----- diversity: 1.2
ti-muslim and anti-refugee sentiment expressed by some americans today echoes nativist lurches of the wemiad nome,enibles untertaudt dhpberernis se sfromuss non urd ald alnist. bleulof car averebcover
epoch: 30 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 2.0179 - val_loss: 2.0924
----- generating with seed: 1999, 23 of prime-age women were out of the labour force. today, it is 26. people joining or rejoini
----- diversity: 0.5
1999, 23 of prime-age women were out of the labour force. today, it is 26. people joining or rejoining and the bating pising to ofat the for and and the pore to eras and in cheteer for are rise of the
----- diversity: 1.2
1999, 23 of prime-age women were out of the labour force. today, it is 26. people joining or rejoinif sagtais incellyublis for afd aun eot ens.

eibo ens yequout oe grfaclisicin puoy the. eesonilisati
epoch: 31 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.9961 - val_loss: 2.0963
----- generating with seed:  guard against systemic failure and ensure fair competition.

post-crisis reforms to wall street hav
----- diversity: 0.5
 guard against systemic failure and ensure fair competition.

post-crisis reforms to wall street have woul prothes to mere mest and recous far wer seveled be past aud alpares fur the that patiting pat
----- diversity: 1.2
 guard against systemic failure and ensure fair competition.

post-crisis reforms to wall street have foaty calr americho ge6ert ghatd wh fo tarde woll4. takencishew ble wgreghafurd thalnmeruncon witr
epoch: 32 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 1.9769 - val_loss: 2.0860
----- generating with seed: ild in a slum can see the skyscraper nearby, technology allows anyone with a smartphone to see how t
----- diversity: 0.5
ild in a slum can see the skyscraper nearby, technology allows anyone with a smartphone to see how the that ange parte the tho pecono is ast wor for congen des porting and rowe some and to senald and 
----- diversity: 1.2
ild in a slum can see the skyscraper nearby, technology allows anyone with a smartphone to see how thee ddapl2, thead e amingaity adlaicito duld wapr zenfthon sati7s solle mlstty.bum ragyeall the etha
epoch: 33 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.9642 - val_loss: 2.0892
----- generating with seed: red the need for a more resilient economy, one that grows sustainably without plundering the future 
----- diversity: 0.5
red the need for a more resilient economy, one that grows sustainably without plundering the future that insore sortore the best the porstull and andelisens more and thiting to tor goun the incoule be
----- diversity: 1.2
red the need for a more resilient economy, one that grows sustainably without plundering the future oo thax njo thd 1ve 10tn,kono sg otat lovto-neucd biledto morid alling ua upder-ianty ivitry feis fo
epoch: 34 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.9463 - val_loss: 2.0769
----- generating with seed: estments in basic research and development. policies focused on education are critical both for incr
----- diversity: 0.5
estments in basic research and development. policies focused on education are critical both for incracing and and economy cand in sour enored pporitit insungeration and with and ardecting to indering 
----- diversity: 1.2
estments in basic research and development. policies focused on education are critical both for incr hatted., lte fonden blin, yuale wath ecoqhialse gnow the dave dotute-ars bo tqeine fyeceo. cbyfect 
epoch: 35 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.9295 - val_loss: 2.0837
----- generating with seed: technological innovation suddenly developed a strain of anti-immigrant, anti-innovation protectionis
----- diversity: 0.5
technological innovation suddenly developed a strain of anti-immigrant, anti-innovation protectioniss and actof and ian encond in the for for eroum to part deal hes tores. the and and ater of probe at
----- diversity: 1.2
technological innovation suddenly developed a strain of anti-immigrant, anti-innovation protectionis, ay anes, tho fver were of tham qasexing-fust the cenoud ivse incon. alure to dveuncanp that mutina
epoch: 36 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.9121 - val_loss: 2.0814
----- generating with seed:  the isolation of corporations and elites, who often seem to live by a different set of rules to ord
----- diversity: 0.5
 the isolation of corporations and elites, who often seem to live by a different set of rules to ord es for and for the leve conte in the fat nom thes in the 19o0, wo les arengerenss in ameromant dong
----- diversity: 1.2
 the isolation of corporations and elites, who often seem to live by a different set of rules to ordtr fretare; aswruned petubiog apabe reriest proiecins.t ef aprenuwing thalb-acleqicenttimy toahersta
epoch: 37 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.9014 - val_loss: 2.0771
----- generating with seed:  the poverty rate fell faster than at any point since the 1960s. wages have risen faster in real ter
----- diversity: 0.5
 the poverty rate fell faster than at any point since the 1960s. wages have risen faster in real ter jot thal gresing of heas rese and the erofian a bored fore to chonss and were to and conoment intor
----- diversity: 1.2
 the poverty rate fell faster than at any point since the 1960s. wages have risen faster in real terest. be-menades, l ancouncid wats mhouc handf-alejos to eulipn ir chpaidrcca nac wente imated and bu
epoch: 38 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.8771 - val_loss: 2.0747
----- generating with seed:  only seemed to increase the isolation of corporations and elites, who often seem to live by a diffe
----- diversity: 0.5
 only seemed to increase the isolation of corporations and elites, who often seem to live by a differtee the dusticing the part cons rese the pablit. the botens cens mere the for coreans the portingt 
----- diversity: 1.2
 only seemed to increase the isolation of corporations and elites, who often seem to live by a diffever
;wn fof th in in ast hame tha borting dumanite bxighas ard, bihumqreacond dfpatdernd8 ac acdabre
epoch: 39 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 1.8533 - val_loss: 2.0765
----- generating with seed: working students, and ensuring men and women get equal pay for equal work would help to move us in t
----- diversity: 0.5
working students, and ensuring men and women get equal pay for equal work would help to move us in the in for and on comeles mer aal son mees and and in the in ous prowing of the por 1979,  e porkes s
----- diversity: 1.2
working students, and ensuring men and women get equal pay for equal work would help to move us in the wale sicredatinet 4unt, bnist f-ormiss of oun wgrees privitict fnis-mfoudt ed aot centsudit, bula
epoch: 40 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.8425 - val_loss: 2.0788
----- generating with seed: any other nations because we are convinced that with hard work, we can improve our own station and w
----- diversity: 0.5
any other nations because we are convinced that with hard work, we can improve our own station and with hes of the recont of reaer stor ald and at to the the sed to the stor that pating in the batith 
----- diversity: 1.2
any other nations because we are convinced that with hard work, we can improve our own station and we hethaitss add incoalislet thas geonisei, thtimass votufarlatsanot, atkerpeo, gyowit eavlotgy, dami
epoch: 41 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 1.8233 - val_loss: 2.0799
----- generating with seed: in legitimate concerns about long-term economic forces. decades of declining productivity growth and
----- diversity: 0.5
in legitimate concerns about long-term economic forces. decades of declining productivity growth and rest porther for cansele the prosint a more ur ard a our dector and the eraties and in and in the r
----- diversity: 1.2
in legitimate concerns about long-term economic forces. decades of declining productivity growth and pard uen yempiand-fir shides. 
endrkteasd gepldyines awsout, the tant edulit sam rovely:? ge peop w
epoch: 42 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 1.8194 - val_loss: 2.0726
----- generating with seed: america also helped catalyse the historic paris climate agreement, which presents the best opportuni
----- diversity: 0.5
america also helped catalyse the historic paris climate agreement, which presents the best opportunits and the hearen and ande longert ensement and acenome that pating arde the rowe not were conteme t
----- diversity: 1.2
america also helped catalyse the historic paris climate agreement, which presents the best opportunimn on the pypyt-imere pulduliol  istrecunos, t ate.

l 20.t ddbitibxeaticinhss, prercessticm s1ftems
epoch: 43 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.7932 - val_loss: 2.0821
----- generating with seed: ing future downturns; monetary policy should not bear the full burden of stabilising our economy. un
----- diversity: 0.5
ing future downturns; monetary policy should not bear the full burden of stabilising our economy. undings and andering the more thes be and inancanting the pare the panting the fore the the past to ma
----- diversity: 1.2
ing future downturns; monetary policy should not bear the full burden of stabilising our economy. uncous simonce on work bmitielturysus mate nf bivemud nostectonithad worldonnec storcob tha amreccwpnt
epoch: 44 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 1.7581 - val_loss: 2.0930
----- generating with seed: g a sturdier foundation
finally, the financial crisis painfully underscored the need for a more resi
----- diversity: 0.5
g a sturdier foundation
finally, the financial crisis painfully underscored the need for a more resinteritions to ever jon the pares an the enorecen the a porger the the bestores and to second ind chi
----- diversity: 1.2
g a sturdier foundation
finally, the financial crisis painfully underscored the need for a more resinnaming purthitifit of aad warkofxple toisd the ,eocf innderenanse thiy nines ridex worges 3iond..


epoch: 45 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 1.7650 - val_loss: 2.0947
----- generating with seed: unt the impact of their decisions on others through pollution, the ways in which disparities of info
----- diversity: 0.5
unt the impact of their decisions on others through pollution, the ways in which disparities of inforcent perediting thein sounced for on econome and betion and dilincest ondering the tore to mere par
----- diversity: 1.2
unt the impact of their decisions on others through pollution, the ways in which disparities of inforconty tualalnimalt, sreucus fol furantit-ngalcavixgomthod the indrolitiest achint averungite;, cotn
epoch: 46 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.7479 - val_loss: 2.0889
----- generating with seed: advanced economies see chart 1. without a faster-growing economy, we will not be able to generate th
----- diversity: 0.5
advanced economies see chart 1. without a faster-growing economy, we will not be able to generate the more the tor and the pars the fon rate reat efor whald bete lont the proster the for the aconom of
----- diversity: 1.2
advanced economies see chart 1. without a faster-growing economy, we will not be able to generate thiam m hiching on wapd the hald howl whers sur ald biencect orid baxpuling.

thith oy prout no gower 
epoch: 47 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 1.7343 - val_loss: 2.0940
----- generating with seed:  americans can get ahead requires addressing four major structural challenges: boosting productivity
----- diversity: 0.5
 americans can get ahead requires addressing four major structural challenges: boosting productivity to the ande to for sulice for echarnes hard wert and ansentren ins incoled concres and and and inco
----- diversity: 1.2
 americans can get ahead requires addressing four major structural challenges: boosting productivity- ror worstr canit eforce sfaxuclisg overdecind the incore farp chalke wo mpragians worthe ther ox a
epoch: 48 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 48s - loss: 1.7211 - val_loss: 2.0878
----- generating with seed: ual chance to get rich with everybody else. thats the problem with increased inequalityit diminishes
----- diversity: 0.5
ual chance to get rich with everybody else. thats the problem with increased inequalityit diminishes the s ore to to in prodicis for wer of allorest and estine thay presing to epraly the enorteden sto
----- diversity: 1.2
ual chance to get rich with everybody else. thats the problem with increased inequalityit diminishest fot mobreuestenotpy sonican ase prices avermabtiinsafiiss.

fourcc bauta9ghit ancome tirequoricy o
epoch: 49 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.7081 - val_loss: 2.0929
----- generating with seed:  through the internet, mobile broadband and devices, artificial intelligence, robotics, advanced mat
----- diversity: 0.5
 through the internet, mobile broadband and devices, artificial intelligence, robotics, advanced matiting for and ave the enotse for erecode in and and thete the at and candest and the poot of for ans
----- diversity: 1.2
 through the internet, mobile broadband and devices, artificial intelligence, robotics, advanced mate2 gr owk citseall holk ressedecy fthe intorannings chenram dissctem fisre suvita. unsert and angert
epoch: 50 / 50
Train on 14569 samples, validate on 3643 samples
Epoch 1/1
14569/14569 [==============================] - 47s - loss: 1.6817 - val_loss: 2.1137
----- generating with seed: , much of it fanned by politicians who would actually make the problem worse rather than better, it 
----- diversity: 0.5
, much of it fanned by politicians who would actually make the problem worse rather than better, it ande to the bate mout no dable to the porsting and the ardeting to ainits and the enowing or wishoul
----- diversity: 1.2
, much of it fanned by politicians who would actually make the problem worse rather than better, it houds, kth fur rhacm enaricg, liset depboys, the idenomees mors prosedris cartony, by wehoxsempetire

That looks pretty good! You can see that the RNN has learned a lot of the linguistic structure of the original writing, including typical word length, where to put spaces, and basic punctuation with commas and periods. Many words are still misspelled but seem almost reasonable, and it is pretty amazing that the model is able to learn this much in only 50 epochs of training.

You can see that the training loss is still going down after 50 epochs, so the model can likely fit the text more closely with longer training. The validation loss, however, has hovered between roughly 2.07 and 2.10 since about epoch 32 and rose to 2.1137 in the final epoch, so the gap between the two is widening. If you're curious you can try to train for more epochs, but as the training error decreases be careful to monitor the output to make sure that the model is not overfitting. As with other neural network models, you can monitor the difference between training and validation loss to see if overfitting might be occurring. In this case, since we're using the model to generate new text, we can also get a sense of overfitting from the material it generates.
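
The gap between the two losses can be checked directly from the training log. The sketch below is a minimal illustration that uses a handful of (loss, val_loss) pairs copied from the output above; the names `history`, `gaps` and `best_epoch` are introduced here for illustration and are not part of the lab code.

```python
# (epoch, training loss, validation loss) copied from the log above
history = [
    (30, 2.0179, 2.0924),
    (34, 1.9463, 2.0769),
    (38, 1.8771, 2.0747),
    (42, 1.8194, 2.0726),
    (46, 1.7479, 2.0889),
    (50, 1.6817, 2.1137),
]

# gap between validation and training loss at each sampled epoch
gaps = [(epoch, round(val - train, 4)) for epoch, train, val in history]

# epoch with the lowest validation loss among the sampled ones
best_epoch = min(history, key=lambda row: row[2])[0]

for epoch, gap in gaps:
    print("epoch %d: val_loss - loss = %.4f" % (epoch, gap))
print("lowest sampled val_loss at epoch %d" % best_epoch)
```

In this sample the gap grows at every step while the lowest validation loss falls at epoch 42, which suggests that the later epochs mostly improve the fit to the training sequences rather than to unseen ones.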

A good indication of overfitting is if the model outputs exactly what is in the original text when given a seed from the text, but gibberish when given a seed that is not in the original text. Remember that we don't want the model to learn how to reproduce the original text exactly, but to learn its style so that it can generate new text. As with other models, regularization methods such as dropout and limiting model complexity can be used to avoid the problem of overfitting.
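
One rough way to quantify the first symptom is to measure how much of a generated passage appears verbatim in the training text. The function below is only a sketch of that idea: `verbatim_fraction` and the window size `n` are hypothetical choices made for this illustration, and the two test strings are a seed and a generated sample taken from the output above rather than the full `raw_text`.

```python
def verbatim_fraction(generated, source, n=20):
    """Fraction of length-n character windows of `generated` found verbatim in `source`."""
    if len(generated) < n:
        return 0.0
    windows = [generated[i:i + n] for i in range(len(generated) - n + 1)]
    hits = sum(1 for w in windows if w in source)
    return hits / float(len(windows))


# stand-in for raw_text: one seed from the training output above
source = ("aped by the few and unaccountable to the many is a threat to all. "
          "economies are more successful when")

copied = source[10:60]  # text lifted straight from the source
novel = "the res more sound to ping to eat enstores and that that of rewary"

print(verbatim_fraction(copied, source))  # 1.0
print(verbatim_fraction(novel, source))   # 0.0
```

A value near 1.0 for text generated from in-sample seeds would point toward memorization, while the samples printed during training above would score near 0, since most of their words are not even spelled correctly yet.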

Finally, let's save our training data and the character-to-integer mapping dictionaries to an external file so we can reuse them with the model at a later time.


In [13]:
pickle_file = '-basic_data.pickle'

try:
    # the with-block closes the file even if pickle.dump raises
    with open(pickle_file, 'wb') as f:
        save = {
            'X': X,
            'y': y,
            'int_to_char': int_to_char,
            'char_to_int': char_to_int,
        }
        pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)
except Exception as e:
    print 'Unable to save data to', pickle_file, ':', e
    raise
    
statinfo = os.stat(pickle_file)
print 'Saved data to', pickle_file
print 'Compressed pickle size:', statinfo.st_size


Saved data to -basic_data.pickle
Compressed pickle size: 80934860
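
To pick up where this lab leaves off, the saved dictionary can be read back with `pickle.load`. The sketch below round-trips a tiny stand-in dictionary through a temporary file so that it runs on its own; the toy mappings and the temporary path are assumptions made for illustration, and in practice you would open the pickle file written above instead.

```python
import os
import pickle
import tempfile

# toy stand-ins for the real mappings saved above
char_to_int = {'a': 0, 'b': 1, ' ': 2}
int_to_char = dict((i, c) for c, i in char_to_int.items())
save = {'char_to_int': char_to_int, 'int_to_char': int_to_char}

# write the dictionary to a temporary file (hypothetical path)
path = os.path.join(tempfile.mkdtemp(), 'toy_data.pickle')
with open(path, 'wb') as f:
    pickle.dump(save, f, pickle.HIGHEST_PROTOCOL)

# read the dictionary back and unpack the pieces we need
with open(path, 'rb') as f:
    restored = pickle.load(f)

# encode a string to integers and decode it again with the restored mappings
encoded = [restored['char_to_int'][c] for c in 'ab ba']
decoded = ''.join(restored['int_to_char'][i] for i in encoded)
print(decoded)
```

Restoring both mappings together with the data matters because the integer assigned to each character has to match the one the model was trained with.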