Lab 6.2 - Using a pre-trained model with Keras

In this section of the lab, we will load the model we trained in the previous section, along with the training data and mapping dictionaries, and use it to generate longer sequences of text.

Let's start by importing the libraries we will be using:


In [8]:
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import LSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

import sys
import re
import pickle

Next, we will load the data we saved previously, using the pickle library.


In [9]:
pickle_file = '-basic_data.pickle'

with open(pickle_file, 'rb') as f:
    save = pickle.load(f)
    X = save['X']
    y = save['y']
    char_to_int = save['char_to_int']  
    int_to_char = save['int_to_char']    
    del save  # hint to help gc free up memory
    print('Training set', X.shape, y.shape)


('Training set', (18212, 100, 44), (18212, 44))
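
As a quick optional check (not part of the original lab), we can unpack what these shapes mean: each row of X is one training sequence of X.shape[1] one-hot encoded characters, and the last dimension should match the size of the character vocabulary stored in char_to_int:

# X: (number of sequences, sequence length, vocabulary size)
# y: (number of sequences, vocabulary size) -- the one-hot encoded "next" character
print('%d sequences of length %d over a %d-character vocabulary' %
      (X.shape[0], X.shape[1], X.shape[2]))

# The one-hot dimension should equal the number of characters in the mapping.
assert X.shape[2] == y.shape[1] == len(char_to_int)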

Now we need to define the Keras model. Since we will be loading parameters from a pre-trained model, this definition needs to match the one from the previous lab section exactly. The only difference is that we comment out the dropout layer, so that the model uses all of its hidden neurons when making predictions.


In [10]:
# define the LSTM model
model = Sequential()
model.add(LSTM(128, return_sequences=False, input_shape=(X.shape[1], X.shape[2])))
# model.add(Dropout(0.50))
model.add(Dense(y.shape[1], activation='softmax'))
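
Before loading the saved weights, it can be worth double-checking that this architecture really matches the one we trained; printing the model summary (an optional step, not in the original lab) shows the layer output shapes and parameter counts:

# Optional: inspect the layer shapes and parameter counts. These need to match
# the model that produced the saved weights, or loading them will fail.
model.summary()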

Next we will load the parameters from the model we trained previously, and compile the model with the same loss function and optimizer.


In [11]:
# load the parameters from the pretrained model
filename = "-basic_LSTM.hdf5"
model.load_weights(filename)
model.compile(loss='categorical_crossentropy', optimizer='adam')
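
As an aside, rebuilding the architecture and then calling load_weights() is only one way to restore a model. If the full model is saved with model.save(), recent versions of Keras can rebuild, reload, and compile it in a single call; a minimal sketch, assuming a hypothetical file '-basic_LSTM_full.h5' saved that way:

from keras.models import load_model

# Hypothetical alternative: '-basic_LSTM_full.h5' stands in for a file written
# earlier with model.save(); load_model() restores the architecture, weights,
# and optimizer state, so no separate compile() step is needed.
full_model = load_model('-basic_LSTM_full.h5')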

We also need to redefine the sample() and generate() helper functions so that we can use them in our code:


In [12]:
def sample(preds, temperature=1.0):
    preds = np.asarray(preds).astype('float64')
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

In [13]:
def generate(sentence, sample_length=50, diversity=0.35):
    generated = sentence
    sys.stdout.write(generated)

    for i in range(sample_length):
        x = np.zeros((1, X.shape[1], X.shape[2]))
        for t, char in enumerate(sentence):
            x[0, t, char_to_int[char]] = 1.

        preds = model.predict(x, verbose=0)[0]
        next_index = sample(preds, diversity)
        next_char = int_to_char[next_index]

        generated += next_char
        sentence = sentence[1:] + next_char

        sys.stdout.write(next_char)
        sys.stdout.flush()
    print

Now we can use the generate() function to generate text of any length from our imported pre-trained model and a seed text of our choice. For best results, the seed text should be the same length as the training sequences (100 characters in the previous lab section).
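
To make that requirement concrete, here is a small optional helper (not part of the original lab) that checks a candidate seed before it is passed to generate():

def check_seed(seed):
    # generate() builds a one-hot array with X.shape[1] time steps, so the lab
    # recommends a seed of exactly that length; a longer seed would overflow it.
    assert len(seed) == X.shape[1], 'seed has %d characters, expected %d' % (len(seed), X.shape[1])
    # Every character must also appear in the training vocabulary, otherwise the
    # char_to_int lookup inside generate() will fail.
    for char in seed:
        assert char in char_to_int, 'character %r was not seen during training' % char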

In this case, we will test whether the model has overfit by supplying it with two seeds:

  • one which comes verbatim from the training text, and
  • one which comes from another, earlier speech by Obama.

If the model has not overfit our training data, we should expect it to produce reasonable results for both seeds. If it has overfit, it might produce fairly good results for a seed taken directly from the training set, but perform poorly on a new seed. That would mean it has learned to replicate our training text, but cannot generalize to produce text based on other inputs. However, since the original article was very short, the model's vocabulary is quite limited, which is why we use part of another speech given by Obama as the second seed, rather than completely random text.

Since we have not trained the model for very long, we will also use a lower temperature so that the model generates more accurate, if less diverse, results. Try running the code a few times with different temperature settings to see how the results change.
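
To get a feel for what the temperature does before generating from the model, the short experiment below (not from the original lab) repeatedly samples from a made-up four-way distribution: at low temperatures nearly every draw picks the most likely index, while higher temperatures spread the draws out:

# Illustration only: draw 1000 samples from a toy distribution at several
# temperatures and count how often each index is chosen.
toy_preds = [0.5, 0.3, 0.15, 0.05]
for temp in [0.2, 0.5, 1.0]:
    draws = [sample(toy_preds, temp) for _ in range(1000)]
    print('temperature %.1f: %s' % (temp, np.bincount(draws, minlength=len(toy_preds))))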


In [14]:
prediction_length = 500
seed_from_text = "america has shown that progress is possible. last year, income gains were larger for households at t"
seed_original = "and as people around the world began to hear the tale of the lowly colonists who overthrew an empire"

for seed in [seed_from_text, seed_original]:
    generate(seed, prediction_length, .50)
    print "-" * 20


america has shown that progress is possible. last year, income gains were larger for households at t arlisand and atition ard fureming the reand inithand and the fares and the banted. andering the alesecins wit devermestis and wound tote ard on the fot malice for and sy the leat enor eore are in the for the gat be a dereate the tha tore singeris. 
ficers the past thay meden the fitalis and batiting and soriter ad the potstur and mise and and of pore and anderita so ge winn maie the enconte in the perssed the provanis more the past bathe paot comproanded in the for too ay on, redsting in ald an
--------------------
and as people around the world began to hear the tale of the lowly colonists who overthrew an empirence of in artering in the porstithen the for incalisis tion marise the eore at our hort rest bat andere the perstition probution to the fand more rase conseres en the adconesting the arkers, the thale for inseran y and towheng the parting our heratid as the wang, sered and coundess and censere that the andering the agrerens the expenting aino fingrition, by aidivitevend more and tox the mexstingsution has  for and the pass th abee to progress of the pable the lovers the mege in of arlingering an
--------------------