Deep Learning

Lab Session 3 - 3 Hours

Long Short Term Memory (LSTM) for Language Modeling

Student 1: CANALE Student 2: ELLENA

In this Lab Session, you will build and train a Recurrent Neural Network, based on Long Short-Term Memory (LSTM) units for next word prediction task.

Answers and experiments should be made by groups of one or two students. Each group should fill and run appropriate notebook cells. Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an pdf document using print as PDF (Ctrl+P). Do not forget to run all your cells before generating your final report and do not forget to include the names of all participants in the group. The lab session should be completed by June 9th 2017.

Send you pdf file to benoit.huet@eurecom.fr and olfa.ben-ahmed@eurecom.fr using [DeepLearning_lab3] as Subject of your email.

Introduction

You will train a LSTM to predict the next word using a sample short story. The LSTM will learn to predict the next item of a sentence from the 3 previous items (given as input). Ponctuation marks are considered as dictionary items so they can be predicted too. Figure 1 shows the LSTM and the process of next word prediction.

Each word (and ponctuation) from text sentences is encoded by a unique integer. The integer value corresponds to the index of the corresponding word (or punctuation mark) in the dictionnary. The network output is a one-hot-vector indicating the index of the predicted word in the reversed dictionary (Section 1.2). For example if the prediction is 86, the predicted word will be "company".

You will use a sample short story from Aesop’s Fables (http://www.taleswithmorals.com/) to train your model.

"There was once a young Shepherd Boy who tended his sheep at the foot of a mountain near a dark forest.

It was rather lonely for him all day, so he thought upon a plan by which he could get a little company and some excitement. He rushed down towards the village calling out "Wolf, Wolf," and the villagers came out to meet him, and some of them stopped with him for a considerable time. This pleased the boy so much that a few days afterwards he tried the same trick, and again the villagers came to his help. But shortly after this a Wolf actually did come out from the forest, and began to worry the sheep, and the boy of course cried out "Wolf, Wolf," still louder than before. But this time the villagers, who had been fooled twice before, thought the boy was again deceiving them, and nobody stirred to come to his help. So the Wolf made a good meal off the boy's flock, and when the boy complained, the wise man of the village said: "A liar will not be believed, even when he speaks the truth." "</i> </font>.

Start by loading the necessary libraries and resetting the default computational graph. For more details about the rnn packages, we suggest you to take a look at https://www.tensorflow.org/api_guides/python/contrib.rnn



In [2]:

    
import numpy as np
import collections # used to build the dictionary
import random
import time
from time import time
import pickle # may be used to save your model 
import matplotlib.pyplot as plt
#Import Tensorflow and rnn
import tensorflow as tf
from tensorflow.contrib import rnn  

# Target log path
logs_path = 'lstm_words'
writer = tf.summary.FileWriter(logs_path)

Next-word prediction task

Part 1: Data preparation

1.1. Loading data

Load and split the text of our story



In [2]:

    
def load_data(filename):
    with open(filename) as f:
        data = f.readlines()
    data = [x.strip().lower() for x in data]
    data = [data[i].split() for i in range(len(data))]
    data = np.array(data)
    data = np.reshape(data, [-1, ])
    print(data)
    return data

#Run the cell 
train_file ='data/story.txt'
train_data = load_data(train_file)
print("Loaded training data...")
print(len(train_data))









    



['there' 'was' 'once' 'a' 'young' 'shepherd' 'boy' 'who' 'tended' 'his'
 'sheep' 'at' 'the' 'foot' 'of' 'a' 'mountain' 'near' 'a' 'dark' 'forest'
 '.' 'it' 'was' 'rather' 'lonely' 'for' 'him' 'all' 'day' ',' 'so' 'he'
 'thought' 'upon' 'a' 'plan' 'by' 'which' 'he' 'could' 'get' 'a' 'little'
 'company' 'and' 'some' 'excitement' '.' 'he' 'rushed' 'down' 'towards'
 'the' 'village' 'calling' 'out' 'wolf' ',' 'wolf' ',' 'and' 'the'
 'villagers' 'came' 'out' 'to' 'meet' 'him' ',' 'and' 'some' 'of' 'them'
 'stopped' 'with' 'him' 'for' 'a' 'considerable' 'time' '.' 'this'
 'pleased' 'the' 'boy' 'so' 'much' 'that' 'a' 'few' 'days' 'afterwards'
 'he' 'tried' 'the' 'same' 'trick' ',' 'and' 'again' 'the' 'villagers'
 'came' 'to' 'his' 'help' '.' 'but' 'shortly' 'after' 'this' 'a' 'wolf'
 'actually' 'did' 'come' 'out' 'from' 'the' 'forest' ',' 'and' 'began' 'to'
 'worry' 'the' 'sheep,' 'and' 'the' 'boy' 'of' 'course' 'cried' 'out'
 'wolf' ',' 'wolf' ',' 'still' 'louder' 'than' 'before' '.' 'but' 'this'
 'time' 'the' 'villagers' ',' 'who' 'had' 'been' 'fooled' 'twice' 'before'
 ',' 'thought' 'the' 'boy' 'was' 'again' 'deceiving' 'them' ',' 'and'
 'nobody' 'stirred' 'to' 'come' 'to' 'his' 'help' '.' 'so' 'the' 'wolf'
 'made' 'a' 'good' 'meal' 'off' 'the' "boy's" 'flock' ',' 'and' 'when'
 'the' 'boy' 'complained' ',' 'the' 'wise' 'man' 'of' 'the' 'village'
 'said' ':' 'a' 'liar' 'will' 'not' 'be' 'believed' ',' 'even' 'when' 'he'
 'speaks' 'the' 'truth' '.']
Loaded training data...
214

1.2.Symbols encoding

The LSTM input's can only be numbers. A way to convert words (symbols or any items) to numbers is to assign a unique integer to each word. This process is often based on frequency of occurrence for efficient coding purpose.

Here, we define a function to build an indexed word dictionary (word->number). The "build_vocabulary" function builds both:

Dictionary : used for encoding words to numbers for the LSTM inputs
Reverted dictionnary : used for decoding the outputs of the LSTM into words (and punctuation).

For example, in the story above, we have 113 individual words. The "build_vocabulary" function builds a dictionary with the following entries ['the': 0], [',': 1], ['company': 85],...



In [3]:

    
def build_vocabulary(words):
    count = collections.Counter(words).most_common()
    dic= dict()
    for word, _ in count:
        dic[word] = len(dic)

    reverse_dic= dict(zip(dic.values(), dic.keys()))
    return dic, reverse_dic

Run the cell below to display the vocabulary



In [4]:

    
dictionary, reverse_dictionary = build_vocabulary(train_data)
vocabulary_size= len(dictionary) 
print "Dictionary size (Vocabulary size) = ", vocabulary_size
print("\n")
print("Dictionary : \n")
print(dictionary)
print("\n")
print("Reverted Dictionary : \n" )
print(reverse_dictionary)









    



Dictionary size (Vocabulary size) =  113


Dictionary : 

{'all': 32, 'liar': 33, 'help': 17, 'cried': 34, 'course': 35, 'still': 36, 'pleased': 37, 'before': 18, 'excitement': 91, 'deceiving': 38, 'had': 39, 'young': 69, 'actually': 40, 'to': 6, 'villagers': 11, 'shepherd': 41, 'them': 19, 'lonely': 42, 'get': 44, 'dark': 45, 'not': 64, 'day': 47, 'did': 48, 'calling': 49, 'twice': 50, 'good': 51, 'stopped': 52, 'truth': 53, 'meal': 54, 'sheep,': 55, 'some': 20, 'tended': 56, 'louder': 57, 'flock': 58, 'out': 9, 'even': 59, 'trick': 60, 'said': 61, 'for': 21, 'be': 62, 'after': 63, 'come': 22, 'by': 65, 'boy': 7, 'of': 10, 'could': 66, 'days': 67, 'wolf': 5, 'afterwards': 68, ',': 1, 'down': 70, 'village': 23, 'sheep': 72, 'little': 73, 'from': 74, 'rushed': 75, 'there': 76, 'been': 77, '.': 4, 'few': 78, 'much': 79, "boy's": 80, ':': 81, 'was': 12, 'a': 2, 'him': 13, 'that': 83, 'company': 84, 'nobody': 85, 'but': 24, 'fooled': 86, 'with': 87, 'than': 43, 'he': 8, 'made': 89, 'wise': 90, 'this': 14, 'will': 71, 'near': 92, 'believed': 93, 'meet': 94, 'and': 3, 'it': 95, 'his': 15, 'at': 96, 'worry': 97, 'again': 25, 'considerable': 88, 'rather': 98, 'began': 99, 'when': 26, 'same': 101, 'forest': 27, 'which': 102, 'speaks': 103, 'towards': 104, 'tried': 105, 'mountain': 106, 'who': 28, 'upon': 107, 'plan': 108, 'man': 109, 'complained': 82, 'stirred': 110, 'off': 100, 'foot': 46, 'shortly': 111, 'thought': 29, 'so': 16, 'time': 30, 'the': 0, 'came': 31, 'once': 112}


Reverted Dictionary : 

{0: 'the', 1: ',', 2: 'a', 3: 'and', 4: '.', 5: 'wolf', 6: 'to', 7: 'boy', 8: 'he', 9: 'out', 10: 'of', 11: 'villagers', 12: 'was', 13: 'him', 14: 'this', 15: 'his', 16: 'so', 17: 'help', 18: 'before', 19: 'them', 20: 'some', 21: 'for', 22: 'come', 23: 'village', 24: 'but', 25: 'again', 26: 'when', 27: 'forest', 28: 'who', 29: 'thought', 30: 'time', 31: 'came', 32: 'all', 33: 'liar', 34: 'cried', 35: 'course', 36: 'still', 37: 'pleased', 38: 'deceiving', 39: 'had', 40: 'actually', 41: 'shepherd', 42: 'lonely', 43: 'than', 44: 'get', 45: 'dark', 46: 'foot', 47: 'day', 48: 'did', 49: 'calling', 50: 'twice', 51: 'good', 52: 'stopped', 53: 'truth', 54: 'meal', 55: 'sheep,', 56: 'tended', 57: 'louder', 58: 'flock', 59: 'even', 60: 'trick', 61: 'said', 62: 'be', 63: 'after', 64: 'not', 65: 'by', 66: 'could', 67: 'days', 68: 'afterwards', 69: 'young', 70: 'down', 71: 'will', 72: 'sheep', 73: 'little', 74: 'from', 75: 'rushed', 76: 'there', 77: 'been', 78: 'few', 79: 'much', 80: "boy's", 81: ':', 82: 'complained', 83: 'that', 84: 'company', 85: 'nobody', 86: 'fooled', 87: 'with', 88: 'considerable', 89: 'made', 90: 'wise', 91: 'excitement', 92: 'near', 93: 'believed', 94: 'meet', 95: 'it', 96: 'at', 97: 'worry', 98: 'rather', 99: 'began', 100: 'off', 101: 'same', 102: 'which', 103: 'speaks', 104: 'towards', 105: 'tried', 106: 'mountain', 107: 'upon', 108: 'plan', 109: 'man', 110: 'stirred', 111: 'shortly', 112: 'once'}

Part 2 : LSTM Model in TensorFlow

Since you have defined how the data will be modeled, you are now to develop an LSTM model to predict the word of following a sequence of 3 words.

2.1. Model definition

Define a 2-layers LSTM model.

For this use the following classes from the tensorflow.contrib library:

rnn.BasicLSTMCell(number of hidden units)
rnn.static_rnn(rnn_cell, data, dtype=tf.float32)
rnn.MultiRNNCell(,)

You may need some tensorflow functions (https://www.tensorflow.org/api_docs/python/tf/) :

tf.split
tf.reshape
...



In [5]:

    
def lstm_model(x, w, b, n_input, n_hidden):
    # reshape to [1, n_input]
    x = tf.reshape(x, [-1, n_input])

    # Generate a n_input-element sequence of inputs
    # (eg. [had] [a] [general] -> [20] [6] [33])
    x = tf.split(x,n_input,1)

    # 1-layer LSTM with n_hidden units.
    rnn_cell = rnn.BasicLSTMCell(n_hidden)
    
    #improvement
    #rnn_cell = rnn.MultiRNNCell([rnn.BasicLSTMCell(n_hidden),rnn.BasicLSTMCell(n_hidden)])
    #rnn_cell = rnn.MultiRNNCell([rnn.BasicLSTMCell(n_hidden),rnn.BasicLSTMCell(n_hidden),rnn.BasicLSTMCell(n_hidden)])

    # generate prediction
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)

    # there are n_input outputs but
    # we only want the last output
    return tf.matmul(outputs[-1], w['out']) + b['out']

Training Parameters and constants



In [6]:

    
# Training Parameters
learning_rate = 0.001
epochs = 50000
display_step = 1000
n_input = 3

#For each LSTM cell that you initialise, supply a value for the hidden dimension, number of units in LSTM cell
n_hidden = 64

# tf Graph input
x = tf.placeholder("float", [None, n_input, 1])
y = tf.placeholder("float", [None, vocabulary_size])

# LSTM  weights and biases
weights = { 'out': tf.Variable(tf.random_normal([n_hidden, vocabulary_size]))}
biases = {'out': tf.Variable(tf.random_normal([vocabulary_size])) }


#build the model
pred = lstm_model(x, weights, biases,n_input,n_hidden)

Define the Loss/Cost and optimizer



In [7]:

    
# Loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
#cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
#cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred,-1.0,1.0)), reduction_indices=1))
optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)

# Model evaluation
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

Comment:

We decided to apply the softmax and calculate the cost at the same time. In this way we can use the method softmax_cross_entropy_with_logits, which is more numerically stable in corner cases than applying the softmax and then calculating the cross entropy

We give you here the Test Function



In [8]:

    
#run the cell
def test(sentence, session, verbose=False):
    sentence = sentence.strip()
    words = sentence.split(' ')
    if len(words) != n_input:
        print("sentence length should be equel to", n_input, "!")
    try:
        symbols_inputs = [dictionary[str(words[i - n_input])] for i in range(n_input)]
        keys = np.reshape(np.array(symbols_inputs), [-1, n_input, 1])
        onehot_pred = session.run(pred, feed_dict={x: keys})
        onehot_pred_index = int(tf.argmax(onehot_pred, 1).eval())
        words.append(reverse_dictionary[onehot_pred_index])
        sentence = " ".join(words)
        if verbose:
            print(sentence)
        return reverse_dictionary[onehot_pred_index]
    except:
        print " ".join(["Word", words[i - n_input], "not in dictionary"])

Part 3 : LSTM Training

In the Training process, at each epoch, 3 words are taken from the training data, encoded to integer to form the input vector. The training labels are one-hot vector encoding the word that comes after the 3 inputs words. Display the loss and the training accuracy every 1000 iteration. Save the model at the end of training in the lstm_model folder



In [9]:

    
# Initializing the variables
init = tf.global_variables_initializer()
saver = tf.train.Saver()
start_time = time()
# Launch the graph
with tf.Session() as session:
    session.run(init)
    step = 0
    offset = random.randint(0,n_input+1)
    end_offset = n_input + 1
    acc_total = 0
    loss_total = 0

    writer.add_graph(session.graph)

    while step < epochs:
        # Generate a minibatch. Add some randomness on selection process.
        if offset > (len(train_data)-end_offset):
            offset = random.randint(0, n_input+1)

        symbols_in_keys = [ [dictionary[ str(train_data[i])]] for i in range(offset, offset+n_input) ]
        symbols_in_keys = np.reshape(np.array(symbols_in_keys), [-1, n_input, 1])

        symbols_out_onehot = np.zeros([len(dictionary)], dtype=float)
        symbols_out_onehot[dictionary[str(train_data[offset+n_input])]] = 1.0
        symbols_out_onehot = np.reshape(symbols_out_onehot,[1,-1])

        _, acc, loss, onehot_pred = session.run([optimizer, accuracy, cost, pred], \
                                                feed_dict={x: symbols_in_keys, y: symbols_out_onehot})
        loss_total += loss
        acc_total += acc
        if (step+1) % display_step == 0:
            print("Iter= " + str(step+1) + ", Average Loss= " + \
                  "{:.6f}".format(loss_total/display_step) + ", Average Accuracy= " + \
                  "{:.2f}%".format(100*acc_total/display_step))
            acc_total = 0
            loss_total = 0
            symbols_in = [train_data[i] for i in range(offset, offset + n_input)]
            symbols_out = train_data[offset + n_input]
            symbols_out_pred = reverse_dictionary[int(tf.argmax(onehot_pred, 1).eval())]
            print("%s - [%s] vs [%s]" % (symbols_in,symbols_out,symbols_out_pred))
        step += 1
        offset += (n_input+1)
    print("Optimization Finished!")
    print("Elapsed time: ", time() - start_time)
    print("Run on command line.")
    print("\ttensorboard --logdir=%s" % (logs_path))
    print("Point your web browser to: http://localhost:6006/")
    save_path = saver.save(session, "model.ckpt")
    print("Model saved in file: %s" % save_path)









    



Iter= 1000, Average Loss= 4.982308, Average Accuracy= 7.10%
['there', 'was', 'once'] - [a] vs [the]
Iter= 2000, Average Loss= 4.005073, Average Accuracy= 12.60%
['who', 'tended', 'his'] - [sheep] vs [.]
Iter= 3000, Average Loss= 3.688261, Average Accuracy= 19.30%
['young', 'shepherd', 'boy'] - [who] vs [and]
Iter= 4000, Average Loss= 3.806114, Average Accuracy= 18.70%
['there', 'was', 'once'] - [a] vs [the]
Iter= 5000, Average Loss= 3.021567, Average Accuracy= 33.00%
['dark', 'forest', '.'] - [it] vs [so]
Iter= 6000, Average Loss= 2.841508, Average Accuracy= 38.70%
['mountain', 'near', 'a'] - [dark] vs [even]
Iter= 7000, Average Loss= 3.276319, Average Accuracy= 29.80%
['mountain', 'near', 'a'] - [dark] vs [and]
Iter= 8000, Average Loss= 2.447943, Average Accuracy= 44.70%
['the', 'foot', 'of'] - [a] vs [a]
Iter= 9000, Average Loss= 2.347743, Average Accuracy= 49.20%
['rather', 'lonely', 'for'] - [him] vs [.]
Iter= 10000, Average Loss= 2.667252, Average Accuracy= 42.10%
['.', 'it', 'was'] - [rather] vs [a]
Iter= 11000, Average Loss= 2.134206, Average Accuracy= 54.30%
['rather', 'lonely', 'for'] - [him] vs [him]
Iter= 12000, Average Loss= 2.089802, Average Accuracy= 54.30%
['for', 'him', 'all'] - [day] vs [the]
Iter= 13000, Average Loss= 2.256048, Average Accuracy= 50.30%
['so', 'he', 'thought'] - [upon] vs [the]
Iter= 14000, Average Loss= 2.319333, Average Accuracy= 50.40%
['plan', 'by', 'which'] - [he] vs [he]
Iter= 15000, Average Loss= 1.993091, Average Accuracy= 54.10%
[',', 'so', 'he'] - [thought] vs [thought]
Iter= 16000, Average Loss= 1.815346, Average Accuracy= 60.20%
['he', 'thought', 'upon'] - [a] vs [a]
Iter= 17000, Average Loss= 2.066788, Average Accuracy= 53.00%
['could', 'get', 'a'] - [little] vs [and]
Iter= 18000, Average Loss= 1.361218, Average Accuracy= 69.60%
['get', 'a', 'little'] - [company] vs [the]
Iter= 19000, Average Loss= 2.101400, Average Accuracy= 50.10%
['.', 'he', 'rushed'] - [down] vs [down]
Iter= 20000, Average Loss= 1.977156, Average Accuracy= 52.90%
['he', 'could', 'get'] - [a] vs [a]
Iter= 21000, Average Loss= 1.676822, Average Accuracy= 57.80%
['he', 'thought', 'upon'] - [a] vs [a]
Iter= 22000, Average Loss= 1.052131, Average Accuracy= 77.30%
['by', 'which', 'he'] - [could] vs [dark]
Iter= 23000, Average Loss= 1.074769, Average Accuracy= 73.40%
['company', 'and', 'some'] - [excitement] vs [excitement]
Iter= 24000, Average Loss= 1.762384, Average Accuracy= 58.10%
['calling', 'out', 'wolf'] - [,] vs [,]
Iter= 25000, Average Loss= 1.461418, Average Accuracy= 65.50%
['towards', 'the', 'village'] - [calling] vs [calling]
Iter= 26000, Average Loss= 1.488478, Average Accuracy= 61.70%
['the', 'village', 'calling'] - [out] vs [,]
Iter= 27000, Average Loss= 1.633228, Average Accuracy= 59.10%
['the', 'villagers', 'came'] - [out] vs [to]
Iter= 28000, Average Loss= 1.334593, Average Accuracy= 65.40%
[',', 'and', 'some'] - [of] vs [the]
Iter= 29000, Average Loss= 0.985419, Average Accuracy= 75.10%
['with', 'him', 'for'] - [a] vs [a]
Iter= 30000, Average Loss= 1.555909, Average Accuracy= 61.00%
['of', 'them', 'stopped'] - [with] vs [with]
Iter= 31000, Average Loss= 1.240786, Average Accuracy= 69.80%
['stopped', 'with', 'him'] - [for] vs [a]
Iter= 32000, Average Loss= 1.029533, Average Accuracy= 73.40%
['him', ',', 'and'] - [some] vs [some]
Iter= 33000, Average Loss= 1.030927, Average Accuracy= 73.60%
['some', 'of', 'them'] - [stopped] vs [help]
Iter= 34000, Average Loss= 1.196683, Average Accuracy= 70.90%
[',', 'and', 'some'] - [of] vs [the]
Iter= 35000, Average Loss= 1.099914, Average Accuracy= 73.10%
['them', 'stopped', 'with'] - [him] vs [,]
Iter= 36000, Average Loss= 1.203794, Average Accuracy= 68.50%
['time', '.', 'this'] - [pleased] vs [pleased]
Iter= 37000, Average Loss= 0.879334, Average Accuracy= 77.20%
['that', 'a', 'few'] - [days] vs [was]
Iter= 38000, Average Loss= 0.994206, Average Accuracy= 74.00%
['afterwards', 'he', 'tried'] - [the] vs [the]
Iter= 39000, Average Loss= 0.838392, Average Accuracy= 79.70%
['to', 'his', 'help'] - [.] vs [.]
Iter= 40000, Average Loss= 0.810431, Average Accuracy= 79.90%
['wolf', 'actually', 'did'] - [come] vs [not]
Iter= 41000, Average Loss= 0.894344, Average Accuracy= 78.10%
['to', 'worry', 'the'] - [sheep,] vs [sheep,]
Iter= 42000, Average Loss= 0.876505, Average Accuracy= 77.80%
[',', 'still', 'louder'] - [than] vs [than]
Iter= 43000, Average Loss= 0.943342, Average Accuracy= 76.80%
['the', 'boy', 'of'] - [course] vs [again]
Iter= 44000, Average Loss= 0.827295, Average Accuracy= 79.10%
['actually', 'did', 'come'] - [out] vs [out]
Iter= 45000, Average Loss= 0.958322, Average Accuracy= 76.00%
['from', 'the', 'forest'] - [,] vs [,]
Iter= 46000, Average Loss= 0.612455, Average Accuracy= 84.00%
['shortly', 'after', 'this'] - [a] vs [a]
Iter= 47000, Average Loss= 0.556309, Average Accuracy= 85.30%
['sheep,', 'and', 'the'] - [boy] vs [boy]
Iter= 48000, Average Loss= 0.707541, Average Accuracy= 81.30%
['villagers', ',', 'who'] - [had] vs [had]
Iter= 49000, Average Loss= 0.960478, Average Accuracy= 75.50%
[',', 'wolf', ','] - [still] vs [still]
Iter= 50000, Average Loss= 0.672656, Average Accuracy= 83.00%
['still', 'louder', 'than'] - [before] vs [before]
Optimization Finished!
('Elapsed time: ', 58.89074206352234)
Run on command line.
	tensorboard --logdir=lstm_words
Point your web browser to: http://localhost:6006/
Model saved in file: model.ckpt

Comment:

We created different models with different number of layers, and we have seen that the best accuracy is achieved using only 2 laers. Using more or less layers we achieve a lower accuracy

Part 4 : Test your model

3.1. Next word prediction

Load your model (using the model_saved variable given in the training session) and test the sentences :

'get a little'
'nobody tried to'
Try with other sentences using words from the stroy's vocabulary.



In [10]:

    
with tf.Session() as sess:
    # Initialize variables
    sess.run(init)

    # Restore model weights from previously saved model
    saver.restore(sess, "./model.ckpt")
    print(test('get a little', sess))
    print(test('nobody tried to', sess))









    



company
come

Comment:

Here it looks that the RNN is working, in fact it can predict correctly the next word. We should not that in this case is difficult to check if the RNN is actually overfitting the training data.

3.2. More fun with the Fable Writer !

You will use the RNN/LSTM model learned in the previous question to create a new story/fable. For this you will choose 3 words from the dictionary which will start your story and initialize your network. Using those 3 words the RNN will generate the next word or the story. Using the last 3 words (the newly predicted one and the last 2 from the input) you will use the network to predict the 5 word of the story.. and so on until your story is 5 sentence long. Make a point at the end of your story. To implement that, you will use the test function.

This is the original fable, we will look at it to note an eventual overfitting



In [13]:

    
#Your implementation goes here 
with tf.Session() as sess:
    # Initialize variables
    sess.run(init)

    # Restore model weights from previously saved model
    saver.restore(sess, "./model.ckpt")
    
    #a sentence is concluded when we find a dot.
    fable = [random.choice(dictionary.keys()) for _ in range(3)]
    n_sentences = fable.count('.')

    offset = 0
    while n_sentences < 5:
        next_word = test(' '.join(fable[offset:offset+3]), sess)
        fable.append(next_word)
        if next_word == '.':
            n_sentences += 1
        offset+=1
    print(' '.join(fable))









    



man rather tried he the village had out wolf meet him , and some of them stopped with a and when the villagers came to his help . so the wolf made , good days : , and when the villagers came to his help . so the wolf made , good days : , and when the villagers came to his help . so the wolf made , good days : , and when the villagers came to his help . so the wolf made , good days : , and when the villagers came to his help .

Comment:

This is interesting, we see that the sentences have some sort of sense, but when we reach a point, we see the same sentence repated many times. Thus is probably due to overfitting, we should look more deeply. We see that the repeated sentence is different from the original one, but it is still always the same. We think this is due to the fact that the dot start always the same sentence. Maybe we could create more layers and see what happens.



In [3]:

    
def load_data(filename):
    with open(filename) as f:
        data = f.readlines()
    data = [x.strip().lower() for x in data]
    data = [data[i].split() for i in range(len(data))]
    data = np.array(data)
    data = np.reshape(data, [-1, ])
    return data

train_file ='data/story.txt'
train_data = load_data(train_file)

def build_vocabulary(words):
    count = collections.Counter(words).most_common()
    dic= dict()
    for word, _ in count:
        dic[word] = len(dic)

    reverse_dic= dict(zip(dic.values(), dic.keys()))
    return dic, reverse_dic

dictionary, reverse_dictionary = build_vocabulary(train_data)
vocabulary_size= len(dictionary)



In [32]:

    
import numpy as np
import collections # used to build the dictionary
import random
import time
from time import time
import pickle # may be used to save your model 
import matplotlib.pyplot as plt
#Import Tensorflow and rnn
import tensorflow as tf
from tensorflow.contrib import rnn  

def create_train_model(n_input = 3, n_layers = 2,verbose = False):
    tf.reset_default_graph()
    # Target log path
    logs_path = 'lstm_words'
    writer = tf.summary.FileWriter(logs_path)
    
    def lstm_model(x, w, b, n_input, n_hidden,n_layers):
        # reshape to [1, n_input]
        x = tf.reshape(x, [-1, n_input])

        # Generate a n_input-element sequence of inputs
        # (eg. [had] [a] [general] -> [20] [6] [33])
        x = tf.split(x,n_input,1)

        rnn_layers = [rnn.BasicLSTMCell(n_hidden) for _ in range(n_layers)]
        rnn_cell = rnn.MultiRNNCell(rnn_layers)
        # generate prediction
        outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)

        # there are n_input outputs but
        # we only want the last output
        return tf.matmul(outputs[-1], w['out']) + b['out']
    
    # Training Parameters
    learning_rate = 0.001
    epochs = 50000
    display_step = 1000


    #For each LSTM cell that you initialise, supply a value for the hidden dimension, number of units in LSTM cell
    n_hidden = 64

    # tf Graph input
    x = tf.placeholder("float", [None, n_input, 1])
    y = tf.placeholder("float", [None, vocabulary_size])

    # LSTM  weights and biases
    weights = { 'out': tf.Variable(tf.random_normal([n_hidden, vocabulary_size]))}
    biases = {'out': tf.Variable(tf.random_normal([vocabulary_size])) }


    #build the model
    pred = lstm_model(x, weights, biases,n_input,n_hidden,n_layers)
    # Loss and optimizer
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    #cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(pred), reduction_indices=1))
    #cost = tf.reduce_mean(-tf.reduce_sum(y*tf.log(tf.clip_by_value(pred,-1.0,1.0)), reduction_indices=1))
    optimizer = tf.train.RMSPropOptimizer(learning_rate).minimize(cost)

    # Model evaluation
    correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    
    # Initializing the variables
    init = tf.global_variables_initializer()
    saver = tf.train.Saver()
    start_time = time()
    # Launch the graph
    with tf.Session() as session:
        session.run(init)
        step = 0
        offset = random.randint(0,n_input+1)
        end_offset = n_input + 1
        acc_total = 0
        loss_total = 0

        writer.add_graph(session.graph)

        while step < epochs:
            # Generate a minibatch. Add some randomness on selection process.
            if offset > (len(train_data)-end_offset):
                offset = random.randint(0, n_input+1)

            symbols_in_keys = [ [dictionary[ str(train_data[i])]] for i in range(offset, offset+n_input) ]
            symbols_in_keys = np.reshape(np.array(symbols_in_keys), [-1, n_input, 1])

            symbols_out_onehot = np.zeros([len(dictionary)], dtype=float)
            symbols_out_onehot[dictionary[str(train_data[offset+n_input])]] = 1.0
            symbols_out_onehot = np.reshape(symbols_out_onehot,[1,-1])

            _, acc, loss, onehot_pred = session.run([optimizer, accuracy, cost, pred], \
                                                    feed_dict={x: symbols_in_keys, y: symbols_out_onehot})
            loss_total += loss
            acc_total += acc
            if (step+1) % display_step == 0:
                if verbose or step+1 == epochs: print("Iter= " + str(step+1) + ", Average Loss= " + \
                      "{:.6f}".format(loss_total/display_step) + ", Average Accuracy= " + \
                      "{:.2f}%".format(100*acc_total/display_step))
                acc_total = 0
                loss_total = 0
                symbols_in = [train_data[i] for i in range(offset, offset + n_input)]
                symbols_out = train_data[offset + n_input]
                symbols_out_pred = reverse_dictionary[int(tf.argmax(onehot_pred, 1).eval())]
                if verbose: print("%s - [%s] vs [%s]" % (symbols_in,symbols_out,symbols_out_pred))
            step += 1
            offset += (n_input+1)
            
        
        print("Optimization Finished!")
        print("Elapsed time: ", time() - start_time)
        print("Run on command line.")
        print("\ttensorboard --logdir=%s" % (logs_path))
        print("Point your web browser to: http://localhost:6006/")
        save_path = saver.save(session, "model.ckpt")
        print("Model saved in file: %s" % save_path)
        
        #run the cell
        def test(sentence, session, verbose=False):
            sentence = sentence.strip()
            words = sentence.split(' ')
            if len(words) != n_input:
                print("sentence length should be equel to", n_input, "!")
            try:
                symbols_inputs = [dictionary[str(words[i - n_input])] for i in range(n_input)]
                keys = np.reshape(np.array(symbols_inputs), [-1, n_input, 1])
                onehot_pred = session.run(pred, feed_dict={x: keys})
                onehot_pred_index = int(tf.argmax(onehot_pred, 1).eval())
                words.append(reverse_dictionary[onehot_pred_index])
                sentence = " ".join(words)
                if verbose:
                    print(sentence)
                return reverse_dictionary[onehot_pred_index]
            except:
                print " ".join(["Word", words[i - n_input], "not in dictionary"])

        
        #a sentence is concluded when we find a dot.
        fable = [random.choice(dictionary.keys()) for _ in range(n_input)]
        #print(dictionary)
        #print(fable)
        n_sentences = fable.count('.')

        offset = 0
        while n_sentences < 5 and len(fable) < 200:
            next_word = test(' '.join(fable[offset:offset+n_input]), session)
            fable.append(next_word)
            if next_word == '.':
                n_sentences += 1
            offset+=1
        print(' '.join(fable))

3.3. Play with number of inputs

The number of input in our example is 3, see what happens when you use other number (1 and 5)

n_input = 1



In [33]:

    
create_train_model(n_input = 1, n_layers = 1)









    



Iter= 50000, Average Loss= 3.199297, Average Accuracy= 11.90%
Optimization Finished!
('Elapsed time: ', 70.26049900054932)
Run on command line.
	tensorboard --logdir=lstm_words
Point your web browser to: http://localhost:6006/
Model saved in file: model.ckpt
actually . his . his . his . his .



In [34]:

    
create_train_model(n_input = 1, n_layers = 2)









    



Iter= 50000, Average Loss= 3.596768, Average Accuracy= 13.90%
Optimization Finished!
('Elapsed time: ', 65.09528613090515)
Run on command line.
	tensorboard --logdir=lstm_words
Point your web browser to: http://localhost:6006/
Model saved in file: model.ckpt
so . worry he his . worry he his . worry he his . worry he his .



In [35]:

    
create_train_model(n_input = 1, n_layers = 3)









    



Iter= 50000, Average Loss= 3.031387, Average Accuracy= 17.60%
Optimization Finished!
('Elapsed time: ', 69.99397802352905)
Run on command line.
	tensorboard --logdir=lstm_words
Point your web browser to: http://localhost:6006/
Model saved in file: model.ckpt
and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his , and some the boy who to his ,

Comment:

Here we see that when the input size is 1 we obtain a vad model regardless of the number of layers, this is because we are basically predicting a word based on the preceding word. This not enough to create a sentence with some sort of sense.Looking ath the prediction accuracy, it is very low.

n_input = 3



In [36]:

    
create_train_model(n_input = 3, n_layers = 1)









    



Iter= 50000, Average Loss= 0.708094, Average Accuracy= 81.10%
Optimization Finished!
('Elapsed time: ', 59.99074721336365)
Run on command line.
	tensorboard --logdir=lstm_words
Point your web browser to: http://localhost:6006/
Model saved in file: model.ckpt
speaks pleased stopped him some a considerable time . this pleased the boy so much that the and again much not a and again the villagers came to his help . but this time the villagers , who had been him for a considerable time . this pleased the boy so much that the and again much not a and again the villagers came to his help . but this time the villagers , who had been him for a considerable time .



In [37]:

    
create_train_model(n_input = 3, n_layers = 2)









    



Iter= 50000, Average Loss= 0.275338, Average Accuracy= 90.70%
Optimization Finished!
('Elapsed time: ', 65.29126191139221)
Run on command line.
	tensorboard --logdir=lstm_words
Point your web browser to: http://localhost:6006/
Model saved in file: model.ckpt
wolf down them , wolf days stirred to could get come . it was rather lonely for him all day , so he thought upon a plan by afterwards he who tended his sheep at the foot . the boy , who villagers thought day them , and nobody stirred to come to his help . but this time the villagers , who had been fooled twice before , thought the boy , who villagers thought day them , and nobody stirred to come to his help . but this time the villagers , who had been fooled twice before , thought the boy , who villagers thought day them , and nobody stirred to come to his help .



In [38]:

    
create_train_model(n_input = 3, n_layers = 3)









    



Iter= 50000, Average Loss= 0.319908, Average Accuracy= 90.20%
Optimization Finished!
('Elapsed time: ', 77.7850570678711)
Run on command line.
	tensorboard --logdir=lstm_words
Point your web browser to: http://localhost:6006/
Model saved in file: model.ckpt
shepherd time than out wolf , wolf , still louder than before . but this time the villagers , who had been fooled twice before , and nobody to worry the sheep, flock , little when he speaks the truth and the boy of course cried out wolf , wolf , still louder than before . but this time the villagers , who had been fooled twice before , and nobody to worry the sheep, flock , little when he speaks the truth and the boy of course cried out wolf , wolf , still louder than before . but this time the villagers , who had been fooled twice before , and nobody to worry the sheep, flock , little when he speaks the truth and the boy of course cried out wolf , wolf , still louder than before . but this time the villagers , who had been fooled twice before , and nobody to worry the sheep, flock , little when he speaks the truth and the boy of course cried out wolf , wolf , still louder than before .

Comment:

Here we see some sentences that have a sense, but we see a tendency to repeat the sentence of the training fable. This is interesting, because during the training the single triples where chosen randomly and not sequentially. Somehow, the net learned the training fable.

n_input = 5



In [39]:

    
create_train_model(n_input = 5, n_layers = 1)









    



Iter= 50000, Average Loss= 0.033428, Average Accuracy= 98.90%
Optimization Finished!
('Elapsed time: ', 67.373526096344)
Run on command line.
	tensorboard --logdir=lstm_words
Point your web browser to: http://localhost:6006/
Model saved in file: model.ckpt
: : and out come the villagers so . his came the so shortly , the wise man of the village said : a liar will not be believed , even when he speaks the truth . the boy of course cried out wolf , wolf , and the villagers came out to meet him , and some of them stopped with him for a considerable time . this pleased the boy so much that a few days afterwards he tried the same trick , and when the villagers came to his help . but shortly after this a wolf actually did come out from the forest , and began to worry the sheep, and the boy of course cried out wolf , wolf , and the villagers came out to meet him , and some of them stopped with him for a considerable time .



In [40]:

    
create_train_model(n_input = 5, n_layers = 2)









    



Iter= 50000, Average Loss= 0.056223, Average Accuracy= 98.50%
Optimization Finished!
('Elapsed time: ', 78.79401302337646)
Run on command line.
	tensorboard --logdir=lstm_words
Point your web browser to: http://localhost:6006/
Model saved in file: model.ckpt
still tried foot mountain stirred , even when he speaks the truth . the boy of course cried out wolf , wolf , and the villagers came out to meet him , and some of them stopped with him for a considerable time . this pleased the boy so much that a few days afterwards he tried the same trick , and again the boy complained , the wise man of the village said : a liar will not be believed , even when he speaks the truth . the boy of course cried out wolf , wolf , and the villagers came out to meet him , and some of them stopped with him for a considerable time . this pleased the boy so much that a few days afterwards he tried the same trick , and again the boy complained , the wise man of the village said : a liar will not be believed , even when he speaks the truth .



In [42]:

    
create_train_model(n_input = 5, n_layers = 3)









    



Iter= 50000, Average Loss= 0.042002, Average Accuracy= 99.00%
Optimization Finished!
('Elapsed time: ', 96.72665596008301)
Run on command line.
	tensorboard --logdir=lstm_words
Point your web browser to: http://localhost:6006/
Model saved in file: model.ckpt
time before that be afterwards the wolf again the villagers came to his help . but shortly after this a wolf actually did come out from the forest , and began to worry the sheep, and the boy of course cried out wolf , wolf , and the villagers came out to meet him , and some of them stopped with him for a considerable time . this pleased the boy so much that a few days afterwards he tried the same trick , and again the villagers came to his help . but shortly after this a wolf actually did come out from the forest , and began to worry the sheep, and the boy of course cried out wolf , wolf , and the villagers came out to meet him , and some of them stopped with him for a considerable time . this pleased the boy so much that a few days afterwards he tried the same trick , and again the villagers came to his help .

Comment:

With 5 words, the model learn to predict very well the next word, in fact we obtain an high accuracy. In this case we see that whole sentences are copied from the original fable, but they are not repeated exactly, we still see that some sentences are repeated, but at this point we think that this is due to the limited training set.