Text generation using RNN/LSTM (Character-level)

In this notebook you will learn how to use TensorFlow to create a Recurrent Neural Network.


This code implements a Recurrent Neural Network with LSTM/RNN units for training and sampling from character-level language models. In other words, the model takes a text file as input and trains an RNN that learns to predict the next character in a sequence.
The RNN can then be used to generate text, character by character, that looks like the original training data.

This code is based on this blog, and is a step-by-step implementation of the character-level model.


In [1]:
import tensorflow as tf
import time

In [2]:
print('TensorFlow version:', tf.__version__)


TensorFlow version: 1.1.0

Data loader

The following cell defines a class that helps read data from the input file.


In [3]:
import codecs
import os
import collections
from six.moves import cPickle
import numpy as np

class TextLoader():
    def __init__(self, data_dir, batch_size, seq_length, encoding='utf-8'):
        self.data_dir = data_dir
        self.batch_size = batch_size
        self.seq_length = seq_length
        self.encoding = encoding

        input_file = os.path.join(data_dir, "input.txt")
        vocab_file = os.path.join(data_dir, "vocab.pkl")
        tensor_file = os.path.join(data_dir, "data.npy")

        if not (os.path.exists(vocab_file) and os.path.exists(tensor_file)):
            print("reading text file")
            self.preprocess(input_file, vocab_file, tensor_file)
        else:
            print("loading preprocessed files")
            self.load_preprocessed(vocab_file, tensor_file)
        self.create_batches()
        self.reset_batch_pointer()

    def preprocess(self, input_file, vocab_file, tensor_file):
        with codecs.open(input_file, "r", encoding=self.encoding) as f:
            data = f.read()
        # Build the vocabulary: characters sorted by frequency (most frequent first)
        counter = collections.Counter(data)
        count_pairs = sorted(counter.items(), key=lambda x: -x[1])
        self.chars, _ = zip(*count_pairs)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        with open(vocab_file, 'wb') as f:
            cPickle.dump(self.chars, f)
        # Encode the whole text as an array of character indices and cache it
        self.tensor = np.array(list(map(self.vocab.get, data)))
        np.save(tensor_file, self.tensor)

    def load_preprocessed(self, vocab_file, tensor_file):
        with open(vocab_file, 'rb') as f:
            self.chars = cPickle.load(f)
        self.vocab_size = len(self.chars)
        self.vocab = dict(zip(self.chars, range(len(self.chars))))
        self.tensor = np.load(tensor_file)
        self.num_batches = int(self.tensor.size / (self.batch_size *
                                                   self.seq_length))

    def create_batches(self):
        self.num_batches = int(self.tensor.size / (self.batch_size *
                                                   self.seq_length))

        # When the data (tensor) is too small, let's give them a better error message
        if self.num_batches==0:
            assert False, "Not enough data. Make seq_length and batch_size smaller."

        self.tensor = self.tensor[:self.num_batches * self.batch_size * self.seq_length]
        xdata = self.tensor
        ydata = np.copy(self.tensor)
        ydata[:-1] = xdata[1:]
        ydata[-1] = xdata[0]
        self.x_batches = np.split(xdata.reshape(self.batch_size, -1), self.num_batches, 1)
        self.y_batches = np.split(ydata.reshape(self.batch_size, -1), self.num_batches, 1)


    def next_batch(self):
        x, y = self.x_batches[self.pointer], self.y_batches[self.pointer]
        self.pointer += 1
        return x, y

    def reset_batch_pointer(self):
        self.pointer = 0

Parameters

Batch, number_of_batch, batch_size and seq_length

What are batch, number_of_batch, batch_size and seq_length in the character-level example?

Let's assume the input is 'here is an example'. Then:

  • txt_length = 18
  • seq_length = 3
  • batch_size = 2
  • number_of_batch = 18 / (3 × 2) = 3
  • batch = array([['h','e','r'], ['e',' ','i']])
  • sample sequence = 'her' (see the sketch below for how TextLoader actually builds these batches)
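
As a quick illustration, the following cell is a minimal sketch (not part of the notebook's TextLoader, and using only NumPy) that mimics TextLoader.create_batches on this toy string. Note that the reshape places the second half of the text in the second row, so the actual grouping differs slightly from the simplified batch shown above.

In [ ]:
import numpy as np

text = 'here is an example'                 # txt_length = 18 characters
toy_batch_size, toy_seq_length = 2, 3
toy_chars = sorted(set(text))
toy_vocab = {c: i for i, c in enumerate(toy_chars)}

tensor = np.array([toy_vocab[c] for c in text])
toy_num_batches = tensor.size // (toy_batch_size * toy_seq_length)   # 18 // 6 = 3
tensor = tensor[:toy_num_batches * toy_batch_size * toy_seq_length]

xdata = tensor
ydata = np.copy(tensor)
ydata[:-1] = xdata[1:]                       # target = input shifted by one character
ydata[-1] = xdata[0]

x_batches = np.split(xdata.reshape(toy_batch_size, -1), toy_num_batches, 1)
y_batches = np.split(ydata.reshape(toy_batch_size, -1), toy_num_batches, 1)

print([[toy_chars[i] for i in row] for row in x_batches[0]])
# [['h', 'e', 'r'], ['n', ' ', 'e']]  -- each batch is [batch_size x seq_length]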

So, what are our actual parameters?


In [4]:
batch_size = 60 # minibatch size, i.e. number of sequences processed in parallel per step
seq_length = 50 # RNN sequence length
num_epochs = 25 # increase to 50 if you want to see relatively good results
learning_rate = 0.002
decay_rate = 0.97
rnn_size = 128 #size of RNN hidden state
num_layers = 2 #number of layers in the RNN

LSTM Architecture

  • Each LSTM cell has an input layer whose size is 128 units.
  • 128 is the dimensionality of the embedding vector.

rnn_size = num_units = num_hidden_units = LSTM size

  • Each LSTM cell has a hidden layer with a number of hidden units.
  • The argument n_hidden=128 of BasicLSTMCell is the number of hidden units of the LSTM (inside A).
  • Each LSTM cell keeps a vector, called the hidden state vector, of size n_hidden=128.
  • The hidden state vector, which is the memory of the LSTM, is accumulated through time via its gates (forget, input, and output).
  • For each LSTM cell that we initialise, we need to supply a value (128 in this case) for the hidden dimension, or, as some people like to call it, the number of units in the LSTM cell.
  • "num_units" is equivalent to "size of RNN hidden state".
  • rnn_size = 128 is also the dimension of the embedding vector for each character/word.
  • An LSTM keeps two pieces of information as it propagates through time:
    • A hidden state vector
    • A previous time-step output
  • To make the name num_units more intuitive, you can think of it as the number of hidden units in the LSTM cell, or the number of memory units in the cell.
  • The number of hidden units is the dimensionality of the output (= dimensionality of the state) of the LSTM cell.

num_layers = 2

  • number of layers in the RNN
  • The cells argument of MultiRNNCell is a list of RNNCells that will be composed in this order (see the sketch below).
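
Before building the model, here is a small sketch (using the same TensorFlow 1.x contrib API as the rest of this notebook) that simply inspects the sizes controlled by num_units and num_layers. Note that, despite the notebook's title, the model built below stacks BasicRNNCell units; BasicLSTMCell appears here only to illustrate the two-part LSTM state described above.

In [ ]:
# Illustrative only: inspect the sizes controlled by num_units and num_layers.
lstm_cell = tf.contrib.rnn.BasicLSTMCell(num_units=rnn_size)
print(lstm_cell.output_size)    # 128: dimensionality of the hidden state vector h
print(lstm_cell.state_size)     # LSTMStateTuple(c=128, h=128): cell state + hidden state

stacked_lstm = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.BasicLSTMCell(rnn_size) for _ in range(num_layers)])
print(stacked_lstm.state_size)  # one LSTMStateTuple per layer (num_layers = 2)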

In [5]:
!mkdir -p ../../data/character_model
!wget -nv -O ../../data/character_model/input.txt https://ibm.box.com/shared/static/a3f9e9mbpup09toq35ut7ke3l3lf03hg.txt


2017-05-23 10:49:23 URL:https://public.boxcloud.com/d/1/GlcYgZJR3rfWY5w4a_kuVUXqKbHq-E2WEdL4t5SaMzl9ynLKF8O179kzBkAWcQ_3bftp36Kku38Bsg3Mh5G5qkYgWV_DeKR7i-3GtDjfyXCQeTLCH5zDlsE-TsQBnYv2g0H45sVs1U13T4ny2gIaXERiwdUTbHZe1XG7wsWOTklWW8v6vS8ZWjfpaLx9-CfKPcIJLs1MqCs6G07vla_6xwh5ijAH8POcgKkqUtMqqvQLLv8jWMYbZmz951uZYhPynbgJ4MD8YcMdG50fF7gi8rlLpxUSOuYG5ai47wKoAvU14SQ2k6fDmMikIg5N0uSgwBSTYQx-onBzYVmeWM-1DO97u5X4oI_XfEJbC3frtD8x7QVZYswqSTDHkCG6FTugVaOkBeLDr33WtYCBJ3mGydelHueaZaG1v5H14YyK5Xc_cn0vV0bsNa2fGZKQzwVVMD71nGr4ZosQBNzrDldQkvmvH7mik2knOl3LFkFnnpBPzCMbdP8oWe5h1y3miR6myY6txDNF0rga-8OXabMKeETvxNkk9w0ALnHo7D8nYp4_IjUHXIIvQzOHG41piyKOUD26drbNjrj5vMG0Ij3OvYJxDE9e_LeqmfdURW76obvuLIiaA6Ju1Kx5rTkPz_YuIem3r1oRqr6syB0qePMd0ecuPqbku9b3bmmVkt2ZGPuIoOQ7Bk7Ein7K-N6qPlLhAX4eU4qzg-07qhJZKRCulKbXBbRv5-XbinDYU8K3OFyzcQXVBnvwdZ4EEecsCgHPsq1TH1NpjdLkNiIBkyEgD4oqbDoc64FbciDW6qtN-uAfkcZBd4N8bDCrvVV3IWv3V-INVPoXFEWjFLvzmi6hq3w-4OZXHvYfoXPV_u7f0XaspqQHIE2TdK6fiBZqWK2FADSf9K2C-PEB-VEUdYH6w6egin1VX3fj544ftQ8mwUxzH1rzdG3fbpA2d6JKRB-Mur-f1je-v1KtvdvoaXD-2A6r9rHDK0ByI6PC17CG3fpyZeQkCFYUnB0Ue4am7k_9LaHH3FB16gO8hVHdoft_uvOolP2GBf35YHrLvdePPcJ4INcurdY-bTwVxSET2ZCyCapHKn1HVby5xtjtgLBtTzBZXg9xwtj4PBkFnClXyueNdwwkdp_qXU01Zl-OFVnvGt8thC3vFpBknrL6mgXaV1ou/download [1115393/1115393] -> "../../data/character_model/input.txt" [1]

In [6]:
data_loader = TextLoader('../../data/character_model/', batch_size, seq_length)
vocab_size = data_loader.vocab_size
data_loader.vocab_size


loading preprocessed files
Out[6]:
65

In [7]:
data_loader.num_batches


Out[7]:
371

Input and output


In [8]:
x,y = data_loader.next_batch()

In [9]:
x


Out[9]:
array([[49,  9,  7, ...,  1,  4,  7],
       [19,  4, 14, ..., 14,  9, 20],
       [ 8, 20, 10, ...,  8, 10, 18],
       ..., 
       [21,  2,  0, ...,  0, 21,  0],
       [ 9,  7,  7, ...,  0,  2,  3],
       [ 3,  7,  0, ...,  5,  9, 23]])

In [10]:
x.shape  #batch_size=60, seq_length=50


Out[10]:
(60, 50)

In [11]:
y


Out[11]:
array([[ 9,  7,  6, ...,  4,  7,  0],
       [ 4, 14, 22, ...,  9, 20,  5],
       [20, 10, 29, ..., 10, 18,  4],
       ..., 
       [ 2,  0,  6, ..., 21,  0,  6],
       [ 7,  7,  4, ...,  2,  3,  0],
       [ 7,  0, 33, ...,  9, 23,  0]])

In [12]:
print('Vocabulary size:', data_loader.vocab_size)


Vocabulary size: 65

In [13]:
print(", ".join(sorted(list(data_loader.chars))))


,  , !, $, &, ', ,, -, ., 3, :, ;, ?, A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z

In [14]:
data_loader.vocab['t']


Out[14]:
2

Defining stacked RNN Cell

BasicRNNCell is the most basic RNN cell.


In [15]:
# a two layer cell
with tf.variable_scope('multi_rnn_cell'):
    stacked_cell = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.BasicRNNCell(rnn_size) for _ in range(num_layers)])

In [16]:
# hidden state size
stacked_cell.output_size


Out[16]:
128

In [17]:
stacked_cell.state_size


Out[17]:
(128, 128)

In [18]:
input_data = tf.placeholder(tf.int32, [batch_size, seq_length])# a 60x50
targets = tf.placeholder(tf.int32, [batch_size, seq_length]) # a 60x50

The memory state of the network is initialized with a vector of zeros and gets updated after reading each character.

BasicRNNCell.zero_state(batch_size, dtype) returns zero-filled state tensor(s).

Args:

batch_size: int, float, or unit Tensor representing the batch size.
dtype: the data type to use for the state.


In [19]:
initial_state = stacked_cell.zero_state(batch_size, tf.float32)  # one zero state per sequence in the batch: 2 layers x [60x128]

In [20]:
input_data


Out[20]:
<tf.Tensor 'Placeholder:0' shape=(60, 50) dtype=int32>

In [21]:
session = tf.Session()

In [22]:
feed_dict={input_data:x, targets:y}

In [23]:
session.run(input_data, feed_dict)


Out[23]:
array([[49,  9,  7, ...,  1,  4,  7],
       [19,  4, 14, ..., 14,  9, 20],
       [ 8, 20, 10, ...,  8, 10, 18],
       ..., 
       [21,  2,  0, ...,  0, 21,  0],
       [ 9,  7,  7, ...,  0,  2,  3],
       [ 3,  7,  0, ...,  5,  9, 23]], dtype=int32)

Embedding


In [24]:
with tf.variable_scope('rnnlm',reuse=False):
    softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size]) #128x65
    softmax_b = tf.get_variable("softmax_b", [vocab_size]) # 1x65)
    with tf.device("/cpu:0"):
        embedding = tf.get_variable("embedding", [vocab_size, rnn_size])  #65x128
        # input_data is a 60x50 matrix and embedding is a 65x128 lookup table for all 65 characters
        # embedding_lookup goes to each row of input_data and, for each character in the row, finds the corresponding vector in embedding
        # it creates a 60x50x128 tensor
        # so the first element of em is a 50x128 matrix, where each row is the vector representing that character
        em = tf.nn.embedding_lookup(embedding, input_data)  # em is 60x50x128
        # split: splits a tensor into sub-tensors
        # syntax: tf.split(value, num_or_size_splits, axis, name='split')
        # it splits the 60x50x128 tensor along axis 1 into 50 tensors of shape 60x1x128
        inputs = tf.split(em, seq_length, 1)
        # squeeze converts the list into 50 matrices of shape [60x128]
        inputs = [tf.squeeze(input_, [1]) for input_ in inputs]

In [25]:
session.run(tf.global_variables_initializer())
session.run(embedding)


Out[25]:
array([[-0.00455077, -0.08505657,  0.00968477, ..., -0.11256096,
         0.08936651,  0.13564713],
       [-0.01204589,  0.01478565,  0.12983467, ...,  0.03713098,
         0.07412069,  0.12602584],
       [-0.00811039, -0.09547275, -0.00303565, ...,  0.1208431 ,
        -0.02103838, -0.10411192],
       ..., 
       [ 0.0258662 , -0.02935547, -0.1106279 , ...,  0.07244253,
        -0.05265456, -0.07809801],
       [-0.0670867 ,  0.08318488, -0.14293635, ..., -0.05563153,
        -0.13975491,  0.12000434],
       [-0.02657422, -0.14235181,  0.06278005, ...,  0.15116782,
        -0.14109144,  0.17275174]], dtype=float32)

In [26]:
em = tf.nn.embedding_lookup(embedding, input_data)
em


Out[26]:
<tf.Tensor 'embedding_lookup:0' shape=(60, 50, 128) dtype=float32>

In [27]:
emp = session.run(em,feed_dict={input_data:x})
emp.shape


Out[27]:
(60, 50, 128)

In [28]:
emp[0]


Out[28]:
array([[ 0.15181227,  0.01090705,  0.03806114, ..., -0.10365069,
        -0.00031906,  0.00073797],
       [ 0.05349265,  0.15286236,  0.16492094, ..., -0.14390934,
        -0.05998485, -0.07160587],
       [ 0.08706288,  0.11176957, -0.13602386, ...,  0.13116427,
        -0.00431666, -0.07653115],
       ..., 
       [-0.01204589,  0.01478565,  0.12983467, ...,  0.03713098,
         0.07412069,  0.12602584],
       [ 0.02244808,  0.09415562,  0.16566236, ...,  0.13478656,
        -0.13616687, -0.02090606],
       [ 0.08706288,  0.11176957, -0.13602386, ...,  0.13116427,
        -0.00431666, -0.07653115]], dtype=float32)

In [29]:
inputs = tf.split(em, seq_length, 1)
inputs[0:5]


Out[29]:
[<tf.Tensor 'split:0' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:1' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:2' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:3' shape=(60, 1, 128) dtype=float32>,
 <tf.Tensor 'split:4' shape=(60, 1, 128) dtype=float32>]

In [30]:
inputs = [tf.squeeze(input_, [1]) for input_ in inputs]
inputs[0:5]


Out[30]:
[<tf.Tensor 'Squeeze:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_1:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_2:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_3:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'Squeeze_4:0' shape=(60, 128) dtype=float32>]

Feeding a batch of 60 sequences to the RNN:

  • Step 1: the first character of each of the 60 sequences (in a batch) is input in parallel.
  • Step 2: the second character of each of the 60 sequences is input in parallel.
  • Step n: the nth character of each of the 60 sequences is input in parallel.

The parallelism is only for efficiency. Each sequence in a batch is handled in parallel, but the network sees one character of a sequence at a time and does the computations accordingly. All the computations involving the characters of all sequences in a batch at a given time step are done in parallel.
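
To make this concrete, the cell below is an illustrative manual unroll of the 50 time steps (it is not needed for training; rnn_decoder in the next cells performs exactly this loop for us). At each step the whole batch of 60 character embeddings is fed through the stacked cell, and the state is carried forward.

In [ ]:
# Illustrative only: manual unroll over the 50 time steps.
state_t = stacked_cell.zero_state(batch_size, tf.float32)
manual_outputs = []
with tf.variable_scope('manual_unroll'):
    for t, x_t in enumerate(inputs):                    # 50 tensors, each [60x128]
        if t > 0:
            tf.get_variable_scope().reuse_variables()   # share weights across time steps
        output_t, state_t = stacked_cell(x_t, state_t)  # whole batch processed in parallel
        manual_outputs.append(output_t)                 # each output is [60x128]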


In [31]:
session.run(inputs[0],feed_dict={input_data:x})


Out[31]:
array([[ 0.15181227,  0.01090705,  0.03806114, ..., -0.10365069,
        -0.00031906,  0.00073797],
       [ 0.11181198,  0.05652621, -0.14609113, ...,  0.07103869,
        -0.01516595,  0.03015241],
       [-0.04029936, -0.15443259, -0.05244369, ...,  0.15671606,
         0.11565743,  0.04243562],
       ..., 
       [ 0.13116498, -0.14643349, -0.15287329, ..., -0.10149479,
        -0.10107021, -0.14191243],
       [ 0.05349265,  0.15286236,  0.16492094, ..., -0.14390934,
        -0.05998485, -0.07160587],
       [ 0.11297508, -0.08271112, -0.14546309, ...,  0.09985153,
        -0.06714724, -0.09677995]], dtype=float32)

In [32]:
stacked_cell.state_size


Out[32]:
(128, 128)

In [33]:
#outputs is 50x[60*128]
outputs, last_state = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, initial_state, stacked_cell, loop_function=None, scope='rnnlm')

In [34]:
outputs[0:5]


Out[34]:
[<tf.Tensor 'rnnlm_1/multi_rnn_cell/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_1/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_2/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_3/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_4/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>]

In [35]:
test = outputs[0]
test


Out[35]:
<tf.Tensor 'rnnlm_1/multi_rnn_cell/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>

In [36]:
session.run(tf.global_variables_initializer())
session.run(test,feed_dict={input_data:x})


Out[36]:
array([[ 0.02281052, -0.10542833, -0.10928056, ...,  0.07889619,
         0.02886342, -0.02457379],
       [ 0.04373915,  0.01981135, -0.0588582 , ..., -0.05789723,
        -0.04222565,  0.18353246],
       [ 0.06870209,  0.11363489,  0.00273446, ...,  0.14812966,
         0.00024029, -0.00936932],
       ..., 
       [-0.03944357, -0.06743636, -0.04955675, ..., -0.01302843,
         0.03449786, -0.04922117],
       [-0.22960794, -0.10016541, -0.10149722, ...,  0.08571617,
         0.06665474, -0.14481398],
       [ 0.01329976,  0.09442141, -0.06003638, ...,  0.03841459,
        -0.11882372, -0.07997831]], dtype=float32)

outputs is a list of 50 tensors, each of shape [60x128]. We concatenate them and reshape to [3000x128] (3000 = 60×50). Then we can calculate the logits and the softmax:

softmax_w is [rnn_size, vocab_size] = [128x65]

[3000x128] x [128x65] + [1x65] = [3000x65]


In [37]:
output = tf.reshape(tf.concat(outputs, 1), [-1, rnn_size])
output


Out[37]:
<tf.Tensor 'Reshape:0' shape=(3000, 128) dtype=float32>

In [38]:
logits = tf.matmul(output, softmax_w) + softmax_b
logits


Out[38]:
<tf.Tensor 'add:0' shape=(3000, 65) dtype=float32>

In [39]:
probs = tf.nn.softmax(logits)
probs


Out[39]:
<tf.Tensor 'Softmax:0' shape=(3000, 65) dtype=float32>

In [40]:
session.run(tf.global_variables_initializer())
session.run(probs,feed_dict={input_data:x})


Out[40]:
array([[ 0.01902954,  0.01288626,  0.01314334, ...,  0.01411375,
         0.01885167,  0.01985767],
       [ 0.01604957,  0.0141206 ,  0.01180383, ...,  0.01324264,
         0.0210009 ,  0.01870311],
       [ 0.01307559,  0.01469065,  0.0116342 , ...,  0.01260926,
         0.02340574,  0.02297945],
       ..., 
       [ 0.01656532,  0.02045839,  0.01115488, ...,  0.01176428,
         0.02083436,  0.01658119],
       [ 0.01537982,  0.01191368,  0.01081387, ...,  0.01424735,
         0.01677993,  0.01711635],
       [ 0.01438247,  0.01752337,  0.01026943, ...,  0.01559825,
         0.02184941,  0.01647909]], dtype=float32)

In [41]:
loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example([logits],
                [tf.reshape(targets, [-1])],
                [tf.ones([batch_size * seq_length])],
                vocab_size)

In [42]:
cost = tf.reduce_sum(loss) / batch_size / seq_length
cost


Out[42]:
<tf.Tensor 'truediv_1:0' shape=() dtype=float32>
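
As a sanity check, the cell below is a small sketch showing that, with the unit weights used above, sequence_loss_by_example reduces to a per-character cross-entropy, so cost is simply the mean cross-entropy over the batch_size * seq_length = 3000 characters. Evaluating cost_check with the same feed should give the same value as cost.

In [ ]:
# Equivalent formulation of the cost: mean per-character cross-entropy.
xent = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.reshape(targets, [-1]), logits=logits)    # shape [3000]
cost_check = tf.reduce_mean(xent)                       # equals cost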

In [43]:
final_state = last_state
final_state


Out[43]:
(<tf.Tensor 'rnnlm_1/multi_rnn_cell_49/cell_0/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>,
 <tf.Tensor 'rnnlm_1/multi_rnn_cell_49/cell_1/basic_rnn_cell/Tanh:0' shape=(60, 128) dtype=float32>)

In [44]:
lr = tf.Variable(0.0, trainable=False)

In [45]:
grad_clip =5.
tvars = tf.trainable_variables()

In [46]:
tvars


Out[46]:
[<tf.Variable 'rnnlm/softmax_w:0' shape=(128, 65) dtype=float32_ref>,
 <tf.Variable 'rnnlm/softmax_b:0' shape=(65,) dtype=float32_ref>,
 <tf.Variable 'rnnlm/embedding:0' shape=(65, 128) dtype=float32_ref>,
 <tf.Variable 'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/weights:0' shape=(256, 128) dtype=float32_ref>,
 <tf.Variable 'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/biases:0' shape=(128,) dtype=float32_ref>,
 <tf.Variable 'rnnlm/multi_rnn_cell/cell_1/basic_rnn_cell/weights:0' shape=(256, 128) dtype=float32_ref>,
 <tf.Variable 'rnnlm/multi_rnn_cell/cell_1/basic_rnn_cell/biases:0' shape=(128,) dtype=float32_ref>]

In [47]:
session.run(tf.global_variables_initializer())
[v.name for v in tf.global_variables()]


Out[47]:
['rnnlm/softmax_w:0',
 'rnnlm/softmax_b:0',
 'rnnlm/embedding:0',
 'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/weights:0',
 'rnnlm/multi_rnn_cell/cell_0/basic_rnn_cell/biases:0',
 'rnnlm/multi_rnn_cell/cell_1/basic_rnn_cell/weights:0',
 'rnnlm/multi_rnn_cell/cell_1/basic_rnn_cell/biases:0',
 'Variable:0']

In [48]:
grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), grad_clip)
grads


Out[48]:
[<tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_0:0' shape=(128, 65) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_1:0' shape=(65,) dtype=float32>,
 <tensorflow.python.framework.ops.IndexedSlices at 0x7f85994bd828>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_3:0' shape=(256, 128) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_4:0' shape=(128,) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_5:0' shape=(256, 128) dtype=float32>,
 <tf.Tensor 'clip_by_global_norm/clip_by_global_norm/_6:0' shape=(128,) dtype=float32>]

In [49]:
session.run(grads, feed_dict)[0]


Out[49]:
array([[  1.07060326e-03,  -5.24530653e-04,  -1.91177626e-03, ...,
          9.48221641e-05,   2.58255837e-04,   3.92583752e-04],
       [ -1.73673022e-03,  -3.51239811e-03,  -1.25999434e-03, ...,
          1.84613760e-04,   2.57512496e-04,   1.03761151e-04],
       [  1.92968792e-03,   5.13417297e-04,  -3.00116517e-05, ...,
         -1.03049410e-04,  -1.06119289e-04,  -2.45183386e-04],
       ..., 
       [ -9.14455857e-04,  -1.67382008e-03,  -3.30934674e-03, ...,
          5.25497366e-04,   4.40616073e-04,   3.70348018e-04],
       [ -4.17240756e-03,  -3.40628391e-03,  -1.02739211e-03, ...,
          6.01988751e-04,   4.35684458e-04,   2.87690957e-04],
       [  1.90374791e-03,   2.27611838e-03,   3.41343286e-04, ...,
         -5.74903897e-05,   1.40870125e-05,  -1.27287320e-04]], dtype=float32)

In [50]:
optimizer = tf.train.AdamOptimizer(lr)
train_op = optimizer.apply_gradients(zip(grads, tvars))

Using classes

Now that we have learned how the network works, we can put it all together:


In [51]:
class LSTMModel():
    def __init__(self,sample=False):
        rnn_size = 128 # size of RNN hidden state vector
        batch_size = 60 # minibatch size, i.e. number of sequences processed in parallel per step
        seq_length = 50 # RNN sequence length
        num_layers = 2 # number of layers in the RNN
        vocab_size = 65
        grad_clip = 5.
        if sample:
            print("sample mode")
            batch_size = 1
            seq_length = 1
        # model.cell.state_size is (128, 128)
        with tf.variable_scope('lstm_model_cell'):
            reuse = tf.get_variable_scope().reuse
            self.stacked_cell = tf.contrib.rnn.MultiRNNCell([tf.contrib.rnn.BasicRNNCell(rnn_size, reuse=reuse) 
                                                         for _ in range(num_layers)])

        self.input_data = tf.placeholder(tf.int32, [batch_size, seq_length])
        self.targets = tf.placeholder(tf.int32, [batch_size, seq_length])
        # Initial state of the LSTM memory.
        # The memory state of the network is initialized with a vector of zeros and gets updated after reading each char. 
        self.initial_state = self.stacked_cell.zero_state(batch_size, tf.float32)  # one zero state per sequence in the batch

        with tf.variable_scope('rnnlm_class1'):
            softmax_w = tf.get_variable("softmax_w", [rnn_size, vocab_size]) #128x65
            softmax_b = tf.get_variable("softmax_b", [vocab_size]) # 1x65
            with tf.device("/cpu:0"):
                embedding = tf.get_variable("embedding", [vocab_size, rnn_size])  #65x128
                inputs = tf.split(tf.nn.embedding_lookup(embedding, self.input_data), seq_length, 1)
                inputs = [tf.squeeze(input_, [1]) for input_ in inputs]
                #inputs = tf.split(em, seq_length, 1)

        # The value of state is updated after processing each batch of chars.
        outputs, last_state = tf.contrib.legacy_seq2seq.rnn_decoder(inputs, self.initial_state, self.stacked_cell, loop_function=None, scope='rnnlm_class1')
        output = tf.reshape(tf.concat(outputs,1), [-1, rnn_size])
        self.logits = tf.matmul(output, softmax_w) + softmax_b
        self.probs = tf.nn.softmax(self.logits)
        loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example([self.logits],
                [tf.reshape(self.targets, [-1])],
                [tf.ones([batch_size * seq_length])],
                vocab_size)
        self.cost = tf.reduce_sum(loss) / batch_size / seq_length
        self.final_state = last_state
        self.lr = tf.Variable(0.0, trainable=False)
        tvars = tf.trainable_variables()
        grads, _ = tf.clip_by_global_norm(tf.gradients(self.cost, tvars),grad_clip)
        optimizer = tf.train.AdamOptimizer(self.lr)
        self.train_op = optimizer.apply_gradients(zip(grads, tvars))
        
    def sample(self, sess, chars, vocab, num=200, prime='The ', sampling_type=1):
        state = sess.run(self.stacked_cell.zero_state(1, tf.float32))
        for char in prime[:-1]:
            x = np.zeros((1, 1))
            x[0, 0] = vocab[char]
            feed = {self.input_data: x, self.initial_state:state}
            [state] = sess.run([self.final_state], feed)

        def weighted_pick(weights):
            t = np.cumsum(weights)
            s = np.sum(weights)
            return(int(np.searchsorted(t, np.random.rand(1)*s)))

        ret = prime
        char = prime[-1]
        for n in range(num):
            x = np.zeros((1, 1))
            x[0, 0] = vocab[char]
            feed = {self.input_data: x, self.initial_state:state}
            [probs, state] = sess.run([self.probs, self.final_state], feed)
            p = probs[0]

            if sampling_type == 0:
                sample = np.argmax(p)
            elif sampling_type == 2:
                if char == ' ':
                    sample = weighted_pick(p)
                else:
                    sample = np.argmax(p)
            else: # sampling_type == 1 default:
                sample = weighted_pick(p)

            pred = chars[sample]
            ret += pred
            char = pred
        return ret

The input is always a matrix of shape [n x m], where n is the batch size and m is the sequence length (one character index per time step). In our case, the input shape is [60 x 50].

The size of the (truncated) data is 1,113,000 characters, the number of batches is 371, the batch size is 60 and the sequence length is 50; so 60 × 50 × 371 = 1,113,000.

We train for 50 epochs; each batch produces one parameter update, so there are 371 updates per epoch.
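
These counts can be verified directly from the loaded objects (illustrative only):

In [ ]:
print(data_loader.num_batches)                              # 371
print(batch_size * seq_length * data_loader.num_batches)    # 60 * 50 * 371 = 1113000
print(data_loader.tensor.size)                              # 1113000 (after truncation in create_batches)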

Creating the LSTM object


In [52]:
with tf.variable_scope("rnn"):
    model = LSTMModel()

In [53]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
e=1
sess.run(tf.assign(model.lr, learning_rate * (decay_rate ** e)))
data_loader.reset_batch_pointer()
state = sess.run(model.initial_state)
state


Out[53]:
(array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ..., 
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32),
 array([[ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        ..., 
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.],
        [ 0.,  0.,  0., ...,  0.,  0.,  0.]], dtype=float32))

In [54]:
x, y = data_loader.next_batch()
feed = {model.input_data: x, model.targets: y, model.initial_state:state}

In [55]:
train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
train_loss


Out[55]:
4.1709843

In [56]:
state


Out[56]:
(array([[-0.09675249,  0.07419401,  0.10559285, ...,  0.1317708 ,
         -0.0838413 , -0.13619834],
        [-0.16656482,  0.2262025 ,  0.21313049, ..., -0.12080994,
         -0.05027504,  0.03160353],
        [-0.11570983,  0.15961714,  0.09448375, ..., -0.08197498,
         -0.16918509,  0.06766865],
        ..., 
        [ 0.03510161,  0.00867428, -0.1297501 , ...,  0.01044595,
         -0.0471494 , -0.08271383],
        [-0.25937682,  0.18290329, -0.20928687, ...,  0.10053991,
         -0.00289368,  0.04116455],
        [-0.10159639,  0.09477679,  0.12515672, ..., -0.00159762,
         -0.16833328,  0.05839593]], dtype=float32),
 array([[ 0.04377746,  0.05588173,  0.33478257, ...,  0.38336733,
         -0.12499455,  0.11230542],
        [ 0.26025739, -0.16916114,  0.1594902 , ...,  0.09015638,
          0.01291311,  0.18199916],
        [-0.11651207,  0.15768678, -0.0537589 , ..., -0.136567  ,
         -0.21685205, -0.14383744],
        ..., 
        [ 0.34668308, -0.02540538, -0.23776405, ...,  0.13425317,
          0.23202583,  0.2141573 ],
        [ 0.09686469, -0.26220188,  0.27067024, ...,  0.17269139,
          0.14760818, -0.09695639],
        [-0.00746535,  0.21565835, -0.04120752, ..., -0.1490697 ,
          0.00494924, -0.04040538]], dtype=float32))

Train using the LSTMModel class


In [57]:
initial_lr = 0.01
num_epochs = 50

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for e in range(num_epochs): # 50 epochs, as set above
        current_lr = initial_lr * (decay_rate ** e)
        sess.run(tf.assign(model.lr, current_lr))
        print('Epoch {} ({} / {} batches, lr={:.4f})'.format(
            e+1,
            (e+1) * data_loader.num_batches, 
            num_epochs * data_loader.num_batches,
            current_lr
        ))
        data_loader.reset_batch_pointer()
        state = sess.run(model.initial_state) # (2x[60x128])
        for b in range(data_loader.num_batches): #for each batch
            start = time.time()
            x, y = data_loader.next_batch()
            feed = {model.input_data: x, model.targets: y, model.initial_state:state}
            train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
            end = time.time()
        print("Train_loss={:.3f}   Time/Batch={:.3f} ms".format(
            train_loss, 
            (end - start) * 1000
        ))
        print()
        #model.sample(sess, data_loader.chars , data_loader.vocab, num=200, prime='The ', sampling_type=1)


Epoch 1 (371 / 18550 batches, lr=0.0100)
Train_loss=1.731   Time/Batch=38.892 ms

Epoch 2 (742 / 18550 batches, lr=0.0097)
Train_loss=1.641   Time/Batch=45.372 ms

Epoch 3 (1113 / 18550 batches, lr=0.0094)
Train_loss=1.623   Time/Batch=52.473 ms

Epoch 4 (1484 / 18550 batches, lr=0.0091)
Train_loss=1.616   Time/Batch=52.220 ms

Epoch 5 (1855 / 18550 batches, lr=0.0089)
Train_loss=1.598   Time/Batch=54.698 ms

Epoch 6 (2226 / 18550 batches, lr=0.0086)
Train_loss=1.591   Time/Batch=58.853 ms

Epoch 7 (2597 / 18550 batches, lr=0.0083)
Train_loss=1.579   Time/Batch=55.002 ms

Epoch 8 (2968 / 18550 batches, lr=0.0081)
Train_loss=1.589   Time/Batch=44.984 ms

Epoch 9 (3339 / 18550 batches, lr=0.0078)
Train_loss=1.579   Time/Batch=98.104 ms

Epoch 10 (3710 / 18550 batches, lr=0.0076)
Train_loss=1.587   Time/Batch=36.872 ms

Epoch 11 (4081 / 18550 batches, lr=0.0074)
Train_loss=1.573   Time/Batch=59.055 ms

Epoch 12 (4452 / 18550 batches, lr=0.0072)
Train_loss=1.567   Time/Batch=89.665 ms

Epoch 13 (4823 / 18550 batches, lr=0.0069)
Train_loss=1.568   Time/Batch=45.539 ms

Epoch 14 (5194 / 18550 batches, lr=0.0067)
Train_loss=1.565   Time/Batch=84.331 ms

Epoch 15 (5565 / 18550 batches, lr=0.0065)
Train_loss=1.565   Time/Batch=60.008 ms

Epoch 16 (5936 / 18550 batches, lr=0.0063)
Train_loss=1.552   Time/Batch=72.037 ms

Epoch 17 (6307 / 18550 batches, lr=0.0061)
Train_loss=1.557   Time/Batch=95.832 ms

Epoch 18 (6678 / 18550 batches, lr=0.0060)
Train_loss=1.556   Time/Batch=69.829 ms

Epoch 19 (7049 / 18550 batches, lr=0.0058)
Train_loss=1.540   Time/Batch=77.337 ms

Epoch 20 (7420 / 18550 batches, lr=0.0056)
Train_loss=1.544   Time/Batch=61.575 ms

Epoch 21 (7791 / 18550 batches, lr=0.0054)
Train_loss=1.544   Time/Batch=36.538 ms

Epoch 22 (8162 / 18550 batches, lr=0.0053)
Train_loss=1.539   Time/Batch=86.214 ms

Epoch 23 (8533 / 18550 batches, lr=0.0051)
Train_loss=1.537   Time/Batch=65.374 ms

Epoch 24 (8904 / 18550 batches, lr=0.0050)
Train_loss=1.530   Time/Batch=45.184 ms

Epoch 25 (9275 / 18550 batches, lr=0.0048)
Train_loss=1.535   Time/Batch=173.852 ms

Epoch 26 (9646 / 18550 batches, lr=0.0047)
Train_loss=1.509   Time/Batch=52.137 ms

Epoch 27 (10017 / 18550 batches, lr=0.0045)
Train_loss=1.531   Time/Batch=68.971 ms

Epoch 28 (10388 / 18550 batches, lr=0.0044)
Train_loss=1.508   Time/Batch=109.409 ms

Epoch 29 (10759 / 18550 batches, lr=0.0043)
Train_loss=1.513   Time/Batch=45.965 ms

Epoch 30 (11130 / 18550 batches, lr=0.0041)
Train_loss=1.509   Time/Batch=44.678 ms

Epoch 31 (11501 / 18550 batches, lr=0.0040)
Train_loss=1.496   Time/Batch=44.377 ms

Epoch 32 (11872 / 18550 batches, lr=0.0039)
Train_loss=1.506   Time/Batch=97.122 ms

Epoch 33 (12243 / 18550 batches, lr=0.0038)
Train_loss=1.505   Time/Batch=45.927 ms

Epoch 34 (12614 / 18550 batches, lr=0.0037)
Train_loss=1.509   Time/Batch=52.007 ms

Epoch 35 (12985 / 18550 batches, lr=0.0036)
Train_loss=1.487   Time/Batch=62.120 ms

Epoch 36 (13356 / 18550 batches, lr=0.0034)
Train_loss=1.487   Time/Batch=41.154 ms

Epoch 37 (13727 / 18550 batches, lr=0.0033)
Train_loss=1.497   Time/Batch=38.228 ms

Epoch 38 (14098 / 18550 batches, lr=0.0032)
Train_loss=1.497   Time/Batch=58.795 ms

Epoch 39 (14469 / 18550 batches, lr=0.0031)
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-57-e6fb17f4ebfd> in <module>()
     19             x, y = data_loader.next_batch()
     20             feed = {model.input_data: x, model.targets: y, model.initial_state:state}
---> 21             train_loss, state, _ = sess.run([model.cost, model.final_state, model.train_op], feed)
     22             end = time.time()
     23         print("Train_loss={:.3f}   Time/Batch={:.3f} ms".format(

/home/santi/miniconda3/envs/data_science/lib/python3.5/site-packages/tensorflow/python/client/session.py in run(self, fetches, feed_dict, options, run_metadata)
    776     try:
    777       result = self._run(None, fetches, feed_dict, options_ptr,
--> 778                          run_metadata_ptr)
    779       if run_metadata:
    780         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

/home/santi/miniconda3/envs/data_science/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
    980     if final_fetches or final_targets:
    981       results = self._do_run(handle, final_targets, final_fetches,
--> 982                              feed_dict_string, options, run_metadata)
    983     else:
    984       results = []

/home/santi/miniconda3/envs/data_science/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1030     if handle is None:
   1031       return self._do_call(_run_fn, self._session, feed_dict, fetch_list,
-> 1032                            target_list, options, run_metadata)
   1033     else:
   1034       return self._do_call(_prun_fn, self._session, handle, feed_dict,

/home/santi/miniconda3/envs/data_science/lib/python3.5/site-packages/tensorflow/python/client/session.py in _do_call(self, fn, *args)
   1037   def _do_call(self, fn, *args):
   1038     try:
-> 1039       return fn(*args)
   1040     except errors.OpError as e:
   1041       message = compat.as_text(e.message)

/home/santi/miniconda3/envs/data_science/lib/python3.5/site-packages/tensorflow/python/client/session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1019         return tf_session.TF_Run(session, options,
   1020                                  feed_dict, fetch_list, target_list,
-> 1021                                  status, run_metadata)
   1022 
   1023     def _prun_fn(session, handle, feed_dict, fetch_list):

KeyboardInterrupt: 

Sample


In [ ]:
sess = tf.InteractiveSession()
with tf.variable_scope("sample_test"):
    sess.run(tf.global_variables_initializer())
    m = LSTMModel(sample=True)

In [ ]:
prime='The '
num=200
sampling_type=1
vocab=data_loader.vocab
chars=data_loader.chars

In [ ]:
sess.run(m.initial_state)

In [ ]:
#print state
sess.run(tf.global_variables_initializer())
state=sess.run(m.initial_state)
for char in prime[:-1]:
    x = np.zeros((1, 1))
    x[0, 0] = vocab[char]
    feed = {m.input_data: x, m.initial_state:state}
    [state] = sess.run([m.final_state], feed)

In [ ]:
state

In [ ]:
def weighted_pick(weights):
    t = np.cumsum(weights)
    s = np.sum(weights)
    return(int(np.searchsorted(t, np.random.rand(1)*s)))

ret = prime
char = prime[-1]
for n in range(num):
    x = np.zeros((1, 1))
    x[0, 0] = vocab[char]
    feed = {m.input_data: x, m.initial_state:state}
    [probs, state] = sess.run([m.probs, m.final_state], feed)
    p = probs[0]

    if sampling_type == 0:
        sample = np.argmax(p)
    elif sampling_type == 2:
        if char == ' ':
            sample = weighted_pick(p)
        else:
            sample = np.argmax(p)
    else: # sampling_type == 1 default:
        sample = weighted_pick(p)

    pred = chars[sample]
    ret += pred
    char = pred

In [ ]:
ret

Sample using the sample method


In [66]:
sess = tf.InteractiveSession()
sess.run(tf.global_variables_initializer())
state=sess.run(m.initial_state)
m.sample(sess, data_loader.chars , data_loader.vocab, num=200, prime='The ', sampling_type=1)


Out[66]:
"The 3ESYEnEyh.3seImxt.k\nU;Ss\n?k?am$KGASlvrd-PoXvX:CyDNDDXOHF ?Hclt?oFG-u?rRaob'yU&KwNW3QdNHO3 WzsAISCQl?wlca$AfV&awKe\niw&w'J'Gz!&h'uKM,uRzJpN:yv?jNUHHlqCDYnjhHScjKHs?q'mkp\nK$YpdksOTGjztDsrs$K-W&McaY$LD!VQ"

In [ ]: