Licensed under the Apache License, Version 2.0 (the "License").

Text Generation using an RNN


This notebook demonstrates how to generate text using an RNN with tf.keras and eager execution. If you like, you can write a similar model using less code. Here, we show a lower-level implementation that's useful to understand as prework before diving into deeper examples in a similar style.

This notebook is an end-to-end example. When you run it, it will download a dataset of Shakespeare's writing. We'll use a collection of plays, borrowed from Andrej Karpathy's excellent The Unreasonable Effectiveness of Recurrent Neural Networks. The notebook will train a model, and use it to generate sample output.

Here is the output (with start string 'w') after training a single-layer GRU for 30 epochs with the default settings below:

were to the death of him
And nothing of the field in the view of hell,
When I said, banish him, I will not burn thee that would live.

HENRY BOLINGBROKE:
My gracious uncle--

DUKE OF YORK:
As much disgraced to the court, the gods them speak,
And now in peace himself excuse thee in the world.

HORTENSIO:
Madam, 'tis not the cause of the counterfeit of the earth,
And leave me to the sun that set them on the earth
And leave the world and are revenged for thee.

GLOUCESTER:
I would they were talking with the very name of means
To make a puppet of a guest, and therefore, good Grumio,
Nor arm'd to prison, o' the clouds, of the whole field,
With the admire
With the feeding of thy chair, and we have heard it so,
I thank you, sir, he is a visor friendship with your silly your bed.

SAMPSON:
I do desire to live, I pray: some stand of the minds, make thee remedies
With the enemies of my soul.

MENENIUS:
I'll keep the cause of my mistress.

POLIXENES:
My brother Marcius!

Second Servant:
Will't ple

Of course, while some of the sentences are grammatical, most do not make sense. But, consider:

  • Our model is character-based (when we began training, it did not yet know how to spell a valid English word, or that words were even a unit of text).

  • The structure of the output resembles a play (blocks begin with a speaker name, in all caps similar to the original text). Sentences generally end with a period. If you look at the text from a distance (or don't read the individual words too closely), it appears as if it's an excerpt from a play.

As a next step, you can experiment with training the model on a different dataset - any large text file (ASCII) will do, and you can modify a single line of code below to make that change (an example is shown after the download cell). Have fun!

Install unidecode library

A helpful library to convert Unicode text to ASCII.


In [0]:
!pip install unidecode
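
As a quick illustration of what unidecode does (a minimal sketch; run it after the install above):


In [0]:
import unidecode

# accented and other non-ASCII characters are mapped to plain ASCII approximations
print(unidecode.unidecode('Déjà vu, señor'))  # prints: Deja vu, senor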

Import TensorFlow and enable eager execution.


In [0]:
# Import TensorFlow >= 1.10 and enable eager execution
import tensorflow as tf

# Note: Once you enable eager execution, it cannot be disabled. 
tf.enable_eager_execution()

import numpy as np
import os
import re
import random
import unidecode
import time

Download the dataset

In this example, we will use the Shakespeare dataset. You can use any other dataset that you like.


In [0]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
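
If you want to train on a different dataset, this is the single line to change. A sketch is shown below; the URL is a hypothetical placeholder, not a real dataset location:


In [0]:
# Hypothetical alternative: any plain-text (ASCII) corpus works here
# path_to_file = tf.keras.utils.get_file('my_corpus.txt', 'https://example.com/my_corpus.txt')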

Read the dataset


In [0]:
text = unidecode.unidecode(open(path_to_file).read())
# length of text is the number of characters in it
print (len(text))

Creating dictionaries to map from characters to their indices and vice-versa, which will be used to vectorize the inputs.


In [0]:
# unique contains all the unique characters in the file
unique = sorted(set(text))

# creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(unique)}
idx2char = {i:u for i, u in enumerate(unique)}

In [0]:
# setting the maximum length sentence we want for a single input in characters
max_length = 100

# length of the vocabulary in chars
vocab_size = len(unique)

# the embedding dimension 
embedding_dim = 256

# number of RNN (here GRU) units
units = 1024

# batch size 
BATCH_SIZE = 64

# buffer size to shuffle our dataset
BUFFER_SIZE = 10000

Creating the input and output tensors

We vectorize the input and the target text because our model cannot understand strings, only numbers.

But first, we need to create the input and output vectors. Remember the max_length we set above; we will use it here. We create max_length chunks of input, where each input vector is all the characters in that chunk except the last, and the target vector is all the characters in that chunk except the first.

For example, consider that the string = 'tensorflow' and the max_length is 9

So, the input = 'tensorflo' and output = 'ensorflow'

After creating the vectors, we convert each character into numbers using the char2idx dictionary we created above.
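
To make the slicing concrete, here is the 'tensorflow' example above as a standalone sketch:


In [0]:
# a tiny worked example of the chunking described above
s = 'tensorflow'
chunk = 9
print(s[0:chunk])      # 'tensorflo' -> input: everything except the last character
print(s[1:chunk + 1])  # 'ensorflow' -> target: everything except the first character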


In [0]:
input_text = []
target_text = []

for f in range(0, len(text)-max_length, max_length):
    inps = text[f:f+max_length]
    targ = text[f+1:f+1+max_length]

    input_text.append([char2idx[i] for i in inps])
    target_text.append([char2idx[t] for t in targ])
    
print (np.array(input_text).shape)
print (np.array(target_text).shape)

Creating batches and shuffling them using tf.data


In [0]:
dataset = tf.data.Dataset.from_tensor_slices((input_text, target_text)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)

Creating the model

We use the Model Subclassing API, which gives us full flexibility to create the model and change it however we like. We use three layers to define our model.

  • Embedding layer
  • GRU layer (you can use an LSTM layer here)
  • Fully connected layer

In [0]:
class Model(tf.keras.Model):
  def __init__(self, vocab_size, embedding_dim, units, batch_size):
    super(Model, self).__init__()
    self.units = units
    self.batch_sz = batch_size

    self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)

    if tf.test.is_gpu_available():
      self.gru = tf.keras.layers.CuDNNGRU(self.units, 
                                          return_sequences=True, 
                                          return_state=True, 
                                          recurrent_initializer='glorot_uniform')
    else:
      self.gru = tf.keras.layers.GRU(self.units, 
                                     return_sequences=True, 
                                     return_state=True, 
                                     recurrent_activation='sigmoid', 
                                     recurrent_initializer='glorot_uniform')

    self.fc = tf.keras.layers.Dense(vocab_size)
        
  def call(self, x, hidden):
    x = self.embedding(x)

    # output shape == (batch_size, max_length, hidden_size) 
    # states shape == (batch_size, hidden_size)

    # states variable to preserve the state of the model
    # this will be used to pass at every step to the model while training
    output, states = self.gru(x, initial_state=hidden)


    # reshaping the output so that we can pass it to the Dense layer
    # after reshaping the shape is (batch_size * max_length, hidden_size)
    output = tf.reshape(output, (-1, output.shape[2]))

    # The dense layer will output predictions for every time_steps(max_length)
    # output shape after the dense layer == (max_length * batch_size, vocab_size)
    x = self.fc(output)

    return x, states

Instantiate the model and set the optimizer and the loss function


In [0]:
model = Model(vocab_size, embedding_dim, units, BATCH_SIZE)

In [0]:
optimizer = tf.train.AdamOptimizer()

# using sparse_softmax_cross_entropy so that we don't have to create one-hot vectors
def loss_function(real, preds):
    return tf.losses.sparse_softmax_cross_entropy(labels=real, logits=preds)
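
As a quick sanity check of the shapes the loss expects (dummy values, purely illustrative):


In [0]:
# labels are integer character ids (no one-hot encoding needed),
# logits carry one score per vocabulary entry
dummy_labels = tf.constant([1, 2, 3], dtype=tf.int64)   # shape: (3,)
dummy_logits = tf.random_uniform((3, vocab_size))       # shape: (3, vocab_size)
print(loss_function(dummy_labels, dummy_logits))        # a scalar loss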

Checkpoints (Object-based saving)


In [0]:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
                                 model=model)

Train the model

Here we will use a custom training loop with the help of GradientTape().

  • We initialize the hidden state of the model with zeros, with shape (batch_size, number of RNN units). We do this at the start of every epoch by calling model.reset_states().

  • Next, we iterate over the dataset (batch by batch) and calculate the predictions and the hidden states associated with that input.

  • There are a lot of interesting things happening here.

    • The model gets a hidden state (initialized with zeros), let's call it H0, and the first batch of input, let's call it I0.
    • The model then returns the predictions P1 and H1.
    • For the next batch of input, the model receives I1 and H1.
    • The interesting thing here is that we pass H1 to the model along with I1, which is how the model learns. The context learned from batch to batch is contained in the hidden state.
    • We continue doing this until the dataset is exhausted, and then we start a new epoch and repeat.
  • After calculating the predictions, we calculate the loss using the loss function defined above. Then we calculate the gradients of the loss with respect to the model variables.

  • Finally, we take a step in that direction with the help of the optimizer, using the apply_gradients function.

Note: If you are running this notebook in Colab (which has a Tesla K80 GPU), it takes about 23 seconds per epoch.


In [0]:
# Training step

EPOCHS = 20

for epoch in range(EPOCHS):
    start = time.time()

    # initializing the hidden state at the start of every epoch
    hidden = model.reset_states()

    for (batch, (inp, target)) in enumerate(dataset):
        with tf.GradientTape() as tape:
            # feeding the hidden state back into the model
            # This is the interesting step
            predictions, hidden = model(inp, hidden)

            # reshaping the target because that's how the
            # loss function expects it
            target = tf.reshape(target, (-1,))
            loss = loss_function(target, predictions)

        grads = tape.gradient(loss, model.variables)
        optimizer.apply_gradients(zip(grads, model.variables))

        if batch % 100 == 0:
            print('Epoch {} Batch {} Loss {:.4f}'.format(epoch+1,
                                                         batch,
                                                         loss))

    # saving (checkpoint) the model every 5 epochs
    if (epoch + 1) % 5 == 0:
        checkpoint.save(file_prefix=checkpoint_prefix)

    print('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
    print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))

Restore the latest checkpoint


In [0]:
# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))

Predicting using our trained model

The code block below is used to generate the text.

  • We start by choosing a start string, initializing the hidden state, and setting the number of characters we want to generate.

  • We get predictions using the start_string and the hidden state.

  • Then we use argmax to calculate the index of the predicted character. We use this predicted character as our next input to the model.

  • The hidden state returned by the model is fed back into the model so that it now has more context, rather than just one character. After we predict the next character, the updated hidden state is again fed back into the model, which is how it builds up context from the previously generated characters.

  • If you look at the generated text, you'll see the model knows when to capitalize and make paragraphs, and the text follows a Shakespeare-like style of writing, which is pretty awesome!


In [0]:
# Evaluation step(generating text using the model learned)

# number of characters to generate
num_generate = 1000

# You can change the start string to experiment
start_string = 'Q'
# converting our start string to numbers(vectorizing!) 
input_eval = [char2idx[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)

# empty string to store our results
text_generated = ''

# hidden state shape == (batch_size, number of rnn units); here batch size == 1
hidden = [tf.zeros((1, units))]
for i in range(num_generate):
    predictions, hidden = model(input_eval, hidden)

    # using argmax to predict the character returned by the model
    predicted_id = tf.argmax(predictions[-1]).numpy()
    
    # We pass the predicted character as the next input to the model
    # along with the previous hidden state
    input_eval = tf.expand_dims([predicted_id], 0)
    
    text_generated += idx2char[predicted_id]

print (start_string + text_generated)

Next steps

  • Change the start string to a different character, or the start of a sentence.
  • Experiment with training on a different dataset, or with different parameters. Project Gutenberg, for example, contains a large collection of books.
  • Add another RNN layer (see the sketch below).
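
For the last item, here is a minimal sketch of one way to stack a second GRU layer on top of the Model class defined above (an untested variation, not part of the original notebook):


In [0]:
class TwoLayerModel(Model):
  # sketch: reuses the embedding, first GRU, and dense layer from Model above
  def __init__(self, vocab_size, embedding_dim, units, batch_size):
    super(TwoLayerModel, self).__init__(vocab_size, embedding_dim, units, batch_size)
    self.gru2 = tf.keras.layers.GRU(self.units,
                                    return_sequences=True,
                                    return_state=True,
                                    recurrent_activation='sigmoid',
                                    recurrent_initializer='glorot_uniform')

  def call(self, x, hidden):
    x = self.embedding(x)
    output, states = self.gru(x, initial_state=hidden)
    # feed the first GRU's sequence output into the second GRU;
    # for simplicity, only the second layer's state is returned
    output, states = self.gru2(output)
    output = tf.reshape(output, (-1, output.shape[2]))
    return self.fc(output), states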

In [0]: