Licensed under the Apache License, Version 2.0 (the "License").
|
This notebook demonstrates how to generate text using an RNN using tf.keras and eager execution. If you like, you can write a similar model using less code. Here, we show a lower-level impementation that's useful to understand as prework before diving in to deeper examples in a similar, like .
This notebook is an end-to-end example. When you run it, it will download a dataset of Shakespeare's writing. We'll use a collection of plays, borrowed from Andrej Karpathy's excellent The Unreasonable Effectiveness of Recurrent Neural Networks. The notebook will train a model, and use it to generate sample output.
Here is the output(with start string='w') after training a single layer GRU for 30 epochs with the default settings below:
were to the death of him
And nothing of the field in the view of hell,
When I said, banish him, I will not burn thee that would live.
HENRY BOLINGBROKE:
My gracious uncle--
DUKE OF YORK:
As much disgraced to the court, the gods them speak,
And now in peace himself excuse thee in the world.
HORTENSIO:
Madam, 'tis not the cause of the counterfeit of the earth,
And leave me to the sun that set them on the earth
And leave the world and are revenged for thee.
GLOUCESTER:
I would they were talking with the very name of means
To make a puppet of a guest, and therefore, good Grumio,
Nor arm'd to prison, o' the clouds, of the whole field,
With the admire
With the feeding of thy chair, and we have heard it so,
I thank you, sir, he is a visor friendship with your silly your bed.
SAMPSON:
I do desire to live, I pray: some stand of the minds, make thee remedies
With the enemies of my soul.
MENENIUS:
I'll keep the cause of my mistress.
POLIXENES:
My brother Marcius!
Second Servant:
Will't ple
Of course, while some of the sentences are grammatical, most do not make sense. But, consider:
Our model is character based (when we began training, it did not yet know how to spell a valid English word, or that words were even a unit of text).
The structure of the output resembles a play (blocks begin with a speaker name, in all caps similar to the original text). Sentences generally end with a period. If you look at the text from a distance (or don't read the invididual words too closely, it appears as if it's an excerpt from a play).
As a next step, you can experiment training the model on a different dataset - any large text file(ASCII) will do, and you can modify a single line of code below to make that change. Have fun!
In [0]:
!pip install unidecode
In [0]:
# Import TensorFlow >= 1.10 and enable eager execution
import tensorflow as tf
# Note: Once you enable eager execution, it cannot be disabled.
tf.enable_eager_execution()
import numpy as np
import os
import re
import random
import unidecode
import time
In this example, we will use the shakespeare dataset. You can use any other dataset that you like.
In [0]:
path_to_file = tf.keras.utils.get_file('shakespeare.txt', 'https://storage.googleapis.com/download.tensorflow.org/data/shakespeare.txt')
In [0]:
text = unidecode.unidecode(open(path_to_file).read())
# length of text is the number of characters in it
print (len(text))
Creating dictionaries to map from characters to their indices and vice-versa, which will be used to vectorize the inputs
In [0]:
# unique contains all the unique characters in the file
unique = sorted(set(text))
# creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(unique)}
idx2char = {i:u for i, u in enumerate(unique)}
In [0]:
# setting the maximum length sentence we want for a single input in characters
max_length = 100
# length of the vocabulary in chars
vocab_size = len(unique)
# the embedding dimension
embedding_dim = 256
# number of RNN (here GRU) units
units = 1024
# batch size
BATCH_SIZE = 64
# buffer size to shuffle our dataset
BUFFER_SIZE = 10000
Vectorizing the input and the target text because our model cannot understand strings only numbers.
But first, we need to create the input and output vectors. Remember the max_length we set above, we will use it here. We are creating max_length chunks of input, where each input vector is all the characters in that chunk except the last and the target vector is all the characters in that chunk except the first.
For example, consider that the string = 'tensorflow' and the max_length is 9
So, the input = 'tensorflo'
and output = 'ensorflow'
After creating the vectors, we convert each character into numbers using the char2idx dictionary we created above.
In [0]:
input_text = []
target_text = []
for f in range(0, len(text)-max_length, max_length):
inps = text[f:f+max_length]
targ = text[f+1:f+1+max_length]
input_text.append([char2idx[i] for i in inps])
target_text.append([char2idx[t] for t in targ])
print (np.array(input_text).shape)
print (np.array(target_text).shape)
In [0]:
dataset = tf.data.Dataset.from_tensor_slices((input_text, target_text)).shuffle(BUFFER_SIZE)
dataset = dataset.batch(BATCH_SIZE, drop_remainder=True)
In [0]:
class Model(tf.keras.Model):
def __init__(self, vocab_size, embedding_dim, units, batch_size):
super(Model, self).__init__()
self.units = units
self.batch_sz = batch_size
self.embedding = tf.keras.layers.Embedding(vocab_size, embedding_dim)
if tf.test.is_gpu_available():
self.gru = tf.keras.layers.CuDNNGRU(self.units,
return_sequences=True,
return_state=True,
recurrent_initializer='glorot_uniform')
else:
self.gru = tf.keras.layers.GRU(self.units,
return_sequences=True,
return_state=True,
recurrent_activation='sigmoid',
recurrent_initializer='glorot_uniform')
self.fc = tf.keras.layers.Dense(vocab_size)
def call(self, x, hidden):
x = self.embedding(x)
# output shape == (batch_size, max_length, hidden_size)
# states shape == (batch_size, hidden_size)
# states variable to preserve the state of the model
# this will be used to pass at every step to the model while training
output, states = self.gru(x, initial_state=hidden)
# reshaping the output so that we can pass it to the Dense layer
# after reshaping the shape is (batch_size * max_length, hidden_size)
output = tf.reshape(output, (-1, output.shape[2]))
# The dense layer will output predictions for every time_steps(max_length)
# output shape after the dense layer == (max_length * batch_size, vocab_size)
x = self.fc(output)
return x, states
In [0]:
model = Model(vocab_size, embedding_dim, units, BATCH_SIZE)
In [0]:
optimizer = tf.train.AdamOptimizer()
# using sparse_softmax_cross_entropy so that we don't have to create one-hot vectors
def loss_function(real, preds):
return tf.losses.sparse_softmax_cross_entropy(labels=real, logits=preds)
In [0]:
checkpoint_dir = './training_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, "ckpt")
checkpoint = tf.train.Checkpoint(optimizer=optimizer,
model=model)
Here we will use a custom training loop with the help of GradientTape()
We initialize the hidden state of the model with zeros and shape == (batch_size, number of rnn units). We do this by calling the function defined while creating the model.
Next, we iterate over the dataset(batch by batch) and calculate the predictions and the hidden states associated with that input.
There are a lot of interesting things happening here.
After calculating the predictions, we calculate the loss using the loss function defined above. Then we calculate the gradients of the loss with respect to the model variables(input)
Finally, we take a step in that direction with the help of the optimizer using the apply_gradients function.
Note:- If you are running this notebook in Colab which has a Tesla K80 GPU it takes about 23 seconds per epoch.
In [0]:
# Training step
EPOCHS = 20
for epoch in range(EPOCHS):
start = time.time()
# initializing the hidden state at the start of every epoch
hidden = model.reset_states()
for (batch, (inp, target)) in enumerate(dataset):
with tf.GradientTape() as tape:
# feeding the hidden state back into the model
# This is the interesting step
predictions, hidden = model(inp, hidden)
# reshaping the target because that's how the
# loss function expects it
target = tf.reshape(target, (-1,))
loss = loss_function(target, predictions)
grads = tape.gradient(loss, model.variables)
optimizer.apply_gradients(zip(grads, model.variables))
if batch % 100 == 0:
print ('Epoch {} Batch {} Loss {:.4f}'.format(epoch+1,
batch,
loss))
# saving (checkpoint) the model every 5 epochs
if (epoch + 1) % 5 == 0:
checkpoint.save(file_prefix = checkpoint_prefix)
print ('Epoch {} Loss {:.4f}'.format(epoch+1, loss))
print('Time taken for 1 epoch {} sec\n'.format(time.time() - start))
In [0]:
# restoring the latest checkpoint in checkpoint_dir
checkpoint.restore(tf.train.latest_checkpoint(checkpoint_dir))
The below code block is used to generated the text
We start by choosing a start string and initializing the hidden state and setting the number of characters we want to generate.
We get predictions using the start_string and the hidden state
Then we use a multinomial distribution to calculate the index of the predicted word. We use this predicted word as our next input to the model
The hidden state returned by the model is fed back into the model so that it now has more context rather than just one word. After we predict the next word, the modified hidden states are again fed back into the model, which is how it learns as it gets more context from the previously predicted words.
If you see the predictions, the model knows when to capitalize, make paragraphs and the text follows a shakespeare style of writing which is pretty awesome!
In [0]:
# Evaluation step(generating text using the model learned)
# number of characters to generate
num_generate = 1000
# You can change the start string to experiment
start_string = 'Q'
# converting our start string to numbers(vectorizing!)
input_eval = [char2idx[s] for s in start_string]
input_eval = tf.expand_dims(input_eval, 0)
# empty string to store our results
text_generated = ''
# low temperatures results in more predictable text.
# higher temperatures results in more surprising text
# experiment to find the best setting
temperature = 1.0
# hidden state shape == (batch_size, number of rnn units); here batch size == 1
hidden = [tf.zeros((1, units))]
for i in range(num_generate):
predictions, hidden = model(input_eval, hidden)
# using a multinomial distribution to predict the word returned by the model
predictions = predictions / temperature
predicted_id = tf.multinomial(tf.exp(predictions), num_samples=1)[0][0].numpy()
# We pass the predicted word as the next input to the model
# along with the previous hidden state
input_eval = tf.expand_dims([predicted_id], 0)
text_generated += idx2char[predicted_id]
print (start_string + text_generated)
In [0]: