In [0]:
#@title Licensed under the Apache License, Version 2.0 (the "License"); { display-mode: "form" }
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Predict Shakespeare with Cloud TPUs and Keras

This example uses tf.keras to build a language model and train it on a Google Cloud TPU. The language model predicts the next character of text given the text so far, and the trained model can generate new snippets of text in a style similar to its training data.

We'll train the model on the combined works of William Shakespeare, then use it to compose a play in the style of The Great Bard:

Loves that led me no dumbs lack her Berjoy's face with her to-day. The spirits roar'd; which shames which within his powers Which tied up remedies lending with occasion, A loud and Lancaster, stabb'd in me Upon my sword for ever: 'Agripo'er, his days let me free. Stop it of that word, be so: at Lear, When I did profess the hour-stranger for my life, When I did sink to be cried how for aught; Some beds which seeks chaste senses prove burning; But he perforces seen in her eyes so fast; And _

Note: To enable TPUs on Google Colab, select Runtime > Change runtime type, and set Hardware acceleration to TPU.
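
If the runtime is not set to TPU, the later cell that reads COLAB_TPU_ADDR will fail with a KeyError. As an optional up-front check (a one-line sketch, not part of the original flow), you can verify the variable is present:


In [0]:
import os
assert 'COLAB_TPU_ADDR' in os.environ, 'Switch the Colab runtime to TPU first.'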

Download data

Download The Complete Works of William Shakespeare as a single text file from Project Gutenberg. We'll use snippets from this file as the training data for the model. Each target snippet is its source snippet offset by one character, as illustrated below.
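
To make that offset concrete, here is a tiny, purely illustrative example (the snippet and seq_len below are made up, not part of the real pipeline): the target window is the source window shifted one character to the right.


In [0]:
snippet = 'Shall I compare thee'  # toy text, for illustration only
seq_len = 10
source = snippet[0:seq_len]       # 'Shall I co'
target = snippet[1:seq_len + 1]   # 'hall I com' (one character later)
print(source)
print(target)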


In [0]:
!wget --show-progress --continue -O /content/shakespeare.txt http://www.gutenberg.org/files/100/100-0.txt


Redirecting output to ‘wget-log’.
/content/shakespear 100%[===================>]   5.58M  8.24MB/s    in 0.7s    

Build the data generator


In [0]:
import numpy as np
import six
import tensorflow as tf
import time
import os

# This address identifies the TPU we'll use when configuring TensorFlow.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

SHAKESPEARE_TXT = '/content/shakespeare.txt'

tf.logging.set_verbosity(tf.logging.INFO)

def transform(txt, pad_to=None):
  # map characters to integer code points, dropping any with ord(c) >= 255
  output = np.asarray([ord(c) for c in txt if ord(c) < 255], dtype=np.int32)
  if pad_to is not None:
    output = output[:pad_to]
    output = np.concatenate([
        np.zeros([pad_to - len(output)], dtype=np.int32),
        output,
    ])
  return output

def training_generator(seq_len=100, batch_size=1024):
  """A generator yields (source, target) arrays for training."""
  with tf.gfile.GFile(SHAKESPEARE_TXT, 'r') as f:
    txt = f.read()

  tf.logging.info('Input text [%d] %s', len(txt), txt[:50])
  source = transform(txt)
  while True:
    offsets = np.random.randint(0, len(source) - seq_len, batch_size)

    # Our model uses sparse crossentropy loss, but Keras requires labels
    # to have the same rank as the input logits.  We add an empty final
    # dimension to account for this.
    yield (
        np.stack([source[idx:idx + seq_len] for idx in offsets]),
        np.expand_dims(
            np.stack([source[idx + 1:idx + seq_len + 1] for idx in offsets]),
            -1),
    )

six.next(training_generator(seq_len=10, batch_size=1))


INFO:tensorflow:Input text [5834393] 
Project Gutenberg’s The Complete Works of Willi
Out[0]:
(array([[103, 101, 114,  13,  10,  32,  32,  32,  32,  79]], dtype=int32),
 array([[[101],
         [114],
         [ 13],
         [ 10],
         [ 32],
         [ 32],
         [ 32],
         [ 32],
         [ 79],
         [117]]], dtype=int32))
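
To check that the generator produces this offset, you can decode the sample batch back into text (a quick sanity check using the generator defined above); the target should read as the source shifted one character forward.


In [0]:
src, tgt = six.next(training_generator(seq_len=10, batch_size=1))
print(''.join(chr(c) for c in src[0]))        # source window as text
print(''.join(chr(c) for c in tgt[0, :, 0]))  # target window, one character later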

Build the model

The model is defined as a two-layer, forward LSTM, with two changes from the standard tf.keras LSTM definition:

  1. Define the model's input shape statically, to satisfy the XLA compiler's static shape requirement.
  2. Use a tf.train.Optimizer instead of a standard Keras optimizer, because Keras optimizer support on TPUs is still experimental.

In [0]:
EMBEDDING_DIM = 512

def lstm_model(seq_len=100, batch_size=None, stateful=True):
  """Language model: predict the next word given the current word."""
  source = tf.keras.Input(
      name='seed', shape=(seq_len,), batch_size=batch_size, dtype=tf.int32)

  embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM)(source)
  lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
  lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)
  predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_2)
  model = tf.keras.Model(inputs=[source], outputs=[predicted_char])
  model.compile(
      optimizer=tf.train.RMSPropOptimizer(learning_rate=0.01),
      loss='sparse_categorical_crossentropy',
      metrics=['sparse_categorical_accuracy'])
  return model
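
As an optional sanity check (not part of the training flow below), you can build a CPU copy of the model and print its layer shapes before converting it to run on the TPU:


In [0]:
# batch_size=None and stateful=False, so this copy has no fixed batch size baked in.
lstm_model(seq_len=100, batch_size=None, stateful=False).summary()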

Train the model

The tf.contrib.tpu.keras_to_tpu_model function converts a tf.keras model to an equivalent TPU version. We then use the standard Keras methods on the converted model: fit (here, fit_generator), predict, and evaluate.


In [0]:
tf.keras.backend.clear_session()

training_model = lstm_model(seq_len=100, batch_size=128, stateful=False)

tpu_model = tf.contrib.tpu.keras_to_tpu_model(
    training_model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))

tpu_model.fit_generator(
    training_generator(seq_len=100, batch_size=1024),
    steps_per_epoch=100,
    epochs=10,
)
tpu_model.save_weights('/tmp/bard.h5', overwrite=True)


INFO:tensorflow:Querying Tensorflow master (b'grpc://10.118.7.82:8470') for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 6638032082838577689)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 4873016205556938351)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 5471416470704720555)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 6926907012367290755)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 4354044869524745214)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 16938815156612161417)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 6950840633641785585)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 16768792233987872678)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 12675314328015960221)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 16547454787033470948)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 10102547023578588130)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 17447963411858084871)
WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.
INFO:tensorflow:Connecting to: b'grpc://10.118.7.82:8470'
Epoch 1/10
INFO:tensorflow:Input text [5834393] 
Project Gutenberg’s The Complete Works of Willi
INFO:tensorflow:New input shapes; (re-)compiling: mode=train, [TensorSpec(shape=(128, 100), dtype=tf.int32, name='seed0'), TensorSpec(shape=(128, 100, 1), dtype=tf.float32, name='time_distributed_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for seed
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 3.9678761959075928 secs
INFO:tensorflow:Setting weights on TPU model.
100/100 [==============================] - 27s 268ms/step - loss: 4.5166 - sparse_categorical_accuracy: 0.1821
Epoch 2/10
100/100 [==============================] - 17s 167ms/step - loss: 3.3698 - sparse_categorical_accuracy: 0.1959
Epoch 3/10
100/100 [==============================] - 17s 167ms/step - loss: 2.1526 - sparse_categorical_accuracy: 0.3880
Epoch 4/10
100/100 [==============================] - 17s 168ms/step - loss: 1.5054 - sparse_categorical_accuracy: 0.5477
Epoch 5/10
100/100 [==============================] - 17s 168ms/step - loss: 1.3133 - sparse_categorical_accuracy: 0.5975
Epoch 6/10
100/100 [==============================] - 17s 170ms/step - loss: 1.2355 - sparse_categorical_accuracy: 0.6173
Epoch 7/10
100/100 [==============================] - 17s 167ms/step - loss: 1.1957 - sparse_categorical_accuracy: 0.6273
Epoch 8/10
100/100 [==============================] - 17s 169ms/step - loss: 1.1664 - sparse_categorical_accuracy: 0.6355
Epoch 9/10
100/100 [==============================] - 17s 167ms/step - loss: 1.1466 - sparse_categorical_accuracy: 0.6407
Epoch 10/10
100/100 [==============================] - 17s 167ms/step - loss: 1.1344 - sparse_categorical_accuracy: 0.6437
INFO:tensorflow:Copying TPU weights to the CPU

Make predictions with the model

Use the trained model to make predictions and generate your own Shakespeare-esque play. Start the model off with a seed sentence, then generate 250 characters from it. Because the prediction model uses a batch size of five, we generate five independent continuations from the same seed.


In [0]:
BATCH_SIZE = 5
PREDICT_LEN = 250

# Keras requires that the batch size be specified ahead of time for stateful models.
# We use a sequence length of 1, as we will be feeding in one character at a 
# time and predicting the next character.
prediction_model = lstm_model(seq_len=1, batch_size=BATCH_SIZE, stateful=True)
prediction_model.load_weights('/tmp/bard.h5')

# We seed the model with our initial string, copied BATCH_SIZE times

seed_txt = 'Looks it not like the king?  Verily, we must go! '
seed = transform(seed_txt)
seed = np.repeat(np.expand_dims(seed, 0), BATCH_SIZE, axis=0)

# First, run the seed forward to prime the state of the model.
prediction_model.reset_states()
for i in range(len(seed_txt) - 1):
  prediction_model.predict(seed[:, i:i + 1])

# Now we can accumulate predictions!
predictions = [seed[:, -1:]]
for i in range(PREDICT_LEN):
  last_word = predictions[-1]
  next_probits = prediction_model.predict(last_word)[:, 0, :]
  
  # sample from our output distribution
  next_idx = [
      np.random.choice(256, p=next_probits[i])
      for i in range(BATCH_SIZE)
  ]
  predictions.append(np.asarray(next_idx, dtype=np.int32))
  

for i in range(BATCH_SIZE):
  print('PREDICTION %d\n\n' % i)
  p = [predictions[j][i] for j in range(PREDICT_LEN)]
  generated = ''.join([chr(c) for c in p])
  print(generated)
  print()
  assert len(generated) == PREDICT_LEN, 'Generated text too short'


PREDICTION 0


 vouchsafe to.

[Enter Iailors dastard.]
And yet these doublest masters young Snowing in a tongue bend interchanged factions of more do,
Sett, appartives.

THERSITES.
Nor tis what a separate main, were they.

POLONIUS.
O, I could touch them

PREDICTION 1


 Pay thee in them lo waste.
Your head no hire and ground of all Truan:
For in the glorious cou,
Blunk, thence and laud, repentance, wrapping;
For what you were there.

ROMEO.
A very lamentable if with a drunk.

MONTAGUE.
Yes, madam.

PALA

PREDICTION 2


 If sure you should say, although
and means: the same, sir! It shall as the Lies; I am out to change.
Unclaimer that hath plead'd drink,       220
Wherein their sacred life to bed.
Why wouldst thou take my lady? O, go indeed
That piting at the r

PREDICTION 3


 stands death]
So lady, we are unsaying lambs, and oftendance
Which, then; wheres Edic died all that?

WOMAN.
Then I will be seen.

DIOMEDES.
Tis the last,
And so within a mosey of the mines,
Look
Farry this to then, voice he living propos

PREDICTION 4


 I cannot way.
You do not spent till to Thebs. Not as now
We were hundred; and I feareth much well condemn'd to bear
Though her mothers heads in this image slow:
It was a soul that hath a beggar;
For this absence doth sack again, thunder-boist's