In [0]:
# Copyright 2018 The TensorFlow Hub Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

In [0]:

Predict Shakespeare with Cloud TPUs and Keras

Overview

This example uses tf.keras to build a language model and train it on a Cloud TPU. This language model predicts the next character of text given the text so far. The trained model can generate new snippets of text that read in a similar style to the text training data.

The model trains for 10 epochs and completes in approximately 5 minutes.

This notebook is hosted on GitHub. To view it in its original repository, after opening the notebook, select File > View on GitHub.

Learning objectives

In this Colab, you will learn how to:

  • Build a two-layer, forward-LSTM model.
  • Use a distribution strategy to produce a tf.keras model that runs on a TPU, and then drive it with the standard Keras methods: fit, evaluate, and predict.
  • Use the trained model to make predictions and generate your own Shakespeare-esque play.

Instructions

  Train on TPU  

  1. On the main menu, click Runtime and select Change runtime type. Set "TPU" as the hardware accelerator.
  2. Click Runtime again and select Runtime > Run All. You can also run the cells manually with Shift-ENTER.

TPUs are located in Google Cloud; for optimal performance, they read data directly from Google Cloud Storage (GCS).

Data, model, and training

In this example, you train the model on the combined works of William Shakespeare, then use the model to compose a play in the style of The Great Bard:

Loves that led me no dumbs lack her Berjoy's face with her to-day. The spirits roar'd; which shames which within his powers Which tied up remedies lending with occasion, A loud and Lancaster, stabb'd in me Upon my sword for ever: 'Agripo'er, his days let me free. Stop it of that word, be so: at Lear, When I did profess the hour-stranger for my life, When I did sink to be cried how for aught; Some beds which seeks chaste senses prove burning; But he perforces seen in her eyes so fast; And _

Download data

Download The Complete Works of William Shakespeare as a single text file from Project Gutenberg. You use snippets from this file as the training data for the model. The target snippet is offset by one character.
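
To make the one-character offset concrete, here is a tiny illustration (an assumed example, not part of the original notebook) of how a source snippet pairs with its target:

In [0]:
# Hypothetical illustration of the one-character offset: the model sees
# 'Shakespear' and must predict 'hakespeare', one character at a time.
snippet = 'Shakespeare'
source, target = snippet[:-1], snippet[1:]
print(source)  # Shakespear
print(target)  # hakespeare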


In [0]:
!wget --show-progress --continue -O /content/shakespeare.txt http://www.gutenberg.org/files/100/100-0.txt

Build the input dataset

We just downloaded some text. The following shows the start of the text and a random snippet so we can get a feel for the whole text.


In [0]:
!head -n5 /content/shakespeare.txt
!echo "..."
!shuf -n5 /content/shakespeare.txt

In [0]:
import numpy as np
import tensorflow as tf
import os

from distutils.version import LooseVersion
if LooseVersion(tf.__version__) < LooseVersion('1.14'):
    raise Exception('This notebook is compatible with TensorFlow 1.14 or higher; for TensorFlow 1.13 or lower, please use the previous version at https://github.com/tensorflow/tpu/blob/r1.13/tools/colab/shakespeare_with_tpu_and_keras.ipynb')

# This address identifies the TPU we'll use when configuring TensorFlow.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

SHAKESPEARE_TXT = '/content/shakespeare.txt'

def transform(txt):
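  # Encode each character as its integer code point, dropping the rare
  # characters outside the single-byte range.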
  return np.asarray([ord(c) for c in txt if ord(c) < 255], dtype=np.int32)

def input_fn(seq_len=100, batch_size=1024):
  """Return a dataset of source and target sequences for training."""
  with tf.io.gfile.GFile(SHAKESPEARE_TXT, 'r') as f:
    txt = f.read()

  source = tf.constant(transform(txt), dtype=tf.int32)

  ds = tf.data.Dataset.from_tensor_slices(source).batch(seq_len+1, drop_remainder=True)

  def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

  BUFFER_SIZE = 10000
  ds = ds.map(split_input_target).shuffle(BUFFER_SIZE).batch(batch_size, drop_remainder=True)

  return ds.repeat()
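
If you want to sanity-check the pipeline before training, you can pull a single batch and inspect its shape (a hypothetical check, not in the original notebook; it runs on the CPU with the TF 1.x session API):

In [0]:
# Hypothetical sanity check: each batch is a pair of int32 arrays of shape
# (batch_size, seq_len) = (1024, 100), with the target shifted by one character.
batch = tf.compat.v1.data.make_one_shot_iterator(input_fn()).get_next()
with tf.compat.v1.Session() as sess:
  source_batch, target_batch = sess.run(batch)
print(source_batch.shape, target_batch.shape)  # (1024, 100) (1024, 100)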

Build the model

The model is defined as a two-layer, forward LSTM; the same model should work on both CPU and TPU.

Because our vocabulary size is 256, the input dimension to the Embedding layer is 256.

When specifying the arguments to the LSTM, it is important to note how the stateful argument is used. When training, we set stateful=False because we want to reset the state of the model between batches. When sampling (computing predictions) from a trained model, we set stateful=True so that the model retains information across the current batch and can generate more interesting text.
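
To see what the stateful flag does in isolation, here is a minimal sketch (an assumed illustration, not from the original notebook): with stateful=True, consecutive predict calls carry the state forward until it is explicitly reset.

In [0]:
# Minimal illustration of stateful LSTMs: each predict() call starts from the
# state left behind by the previous call, so feeding the same input twice
# gives different outputs until reset_states() is called.
demo = tf.keras.Sequential([
    tf.keras.layers.LSTM(4, stateful=True, return_sequences=True,
                         batch_input_shape=(1, 1, 4)),
])
x = np.ones((1, 1, 4), dtype=np.float32)
a = demo.predict(x)
b = demo.predict(x)   # differs from `a`: the state carried over
demo.reset_states()
c = demo.predict(x)   # matches `a` again: the state was reset to zeros
print(np.allclose(a, b), np.allclose(a, c))  # False True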


In [0]:
EMBEDDING_DIM = 512

def lstm_model(seq_len=100, batch_size=None, stateful=True):
  """Language model: predict the next word given the current word."""
  source = tf.keras.Input(
      name='seed', shape=(seq_len,), batch_size=batch_size, dtype=tf.int32)

  embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM)(source)
  lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
  lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)
  predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_2)
  return tf.keras.Model(inputs=[source], outputs=[predicted_char])
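
Before moving on, you can instantiate the model locally and inspect its layer shapes (an optional, assumed snippet; it builds the model on the CPU):

In [0]:
# Optional sanity check: print the layer shapes of the training-time model.
lstm_model(seq_len=100, stateful=False).summary()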

Train the model

First, we need to create a distribution strategy that can use the TPU. In this case, it is TPUStrategy. You can create and compile the model inside its scope. Once that is done, future calls to the standard Keras methods fit, evaluate, and predict use the TPU.

Again note that we train with stateful=False because while training, we only care about one batch at a time.
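
For a rough sense of scale (an illustrative calculation from the defaults above, not part of the original notebook), each epoch below processes steps_per_epoch batches of batch_size snippets:

In [0]:
# With the defaults above (steps_per_epoch=100, batch_size=1024, seq_len=100),
# each epoch sees 100 * 1024 = 102,400 snippets, about 10M characters in total.
steps_per_epoch, batch_size, seq_len = 100, 1024, 100
print(steps_per_epoch * batch_size * seq_len)  # 10240000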


In [0]:
tf.keras.backend.clear_session()

resolver = tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)
tf.contrib.distribute.initialize_tpu_system(resolver)
strategy = tf.contrib.distribute.TPUStrategy(resolver)

with strategy.scope():
  training_model = lstm_model(seq_len=100, stateful=False)
  training_model.compile(
      optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
      loss='sparse_categorical_crossentropy',
      metrics=['sparse_categorical_accuracy'])

training_model.fit(
    input_fn(),
    steps_per_epoch=100,
    epochs=10
)
training_model.save_weights('/tmp/bard.h5', overwrite=True)

Make predictions with the model

Use the trained model to make predictions and generate your own Shakespeare-esque play. Start the model off with a seed sentence, then generate 250 characters from it. The model generates five independent predictions (one per batch entry) from the same initial seed.

The predictions are done on the CPU so the batch size (5) in this case does not have to be divisible by 8.

Note that when we are doing predictions (or, more precisely, text generation), we set stateful=True so that the model's state is kept between batches. If stateful=False, the model state is reset between each batch, and the model can only use the information from the current batch (a single character) to make a prediction.

The output of the model is a set of probabilities for the next character (given the input so far). To build a paragraph, we predict one character at a time and sample a character based on the probabilities provided by the model. For example, if the input character is "o" and the output probabilities are "p" (0.65), "t" (0.30), and other characters (0.05), then sampling from this distribution, rather than always picking the most likely character, allows our model to generate text other than just "Ophelia" and "Othello."
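
As a toy version of that sampling step (an assumed example using the made-up probabilities from the paragraph above):

In [0]:
# Toy sampling step: 'p' with probability 0.65, 't' with 0.30, and 'h' as a
# stand-in for all other characters with 0.05.
probs = np.zeros(256)
probs[ord('p')] = 0.65
probs[ord('t')] = 0.30
probs[ord('h')] = 0.05
next_idx = np.random.choice(256, p=probs)
print(chr(next_idx))  # usually 'p', sometimes 't', occasionally 'h'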


In [0]:
BATCH_SIZE = 5
PREDICT_LEN = 250

# Keras requires the batch size be specified ahead of time for stateful models.
# We use a sequence length of 1, as we will be feeding in one character at a 
# time and predicting the next character.
prediction_model = lstm_model(seq_len=1, batch_size=BATCH_SIZE, stateful=True)
prediction_model.load_weights('/tmp/bard.h5')

# We seed the model with our initial string, copied BATCH_SIZE times.

seed_txt = 'Looks it not like the king?  Verily, we must go! '
seed = transform(seed_txt)
seed = np.repeat(np.expand_dims(seed, 0), BATCH_SIZE, axis=0)

# First, run the seed forward to prime the state of the model.
prediction_model.reset_states()
for i in range(len(seed_txt) - 1):
  prediction_model.predict(seed[:, i:i + 1])

# Now we can accumulate predictions!
predictions = [seed[:, -1:]]
for i in range(PREDICT_LEN):
  last_word = predictions[-1]
  next_probits = prediction_model.predict(last_word)[:, 0, :]
  
  # sample from our output distribution
  next_idx = [
      np.random.choice(256, p=next_probits[i])
      for i in range(BATCH_SIZE)
  ]
  predictions.append(np.asarray(next_idx, dtype=np.int32))
  

for i in range(BATCH_SIZE):
  print('PREDICTION %d\n\n' % i)
  p = [predictions[j][i] for j in range(PREDICT_LEN)]
  generated = ''.join([chr(int(c)) for c in p])  # Convert back to text
  print(generated)
  print()
  assert len(generated) == PREDICT_LEN, 'Generated text too short'

What's next

  • Learn about Cloud TPUs, which Google designed and optimized specifically to speed up and scale ML workloads for training and inference, and to enable ML engineers and researchers to iterate more quickly.
  • Explore the range of Cloud TPU tutorials and Colabs to find other examples that can be used when implementing your ML project.

On Google Cloud Platform, in addition to the GPUs and TPUs available on pre-configured deep learning VMs, you will find AutoML (beta) for training custom models without writing code, and Cloud ML Engine, which allows you to run parallel training jobs and hyperparameter tuning for your custom models on powerful distributed hardware.