In [0]:
# Copyright 2018 The TensorFlow Hub Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
In [0]:
This example uses tf.keras to build a language model and train it on a Cloud TPU. This language model predicts the next character of text given the text so far. The trained model can generate new snippets of text that read in a similar style to the text training data.
The model trains for 10 epochs and completes in approximately 5 minutes.
This notebook is hosted on GitHub. To view it in its original repository, after opening the notebook, select File > View on GitHub.
In this Colab, you will learn how to:
- Build a two-layer, forward-LSTM language model.
- Use a distribution strategy to produce a tf.keras model that runs on a TPU, and then train it with the standard Keras methods fit, predict, and evaluate.
- Use the trained model to generate new text in the style of the training data.
Because TPUs are located in Google Cloud, for optimal performance they read data directly from Google Cloud Storage (GCS).
In this example, you train the model on the combined works of William Shakespeare, then use the model to compose a play in the style of The Great Bard:
Loves that led me no dumbs lack her Berjoy's face with her to-day. The spirits roar'd; which shames which within his powers Which tied up remedies lending with occasion, A loud and Lancaster, stabb'd in me Upon my sword for ever: 'Agripo'er, his days let me free. Stop it of that word, be so: at Lear, When I did profess the hour-stranger for my life, When I did sink to be cried how for aught; Some beds which seeks chaste senses prove burning; But he perforces seen in her eyes so fast; And _
Download The Complete Works of William Shakespeare as a single text file from Project Gutenberg. You use snippets from this file as the training data for the model. The target snippet is offset by one character.
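For example (a toy illustration, not part of the notebook's code), if a snippet were the string 'Shakespeare', the input would drop the last character and the target would drop the first:
snippet = 'Shakespeare'
input_text = snippet[:-1]   # 'Shakespear'
target_text = snippet[1:]   # 'hakespeare'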
In [0]:
!wget --show-progress --continue -O /content/shakespeare.txt http://www.gutenberg.org/files/100/100-0.txt
We just downloaded some text. The following shows the start of the text and a random snippet so we can get a feel for the whole text.
In [0]:
!head -n5 /content/shakespeare.txt
!echo "..."
!shuf -n5 /content/shakespeare.txt
In [0]:
import numpy as np
import tensorflow as tf
import os
import distutils.version
if distutils.version.LooseVersion(tf.__version__) < '1.14':
  raise Exception('This notebook is compatible with TensorFlow 1.14 or higher; for TensorFlow 1.13 or lower, please use the previous version at https://github.com/tensorflow/tpu/blob/r1.13/tools/colab/shakespeare_with_tpu_and_keras.ipynb')
# This address identifies the TPU we'll use when configuring TensorFlow.
TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
SHAKESPEARE_TXT = '/content/shakespeare.txt'
def transform(txt):
  return np.asarray([ord(c) for c in txt if ord(c) < 255], dtype=np.int32)

def input_fn(seq_len=100, batch_size=1024):
  """Return a dataset of source and target sequences for training."""
  with tf.io.gfile.GFile(SHAKESPEARE_TXT, 'r') as f:
    txt = f.read()

  source = tf.constant(transform(txt), dtype=tf.int32)

  ds = tf.data.Dataset.from_tensor_slices(source).batch(seq_len + 1, drop_remainder=True)

  def split_input_target(chunk):
    input_text = chunk[:-1]
    target_text = chunk[1:]
    return input_text, target_text

  BUFFER_SIZE = 10000
  ds = ds.map(split_input_target).shuffle(BUFFER_SIZE).batch(batch_size, drop_remainder=True)

  return ds.repeat()
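If you want to sanity-check the pipeline, a minimal sketch along the following lines (an illustration only, assuming the Shakespeare file has already been downloaded, and using a throwaway tf.compat.v1 session because this notebook targets TensorFlow 1.x graph mode) pulls one small batch and confirms that the targets are the inputs shifted by one character:
check_ds = input_fn(seq_len=100, batch_size=4)
# One-shot iterator plus a session, since we are in TF 1.x graph mode.
next_inputs, next_targets = tf.compat.v1.data.make_one_shot_iterator(check_ds).get_next()
with tf.compat.v1.Session() as sess:
  x, y = sess.run([next_inputs, next_targets])
print(x.shape, y.shape)               # (4, 100) (4, 100)
print((x[:, 1:] == y[:, :-1]).all())  # True: targets are the inputs shifted by one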
The model is defined as a two-layer, forward-LSTM; the same model definition works on both CPU and TPU.
Because our vocabulary size is 256, the input dimension to the Embedding layer is 256.
When specifying the arguments to the LSTM, it is important to note how the stateful argument is used. When training, we set stateful=False because we do want to reset the state of the model between batches. When sampling (computing predictions) from a trained model, however, we set stateful=True so that the model retains information across the current batch and generates more interesting text.
In [0]:
EMBEDDING_DIM = 512

def lstm_model(seq_len=100, batch_size=None, stateful=True):
  """Language model: predict the next character given the current character."""
  source = tf.keras.Input(
      name='seed', shape=(seq_len,), batch_size=batch_size, dtype=tf.int32)

  embedding = tf.keras.layers.Embedding(input_dim=256, output_dim=EMBEDDING_DIM)(source)
  lstm_1 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(embedding)
  lstm_2 = tf.keras.layers.LSTM(EMBEDDING_DIM, stateful=stateful, return_sequences=True)(lstm_1)
  predicted_char = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(256, activation='softmax'))(lstm_2)
  return tf.keras.Model(inputs=[source], outputs=[predicted_char])
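As a quick, purely illustrative check (outside the TPU setup), you can build the CPU variant of this model and print its layer shapes and parameter counts; demo_model is just a throwaway name used here:
demo_model = lstm_model(seq_len=100, stateful=False)
demo_model.summary()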
First, we need to create a distribution strategy that can use the TPU. In this case the strategy is TPUStrategy. You create and compile the model inside its scope. Once that is done, future calls to the standard Keras methods fit, evaluate, and predict use the TPU.
Again note that we train with stateful=False, because while training we only care about one batch at a time.
In [0]:
tf.keras.backend.clear_session()

resolver = tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)
tf.contrib.distribute.initialize_tpu_system(resolver)
strategy = tf.contrib.distribute.TPUStrategy(resolver)

with strategy.scope():
  training_model = lstm_model(seq_len=100, stateful=False)
  training_model.compile(
      optimizer=tf.keras.optimizers.RMSprop(learning_rate=0.01),
      loss='sparse_categorical_crossentropy',
      metrics=['sparse_categorical_accuracy'])

training_model.fit(
    input_fn(),
    steps_per_epoch=100,
    epochs=10
)
training_model.save_weights('/tmp/bard.h5', overwrite=True)
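The compiled model also supports the other standard Keras methods from this same setup; for example, a short, illustrative evaluation pass over a few batches of the training pipeline might look like the following. Note that this reuses input_fn(), so it measures fit to the training data rather than generalization:
# Illustrative only: evaluate the trained model on 10 batches from the training pipeline.
loss, accuracy = training_model.evaluate(input_fn(), steps=10)
print('loss: %.3f, accuracy: %.3f' % (loss, accuracy))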
Use the trained model to make predictions and generate your own Shakespeare-esque play. Start the model off with a seed sentence, then generate 250 characters from it. The model makes five predictions from the initial seed.
The predictions are done on the CPU so the batch size (5) in this case does not have to be divisible by 8.
Note that when we are doing predictions or, to be more precise, text generation, we set stateful=True
so that the model's state is kept between batches. If stateful is false, the model state is reset between each batch, and the model will only be able to use the information from the current batch (a single character) to make a prediction.
The output of the model is a set of probabilities for the next character (given the input so far). To build a paragraph, we predict one character at a time and sample a character (based on the probabilities provided by the model). For example, if the input character is "o" and the output probabilities are "p" (0.65), "t" (0.30), and other characters (0.05), then sampling rather than always picking the most likely character allows our model to generate text other than just "Ophelia" and "Othello."
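As a toy illustration of that sampling step (not part of the generation loop below), np.random.choice draws the next character from the model's probability vector instead of always taking the most likely character:
probs = np.array([0.65, 0.30, 0.05])   # hypothetical probabilities for 'p', 't', other
chars = ['p', 't', 'h']
next_char = chars[np.random.choice(len(chars), p=probs)]
print(next_char)   # usually 'p', but sometimes 't' or 'h'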
In [0]:
BATCH_SIZE = 5
PREDICT_LEN = 250

# Keras requires the batch size be specified ahead of time for stateful models.
# We use a sequence length of 1, as we will be feeding in one character at a
# time and predicting the next character.
prediction_model = lstm_model(seq_len=1, batch_size=BATCH_SIZE, stateful=True)
prediction_model.load_weights('/tmp/bard.h5')

# We seed the model with our initial string, copied BATCH_SIZE times
seed_txt = 'Looks it not like the king? Verily, we must go! '
seed = transform(seed_txt)
seed = np.repeat(np.expand_dims(seed, 0), BATCH_SIZE, axis=0)

# First, run the seed forward to prime the state of the model.
prediction_model.reset_states()
for i in range(len(seed_txt) - 1):
  prediction_model.predict(seed[:, i:i + 1])

# Now we can accumulate predictions!
predictions = [seed[:, -1:]]
for i in range(PREDICT_LEN):
  last_word = predictions[-1]
  next_probits = prediction_model.predict(last_word)[:, 0, :]

  # Sample from our output distribution.
  next_idx = [
      np.random.choice(256, p=next_probits[i])
      for i in range(BATCH_SIZE)
  ]
  predictions.append(np.asarray(next_idx, dtype=np.int32))

for i in range(BATCH_SIZE):
  print('PREDICTION %d\n\n' % i)
  p = [predictions[j][i] for j in range(PREDICT_LEN)]
  generated = ''.join([chr(c) for c in p])  # Convert back to text
  print(generated)
  print()
  assert len(generated) == PREDICT_LEN, 'Generated text too short'
On Google Cloud Platform, in addition to the GPUs and TPUs available on pre-configured deep learning VMs, you will find AutoML (beta) for training custom models without writing code, and Cloud ML Engine, which allows you to run parallel training jobs and hyperparameter tuning for your custom models on powerful distributed hardware.