Shakespeare in 5 minutes with Cloud TPUs

This notebook demonstrates using Cloud TPUs to build a language model: a model that predicts the next character of text given the text so far. Once our model has been trained, we can sample from it to generate new text that looks like the text it was trained on. In this case we're going to train our network on the combined works of Shakespeare, creating a play-generating robot.

Note: You will need a GCP account and a GCS bucket for this notebook to run!

Our network outputs something Shakespeare-esque:


Loves that led me no dumbs lack her Berjoy's face with her to-day. The spirits roar'd; which shames which within his powers Which tied up remedies lending with occasion, A loud and Lancaster, stabb'd in me Upon my sword for ever: 'Agripo'er, his days let me free. Stop it of that word, be so: at Lear, When I did profess the hour-stranger for my life, When I did sink to be cried how for aught; Some beds which seeks chaste senses prove burning; But he perforces seen in her eyes so fast; And _


Let's get started on generating our own Shakespeare! We'll begin with our data generator. The training data for our model will be snippets from our text file: the target snippet is the source offset by one character, as the sketch below illustrates.
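
To make the source/target offset concrete, here is a tiny plain-Python sketch (the string is a made-up example, not from the dataset):

text = "To be, or not to be"
source = text[:-1]  # "To be, or not to b"
target = text[1:]   # "o be, or not to be"
# At each position i, target[i] is the character that follows source[i].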


In [0]:
# If authentication fails with stale credentials, uncomment and run:
# !rm /content/adc.json

In [0]:
import json
import os
import pprint
import re
import time
import tensorflow as tf


use_tpu = True #@param {type:"boolean"}
bucket = '' #@param {type:"string"}

assert bucket, 'Must specify an existing GCS bucket name'
print('Using bucket: {}'.format(bucket))

if use_tpu:
    assert 'COLAB_TPU_ADDR' in os.environ, 'Missing TPU; did you request a TPU in Notebook Settings?'

MODEL_DIR = 'gs://{}/{}'.format(bucket, time.strftime('tpuestimator-lstm/%Y-%m-%d-%H-%M-%S'))
print('Using model dir: {}'.format(MODEL_DIR))

from google.colab import auth
auth.authenticate_user()

if 'COLAB_TPU_ADDR' in os.environ:
  TF_MASTER = 'grpc://{}'.format(os.environ['COLAB_TPU_ADDR'])
  
  # Upload credentials to TPU.
  with tf.Session(TF_MASTER) as sess:    
    with open('/content/adc.json', 'r') as f:
      auth_info = json.load(f)
    tf.contrib.cloud.configure_gcs(sess, credentials=auth_info)
  # Now credentials are set for all future sessions on this TPU.
else:
  TF_MASTER = ''

with tf.Session(TF_MASTER) as session:
  pprint.pprint(session.list_devices())


Using bucket: tpu-estimator-shakespeare-test-bucket
Using model dir: gs://tpu-estimator-shakespeare-test-bucket/tpuestimator-lstm/2018-09-28-23-58-37
[_DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 15880407734472941098),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 7578514533265224491),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 3512042959205926245),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 6509007211901600635),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 2788113998249947095),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 18075511148356623033),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 6450852309571070103),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 14749604383048689166),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 5384492138625106038),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 17430458968359885752),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 12326938441744536100),
 _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 2858742845697236993)]

Training Data

We can use a tf.data pipeline to feed input data to our Estimator. In this case, we want our model to predict the next character, so we will feed sequences from our dataset where the target is the source shifted forward by one character.

Note that we use tf.contrib.data.enumerate_dataset() and tf.contrib.stateless.stateless_random_uniform to generate deterministic uniform samples. This, combined with setting RunConfig.tf_random_seed, guarantees that every run of the model will have exactly the same behavior.
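
As a quick, standalone illustration of why this is deterministic (not part of the pipeline below): stateless_random_uniform is a pure function of its seed pair, so the same seeds always yield the same sample.

import tensorflow as tf

with tf.Session() as sess:
  a = tf.contrib.stateless.stateless_random_uniform([1], seed=[42, 7])
  b = tf.contrib.stateless.stateless_random_uniform([1], seed=[42, 7])
  # Both tensors evaluate to the identical value: the sample depends only
  # on the seed pair, not on any graph-level random state.
  print(sess.run([a, b]))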


In [0]:
import numpy as np

!wget --show-progress --continue -O /content/shakespeare.txt http://www.gutenberg.org/files/100/100-0.txt

SHAKESPEARE_TXT = '/content/shakespeare.txt'
RANDOM_SEED = 42  # An arbitrary choice.

def transform(txt):
  # Encode each character as its integer code point; input_fn below
  # pre-filters the text to ASCII (ord < 128).
  return np.asarray([ord(c) for c in txt], dtype=np.int32)

def input_fn(params):
  """Return a dataset of source and target sequences for training."""
  batch_size = params['batch_size']
  print('Batch size: {}'.format(batch_size))
  seq_len = params['seq_len']
  with tf.gfile.GFile(params['source_file'], 'r') as f:
    txt = f.read()
    txt = ''.join([x for x in txt if ord(x) < 128])
    
  tf.logging.info('Sample text: %s', txt[10000:10100])
  source = tf.constant(transform(txt), dtype=tf.int32)
  ds = tf.data.Dataset.from_tensors(source)
  ds = ds.repeat()
  ds = ds.apply(tf.contrib.data.enumerate_dataset())

  def _select_seq(offset, src):
    idx = tf.contrib.stateless.stateless_random_uniform(
        [1], seed=[RANDOM_SEED, offset], dtype=tf.float32)[0]

    max_start_offset = len(txt) - seq_len
    idx = tf.cast(idx * max_start_offset, tf.int32)
    print(idx)
    
    return {
        'source': tf.reshape(src[idx:idx + seq_len], [seq_len]),
        'target': tf.reshape(src[idx + 1:idx + seq_len + 1], [seq_len])
    }

  ds = ds.map(_select_seq)
  ds = ds.batch(batch_size, drop_remainder=True)
  ds = ds.prefetch(2)
  return ds

tf.reset_default_graph()
tf.set_random_seed(0)
with tf.Session() as session:
  ds = input_fn({'batch_size': 1, 'seq_len': 10, 'source_file': SHAKESPEARE_TXT})
  features = session.run(ds.make_one_shot_iterator().get_next())
  print(features['source'])
  print(features['target'])


Redirecting output to ‘wget-log.1’.
Batch size: 1
INFO:tensorflow:Sample text: ureless, and rude, barrenly perish:
Look whom she best endowed, she gave thee more;
Which bounteou
Tensor("Cast:0", shape=(), dtype=int32)
[[111 109 105 115 101 100  39 115 116  32]]
[[109 105 115 101 100  39 115 116  32 116]]

Building our model

Now that we have some data, we can define our model. We use a simple two-layer, forward LSTM for this notebook.

The only change to our model versus a CPU/GPU model is that we specify a static shape for the input of our model. This allows TF to infer the shapes throughout the model and satisfy the XLA compiler's static-shape requirement.
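
For illustration only (hypothetical placeholders, not code from this notebook; here the static shapes actually come from TPUEstimator's fixed batch sizes and params['seq_len']):

# A CPU/GPU model can defer the batch dimension to run time...
seq_dynamic = tf.placeholder(tf.int32, shape=[None, 100])

# ...but for XLA every dimension must be known when the graph is built,
# so it can compile a fixed-shape program for the TPU.
seq_static = tf.placeholder(tf.int32, shape=[32, 100])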


In [0]:
EMBEDDING_DIM = 1024

# Construct a 2-layer LSTM
def _lstm(inputs, batch_size, initial_state=None):
  def _make_cell(layer_idx):
    with tf.variable_scope('lstm/%d' % layer_idx,):
      return tf.nn.rnn_cell.LSTMCell(
          num_units=EMBEDDING_DIM,
          state_is_tuple=True,
          reuse=tf.AUTO_REUSE,
      )

  cell = tf.nn.rnn_cell.MultiRNNCell([
      _make_cell(0), 
      _make_cell(1),
  ])
  if initial_state is None:
    initial_state = cell.zero_state(batch_size, tf.float32)

  outputs, final_state = tf.contrib.recurrent.functional_rnn(
      cell, inputs, initial_state=initial_state, use_tpu=use_tpu)
  return outputs, final_state


def lstm_model(seq, initial_state=None):
  with tf.variable_scope('lstm', 
                         initializer=tf.orthogonal_initializer,
                         reuse=tf.AUTO_REUSE):
    batch_size = seq.shape[0]
    seq_len = seq.shape[1]

    embedding_params = tf.get_variable(
        'char_embedding', 
        initializer=tf.orthogonal_initializer(seed=0),
        shape=(256, EMBEDDING_DIM), dtype=tf.float32)

    embedding = tf.nn.embedding_lookup(embedding_params, seq)

    lstm_output, lstm_state = _lstm(
        embedding, batch_size, initial_state=initial_state)

    # Apply a single dense layer to the output of our LSTM to predict
    # our final characters.  This looks awkward as we have to flatten
    # our input to 2 dimensions before applying the dense layer.
    flattened = tf.reshape(lstm_output, [-1, EMBEDDING_DIM])
    logits = tf.layers.dense(flattened, 256, name='logits',)
    logits = tf.reshape(logits, [-1, seq_len, 256])
    return logits, lstm_state

Training our model

Since we're using TPUEstimator, we need to provide what's called a model function to train our model. This specifies how to train, evaluate and run inference (predictions) on our model.

Let's cover each part in turn. We'll first look at the training step.

  • We feed our source tensor to our LSTM model
  • Compute the cross-entropy loss to train it to better predict the target tensor (see the loss sanity check below).
  • Use the AdamOptimizer to optimize our network.
  • Wrap it with the CrossShardOptimizer, which lets us use multiple TPU cores to train.

Finally we return a TPUEstimatorSpec indicating how TPUEstimator should train our model.
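
It's worth knowing what loss to expect at the start of training: with 256 possible characters and uninformative logits, cross-entropy is ln(256) ≈ 5.55, so the loss should begin near that value and fall. A standalone sketch (not part of the notebook's graph):

import numpy as np
import tensorflow as tf

labels = tf.constant([[104]], dtype=tf.int32)  # one target character ('h')
logits = tf.zeros([1, 1, 256])                 # uniform, uninformative logits
loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=logits))
with tf.Session() as sess:
  print(sess.run(loss), np.log(256))  # both are ~5.545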


In [0]:
def train_fn(source, target):
  logits, lstm_state = lstm_model(source)
  batch_size = source.shape[0]
  
  loss = tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(
          labels=target, logits=logits))

  optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
  if TF_MASTER:
    optimizer = tf.contrib.tpu.CrossShardOptimizer(optimizer)
  train_op = optimizer.minimize(loss, tf.train.get_global_step())
  return tf.contrib.tpu.TPUEstimatorSpec(
      mode=tf.estimator.ModeKeys.TRAIN,
      loss=loss,
      train_op=train_op,
  )

Evaluating our model

Next, evaluation. This is simpler: we run our model forward and check how well it predicts the next character. Again, we return a TPUEstimatorSpec to tell TPUEstimator how to evaluate the model.


In [0]:
def eval_fn(source, target):
  logits, _ = lstm_model(source)
  # Compute the same cross-entropy loss as in training; TPUEstimatorSpec
  # for EVAL mode requires a loss tensor.
  loss = tf.reduce_mean(
      tf.nn.sparse_softmax_cross_entropy_with_logits(
          labels=target, logits=logits))
  def metric_fn(labels, logits):
    labels = tf.cast(labels, tf.int64)
    return {
        'recall@1': tf.metrics.recall_at_k(labels, logits, 1),
        'recall@5': tf.metrics.recall_at_k(labels, logits, 5)
    }

  eval_metrics = (metric_fn, [target, logits])
  return tf.contrib.tpu.TPUEstimatorSpec(
      mode=tf.estimator.ModeKeys.EVAL, 
      loss=loss, 
      eval_metrics=eval_metrics)
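
This notebook never invokes the evaluation pass, but doing so would look roughly like this (hypothetical usage of the _make_estimator helper defined further below; the step count is arbitrary):

eval_estimator = _make_estimator(num_shards=8, use_tpu=use_tpu)
metrics = eval_estimator.evaluate(input_fn=input_fn, steps=10)
print(metrics)  # includes 'loss', 'recall@1', and 'recall@5'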

Computing Predictions

We leave the most complicated part for last. There's nothing TPU-specific here! For predictions we use the input tensor as a seed for our model. We then use a TensorFlow loop to sample characters from our model and return the result.
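
The sampling primitive used below is tf.multinomial, which draws indices in proportion to exp(logits); a standalone sketch:

import tensorflow as tf

logits = tf.log([[0.1, 0.6, 0.3]])  # log-probabilities for a 3-way choice
samples = tf.multinomial(logits, num_samples=5, output_dtype=tf.int32)
with tf.Session() as sess:
  # Index 1 is drawn most often (~60% of the time); 0 and 2 less often.
  print(sess.run(samples))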


In [0]:
def predict_fn(source):
  # Seed the model with our initial array
  batch_size = source.shape[0]
  logits, lstm_state = lstm_model(source)

  def _body(i, state, preds):
    """Body of our prediction loop: predict the next character."""
    cur_preds = preds.read(i)
    next_logits, next_state = lstm_model(
        tf.cast(tf.expand_dims(cur_preds, -1), tf.int32), state)

    # pull out the last (and only) prediction.
    next_logits = next_logits[:, -1]
    next_pred = tf.multinomial(
        next_logits, num_samples=1, output_dtype=tf.int32)[:, 0]
    preds = preds.write(i + 1, next_pred)
    return (i + 1, next_state, preds)

  def _cond(i, state, preds):
    del state
    del preds

    # Loop until `predict_len - 1`: preds[0] holds the first sampled
    # character and we write to `i + 1` on each iteration.
    return tf.less(i, predict_len - 1)

  next_pred = tf.multinomial(
      logits[:, -1], num_samples=1, output_dtype=tf.int32)[:, 0]

  i = tf.constant(0, dtype=tf.int32)

  predict_len = 500

  # Accumulate predictions as [predict_len, batch_size] to simplify
  # indexing/updates.
  pred_var = tf.TensorArray(
      dtype=tf.int32,
      size=predict_len,
      dynamic_size=False,
      clear_after_read=False,
      element_shape=(batch_size,),
      name='prediction_accumulator',
  )

  pred_var = pred_var.write(0, next_pred)
  _, _, final_predictions = tf.while_loop(_cond, _body,
                                          [i, lstm_state, pred_var])

  # Stack and transpose back to [batch_size, predict_len].
  final_predictions = final_predictions.stack()
  final_predictions = tf.transpose(final_predictions, [1, 0])
  final_predictions = tf.reshape(final_predictions, (batch_size, predict_len))

  return tf.contrib.tpu.TPUEstimatorSpec(
      mode=tf.estimator.ModeKeys.PREDICT, 
      predictions={'predictions': final_predictions})

Building our model function

We can now use our helper functions to build our combined model function and train our model!


In [0]:
def model_fn(features, labels, mode, params):
  if mode == tf.estimator.ModeKeys.TRAIN:
    return train_fn(features['source'], features['target'])
  if mode == tf.estimator.ModeKeys.EVAL:
    return eval_fn(features['source'], features['target'])
  if mode == tf.estimator.ModeKeys.PREDICT:
    return predict_fn(features['source'])

Running our model

We now need a bit of boilerplate to specify our TPU worker, and then we can train our model!


In [0]:
def _make_estimator(num_shards, use_tpu=True):
  config = tf.contrib.tpu.RunConfig(
      tf_random_seed=RANDOM_SEED,
      master=TF_MASTER,
      model_dir=MODEL_DIR,
      save_checkpoints_steps=5000,
      tpu_config=tf.contrib.tpu.TPUConfig(
          num_shards=num_shards, iterations_per_loop=100))

  estimator = tf.contrib.tpu.TPUEstimator(
      use_tpu=use_tpu,
      model_fn=model_fn, config=config,
      train_batch_size=1024,
      eval_batch_size=1024,
      predict_batch_size=128,
      params={'seq_len': 100, 'source_file': SHAKESPEARE_TXT},
  )
  return estimator


# Use all 8 cores for training
estimator = _make_estimator(num_shards=8, use_tpu=use_tpu)
estimator.train(
    input_fn=input_fn,
    max_steps=2000,
)


INFO:tensorflow:Using config: {'_model_dir': 'gs://tpu-estimator-shakespeare-test-bucket/tpuestimator-lstm/2018-09-28-23-58-37', '_tf_random_seed': 42, '_save_summary_steps': 100, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f2d308f1ef0>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': 'grpc://10.76.7.218:8470', '_evaluation_master': 'grpc://10.76.7.218:8470', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=8, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
INFO:tensorflow:_TPUContext: eval_on_tpu True
INFO:tensorflow:Querying Tensorflow master (grpc://10.76.7.218:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:CPU:0, CPU, -1, 15880407734472941098)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 7578514533265224491)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:XLA_GPU:0, XLA_GPU, 17179869184, 3512042959205926245)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 6509007211901600635)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 2788113998249947095)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 18075511148356623033)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 6450852309571070103)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 14749604383048689166)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 5384492138625106038)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 17430458968359885752)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 12326938441744536100)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:tpu_worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 2858742845697236993)
INFO:tensorflow:Calling model_fn.
Batch size: 1024
INFO:tensorflow:Sample text: ureless, and rude, barrenly perish:
Look whom she best endowed, she gave thee more;
Which bounteou
Tensor("Cast:0", shape=(), dtype=int32, device=/job:tpu_worker/task:0/device:CPU:0)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gradients_impl.py:108: UserWarning: Converting sparse IndexedSlices to a dense Tensor of unknown shape. This may consume a large amount of memory.
  "Converting sparse IndexedSlices to a dense Tensor of unknown shape. "
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:TPU job name tpu_worker
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from gs://tpu-estimator-shakespeare-test-bucket/tpuestimator-lstm/2018-09-28-23-58-37/model.ckpt-500
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 500 into gs://tpu-estimator-shakespeare-test-bucket/tpuestimator-lstm/2018-09-28-23-58-37/model.ckpt.
INFO:tensorflow:Installing graceful shutdown hook.
INFO:tensorflow:Creating heartbeat manager for ['/job:tpu_worker/replica:0/task:0/device:CPU:0', '/job:tpu_worker/replica:0/task:0/device:XLA_CPU:0']
WARNING:tensorflow:Worker heartbeats not supported by all workers.  No failure handling will be enabled.
INFO:tensorflow:Init TPU system
INFO:tensorflow:Starting infeed thread controller.
INFO:tensorflow:Starting outfeed thread controller.
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.2207099, step = 600
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.2004857, step = 700 (16.893 sec)
INFO:tensorflow:global_step/sec: 5.91939
INFO:tensorflow:examples/sec: 6061.46
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.1948798, step = 800 (16.768 sec)
INFO:tensorflow:global_step/sec: 5.96381
INFO:tensorflow:examples/sec: 6106.95
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.1526046, step = 900 (19.553 sec)
INFO:tensorflow:global_step/sec: 5.11454
INFO:tensorflow:examples/sec: 5237.29
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.1549901, step = 1000 (16.756 sec)
INFO:tensorflow:global_step/sec: 5.96787
INFO:tensorflow:examples/sec: 6111.1
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.1079394, step = 1100 (16.774 sec)
INFO:tensorflow:global_step/sec: 5.96144
INFO:tensorflow:examples/sec: 6104.52
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.123244, step = 1200 (16.742 sec)
INFO:tensorflow:global_step/sec: 5.97283
INFO:tensorflow:examples/sec: 6116.18
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.0875183, step = 1300 (16.782 sec)
INFO:tensorflow:global_step/sec: 5.95897
INFO:tensorflow:examples/sec: 6101.98
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.0983433, step = 1400 (16.738 sec)
INFO:tensorflow:global_step/sec: 5.97466
INFO:tensorflow:examples/sec: 6118.05
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.0734489, step = 1500 (16.777 sec)
INFO:tensorflow:global_step/sec: 5.96048
INFO:tensorflow:examples/sec: 6103.53
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.049539, step = 1600 (16.748 sec)
INFO:tensorflow:global_step/sec: 5.97095
INFO:tensorflow:examples/sec: 6114.25
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.026352, step = 1700 (19.171 sec)
INFO:tensorflow:global_step/sec: 5.21613
INFO:tensorflow:examples/sec: 5341.32
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.0030547, step = 1800 (16.763 sec)
INFO:tensorflow:global_step/sec: 5.96552
INFO:tensorflow:examples/sec: 6108.69
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 1.0061597, step = 1900 (16.776 sec)
INFO:tensorflow:global_step/sec: 5.96088
INFO:tensorflow:examples/sec: 6103.94
INFO:tensorflow:Enqueue next (100) batch(es) of data to infeed.
INFO:tensorflow:Dequeue next (100) batch(es) of data from outfeed.
INFO:tensorflow:loss = 0.9461493, step = 2000 (16.768 sec)
INFO:tensorflow:global_step/sec: 5.96372
INFO:tensorflow:examples/sec: 6106.85
INFO:tensorflow:Saving checkpoints for 2000 into gs://tpu-estimator-shakespeare-test-bucket/tpuestimator-lstm/2018-09-28-23-58-37/model.ckpt.
INFO:tensorflow:Stop infeed thread controller
INFO:tensorflow:Shutting down InfeedController thread.
INFO:tensorflow:InfeedController received shutdown signal, stopping.
INFO:tensorflow:Infeed thread finished, shutting down.
INFO:tensorflow:infeed marked as finished
INFO:tensorflow:Stop output thread controller
INFO:tensorflow:Shutting down OutfeedController thread.
INFO:tensorflow:OutfeedController received shutdown signal, stopping.
INFO:tensorflow:Outfeed thread finished, shutting down.
INFO:tensorflow:outfeed marked as finished
INFO:tensorflow:Shutdown TPU system.
INFO:tensorflow:Loss for final step: 0.9461493.
INFO:tensorflow:training_loop marked as finished
Out[0]:
<tensorflow.contrib.tpu.python.tpu.tpu_estimator.TPUEstimator at 0x7f2d307910f0>

Running predictions with our model

We've trained our model; now we can run predictions through it to generate "Shakespeare"! We provide a seed sentence to get our model started, and then sample 500 characters from it.


In [0]:
def _seed_input_fn(params):
  del params
  seed_txt = 'Looks it not like the king?'
  seed = transform(seed_txt)
  seed = tf.constant(seed.reshape([1, -1]), dtype=tf.int32)
  # Predict must return a Dataset, not a Tensor.
  return tf.data.Dataset.from_tensors({'source': seed})

# Use 1 core (and CPU inference, via use_tpu=False) since we're only
# generating a single-element batch.
estimator = _make_estimator(num_shards=1, use_tpu=False)

idx = next(estimator.predict(input_fn=_seed_input_fn))['predictions']
print(''.join([chr(i) for i in idx]))


INFO:tensorflow:Using config: {'_model_dir': 'gs://tpu-estimator-shakespeare-test-bucket/tpuestimator-lstm/2018-09-28-23-58-37', '_tf_random_seed': 42, '_save_summary_steps': 100, '_save_checkpoints_steps': 5000, '_save_checkpoints_secs': None, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': None, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f2d30618f60>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': 'grpc://10.76.7.218:8470', '_evaluation_master': 'grpc://10.76.7.218:8470', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1, '_tpu_config': TPUConfig(iterations_per_loop=100, num_shards=1, num_cores_per_replica=None, per_host_input_for_training=2, tpu_job_name=None, initial_infeed_sleep_secs=None, input_partition_dims=None), '_cluster': None}
WARNING:tensorflow:Setting TPUConfig.num_shards==1 is an unsupported behavior. Please fix as soon as possible (leaving num_shards as None.
INFO:tensorflow:_TPUContext: eval_on_tpu True
WARNING:tensorflow:eval_on_tpu ignored because use_tpu is False.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Running infer on CPU
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from gs://tpu-estimator-shakespeare-test-bucket/tpuestimator-lstm/2018-09-28-23-58-37/model.ckpt-2000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:prediction_loop marked as finished
One forward I am.
  DEMETRIUS. Sir, O, be sin to me and more defence,
    Made wars shall pinch this large request again
    Whiles my young mistress Mercury.
  ROSALIND. 'Tis a coward that carries them.
    By the garter, he shall roast thy book, must pleasure, kneel'd;
    you'll be dient,
    with my son Lucius.
  CORIOLANUS. Hear me; if thou read be won order than too.
    The long thou seest is lost; but out thy taints
    Look forth thee well, if more, a perisher, go.
    Chirti

In [0]: