Character Sequence to Sequence

In this notebook, we'll build a model that takes in a sequence of letters, and outputs a sorted version of that sequence. We'll do that using what we've learned so far about Sequence to Sequence models. This notebook was updated to work with TensorFlow 1.1 and builds on the work of Dave Currie. Check out Dave's post Text Summarization with Amazon Reviews.

Dataset

The dataset lives in the /data/ folder. At the moment, it is made up of the following files:

  • letters_source.txt: The list of input letter sequences. Each sequence is its own line.
  • letters_target.txt: The list of target sequences we'll use in the training process. Each sequence here is a response to the input sequence in letters_source.txt with the same line number.

In [73]:
import numpy as np
import time

import helper

source_path = 'data/letters_source.txt'
target_path = 'data/letters_target.txt'

source_sentences = helper.load_data(source_path)
target_sentences = helper.load_data(target_path)

Let's start by examining the current state of the dataset. source_sentences contains the entire input sequence file as text delimited by newline symbols.


In [74]:
source_sentences[:50].split('\n')


Out[74]:
['bsaqq',
 'npy',
 'lbwuj',
 'bqv',
 'kial',
 'tddam',
 'edxpjpg',
 'nspv',
 'huloz',
 '']

target_sentences contains the entire output sequence file as text delimited by newline symbols. Each line corresponds to the line with the same number in source_sentences, and contains the characters of that line in sorted order.


In [75]:
target_sentences[:50].split('\n')


Out[75]:
['abqqs',
 'npy',
 'bjluw',
 'bqv',
 'aikl',
 'addmt',
 'degjppx',
 'npsv',
 'hlouz',
 '']

Preprocess

To do anything useful with it, we'll need to turn each string into a list of characters:

Then convert the characters to their int values as declared in our vocabulary:


In [76]:
def extract_character_vocab(data):
    special_words = ['<PAD>', '<UNK>', '<GO>',  '<EOS>']

    set_words = set([character for line in data.split('\n') for character in line])
    int_to_vocab = {word_i: word for word_i, word in enumerate(special_words + list(set_words))}
    vocab_to_int = {word: word_i for word_i, word in int_to_vocab.items()}

    return int_to_vocab, vocab_to_int

# Build int2letter and letter2int dicts
source_int_to_letter, source_letter_to_int = extract_character_vocab(source_sentences)
target_int_to_letter, target_letter_to_int = extract_character_vocab(target_sentences)

# Convert characters to ids
source_letter_ids = [[source_letter_to_int.get(letter, source_letter_to_int['<UNK>']) for letter in line] for line in source_sentences.split('\n')]
target_letter_ids = [[target_letter_to_int.get(letter, target_letter_to_int['<UNK>']) for letter in line] + [target_letter_to_int['<EOS>']] for line in target_sentences.split('\n')] 

print("Example source sequence")
print(source_letter_ids[:3])
print("\n")
print("Example target sequence")
print(target_letter_ids[:3])


Example source sequence
[[19, 29, 10, 25, 25], [7, 17, 18], [12, 19, 15, 21, 4]]


Example target sequence
[[29, 19, 25, 25, 5, 3], [8, 17, 18, 3], [19, 4, 12, 21, 15, 3]]

This is the final shape we need them to be in. We can now proceed to building the model.

Model

Check the Version of TensorFlow

This cell checks that you have the correct version of TensorFlow.


In [77]:
from distutils.version import LooseVersion
import tensorflow as tf
from tensorflow.python.layers.core import Dense


# Check TensorFlow Version
assert LooseVersion(tf.__version__) >= LooseVersion('1.1'), 'Please use TensorFlow version 1.1 or newer'
print('TensorFlow Version: {}'.format(tf.__version__))


TensorFlow Version: 1.1.0

Hyperparameters


In [78]:
# Number of Epochs
epochs = 60
# Batch Size
batch_size = 128
# RNN Size
rnn_size = 50
# Number of Layers
num_layers = 2
# Embedding Size
encoding_embedding_size = 15
decoding_embedding_size = 15
# Learning Rate
learning_rate = 0.001

Input


In [79]:
def get_model_inputs():
    input_data = tf.placeholder(tf.int32, [None, None], name='input')
    targets = tf.placeholder(tf.int32, [None, None], name='targets')
    lr = tf.placeholder(tf.float32, name='learning_rate')

    target_sequence_length = tf.placeholder(tf.int32, (None,), name='target_sequence_length')
    max_target_sequence_length = tf.reduce_max(target_sequence_length, name='max_target_len')
    source_sequence_length = tf.placeholder(tf.int32, (None,), name='source_sequence_length')
    
    return input_data, targets, lr, target_sequence_length, max_target_sequence_length, source_sequence_length

Sequence to Sequence Model

We can now start defining the functions that will build the seq2seq model. We are building it from the bottom up with the following components:

2.1 Encoder
    - Embedding
    - Encoder cell
2.2 Decoder
    1- Process decoder inputs
    2- Set up the decoder
        - Embedding
        - Decoder cell
        - Dense output layer
        - Training decoder
        - Inference decoder
2.3 Seq2seq model connecting the encoder and decoder
2.4 Build the training graph hooking up the model with the 
    optimizer

2.1 Encoder

The first bit of the model we'll build is the encoder. Here, we'll embed the input data, construct our encoder, then pass the embedded data to the encoder.


In [89]:
def encoding_layer(input_data, rnn_size, num_layers,
                   source_sequence_length, source_vocab_size, 
                   encoding_embedding_size):


    # Encoder embedding
    enc_embed_input = tf.contrib.layers.embed_sequence(input_data, source_vocab_size, encoding_embedding_size)

    # RNN cell
    def make_cell(rnn_size):
        enc_cell = tf.contrib.rnn.LSTMCell(rnn_size,
                                           initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
        return enc_cell

    enc_cell = tf.contrib.rnn.MultiRNNCell([make_cell(rnn_size) for _ in range(num_layers)])
    
    enc_output, enc_state = tf.nn.dynamic_rnn(enc_cell, enc_embed_input, sequence_length=source_sequence_length, dtype=tf.float32)
    
    return enc_output, enc_state

2.2 Decoder

The decoder is probably the most involved part of this model. The following steps are needed to create it:

1- Process decoder inputs
2- Set up the decoder components
    - Embedding
    - Decoder cell
    - Dense output layer
    - Training decoder
    - Inference decoder


Process Decoder Input

In the training process, the target sequences will be used in two different places:

  1. Using them to calculate the loss
  2. Feeding them to the decoder during training to make the model more robust.

Now we need to address the second point. Let's assume our targets look like this in their letter/word form (we're doing this for readability; at this point in the code, these sequences would be in int form):

We need to do a simple transformation on the tensor before feeding it to the decoder:

1- We will feed one item of the sequence to the decoder at each time step. Think about the last time step -- where the decoder outputs the final word of its output. The input to that step is the next-to-last item of the target sequence, so the decoder has no use for the last item of the target sequence in this scenario. We'll need to remove that last item.

We do that using TensorFlow's tf.strided_slice() method. We hand it the tensor and the indices of where to start and where to end the cut.

2- The first item in each sequence we feed to the decoder has to be the <GO> symbol, so we'll add that to the beginning.

Now the tensor is ready to be fed to the decoder. It looks like this (if we convert from ints to letters/symbols):


In [90]:
# Process the input we'll feed to the decoder
def process_decoder_input(target_data, vocab_to_int, batch_size):
    '''Remove the last word id from each batch and concat the <GO> to the beginning of each batch'''
    ending = tf.strided_slice(target_data, [0, 0], [batch_size, -1], [1, 1])
    dec_input = tf.concat([tf.fill([batch_size, 1], vocab_to_int['<GO>']), ending], 1)

    return dec_input
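
Just to make the transformation concrete, here's a quick sketch on a toy batch. The ids 10-14 are made up for illustration; <PAD>, <GO>, and <EOS> map to 0, 2, and 3 in our vocabulary.

In [ ]:
# Toy demo of process_decoder_input (made-up ids, run in a throwaway graph)
with tf.Graph().as_default(), tf.Session() as sess:
    demo_targets = tf.constant([[10, 11, 12, 3],   # e.g. 'a b c <EOS>'
                                [13, 14, 3, 0]])   # e.g. 'd e <EOS> <PAD>'
    demo_dec_input = process_decoder_input(demo_targets, target_letter_to_int, 2)
    print(sess.run(demo_dec_input))
    # Each row drops its last id and gains the <GO> id (2) at the front:
    # [[ 2 10 11 12]
    #  [ 2 13 14  3]]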

Set up the decoder components

    - Embedding
    - Decoder cell
    - Dense output layer
    - Training decoder
    - Inference decoder

1- Embedding

Now that we have prepared the inputs to the training decoder, we need to embed them so they're ready to be passed to the decoder.

We'll create an embedding matrix like the following, then have tf.nn.embedding_lookup convert our input to its embedded equivalent:
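
As a toy illustration of that lookup (made-up embedding values and ids, not the real matrix), each id simply selects the corresponding row of the embedding matrix:

In [ ]:
# Toy demo of tf.nn.embedding_lookup (made-up values, run in a throwaway graph)
with tf.Graph().as_default(), tf.Session() as sess:
    demo_embeddings = tf.constant([[0.0, 0.0],    # row for id 0
                                   [0.1, 0.2],    # row for id 1
                                   [0.3, 0.4]])   # row for id 2
    demo_ids = tf.constant([[2, 1, 0]])
    print(sess.run(tf.nn.embedding_lookup(demo_embeddings, demo_ids)))
    # [[[ 0.3  0.4]
    #   [ 0.1  0.2]
    #   [ 0.   0. ]]]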

2- Decoder Cell

Then we declare our decoder cell. Just like the encoder, we'll use a tf.contrib.rnn.LSTMCell here as well.

We need to declare a decoder for the training process, and a decoder for the inference/prediction process. These two decoders will share their parameters (so that all the weights and biases that are set during the training phase can be used when we deploy the model).

First, we'll need to define the type of cell we'll be using for our decoder RNNs. We opted for LSTM.

3- Dense output layer

Before we move to declaring our decoders, we'll need to create the output layer, which will be a tensorflow.python.layers.core.Dense layer that translates the outputs of the decoder to logits that tell us which element of the decoder vocabulary the decoder is choosing to output at each time step.
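
Here's a shape-only sketch of what that layer does at a single time step (hypothetical, not part of the real graph, where the layer is applied inside the decoder): it projects the decoder's rnn_size-dimensional output onto one logit per target-vocabulary entry.

In [ ]:
# Shape-only sketch of the Dense output layer (demo tensors, not the real graph)
demo_output_layer = Dense(len(target_letter_to_int))
demo_step_output = tf.zeros([batch_size, rnn_size])   # one decoder time step
demo_logits = demo_output_layer(demo_step_output)
print(demo_logits.get_shape())   # e.g. (128, 30): [batch_size, target_vocab_size]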

4- Training decoder

Essentially, we'll be creating two decoders which share their parameters: one for training and one for inference. The two are similar in that both are created using tf.contrib.seq2seq.BasicDecoder and tf.contrib.seq2seq.dynamic_decode. They differ, however, in that we feed the target sequences as inputs to the training decoder at each time step to make it more robust.

We can think of the training decoder as looking like this (except that it works with sequences in batches):

The training decoder does not feed the output of each time step to the next. Rather, the inputs to the decoder time steps are the target sequence from the training dataset (the orange letters).

5- Inference decoder

The inference decoder is the one we'll use when we deploy our model to the wild.

We'll hand our encoder hidden state to both the training and inference decoders and have them produce their outputs. TensorFlow handles most of the logic for us; we just have to use the appropriate methods from tf.contrib.seq2seq and supply them with the appropriate inputs.


In [99]:
def decoding_layer(target_letter_to_int, decoding_embedding_size, num_layers, rnn_size,
                   target_sequence_length, max_target_sequence_length, enc_state, dec_input):
    # 1. Decoder Embedding
    target_vocab_size = len(target_letter_to_int)
    dec_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    dec_embed_input = tf.nn.embedding_lookup(dec_embeddings, dec_input)

    # 2. Construct the decoder cell
    def make_cell(rnn_size):
        dec_cell = tf.contrib.rnn.LSTMCell(rnn_size,
                                           initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
        return dec_cell

    dec_cell = tf.contrib.rnn.MultiRNNCell([make_cell(rnn_size) for _ in range(num_layers)])
     
    # 3. Dense layer to translate the decoder's output at each time 
    # step into a choice from the target vocabulary
    output_layer = Dense(target_vocab_size,
                         kernel_initializer = tf.truncated_normal_initializer(mean = 0.0, stddev=0.1))


    # 4. Set up a training decoder and an inference decoder
    # Training Decoder
    with tf.variable_scope("decode"):

        # Helper for the training process. Used by BasicDecoder to read inputs.
        training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=dec_embed_input,
                                                            sequence_length=target_sequence_length,
                                                            time_major=False)
        
        
        # Basic decoder
        training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                           training_helper,
                                                           enc_state,
                                                           output_layer) 
        
        # Perform dynamic decoding using the decoder
        training_decoder_output, _ = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                       impute_finished=True,
                                                                       maximum_iterations=max_target_sequence_length)
    # 5. Inference Decoder
    # Reuses the same parameters trained by the training process
    with tf.variable_scope("decode", reuse=True):
        start_tokens = tf.tile(tf.constant([target_letter_to_int['<GO>']], dtype=tf.int32), [batch_size], name='start_tokens')

        # Helper for the inference process.
        inference_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(dec_embeddings,
                                                                start_tokens,
                                                                target_letter_to_int['<EOS>'])

        # Basic decoder
        inference_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                        inference_helper,
                                                        enc_state,
                                                        output_layer)
        
        # Perform dynamic decoding using the decoder
        inference_decoder_output, _ = tf.contrib.seq2seq.dynamic_decode(inference_decoder,
                                                            impute_finished=True,
                                                            maximum_iterations=max_target_sequence_length)
         

    
    return training_decoder_output, inference_decoder_output

2.3 Seq2seq model

Let's now go one level up and hook up the encoder and decoder using the methods we just declared.


In [100]:
def seq2seq_model(input_data, targets, lr, target_sequence_length, 
                  max_target_sequence_length, source_sequence_length,
                  source_vocab_size, target_vocab_size,
                  enc_embedding_size, dec_embedding_size, 
                  rnn_size, num_layers):
    
    # Pass the input data through the encoder. We'll ignore the encoder output, but use the state
    _, enc_state = encoding_layer(input_data, 
                                  rnn_size, 
                                  num_layers, 
                                  source_sequence_length,
                                  source_vocab_size, 
                                  encoding_embedding_size)
    
    
    # Prepare the target sequences we'll feed to the decoder in training mode
    dec_input = process_decoder_input(targets, target_letter_to_int, batch_size)
    
    # Pass encoder state and decoder inputs to the decoders
    training_decoder_output, inference_decoder_output = decoding_layer(target_letter_to_int, 
                                                                       decoding_embedding_size, 
                                                                       num_layers, 
                                                                       rnn_size,
                                                                       target_sequence_length,
                                                                       max_target_sequence_length,
                                                                       enc_state, 
                                                                       dec_input) 
    
    return training_decoder_output, inference_decoder_output

Both model outputs, training_decoder_output and inference_decoder_output, contain a 'rnn_output' logits tensor that looks like this:

We'll pass the logits we get from the training decoder to tf.contrib.seq2seq.sequence_loss() to calculate the loss and, ultimately, the gradients.
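
To get a feel for sequence_loss before we use it in the real graph, here's a minimal sketch on made-up logits and targets; the masks zero out the loss contribution of padded positions.

In [ ]:
# Toy demo of tf.contrib.seq2seq.sequence_loss (made-up values, throwaway graph)
with tf.Graph().as_default(), tf.Session() as sess:
    demo_logits = tf.constant(np.random.randn(2, 3, 5), dtype=tf.float32)  # [batch, time, vocab]
    demo_targets = tf.constant([[1, 2, 0], [3, 0, 0]], dtype=tf.int32)     # [batch, time]
    demo_masks = tf.sequence_mask([3, 1], 3, dtype=tf.float32)             # 0.0 at padded steps
    demo_loss = tf.contrib.seq2seq.sequence_loss(demo_logits, demo_targets, demo_masks)
    print(sess.run(demo_loss))   # one scalar: average cross-entropy over unmasked positions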


In [101]:
# Build the graph
train_graph = tf.Graph()
# Set the graph to default to ensure that it is ready for training
with train_graph.as_default():
    
    # Load the model inputs    
    input_data, targets, lr, target_sequence_length, max_target_sequence_length, source_sequence_length = get_model_inputs()
    
    # Create the training and inference logits
    training_decoder_output, inference_decoder_output = seq2seq_model(input_data, 
                                                                      targets, 
                                                                      lr, 
                                                                      target_sequence_length, 
                                                                      max_target_sequence_length, 
                                                                      source_sequence_length,
                                                                      len(source_letter_to_int),
                                                                      len(target_letter_to_int),
                                                                      encoding_embedding_size, 
                                                                      decoding_embedding_size, 
                                                                      rnn_size, 
                                                                      num_layers)    
    
    # Create tensors for the training logits and inference logits
    training_logits = tf.identity(training_decoder_output.rnn_output, 'logits')
    inference_logits = tf.identity(inference_decoder_output.sample_id, name='predictions')
    
    # Create the weights for sequence_loss
    masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length, dtype=tf.float32, name='masks')

    with tf.name_scope("optimization"):
        
        # Loss function
        cost = tf.contrib.seq2seq.sequence_loss(
            training_logits,
            targets,
            masks)

        # Optimizer
        optimizer = tf.train.AdamOptimizer(lr)

        # Gradient Clipping
        gradients = optimizer.compute_gradients(cost)
        capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var) for grad, var in gradients if grad is not None]
        train_op = optimizer.apply_gradients(capped_gradients)

Get Batches

There's little processing involved when we retrieve the batches. This is a simple example assuming batch_size = 2 (a runnable toy version appears after the get_batches cell below).

Source sequences (they're actually in int form; we're showing the characters for clarity):

Target sequences (also in int, but showing letters for clarity):


In [102]:
def pad_sentence_batch(sentence_batch, pad_int):
    """Pad sentences with <PAD> so that each sentence of a batch has the same length"""
    max_sentence = max([len(sentence) for sentence in sentence_batch])
    return [sentence + [pad_int] * (max_sentence - len(sentence)) for sentence in sentence_batch]

In [103]:
def get_batches(targets, sources, batch_size, source_pad_int, target_pad_int):
    """Batch targets, sources, and the lengths of their sentences together"""
    for batch_i in range(0, len(sources)//batch_size):
        start_i = batch_i * batch_size
        sources_batch = sources[start_i:start_i + batch_size]
        targets_batch = targets[start_i:start_i + batch_size]
        pad_sources_batch = np.array(pad_sentence_batch(sources_batch, source_pad_int))
        pad_targets_batch = np.array(pad_sentence_batch(targets_batch, target_pad_int))
        
        # Need the lengths for the _lengths parameters
        pad_targets_lengths = []
        for target in pad_targets_batch:
            pad_targets_lengths.append(len(target))
        
        pad_source_lengths = []
        for source in pad_sources_batch:
            pad_source_lengths.append(len(source))
        
        yield pad_targets_batch, pad_sources_batch, pad_targets_lengths, pad_source_lengths
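
As a quick check, here's a toy run with batch_size = 2 and made-up ids, showing the padding in action:

In [ ]:
# Toy run of get_batches with two short sequences (made-up ids)
demo_sources = [[5, 6, 7], [8, 9]]
demo_targets = [[5, 6, 7, 3], [8, 9, 3]]
demo_targets_batch, demo_sources_batch, demo_targets_lengths, demo_sources_lengths = next(
    get_batches(demo_targets, demo_sources, 2,
                source_letter_to_int['<PAD>'],
                target_letter_to_int['<PAD>']))
print(demo_sources_batch)    # the shorter source is padded with <PAD> to length 3
print(demo_targets_batch)    # the shorter target is padded with <PAD> to length 4
print(demo_sources_lengths, demo_targets_lengths)  # lengths after padding: [3, 3] [4, 4]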

Train

We're now ready to train our model. If you run into OOM (out of memory) issues during training, try to decrease the batch_size.


In [104]:
# Split data to training and validation sets
train_source = source_letter_ids[batch_size:]
train_target = target_letter_ids[batch_size:]
valid_source = source_letter_ids[:batch_size]
valid_target = target_letter_ids[:batch_size]
(valid_targets_batch, valid_sources_batch, valid_targets_lengths, valid_sources_lengths) = next(get_batches(valid_target, valid_source, batch_size,
                           source_letter_to_int['<PAD>'],
                           target_letter_to_int['<PAD>']))

display_step = 20 # Check training loss after every 20 batches

checkpoint = "best_model.ckpt" 
with tf.Session(graph=train_graph) as sess:
    sess.run(tf.global_variables_initializer())
        
    for epoch_i in range(1, epochs+1):
        for batch_i, (targets_batch, sources_batch, targets_lengths, sources_lengths) in enumerate(
                get_batches(train_target, train_source, batch_size,
                           source_letter_to_int['<PAD>'],
                           target_letter_to_int['<PAD>'])):
            
            # Training step
            _, loss = sess.run(
                [train_op, cost],
                {input_data: sources_batch,
                 targets: targets_batch,
                 lr: learning_rate,
                 target_sequence_length: targets_lengths,
                 source_sequence_length: sources_lengths})

            # Debug message updating us on the status of the training
            if batch_i % display_step == 0 and batch_i > 0:
                
                # Calculate validation cost
                validation_loss = sess.run(
                [cost],
                {input_data: valid_sources_batch,
                 targets: valid_targets_batch,
                 lr: learning_rate,
                 target_sequence_length: valid_targets_lengths,
                 source_sequence_length: valid_sources_lengths})
                
                print('Epoch {:>3}/{} Batch {:>4}/{} - Loss: {:>6.3f}  - Validation loss: {:>6.3f}'
                      .format(epoch_i,
                              epochs, 
                              batch_i, 
                              len(train_source) // batch_size, 
                              loss, 
                              validation_loss[0]))

    
    
    # Save Model
    saver = tf.train.Saver()
    saver.save(sess, checkpoint)
    print('Model Trained and Saved')


Epoch   1/60 Batch   20/77 - Loss:  2.527  - Validation loss:  2.541
Epoch   1/60 Batch   40/77 - Loss:  2.305  - Validation loss:  2.261
Epoch   1/60 Batch   60/77 - Loss:  1.971  - Validation loss:  2.017
Epoch   2/60 Batch   20/77 - Loss:  1.666  - Validation loss:  1.750
Epoch   2/60 Batch   40/77 - Loss:  1.683  - Validation loss:  1.648
Epoch   2/60 Batch   60/77 - Loss:  1.522  - Validation loss:  1.558
Epoch   3/60 Batch   20/77 - Loss:  1.384  - Validation loss:  1.459
Epoch   3/60 Batch   40/77 - Loss:  1.460  - Validation loss:  1.430
Epoch   3/60 Batch   60/77 - Loss:  1.373  - Validation loss:  1.400
Epoch   4/60 Batch   20/77 - Loss:  1.250  - Validation loss:  1.326
Epoch   4/60 Batch   40/77 - Loss:  1.305  - Validation loss:  1.277
Epoch   4/60 Batch   60/77 - Loss:  1.188  - Validation loss:  1.234
Epoch   5/60 Batch   20/77 - Loss:  1.126  - Validation loss:  1.179
Epoch   5/60 Batch   40/77 - Loss:  1.187  - Validation loss:  1.155
Epoch   5/60 Batch   60/77 - Loss:  1.092  - Validation loss:  1.126
Epoch   6/60 Batch   20/77 - Loss:  1.014  - Validation loss:  1.068
Epoch   6/60 Batch   40/77 - Loss:  1.061  - Validation loss:  1.035
Epoch   6/60 Batch   60/77 - Loss:  0.975  - Validation loss:  1.002
Epoch   7/60 Batch   20/77 - Loss:  0.875  - Validation loss:  0.912
Epoch   7/60 Batch   40/77 - Loss:  0.916  - Validation loss:  0.876
Epoch   7/60 Batch   60/77 - Loss:  0.831  - Validation loss:  0.841
Epoch   8/60 Batch   20/77 - Loss:  0.745  - Validation loss:  0.786
Epoch   8/60 Batch   40/77 - Loss:  0.804  - Validation loss:  0.756
Epoch   8/60 Batch   60/77 - Loss:  0.722  - Validation loss:  0.729
Epoch   9/60 Batch   20/77 - Loss:  0.642  - Validation loss:  0.680
Epoch   9/60 Batch   40/77 - Loss:  0.699  - Validation loss:  0.660
Epoch   9/60 Batch   60/77 - Loss:  0.625  - Validation loss:  0.635
Epoch  10/60 Batch   20/77 - Loss:  0.565  - Validation loss:  0.595
Epoch  10/60 Batch   40/77 - Loss:  0.616  - Validation loss:  0.581
Epoch  10/60 Batch   60/77 - Loss:  0.550  - Validation loss:  0.562
Epoch  11/60 Batch   20/77 - Loss:  0.504  - Validation loss:  0.528
Epoch  11/60 Batch   40/77 - Loss:  0.563  - Validation loss:  0.525
Epoch  11/60 Batch   60/77 - Loss:  0.496  - Validation loss:  0.504
Epoch  12/60 Batch   20/77 - Loss:  0.455  - Validation loss:  0.480
Epoch  12/60 Batch   40/77 - Loss:  0.501  - Validation loss:  0.475
Epoch  12/60 Batch   60/77 - Loss:  0.455  - Validation loss:  0.459
Epoch  13/60 Batch   20/77 - Loss:  0.409  - Validation loss:  0.433
Epoch  13/60 Batch   40/77 - Loss:  0.455  - Validation loss:  0.423
Epoch  13/60 Batch   60/77 - Loss:  0.403  - Validation loss:  0.411
Epoch  14/60 Batch   20/77 - Loss:  0.370  - Validation loss:  0.393
Epoch  14/60 Batch   40/77 - Loss:  0.418  - Validation loss:  0.382
Epoch  14/60 Batch   60/77 - Loss:  0.358  - Validation loss:  0.366
Epoch  15/60 Batch   20/77 - Loss:  0.328  - Validation loss:  0.346
Epoch  15/60 Batch   40/77 - Loss:  0.376  - Validation loss:  0.340
Epoch  15/60 Batch   60/77 - Loss:  0.321  - Validation loss:  0.332
Epoch  16/60 Batch   20/77 - Loss:  0.305  - Validation loss:  0.315
Epoch  16/60 Batch   40/77 - Loss:  0.340  - Validation loss:  0.306
Epoch  16/60 Batch   60/77 - Loss:  0.289  - Validation loss:  0.297
Epoch  17/60 Batch   20/77 - Loss:  0.257  - Validation loss:  0.281
Epoch  17/60 Batch   40/77 - Loss:  0.309  - Validation loss:  0.279
Epoch  17/60 Batch   60/77 - Loss:  0.261  - Validation loss:  0.271
Epoch  18/60 Batch   20/77 - Loss:  0.226  - Validation loss:  0.255
Epoch  18/60 Batch   40/77 - Loss:  0.277  - Validation loss:  0.249
Epoch  18/60 Batch   60/77 - Loss:  0.234  - Validation loss:  0.239
Epoch  19/60 Batch   20/77 - Loss:  0.199  - Validation loss:  0.227
Epoch  19/60 Batch   40/77 - Loss:  0.248  - Validation loss:  0.222
Epoch  19/60 Batch   60/77 - Loss:  0.207  - Validation loss:  0.213
Epoch  20/60 Batch   20/77 - Loss:  0.178  - Validation loss:  0.205
Epoch  20/60 Batch   40/77 - Loss:  0.222  - Validation loss:  0.199
Epoch  20/60 Batch   60/77 - Loss:  0.185  - Validation loss:  0.191
Epoch  21/60 Batch   20/77 - Loss:  0.157  - Validation loss:  0.182
Epoch  21/60 Batch   40/77 - Loss:  0.196  - Validation loss:  0.177
Epoch  21/60 Batch   60/77 - Loss:  0.163  - Validation loss:  0.170
Epoch  22/60 Batch   20/77 - Loss:  0.150  - Validation loss:  0.159
Epoch  22/60 Batch   40/77 - Loss:  0.176  - Validation loss:  0.165
Epoch  22/60 Batch   60/77 - Loss:  0.147  - Validation loss:  0.165
Epoch  23/60 Batch   20/77 - Loss:  0.127  - Validation loss:  0.147
Epoch  23/60 Batch   40/77 - Loss:  0.157  - Validation loss:  0.144
Epoch  23/60 Batch   60/77 - Loss:  0.134  - Validation loss:  0.144
Epoch  24/60 Batch   20/77 - Loss:  0.115  - Validation loss:  0.134
Epoch  24/60 Batch   40/77 - Loss:  0.141  - Validation loss:  0.147
Epoch  24/60 Batch   60/77 - Loss:  0.131  - Validation loss:  0.149
Epoch  25/60 Batch   20/77 - Loss:  0.104  - Validation loss:  0.125
Epoch  25/60 Batch   40/77 - Loss:  0.127  - Validation loss:  0.123
Epoch  25/60 Batch   60/77 - Loss:  0.109  - Validation loss:  0.117
Epoch  26/60 Batch   20/77 - Loss:  0.088  - Validation loss:  0.111
Epoch  26/60 Batch   40/77 - Loss:  0.115  - Validation loss:  0.112
Epoch  26/60 Batch   60/77 - Loss:  0.097  - Validation loss:  0.106
Epoch  27/60 Batch   20/77 - Loss:  0.080  - Validation loss:  0.100
Epoch  27/60 Batch   40/77 - Loss:  0.107  - Validation loss:  0.103
Epoch  27/60 Batch   60/77 - Loss:  0.088  - Validation loss:  0.096
Epoch  28/60 Batch   20/77 - Loss:  0.071  - Validation loss:  0.091
Epoch  28/60 Batch   40/77 - Loss:  0.093  - Validation loss:  0.096
Epoch  28/60 Batch   60/77 - Loss:  0.081  - Validation loss:  0.087
Epoch  29/60 Batch   20/77 - Loss:  0.065  - Validation loss:  0.084
Epoch  29/60 Batch   40/77 - Loss:  0.085  - Validation loss:  0.089
Epoch  29/60 Batch   60/77 - Loss:  0.078  - Validation loss:  0.080
Epoch  30/60 Batch   20/77 - Loss:  0.057  - Validation loss:  0.075
Epoch  30/60 Batch   40/77 - Loss:  0.075  - Validation loss:  0.081
Epoch  30/60 Batch   60/77 - Loss:  0.075  - Validation loss:  0.075
Epoch  31/60 Batch   20/77 - Loss:  0.052  - Validation loss:  0.070
Epoch  31/60 Batch   40/77 - Loss:  0.067  - Validation loss:  0.075
Epoch  31/60 Batch   60/77 - Loss:  0.061  - Validation loss:  0.075
Epoch  32/60 Batch   20/77 - Loss:  0.047  - Validation loss:  0.063
Epoch  32/60 Batch   40/77 - Loss:  0.061  - Validation loss:  0.065
Epoch  32/60 Batch   60/77 - Loss:  0.055  - Validation loss:  0.062
Epoch  33/60 Batch   20/77 - Loss:  0.042  - Validation loss:  0.058
Epoch  33/60 Batch   40/77 - Loss:  0.056  - Validation loss:  0.063
Epoch  33/60 Batch   60/77 - Loss:  0.052  - Validation loss:  0.058
Epoch  34/60 Batch   20/77 - Loss:  0.038  - Validation loss:  0.053
Epoch  34/60 Batch   40/77 - Loss:  0.052  - Validation loss:  0.055
Epoch  34/60 Batch   60/77 - Loss:  0.046  - Validation loss:  0.053
Epoch  35/60 Batch   20/77 - Loss:  0.034  - Validation loss:  0.049
Epoch  35/60 Batch   40/77 - Loss:  0.049  - Validation loss:  0.050
Epoch  35/60 Batch   60/77 - Loss:  0.042  - Validation loss:  0.048
Epoch  36/60 Batch   20/77 - Loss:  0.030  - Validation loss:  0.045
Epoch  36/60 Batch   40/77 - Loss:  0.045  - Validation loss:  0.047
Epoch  36/60 Batch   60/77 - Loss:  0.038  - Validation loss:  0.045
Epoch  37/60 Batch   20/77 - Loss:  0.027  - Validation loss:  0.042
Epoch  37/60 Batch   40/77 - Loss:  0.041  - Validation loss:  0.044
Epoch  37/60 Batch   60/77 - Loss:  0.035  - Validation loss:  0.041
Epoch  38/60 Batch   20/77 - Loss:  0.025  - Validation loss:  0.039
Epoch  38/60 Batch   40/77 - Loss:  0.037  - Validation loss:  0.041
Epoch  38/60 Batch   60/77 - Loss:  0.032  - Validation loss:  0.039
Epoch  39/60 Batch   20/77 - Loss:  0.022  - Validation loss:  0.037
Epoch  39/60 Batch   40/77 - Loss:  0.034  - Validation loss:  0.039
Epoch  39/60 Batch   60/77 - Loss:  0.029  - Validation loss:  0.036
Epoch  40/60 Batch   20/77 - Loss:  0.020  - Validation loss:  0.034
Epoch  40/60 Batch   40/77 - Loss:  0.030  - Validation loss:  0.036
Epoch  40/60 Batch   60/77 - Loss:  0.026  - Validation loss:  0.035
Epoch  41/60 Batch   20/77 - Loss:  0.018  - Validation loss:  0.033
Epoch  41/60 Batch   40/77 - Loss:  0.027  - Validation loss:  0.034
Epoch  41/60 Batch   60/77 - Loss:  0.024  - Validation loss:  0.033
Epoch  42/60 Batch   20/77 - Loss:  0.017  - Validation loss:  0.031
Epoch  42/60 Batch   40/77 - Loss:  0.025  - Validation loss:  0.032
Epoch  42/60 Batch   60/77 - Loss:  0.022  - Validation loss:  0.031
Epoch  43/60 Batch   20/77 - Loss:  0.015  - Validation loss:  0.030
Epoch  43/60 Batch   40/77 - Loss:  0.022  - Validation loss:  0.029
Epoch  43/60 Batch   60/77 - Loss:  0.020  - Validation loss:  0.030
Epoch  44/60 Batch   20/77 - Loss:  0.014  - Validation loss:  0.029
Epoch  44/60 Batch   40/77 - Loss:  0.020  - Validation loss:  0.027
Epoch  44/60 Batch   60/77 - Loss:  0.019  - Validation loss:  0.029
Epoch  45/60 Batch   20/77 - Loss:  0.013  - Validation loss:  0.027
Epoch  45/60 Batch   40/77 - Loss:  0.019  - Validation loss:  0.025
Epoch  45/60 Batch   60/77 - Loss:  0.017  - Validation loss:  0.027
Epoch  46/60 Batch   20/77 - Loss:  0.012  - Validation loss:  0.025
Epoch  46/60 Batch   40/77 - Loss:  0.017  - Validation loss:  0.024
Epoch  46/60 Batch   60/77 - Loss:  0.016  - Validation loss:  0.025
Epoch  47/60 Batch   20/77 - Loss:  0.026  - Validation loss:  0.046
Epoch  47/60 Batch   40/77 - Loss:  0.027  - Validation loss:  0.040
Epoch  47/60 Batch   60/77 - Loss:  0.021  - Validation loss:  0.030
Epoch  48/60 Batch   20/77 - Loss:  0.011  - Validation loss:  0.025
Epoch  48/60 Batch   40/77 - Loss:  0.017  - Validation loss:  0.023
Epoch  48/60 Batch   60/77 - Loss:  0.014  - Validation loss:  0.022
Epoch  49/60 Batch   20/77 - Loss:  0.010  - Validation loss:  0.024
Epoch  49/60 Batch   40/77 - Loss:  0.014  - Validation loss:  0.022
Epoch  49/60 Batch   60/77 - Loss:  0.013  - Validation loss:  0.021
Epoch  50/60 Batch   20/77 - Loss:  0.009  - Validation loss:  0.023
Epoch  50/60 Batch   40/77 - Loss:  0.013  - Validation loss:  0.020
Epoch  50/60 Batch   60/77 - Loss:  0.012  - Validation loss:  0.020
Epoch  51/60 Batch   20/77 - Loss:  0.008  - Validation loss:  0.022
Epoch  51/60 Batch   40/77 - Loss:  0.012  - Validation loss:  0.019
Epoch  51/60 Batch   60/77 - Loss:  0.011  - Validation loss:  0.018
Epoch  52/60 Batch   20/77 - Loss:  0.008  - Validation loss:  0.020
Epoch  52/60 Batch   40/77 - Loss:  0.011  - Validation loss:  0.017
Epoch  52/60 Batch   60/77 - Loss:  0.010  - Validation loss:  0.018
Epoch  53/60 Batch   20/77 - Loss:  0.007  - Validation loss:  0.020
Epoch  53/60 Batch   40/77 - Loss:  0.010  - Validation loss:  0.016
Epoch  53/60 Batch   60/77 - Loss:  0.009  - Validation loss:  0.017
Epoch  54/60 Batch   20/77 - Loss:  0.007  - Validation loss:  0.019
Epoch  54/60 Batch   40/77 - Loss:  0.009  - Validation loss:  0.016
Epoch  54/60 Batch   60/77 - Loss:  0.009  - Validation loss:  0.016
Epoch  55/60 Batch   20/77 - Loss:  0.006  - Validation loss:  0.018
Epoch  55/60 Batch   40/77 - Loss:  0.009  - Validation loss:  0.015
Epoch  55/60 Batch   60/77 - Loss:  0.008  - Validation loss:  0.015
Epoch  56/60 Batch   20/77 - Loss:  0.006  - Validation loss:  0.017
Epoch  56/60 Batch   40/77 - Loss:  0.008  - Validation loss:  0.014
Epoch  56/60 Batch   60/77 - Loss:  0.008  - Validation loss:  0.015
Epoch  57/60 Batch   20/77 - Loss:  0.005  - Validation loss:  0.017
Epoch  57/60 Batch   40/77 - Loss:  0.008  - Validation loss:  0.014
Epoch  57/60 Batch   60/77 - Loss:  0.007  - Validation loss:  0.014
Epoch  58/60 Batch   20/77 - Loss:  0.005  - Validation loss:  0.016
Epoch  58/60 Batch   40/77 - Loss:  0.007  - Validation loss:  0.013
Epoch  58/60 Batch   60/77 - Loss:  0.007  - Validation loss:  0.014
Epoch  59/60 Batch   20/77 - Loss:  0.005  - Validation loss:  0.015
Epoch  59/60 Batch   40/77 - Loss:  0.007  - Validation loss:  0.012
Epoch  59/60 Batch   60/77 - Loss:  0.006  - Validation loss:  0.013
Epoch  60/60 Batch   20/77 - Loss:  0.005  - Validation loss:  0.015
Epoch  60/60 Batch   40/77 - Loss:  0.006  - Validation loss:  0.012
Epoch  60/60 Batch   60/77 - Loss:  0.006  - Validation loss:  0.013
Model Trained and Saved

Prediction


In [105]:
def source_to_seq(text):
    '''Prepare the text for the model'''
    sequence_length = 7
    return [source_letter_to_int.get(word, source_letter_to_int['<UNK>']) for word in text]+ [source_letter_to_int['<PAD>']]*(sequence_length-len(text))

In [106]:
input_sentence = 'hello'
text = source_to_seq(input_sentence)

checkpoint = "./best_model.ckpt"

loaded_graph = tf.Graph()
with tf.Session(graph=loaded_graph) as sess:
    # Load saved model
    loader = tf.train.import_meta_graph(checkpoint + '.meta')
    loader.restore(sess, checkpoint)

    input_data = loaded_graph.get_tensor_by_name('input:0')
    logits = loaded_graph.get_tensor_by_name('predictions:0')
    source_sequence_length = loaded_graph.get_tensor_by_name('source_sequence_length:0')
    target_sequence_length = loaded_graph.get_tensor_by_name('target_sequence_length:0')
    
    #Multiply by batch_size to match the model's input parameters
    answer_logits = sess.run(logits, {input_data: [text]*batch_size, 
                                      target_sequence_length: [len(text)]*batch_size, 
                                      source_sequence_length: [len(text)]*batch_size})[0] 


pad = source_letter_to_int["<PAD>"] 

print('Original Text:', input_sentence)

print('\nSource')
print('  Word Ids:    {}'.format([i for i in text]))
print('  Input Words: {}'.format(" ".join([source_int_to_letter[i] for i in text])))

print('\nTarget')
print('  Word Ids:       {}'.format([i for i in answer_logits if i != pad]))
print('  Response Words: {}'.format(" ".join([target_int_to_letter[i] for i in answer_logits if i != pad])))


INFO:tensorflow:Restoring parameters from ./best_model.ckpt
Original Text: hello

Source
  Word Ids:    [23, 8, 12, 12, 16, 0, 0]
  Input Words: h e l l o <PAD> <PAD>

Target
  Word Ids:       [9, 23, 12, 12, 16, 3]
  Response Words: e h l l o <EOS>

In [ ]: