Data Science Summer School - Split '17

Prerequisites: Please download the following zip archive, which contains the checkpoint you will need in this exercise, and put it in the assets/checkpoints/ssds/ folder.

3. Character-wise language modeling with multi-layer LSTMs

This hands-on session is based on two tutorial notebooks, Intro to Recurrent Networks (Character-wise RNN) and Tensorboard, from Udacity's Deep Learning Nanodegree Foundation program.

This notebook implements a multi-layer LSTM network for training and sampling from character-level language models. The model takes a text file as input and trains a network that learns to predict the next character in a sequence. The trained network can then generate text character by character that looks like the original training data. This network is based on Andrej Karpathy's post on RNNs, which became a standard example for explaining how RNN models behave.

A good description of the LSTM architecture can be found in the article Understanding LSTM Networks.


In [1]:
import time
from collections import namedtuple

import numpy as np
import tensorflow as tf
import random
tf.logging.set_verbosity(tf.logging.ERROR)

3.1 Data preparation

Loading and encoding text

We will train our language model on the complete collection of Donald Trump's tweets obtained from the Trump Twitter Archive, which we have already downloaded and made available in PATH-TO-REPOSITORY/Day-3/assets/data/trump_tweets_ascii.txt. First, we load the text file and encode its characters as integers.


In [3]:
with open('assets/data/trump_tweets_ascii.txt', 'r') as f:
    text=f.read()

# get set of characters contained in the loaded text file
vocab = sorted(set(text))

# encoding characters as integers
vocab_to_int = {c: i for i, c in enumerate(vocab)}
encoded_chars = np.array([vocab_to_int[c] for c in text], dtype=np.int32)

# make dict for decoding integers to corresponding characters
int_to_vocab = dict(enumerate(vocab))

print('Text size: {}'.format(len(encoded_chars)))
print('Vocabulary size: {}'.format(len(vocab)))
print('*******************************')
print('Number of tweets: {}'.format(len(text.split('\n'))))
print('Median size of a tweet: {}'.format(np.percentile([len(t) for t in text.split('\n')], 50)))


Text size: 2167951
Vocabulary size: 92
*******************************
Number of tweets: 20629
Median size of a tweet: 117.0

In the output above, we can see that trump_tweets_ascii.txt contains 2,167,951 characters in total. The tweets contain 92 unique characters, which form the vocabulary of the language model.

Let's look at the first 300 characters of the provided text:


In [4]:
text[:300]


Out[4]:
'We are building our future with American hands American labor American iron aluminum and steel. Happy #LaborDay! https://t.co/lyvtNfQ5IO\nThe United States is considering in addition to other options stopping all trade with any country doing business with North Korea.\nI will be meeting General Kelly '

And let's see how they are encoded as integers:


In [5]:
encoded_chars[:300]


Out[5]:
array([53, 66,  1, 62, 79, 66,  1, 63, 82, 70, 73, 65, 70, 75, 68,  1, 76,
       82, 79,  1, 67, 82, 81, 82, 79, 66,  1, 84, 70, 81, 69,  1, 31, 74,
       66, 79, 70, 64, 62, 75,  1, 69, 62, 75, 65, 80,  1, 31, 74, 66, 79,
       70, 64, 62, 75,  1, 73, 62, 63, 76, 79,  1, 31, 74, 66, 79, 70, 64,
       62, 75,  1, 70, 79, 76, 75,  1, 62, 73, 82, 74, 70, 75, 82, 74,  1,
       62, 75, 65,  1, 80, 81, 66, 66, 73, 14,  1, 38, 62, 77, 77, 86,  1,
        4, 42, 62, 63, 76, 79, 34, 62, 86,  2,  1, 69, 81, 81, 77, 80, 26,
       15, 15, 81, 14, 64, 76, 15, 73, 86, 83, 81, 44, 67, 47, 21, 39, 45,
        0, 50, 69, 66,  1, 51, 75, 70, 81, 66, 65,  1, 49, 81, 62, 81, 66,
       80,  1, 70, 80,  1, 64, 76, 75, 80, 70, 65, 66, 79, 70, 75, 68,  1,
       70, 75,  1, 62, 65, 65, 70, 81, 70, 76, 75,  1, 81, 76,  1, 76, 81,
       69, 66, 79,  1, 76, 77, 81, 70, 76, 75, 80,  1, 80, 81, 76, 77, 77,
       70, 75, 68,  1, 62, 73, 73,  1, 81, 79, 62, 65, 66,  1, 84, 70, 81,
       69,  1, 62, 75, 86,  1, 64, 76, 82, 75, 81, 79, 86,  1, 65, 76, 70,
       75, 68,  1, 63, 82, 80, 70, 75, 66, 80, 80,  1, 84, 70, 81, 69,  1,
       44, 76, 79, 81, 69,  1, 41, 76, 79, 66, 62, 14,  0, 39,  1, 84, 70,
       73, 73,  1, 63, 66,  1, 74, 66, 66, 81, 70, 75, 68,  1, 37, 66, 75,
       66, 79, 62, 73,  1, 41, 66, 73, 73, 86,  1])
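
As a quick sanity check, we can map these integers back to characters with int_to_vocab and confirm that we recover the original text. A minimal sketch using the dictionaries defined above:

decoded = ''.join(int_to_vocab[int(i)] for i in encoded_chars[:300])
assert decoded == text[:300]   # round trip: encoding then decoding gives back the original characters
print(decoded[:60])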

Making training and validation mini-batches

Neural networks are trained by approximating the gradient of the loss function with respect to the weights using only a small subset of the data at a time, known as a mini-batch. Here is where we make our mini-batches for training and validation: we need to split the data into batches, as well as into training and validation sets.

For testing we will simply observe how the network generates new text, so we will not use a separate test set. We will feed a character into the network and sample the next one from the distribution over characters likely to come next. We then feed the sampled character right back in to get the character after that. Repeating this process character by character generates new text, hopefully indistinguishable from Donald Trump's actual tweets.
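
To make the sampling idea concrete before we build the network, here is a minimal sketch of that feedback loop with a stand-in uniform predictor in place of the trained model (predict_probs below is a placeholder invented for illustration, not part of the model; the real version appears in section 3.4):

def predict_probs(char_id):
    # stand-in for the trained network: a uniform distribution over the vocabulary
    return np.ones(len(vocab)) / len(vocab)

char_id = vocab_to_int['T']
generated = [int_to_vocab[char_id]]
for _ in range(20):
    probs = predict_probs(char_id)                        # distribution over the next character
    char_id = int(np.random.choice(len(vocab), p=probs))  # sample the next character
    generated.append(int_to_vocab[char_id])               # feed the sampled character back in
print(''.join(generated))                                 # gibberish here, but the loop is the same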


In [6]:
def split_data(arr, batch_size, num_steps, split_frac=0.9):
    """ 
    Split data into batches and training and validation sets.
    
    Arguments
    ---------
    arr: Array of encoded characters as integers 
    batch_size: Number of sequences per batch
    num_steps: Length of the sequence in a batch
    split_frac: Fraction of batches to keep in the training set
    
    
    Returns train_x, train_y, val_x, val_y
    """
    
    slice_size = batch_size * num_steps
    n_batches = int(len(arr) / slice_size)
    
    # Drop the last few characters to make only full batches
    x = arr[: n_batches*slice_size]
    
    # The targets are the same as the inputs, except shifted one character over.
    # number of batches covers full size of arr (no characters dropped)
    if(len(arr) == n_batches*slice_size):
        # for the last target character use first input character
        y = np.roll(x, -1)
    else:
        # for the last target character use first dropped character
        y = arr[1: n_batches*slice_size + 1]
    
    # Split the data into batch_size slices and then stack slices 
    x = np.stack(np.split(x, batch_size))
    y = np.stack(np.split(y, batch_size))
    
    # Now x and y are arrays with dimensions batch_size x (n_batches x num_steps)
    
    # Split into training and validation sets, keep the first split_frac batches for training
    split_idx = int(n_batches*split_frac)
    train_x, train_y= x[:, :split_idx*num_steps], y[:, :split_idx*num_steps]
    val_x, val_y = x[:, split_idx*num_steps:], y[:, split_idx*num_steps:]
    
    return train_x, train_y, val_x, val_y

Exercise: Generate an example integer array. Use the function split_data to split example_arr into training and validation sets.


In [7]:
example_arr = np.arange(63)
print(np.array2string(example_arr, max_line_width=100, separator=', '))


[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24,
 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49,
 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62]

In [8]:
batch_size = 5
num_steps = 3
split_frac = 0.9 

train_x, train_y, val_x, val_y = split_data(example_arr, batch_size, num_steps, split_frac)

In [9]:
train_x


Out[9]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8],
       [12, 13, 14, 15, 16, 17, 18, 19, 20],
       [24, 25, 26, 27, 28, 29, 30, 31, 32],
       [36, 37, 38, 39, 40, 41, 42, 43, 44],
       [48, 49, 50, 51, 52, 53, 54, 55, 56]])

In [10]:
train_y


Out[10]:
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9],
       [13, 14, 15, 16, 17, 18, 19, 20, 21],
       [25, 26, 27, 28, 29, 30, 31, 32, 33],
       [37, 38, 39, 40, 41, 42, 43, 44, 45],
       [49, 50, 51, 52, 53, 54, 55, 56, 57]])

Next, we create a generator function that yields batches from the arrays made by split_data. This gives us the functionality to iterate over batches, which we can feed to our network model. The arrays have dimensions (batch_size, n_batches*num_steps); each batch is a window of size batch_size x num_steps sliding along the second dimension of these arrays.


In [11]:
def get_batch(arrs, num_steps):
    batch_size, slice_size = arrs[0].shape
    
    n_batches = int(slice_size/num_steps)
    for b in range(n_batches):
        yield [x[:, b*num_steps: (b+1)*num_steps] for x in arrs]

Exercise: Use a for loop to iterate through all training batches.


In [12]:
for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
    print('\nBatch {}:'.format(b))
    print(np.stack([x,y]))


Batch 1:
[[[ 0  1  2]
  [12 13 14]
  [24 25 26]
  [36 37 38]
  [48 49 50]]

 [[ 1  2  3]
  [13 14 15]
  [25 26 27]
  [37 38 39]
  [49 50 51]]]

Batch 2:
[[[ 3  4  5]
  [15 16 17]
  [27 28 29]
  [39 40 41]
  [51 52 53]]

 [[ 4  5  6]
  [16 17 18]
  [28 29 30]
  [40 41 42]
  [52 53 54]]]

Batch 3:
[[[ 6  7  8]
  [18 19 20]
  [30 31 32]
  [42 43 44]
  [54 55 56]]

 [[ 7  8  9]
  [19 20 21]
  [31 32 33]
  [43 44 45]
  [55 56 57]]]

In [13]:
for b, (x, y) in enumerate(get_batch([val_x, val_y], num_steps), 1):
    print('\nBatch {}:'.format(b))
    print(np.stack([x,y]))


Batch 1:
[[[ 9 10 11]
  [21 22 23]
  [33 34 35]
  [45 46 47]
  [57 58 59]]

 [[10 11 12]
  [22 23 24]
  [34 35 36]
  [46 47 48]
  [58 59 60]]]

3.2 Building the model

With the data prepared and the convenience functions split_data and get_batch in place for handling it during training, we can start building the model using the TensorFlow library. We will break the model building into five parts:

  • building the input placeholders for x, y and dropout
  • building the multi-layer RNN with stacked LSTM cells
  • building the softmax output layer
  • calculating the training loss
  • building the optimizer for the model parameters

Inputs

First, we create the input placeholders for the TensorFlow computational graph of the model. Since we are building a supervised learning model, we need to declare placeholders for the inputs (x) and targets (y). We also one-hot encode the input and target tokens; remember that we get them as integer-encoded characters. Here we also declare a scalar placeholder keep_prob for the output keep probability of dropout.

New functions used here: tf.placeholder and tf.one_hot.

Exercise: Define placeholders for inputs and targets.


In [14]:
def build_inputs(batch_size, num_steps, num_classes):
    ''' Define placeholders for inputs, targets, and dropout. 
    
        Arguments
        ---------
        batch_size: Batch size, number of sequences per batch
        num_steps: Number of sequence steps in a batch
        num_classes: Number of classes (target values)
        
    '''
    
    with tf.name_scope('inputs'):
        # EXERCISE: Declare placeholder for inputs and one-hot encode inputs
        inputs = tf.placeholder(tf.int32, [batch_size, num_steps], name='inputs')
        x_one_hot = tf.one_hot(inputs, num_classes, name='x_one_hot')
    
    with tf.name_scope('targets'):
        # EXERCISE: Declare placeholder for targets (y) and one-hot encode targets
        targets = tf.placeholder(tf.int32, [batch_size, num_steps], name='targets')
        y_one_hot = tf.one_hot(targets, num_classes, name='y_one_hot')
    
    # Keep probability placeholder for drop out layers
    keep_prob = tf.placeholder(tf.float32, name='keep_prob')
    
    return inputs, x_one_hot, targets, y_one_hot, keep_prob
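
To make the one-hot encoding concrete, this is what it does to a few integer-encoded characters; a small NumPy illustration (not part of the TensorFlow graph):

example_ids = np.array([2, 0, 1])                    # three integer-encoded characters
one_hot = np.eye(4, dtype=np.float32)[example_ids]   # 4 classes: one row per character, a single 1 per row
print(one_hot)
# [[0. 0. 1. 0.]
#  [1. 0. 0. 0.]
#  [0. 1. 0. 0.]]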

Multi-layer LSTM Cell

We first implement the build_cell function, which creates the LSTM cell we will use in the hidden layer; this cell is the building block for the multi-layer RNN. Afterwards, we implement the build_lstm function, which creates multiple LSTM cells stacked on top of each other using build_cell. We can stack the LSTM cells into layers with tf.contrib.rnn.MultiRNNCell.

Exercise: Fill in the build_cell function for building an LSTM cell, using tf.contrib.rnn.BasicLSTMCell wrapped in tf.contrib.rnn.DropoutWrapper:


In [15]:
def build_cell(lstm_size, keep_prob):
    ''' Build LSTM cell.
    
        Arguments
        ---------
        lstm_size: Size of the hidden layers in the LSTM cells
        keep_prob: Dropout keep probability
    
    '''
    
    # EXERCISE: Use a basic LSTM cell
    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)

    # EXERCISE: Add dropout to the cell
    drop = tf.contrib.rnn.DropoutWrapper(lstm, output_keep_prob=keep_prob)
    
    return drop

Exercise: Fill in build_lstm function by stacking layers using tf.contrib.rnn.MultiRNNCell.


In [16]:
def build_lstm(lstm_size, num_layers, batch_size, keep_prob):
    ''' Build Multi-RNN cell.
    
        Arguments
        ---------
        lstm_size: Size of the hidden layers in the LSTM cells
        num_layers: Number of LSTM layers
        batch_size: Batch size
        keep_prob: Dropout keep probability
    
    '''
    
    # EXERCISE: Stack up multiple LSTM layers
    cell = tf.contrib.rnn.MultiRNNCell([build_cell(lstm_size, keep_prob) for _ in range(num_layers)])
    
    with tf.name_scope("RNN_init_state"):
        initial_state = cell.zero_state(batch_size, tf.float32)
    
    return cell, initial_state

Building RNN Output Layer

Here we will create the output layer. We need to connect the output of the RNN cells to a fully connected layer with a softmax output. The softmax output gives us a probability distribution we can use to predict the next character. The 3D output tensor of size $(batch\_size \times num\_steps \times lstm\_size)$ has to be reshaped to $((batch\_size \times num\_steps) \times lstm\_size)$ so that we can do the matrix multiplication with the softmax weights.

The output is calculated using the softmax function $$ P(y=c \mid \mathbf{x}) = \frac{e^{\mathbf{x}^T\mathbf{w}_c+b_c}}{\sum_{k=1}^{|C|}e^{\mathbf{x}^T\mathbf{w}_k+b_k}}, $$ where $\mathbf{x}\in\mathbb{R}^{512}$ is the output of the last hidden layer, and $\mathbf{W}\in\mathbb{R}^{512\times 92}$ (with columns $\mathbf{w}_c$) and $\mathbf{b}\in\mathbb{R}^{92}$ are the model parameters.
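
To see the reshape concretely, here is a small NumPy sketch of the shape arithmetic with the hyperparameters we will use later (batch_size=100, num_steps=100, lstm_size=512, 92 classes); the numbers are for illustration only:

lstm_output = np.zeros((100, 100, 512), dtype=np.float32)  # (batch_size, num_steps, lstm_size)
flat = lstm_output.reshape(-1, 512)                         # ((batch_size*num_steps), lstm_size)
softmax_w = np.zeros((512, 92), dtype=np.float32)           # (lstm_size, num_classes)
logits = flat @ softmax_w                                   # ((batch_size*num_steps), num_classes)
print(flat.shape, logits.shape)                             # (10000, 512) (10000, 92)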

Exercise: Fill in build_output function by defining logits and softmax function.


In [17]:
def build_output(lstm_output, in_size, out_size):
    ''' Build a softmax layer, return the softmax output and logits.
    
        Arguments
        ---------
        
        lstm_output: Output tensor of previous layer
        in_size: Size of the input tensor
        out_size: Size of the softmax layer
    
    '''

    # Reshape output so it is a bunch of rows, one row for each step for each sequence.
    # That is, the shape should be batch_size*num_steps rows by lstm_size columns.
    with tf.name_scope('sequence_reshape'):
        seq_output = tf.concat(lstm_output, axis=1, name='seq_output')
        x = tf.reshape(seq_output, [-1, in_size], name='graph_output')
    
    # Connect the RNN outputs to a softmax layer
    with tf.name_scope('logits'):

        # Since output is a bunch of rows of RNN cell outputs, logits will be a bunch
        # of rows of logit outputs, one for each step and sequence
        
        # EXERCISE: Define W and b and multiply inputs with weights and add bias
        softmax_w = tf.Variable(tf.truncated_normal((in_size, out_size), stddev=0.1), name='softmax_w')
        softmax_b = tf.Variable(tf.zeros(out_size), name='softmax_b')
        logits = tf.matmul(x, softmax_w) + softmax_b
        
        # Tensorboard
        tf.summary.histogram('h_softmax_w', softmax_w)
        tf.summary.histogram('h_softmax_b', softmax_b)
    
    with tf.name_scope('predictions'):
        
        # EXERCISE: Use softmax to get the probabilities for predicted characters
        predictions = tf.nn.softmax(logits, name='predictions')
        
        # Tensorboard
        tf.summary.histogram('h_predictions', predictions)
    
    return predictions, logits

Training loss

Next we need to calculate the training loss. We take the logits and targets and compute the softmax cross-entropy loss. First, we reshape the one-hot targets into a 2D tensor of size $((batch\_size \times num\_steps) \times num\_classes)$ so that it matches the logits. Remember that we reshaped the LSTM outputs and ran them through a fully connected layer with $num\_classes$ units. Then we run the logits and targets through tf.nn.softmax_cross_entropy_with_logits and take the mean to get the loss.
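
For a single row, the softmax cross-entropy is simply the negative log-probability assigned to the correct character. A small NumPy check of that definition (illustrative only; tf.nn.softmax_cross_entropy_with_logits computes the same quantity for every row at once):

row_logits = np.array([2.0, 1.0, 0.1])        # unnormalized scores for 3 classes
row_target = np.array([1.0, 0.0, 0.0])        # one-hot target: the correct class is 0
probs = np.exp(row_logits) / np.sum(np.exp(row_logits))
row_loss = -np.sum(row_target * np.log(probs))
print(row_loss)                                # ~0.417, i.e. -log(0.66)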

Exercise: Fill in the build_loss function:


In [18]:
def build_loss(logits, y_one_hot, lstm_size):
    ''' Calculate the loss from the logits and the targets.
    
        Arguments
        ---------
        logits: Logits from final fully connected layer
        y_one_hot: one hot encoding of target
        lstm_size: Number of LSTM hidden units        
    '''
    
    # Softmax cross entropy loss
    with tf.name_scope('loss'):

        # EXERCISE: Reshape one-hot encoded targets to match logits (one row per batch_size per step)
        # then define loss and cost function
        y_reshaped = tf.reshape(y_one_hot, logits.get_shape(), name='y_reshaped')
        loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=y_reshaped, name='loss')
        cost = tf.reduce_mean(loss, name='cost')
        
        # Tensorboard
        tf.summary.scalar('s_cost', cost)
    
    return cost

Optimizer

Here we build the optimizer. Traditional RNNs suffer from the vanishing gradient problem. LSTMs fix the vanishing part, but the gradients can still grow without bound, so we clip them. Concretely, we use tf.clip_by_global_norm: whenever the combined (global) norm of all gradients exceeds a prespecified threshold, every gradient is rescaled by the same factor so that the global norm equals the threshold. This ensures the gradients never grow too large. Then we use an AdamOptimizer for the learning step.
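
A small NumPy sketch of what this rescaling does, for intuition: when the combined norm of all gradients exceeds the threshold, everything is scaled down by the same factor, so directions are preserved (the numbers are made up):

grads = [np.array([3.0, 4.0]), np.array([12.0])]            # global norm = sqrt(9 + 16 + 144) = 13
grad_clip = 5.0
global_norm = np.sqrt(sum(np.sum(g**2) for g in grads))
if global_norm > grad_clip:
    grads = [g * (grad_clip / global_norm) for g in grads]  # rescale jointly by 5/13
print(grads)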

Exercise: Fill in the function build_optimizer:


In [19]:
def build_optimizer(loss, learning_rate, grad_clip):
    ''' Build optimizer for training, using gradient clipping.
    
        Arguments:
        loss: Network loss
        learning_rate: Learning rate for optimizer
        grad_clip: Clipping ratio
    
    '''
    
    # Optimizer for training, using gradient clipping to control exploding gradients
    with tf.name_scope('optimizer'):
        tvars = tf.trainable_variables()
        
        # EXERCISE: Calculate and clip gradients
        grads, _ = tf.clip_by_global_norm(tf.gradients(loss, tvars), grad_clip)
        
        # EXERCISE: Use Adam optimizer
        train_op = tf.train.AdamOptimizer(learning_rate)
        
        # EXERCISE: Apply gradients to trainable variables
        optimizer = train_op.apply_gradients(zip(grads, tvars))
    
    return optimizer

Build the network

Now we can put all the pieces together and build a class for the network. To actually run data through the LSTM cells, we will use tf.nn.dynamic_rnn. This function passes the hidden and cell states across LSTM cells appropriately for us. It returns the outputs of each LSTM cell at each step for each sequence in the mini-batch, as well as the final LSTM state. We want to save this state as final_state so we can pass it to the first LSTM cell in the next mini-batch run. For tf.nn.dynamic_rnn, we pass in the cell and initial state we get from build_lstm, as well as our input sequences.

Exercise: Fill in CharRNN class to run each sequence step through the RNN and collect the outputs using tf.nn.dynamic_rnn.


In [20]:
class CharRNN:
    
    def __init__(self, num_classes, batch_size=64, num_steps=50, 
                       lstm_size=128, num_layers=2, learning_rate=0.001, 
                       grad_clip=5, sampling=False):
    
        if sampling:
            # When we use the network for sampling later, we will pass in one character at a time
            batch_size, num_steps = 1, 1

        tf.reset_default_graph()
        
        # Build the input placeholder tensors, and one-hot encode the input and target tokens
        self.inputs, x_one_hot, self.targets, y_one_hot, self.keep_prob = \
        build_inputs(batch_size, num_steps, num_classes)
        
        # Build the LSTM cell
        cell, self.initial_state = build_lstm(lstm_size, num_layers, batch_size, self.keep_prob)
 
        with tf.name_scope("RNN_forward"):
            
            # EXERCISE: Run each sequence step through the RNN and collect the outputs
            outputs, state = tf.nn.dynamic_rnn(cell, x_one_hot, initial_state=self.initial_state)
        
        self.final_state = state
        
        # Get softmax predictions and logits
        self.prediction, self.logits = build_output(outputs, lstm_size, num_classes)
        
        # Loss and optimizer (with gradient clipping)
        self.loss = build_loss(self.logits, y_one_hot, lstm_size)
        self.optimizer = build_optimizer(self.loss, learning_rate, grad_clip)
        
        self.summary_merged = tf.summary.merge_all()

Hyperparameters

Here we declare the hyperparameters for the network.

  • batch_size - Number of sequences running through the network in one pass.
  • num_steps - Number of characters in the sequence the network is trained on. Typically larger is better; the network can learn longer-range dependencies, but it also takes longer to train. 100 is usually a good number here.
  • lstm_size - The number of units in the hidden layers.
  • num_layers - Number of hidden LSTM layers to use
  • learning_rate - Learning rate for training
  • keep_prob - The dropout keep probability used during training. If your network is overfitting, try decreasing this.

Here's some good advice from Andrej Karpathy on training the network https://github.com/karpathy/char-rnn#tips-and-tricks.


In [21]:
batch_size = 100        # Sequences per batch
num_steps = 100         # Number of sequence steps per batch
lstm_size = 512         # Size of hidden layers in LSTMs
num_layers = 2          # Number of LSTM layers
learning_rate = 0.001   # Learning rate
keep_prob = 0.5         # Dropout keep probability

Exercise: Create a new instance of the CharRNN class using the parameters defined above. Print the trainable variables in the default graph using the TensorFlow function tf.trainable_variables. Does the number of parameters correspond to what we expect? Hint: the number of parameters in the first hidden layer of the LSTM is equal to:

$4 \times \big[N_{units} \times (N_{inputs}+1) + N_{units}^{2}\big]$,

where $N_{units}$ is the number of units in the hidden layer (lstm_size) and $N_{inputs}$ is the size of the vocabulary. A quick numeric check against the printed shapes follows the output below.


In [22]:
model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

tf.trainable_variables()


Out[22]:
[<tf.Variable 'rnn/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0' shape=(604, 2048) dtype=float32_ref>,
 <tf.Variable 'rnn/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'rnn/multi_rnn_cell/cell_1/basic_lstm_cell/kernel:0' shape=(1024, 2048) dtype=float32_ref>,
 <tf.Variable 'rnn/multi_rnn_cell/cell_1/basic_lstm_cell/bias:0' shape=(2048,) dtype=float32_ref>,
 <tf.Variable 'logits/softmax_w:0' shape=(512, 92) dtype=float32_ref>,
 <tf.Variable 'logits/softmax_b:0' shape=(92,) dtype=float32_ref>]
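
A quick check of the hint formula against the shapes printed above (a small arithmetic sketch): the first-layer kernel has shape (92+512, 4*512) and its bias has shape (4*512,), which together match the formula exactly.

lstm_size, vocab_size = 512, 92
from_formula = 4 * (lstm_size * (vocab_size + 1) + lstm_size**2)
from_shapes = (vocab_size + lstm_size) * 4 * lstm_size + 4 * lstm_size   # kernel (604, 2048) + bias (2048,)
print(from_formula, from_shapes)   # both 1239040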

Write out the graph for TensorBoard


In [23]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    file_writer = tf.summary.FileWriter('assets/logs/1', sess.graph)
    
    file_writer.close()

Run TensorBoard from the command line by issuing the following command (e.g. from the repository root directory):

tensorboard --logdir=Day-3/assets/logs/

3.3 Training model

This is typical training code: we pass the inputs and targets into the network and run the optimizer. We also get back the final LSTM state for the mini-batch and pass that state back into the network, so the next batch can continue the state from the previous batch. Every so often (set by save_every_n) we calculate the validation loss and save a checkpoint.

Please download the provided trump_tb_20_i3880_l512_1.327.ckpt checkpoint and place it in the assets/checkpoints/ssds directory of the repository.

Exercise: Fill in the code below:

  • Iterate through all train batches, run session and save loss.
  • Iterate through all validation batches, run session and append validation loss.

In [23]:
epochs = 1 #20
save_every_n = 10 #200
train_x, train_y, val_x, val_y = split_data(encoded_chars, batch_size, num_steps)


model = CharRNN(len(vocab), batch_size=batch_size, num_steps=num_steps,
                lstm_size=lstm_size, num_layers=num_layers, 
                learning_rate=learning_rate)

saver = tf.train.Saver(max_to_keep=100)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    # Tensorboard
    train_writer = tf.summary.FileWriter('assets/logs/2/train', sess.graph)
    test_writer = tf.summary.FileWriter('assets/logs/2/test')
    
    #############################################################
    # Use the line below to load a checkpoint and resume training
    saver.restore(sess, 'assets/checkpoints/ssds/trump_tb_20_i3880_l512_1.327.ckpt')
    #############################################################
    
    n_batches = int(train_x.shape[1]/num_steps)
    iterations = n_batches * epochs
    
    # Train network
    for e in range(epochs):
        
        new_state = sess.run(model.initial_state)
        loss = 0
        
        # EXERCISE: Iterate through all train batches, run session and save loss
        for b, (x, y) in enumerate(get_batch([train_x, train_y], num_steps), 1):
            start = time.time()
            
            feed = {model.inputs: x,
                    model.targets: y,
                    model.keep_prob: 0.5,
                    model.initial_state: new_state}
            summary, batch_loss, new_state, _ = sess.run([model.summary_merged, model.loss, model.final_state, model.optimizer], 
                                                 feed_dict=feed)
            
            loss += batch_loss
            end = time.time()
            iteration = e*n_batches + b
            print('Epoch {}/{} '.format(e+1, epochs),
                  'Iteration {}/{}'.format(iteration, iterations),
                  'Training loss: {:.4f}'.format(loss/b),
                  '{:.4f} sec/batch'.format((end-start)))
            
            # Tensorboard
            train_writer.add_summary(summary, iteration)
        
            if (iteration%save_every_n == 0) or (iteration == iterations):
                # Check performance; note the dropout keep probability is set to 1 (no dropout)
                val_loss = []
                new_state = sess.run(model.initial_state)
                
                # EXERCISE: Same as above, iterate through all validation batches, run session and append validation loss
                for x, y in get_batch([val_x, val_y], num_steps):
                    feed = {model.inputs: x,
                            model.targets: y,
                            model.keep_prob: 1.,
                            model.initial_state: new_state}
                    summary, batch_loss, new_state = sess.run([model.summary_merged, model.loss, model.final_state], feed_dict=feed)
                    val_loss.append(batch_loss)
                
                # Tensorboard
                test_writer.add_summary(summary, iteration)

                print('Validation loss:', np.mean(val_loss),
                      'Saving checkpoint!')
                saver.save(sess, "assets/checkpoints/trump/trump_new_i{}_l{}_{:.3f}.ckpt".format(iteration, lstm_size, np.mean(val_loss)))


Epoch 1/1  Iteration 1/194 Training loss: 1.4705 9.3205 sec/batch
Epoch 1/1  Iteration 2/194 Training loss: 1.4041 7.9575 sec/batch
Epoch 1/1  Iteration 3/194 Training loss: 1.4106 7.8915 sec/batch
Validation loss: 1.33276 Saving checkpoint!
Epoch 1/1  Iteration 4/194 Training loss: 1.4374 7.3634 sec/batch
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-23-77fd76b64b0e> in <module>()
     40                     model.initial_state: new_state}
     41             summary, batch_loss, new_state, _ = sess.run([model.summary_merged, model.loss, model.final_state, model.optimizer], 
---> 42                                                  feed_dict=feed)
     43 
     44             loss += batch_loss

C:\Program Files\Anaconda\envs\ssds\lib\site-packages\tensorflow\python\client\session.py in run(self, fetches, feed_dict, options, run_metadata)
    893     try:
    894       result = self._run(None, fetches, feed_dict, options_ptr,
--> 895                          run_metadata_ptr)
    896       if run_metadata:
    897         proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

C:\Program Files\Anaconda\envs\ssds\lib\site-packages\tensorflow\python\client\session.py in _run(self, handle, fetches, feed_dict, options, run_metadata)
   1122     if final_fetches or final_targets or (handle and feed_dict_tensor):
   1123       results = self._do_run(handle, final_targets, final_fetches,
-> 1124                              feed_dict_tensor, options, run_metadata)
   1125     else:
   1126       results = []

C:\Program Files\Anaconda\envs\ssds\lib\site-packages\tensorflow\python\client\session.py in _do_run(self, handle, target_list, fetch_list, feed_dict, options, run_metadata)
   1319     if handle is None:
   1320       return self._do_call(_run_fn, self._session, feeds, fetches, targets,
-> 1321                            options, run_metadata)
   1322     else:
   1323       return self._do_call(_prun_fn, self._session, handle, feeds, fetches)

C:\Program Files\Anaconda\envs\ssds\lib\site-packages\tensorflow\python\client\session.py in _do_call(self, fn, *args)
   1325   def _do_call(self, fn, *args):
   1326     try:
-> 1327       return fn(*args)
   1328     except errors.OpError as e:
   1329       message = compat.as_text(e.message)

C:\Program Files\Anaconda\envs\ssds\lib\site-packages\tensorflow\python\client\session.py in _run_fn(session, feed_dict, fetch_list, target_list, options, run_metadata)
   1304           return tf_session.TF_Run(session, options,
   1305                                    feed_dict, fetch_list, target_list,
-> 1306                                    status, run_metadata)
   1307 
   1308     def _prun_fn(session, handle, feed_dict, fetch_list):

KeyboardInterrupt: 

Saved checkpoints

Read up on saving and loading checkpoints here: https://www.tensorflow.org/programmers_guide/variables


In [24]:
tf.train.get_checkpoint_state('assets/checkpoints/trump')


Out[24]:
model_checkpoint_path: "assets/checkpoints/trump\\trump_new_i3_l512_1.333.ckpt"
all_model_checkpoint_paths: "assets/checkpoints/trump\\trump_new_i3_l512_1.333.ckpt"

3.4 Testing the model - sampling from the model


In [25]:
from IPython.core.display import display, HTML

In [61]:
def pick_top_n(preds, vocab_size, top_n=5):
    ''' Sample a character index from the top_n most probable characters. '''
    p = np.squeeze(preds)
    # zero out everything except the top_n most probable entries
    # (note: p is a view, so this also zeroes the corresponding entries of preds)
    p[np.argsort(p)[:-top_n]] = 0
    # renormalize and sample an index from the remaining distribution
    p = p / np.sum(p)
    c = np.random.choice(vocab_size, 1, p=p)[0]
    return c
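
For example, with a toy distribution over 5 "characters" and top_n=2, only the two most probable indices can ever be drawn (a quick illustration, independent of the trained model; we pass a copy because pick_top_n zeroes entries of its argument in place):

toy_preds = np.array([[0.05, 0.10, 0.50, 0.30, 0.05]])                # shape (1, vocab_size), like model.prediction
print([pick_top_n(toy_preds.copy(), 5, top_n=2) for _ in range(10)])  # only indices 2 and 3 appear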

In [62]:
def sample_model(checkpoint, n_samples, lstm_size, vocab_size, num_layers=2, prime="The "):
    ''' Restore a trained model from checkpoint, prime it with the string prime, then
        sample n_samples characters one at a time, feeding each sampled character back in.
        Returns the generated text and the list of LSTM states collected along the way. '''
    samples = [c for c in prime]
    model = CharRNN(len(vocab), lstm_size=lstm_size, num_layers=num_layers, sampling=True)
    saver = tf.train.Saver()
    
    states = []
    with tf.Session() as sess:
        saver.restore(sess, checkpoint)
        new_state = sess.run(model.initial_state)
         
        for c in prime:
            x = np.zeros((1, 1))
            x[0,0] = vocab_to_int[c]
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)
      
            states.append(new_state)
    
        c = pick_top_n(preds, len(vocab))
        samples.append(int_to_vocab[c])
        states.append(new_state)

        for i in range(n_samples):
            x[0,0] = c
            feed = {model.inputs: x,
                    model.keep_prob: 1.,
                    model.initial_state: new_state}
            preds, new_state = sess.run([model.prediction, model.final_state], 
                                         feed_dict=feed)

            c = pick_top_n(preds, len(vocab))
            samples.append(int_to_vocab[c])
            states.append(new_state)
        
    return (''.join(samples), states)

Exercise: Load the latest checkpoint from the assets/checkpoints/trump folder and generate text using the sample_model function.


In [63]:
checkpoint = tf.train.latest_checkpoint('assets/checkpoints/trump')
samp, _ = sample_model(checkpoint, 160, lstm_size, len(vocab), prime="Obama")
print(samp)


---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-63-92698bd0797f> in <module>()
      1 checkpoint = tf.train.latest_checkpoint('assets/checkpoints/trump')
----> 2 samp, _ = sample_model(checkpoint, 160, lstm_size, len(vocab), prime="Obama")
      3 print(samp)

<ipython-input-62-de4778557dab> in sample_model(checkpoint, n_samples, lstm_size, vocab_size, num_layers, prime)
      6     states = []
      7     with tf.Session() as sess:
----> 8         saver.restore(sess, checkpoint)
      9         new_state = sess.run(model.initial_state)
     10 

C:\Program Files\Anaconda\envs\ssds\lib\site-packages\tensorflow\python\training\saver.py in restore(self, sess, save_path)
   1555       return
   1556     if save_path is None:
-> 1557       raise ValueError("Can't load save_path when it is None.")
   1558     logging.info("Restoring parameters from %s", save_path)
   1559     sess.run(self.saver_def.restore_op_name,

ValueError: Can't load save_path when it is None.

Exercise: Train the model again starting from the initial state for a few iterations and save a checkpoint, then load it and generate text using sample_model function.


In [ ]:
checkpoint = 'assets/checkpoints/trump/EARLY_EPOCH_CHECKPOINT'
samp, _ = sample_model(checkpoint, 1000, lstm_size, len(vocab), prime="Obama (")
print(samp)

3.5 Visualization of memory cell activations


In [43]:
from IPython.core.display import display, HTML
from utils import save_lstm_vis, make_colored_text

Exercise: Load the checkpoint trump_tb_20_i3880_l512_1.327.ckpt from assets/checkpoints/ssds/ and generate some sample text using the sample_model function. Then use the utility function make_colored_text to color each character by the cell activations in a given layer.


In [44]:
checkpoint = 'assets/checkpoints/ssds/trump_tb_20_i3880_l512_1.327.ckpt'
samp, states = sample_model(checkpoint, 1000, lstm_size, len(vocab), prime="Obama (")
print(samp)


Obama (cont) http://t.co/SAB5500m
The U.S. has a country and will be a begond that they're going to have a great pathing or the press condection. True to my community!
.@BillMeach is an amazing participy won't top crowd. Watch @BarackObama has never been doing a great past release. We have allowed the best plant!
The place of @MittRomney was great off and make the success of our money.
Which is a star star on hell insured. He has not be people are going to think the people. We need sees. Will be the focus and start and and a sense trivele.
I wele that the U.S. would say it on Trump Int'l Hotel &amp; Miss Universe Pageant were fantastic on @FoxNews that @MittRomney has spirad of their strength with the movement for mind.
With the missing of offine and the press conversation is now if what is nuce and the fact that I will so they will stop incredible. Whenered it was great and worse. What will they surprysed it increases?...
In all of my speech in Scotland's today's stall. Will be to be fantasti

Exercise: Use the utility function make_colored_text and the IPython HTML display object to visualize cell activations for the text above. Here are some examples of interesting visualizations.

layer_id = 0

cell_id:

  • position in tweet - 4*
  • short urls - 10, 50*, 130, 160, 163, 164, 183, 218, 230
  • separate fixed and variable part of short url - 80, 152
  • just variable part of short url - 75, 84, 118, 273, 380
  • position in short url - 22*, 112, 206, 386
  • urls and references - 115, 403, 483

layer_id = 1

cell_id:

  • just variable part of short url - 21, 107, 250, 300, 420
  • beginning of a word - 22*, 112
  • urls and references - 51, 273, 438
  • position in short url - 202, 326
  • quotation marks - 252*
  • position in a sentence - 413

In [45]:
# Position in a tweet
HTML(make_colored_text(samp, states, cell_id=4, layer_id=0))


Out[45]:
Obama (cont) http://t.co/SAB5500m
The U.S. has a country and will be a begond that they're going to have a great pathing or the press condection. True to my community!
.@BillMeach is an amazing participy won't top crowd. Watch @BarackObama has never been doing a great past release. We have allowed the best plant!
The place of @MittRomney was great off and make the success of our money.
Which is a star star on hell insured. He has not be people are going to think the people. We need sees. Will be the focus and start and and a sense trivele.
I wele that the U.S. would say it on Trump Int'l Hotel &amp; Miss Universe Pageant were fantastic on @FoxNews that @MittRomney has spirad of their strength with the movement for mind.
With the missing of offine and the press conversation is now if what is nuce and the fact that I will so they will stop incredible. Whenered it was great and worse. What will they surprysed it increases?...
In all of my speech in Scotland's today's stall. Will be to be fantasti

In [49]:
# Beginning of a word
HTML(make_colored_text(samp, states, cell_id=22, layer_id=1))


Out[49]:
Obama (cont) http://t.co/SAB5500m
The U.S. has a country and will be a begond that they're going to have a great pathing or the press condection. True to my community!
.@BillMeach is an amazing participy won't top crowd. Watch @BarackObama has never been doing a great past release. We have allowed the best plant!
The place of @MittRomney was great off and make the success of our money.
Which is a star star on hell insured. He has not be people are going to think the people. We need sees. Will be the focus and start and and a sense trivele.
I wele that the U.S. would say it on Trump Int'l Hotel &amp; Miss Universe Pageant were fantastic on @FoxNews that @MittRomney has spirad of their strength with the movement for mind.
With the missing of offine and the press conversation is now if what is nuce and the fact that I will so they will stop incredible. Whenered it was great and worse. What will they surprysed it increases?...
In all of my speech in Scotland's today's stall. Will be to be fantasti

Use the code below to generate HTML files that contain colorings of the text above from all 512 cells (one file per layer).


In [66]:
save_lstm_vis("assets/html/CA_trump_tb_20_i3880_l512_1.327", samp, states)


Number of layers: 2
Number of memory cells (LSTM size): 512
Saving assets/html/CA_trump_tb_20_i3880_l512_1.327_0.html...
Saving assets/html/CA_trump_tb_20_i3880_l512_1.327_1.html...

Guessing game

In this section you will play a short game of guessing whether the tweet you are shown is real or generated.


In [69]:
with open('assets/data/trump_tweets_ascii.txt') as f:
    tweets_real = f.readlines()

with open('assets/data/trump_tweets_fake.txt') as f:
    tweets_fake = f.readlines()

In [ ]:
score = 0
N = 10
for i in range(N):
    tweet_label = True
    if random.random() <= 0.5:
        tweet_text = random.choice(tweets_real)
    else:
        tweet_text = random.choice(tweets_fake)
        tweet_label = False
    print("\nTweet " + str(i+1) + "/" + str(N) + ": " + tweet_text)
    answer = bool(int(input("true (1) or fake (0): ")))
    if answer^tweet_label:
        print("WRONG!")
    else:
        print("RIGHT!")
        score = score + 1

print("\nYour score: " + str(score) + "/" + str(N))


Tweet 1/10: @marklyvidell  If you're seeing they want to speech and many people and missed them another business and wasted. See anywoed.

true (1) or fake (0): 1
WRONG!

Tweet 2/10: Hillary Clinton should not be given national security briefings in that she is a lose cannon with extraordinarily bad judgement &amp; insticts.