Introduction to Neural Networks, TensorFlow, and its Estimators Interface (with an eye towards learning quantifiers)

About this notebook:

This notebook was written by Shane Steinert-Threlkeld for the Neural Network Methods for Quantifiers coordinated project at the ILLC, Universiteit van Amsterdam in January 2018 (http://shane.st/NNQ).

It introduces the basics of working with TensorFlow to train neural networks, with an eye to applications to quantifiers. In particular, the code is a warm-up to understanding this repository: https://github.com/shanest/quantifier-rnn-learning. The main components of that code that this notebook doesn't directly cover are the lstm_model_fn and the data generation process (data_gen.py), though simpler analogues of both are here.

There are three sections:

  1. Basic TF abstractions: sessions, the graph, Variables/Placeholders
  2. Training a feed-forward neural network to classify bit sequences
  3. Re-doing the above using TF estimators

Intended working environment for this notebook:

  • Python 2.7
  • Tensorflow 1.4

To run: (i) install Jupyter; (ii) save this .ipynb file in a directory; (iii) from that directory, run jupyter notebook; (iv) open this file.

License

Copyright 2018 Shane Steinert-Threlkeld

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

1. TensorFlow Mechanics


In [ ]:
import tensorflow as tf
print tf.__version__

Defining and running a computational graph


In [ ]:
c1 = tf.constant(3.0)
c2 = tf.constant(4.0)
print c1

add1 = tf.add(c1, c2)
add2 = c1 + c2 #same as above, though I prefer to use the `tf.` versions of ops, to be most clear
print add1

Note that what's printed is not the value 3.0, but a Tensor, a TF data-type corresponding to a node in the computational graph.

To get its value, we need to run the graph inside a session.

[Note: it's always good to use a with block to wrap a session, so that it closes automatically.]


In [ ]:
with tf.Session() as sess:
    print sess.run(c1)
    print sess.run(add1)
    # you can also pass a list of ops instead of a single op to `run`
    print sess.run([c1, c2, add1])

Tensors also have a shape, telling you how many dimensions the Tensor has and the size of each dimension. I find it to be a good practice to include the shape as a comment above every operation. Because the shape is a property of the Tensor, it can be accessed without running the graph.


In [ ]:
# -- mat: [3, 2]
mat = tf.constant([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])
print mat.shape

# -- vec: [2, 1]
vec = tf.constant([[1.0],
                   [1.0]])

# -- mul: [3, 1]
mul = tf.matmul(mat, vec)

with tf.Session() as sess:
    print sess.run(mul)

Variables and placeholders

A neural network learns to approximate a given function by seeing examples and updating its parameters in order to better fit the data it has seen. While we forestall an actual discussion of training until the next section, we note two other pieces of machinery that are required for it:

  1. Variables: these are Tensors whose values can be changed. So parameters of a model -- and anything else you want to be updated -- will be Variables.
  2. Placeholders: these are Tensors that represent input to the network/computational graph: their value must be provided externally via what TensorFlow calls a feed_dict.

In [ ]:
W = tf.Variable([[1.0, 2.0],
                   [3.0, 4.0],
                   [5.0, 6.0]])
b = tf.Variable([[1.0],
                 [1.0], 
                 [1.0]])

x = tf.placeholder(shape=(2,1), dtype=tf.float32)

linear = tf.matmul(W, x)
result = tf.add(linear, b)

with tf.Session() as sess:
    # variables must be initialized
    sess.run(tf.global_variables_initializer())
    # result depends on a placeholder, so input must be fed in
    print sess.run(result, feed_dict={x: [[1.0], [1.0]]})

Note that the shape of the placeholder x was specified precisely. While this is good practice, it's often convenient to leave one of the dimensions as None, so that batches of different sizes can be fed to the model. (For example, mini-batches during training and one big batch during evaluation. We'll see how this works later.)
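As a rough pure-numpy analogue (not TF itself), the same weight matrix can be applied to batches of any size, which is exactly what a None batch dimension buys us:

```python
import numpy as np

# -- W: [2, 3], a fixed weight matrix shared across batches
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# -- batch1: [1, 2], a "mini-batch" of a single input
batch1 = np.array([[1.0, 1.0]])
# -- batch4: [4, 2], a larger batch through the same weights
batch4 = np.ones((4, 2))

out1 = np.dot(batch1, W)   # shape [1, 3]
out4 = np.dot(batch4, W)   # shape [4, 3]
print(out1.shape, out4.shape)
```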

2. Training a feed-forward neural network to learn 'at least three'

Generating labeled data

First, we will generate labeled data.

The Xs will be all sequences of 0s and 1s of a specified length.

The Ys will be labels -- 0 or 1 -- provided by a user-defined function that takes a sequence as its input. Here we provide one: at_least_three.

The data is shuffled, so that the order is random. Finally, it is split into training and test sets.


In [ ]:
import itertools
import random
import math

def generate_all_seqs(length, shuffle=True):
    # note: don't `import itertools as iter`, which shadows the builtin `iter`
    seqs = list(itertools.product([0, 1], repeat=length))
    if shuffle:
        random.shuffle(seqs)
    return seqs

def at_least_three(seq):
    # we return [0,1] for True and [1,0] for False
    return [0,1] if sum(seq) >= 3 else [1,0]

def get_labeled_data(seqs, func):
    return seqs, [func(seq) for seq in seqs]

# generate all labeled data
SEQ_LEN = 16
NUM_CLASSES = 2
TRAIN_SPLIT = 0.8

X, Y = get_labeled_data(generate_all_seqs(SEQ_LEN), at_least_three)

# split into training and test sets
pivot_index = int(math.ceil(TRAIN_SPLIT*len(X)))

trainX, trainY = X[:pivot_index], Y[:pivot_index]
testX, testY = X[pivot_index:], Y[pivot_index:]
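As a quick sanity check on the labeling scheme, here is a standalone sketch (re-defining the helpers inline) for length-4 sequences: exactly C(4,3) + C(4,4) = 5 of the 16 sequences should get the positive label [0,1]:

```python
import itertools

# all 2^4 = 16 bit sequences of length 4
seqs = list(itertools.product([0, 1], repeat=4))
# same one-hot convention as at_least_three: [0,1] for True, [1,0] for False
labels = [[0, 1] if sum(s) >= 3 else [1, 0] for s in seqs]
num_positive = sum(1 for lab in labels if lab == [0, 1])
print(num_positive, len(seqs))  # 5 of the 16 sequences satisfy 'at least three'
```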

Building a network to classify sequences

We will build the neural network inside a wrapper class, which improves readability, separates code components (graph building, session management/training, et cetera), and makes it easy to test many different models on the same data.

The initializer builds a simple feed-forward neural network with one hidden layer.

Instances of the class have properties for training, predicting, and evaluating, as well as for inputting sequences and labels. These are the corresponding ops in the graph, so they can be passed directly to Session.run() and used in feed_dicts.


In [ ]:
class FFNN(object):
    
    def __init__(self, input_size, output_size, hidden_size=10):
        
        # first, basic network architecture
        
        # -- inputs: [batch_size, input_size]
        inputs = tf.placeholder(shape=[None, input_size], dtype=tf.float32)
        self._inputs = inputs
        # -- labels: [batch_size, output_size]
        labels = tf.placeholder(shape=[None, output_size], dtype=tf.float32)
        self._labels = labels
        
        # we will have one hidden layer
        # in general, this should be parameterized
        
        # -- weights1: [input_size, hidden_size]
        weights1 = tf.Variable(tf.random_uniform(shape=[input_size, hidden_size]))
        # -- biases1: [hidden_size]
        biases1 = tf.Variable(tf.random_uniform(shape=[hidden_size]))
        # -- linear: [batch_size, hidden_size]
        linear = tf.add(tf.matmul(inputs, weights1), biases1)
        # -- hidden: [batch_size, hidden_size]
        hidden = tf.nn.relu(linear)
        
        # -- weights2: [hidden_size, output_size]
        weights2 = tf.Variable(tf.random_uniform(shape=[hidden_size, output_size]))
        # -- biases2: [output_size]
        biases2 = tf.Variable(tf.random_uniform(shape=[output_size]))
        # -- logits: [batch_size, output_size]
        logits = tf.add(tf.matmul(hidden, weights2), biases2)
        
        # second, define loss and training
        # -- cross_entropy: [batch_size]
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
                labels=labels,
                logits=logits)
        # -- loss: []
        loss = tf.reduce_mean(cross_entropy)
        optimizer = tf.train.AdamOptimizer()
        self._train_op = optimizer.minimize(loss)
        
        # finally, some evaluation ops
        
        # -- probabilities: [batch_size, output_size]
        probabilities = tf.nn.softmax(logits)
        self._probabilities = probabilities
        # -- predictions: [batch_size]
        predictions = tf.argmax(probabilities, axis=1)
        # -- targets: [batch_size]
        targets = tf.argmax(labels, axis=1)
        # -- correct_prediction: [batch_size]
        correct_prediction = tf.equal(predictions, targets)
        # -- accuracy: []
        accuracy = tf.reduce_mean(tf.to_float(correct_prediction))
        # more evaluation ops could be added here
        self._eval_dict = {
            'accuracy': accuracy
        }
        
    @property
    def train(self):
        return self._train_op
    
    @property
    def predictions(self):
        return self._probabilities
    
    @property
    def evaluate(self):
        return self._eval_dict
    
    @property
    def inputs(self):
        return self._inputs
    
    @property
    def labels(self):
        return self._labels
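To demystify the loss used above, here is a hand-rolled numpy version of softmax cross-entropy, a sketch of the math rather than TF's fused, numerically tuned implementation:

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability before exponentiating
    shifted = logits - logits.max(axis=1, keepdims=True)
    exps = np.exp(shifted)
    return exps / exps.sum(axis=1, keepdims=True)

def softmax_cross_entropy(labels, logits):
    # -- per-example loss: [batch_size]
    return -np.sum(labels * np.log(softmax(logits)), axis=1)

logits = np.array([[2.0, 0.5],
                   [0.1, 3.0]])
labels = np.array([[1.0, 0.0],   # true class 0
                   [0.0, 1.0]])  # true class 1
per_example = softmax_cross_entropy(labels, logits)
loss = per_example.mean()        # this is what tf.reduce_mean does
print(loss)
```

Confident, correct predictions (like both rows here) yield small per-example losses; the mean over the batch is the scalar the optimizer minimizes.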

Training the network

Here we see the main training loop for our neural network. There are two key parameters to training:

  • number of epochs: how many times to iterate through the whole training set
  • batch size: how large each mini-batch should be. In other words, the network will receive this many labeled examples before computing loss and gradients and updating its parameters.

In general, mini-batches of medium size strike a good balance between speed and variance. If batch size is the size of the training set, then there's no variance in the estimate of the loss and gradients; if the batch size is 1, there's a tremendous amount of variance.
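The variance claim can be illustrated with a small numpy experiment (the numbers here are illustrative, not from the original): treat per-example losses as draws from a fixed population and compare how noisy the batch-mean estimate is at batch size 1 versus 64:

```python
import numpy as np

rng = np.random.RandomState(0)
# a fixed "population" of per-example losses
population = rng.exponential(scale=1.0, size=10000)

def batch_mean_variance(batch_size, num_trials=2000):
    # variance of the mini-batch loss estimate across many sampled batches
    means = [rng.choice(population, size=batch_size).mean()
             for _ in range(num_trials)]
    return np.var(means)

var_small = batch_mean_variance(1)
var_large = batch_mean_variance(64)
print(var_small, var_large)  # batch size 1 is far noisier
```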


In [ ]:
# reset the graph before building a model
tf.reset_default_graph()

with tf.Session() as sess:

    # build our model
    model = FFNN(SEQ_LEN, NUM_CLASSES)
    # initialize the variables
    sess.run(tf.global_variables_initializer())
    
    # MAIN TRAINING LOOP
    NUM_EPOCHS = 2
    BATCH_SIZE = 12
    num_batches = len(trainX) // BATCH_SIZE  # floor division, for Python 2/3 portability
    
    for epoch in xrange(NUM_EPOCHS):
        
        # shuffle the training data at start of each epoch
        train_data = zip(trainX, trainY)
        random.shuffle(train_data)
        trainX = [datum[0] for datum in train_data]
        trainY = [datum[1] for datum in train_data]
        
        for batch_idx in xrange(num_batches):
            # get batch of training data
            batchX = trainX[batch_idx*BATCH_SIZE:(batch_idx+1)*BATCH_SIZE]
            batchY = trainY[batch_idx*BATCH_SIZE:(batch_idx+1)*BATCH_SIZE]
            # train on the batch
            sess.run(model.train, 
                     {model.inputs: batchX,
                      model.labels: batchY})
            
            # evaluate every N training steps (batches)
            if batch_idx % 50 == 0:
                print '\nEpoch {}, batch {}, evaluation'.format(epoch, batch_idx)
                print sess.run(model.evaluate, {model.inputs: testX, model.labels: testY})

3. Re-writing the above using TensorFlow Estimator

The TensorFlow Estimator API -- https://www.tensorflow.org/api_docs/python/tf/estimator -- provides convenience functions that handle a lot of the nitty-gritty around running a training loop, feeding in input data, and things of that sort.

Another benefit of the API: it automatically saves and loads trained models for you, if you use the model_dir argument.

First, we use the library's pre-built DNNClassifier estimator, to show the mechanics of training and evaluating. The basic thing to note is that we have to wrap our training and test datasets in input functions (input_fn), so that TensorFlow knows how to feed them to the estimator.

In the next section, we will convert our FFNN class above into a custom-built estimator, to see in more detail how the API works. This is especially important since there are not yet pre-made estimators for RNNs, so the code at https://github.com/shanest/quantifier-rnn-learning implements a custom estimator.

In that next section, I will also show how to implement evaluation inside of a training loop, instead of waiting until the end of training. This is important for our kind of experiments, which want to measure performance on the test set as training proceeds.
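Conceptually, an input function just serves up batches of (features, labels). Here is a rough pure-Python sketch of what numpy_input_fn does for us, ignoring TF's queue/Dataset machinery (all names here are illustrative):

```python
import random

def make_input_fn(X, Y, batch_size, shuffle=True):
    """Returns a generator function over (features, labels) batches;
    a toy stand-in for tf.estimator.inputs.numpy_input_fn."""
    def input_fn():
        indices = list(range(len(X)))
        if shuffle:
            random.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            yield ({'x': [X[i] for i in batch]},
                   [Y[i] for i in batch])
    return input_fn

# toy usage
X = [[0, 1], [1, 1], [1, 0], [0, 0]]
Y = [1, 2, 1, 0]
fn = make_input_fn(X, Y, batch_size=2, shuffle=False)
for features, labels in fn():
    print(features['x'], labels)
```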


In [ ]:
import numpy as np

tf.reset_default_graph()

feature_columns = [tf.feature_column.numeric_column("x", shape=[SEQ_LEN])]

# The library has a pre-made DNNClassifier class
classifier = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                       hidden_units=[10],
                                       n_classes=NUM_CLASSES,
                                       optimizer=tf.train.AdamOptimizer())

train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(trainX)},
    # DNNClassifier wants integer labels, so take argmax of e.g. [0,1] here
    y=np.argmax(trainY, axis=1),
    batch_size=BATCH_SIZE,
    num_epochs=1,
    shuffle=True)

classifier.train(input_fn=train_input_fn)

test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(testX)},
    # DNNClassifier wants integer labels, so take argmax of e.g. [0,1] here
    y=np.argmax(testY, axis=1),
    # one big batch, instead of mini-batches
    batch_size=len(testX),
    shuffle=False)

classifier.evaluate(input_fn=test_input_fn)

Building a custom estimator via a model_fn

To use the tf.estimator library with your own models, you have to define a model_fn. In this section, we convert the above FFNN.__init__ method into such a function. I will also use tf.layers to simplify the code.

Doing so allows one to reap the benefits of estimator while using novel models and/or models for which TF hasn't implemented pre-built estimators.


In [ ]:
# required arguments; params will contain anything custom you want to pass to the model-building function
def ffnn_model_fn(features, labels, mode, params):
    
    # basic network 
    
    # -- inputs: [batch_size, input_size]
    inputs = tf.to_float(features["x"])
    # -- hidden: [batch_size, hidden_size]
    hidden = tf.layers.dense(inputs, params['hidden_size'],
                            activation=params['hidden_activation'])
    # -- logits: [batch_size, num_classes]
    # note: default for tf.layers.dense is no activation, i.e. linear
    logits = tf.layers.dense(hidden, params['num_classes'])

    # predictions
    # -- probs: [batch_size, num_classes]
    probs = tf.nn.softmax(logits)
    # predictions to be output; can be customized!
    out_preds = {'probs': probs,
                'hidden': hidden}
    
    # NOTE: prediction mode needs to be handled first, since TF
    # automatically passes `None` for the `labels` argument in this
    # mode and most loss functions throw an error in that situation
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode,
                                         predictions=out_preds)
    
    # training
    # -- cross_entropy: [batch_size]
    cross_entropy = tf.losses.softmax_cross_entropy(
        onehot_labels=labels,
        logits=logits)
    # -- loss: []
    loss = tf.reduce_mean(cross_entropy)
    
    optimizer = tf.train.AdamOptimizer()
    # it's important to pass global_step here!
    train_op = optimizer.minimize(loss,
                                 global_step=tf.train.get_global_step())
    
    # evaluation metrics
    
    # -- predictions: [batch_size]
    predictions = tf.argmax(probs, axis=1)
    # -- targets: [batch_size]
    targets = tf.argmax(labels, axis=1)
    # -- accuracy: scalar
    accuracy = tf.metrics.accuracy(targets, predictions)
    
    # evaluation metrics to be output; can be customized!
    eval_metrics = {'accuracy': accuracy}
    
    # return an estimator spec, specifying mode, loss, train op, predictions, and evaluation metrics
    return tf.estimator.EstimatorSpec(mode=mode,
                                     loss=loss,
                                     train_op=train_op,
                                     eval_metric_ops=eval_metrics)

Once we have built our model_fn, we can train in much the same way as before. The only real difference is that we pass our new function as the model_fn argument to tf.estimator.Estimator, instead of initializing one of the predefined estimators.


In [ ]:
tf.reset_default_graph()

# hyperparameters
hparams = {'hidden_size': 10, 'hidden_activation': tf.nn.relu, 'num_classes': 2}

# to build custom estimator, use model_fn and params arguments
estimator = tf.estimator.Estimator(model_fn=ffnn_model_fn, params=hparams)

new_train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(trainX)},
    y=np.array(trainY),
    batch_size=BATCH_SIZE,
    shuffle=True)

estimator.train(input_fn=new_train_input_fn, steps=50)

new_test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(testX)},
    y=np.array(testY),
    batch_size=len(testX),
    shuffle=False)

estimator.evaluate(input_fn=new_test_input_fn)

Prediction: getting output from an Estimator

The predict method generates predictions from given inputs to the model. The intended use is applying a trained model to new data: for example, a trained image classifier can be given a new image to label.

For the purposes of this project, however, we can make use of some flexibility in predict: in your model_fn, you get to specify exactly what the predict method outputs, as a dictionary whose keys are names and values are Tensors (which will be evaluated).

EXERCISE: modify ffnn_model_fn so that the hidden layer is also output by predict. For what might this be useful?


In [ ]:
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(testX[:5])},
    shuffle=False)

predictions = list(estimator.predict(input_fn=predict_input_fn))
print predictions
for idx in range(5):
    print '{}: {}'.format(testX[idx], predictions[idx]['probs'])

Early stopping and continuous evaluation using SessionRunHook

Using tf.estimator.Estimator.train, while convenient in many ways, appears to give us less control over the training loop. When we manually managed training, it was easy to evaluate during training and to do early stopping (i.e. stop training when a certain condition is met, instead of when the entire training cycle is over).

Luckily, we can re-create these abilities using SessionRunHook. While there are still disadvantages (the model has to be saved/loaded every time you want to evaluate), the net benefits of estimator are positive.

With a SessionRunHook, you implement behavior that you want to execute before and after every session.run call made by estimator.train(). Full documentation here: https://www.tensorflow.org/api_docs/python/tf/train/SessionRunHook.
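To make that lifecycle concrete, here is a toy pure-Python sketch (no TF, all names hypothetical) of how a training loop calls a hook's before_run/after_run around each step:

```python
class ToyHook(object):
    def begin(self):
        # called once before the loop starts
        self.steps_seen = []

    def before_run(self, step):
        # in TF, this is where you request tensors (e.g. the global step)
        return step

    def after_run(self, requested, results):
        # in TF, `results` holds the values of the requested tensors
        self.steps_seen.append(requested)

def toy_train_loop(hook, num_steps):
    hook.begin()
    for step in range(num_steps):
        requested = hook.before_run(step)
        # ... sess.run(train_op) would happen here ...
        hook.after_run(requested, results=None)

hook = ToyHook()
toy_train_loop(hook, num_steps=3)
print(hook.steps_seen)  # [0, 1, 2]
```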

To implement evaluation during training, we make a new class, extending SessionRunHook, which takes a given estimator and input function, and evaluates the estimator on that input every N steps. The more complicated Hook in the main repository also implements early stopping and writes the data to a CSV file at the end of training.

EXERCISE: implement an early stopping condition in after_run of the hook below. Example: when the evaluation loss is below a certain threshold. Hint: run_context.request_stop() should be called when your stop condition is met.


In [ ]:
class EvalDuringHook(tf.train.SessionRunHook):
    
    def __init__(self, estimator, input_fn, num_steps=50):
        self._estimator = estimator
        self._input_fn = input_fn
        self._num_steps = num_steps
        
    def begin(self):

        # get the tensor that keeps track of the global step
        self._global_step_tensor = tf.train.get_or_create_global_step()
        if self._global_step_tensor is None:
            raise ValueError("global_step needed for EvalEarlyStop")

    # before session run calls, put here the tensors you want to run
    # these will be given to after_run
    def before_run(self, run_context):

        requests = {'global_step': self._global_step_tensor}
        return tf.train.SessionRunArgs(requests)

    def after_run(self, run_context, run_values):

        global_step = run_values.results['global_step']
        # evaluate and print if it's the right number of steps
        if (global_step-1) % self._num_steps == 0:
            eval_results = self._estimator.evaluate(input_fn=self._input_fn)
            print eval_results
            if eval_results['loss'] < 0.05:
                run_context.request_stop()
            
tf.reset_default_graph()

# now, we train, passing the Hook to the method
# it's useful to tell TF to also save checkpoints every N steps, which we do with a custom RunConfig
num_steps = 50

run_config = tf.estimator.RunConfig(
        save_checkpoints_steps=num_steps,
        save_checkpoints_secs=None)

new_estimator = tf.estimator.Estimator(model_fn=ffnn_model_fn, params=hparams, config=run_config)

new_estimator.train(input_fn=new_train_input_fn,
                hooks=[EvalDuringHook(new_estimator, new_test_input_fn)])