TensorFlow Tutorial #17

Estimator API

by Magnus Erik Hvass Pedersen / GitHub / Videos on YouTube

Introduction

High-level API's are extremely important in all software development because they provide simple abstractions for doing very complicated tasks. This makes it easier to write and understand your source-code, and it lowers the risk of errors.

In Tutorial #03 we saw how to use various builder API's for creating Neural Networks in TensorFlow. However, there was a lot of additional code required for training the models and using them on new data. The Estimator is another high-level API that implements most of this, although it can be debated how simple it really is.

Using the Estimator API consists of several steps:

  1. Define functions for inputting data to the Estimator.
  2. Either use an existing Estimator (e.g. a Deep Neural Network), which is also called a pre-made or Canned Estimator. Or create your own Estimator, in which case you also need to define the optimizer, performance metrics, etc.
  3. Train the Estimator using the training-set defined in step 1.
  4. Evaluate the performance of the Estimator on the test-set defined in step 1.
  5. Use the trained Estimator to make predictions on other data.

Imports


In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import tensorflow as tf
import numpy as np


/home/magnus/anaconda3/envs/tf-gpu/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6
  return f(*args, **kwds)

This was developed using Python 3.6 (Anaconda) and TensorFlow version:


In [2]:
tf.__version__


Out[2]:
'1.4.0'

Load Data

The MNIST data-set is about 12 MB and will be downloaded automatically if it is not located in the given path.


In [3]:
from tensorflow.examples.tutorials.mnist import input_data
data = input_data.read_data_sets('data/MNIST/', one_hot=True)


Extracting data/MNIST/train-images-idx3-ubyte.gz
Extracting data/MNIST/train-labels-idx1-ubyte.gz
Extracting data/MNIST/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/t10k-labels-idx1-ubyte.gz

The MNIST data-set has now been loaded and consists of 70,000 images and associated labels (i.e. classifications of the images). The data-set is split into 3 mutually exclusive sub-sets. We will only use the training and test-sets in this tutorial.


In [4]:
print("Size of:")
print("- Training-set:\t\t{}".format(len(data.train.labels)))
print("- Test-set:\t\t{}".format(len(data.test.labels)))
print("- Validation-set:\t{}".format(len(data.validation.labels)))


Size of:
- Training-set:		55000
- Test-set:		10000
- Validation-set:	5000

The class-labels are One-Hot encoded, which means that each label is a vector with 10 elements, all of which are zero except for one element. The index of this one element is the class-number, that is, the digit shown in the associated image. We also need the class-numbers as integers so we calculate that now.


In [5]:
data.train.cls = np.argmax(data.train.labels, axis=1)

In [6]:
data.test.cls = np.argmax(data.test.labels, axis=1)

This is an example of one-hot encoded labels:


In [7]:
data.train.labels[0:10]


Out[7]:
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.],
       [ 0.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,  0.]])

These are the corresponding class-numbers:


In [8]:
data.train.cls[0:10]


Out[8]:
array([7, 3, 4, 6, 1, 8, 1, 0, 9, 8])

Data Dimensions

The data dimensions are used in several places in the source-code below. They are defined once so we can use these variables instead of numbers throughout the source-code below.


In [9]:
# We know that MNIST images are 28 pixels in each dimension.
img_size = 28

# Images are stored in one-dimensional arrays of this length.
img_size_flat = img_size * img_size

# Tuple with height and width of images used to reshape arrays.
img_shape = (img_size, img_size)

# Number of colour channels for the images: 1 channel for gray-scale.
num_channels = 1

# Number of classes, one class for each of 10 digits.
num_classes = 10

Helper-function for plotting images

Function used to plot 9 images in a 3x3 grid, and writing the true and predicted classes below each image.


In [10]:
def plot_images(images, cls_true, cls_pred=None):
    assert len(images) == len(cls_true) == 9
    
    # Create figure with 3x3 sub-plots.
    fig, axes = plt.subplots(3, 3)
    fig.subplots_adjust(hspace=0.3, wspace=0.3)

    for i, ax in enumerate(axes.flat):
        # Plot image.
        ax.imshow(images[i].reshape(img_shape), cmap='binary')

        # Show true and predicted classes.
        if cls_pred is None:
            xlabel = "True: {0}".format(cls_true[i])
        else:
            xlabel = "True: {0}, Pred: {1}".format(cls_true[i], cls_pred[i])

        # Show the classes as the label on the x-axis.
        ax.set_xlabel(xlabel)
        
        # Remove ticks from the plot.
        ax.set_xticks([])
        ax.set_yticks([])
    
    # Ensure the plot is shown correctly with multiple plots
    # in a single Notebook cell.
    plt.show()

Plot a few images to see if data is correct


In [11]:
# Get the first images from the test-set.
images = data.test.images[0:9]

# Get the true classes for those images.
cls_true = data.test.cls[0:9]

# Plot the images and labels using our helper-function above.
plot_images(images=images, cls_true=cls_true)


Input Functions for the Estimator

Rather than providing raw data directly to the Estimator, we must provide functions that return the data. This allows for more flexibility in data-sources and how the data is randomly shuffled and iterated.

Note that we will create an Estimator using the DNNClassifier which assumes the class-numbers are integers so we use data.train.cls instead of data.train.labels which are one-hot encoded arrays.

The function also has parameters for batch_size, queue_capacity and num_threads for finer control of the data reading. In our case we take the data directly from a numpy array in memory, so it is not needed.


In [12]:
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(data.train.images)},
    y=np.array(data.train.cls),
    num_epochs=None,
    shuffle=True)

This actually returns a function:


In [13]:
train_input_fn


Out[13]:
<function tensorflow.python.estimator.inputs.numpy_io.numpy_input_fn.<locals>.input_fn>

Calling this function returns a tuple with TensorFlow ops for returning the input and output data:


In [14]:
train_input_fn()


Out[14]:
({'x': <tf.Tensor 'random_shuffle_queue_DequeueMany:1' shape=(128, 784) dtype=float32>},
 <tf.Tensor 'random_shuffle_queue_DequeueMany:2' shape=(128,) dtype=int64>)

Similarly we need to create a function for reading the data for the test-set. Note that we only want to process these images once so num_epochs=1 and we do not want the images shuffled so shuffle=False.


In [15]:
test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": np.array(data.test.images)},
    y=np.array(data.test.cls),
    num_epochs=1,
    shuffle=False)

An input-function is also needed for predicting the class of new data. As an example we just use a few images from the test-set.


In [16]:
some_images = data.test.images[0:9]

In [17]:
predict_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": some_images},
    num_epochs=1,
    shuffle=False)

The class-numbers are actually not used in the input-function as it is not needed for prediction. However, the true class-number is needed when we plot the images further below.


In [18]:
some_images_cls = data.test.cls[0:9]

Pre-Made / Canned Estimator

When using a pre-made Estimator, we need to specify the input features for the data. In this case we want to input images from our data-set which are numeric arrays of the given shape.


In [19]:
feature_x = tf.feature_column.numeric_column("x", shape=img_shape)

You can have several input features which would then be combined in a list:


In [20]:
feature_columns = [feature_x]

In this example we want to use a 3-layer DNN with 512, 256 and 128 units respectively.


In [21]:
num_hidden_units = [512, 256, 128]

The DNNClassifier then constructs the neural network for us. We can also specify the activation function and various other parameters (see the docs). Here we just specify the number of classes and the directory where the checkpoints will be saved.


In [22]:
model = tf.estimator.DNNClassifier(feature_columns=feature_columns,
                                   hidden_units=num_hidden_units,
                                   activation_fn=tf.nn.relu,
                                   n_classes=num_classes,
                                   model_dir="./checkpoints_tutorial17-1/")


INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': './checkpoints_tutorial17-1/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0028f03198>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Training

We can now train the model for a given number of iterations. This automatically loads and saves checkpoints so we can continue the training later.

Note that the text INFO:tensorflow: is printed on every line and makes it harder to quickly read the actual progress. It should have been printed on a single line instead.


In [23]:
model.train(input_fn=train_input_fn, steps=2000)


INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ./checkpoints_tutorial17-1/model.ckpt.
INFO:tensorflow:loss = 300.688, step = 1
INFO:tensorflow:global_step/sec: 370.039
INFO:tensorflow:loss = 26.462, step = 101 (0.271 sec)
INFO:tensorflow:global_step/sec: 521.366
INFO:tensorflow:loss = 22.0528, step = 201 (0.191 sec)
INFO:tensorflow:global_step/sec: 549.886
INFO:tensorflow:loss = 32.07, step = 301 (0.182 sec)
INFO:tensorflow:global_step/sec: 548.856
INFO:tensorflow:loss = 13.8037, step = 401 (0.182 sec)
INFO:tensorflow:global_step/sec: 516.064
INFO:tensorflow:loss = 23.2653, step = 501 (0.194 sec)
INFO:tensorflow:global_step/sec: 552.268
INFO:tensorflow:loss = 17.7141, step = 601 (0.180 sec)
INFO:tensorflow:global_step/sec: 529.426
INFO:tensorflow:loss = 25.7157, step = 701 (0.189 sec)
INFO:tensorflow:global_step/sec: 513.375
INFO:tensorflow:loss = 5.08285, step = 801 (0.195 sec)
INFO:tensorflow:global_step/sec: 536.319
INFO:tensorflow:loss = 10.3937, step = 901 (0.187 sec)
INFO:tensorflow:global_step/sec: 534.847
INFO:tensorflow:loss = 3.12976, step = 1001 (0.187 sec)
INFO:tensorflow:global_step/sec: 540.827
INFO:tensorflow:loss = 5.54126, step = 1101 (0.185 sec)
INFO:tensorflow:global_step/sec: 483.467
INFO:tensorflow:loss = 10.2708, step = 1201 (0.209 sec)
INFO:tensorflow:global_step/sec: 527.042
INFO:tensorflow:loss = 7.62363, step = 1301 (0.187 sec)
INFO:tensorflow:global_step/sec: 557.67
INFO:tensorflow:loss = 2.30585, step = 1401 (0.180 sec)
INFO:tensorflow:global_step/sec: 547.406
INFO:tensorflow:loss = 7.69151, step = 1501 (0.182 sec)
INFO:tensorflow:global_step/sec: 557.682
INFO:tensorflow:loss = 10.7881, step = 1601 (0.179 sec)
INFO:tensorflow:global_step/sec: 547.859
INFO:tensorflow:loss = 7.09411, step = 1701 (0.184 sec)
INFO:tensorflow:global_step/sec: 544.495
INFO:tensorflow:loss = 2.6387, step = 1801 (0.182 sec)
INFO:tensorflow:global_step/sec: 549.648
INFO:tensorflow:loss = 0.772691, step = 1901 (0.182 sec)
INFO:tensorflow:Saving checkpoints for 2000 into ./checkpoints_tutorial17-1/model.ckpt.
INFO:tensorflow:Loss for final step: 7.35222.
Out[23]:
<tensorflow.python.estimator.canned.dnn.DNNClassifier at 0x7f0028f03a90>

Evaluation

Once the model has been trained, we can evaluate its performance on the test-set.


In [24]:
result = model.evaluate(input_fn=test_input_fn)


INFO:tensorflow:Starting evaluation at 2017-11-17-12:07:56
INFO:tensorflow:Restoring parameters from ./checkpoints_tutorial17-1/model.ckpt-2000
INFO:tensorflow:Finished evaluation at 2017-11-17-12:07:56
INFO:tensorflow:Saving dict for global step 2000: accuracy = 0.9727, average_loss = 0.0934177, global_step = 2000, loss = 11.825

In [25]:
result


Out[25]:
{'accuracy': 0.9727,
 'average_loss': 0.093417682,
 'global_step': 2000,
 'loss': 11.825023}

In [26]:
print("Classification accuracy: {0:.2%}".format(result["accuracy"]))


Classification accuracy: 97.27%

Predictions

The trained model can also be used to make predictions on new data.

Note that the TensorFlow graph is recreated and the checkpoint is reloaded every time we make predictions on new data. If the model is very large then this could add a significant overhead.

It is unclear why the Estimator is designed this way, possibly because it will always use the latest checkpoint and it can also be distributed easily for use on multiple computers.


In [27]:
predictions = model.predict(input_fn=predict_input_fn)

In [28]:
cls = [p['classes'] for p in predictions]


INFO:tensorflow:Restoring parameters from ./checkpoints_tutorial17-1/model.ckpt-2000

In [29]:
cls_pred = np.array(cls, dtype='int').squeeze()
cls_pred


Out[29]:
array([7, 2, 1, 0, 4, 1, 4, 9, 5])

In [30]:
plot_images(images=some_images,
            cls_true=some_images_cls,
            cls_pred=cls_pred)


New Estimator

If you cannot use one of the built-in Estimators, then you can create an arbitrary TensorFlow model yourself. To do this, you first need to create a function which defines the following:

  1. The TensorFlow model, e.g. a Convolutional Neural Network.
  2. The output of the model.
  3. The loss-function used to improve the model during optimization.
  4. The optimization method.
  5. Performance metrics.

The Estimator can be run in three modes: Training, Evaluation, or Prediction. The code is mostly the same, but in Prediction-mode we do not need to setup the loss-function and optimizer.

This is another aspect of the Estimator API that is poorly designed and resembles how we did ANSI C programming using structs in the old days. It would probably have been more elegant to split this into several functions and sub-classed the Estimator-class.


In [31]:
def model_fn(features, labels, mode, params):
    # Args:
    #
    # features: This is the x-arg from the input_fn.
    # labels:   This is the y-arg from the input_fn,
    #           see e.g. train_input_fn for these two.
    # mode:     Either TRAIN, EVAL, or PREDICT
    # params:   User-defined hyper-parameters, e.g. learning-rate.
    
    # Reference to the tensor named "x" in the input-function.
    x = features["x"]

    # The convolutional layers expect 4-rank tensors
    # but x is a 2-rank tensor, so reshape it.
    net = tf.reshape(x, [-1, img_size, img_size, num_channels])    

    # First convolutional layer.
    net = tf.layers.conv2d(inputs=net, name='layer_conv1',
                           filters=16, kernel_size=5,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)

    # Second convolutional layer.
    net = tf.layers.conv2d(inputs=net, name='layer_conv2',
                           filters=36, kernel_size=5,
                           padding='same', activation=tf.nn.relu)
    net = tf.layers.max_pooling2d(inputs=net, pool_size=2, strides=2)    

    # Flatten to a 2-rank tensor.
    net = tf.contrib.layers.flatten(net)
    # Eventually this should be replaced with:
    # net = tf.layers.flatten(net)

    # First fully-connected / dense layer.
    # This uses the ReLU activation function.
    net = tf.layers.dense(inputs=net, name='layer_fc1',
                          units=128, activation=tf.nn.relu)    

    # Second fully-connected / dense layer.
    # This is the last layer so it does not use an activation function.
    net = tf.layers.dense(inputs=net, name='layer_fc2',
                          units=10)

    # Logits output of the neural network.
    logits = net

    # Softmax output of the neural network.
    y_pred = tf.nn.softmax(logits=logits)
    
    # Classification output of the neural network.
    y_pred_cls = tf.argmax(y_pred, axis=1)

    if mode == tf.estimator.ModeKeys.PREDICT:
        # If the estimator is supposed to be in prediction-mode
        # then use the predicted class-number that is output by
        # the neural network. Optimization etc. is not needed.
        spec = tf.estimator.EstimatorSpec(mode=mode,
                                          predictions=y_pred_cls)
    else:
        # Otherwise the estimator is supposed to be in either
        # training or evaluation-mode. Note that the loss-function
        # is also required in Evaluation mode.
        
        # Define the loss-function to be optimized, by first
        # calculating the cross-entropy between the output of
        # the neural network and the true labels for the input data.
        # This gives the cross-entropy for each image in the batch.
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels,
                                                                       logits=logits)

        # Reduce the cross-entropy batch-tensor to a single number
        # which can be used in optimization of the neural network.
        loss = tf.reduce_mean(cross_entropy)

        # Define the optimizer for improving the neural network.
        optimizer = tf.train.AdamOptimizer(learning_rate=params["learning_rate"])

        # Get the TensorFlow op for doing a single optimization step.
        train_op = optimizer.minimize(
            loss=loss, global_step=tf.train.get_global_step())

        # Define the evaluation metrics,
        # in this case the classification accuracy.
        metrics = \
        {
            "accuracy": tf.metrics.accuracy(labels, y_pred_cls)
        }

        # Wrap all of this in an EstimatorSpec.
        spec = tf.estimator.EstimatorSpec(
            mode=mode,
            loss=loss,
            train_op=train_op,
            eval_metric_ops=metrics)
        
    return spec

Create an Instance of the Estimator

We can specify hyper-parameters e.g. for the learning-rate of the optimizer.


In [32]:
params = {"learning_rate": 1e-4}

We can then create an instance of the new Estimator.

Note that we don't provide feature-columns here as it is inferred automatically from the data-functions when model_fn() is called.

It is unclear from the TensorFlow documentation why it is necessary to specify the feature-columns when using DNNClassifier in the example above, when it is not needed here.


In [33]:
model = tf.estimator.Estimator(model_fn=model_fn,
                               params=params,
                               model_dir="./checkpoints_tutorial17-2/")


INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': './checkpoints_tutorial17-2/', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f00290fac88>, '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}

Training

Now that our new Estimator has been created, we can train it.


In [34]:
model.train(input_fn=train_input_fn, steps=2000)


INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Saving checkpoints for 1 into ./checkpoints_tutorial17-2/model.ckpt.
INFO:tensorflow:loss = 2.33444, step = 1
INFO:tensorflow:global_step/sec: 190.454
INFO:tensorflow:loss = 0.810317, step = 101 (0.527 sec)
INFO:tensorflow:global_step/sec: 198.129
INFO:tensorflow:loss = 0.349305, step = 201 (0.504 sec)
INFO:tensorflow:global_step/sec: 184.116
INFO:tensorflow:loss = 0.288062, step = 301 (0.543 sec)
INFO:tensorflow:global_step/sec: 195.138
INFO:tensorflow:loss = 0.0948148, step = 401 (0.512 sec)
INFO:tensorflow:global_step/sec: 199.116
INFO:tensorflow:loss = 0.203272, step = 501 (0.502 sec)
INFO:tensorflow:global_step/sec: 190.777
INFO:tensorflow:loss = 0.22347, step = 601 (0.524 sec)
INFO:tensorflow:global_step/sec: 198.669
INFO:tensorflow:loss = 0.161297, step = 701 (0.505 sec)
INFO:tensorflow:global_step/sec: 192.277
INFO:tensorflow:loss = 0.154663, step = 801 (0.518 sec)
INFO:tensorflow:global_step/sec: 158.865
INFO:tensorflow:loss = 0.136487, step = 901 (0.634 sec)
INFO:tensorflow:global_step/sec: 121.05
INFO:tensorflow:loss = 0.144933, step = 1001 (0.826 sec)
INFO:tensorflow:global_step/sec: 118.257
INFO:tensorflow:loss = 0.103951, step = 1101 (0.848 sec)
INFO:tensorflow:global_step/sec: 118.136
INFO:tensorflow:loss = 0.133236, step = 1201 (0.845 sec)
INFO:tensorflow:global_step/sec: 112.046
INFO:tensorflow:loss = 0.060983, step = 1301 (0.896 sec)
INFO:tensorflow:global_step/sec: 99.9212
INFO:tensorflow:loss = 0.0838628, step = 1401 (0.997 sec)
INFO:tensorflow:global_step/sec: 115.121
INFO:tensorflow:loss = 0.118691, step = 1501 (0.868 sec)
INFO:tensorflow:global_step/sec: 96.8269
INFO:tensorflow:loss = 0.179758, step = 1601 (1.038 sec)
INFO:tensorflow:global_step/sec: 99.8103
INFO:tensorflow:loss = 0.0996531, step = 1701 (0.998 sec)
INFO:tensorflow:global_step/sec: 128.677
INFO:tensorflow:loss = 0.097964, step = 1801 (0.775 sec)
INFO:tensorflow:global_step/sec: 124.224
INFO:tensorflow:loss = 0.086759, step = 1901 (0.806 sec)
INFO:tensorflow:Saving checkpoints for 2000 into ./checkpoints_tutorial17-2/model.ckpt.
INFO:tensorflow:Loss for final step: 0.0712585.
Out[34]:
<tensorflow.python.estimator.estimator.Estimator at 0x7f0026148f98>

Evaluation

Once the model has been trained, we can evaluate its performance on the test-set.


In [35]:
result = model.evaluate(input_fn=test_input_fn)


INFO:tensorflow:Starting evaluation at 2017-11-17-12:08:18
INFO:tensorflow:Restoring parameters from ./checkpoints_tutorial17-2/model.ckpt-2000
INFO:tensorflow:Finished evaluation at 2017-11-17-12:08:18
INFO:tensorflow:Saving dict for global step 2000: accuracy = 0.9761, global_step = 2000, loss = 0.0760049

In [36]:
result


Out[36]:
{'accuracy': 0.97610003, 'global_step': 2000, 'loss': 0.076004863}

In [37]:
print("Classification accuracy: {0:.2%}".format(result["accuracy"]))


Classification accuracy: 97.61%

Predictions

The model can also be used to make predictions on new data.


In [38]:
predictions = model.predict(input_fn=predict_input_fn)

In [39]:
cls_pred = np.array(list(predictions))
cls_pred


INFO:tensorflow:Restoring parameters from ./checkpoints_tutorial17-2/model.ckpt-2000
Out[39]:
array([7, 2, 1, 0, 4, 1, 4, 9, 5])

In [40]:
plot_images(images=some_images,
            cls_true=some_images_cls,
            cls_pred=cls_pred)


Conclusion

This tutorial showed how to use the Estimator API in TensorFlow. It is supposed to make it easier to train and use a model, but it seems to have several design problems:

  • The Estimator API is complicated, inconsistent and confusing.
  • Error-messages are extremely long and often impossible to understand.
  • The TensorFlow graph is recreated and the checkpoint is reloaded EVERY time you want to use a trained model to make a prediction on new data. Some models are very big so this could add a very large overhead. A better way might be to only reload the model if the checkpoint has changed on disk.
  • It is unclear how to gain access to the trained model, e.g. to plot the weights of a neural network.

It seems that the Estimator API could have been much simpler and easier to use. For small projects you may find it too complicated and confusing to be worth the effort. But it is possible that the Estimator API is useful if you have a very large dataset and if you train on many machines.

Exercises

These are a few suggestions for exercises that may help improve your skills with TensorFlow. It is important to get hands-on experience with TensorFlow in order to learn how to use it properly.

You may want to backup this Notebook before making any changes.

  • Run another 10000 training iterations for each model.
  • Print classification accuracy on the test-set before optimization and after 1000, 2000 and 10000 iterations.
  • Change the structure of the neural network inside the Estimator. Do you have to delete the checkpoint-files? Why?
  • Change the batch-size for the input-functions.
  • In many of the previous tutorials we plotted examples of mis-classified images. Do that here as well.
  • Change the Estimator to use one-hot encoded labels instead of integer class-numbers.
  • Change the input-functions to load image-files instead of using numpy-arrays.
  • Can you find a way to plot the weights of the neural network and the output of the individual layers?
  • List 5 things you like and don't like about the Estimator API. Do you have any suggestions for improvements? Maybe you should suggest them to the developers?
  • Explain to a friend how the program works.

License (MIT)

Copyright (c) 2016-2017 by Magnus Erik Hvass Pedersen

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.