MNIST Image Classification with TensorFlow

This notebook demonstrates how to implement a simple linear image model on MNIST using the tf.keras API. It builds the foundation for this companion notebook, which explores tackling the same problem with other types of models such as DNN and CNN.

Learning Objectives

Know how to read and display image data
Know how to find incorrect predictions to analyze the model
Visually see how computers see images

This notebook uses TF 2.0. Please check your TensorFlow version using the cell below. If it is not 2.0, please run the pip line below and restart the kernel.



In [ ]:

    
!sudo chown -R jupyter:jupyter /home/jupyter/training-data-analyst



In [ ]:

    
import os
import shutil

import matplotlib.pyplot as plt
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from tensorflow.keras.layers import Dense, Flatten, Softmax

print(tf.__version__)



In [ ]:

    
!python3 -m pip freeze | grep 'tensorflow==2\|tensorflow-gpu==2' || \
    python3 -m pip install tensorflow==2

Exploring the data

The MNIST dataset is already included in TensorFlow through the Keras datasets module. Let's load it and get a sense of the data.



In [ ]:

    
mnist = tf.keras.datasets.mnist.load_data()
(x_train, y_train), (x_test, y_test) = mnist



In [ ]:

    
HEIGHT, WIDTH = x_train[0].shape
NCLASSES = tf.size(tf.unique(y_train).y)
print("Image height x width is", HEIGHT, "x", WIDTH)
tf.print("There are", NCLASSES, "classes")

Each image is 28 x 28 pixels and represents a digit from 0 to 9. These images are grayscale, so each pixel is an integer from 0 (white) to 255 (black). Raw numbers can be hard to interpret, so we can plot the values to see the handwritten digit as an image.



In [ ]:

    
IMGNO = 12
# Uncomment to see raw numerical values.
# print(x_test[IMGNO])
plt.imshow(x_test[IMGNO].reshape(HEIGHT, WIDTH));
print("The label for image number", IMGNO, "is", y_test[IMGNO])
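
To make the link between raw numbers and a picture concrete, here is a minimal sketch that renders a pixel grid as text. The 5 x 5 grid, the SHADES string, and the render_ascii helper are all hypothetical and made up for illustration; they are not part of MNIST or this notebook.

```python
# Hypothetical 5 x 5 grayscale "image" (values 0-255); not taken from MNIST.
tiny_image = [
    [0,   0, 255,   0, 0],
    [0, 128, 255,   0, 0],
    [0,   0, 255,   0, 0],
    [0,   0, 255,   0, 0],
    [0, 255, 255, 255, 0],
]

# Characters ordered from lowest to highest pixel intensity.
SHADES = " .:*#"


def render_ascii(image):
    """Map each 0-255 pixel to one of the SHADES characters."""
    rows = []
    for row in image:
        chars = [SHADES[pixel * (len(SHADES) - 1) // 255] for pixel in row]
        rows.append("".join(chars))
    return "\n".join(rows)


print(render_ascii(tiny_image))
```

The same idea, with a color map instead of characters, is what plt.imshow does for the full 28 x 28 image.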

Define the model

Let's start with a very simple linear classifier. This was the first method to be tried on MNIST in 1998, and it scored 88% accuracy. Quite groundbreaking at the time!

We can build our linear classifier using the tf.keras API, so we don't have to define or initialize our weights and biases. This happens automatically for us in the background. We can also add a softmax layer to transform the logits into probabilities. Finally, we can compile the model using categorical cross entropy in order to strongly penalize high probability predictions that were incorrect.

When building more complex models such as DNNs and CNNs our code will be more readable by using the tf.keras API. Let's get one working so we can test it and use it as a benchmark.
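
Before building the model, it may help to see what softmax and categorical cross entropy compute. The sketch below uses plain NumPy rather than tf.keras, and the 3-class logits are hypothetical values chosen for illustration. It is not the model for this exercise, only the arithmetic behind the loss.

```python
import numpy as np


def softmax(logits):
    """Convert a vector of logits into probabilities that sum to 1."""
    shifted = logits - np.max(logits)  # Subtract the max for numerical stability.
    exps = np.exp(shifted)
    return exps / np.sum(exps)


def categorical_cross_entropy(one_hot_label, probabilities):
    """Negative log of the probability assigned to the true class."""
    return -np.sum(one_hot_label * np.log(probabilities))


# Hypothetical logits for a 3-class problem where the true class is 0.
label = np.array([1.0, 0.0, 0.0])
confident_right = softmax(np.array([5.0, 0.0, 0.0]))
confident_wrong = softmax(np.array([0.0, 5.0, 0.0]))

loss_right = categorical_cross_entropy(label, confident_right)
loss_wrong = categorical_cross_entropy(label, confident_wrong)
print("loss when confident and right:", loss_right)
print("loss when confident and wrong:", loss_wrong)
```

The confident but wrong prediction receives a much larger loss than the confident and correct one, which is the "strong penalty" mentioned above.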



In [ ]:

    
def linear_model():
    # TODO: Build a sequential model and compile it.
    return model

Write Input Functions

As usual, we need to specify input functions for training and evaluating. We'll scale each pixel value so it's a decimal value between 0 and 1 as a way of normalizing the data.

TODO 1: Define the scale function below and build the dataset



In [ ]:

    
BUFFER_SIZE = 5000
BATCH_SIZE = 100


def scale(image, label):
    # TODO


def load_dataset(training=True):
    """Loads MNIST dataset into a tf.data.Dataset"""
    (x_train, y_train), (x_test, y_test) = mnist
    x = x_train if training else x_test
    y = y_train if training else y_test
    # TODO: a) one-hot encode labels, apply `scale` function, and create dataset.
    # One-hot encode the classes
    if training:
         # TODO
    return dataset
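
The two preprocessing ideas above, scaling pixels to the range 0 to 1 and one-hot encoding the labels, can be sketched in NumPy as follows. This is only an illustration of the arithmetic under stated assumptions: the tiny 2 x 2 images and the one_hot helper are hypothetical, and this scale takes only an image, whereas the notebook's scale takes an image and a label and is applied through tf.data.

```python
import numpy as np

NCLASSES = 10


def scale(image):
    """Map integer pixel values in [0, 255] to floats in [0, 1]."""
    return image.astype(np.float32) / 255.0


def one_hot(labels, num_classes):
    """Turn integer labels into rows with a single 1 at the label index."""
    return np.eye(num_classes, dtype=np.float32)[labels]


# Hypothetical stand-ins for a batch of two 2 x 2 images and their labels.
images = np.array([[[0, 255], [51, 102]], [[255, 255], [0, 0]]], dtype=np.uint8)
labels = np.array([3, 9])

scaled = scale(images)
encoded = one_hot(labels, NCLASSES)
print(scaled.min(), scaled.max())
print(encoded)
```

Note that one-hot labels have two dimensions (batch, classes), which is what the shape test in the next cell checks for.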



In [ ]:

    
def create_shape_test(training):
    dataset = load_dataset(training=training)
    images, labels = next(iter(dataset))
    expected_image_shape = (BATCH_SIZE, HEIGHT, WIDTH)
    expected_label_ndim = 2
    assert(images.shape == expected_image_shape)
    assert(labels.numpy().ndim == expected_label_ndim)
    test_name = 'training' if training else 'eval'
    print("Test for", test_name, "passed!")


create_shape_test(True)
create_shape_test(False)

Time to train the model! The original MNIST linear classifier had an error rate of 12%. Let's use that to sanity check that our model is learning.



In [ ]:

    
NUM_EPOCHS = 10
STEPS_PER_EPOCH = 100

model = linear_model()
train_data = load_dataset()
validation_data = load_dataset(training=False)

OUTDIR = "mnist_linear/"
checkpoint_callback = ModelCheckpoint(
    OUTDIR, save_weights_only=True, verbose=1)
tensorboard_callback = TensorBoard(log_dir=OUTDIR)

history = model.fit(
    # TODO: specify training/eval data, # epochs, steps per epoch.
    verbose=2,
    callbacks=[checkpoint_callback, tensorboard_callback]
)



In [ ]:

    
BENCHMARK_ERROR = .12
BENCHMARK_ACCURACY = 1 - BENCHMARK_ERROR

accuracy = history.history['accuracy']
val_accuracy = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
    
assert(accuracy[-1] > BENCHMARK_ACCURACY)
assert(val_accuracy[-1] > BENCHMARK_ACCURACY)
print("Test to beat benchmark accuracy passed!")
        
assert(accuracy[0] < accuracy[1])
assert(accuracy[1] < accuracy[-1])
assert(val_accuracy[0] < val_accuracy[1])
assert(val_accuracy[1] < val_accuracy[-1])
print("Test model accuracy is improving passed!")
    
assert(loss[0] > loss[1])
assert(loss[1] > loss[-1])
assert(val_loss[0] > val_loss[1])
assert(val_loss[1] > val_loss[-1])
print("Test loss is decreasing passed!")

Evaluating Predictions

Were you able to get an accuracy of over 90%? Not bad for a linear estimator! Let's make some predictions and see if we can find where the model has trouble. Change the range of values below to find incorrect predictions, and plot the corresponding images. What would you have guessed for these images?

TODO 2: Change the range below to find an incorrect prediction



In [ ]:

    
image_numbers = range(0, 10, 1)  # Change me, please.

def load_prediction_dataset():
    dataset = (x_test[image_numbers], y_test[image_numbers])
    dataset = tf.data.Dataset.from_tensor_slices(dataset)
    dataset = dataset.map(scale).batch(len(image_numbers))
    return dataset

predicted_results = model.predict(load_prediction_dataset())
for index, prediction in enumerate(predicted_results):
    predicted_value = np.argmax(prediction)
    actual_value = y_test[image_numbers[index]]
    if actual_value != predicted_value:
        print("image number: " + str(image_numbers[index]))
        print("the prediction was " + str(predicted_value))
        print("the actual label is " + str(actual_value))
        print("")
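
The loop above checks one prediction at a time. As a rough alternative sketch, the same comparison can be done on whole arrays with NumPy. The probabilities and labels below are hypothetical values for four images and three classes, not output from the model.

```python
import numpy as np

# Hypothetical predicted probabilities for four images over three classes.
predicted_probabilities = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.2, 0.7],
    [0.6, 0.3, 0.1],
])
actual_labels = np.array([0, 2, 2, 1])

# The predicted class is the index of the highest probability in each row.
predicted_labels = np.argmax(predicted_probabilities, axis=1)

# Indices where the prediction disagrees with the label.
wrong_indices = np.flatnonzero(predicted_labels != actual_labels)

for i in wrong_indices:
    print("image", i, "predicted", predicted_labels[i], "actual", actual_labels[i])
```

With real model output, wrong_indices would give candidate values for the image number plotted in the next cell.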



In [ ]:

    
bad_image_number = 8
plt.imshow(x_test[bad_image_number].reshape(HEIGHT, WIDTH));

It's understandable why the poor computer would have some trouble. Some of these images are difficult for even humans to read. In fact, we can see what the computer thinks each digit looks like.

Each of the 10 neurons in the dense layer of our model has 785 parameters feeding into it: 784 weights, one for every pixel in the 28 x 28 image, plus 1 bias term. The image is flattened before it reaches the dense layer, so the weights are stored as a flat vector, but we can reshape them back into the original image dimensions to see what the computer sees.

TODO 3: Reshape the layer weights to be the shape of an input image and plot.



In [ ]:

    
DIGIT = 0  # Change me to be an integer from 0 to 9.
LAYER = 1  # Layer 0 flattens image, so no weights
WEIGHT_TYPE = 0  # 0 for variable weights, 1 for biases

dense_layer_weights = model.layers[LAYER].get_weights()
digit_weights = dense_layer_weights[WEIGHT_TYPE][:, DIGIT]
plt.imshow(digit_weights.reshape((HEIGHT, WIDTH)))
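
To check the parameter arithmetic and the reshape without a trained model, here is a small NumPy sketch. The kernel and bias arrays are random stand-ins whose shapes are assumed to match what a dense layer's get_weights() returns for a flattened 28 x 28 input and 10 outputs; they are not learned weights.

```python
import numpy as np

HEIGHT, WIDTH, NCLASSES = 28, 28, 10

# Random stand-ins for the arrays returned by a dense layer's get_weights():
# a kernel of shape (pixels, classes) and a bias of shape (classes,).
rng = np.random.default_rng(seed=0)
kernel = rng.normal(size=(HEIGHT * WIDTH, NCLASSES))
bias = np.zeros(NCLASSES)

# Each output neuron has one weight per pixel plus one bias term.
params_per_neuron = kernel.shape[0] + 1
total_params = kernel.size + bias.size
print(params_per_neuron, total_params)

# Column d of the kernel holds the weights feeding the neuron for digit d;
# reshaping it recovers the image layout.
DIGIT = 0
digit_weights = kernel[:, DIGIT].reshape(HEIGHT, WIDTH)
print(digit_weights.shape)
```

Because the flatten step and the reshape both use row-major order, entry (row, column) of the reshaped array corresponds to the pixel at the same position in the input image.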

Did you recognize the digit the computer was trying to learn? Pretty trippy, isn't it! Even with a simple "brain", the computer can form an idea of what a digit should be. The human brain, however, uses layers and layers of calculations for image recognition. Ready for the next challenge? The companion notebook supercharges our models with human-like vision.

Bonus Exercise

Want to push your understanding further? Instead of using Keras' built-in layers, try repeating the above exercise with your own custom layers.

Copyright 2019 Google Inc. Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.