In [0]:
#@title Copyright 2020 Google LLC. Double-click here for license information.
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Yann LeCun and Corinna Cortes hold the copyright of MNIST dataset,
# which is a derivative work from original NIST datasets.
# MNIST dataset is made available under the terms of the
# Creative Commons Attribution-Share Alike 3.0 license.
This MNIST dataset contains a lot of examples:
The MNIST training set contains 60,000 examples.
The MNIST test set contains 10,000 examples.
Each example contains a pixel map showing how a person wrote a digit. For example, the following image shows how a person wrote the digit 1 and how that digit might be represented in a 14x14 pixel map (after the input data is normalized).
Each example in the MNIST dataset consists of:
A label specified by a rater. Each label is an integer from 0 to 9. For example, a rater would almost certainly assign the label 1 to the preceding example.
A feature set: the 28x28 pixel map of the handwritten digit.
This is a multi-class classification problem with 10 output classes, one for each digit.
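As a minimal sketch of that structure (the array values below are invented for illustration and are not taken from the real dataset), each example pairs an integer label with a 28x28 array of pixel intensities:

import numpy as np

# Hypothetical stand-in for one MNIST example; the values are made up.
label = 1                                       # digit assigned by a human rater
pixel_map = np.zeros((28, 28), dtype=np.uint8)  # pixel intensities from 0 to 255
pixel_map[4:24, 13:15] = 255                    # crude vertical stroke resembling a "1"

print(label)            # 1
print(pixel_map.shape)  # (28, 28)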
In [0]:
#@title Run on TensorFlow 2.x
%tensorflow_version 2.x
from __future__ import absolute_import, division, print_function, unicode_literals
In [0]:
#@title Import relevant modules
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras import layers
from matplotlib import pyplot as plt
# The following lines adjust the granularity of reporting.
pd.options.display.max_rows = 10
pd.options.display.float_format = "{:.1f}".format
# The following line improves formatting when outputting NumPy arrays.
np.set_printoptions(linewidth = 200)
tf.keras provides a set of convenience functions for loading well-known datasets. Each of these convenience functions loads both the training set and the test set, and separates each set into features and labels. The relevant convenience function for MNIST is called mnist.load_data():
In [0]:
(x_train, y_train),(x_test, y_test) = tf.keras.datasets.mnist.load_data()
Notice that mnist.load_data() returned four separate values:
x_train contains the training set's features.
y_train contains the training set's labels.
x_test contains the test set's features.
y_test contains the test set's labels.
Note: The MNIST .csv training set is already shuffled.
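As a quick sketch (assuming the preceding cell has run), you can confirm the shapes of the four returned arrays:

print(x_train.shape)  # (60000, 28, 28): 60,000 training examples, each a 28x28 pixel map
print(y_train.shape)  # (60000,): one integer label per training example
print(x_test.shape)   # (10000, 28, 28)
print(y_test.shape)   # (10000,)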
The .csv file for the California Housing Dataset contains column names (for example, latitude, longitude, population). By contrast, the .csv file for MNIST does not contain column names. Instead of column names, you use ordinal numbers to access different subsets of the MNIST dataset. In fact, it is probably best to think of x_train and x_test as three-dimensional NumPy arrays:
In [0]:
# Output example #2917 of the training set.
x_train[2917]
Alternatively, you can call matplotlib.pyplot.imshow
to interpret the preceding numeric array as an image.
In [0]:
# Use false colors to visualize the array.
plt.imshow(x_train[2917])
In [0]:
# Output row #10 of example #2917.
x_train[2917][10]
In [0]:
# Output pixel #16 of row #10 of example #2917.
x_train[2917][10][16]
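The labels live in a separate one-dimensional array. As a quick sketch (assuming the arrays loaded above), you can look up the label that a rater assigned to the same example:

# Output the label for example #2917.
print(y_train[2917])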
In [0]:
# Task 1: Normalize feature values.
# Map each pixel value from its current range (an integer from 0 to 255)
# to a floating-point value between 0.0 and 1.0.
x_train_normalized = ?
x_test_normalized = ?
print(x_train_normalized[2900][10]) # Output a normalized row
In [0]:
#@title Double-click to see a solution to Task 1.
x_train_normalized = x_train / 255.0
x_test_normalized = x_test / 255.0
print(x_train_normalized[2900][12]) # Output a normalized row
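As a quick sanity check (a sketch, assuming the solution cell above has run), confirm that the normalized values now fall between 0.0 and 1.0:

print(x_train_normalized.min(), x_train_normalized.max())  # expect 0.0 and 1.0
print(x_train_normalized.dtype)                            # float64 after dividing by 255.0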
In [0]:
#@title Define the plotting function
def plot_curve(epochs, hist, list_of_metrics):
  """Plot a curve of one or more classification metrics vs. epoch."""
  # list_of_metrics should be one of the names shown in:
  # https://www.tensorflow.org/tutorials/structured_data/imbalanced_data#define_the_model_and_metrics

  plt.figure()
  plt.xlabel("Epoch")
  plt.ylabel("Value")

  for m in list_of_metrics:
    x = hist[m]
    plt.plot(epochs[1:], x[1:], label=m)

  plt.legend()

print("Loaded the plot_curve function.")
The create_model function defines the topography of the deep neural net, specifying the number of layers and the number of nodes in each layer. The create_model function also defines the activation function of each layer. The activation function of the output layer is softmax, which yields 10 different outputs for each example. Each of the 10 outputs provides the probability that the input example is a certain digit.
Note: Unlike several of the recent Colabs, this exercise does not define feature columns or a feature layer. Instead, the model will train on the NumPy array.
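As a minimal sketch of how a softmax output is read (the probabilities below are invented for illustration), the predicted digit is simply the index of the largest of the 10 probabilities:

# Hypothetical softmax output for one example; np is the NumPy module imported above.
probabilities = np.array([0.01, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.02, 0.01, 0.01])
print(probabilities.sum())       # softmax outputs sum to 1.0
print(np.argmax(probabilities))  # 1: the most likely digit for this example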
In [0]:
def create_model(my_learning_rate):
  """Create and compile a deep neural net."""
  # All models in this course are sequential.
  model = tf.keras.models.Sequential()

  # The features are stored in a two-dimensional 28x28 array.
  # Flatten that two-dimensional array into a one-dimensional
  # 784-element array.
  model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))

  # Define the first hidden layer.
  model.add(tf.keras.layers.Dense(units=32, activation='relu'))

  # Define a dropout regularization layer.
  model.add(tf.keras.layers.Dropout(rate=0.2))

  # Define the output layer. The units parameter is set to 10 because
  # the model must choose among 10 possible output values (representing
  # the digits from 0 to 9, inclusive).
  #
  # Don't change this layer.
  model.add(tf.keras.layers.Dense(units=10, activation='softmax'))

  # Construct the layers into a model that TensorFlow can execute.
  # Notice that the loss function for multi-class classification
  # is different from the loss function for binary classification.
  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=my_learning_rate),
                loss="sparse_categorical_crossentropy",
                metrics=['accuracy'])

  return model


def train_model(model, train_features, train_label, epochs,
                batch_size=None, validation_split=0.1):
  """Train the model by feeding it data."""

  history = model.fit(x=train_features, y=train_label, batch_size=batch_size,
                      epochs=epochs, shuffle=True,
                      validation_split=validation_split)

  # To track the progression of training, gather a snapshot
  # of the model's metrics at each epoch.
  epochs = history.epoch
  hist = pd.DataFrame(history.history)

  return epochs, hist
Run the following code cell to invoke the preceding functions and actually train the model on the training set.
Note: Due to several factors (for example, more examples and a more complex neural network), training on MNIST might take longer than training on the California Housing Dataset. Be patient.
In [0]:
# The following variables are the hyperparameters.
learning_rate = 0.003
epochs = 50
batch_size = 4000
validation_split = 0.2
# Establish the model's topography.
my_model = create_model(learning_rate)
# Train the model on the normalized training set.
epochs, hist = train_model(my_model, x_train_normalized, y_train,
                           epochs, batch_size, validation_split)
# Plot a graph of the metric vs. epochs.
list_of_metrics_to_plot = ['accuracy']
plot_curve(epochs, hist, list_of_metrics_to_plot)
# Evaluate against the test set.
print("\n Evaluate the new model against the test set:")
my_model.evaluate(x=x_test_normalized, y=y_test, batch_size=batch_size)
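After training, you can also inspect individual predictions. The following is a quick sketch (assuming the training cell above has run) that compares the model's most likely digit with the true label for a few test examples:

# Predict class probabilities for the first five test examples.
predictions = my_model.predict(x_test_normalized[:5])
print(predictions.shape)               # (5, 10): 10 probabilities per example
print(np.argmax(predictions, axis=1))  # predicted digits
print(y_test[:5])                      # true labels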
In [0]:
#@title Double-click to view some possible answers.
# It would take much too long to experiment
# fully with topography and dropout regularization
# rate. In the real world, you would
# also experiment with learning rate, batch size,
# and number of epochs. Since you only have a
# few minutes, searching for trends can be helpful.
# Here is what we discovered:
# * Adding more nodes (at least until 256 nodes)
# to the first hidden layer improved accuracy.
# * Adding a second hidden layer generally
# improved accuracy.
# * When the model contains a lot of nodes,
# the model overfits unless the dropout rate
# is at least 0.5.
# We reached 98% test accuracy with the
# following configuration:
# * One hidden layer of 256 nodes; no second
#   hidden layer.
# * dropout regularization rate of 0.4
# We reached 98.2% test accuracy with the
# following configuration:
# * First hidden layer of 256 nodes;
# second hidden layer of 128 nodes.
# * dropout regularization rate of 0.2
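As a sketch of the last configuration described above (one possible reading: placing a dropout layer after each hidden layer is an assumption, and create_model_two_hidden_layers is a hypothetical name), create_model could be adapted like this:

def create_model_two_hidden_layers(my_learning_rate):
  """Create and compile a deep neural net with two hidden layers."""
  model = tf.keras.models.Sequential()
  model.add(tf.keras.layers.Flatten(input_shape=(28, 28)))

  # First hidden layer: 256 nodes, followed by dropout regularization.
  model.add(tf.keras.layers.Dense(units=256, activation='relu'))
  model.add(tf.keras.layers.Dropout(rate=0.2))

  # Second hidden layer: 128 nodes, followed by dropout regularization.
  model.add(tf.keras.layers.Dense(units=128, activation='relu'))
  model.add(tf.keras.layers.Dropout(rate=0.2))

  # Output layer: unchanged from the original create_model.
  model.add(tf.keras.layers.Dense(units=10, activation='softmax'))

  model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=my_learning_rate),
                loss="sparse_categorical_crossentropy",
                metrics=['accuracy'])
  return model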