As of 8/28/16, the latest stable release of TF is r0.10, which does not contain the latest version of slim. To obtain the latest version of TF-Slim, please install the most recent nightly build of TF as explained here.
To use TF-Slim for image classification (as we do in this notebook), you also have to install the TF-Slim image models library from here. Let's suppose you install this into a directory called TF_MODELS. Then you should change directory to TF_MODELS/slim before running this notebook, so that these files are in your python path.
To check you've got these two steps to work, just execute the cell below. If it complains about unknown modules, restart the notebook after moving to the TF-Slim models directory.
In [ ]:
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import math
import numpy as np
import tensorflow as tf
import time
from datasets import dataset_utils
# Main slim library
slim = tf.contrib.slim
Below we give some code to create a simple multilayer perceptron (MLP) which can be used for regression problems. The model has 2 hidden layers. The output is a single node. When this function is called, it will create various nodes, and silently add them to whichever global TF graph is currently in scope. When a node which corresponds to a layer with adjustable parameters (eg., a fully connected layer) is created, additional parameter variable nodes are silently created, and added to the graph. (We will discuss how to train the parameters later.)
We use variable scope to put all the nodes under a common name, so that the graph has some hierarchical structure. This is useful when we want to visualize the TF graph in tensorboard, or if we want to query related variables. The fully connected layers all use the same L2 weight decay and ReLu activations, as specified by arg_scope. (However, the final layer overrides these defaults, and uses an identity activation function.)
We also illustrate how to add a dropout layer after the first fully connected layer (FC1). Note that at test time, we do not drop out nodes, but instead use the average activations; hence we need to know whether the model is being constructed for training or testing, since the computational graph will be different in the two cases (although the variables, storing the model parameters, will be shared, since they have the same name/scope).
In [ ]:
def regression_model(inputs, is_training=True, scope="deep_regression"):
"""Creates the regression model.
Args:
inputs: A node that yields a `Tensor` of size [batch_size, dimensions].
is_training: Whether or not we're currently training the model.
scope: An optional variable_op scope for the model.
Returns:
predictions: 1-D `Tensor` of shape [batch_size] of responses.
end_points: A dict of end points representing the hidden layers.
"""
with tf.variable_scope(scope, 'deep_regression', [inputs]):
end_points = {}
# Set the default weight _regularizer and acvitation for each fully_connected layer.
with slim.arg_scope([slim.fully_connected],
activation_fn=tf.nn.relu,
weights_regularizer=slim.l2_regularizer(0.01)):
# Creates a fully connected layer from the inputs with 32 hidden units.
net = slim.fully_connected(inputs, 32, scope='fc1')
end_points['fc1'] = net
# Adds a dropout layer to prevent over-fitting.
net = slim.dropout(net, 0.8, is_training=is_training)
# Adds another fully connected layer with 16 hidden units.
net = slim.fully_connected(net, 16, scope='fc2')
end_points['fc2'] = net
# Creates a fully-connected layer with a single hidden unit. Note that the
# layer is made linear by setting activation_fn=None.
predictions = slim.fully_connected(net, 1, activation_fn=None, scope='prediction')
end_points['out'] = predictions
return predictions, end_points
We create a TF graph and call regression_model(), which adds nodes (tensors) to the graph. We then examine their shape, and print the names of all the model variables which have been implicitly created inside of each layer. We see that the names of the variables follow the scopes that we specified.
In [ ]:
with tf.Graph().as_default():
# Dummy placeholders for arbitrary number of 1d inputs and outputs
inputs = tf.placeholder(tf.float32, shape=(None, 1))
outputs = tf.placeholder(tf.float32, shape=(None, 1))
# Build model
predictions, end_points = regression_model(inputs)
# Print name and shape of each tensor.
print "Layers"
for k, v in end_points.iteritems():
print 'name = {}, shape = {}'.format(v.name, v.get_shape())
# Print name and shape of parameter nodes (values not yet initialized)
print "\n"
print "Parameters"
for v in slim.get_model_variables():
print 'name = {}, shape = {}'.format(v.name, v.get_shape())
In [ ]:
def produce_batch(batch_size, noise=0.3):
xs = np.random.random(size=[batch_size, 1]) * 10
ys = np.sin(xs) + 5 + np.random.normal(size=[batch_size, 1], scale=noise)
return [xs.astype(np.float32), ys.astype(np.float32)]
x_train, y_train = produce_batch(200)
x_test, y_test = produce_batch(200)
plt.scatter(x_train, y_train)
The user has to specify the loss function and the optimizer, and slim does the rest. In particular, the slim.learning.train function does the following:
In [ ]:
def convert_data_to_tensors(x, y):
inputs = tf.constant(x)
inputs.set_shape([None, 1])
outputs = tf.constant(y)
outputs.set_shape([None, 1])
return inputs, outputs
In [ ]:
# The following snippet trains the regression model using a sum_of_squares loss.
ckpt_dir = '/tmp/regression_model/'
with tf.Graph().as_default():
tf.logging.set_verbosity(tf.logging.INFO)
inputs, targets = convert_data_to_tensors(x_train, y_train)
# Make the model.
predictions, nodes = regression_model(inputs, is_training=True)
# Add the loss function to the graph.
loss = slim.losses.sum_of_squares(predictions, targets)
# The total loss is the uers's loss plus any regularization losses.
total_loss = slim.losses.get_total_loss()
# Specify the optimizer and create the train op:
optimizer = tf.train.AdamOptimizer(learning_rate=0.005)
train_op = slim.learning.create_train_op(total_loss, optimizer)
# Run the training inside a session.
final_loss = slim.learning.train(
train_op,
logdir=ckpt_dir,
number_of_steps=5000,
save_summaries_secs=5,
log_every_n_steps=500)
print("Finished training. Last batch loss:", final_loss)
print("Checkpoint saved in %s" % ckpt_dir)
In [ ]:
with tf.Graph().as_default():
inputs, targets = convert_data_to_tensors(x_train, y_train)
predictions, end_points = regression_model(inputs, is_training=True)
# Add multiple loss nodes.
sum_of_squares_loss = slim.losses.sum_of_squares(predictions, targets)
absolute_difference_loss = slim.losses.absolute_difference(predictions, targets)
# The following two ways to compute the total loss are equivalent
regularization_loss = tf.add_n(slim.losses.get_regularization_losses())
total_loss1 = sum_of_squares_loss + absolute_difference_loss + regularization_loss
# Regularization Loss is included in the total loss by default.
# This is good for training, but not for testing.
total_loss2 = slim.losses.get_total_loss(add_regularization_losses=True)
init_op = tf.initialize_all_variables()
with tf.Session() as sess:
sess.run(init_op) # Will initialize the parameters with random weights.
total_loss1, total_loss2 = sess.run([total_loss1, total_loss2])
print('Total Loss1: %f' % total_loss1)
print('Total Loss2: %f' % total_loss2)
print('Regularization Losses:')
for loss in slim.losses.get_regularization_losses():
print(loss)
print('Loss Functions:')
for loss in slim.losses.get_losses():
print(loss)
In [ ]:
with tf.Graph().as_default():
inputs, targets = convert_data_to_tensors(x_test, y_test)
# Create the model structure. (Parameters will be loaded below.)
predictions, end_points = regression_model(inputs, is_training=False)
# Make a session which restores the old parameters from a checkpoint.
sv = tf.train.Supervisor(logdir=ckpt_dir)
with sv.managed_session() as sess:
inputs, predictions, targets = sess.run([inputs, predictions, targets])
plt.scatter(inputs, targets, c='r');
plt.scatter(inputs, predictions, c='b');
plt.title('red=true, blue=predicted')
In TF-Slim termiology, losses are optimized, but metrics (which may not be differentiable, e.g., precision and recall) are just measured. As an illustration, the code below computes mean squared error and mean absolute error metrics on the test set.
Each metric declaration creates several local variables (which must be initialized via tf.initialize_local_variables()) and returns both a value_op and an update_op. When evaluated, the value_op returns the current value of the metric. The update_op loads a new batch of data, runs the model, obtains the predictions and accumulates the metric statistics appropriately before returning the current value of the metric. We store these value nodes and update nodes in 2 dictionaries.
After creating the metric nodes, we can pass them to slim.evaluation.evaluation, which repeatedly evaluates these nodes the specified number of times. (This allows us to compute the evaluation in a streaming fashion across minibatches, which is usefulf for large datasets.) Finally, we print the final value of each metric.
In [ ]:
with tf.Graph().as_default():
inputs, targets = convert_data_to_tensors(x_test, y_test)
predictions, end_points = regression_model(inputs, is_training=False)
# Specify metrics to evaluate:
names_to_value_nodes, names_to_update_nodes = slim.metrics.aggregate_metric_map({
'Mean Squared Error': slim.metrics.streaming_mean_squared_error(predictions, targets),
'Mean Absolute Error': slim.metrics.streaming_mean_absolute_error(predictions, targets)
})
# Make a session which restores the old graph parameters, and then run eval.
sv = tf.train.Supervisor(logdir=ckpt_dir)
with sv.managed_session() as sess:
metric_values = slim.evaluation.evaluation(
sess,
num_evals=1, # Single pass over data
eval_op=names_to_update_nodes.values(),
final_op=names_to_value_nodes.values())
names_to_values = dict(zip(names_to_value_nodes.keys(), metric_values))
for key, value in names_to_values.iteritems():
print('%s: %f' % (key, value))
Reading data with TF-Slim has two main components: A Dataset and a DatasetDataProvider. The former is a descriptor of a dataset, while the latter performs the actions necessary for actually reading the data. Lets look at each one in detail:
A TF-Slim Dataset contains descriptive information about a dataset necessary for reading it, such as the list of data files and how to decode them. It also contains metadata including class labels, the size of the train/test splits and descriptions of the tensors that the dataset provides. For example, some datasets contain images with labels. Others augment this data with bounding box annotations, etc. The Dataset object allows us to write generic code using the same API, regardless of the data content and encoding type.
TF-Slim's Dataset works especially well when the data is stored as a (possibly sharded) TFRecords file, where each record contains a tf.train.Example protocol buffer. TF-Slim uses a consistent convention for naming the keys and values inside each Example record.
A DatasetDataProvider is a class which actually reads the data from a dataset. It is highly configurable to read the data in various ways that may make a big impact on the efficiency of your training process. For example, it can be single or multi-threaded. If your data is sharded across many files, it can read each files serially, or from every file simultaneously.
For convenience, we've include scripts to convert several common image datasets into TFRecord format and have provided the Dataset descriptor files necessary for reading them. We demonstrate how easy it is to use these dataset via the Flowers dataset below.
In [ ]:
import tensorflow as tf
from datasets import dataset_utils
url = "http://download.tensorflow.org/data/flowers.tar.gz"
flowers_data_dir = '/tmp/flowers'
if not tf.gfile.Exists(flowers_data_dir):
tf.gfile.MakeDirs(flowers_data_dir)
dataset_utils.download_and_uncompress_tarball(url, flowers_data_dir)
In [ ]:
from datasets import flowers
import tensorflow as tf
slim = tf.contrib.slim
with tf.Graph().as_default():
dataset = flowers.get_split('train', flowers_data_dir)
data_provider = slim.dataset_data_provider.DatasetDataProvider(
dataset, common_queue_capacity=32, common_queue_min=1)
image, label = data_provider.get(['image', 'label'])
with tf.Session() as sess:
with slim.queues.QueueRunners(sess):
for i in xrange(4):
np_image, np_label = sess.run([image, label])
height, width, _ = np_image.shape
class_name = name = dataset.labels_to_names[np_label]
plt.figure()
plt.imshow(np_image)
plt.title('%s, %d x %d' % (name, height, width))
plt.axis('off')
plt.show()
In [ ]:
def my_cnn(images, num_classes, is_training): # is_training is not used...
with slim.arg_scope([slim.max_pool2d], kernel_size=[3, 3], stride=2):
net = slim.conv2d(images, 64, [5, 5])
net = slim.max_pool2d(net)
net = slim.conv2d(net, 64, [5, 5])
net = slim.max_pool2d(net)
net = slim.flatten(net)
net = slim.fully_connected(net, 192)
net = slim.fully_connected(net, num_classes, activation_fn=None)
return net
In [ ]:
import tensorflow as tf
with tf.Graph().as_default():
# The model can handle any input size because the first layer is convolutional.
# The size of the model is determined when image_node is first passed into the my_cnn function.
# Once the variables are initialized, the size of all the weight matrices is fixed.
# Because of the fully connected layers, this means that all subsequent images must have the same
# input size as the first image.
batch_size, height, width, channels = 3, 28, 28, 3
images = tf.random_uniform([batch_size, height, width, channels], maxval=1)
# Create the model.
num_classes = 10
logits = my_cnn(images, num_classes, is_training=True)
probabilities = tf.nn.softmax(logits)
# Initialize all the variables (including parameters) randomly.
init_op = tf.initialize_all_variables()
with tf.Session() as sess:
# Run the init_op, evaluate the model outputs and print the results:
sess.run(init_op)
probabilities = sess.run(probabilities)
print('Probabilities Shape:')
print(probabilities.shape) # batch_size x num_classes
print('\nProbabilities:')
print(probabilities)
print('\nSumming across all classes (Should equal 1):')
print(np.sum(probabilities, 1)) # Each row sums to 1
Before starting, make sure you've run the code to Download the Flowers dataset. Now, we'll get a sense of what it looks like to use TF-Slim's training functions found in
learning.py. First, we'll create a function, load_batch
, that loads batches of dataset from a dataset. Next, we'll train a model for a single step (just to demonstrate the API), and evaluate the results.
In [ ]:
from preprocessing import inception_preprocessing
import tensorflow as tf
slim = tf.contrib.slim
def load_batch(dataset, batch_size=32, height=299, width=299, is_training=False):
"""Loads a single batch of data.
Args:
dataset: The dataset to load.
batch_size: The number of images in the batch.
height: The size of each image after preprocessing.
width: The size of each image after preprocessing.
is_training: Whether or not we're currently training or evaluating.
Returns:
images: A Tensor of size [batch_size, height, width, 3], image samples that have been preprocessed.
images_raw: A Tensor of size [batch_size, height, width, 3], image samples that can be used for visualization.
labels: A Tensor of size [batch_size], whose values range between 0 and dataset.num_classes.
"""
data_provider = slim.dataset_data_provider.DatasetDataProvider(
dataset, common_queue_capacity=32,
common_queue_min=8)
image_raw, label = data_provider.get(['image', 'label'])
# Preprocess image for usage by Inception.
image = inception_preprocessing.preprocess_image(image_raw, height, width, is_training=is_training)
# Preprocess the image for display purposes.
image_raw = tf.expand_dims(image_raw, 0)
image_raw = tf.image.resize_images(image_raw, height, width)
image_raw = tf.squeeze(image_raw)
# Batch it up.
images, images_raw, labels = tf.train.batch(
[image, image_raw, label],
batch_size=batch_size,
num_threads=1,
capacity=2 * batch_size)
return images, images_raw, labels
In [ ]:
from datasets import flowers
# This might take a few minutes.
train_dir = '/tmp/tfslim_model/'
print('Will save model to %s' % train_dir)
with tf.Graph().as_default():
tf.logging.set_verbosity(tf.logging.INFO)
dataset = flowers.get_split('train', flowers_data_dir)
images, _, labels = load_batch(dataset)
# Create the model:
logits = my_cnn(images, num_classes=dataset.num_classes, is_training=True)
# Specify the loss function:
one_hot_labels = slim.one_hot_encoding(labels, dataset.num_classes)
slim.losses.softmax_cross_entropy(logits, one_hot_labels)
total_loss = slim.losses.get_total_loss()
# Create some summaries to visualize the training process:
tf.scalar_summary('losses/Total Loss', total_loss)
# Specify the optimizer and create the train op:
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = slim.learning.create_train_op(total_loss, optimizer)
# Run the training:
final_loss = slim.learning.train(
train_op,
logdir=train_dir,
number_of_steps=1, # For speed, we just do 1 epoch
save_summaries_secs=1)
print('Finished training. Final batch loss %d' % final_loss)
As we discussed above, we can compute various metrics besides the loss. Below we show how to compute prediction accuracy of the trained model, as well as top-5 classification accuracy. (The difference between evaluation and evaluation_loop is that the latter writes the results to a log directory, so they can be viewed in tensorboard.)
In [ ]:
from datasets import flowers
# This might take a few minutes.
with tf.Graph().as_default():
tf.logging.set_verbosity(tf.logging.DEBUG)
dataset = flowers.get_split('train', flowers_data_dir)
images, _, labels = load_batch(dataset)
logits = my_cnn(images, num_classes=dataset.num_classes, is_training=False)
predictions = tf.argmax(logits, 1)
# Define the metrics:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
'eval/Accuracy': slim.metrics.streaming_accuracy(predictions, labels),
'eval/Recall@5': slim.metrics.streaming_recall_at_k(logits, labels, 5),
})
print('Running evaluation Loop...')
checkpoint_path = tf.train.latest_checkpoint(train_dir)
metric_values = slim.evaluation.evaluate_once(
master='',
checkpoint_path=checkpoint_path,
logdir=train_dir,
eval_op=names_to_updates.values(),
final_op=names_to_values.values())
names_to_values = dict(zip(names_to_values.keys(), metric_values))
for name in names_to_values:
print('%s: %f' % (name, names_to_values[name]))
Neural nets work best when they have many parameters, making them very flexible function approximators. However, this means they must be trained on big datasets. Since this process is slow, we provide various pre-trained models - see the list here.
You can either use these models as-is, or you can perform "surgery" on them, to modify them for some other task. For example, it is common to "chop off" the final pre-softmax layer, and replace it with a new set of weights corresponding to some new set of labels. You can then quickly fine tune the new model on a small new dataset. We illustrate this below, using inception-v1 as the base model. While models like Inception V3 are more powerful, Inception V1 is used for speed purposes.
In [ ]:
from datasets import dataset_utils
url = "http://download.tensorflow.org/models/inception_v1_2016_08_28.tar.gz"
checkpoints_dir = '/tmp/checkpoints'
if not tf.gfile.Exists(checkpoints_dir):
tf.gfile.MakeDirs(checkpoints_dir)
dataset_utils.download_and_uncompress_tarball(url, checkpoints_dir)
In [ ]:
import numpy as np
import os
import tensorflow as tf
import urllib2
from datasets import imagenet
from nets import inception
from preprocessing import inception_preprocessing
slim = tf.contrib.slim
batch_size = 3
image_size = inception.inception_v1.default_image_size
with tf.Graph().as_default():
url = 'https://upload.wikimedia.org/wikipedia/commons/7/70/EnglishCockerSpaniel_simon.jpg'
image_string = urllib2.urlopen(url).read()
image = tf.image.decode_jpeg(image_string, channels=3)
processed_image = inception_preprocessing.preprocess_image(image, image_size, image_size, is_training=False)
processed_images = tf.expand_dims(processed_image, 0)
# Create the model, use the default arg scope to configure the batch norm parameters.
with slim.arg_scope(inception.inception_v1_arg_scope()):
logits, _ = inception.inception_v1(processed_images, num_classes=1001, is_training=False)
probabilities = tf.nn.softmax(logits)
init_fn = slim.assign_from_checkpoint_fn(
os.path.join(checkpoints_dir, 'inception_v1.ckpt'),
slim.get_model_variables('InceptionV1'))
with tf.Session() as sess:
init_fn(sess)
np_image, probabilities = sess.run([image, probabilities])
probabilities = probabilities[0, 0:]
sorted_inds = [i[0] for i in sorted(enumerate(-probabilities), key=lambda x:x[1])]
plt.figure()
plt.imshow(np_image.astype(np.uint8))
plt.axis('off')
plt.show()
names = imagenet.create_readable_names_for_imagenet_labels()
for i in range(5):
index = sorted_inds[i]
print('Probability %0.2f%% => [%s]' % (probabilities[index], names[index]))
In [ ]:
# Note that this may take several minutes.
import os
from datasets import flowers
from nets import inception
from preprocessing import inception_preprocessing
slim = tf.contrib.slim
image_size = inception.inception_v1.default_image_size
def get_init_fn():
"""Returns a function run by the chief worker to warm-start the training."""
checkpoint_exclude_scopes=["InceptionV1/Logits", "InceptionV1/AuxLogits"]
exclusions = [scope.strip() for scope in checkpoint_exclude_scopes]
variables_to_restore = []
for var in slim.get_model_variables():
excluded = False
for exclusion in exclusions:
if var.op.name.startswith(exclusion):
excluded = True
break
if not excluded:
variables_to_restore.append(var)
return slim.assign_from_checkpoint_fn(
os.path.join(checkpoints_dir, 'inception_v1.ckpt'),
variables_to_restore)
train_dir = '/tmp/inception_finetuned/'
with tf.Graph().as_default():
tf.logging.set_verbosity(tf.logging.INFO)
dataset = flowers.get_split('train', flowers_data_dir)
images, _, labels = load_batch(dataset, height=image_size, width=image_size)
# Create the model, use the default arg scope to configure the batch norm parameters.
with slim.arg_scope(inception.inception_v1_arg_scope()):
logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)
# Specify the loss function:
one_hot_labels = slim.one_hot_encoding(labels, dataset.num_classes)
slim.losses.softmax_cross_entropy(logits, one_hot_labels)
total_loss = slim.losses.get_total_loss()
# Create some summaries to visualize the training process:
tf.scalar_summary('losses/Total Loss', total_loss)
# Specify the optimizer and create the train op:
optimizer = tf.train.AdamOptimizer(learning_rate=0.01)
train_op = slim.learning.create_train_op(total_loss, optimizer)
# Run the training:
final_loss = slim.learning.train(
train_op,
logdir=train_dir,
init_fn=get_init_fn(),
number_of_steps=2)
print('Finished training. Last batch loss %f' % final_loss)
In [ ]:
import numpy as np
import tensorflow as tf
from datasets import flowers
from nets import inception
slim = tf.contrib.slim
image_size = inception.inception_v1.default_image_size
batch_size = 3
with tf.Graph().as_default():
tf.logging.set_verbosity(tf.logging.INFO)
dataset = flowers.get_split('train', flowers_data_dir)
images, images_raw, labels = load_batch(dataset, height=image_size, width=image_size)
# Create the model, use the default arg scope to configure the batch norm parameters.
with slim.arg_scope(inception.inception_v1_arg_scope()):
logits, _ = inception.inception_v1(images, num_classes=dataset.num_classes, is_training=True)
probabilities = tf.nn.softmax(logits)
checkpoint_path = tf.train.latest_checkpoint(train_dir)
init_fn = slim.assign_from_checkpoint_fn(
checkpoint_path,
slim.get_variables_to_restore())
with tf.Session() as sess:
with slim.queues.QueueRunners(sess):
sess.run(tf.initialize_local_variables())
init_fn(sess)
np_probabilities, np_images_raw, np_labels = sess.run([probabilities, images_raw, labels])
for i in xrange(batch_size):
image = np_images_raw[i, :, :, :]
true_label = np_labels[i]
predicted_label = np.argmax(np_probabilities[i, :])
predicted_name = dataset.labels_to_names[predicted_label]
true_name = dataset.labels_to_names[true_label]
plt.figure()
plt.imshow(image.astype(np.uint8))
plt.title('Ground Truth: [%s], Prediction [%s]' % (true_name, predicted_name))
plt.axis('off')
plt.show()