Scenario: In TensorFlow 1.3, a higher-level API called Estimators was introduced and has since become a popular choice within the TensorFlow community. Suppose that an ML researcher has trained a ResNet model on the ImageNet dataset using TensorFlow's Estimator API, located at https://github.com/tensorflow/models/tree/v1.4.0/official/resnet. (Note that we use v1.4.0. You always want to deploy a model version from a stable tag, since the researcher can continue to modify the model and architecture at the head of master.) Our task is to deploy this model into TensorFlow Serving. You have access to the researcher's Python code as well as a saved state (checkpoint) that points to their best trained result.
This notebook teaches how to use the Estimator API to create a servable version of a pre-trained Resnet 50 model trained on ImageNet. The servable model can be served using TensorFlow Serving, which runs very efficiently in C++ and supports multiple platforms (different OSes, as well as hardware with different types of accelerators such as GPUs). The model will need to handle RPC prediction calls coming from a client that sends requests containing a batch of jpeg images.
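For context, such a prediction call might look roughly like the sketch below, which uses the TensorFlow Serving gRPC API. This is an illustration only: the host/port, the model name 'resnet', and the sample jpeg path are assumptions, and the signature name 'predict' anticipates the export signature defined later in this notebook; see client/resnet_client.py for the actual client.
# A hypothetical client-side sketch using the TensorFlow Serving gRPC API
# (TF Serving 1.x beta stubs). Host, port, and model name are assumptions.
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

channel = implementations.insecure_channel('localhost', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

request = predict_pb2.PredictRequest()
request.model_spec.name = 'resnet'             # assumed model name
request.model_spec.signature_name = 'predict'  # matches the export signature below

with open('cat_sample.jpg', 'rb') as f:        # assumed sample image path
  jpeg_bytes = f.read()
# Send a batch of one jpeg string as a 1D string tensor.
request.inputs['images'].CopyFrom(
    tf.contrib.util.make_tensor_proto([jpeg_bytes], shape=[1]))
result = stub.Predict(request, 10.0)           # 10-second timeout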
In [0]:
import numpy as np
import os
import tensorflow as tf
import urllib.request
In [0]:
# Define a constant indicating the number of layers in our loaded model. We're loading a
# resnet-50 model.
RESNET_SIZE = 50
# Model and serving directories
MODEL_DIR = "resnet_model_checkpoints"
SERVING_DIR = "estimator_servable"
SAMPLE_DIR = "../client"
Download the estimator saved checkpoint file from http://download.tensorflow.org/models/official/resnet50_2017_11_30.tar.gz, and extract to MODEL_DIR.
In [0]:
urllib.request.urlretrieve("http://download.tensorflow.org/models/official/resnet50_2017_11_30.tar.gz", "resnet.tar.gz")
In [0]:
# Unpack the downloaded archive into MODEL_DIR.
from subprocess import call
call(["mkdir", MODEL_DIR])
call(["tar", "-zxvf", "resnet.tar.gz", "-C", MODEL_DIR])
In [0]:
# Make sure you see model checkpoint files in this directory
os.listdir(MODEL_DIR)
In order to reconstruct the Resnet neural network used to train the Imagenet model, we need to load the architecture pieces. During the setup step, we checked out https://github.com/tensorflow/models/tree/v1.4.0/official/resnet into the parent directory + "/models". We can now load functions and constants from resnet_model.py into the notebook.
In [0]:
%run ../../models/official/resnet/resnet_model.py
Exercise: We also need to import some constants from imagenet_main.py, but we cannot run this file as it is a main class that will attempt to train ResNet. Open imagenet_main.py and copy over a few constants that are important--namely, the image size, channels, and number of classes--into the cell below.
In [0]:
# TODO: Copy constants from imagenet_main.py.
_DEFAULT_IMAGE_SIZE = 224
_NUM_CHANNELS = 3
_LABEL_CLASSES = 1001
The TensorFlow Estimator API is an abstraction that simplifies the process of training, evaluation, prediction, and serving. Central to the Estimator API is an argument called the model function (model_fn). Essentially, a model function defines which graph nodes are used in training, evaluation, and prediction. Depending on the mode (TRAIN, EVAL, PREDICT), the model function returns an EstimatorSpec object that tells the Estimator which graph nodes to run. The typical behavior of a model function is to build the network once, then branch on the mode to return an EstimatorSpec appropriate to that mode.
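As a minimal sketch of this mode-branching pattern (network_fn is a hypothetical stand-in, not the ResNet code used below):
def sketch_model_fn(features, labels, mode):
  # network_fn is a hypothetical stand-in for building the model graph.
  logits = network_fn(features)
  if mode == tf.estimator.ModeKeys.PREDICT:
    # Prediction: return outputs only; no loss or training op.
    return tf.estimator.EstimatorSpec(mode=mode, predictions={'logits': logits})
  loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
  if mode == tf.estimator.ModeKeys.TRAIN:
    # Training: return the loss plus an op that updates the weights.
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)
  # Evaluation: return the loss (and optionally eval metrics).
  return tf.estimator.EstimatorSpec(mode=mode, loss=loss)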
Exercise: Below is the training code used in the imagenet_main.py resnet_model_fn(), renamed to serving_model_fn(). Portions of the code are modified and refactored into separate helper functions for debugging purposes. Since model serving is essentially prediction, graph elements associated with TRAIN and EVAL modes are no longer relevant. Remove/shortcut graph elements that are unrelated to prediction in the code cell below (marked with TODOs).
Useful References:
In [0]:
def serving_model_fn(features, labels, mode):
  '''The main model function used by the estimator to define the TensorFlow model server API.

  Args:
    features: The client request, which is a dictionary: {'images': 1D tensor of jpeg strings}
    labels: None; unused, since we are only predicting.
    mode: TRAIN, EVAL, or PREDICT. Serving only uses PREDICT mode.
  Returns:
    If training or evaluating (should not happen), a blank EstimatorSpec that does nothing.
    If predicting (always), an EstimatorSpec that produces a response with the top k classes
    and probabilities to send back to the client.
  '''
  # TODO: Remove tf.summary.image(). This is used for monitoring during training.
  #tf.summary.image('images', features, max_outputs=6)

  # Move preprocessing, network, and postprocessing into a helper function.
  # serving_input_to_output() will be defined below.
  predictions = serving_input_to_output(features, mode)

  # Create the PREDICT EstimatorSpec that will send a proper response back to the client.
  if mode == tf.estimator.ModeKeys.PREDICT:
    return create_servable_estimator_spec(predictions, mode)

  # TODO: You already returned the EstimatorSpec for predictions.
  # Training and evaluation are not needed.
  # Shortcut every graph element below here by returning a minimal EstimatorSpec.
  return tf.estimator.EstimatorSpec(mode=mode)
TensorFlow is essentially a computation graph with variables and states. The graph must be built before it can ingest and process data. Typically, a TensorFlow graph contains a set of input nodes (called placeholders) through which data can be ingested, and a set of TensorFlow functions that take existing nodes as inputs and produce a dependent node that performs a computation on the input nodes. Each node can be referenced as an "output" node through which processed data can be read.
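Here is a minimal sketch of this build-then-run pattern, using a toy graph (the node name is hypothetical):
# Build phase: a placeholder input node and a dependent computation node.
x = tf.placeholder(dtype=tf.float32, shape=[3], name='sketch_input')
y = x * 2.0  # Dependent node: performs a computation on the input node.

# Run phase: feed data into the placeholder and read from the output node.
with tf.Session() as sess:
  print(sess.run(y, feed_dict={x: [1.0, 2.0, 3.0]}))  # [2. 4. 6.]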
It is often useful to create helper functions for building TensorFlow graphs for two reasons: they make graph components reusable, and they allow each component to be tested in isolation before the full graph is assembled.
In the API we are designing, the ResNet client sends a request containing an array (1D tensor) of JPEG images encoded as strings. For simplicity, these jpegs are all appropriately resized to 224x224x3 by the client, and do not need resizing on the server side to enter the ResNet model. However, the ResNet50 model was trained with pixel values normalized (approximately) between -0.5 and 0.5. We will need to decode each JPEG string to extract the raw 3D tensor, and normalize the values.
Exercise: Create a helper function that builds a TensorFlow graph component to decode a jpeg image and normalize its pixel values to be between -0.5 and 0.5. (The normalization code is already done for you below.)
Useful References:
In [0]:
def build_jpeg_to_image_graph(encoded_image):
  """Builds a graph component that decodes a jpeg and normalizes its pixel values.

  Args:
    encoded_image: A jpeg-formatted byte stream represented as a string.
  Returns:
    A 3d tensor of image pixels normalized to be between -0.5 and 0.5. The image is
    expected to arrive already resized to height x width x 3 by the client.
    The normalization approximates the preprocess_for_train and preprocess_for_eval functions
    in https://github.com/tensorflow/models/blob/v1.4.0/official/resnet/vgg_preprocessing.py.
  """
  image = tf.image.decode_jpeg(encoded_image, channels=3)  # Decode the jpeg into a 3d tensor.
  image = tf.to_float(image) / 255.0 - 0.5  # Normalize values to be between -0.5 and 0.5.
  return image
Exercise: We are going to construct an input placeholder node in our TensorFlow graph to read data into TensorFlow, and use the helper function to attach computational elements to the input node, resulting in an output node where data is collected. We will then run the graph by feeding sample input into the placeholder (input data can be Python floats, ints, strings, numpy ndarrays, etc.) and returning the value at the output node.
A placeholder can store a Tensor of arbitrary dimension, and arbitrary length in any dimension.
An example of a placeholder that holds a 1d tensor of floating values is:
x = tf.placeholder(dtype=tf.float32, shape=[10], name='my_input_node')
An example of a 2d tensor (matrix) of dimensions 10x20 holding string values is:
x = tf.placeholder(dtype=tf.string, shape=[10, 20], name='my_string_matrix')
Note that we assigned the Python variable x to point to the placeholder; simply calling tf.placeholder() with a name would create an element in the TensorFlow graph that can be referenced by the name 'my_input_node'. However, it helps to keep a Python pointer so you can track the element and pass it into helper functions without looking it up by name.
Any dependent node in the graph can serve as an output node. For instance, passing an input node x through y = build_jpeg_to_image_graph(x) returns a node, referenced by the Python variable y, which is the result of processing the input through the graph built by the helper function. When we run the test graph with real data below, you will see how to return the output of y.
Remember: TensorFlow helper functions are used to help construct a computational graph! build_jpeg_to_image_graph() does not return a 3D array. It returns a graph node that produces a 3D array after processing a jpeg-encoded string!
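For example, assuming the 'my_input_node' placeholder from the example above has been created, here is a minimal sketch of looking a graph element up by name instead of through a Python variable (the ':0' suffix refers to the op's first output):
# Look up the placeholder by its graph name rather than the Python variable x.
graph = tf.get_default_graph()
same_node = graph.get_tensor_by_name('my_input_node:0')  # The same tensor x points to.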
Useful References:
In [0]:
# Defining input test graph nodes: only needs to be run once!
test_jpeg_ph = tf.placeholder(dtype=tf.string, shape=[], name='test_jpeg_placeholder') # A placeholder for a single string, which is a dimensionless (0D) tensor.
test_decoded_tensor = build_jpeg_to_image_graph(test_jpeg_ph) # Output node, which returns a 3D tensor after processing.
# Print the graph elements to check shapes. ? indicates that TensorFlow does not know the length
# of those dimensions.
print(test_jpeg_ph)
print(test_decoded_tensor)
Now we come to the data processing portion. To run data through a constructed TensorFlow graph, a session must be created to read input data into the graph and return output data. TensorFlow will run only the portion of the graph required to map a set of inputs (a dictionary with graph nodes, usually placeholders, as keys and input data as values) to an output graph node. This is invoked by the command:
tf.Session().run(output_node,
{input_node_1: input_data_1, input_node_2: input_data_2, ...})
To test the helper function, we assign a jpeg string to the input placeholder, and return a 3D tensor result which is the normalized image.
Exercise: Add more potentially useful assert statements to test the output.
In [0]:
# Validate the result of the function using a sample image SAMPLE_DIR/cat_sample.jpg
with open(os.path.join(SAMPLE_DIR, "cat_sample.jpg"), "rb") as imageFile:
  jpeg_str = imageFile.read()
with tf.Session() as sess:
  result = sess.run(test_decoded_tensor, feed_dict={test_jpeg_ph: jpeg_str})

assert result.shape == (224, 224, 3)
# TODO: Replace with assert statements to check max and min normalized pixel values
assert result.max() <= 0.5
assert result.min() >= -0.5
print('Hooray! JPEG decoding test passed!')
The approach above uses vanilla TensorFlow to perform unit testing. You may notice that the code is more verbose than ideal, since you have to create a session, feed input through a dictionary, etc. We encourage the student to investigate some options below at a later time:
TensorFlow Eager was introduced in TensorFlow 1.5 as a way to execute TensorFlow graphs in a way similar to numpy operations. After testing individual parts of the graph using Eager, you will need to rebuild a graph with the Eager option turned off in order to build a performance optimized TensorFlow graph. Also, keep in mind that you will need another virtual environment with TensorFlow 1.5 in order to run eager execution, which may not be compatible with TensorFlow Serving 1.4 used in this tutorial.
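A rough sketch of what this could look like, assuming a separate TensorFlow 1.5 environment and the cat_sample.jpg image from this tutorial:
# Run in a TensorFlow 1.5 virtualenv, at program startup, before any graph ops.
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

with open('cat_sample.jpg', 'rb') as f:  # assumed path to the sample image
  jpeg_str = f.read()
# With eager enabled, the helper returns concrete values immediately;
# no placeholder, feed_dict, or Session is required.
image = build_jpeg_to_image_graph(tf.constant(jpeg_str))
assert image.shape == (224, 224, 3)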
TensorFlow unit testing is a more software-engineering-oriented approach to running tests. By writing test classes that can be invoked individually when building the project, calling tf.test.main() will run all tests and report which ones succeeded and failed, allowing you to inspect errors. Because we are in a notebook environment, such a test would not succeed here due to the already running kernel, which tf.test cannot access. The tests must be run from the command line, e.g. python test_my_graph.py.
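A minimal sketch of such a test file (the file name, test values, and importability of build_jpeg_to_image_graph are assumptions):
# test_my_graph.py -- run from the command line: python test_my_graph.py
import tensorflow as tf

class JpegGraphTest(tf.test.TestCase):

  def testDecodedImageIsNormalized(self):
    jpeg_ph = tf.placeholder(dtype=tf.string, shape=[])
    decoded = build_jpeg_to_image_graph(jpeg_ph)  # assumes the helper is importable here
    with open('cat_sample.jpg', 'rb') as f:       # assumed sample image path
      jpeg_str = f.read()
    with self.test_session() as sess:
      result = sess.run(decoded, feed_dict={jpeg_ph: jpeg_str})
    self.assertEqual(result.shape, (224, 224, 3))
    self.assertLessEqual(result.max(), 0.5)
    self.assertGreaterEqual(result.min(), -0.5)

if __name__ == '__main__':
  tf.test.main()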
We've provided both eager execution and unit test examples in the testing directory, showing how to unit test various components in this notebook. Because these examples contain the solutions to exercises below, please complete all notebook exercises before reading through them.
Now that we know how to run TensorFlow tests, let's create and test more helper functions!
Exercise: Messages from our client arrive as a dictionary of the form {'images': array_of_jpeg_strings}. However, the ResNet network expects a 4D tensor, where dimension 0 corresponds to the index of an image, and the other dimensions correspond to the pixels of each image. We will wrap our JPEG decoding helper function in another helper function that converts the client message into an array of 3D tensors, and then packs them into a 4D tensor. Follow the TODOs in the code below to complete the preprocess_input() helper function.
Note: Serving input often differs significantly from training input! For instance, training data often comes in the form of a TF Dataset with information such as labels, text, encoding, bounding boxes, etc. Our server-client architecture is very simple, since the client simply sends JPEG images and receives classification results.
Useful References:
In [0]:
def preprocess_input(features):
  '''Function to preprocess client request before feeding into the network.

  Use tf.map_fn and the build_jpeg_to_image_graph() helper function to convert the
  1D input tensor of jpeg strings into a list of single-precision floating
  point 3D tensors, which are normalized pixel values for the images.
  Then stack and reshape this list of tensors into a 4D tensor with
  appropriate dimensions.

  Args:
    features: request received from our client,
      a dictionary with a single element containing a tensor of multiple jpeg images
      {'images' : 1D_tensor_of_jpeg_byte_strings}
  Returns:
    A 4D tensor of normalized pixel values for the input images.
  '''
  images = features['images']  # A 1D tensor of jpeg byte strings.
  processed_images = tf.map_fn(build_jpeg_to_image_graph, images, dtype=tf.float32)  # Decode and normalize each jpeg.
  processed_images = tf.stack(processed_images)  # Stack the 3D image tensors into a 4D tensor.
  processed_images = tf.reshape(tensor=processed_images,  # Reshaping informs TensorFlow of the final dimensions of the 4D tensor.
                                shape=[-1, _DEFAULT_IMAGE_SIZE, _DEFAULT_IMAGE_SIZE, 3])
  return processed_images
Exercise: Recall that your client is sending a message of the format:
{'images': array_of_strings}
The array_of_strings can be arbitrary length, and requires an entrypoint through a placeholder that can read in an arbitrary length array of strings. Fix the shape parameter to allow for an arbitrary length string array as input.
Hint: You need to define the shape parameter in tf.placeholder. None inside an array indicates that the length can vary along that dimension.
In [0]:
# Build a Test Input Preprocessing Network: only needs to be run once!
test_jpeg_tensor = tf.placeholder(dtype=tf.string, shape=[None], name='test_jpeg_tensor')  # A placeholder for a 1D tensor of jpeg strings with arbitrary length.
test_processed_images = preprocess_input({'images': test_jpeg_tensor})  # Output node, which returns a 4D tensor after processing.

# Print the graph elements to check shapes. ? indicates that TensorFlow does not know the length of those dimensions.
print(test_jpeg_tensor)
print(test_processed_images)
In [0]:
# Run test network using a sample image SAMPLE_DIR/cat_sample.jpg
with open(os.path.join(SAMPLE_DIR, "cat_sample.jpg"), "rb") as imageFile:
  jpeg_str = imageFile.read()
with tf.Session() as sess:
  result = sess.run(test_processed_images, feed_dict={test_jpeg_tensor: np.array([jpeg_str, jpeg_str])})  # Duplicate for a length-2 array.

assert result.shape == (2, 224, 224, 3)  # 4D tensor with first dimension length 2, since we have 2 images.
# TODO: add a test for min and max normalized pixel values
assert result.max() <= 0.5  # Normalized
assert result.min() >= -0.5  # Normalized
# TODO: add a test to verify that the resulting tensors for image 0 and image 1 are identical.
assert np.array_equal(result[0], result[1])
print('Hooray! Input unit test succeeded!')
Exercise: The ResNet50 model returns a Tensor of logits for each of its possible classes. The client, however, expects a response that consists of the top 5 likely classes for each image, and probabilities of each image belonging to those classes. Modify the output helper function to convert an array of logits to a dictionary that stores tensors of the top 5 classes and probabilities.
Useful References:
In [0]:
TOP_K = 5

def postprocess_output(logits, k=TOP_K):
  '''Return the top k classes and probabilities computed from the class logits.'''
  probs = tf.nn.softmax(logits)  # Converts logits to probabilities.
  top_k_probs, top_k_classes = tf.nn.top_k(probs, k=k)
  return {'classes': top_k_classes, 'probabilities': top_k_probs}
In [0]:
# Build Test Output Postprocessing Network: only needs to be run once!
test_logits_ph = tf.placeholder(dtype=tf.float32, shape=[_LABEL_CLASSES], name='test_logits_placeholder')
test_prediction_output = postprocess_output(test_logits_ph)
# Print the graph elements to check shapes.
print(test_logits_ph)
print(test_prediction_output)
In [0]:
# Run test network
with tf.Session() as sess:
  logits = np.ones(_LABEL_CLASSES)  # Uniform logits, so every class is equally likely.
  result = sess.run(test_prediction_output, {test_logits_ph: logits})
  classes = result['classes']
  probs = result['probabilities']
  # Simple element-wise check that all top-k probabilities are equal.
  assert (probs[1:] == probs[:-1]).all()
  # Each probability should be (close to) 1 / number of classes.
  expected_probs = np.array(len(probs) * [1.0 / _LABEL_CLASSES])
  assert np.allclose(probs, expected_probs)
  print('Hooray! Output unit test succeeded!')
We will now integrate the input helper function, output helper function, and network together into a serving_input_to_output() function, which is called by the main model function (serving_model_fn()) above. This function defines an end-to-end graph that takes an input jpeg tensor, converts it to a 4D floating point tensor, runs the tensor through the ResNet50 network, and postprocesses the output to return a dictionary of the top k predicted classes and probabilities.
Normally, we would want to create an integration test for this end-to-end function. However, to avoid replicating the entire ResNet50 network and causing potential memory issues in a notebook environment, we instead provide an example of integration testing in the Estimator Unit Test python file.
Exercise: Fill in the TODOs below. To set up the logits node in the computation graph, refer to how the logits are computed in the training code here.
Note: You may want to change the data_format argument below depending on whether you are deploying serving on a CPU-only or GPU Kubernetes cluster. For convolutional neural nets, it has been shown that placing your color channels ('channels_first') before your pixel dimensions in the image tensor significantly improves GPU performance over 'channels_last'. HOWEVER, in the next notebook, where you will validate the servable model that you produce in this step, 'channels_last' is required due to limitations in the tf.contrib.predictor package. If you want to validate your servable, we suggest you start by creating a servable with data format 'channels_last' for validation, then recreate a servable with 'channels_first', as this should also work without issues.
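For reference, a minimal sketch of the difference between the two layouts (illustration only, not part of the exercise):
# 'channels_last' (NHWC) layout: [batch, height, width, channels].
nhwc = tf.placeholder(dtype=tf.float32, shape=[None, 224, 224, 3])
# 'channels_first' (NCHW) layout: [batch, channels, height, width].
# A transpose converts between the two layouts.
nchw = tf.transpose(nhwc, [0, 3, 1, 2])
print(nchw.shape)  # (?, 3, 224, 224)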
In [0]:
def serving_input_to_output(jpeg_tensor, mode, k=TOP_K):
  # TODO: Preprocess jpeg tensors before sending tensors to the network.
  preprocessed_images = preprocess_input(jpeg_tensor)
  # TODO: Feel free to use 'channels_first' or 'channels_last'.
  network = imagenet_resnet_v2(RESNET_SIZE, _LABEL_CLASSES, data_format='channels_last')
  # TODO: Connect the preprocessed images to the network.
  logits = network(
      inputs=preprocessed_images, is_training=(mode == tf.estimator.ModeKeys.TRAIN))
  # TODO: Postprocess outputs of network (logits) and send top k predictions back to client.
  predictions = postprocess_output(logits, k=k)
  return predictions
The last step in serving_model_fn() is to return an EstimatorSpec containing instructions for the Estimator to export a servable model. EstimatorSpec contains a field export_outputs, which defines the dictionary of fields that the servable model will return to a client upon receiving a request. To export the predictions dictionary above using TensorFlow Serving, you will need to assign the export_outputs parameter in EstimatorSpec.
Exercise: Add a dictionary whose string key is the request.model_spec.signature_name that your client will call in client/resnet_client.py, and whose value is tf.estimator.export.PredictOutput(outputs=predictions).
In [0]:
def create_servable_estimator_spec(predictions, mode):
  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=predictions,  # Note: This is not used in serving, but must be provided for the Estimator API.
      export_outputs={
          # TODO: assign an appropriate dictionary to the export_outputs parameter here.
          'predict': tf.estimator.export.PredictOutput(outputs=predictions)
      },
  )
In [0]:
estimator = tf.estimator.Estimator(
    model_fn=serving_model_fn,
    model_dir=MODEL_DIR,
)
Finally, exporting the model requires a serving_input_receiver_fn that explicitly tells the server what message format to expect from the client.
Exercise: Replace the input to build_raw_serving_input_receiver_fn below with the expected format of data received from the client, i.e. {'images': tf.placeholder(...)}.
Hint: See the preprocess server input helper function.
In [0]:
def serving_input_receiver_fn():
  return tf.estimator.export.build_raw_serving_input_receiver_fn(
      {'images': tf.placeholder(dtype=tf.string, shape=[None])}
  )()
Assuming all of your unit tests have succeeded, and your serving_model_fn() is implemented correctly, this step should successfully export a saved model to disk in the SERVING_DIR specified above. If not, look through the logs to find the point of failure in one of your above functions.
In [0]:
# Export the model to save the servable to disk. If this works, we're done!
# Note: most of your setup errors will show up after running this step.
estimator.export_savedmodel(export_dir_base=SERVING_DIR,
                            serving_input_receiver_fn=serving_input_receiver_fn)
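Optionally, as a quick sanity check, you can load the exported SavedModel back into a fresh graph and confirm that the 'predict' signature is present. This is a sketch assuming at least one timestamped export directory now exists under SERVING_DIR:
import glob

# Each export creates a new timestamped subdirectory under SERVING_DIR;
# pick the most recent one.
export_path = sorted(glob.glob(os.path.join(SERVING_DIR, '*')))[-1]
with tf.Graph().as_default():
  with tf.Session() as sess:
    meta_graph = tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], export_path)
    print(list(meta_graph.signature_def.keys()))  # Should include 'predict'.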