Serving a TF Estimator Resnet Model

Scenario: In TensorFlow 1.3, a higher-level API called Estimators was introduced and has since become a popular API of choice within the TensorFlow community. Suppose that an ML researcher has trained a ResNet model on the ImageNet dataset using TensorFlow's Estimator API, located at https://github.com/tensorflow/models/tree/v1.4.0/official/resnet. (Note that we used v1.4.0. You always want to deploy a model version from a stable tag, as the researcher may continue to modify the model and architecture at the head of master.) Our task is to deploy this model with TensorFlow Serving. You have access to the researcher's Python code as well as a saved state (checkpoint) that points to their favorite trained result.

This notebook teaches how to use the Estimator API to create a servable version of a pre-trained ResNet-50 model trained on ImageNet. The servable model can be served with TensorFlow Serving, which runs very efficiently in C++ and supports multiple platforms (different OSes, as well as hardware with different types of accelerators such as GPUs). The model will need to handle RPC prediction calls coming from a client that sends requests containing a batch of JPEG images.

Preamble

Import the required libraries.


In [0]:
import numpy as np
import os
import tensorflow as tf
import urllib.request

In [0]:
# Define a constant indicating the number of layers in our loaded model. We're loading a 
# resnet-50 model.
RESNET_SIZE = 50  

# Model and serving directories
MODEL_DIR = "resnet_model_checkpoints"
SERVING_DIR = "estimator_servable"
SAMPLE_DIR = "../client"

Download model checkpoint

Download the estimator saved checkpoint file from http://download.tensorflow.org/models/official/resnet50_2017_11_30.tar.gz, and extract to MODEL_DIR.


In [0]:
urllib.request.urlretrieve("http://download.tensorflow.org/models/official/resnet50_2017_11_30.tar.gz", "resnet.tar.gz")

In [0]:
# Extract the checkpoint archive into MODEL_DIR.
from subprocess import call
call(["mkdir", "-p", MODEL_DIR])
call(["tar", "-zxvf", "resnet.tar.gz", "-C", MODEL_DIR])

In [0]:
# Make sure you see model checkpoint files in this directory
os.listdir(MODEL_DIR)

Import the Model Architecture

In order to reconstruct the Resnet neural network used to train the Imagenet model, we need to load the architecture pieces. During the setup step, we checked out https://github.com/tensorflow/models/tree/v1.4.0/official/resnet into the parent directory + "/models". We can now load functions and constants from resnet_model.py into the notebook.


In [0]:
%run ../../models/official/resnet/resnet_model.py

Exercise: We also need to import some constants from imagenet_main.py, but we cannot simply run that file, as it is a main script that will attempt to train ResNet. Open imagenet_main.py and copy a few important constants--namely, the image size, number of channels, and number of classes--into the cell below.


In [0]:
# Constants copied from imagenet_main.py.

_DEFAULT_IMAGE_SIZE = 224
_NUM_CHANNELS = 3
_LABEL_CLASSES = 1001

Build the Servable from the Estimator API

The TensorFlow Estimator API is an abstraction that simplifies the process of training, evaluation, prediction, and serving. Central to the Estimator API is an argument called the model function (model_fn). Essentially, a model function defines which graph nodes are used in training, evaluation, and prediction. Depending on the mode (TRAIN, EVAL, PREDICT), the model function returns an EstimatorSpec object that tells the Estimator to run different graph nodes. The typical behavior of a model function is as follows (a schematic sketch appears after the list):

  • TRAIN mode calls an optimizer that is hooked to a loss function (e.g. cross-entropy). This loss function is hooked to a node that contains the training labels, as well as a node that computes predicted logits for each class (which is in turn hooked to nodes in lower layers of the network, and finally to the input placeholder nodes).
  • EVAL mode does not call the optimizer, but calls the loss function and potentially other evaluation metrics (e.g. accuracy). These evaluation metrics will likely depend on the labels as well as the node computing predicted logits for each class.
    • Additionally, researchers often use monitors and hooks during training and evaluation to check on the progress of the model. These components typically return summaries about different layers of the network, such as model coefficients, which can be visualized using TensorBoard.
  • PREDICT mode does NOT require an optimizer, as there is no training step, and needs no labels or loss functions (which depend on the labels). Instead, predictions simply provide clients with information of interest, such as the most likely label for an image and the probability of the image belonging to a particular class, which depend on graph components such as the logits node.
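
To make the mode-dependent behavior concrete, here is a schematic model function (a sketch only, not the actual ResNet code; my_network is a hypothetical stand-in for any function that maps features to logits):

def example_model_fn(features, labels, mode):
    logits = my_network(features)  # hypothetical network function
    predictions = {'classes': tf.argmax(logits, axis=1)}

    if mode == tf.estimator.ModeKeys.PREDICT:
        # No labels, loss function, or optimizer needed.
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)

    if mode == tf.estimator.ModeKeys.TRAIN:
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # EVAL: report the loss and evaluation metrics, but do not call the optimizer.
    eval_metric_ops = {'accuracy': tf.metrics.accuracy(
        labels=labels, predictions=predictions['classes'])}
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metric_ops)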

Exercise: Below is the training code used in the imagenet_main.py resnet_model_fn(), renamed to serving_model_fn(). Portions of the code are modified and refactored into separate helper functions for debugging purposes. Since model serving is essentially prediction, graph elements associated with TRAIN and EVAL modes are no longer relevant. Remove/shortcut graph elements that are unrelated to prediction in the code cell below (marked with TODOs).


In [0]:
def serving_model_fn(features, labels, mode):
    '''The main model function used by the estimator to define the TensorFlow model server API.

    Args:
      features: The client request, which is a dictionary: {'image': 1D tensor of jpeg strings}
      labels: None or not used since we are predicting only
      mode: TRAIN, EVAL, or PREDICT. Serving only uses PREDICT mode.

    Returns:
      If training or evaluating (should not happen), return a blank EstimatorSpec that does nothing.
      If predicting (always), return an EstimatorSpec that produces a response with top k classes
        and probabilities to send back to the client.
    '''

    # tf.summary.image() from the training code is removed here, as image
    # summaries are only used for monitoring during training.
    # tf.summary.image('images', features, max_outputs=6)

    # Move preprocessing, network, and postprocessing into a helper function.
    # serving_input_to_output() will be defined below.
    predictions = serving_input_to_output(features, mode)

    # Create the PREDICT EstimatorSpec that will send a proper response back to the client.
    if mode == tf.estimator.ModeKeys.PREDICT:
        return create_servable_estimator_spec(predictions, mode)

    # Training and evaluation are not needed for serving: shortcut every graph
    # element below this point by returning a minimal EstimatorSpec.
    return tf.estimator.EstimatorSpec(mode=mode)

Helper Functions for Building a TensorFlow Graph

TensorFlow is essentially a computation graph with variables and states. The graph must be built before it can ingest and process data. Typically, a TensorFlow graph will contain a set of input nodes (called placeholders) from which data can be ingested, and a set of TensorFlow functions that take existing nodes as inputs and produce a dependent node that performs a computation on the input nodes. Each node can be referenced as an "output" node through which processed data can be read.

It is often useful to create helper functions for building TensorFlow graphs, for two reasons:

  1. Modularity: you can reuse functions in different places; for instance, a different image model or ResNet architecture can reuse functions.
  2. Testability: you can attach placeholders at the input of graph building helper functions, and read the output to ensure that your result matches expected behavior.

Helper function: convert JPEG strings to Normalized 3D Tensors

In the API we are designing, the ResNet client sends a request containing an array (tensor) of JPEG images encoded as strings. For simplicity, these JPEGs are all appropriately resized to 224x224x3 by the client and do not need resizing on the server side before entering the ResNet model. However, the ResNet-50 model was trained with pixel values normalized (approximately) between -0.5 and 0.5. We will need to decode each JPEG string to extract the raw 3D tensor, and normalize its values.

Exercise: Create a helper function that builds a TensorFlow graph component that decodes a JPEG image and normalizes pixel values to be between -0.5 and 0.5. (The normalization code is already written for you below.)


In [0]:
def build_jpeg_to_image_graph(encoded_image):
  """Preprocesses the image by subtracting out the mean from all channels.

  Args:
    encoded_image: A jpeg-formatted byte stream represented as a string.

  Returns:
    A 3D tensor of image pixels normalized to be between -0.5 and 0.5, resized to
      height x width x 3. The normalization approximates the preprocess_for_train
      and preprocess_for_eval functions in
      https://github.com/tensorflow/models/blob/v1.4.0/official/resnet/vgg_preprocessing.py.
  """
  image = tf.image.decode_jpeg(encoded_image, channels=3)  # Decode the jpeg string into a 3D uint8 tensor.
  image = tf.to_float(image) / 255.0 - 0.5  # Normalize values to be between -0.5 and 0.5.
  return image

Unit test the helper function

Exercise: We are going to construct an input placeholder node in our TensorFlow graph to read data into TensorFlow, and use the helper function to attach computational elements to the input node, resulting in an output node where data is collected. We will then run the graph by feeding sample input into the placeholder (input data can be Python floats, ints, strings, numpy arrays, etc.) and reading the value at the output node.

A placeholder can store a Tensor of arbitrary dimension, and arbitrary length in any dimension.

An example of a placeholder that holds a 1d tensor of floating values is:

x = tf.placeholder(dtype=tf.float32, shape=[10], name='my_input_node')

An example of a 2d tensor (matrix) of dimensions 10x20 holding string values is:

x = tf.placeholder(dtype=tf.string, shape=[10, 20], name='my_string_matrix')

Note that we assigned a Python variable x to point to the placeholder. Simply calling tf.placeholder() with a name argument would create an element in the TensorFlow graph that can be referenced by the name 'my_input_node', but keeping a Python pointer makes it easier to track the element and pass it into helper functions.

Any dependent node in the graph can serve as an output node. For instance, passing an input node x through y = build_jpeg_to_image_graph(x) would return a node referenced by python variable y which is the result of processing the input through the graph built by the helper function. When we run the test graph with real data below, you will see how to return the output of y.

Remember: TensorFlow helper functions are used to help construct a computational graph! build_jpeg_to_image_graph() does not return a 3D array. It returns a graph node that returns a 3D array after processing a jpeg-encoded string!

Useful References:

TensorFlow shapes, TensorFlow data types


In [0]:
# Defining input test graph nodes: only needs to be run once!
test_jpeg_ph = tf.placeholder(dtype=tf.string, shape=[], name='test_jpeg_placeholder')  # A placeholder for a single string: a scalar (0-D) tensor.
test_decoded_tensor = build_jpeg_to_image_graph(test_jpeg_ph)  # Output node, which returns a 3D tensor after processing.

# Print the graph elements to check shapes. ? indicates that TensorFlow does not
# know the length of those dimensions.
print(test_jpeg_ph)
print(test_decoded_tensor)

Run the Test Graph

Now we come to the data processing portion. To run data through a constructed TensorFlow graph, a session must be created to read input data into the graph and return output data. TensorFlow will only run a portion of the graph that is required to map a set of inputs (a dictionary of graph nodes, usually placeholders, as keys, and the input data as values) to an output graph node. This is invoked by the command:

tf.Session().run(output_node,
                 {input_node_1: input_data_1, input_node_2: input_data_2, ...})

To test the helper function, we assign a jpeg string to the input placeholder, and return a 3D tensor result which is the normalized image.

Exercise: Add more potentially useful assert statements to test the output.


In [0]:
# Validate the result of the function using a sample image SAMPLE_DIR/cat_sample.jpg

with open(os.path.join(SAMPLE_DIR, "cat_sample.jpg"), "rb") as imageFile:
    jpeg_str = imageFile.read()
    with tf.Session() as sess:
        result = sess.run(test_decoded_tensor, feed_dict={test_jpeg_ph: jpeg_str})
        assert result.shape == (224, 224, 3)
        # Check max and min normalized pixel values.
        assert result.max() <= 0.5
        assert result.min() >= -0.5
        print('Hooray! JPEG decoding test passed!')

Remarks

The approach above uses vanilla TensorFlow to perform unit testing. You may notice that the code is more verbose than ideal, since you have to create a session, feed input through a dictionary, etc. We encourage the student to investigate some options below at a later time:

TensorFlow Eager was introduced in TensorFlow 1.5 as a way to execute TensorFlow graphs in a way similar to numpy operations. After testing individual parts of the graph using Eager, you will need to rebuild a graph with the Eager option turned off in order to build a performance optimized TensorFlow graph. Also, keep in mind that you will need another virtual environment with TensorFlow 1.5 in order to run eager execution, which may not be compatible with TensorFlow Serving 1.4 used in this tutorial.
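
For instance, a minimal eager-style version of our JPEG decoding test might look like the sketch below (assuming a separate TensorFlow 1.5 environment, where eager execution lives under tf.contrib.eager; the sample image path is hypothetical):

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()  # Must be called once at startup, before building any graph.

# Ops now execute immediately and return concrete values, numpy-style.
with open('cat_sample.jpg', 'rb') as f:  # hypothetical local sample image
    image = tf.image.decode_jpeg(f.read(), channels=3)
    image = tf.to_float(image) / 255.0 - 0.5
print(image.numpy().shape)  # e.g. (224, 224, 3)
assert image.numpy().max() <= 0.5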

TensorFlow unit testing is a more software engineer oriented approach to run tests. By writing test classes that can be invoked individually when building the project, calling tf.test.main() will run all tests and return a list of ones that succeeded and failed, allowing you to inspect errors. Because we are in a notebook environment, such a test would not succeed due to an already running kernel that tf.test cannot access. The tests must be run from the command line, e.g. python test_my_graph.py.
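
As a sketch, such a test file (hypothetical name test_my_graph.py, assuming build_jpeg_to_image_graph is importable from your project code) might look like:

import tensorflow as tf

class JpegToImageGraphTest(tf.test.TestCase):

    def testShapeAndRange(self):
        jpeg_ph = tf.placeholder(dtype=tf.string, shape=[])
        decoded = build_jpeg_to_image_graph(jpeg_ph)  # Assumes the helper is importable.
        with self.test_session() as sess:
            with open('cat_sample.jpg', 'rb') as f:  # hypothetical sample image path
                result = sess.run(decoded, feed_dict={jpeg_ph: f.read()})
        self.assertEqual(result.shape, (224, 224, 3))
        self.assertLessEqual(result.max(), 0.5)
        self.assertGreaterEqual(result.min(), -0.5)

if __name__ == '__main__':
    tf.test.main()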

We've provided both eager execution and unit test examples in the testing directory, showing how to unit test various components in this notebook. Because these examples contain solutions to the exercises below, please complete all notebook exercises before reading through them.

Now that we know how to run TensorFlow tests, let's create and test more helper functions!

Helper Function: Preprocessing Server Input

Exercise: Messages from our client arrive as a dictionary of the form {'images': array_of_jpeg_strings}. However, the ResNet network expects a 4D tensor, where dimension 0 corresponds to the index of an image, and the other dimensions correspond to the pixels of each image. We will wrap our JPEG decoding helper function in another helper function that converts the client message into an array of 3D tensors, and then packs them into a 4D tensor. Follow the TODOs in the code below to complete the preprocess_input() helper function.

Note: Serving input often differs significantly from training input! For instance, training data often comes in the form of a TF Dataset with information such as labels, text, encoding, bounding boxes, etc. Our server-client architecture is very simple, since we simply want to send it JPEG images and receive classification results.


In [0]:
def preprocess_input(features):
    '''Function to preprocess client request before feeding into the network.
    
    Use tf.map_fn and the build_jpeg_to_image_graph() helper function to convert the
    1D input tensor of jpeg strings into a list of single-precision floating
    point 3D tensors, which are normalized pixel values for the images.
    
    Then stack and reshape this list of tensors into a 4D tensor with
    appropriate dimensions.
    
    Args:
      features: request received from our client,
        a dictionary with a single element containing a tensor of multiple jpeg images
        {'images' : 1D_tensor_of_jpeg_byte_strings}
    
    Returns:
      a 4D tensor of normalized pixel values for the input images.
      
    '''
    images = features['images']  # A 1D tensor of jpeg strings
    processed_images = tf.map_fn(build_jpeg_to_image_graph, images, dtype=tf.float32)  # Decode and normalize each jpeg string
    processed_images = tf.stack(processed_images)  # Stack the 3D tensors into a 4D tensor
    processed_images = tf.reshape(tensor=processed_images,  # Reshaping informs TensorFlow of the final dimensions of the 4D tensor
                                  shape=[-1, _DEFAULT_IMAGE_SIZE, _DEFAULT_IMAGE_SIZE, 3])
    return processed_images

Unit Test the Preprocessing Helper Function

Exercise: Recall that your client is sending a message of the format:

{'images': array_of_strings}

The array_of_strings can be arbitrary length, and requires an entrypoint through a placeholder that can read in an arbitrary length array of strings. Fix the shape parameter to allow for an arbitrary length string array as input.

Hint: You need to define the shape parameter in tf.placeholder. None inside an array indicates that the length can vary along that dimension.


In [0]:
# Build a Test Input Preprocessing Network: only needs to be run once!
test_jpeg_tensor = tf.placeholder(dtype=tf.string, shape=[None], name='test_jpeg_tensor')  # A placeholder for a 1D tensor of strings of arbitrary length.
test_processed_images = preprocess_input({'images': test_jpeg_tensor})  # Output node, which returns a 4D tensor after processing.

# Print the graph elements to check shapes. ? indicates that TensorFlow does not know the length of those dimensions.
print(test_jpeg_tensor)
print(test_processed_images)

In [0]:
# Run test network using a sample image SAMPLE_DIR/cat_sample.jpg

with open(os.path.join(SAMPLE_DIR, "cat_sample.jpg"), "rb") as imageFile:
    jpeg_str = imageFile.read()
    with tf.Session() as sess:
        result = sess.run(test_processed_images, feed_dict={test_jpeg_tensor: np.array([jpeg_str, jpeg_str])})  # Duplicate for a length-2 array
        assert result.shape == (2, 224, 224, 3)  # 4D tensor with first dimension length 2, since we have 2 images
        # Check min and max normalized pixel values.
        assert result.max() <= 0.5  # Normalized
        assert result.min() >= -0.5  # Normalized
        # Verify that the resulting tensors for image 0 and image 1 are identical.
        assert np.array_equal(result[0], result[1])
        print('Hooray! Input unit test succeeded!')

Helper Function: Postprocess Server Output

Exercise: The ResNet50 model returns a Tensor of logits for each of its possible classes. The client, however, expects a response that consists of the top 5 likely classes for each image, and probabilities of each image belonging to those classes. Modify the output helper function to convert an array of logits to a dictionary that stores tensors of the top 5 classes and probabilities.


In [0]:
TOP_K = 5

def postprocess_output(logits, k=TOP_K):
    '''Return top k classes and probabilities from class logits.'''
    probs = tf.nn.softmax(logits)  # Converts logits to probabilities.
    top_k_probs, top_k_classes = tf.nn.top_k(probs, k=k)
    return {'classes': top_k_classes, 'probabilities': top_k_probs}

Unit Test the Output Postprocessing Helper Function

Exercise: Fill in the shape field for the output logits tensor.

Hint: how many image classes are there?


In [0]:
# Build Test Output Postprocessing Network: only needs to be run once!
test_logits_ph = tf.placeholder(dtype=tf.float32, shape=[_LABEL_CLASSES], name='test_logits_placeholder')
test_prediction_output = postprocess_output(test_logits_ph)

# Print the graph elements to check shapes.
print(test_logits_ph)
print(test_prediction_output)

In [0]:
# Run test network
with tf.Session() as sess:
    logits = np.ones(_LABEL_CLASSES)  # Uniform logits: every class is equally likely.
    result = sess.run(test_prediction_output, {test_logits_ph: logits})
    classes = result['classes']
    probs = result['probabilities']
    # With uniform logits, all top k probabilities should be equal.
    assert (probs[1:] == probs[:-1]).all()
    expected_probs = np.array(len(probs) * [1.0 / _LABEL_CLASSES])
    assert np.allclose(probs, expected_probs)
    print('Hooray! Output unit test succeeded!')

End-to-End Helper Function

We will now integrate the input helper function, output helper function, and network together into a serving_input_to_output() function, which is called by the main model function (serving_model_fn()) above. This function defines an end-to-end graph that takes an input jpeg tensor, converts it to a 4D floating point tensor, runs it through the ResNet-50 network, and postprocesses the output to return a dictionary of the top k predicted classes and probabilities.

Normally, we would want to create an integration test for this end-to-end function. However, to avoid replicating the entire ResNet50 network and causing potential memory issues in a notebook environment, we instead provide an example of integration testing in the Estimator Unit Test python file.

Exercise: Fill in the TODOs below. To set up the logits node in the computation graph, refer to how the logits are constructed in the training code in imagenet_main.py.

Note: You may want to change the data_format argument below depending on whether you are deploying serving on a CPU-only or GPU Kubernetes cluster. For convolutional neural nets, it has been shown that placing your color channels ('channels_first') before your pixel dimensions in the image tensor significantly improves GPU performance over 'channels_last'. However, in the next notebook, where you will validate the servable model that you produced in this step, 'channels_last' is required due to limitations in the tf.contrib.predictor package. If you want to validate your servable, we suggest you first create a servable with data format 'channels_last' for validation, then recreate a servable with 'channels_first', which should also work without issues.
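
For intuition, the two data formats differ only in the position of the channel axis: a batch can be converted from 'channels_last' (NHWC) to 'channels_first' (NCHW) with a single transpose. The snippet below is purely illustrative and is not part of the servable:

images_nhwc = tf.placeholder(dtype=tf.float32,
                             shape=[None, _DEFAULT_IMAGE_SIZE, _DEFAULT_IMAGE_SIZE, _NUM_CHANNELS])
images_nchw = tf.transpose(images_nhwc, perm=[0, 3, 1, 2])  # shape: [batch, 3, 224, 224]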


In [0]:
def serving_input_to_output(jpeg_tensor, mode, k=TOP_K):
    # Preprocess the jpeg tensor before sending the images to the network.
    preprocessed_images = preprocess_input(jpeg_tensor)

    # Build the network ('channels_first' or 'channels_last'; see the note above).
    network = imagenet_resnet_v2(RESNET_SIZE, _LABEL_CLASSES, data_format='channels_last')

    # Connect the preprocessed images to the network.
    logits = network(
        inputs=preprocessed_images, is_training=(mode == tf.estimator.ModeKeys.TRAIN))

    # Postprocess the network output (logits) and send the top k predictions back to the client.
    predictions = postprocess_output(logits, k=k)
    return predictions

Servable Model API Definition

The last step in serving_model_fn() is to return an EstimatorSpec containing instructions for the Estimator to export a servable model. EstimatorSpec contains a field export_outputs, which defines the dictionary of fields that the servable model will return to a client upon receiving a request. To export the predictions dictionary above using TensorFlow Serving, you will need to assign the export_outputs parameter in the EstimatorSpec.

Exercise: Add a dictionary whose string key is the request.model_spec.signature_name that your client will call in client/resnet_client.py, and whose value is tf.estimator.export.PredictOutput(outputs=predictions).


In [0]:
def create_servable_estimator_spec(predictions, mode):
  return tf.estimator.EstimatorSpec(
      mode=mode,
      predictions=predictions,  # Note: This is not used in serving, but must be provided for the Estimator API.
      export_outputs={
          'predict': tf.estimator.export.PredictOutput(outputs=predictions)  # The key 'predict' is the signature_name the client must request.
      },
  )

Build the Estimator

Create an estimator with the serving_model_fn defined above. The estimator will load saved checkpoint data (model parameters from training) from the model_dir directory: namely, MODEL_DIR where we downloaded and extracted checkpoint files.


In [0]:
estimator = tf.estimator.Estimator(
  model_fn=serving_model_fn,
  model_dir=MODEL_DIR,
)

Serving input receiver function

Finally, exporting the model requires a serving_input_receiver_fn that explicitly tells the server what message format to expect from the client.

Exercise: Replace the input to build_raw_serving_input_receiver_fn below with the expected format of data received from the client, i.e. {'images': tf.placeholder(...)}.

Hint: See the preprocess server input helper function.


In [0]:
def serving_input_receiver_fn():
  return tf.estimator.export.build_raw_serving_input_receiver_fn(
      {'images': tf.placeholder(dtype=tf.string, shape=[None])}
  )()

Export the servable model to disk

Assuming all of your unit tests have succeeded, and your serving_model_fn() is implemented correctly, this step should successfully export a saved model to disk in the SERVING_DIR specified above. If not, look through the logs to find the point of failure in one of your above functions.


In [0]:
# Export the model to save the servable to disk. If this works, we're done!
# Note: most of your setup errors will show up after running this step.
estimator.export_savedmodel(export_dir_base=SERVING_DIR,
                            serving_input_receiver_fn=serving_input_receiver_fn)
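
As a quick sanity check, you can list the contents of SERVING_DIR: export_savedmodel writes each servable into a timestamped version subdirectory containing a saved_model.pb graph definition and a variables/ folder with the trained weights.


In [0]:
# Each entry under SERVING_DIR is a timestamped servable version.
for version in os.listdir(SERVING_DIR):
    print(version, os.listdir(os.path.join(SERVING_DIR, version)))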
