Deploy Fully Optimized Model to TensorFlow Serving

IMPORTANT: You Must STOP All Kernels and Terminal Sessions

The GPU is still wedged (held) by the previous kernels and terminal sessions. We need to set it free before continuing!

Freeze Fully Optimized Graph


In [ ]:
from tensorflow.python.tools import freeze_graph

optimize_me_parent_path = '/root/models/optimize_me/linear/cpu'

fully_optimized_model_graph_path = '%s/fully_optimized_cpu.pb' % optimize_me_parent_path
fully_optimized_frozen_model_graph_path = '%s/fully_optimized_frozen_cpu.pb' % optimize_me_parent_path

model_checkpoint_path = '%s/model.ckpt' % optimize_me_parent_path

# Fold the checkpointed variable values into the graph as constants,
# keeping only the subgraph needed to compute the "add" output node
freeze_graph.freeze_graph(input_graph=fully_optimized_model_graph_path,
                          input_saver="",
                          input_binary=True,
                          input_checkpoint=model_checkpoint_path,
                          output_node_names="add",
                          restore_op_name="save/restore_all",
                          filename_tensor_name="save/Const:0",
                          output_graph=fully_optimized_frozen_model_graph_path,
                          clear_devices=True,
                          initializer_nodes="")
print(fully_optimized_frozen_model_graph_path)
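
Before moving on, it can be worth verifying that freezing actually folded every variable into a constant. This is a minimal sanity-check sketch (not part of the original notebook) that reuses the path variable defined above:


In [ ]:
import tensorflow as tf

# Parse the frozen GraphDef and look for any remaining variable ops
graph_def = tf.GraphDef()
with tf.gfile.GFile(fully_optimized_frozen_model_graph_path, 'rb') as f:
    graph_def.ParseFromString(f.read())

variable_ops = [node.name for node in graph_def.node
                if node.op in ('Variable', 'VariableV2')]
print('Total nodes: %d' % len(graph_def.node))
print('Remaining variable ops (should be empty): %s' % variable_ops)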

File Size


In [ ]:
%%bash

ls -l /root/models/optimize_me/linear/cpu/

Graph


In [ ]:
%%bash

summarize_graph --in_graph=/root/models/optimize_me/linear/cpu/fully_optimized_frozen_cpu.pb

In [ ]:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import re
from google.protobuf import text_format
from tensorflow.core.framework import graph_pb2

def convert_graph_to_dot(input_graph, output_dot, is_input_graph_binary):
    graph = graph_pb2.GraphDef()
    with open(input_graph, "rb") as fh:
        if is_input_graph_binary:
            graph.ParseFromString(fh.read())
        else:
            # text_format.Merge expects a str, so decode the bytes first
            text_format.Merge(fh.read().decode("utf-8"), graph)
    with open(output_dot, "wt") as fh:
        print("digraph graphname {", file=fh)
        for node in graph.node:
            output_name = node.name
            print("  \"" + output_name + "\" [label=\"" + node.op + "\"];", file=fh)
            for input_full_name in node.input:
                # Strip the output slot (":0") and control-dependency ("^") markers
                parts = input_full_name.split(":")
                input_name = re.sub(r"^\^", "", parts[0])
                print("  \"" + input_name + "\" -> \"" + output_name + "\";", file=fh)
        print("}", file=fh)
    print("Created dot file '%s' for graph '%s'." % (output_dot, input_graph))

In [ ]:
input_graph = '/root/models/optimize_me/linear/cpu/fully_optimized_frozen_cpu.pb'
output_dot = '/root/notebooks/fully_optimized_frozen_cpu.dot'
convert_graph_to_dot(input_graph=input_graph, output_dot=output_dot, is_input_graph_binary=True)

In [ ]:
%%bash

dot -T png /root/notebooks/fully_optimized_frozen_cpu.dot \
    -o /root/notebooks/fully_optimized_frozen_cpu.png

In [ ]:
from IPython.display import Image

Image('/root/notebooks/fully_optimized_frozen_cpu.png')

Run Standalone Benchmarks

Note: These benchmarks are running against the standalone models on disk. We will benchmark the models running within TensorFlow Serving soon.


In [ ]:
%%bash

# Empty shapes separated by ':' declare three scalar inputs
benchmark_model --graph=/root/models/optimize_me/linear/cpu/fully_optimized_frozen_cpu.pb \
    --input_layer=weights,bias,x_observed \
    --input_layer_type=float,float,float \
    --input_layer_shape=:: \
    --output_layer=add

Save Model for Deployment and Inference

Reset Default Graph


In [ ]:
import tensorflow as tf

tf.reset_default_graph()

Create New Session


In [ ]:
sess = tf.Session()

Generate Version Number


In [ ]:
from datetime import datetime 

version = int(datetime.now().strftime("%s"))
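
Note: "%s" is a glibc strftime extension that returns seconds since the epoch, so it works on Linux but is not guaranteed on every platform. A portable equivalent (an assumption on my part, not part of the original notebook) is:


In [ ]:
import time

# Portable Unix-epoch timestamp for the model version number
version = int(time.time())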

Load Optimized, Frozen Graph


In [ ]:
%%bash

inspect_checkpoint --file_name=/root/models/optimize_me/linear/cpu/model.ckpt

In [ ]:
saver = tf.train.import_meta_graph('/root/models/optimize_me/linear/cpu/model.ckpt.meta')
saver.restore(sess, '/root/models/optimize_me/linear/cpu/model.ckpt')

optimize_me_parent_path = '/root/models/optimize_me/linear/cpu'
fully_optimized_frozen_model_graph_path = '%s/fully_optimized_frozen_cpu.pb' % optimize_me_parent_path
print(fully_optimized_frozen_model_graph_path)

with tf.gfile.GFile(fully_optimized_frozen_model_graph_path, 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())

# Merge the frozen graph into the current default graph with no name prefix
tf.import_graph_def(
    graph_def,
    input_map=None,
    return_elements=None,
    name=""
)

print("weights = ", sess.run("weights:0"))
print("bias = ", sess.run("bias:0"))

Create SignatureDef Asset for TensorFlow Serving


In [ ]:
from tensorflow.python.saved_model import utils
from tensorflow.python.saved_model import signature_constants
from tensorflow.python.saved_model import signature_def_utils

graph = tf.get_default_graph()

x_observed = graph.get_tensor_by_name('x_observed:0')
y_pred = graph.get_tensor_by_name('add:0')

inputs_map = {'inputs': x_observed}
outputs_map = {'outputs': y_pred}

predict_signature = signature_def_utils.predict_signature_def(
                inputs = inputs_map, 
                outputs = outputs_map)
print(predict_signature)

Save Model with Assets


In [ ]:
from tensorflow.python.saved_model import builder as saved_model_builder
from tensorflow.python.saved_model import tag_constants

fully_optimized_saved_model_path = '/root/models/linear_fully_optimized/cpu/%s' % version
print(fully_optimized_saved_model_path)

builder = saved_model_builder.SavedModelBuilder(fully_optimized_saved_model_path)
builder.add_meta_graph_and_variables(sess,
                                     [tag_constants.SERVING],
                                     signature_def_map={
                                         'predict': predict_signature,
                                         signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: predict_signature
                                     },
                                     clear_devices=True)

builder.save(as_text=False)

In [ ]:
import os

# Print both listings explicitly; in a notebook only the last
# expression's value would be displayed otherwise
print(fully_optimized_saved_model_path)
print(os.listdir(fully_optimized_saved_model_path))
print(os.listdir('%s/variables' % fully_optimized_saved_model_path))

In [ ]:
sess.close()

Inspect with Saved Model CLI

Note: This takes a minute or two for some reason. Please be patient.


In [ ]:
import subprocess

output = subprocess.run(["saved_model_cli", "show", \
                "--dir", fully_optimized_saved_model_path, "--all"], \
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE)

print(output.stdout.decode('utf-8'))
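
You can also smoke-test the exported predict signature locally with saved_model_cli run before starting the server. Feeding a single scalar for x_observed is an assumption here; adjust the --input_exprs value if your input shape differs:


In [ ]:
import subprocess

# Run one prediction through the SavedModel on disk (no server needed)
output = subprocess.run(["saved_model_cli", "run",
                         "--dir", fully_optimized_saved_model_path,
                         "--tag_set", "serve",
                         "--signature_def", "predict",
                         "--input_exprs", "inputs=1.5"],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)

print(output.stdout.decode('utf-8'))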

Open a Terminal through Jupyter Notebook

(Menu Bar -> File -> New...)

Start HTTP-gRPC Proxy in Separate Terminal

http_grpc_proxy 9004 9000

The params are as follows:

  • 1: proxy_port for this proxy
  • 2: tf_serving_port for TensorFlow Serving

Start TensorFlow Serving in Separate Terminal

Point to the model_base_path of the fully optimized model.

The params are as follows:

  • port (int)
  • model_name (anything)
  • model_base_path (/path/to/model/ above all versioned sub-directories)
  • enable_batching (true|false)
tensorflow_model_server \
  --port=9000 \
  --model_name=linear \
  --model_base_path=/root/models/linear_fully_optimized/cpu \
  --enable_batching=false
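
If you prefer to call TensorFlow Serving over gRPC directly from Python instead of through the proxy, a minimal client sketch follows. It assumes the tensorflow-serving-api package is installed and reuses the model_name and port from the command above:

import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

# Connect to the TensorFlow Serving gRPC endpoint started above
channel = implementations.insecure_channel('localhost', 9000)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# Build a PredictRequest against the 'predict' signature exported earlier
request = predict_pb2.PredictRequest()
request.model_spec.name = 'linear'
request.model_spec.signature_name = 'predict'
request.inputs['inputs'].CopyFrom(tf.contrib.util.make_tensor_proto(1.5))

# Call Predict with a 10-second timeout
response = stub.Predict(request, 10.0)
print(response.outputs['outputs'])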

Run the Following Command in the Terminal to Predict

The params are as follows:

  • 1: proxy_port
  • 2: x_observed feed input

Returns:

  • y_pred prediction
predict 9004 1.5

Monitor GPU in Separate Terminal

watch -n 1 nvidia-smi

Start Load Test in Separate Terminal

The params are as follows:

  • 1: amount of load (low|medium|high)
loadtest high

### EXPECTED OUTPUT ###
summary ... 38.6/s Avg:  1175 Min:    26 Max:  2320 Err:     0 (0.00%)
summary ... 37.3/s Avg:  2586 Min:  2331 Max:  2729 Err:     0 (0.00%)