Optimising a TensorFlow SavedModel for Serving

This notebook shows how to optimise a TensorFlow SavedModel exported for serving by shrinking its size (to reduce its memory and disk footprint) and improving prediction latency. This is accomplished by applying the following:

  • Freezing: converting the variables stored in the SavedModel's checkpoint files into constants stored directly in the model graph.
  • Pruning: stripping nodes that are not needed for the prediction path, merging duplicate nodes, and removing other node ops such as summary and identity nodes.
  • Quantisation: converting any large float Const op into an eight-bit equivalent, followed by a float conversion op so that the result is usable by subsequent nodes.
  • Other refinements: constant folding, batch_norm folding, fusing convolutions, etc. (A sketch of how these map to Graph Transform Tool transform names follows this list.)
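As a rough illustration (a sketch only; the exact transform list used in this notebook appears later, with quantisation commented out there), these optimisations correspond to Graph Transform Tool transform specs such as:

# illustrative grouping of Graph Transform Tool transform names by purpose
PRUNING_TRANSFORMS = [
    'strip_unused_nodes',                   # drop nodes not needed for prediction
    'remove_nodes(op=Identity)',            # remove pass-through Identity ops
    'merge_duplicate_nodes',                # merge nodes that compute the same value
]
FOLDING_TRANSFORMS = [
    'fold_constants(ignore_errors=true)',   # pre-compute constant expressions
    'fold_batch_norms',                     # fold batch-norm multiplies into conv weights
]
QUANTISATION_TRANSFORMS = [
    'quantize_weights',                     # store large float Const ops as eight-bit values
]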

The optimisation operations we apply in this example come from the TensorFlow Graph Transform Tool, a C++ command-line tool. We use its Python APIs to call the underlying C++ libraries.

The Graph Transform Tool is designed to work on models that are saved as GraphDef files, usually in a binary protobuf format. However, the model exported after training an Estimator is in the SavedModel format (a saved_model.pb file plus a variables folder containing variables.data-* and variables.index files).

We need to optimise the model while keeping it in the SavedModel format. Thus, the optimisation steps (condensed in the sketch after this list) will be:

  1. Freeze the SavedModel: SavedModel -> GraphDef
  2. Optimise the frozen model: GraphDef -> GraphDef
  3. Convert the optimised frozen model back to a SavedModel: GraphDef -> SavedModel
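Condensed into a single function, the pipeline looks roughly like the sketch below (assuming TensorFlow 1.x, a SavedModel directory saved_model_dir, and softmax/Softmax as the prediction output node; the function name is illustrative, and the rest of the notebook walks through each step and inspects the intermediate artifacts):

import os
import tensorflow as tf
from tensorflow.python.tools import freeze_graph
from tensorflow.python.saved_model import tag_constants
from tensorflow.tools.graph_transforms import TransformGraph

def optimise_saved_model_sketch(saved_model_dir, output_node='softmax/Softmax'):
    frozen_path = os.path.join(saved_model_dir, 'freezed_model.pb')

    # 1. Freeze: SavedModel -> GraphDef, with variables converted to constants
    freeze_graph.freeze_graph(
        input_graph=None, input_saver=False, input_binary=False,
        input_checkpoint=None, output_node_names=output_node,
        restore_op_name=None, filename_tensor_name=None,
        output_graph=frozen_path, clear_devices=False, initializer_nodes='',
        input_saved_model_dir=saved_model_dir,
        saved_model_tags=tag_constants.SERVING)

    # 2. Optimise: GraphDef -> GraphDef, by applying a list of transforms
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(frozen_path, 'rb') as f:
        graph_def.ParseFromString(f.read())
    graph_def = TransformGraph(
        graph_def, [], [output_node],
        ['remove_nodes(op=Identity)', 'fold_constants(ignore_errors=true)',
         'fold_batch_norms', 'strip_unused_nodes'])

    # 3. Convert back: GraphDef -> SavedModel, with a serving signature
    with tf.Session(graph=tf.Graph()) as session:
        tf.import_graph_def(graph_def, name='')
        inputs = {node.name: session.graph.get_tensor_by_name(node.name + ':0')
                  for node in graph_def.node if node.op == 'Placeholder'}
        outputs = {'softmax': session.graph.get_tensor_by_name(output_node + ':0')}
        tf.saved_model.simple_save(
            session, os.path.join(saved_model_dir, 'optimised'), inputs, outputs)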

In [1]:
import os
import numpy as np
from datetime import datetime

import tensorflow as tf

print "TensorFlow : {}".format(tf.__version__)


TensorFlow : 1.10.0

1. Train and Export a Keras Model

1.1 Import Data


In [2]:
(train_data, train_labels), (eval_data, eval_labels) = tf.keras.datasets.mnist.load_data()
NUM_CLASSES = 10

In [3]:
print "Train data shape: {}".format(train_data.shape)
print "Eval data shape: {}".format(eval_data.shape)


Train data shape: (60000, 28, 28)
Eval data shape: (10000, 28, 28)

1.2 Estimator

1.2.1 Keras Model Function


In [4]:
def keras_model_fn(params):
    
    inputs = tf.keras.layers.Input(shape=(28, 28), name='input_image')
    input_layer = tf.keras.layers.Reshape(target_shape=(28, 28, 1), name='reshape')(inputs)
    
    # convolutional layers
    conv_inputs = input_layer
    for i in range(params.num_conv_layers):
        
        filters = params.init_filters * (2**i)
        conv = tf.keras.layers.Conv2D(kernel_size=3, filters=filters, strides=1, padding='SAME', activation='relu')(conv_inputs)
        max_pool = tf.keras.layers.MaxPool2D(pool_size=2, strides=2, padding='SAME')(conv)
        batch_norm = tf.keras.layers.BatchNormalization()(max_pool)
        conv_inputs = batch_norm

    flatten = tf.keras.layers.Flatten(name='flatten')(conv_inputs)
    
    # fully-connected layers
    dense_inputs = flatten
    for i in range(len(params.hidden_units)):
        
        dense = tf.keras.layers.Dense(units=params.hidden_units[i], activation='relu')(dense_inputs)
        dropout = tf.keras.layers.Dropout(params.dropout)(dense)
        dense_inputs = dropout
        
    # softmax classifier
    logits = tf.keras.layers.Dense(units=NUM_CLASSES, name='logits')(dense_inputs)
    softmax = tf.keras.layers.Activation('softmax', name='softmax')(logits)

    # keras model
    model = tf.keras.models.Model(inputs, softmax)
    return model

1.2.2 Convert Keras model to Estimator


In [5]:
def create_estimator(params, run_config):
    
    keras_model = keras_model_fn(params)
    print(keras_model.summary())
    
    optimizer = tf.keras.optimizers.Adam(lr=params.learning_rate)
    # pass the configured optimizer object (not the string 'adam') so that params.learning_rate is used
    keras_model.compile(loss='sparse_categorical_crossentropy', optimizer=optimizer, metrics=['accuracy'])
    mnist_classifier = tf.keras.estimator.model_to_estimator(
        keras_model=keras_model,
        config=run_config
    )
    
    return mnist_classifier

1.3 Train and Evaluate

1.3.1 Experiment Function


In [6]:
def run_experiment(params, run_config):
    
    train_spec = tf.estimator.TrainSpec(
        input_fn = tf.estimator.inputs.numpy_input_fn(
            x={"input_image": train_data},
            y=train_labels,
            batch_size=params.batch_size,
            num_epochs=None,
            shuffle=True),
        max_steps=params.max_training_steps
    )

    eval_spec = tf.estimator.EvalSpec(
        input_fn = tf.estimator.inputs.numpy_input_fn(
            x={"input_image": eval_data},
            y=eval_labels,
            batch_size=params.batch_size,
            num_epochs=1,
            shuffle=False),
        steps=None,
        throttle_secs=params.eval_throttle_secs
    )

    tf.logging.set_verbosity(tf.logging.INFO)

    time_start = datetime.utcnow() 
    print("Experiment started at {}".format(time_start.strftime("%H:%M:%S")))
    print(".......................................") 

    estimator = create_estimator(params, run_config)

    tf.estimator.train_and_evaluate(
        estimator=estimator,
        train_spec=train_spec, 
        eval_spec=eval_spec
    )

    time_end = datetime.utcnow() 
    print(".......................................")
    print("Experiment finished at {}".format(time_end.strftime("%H:%M:%S")))
    print("")
    time_elapsed = time_end - time_start
    print("Experiment elapsed time: {} seconds".format(time_elapsed.total_seconds()))
    
    return estimator

1.3.2 Experiment Parameters


In [7]:
MODELS_LOCATION = 'models/mnist'
MODEL_NAME = 'keras_classifier'
model_dir = os.path.join(MODELS_LOCATION, MODEL_NAME)

print(model_dir)

params  = tf.contrib.training.HParams(
    batch_size=100,
    hidden_units=[512, 512],
    num_conv_layers=3, 
    init_filters=64,
    dropout=0.2,
    max_training_steps=50,
    eval_throttle_secs=10,
    learning_rate=1e-3,
    debug=True
)

run_config = tf.estimator.RunConfig(
    tf_random_seed=19830610,
    save_checkpoints_steps=1000,
    keep_checkpoint_max=3,
    model_dir=model_dir
)


models/mnist/keras_classifier

TensorFlow Graph

1.3.3 Run Experiment


In [8]:
if tf.gfile.Exists(model_dir):
    print("Removing previous artifacts...")
    tf.gfile.DeleteRecursively(model_dir)

os.makedirs(model_dir)

estimator = run_experiment(params, run_config)


Removing previous artifacts...
Experiment started at 16:40:08
.......................................
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_image (InputLayer)     (None, 28, 28)            0         
_________________________________________________________________
reshape (Reshape)            (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d (Conv2D)              (None, 28, 28, 64)        640       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 64)        0         
_________________________________________________________________
batch_normalization (BatchNo (None, 14, 14, 64)        256       
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 128)       73856     
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 128)         0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 7, 7, 128)         512       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 256)         295168    
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 256)         0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 4, 4, 256)         1024      
_________________________________________________________________
flatten (Flatten)            (None, 4096)              0         
_________________________________________________________________
dense (Dense)                (None, 512)               2097664   
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 512)               262656    
_________________________________________________________________
dropout_1 (Dropout)          (None, 512)               0         
_________________________________________________________________
logits (Dense)               (None, 10)                5130      
_________________________________________________________________
softmax (Activation)         (None, 10)                0         
=================================================================
Total params: 2,736,906
Trainable params: 2,736,010
Non-trainable params: 896
_________________________________________________________________
None
INFO:tensorflow:Using the Keras model provided.
INFO:tensorflow:Using config: {'_save_checkpoints_secs': None, '_global_id_in_cluster': 0, '_session_config': None, '_keep_checkpoint_max': 3, '_tf_random_seed': 19830610, '_task_type': 'worker', '_train_distribute': None, '_is_chief': True, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x127887a50>, '_model_dir': 'models/mnist/keras_classifier', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 100, '_master': '', '_save_checkpoints_steps': 1000, '_keep_checkpoint_every_n_hours': 10000, '_evaluation_master': '', '_service': None, '_device_fn': None, '_save_summary_steps': 100, '_num_ps_replicas': 0}
INFO:tensorflow:Running training and evaluation locally (non-distributed).
INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after every checkpoint. Checkpoint frequency is determined based on RunConfig arguments: save_checkpoints_steps 1000 or save_checkpoints_secs None.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from models/mnist/keras_classifier/keras_model.ckpt
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into models/mnist/keras_classifier/model.ckpt.
INFO:tensorflow:loss = 2.892544, step = 1
INFO:tensorflow:Saving checkpoints for 50 into models/mnist/keras_classifier/model.ckpt.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2018-10-07-16:40:39
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from models/mnist/keras_classifier/model.ckpt-50
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Finished evaluation at 2018-10-07-16:40:52
INFO:tensorflow:Saving dict for global step 50: accuracy = 0.03279998, global_step = 50, loss = 1.1221329
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 50: models/mnist/keras_classifier/model.ckpt-50
INFO:tensorflow:Loss for final step: 0.14231189.
.......................................
Experiment finished at 16:40:52

Experiment elapsed time: 43.937862 seconds

1.4 Export the model


In [9]:
def make_serving_input_receiver_fn():
    inputs = {'input_image': tf.placeholder(shape=[None,28,28], dtype=tf.float32, name='serving_input_image')}
    return tf.estimator.export.build_raw_serving_input_receiver_fn(inputs)

export_dir = os.path.join(model_dir, 'export')

if tf.gfile.Exists(export_dir):
    tf.gfile.DeleteRecursively(export_dir)
        
estimator.export_savedmodel(
    export_dir_base=export_dir,
    serving_input_receiver_fn=make_serving_input_receiver_fn()
)


INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Signatures INCLUDED in export for Eval: None
INFO:tensorflow:Signatures INCLUDED in export for Classify: None
INFO:tensorflow:Signatures INCLUDED in export for Regress: None
INFO:tensorflow:Signatures INCLUDED in export for Predict: ['serving_default']
INFO:tensorflow:Signatures INCLUDED in export for Train: None
INFO:tensorflow:Restoring parameters from models/mnist/keras_classifier/model.ckpt-50
INFO:tensorflow:Assets added to graph.
INFO:tensorflow:No assets to write.
INFO:tensorflow:SavedModel written to: models/mnist/keras_classifier/export/temp-1538930452/saved_model.pb
Out[9]:
'models/mnist/keras_classifier/export/1538930452'

2. Inspect the Exported SavedModel


In [10]:
%%bash

saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}
saved_model_cli show --dir=${saved_model_dir} --all


models/mnist/keras_classifier/export/1538930452
saved_model.pb
variables

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['input_image'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 28, 28)
        name: serving_input_image:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['softmax'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: softmax/Softmax:0
  Method name is: tensorflow/serving/predict

Prediction with SavedModel


In [33]:
def inference_test(saved_model_dir, signature="serving_default", input_name='input_image', batch=300, repeat=100):

    tf.logging.set_verbosity(tf.logging.ERROR)
    
    time_start = datetime.utcnow() 
    
    predictor = tf.contrib.predictor.from_saved_model(
        export_dir = saved_model_dir,
        signature_def_key=signature
    )
    time_end = datetime.utcnow() 
        
    time_elapsed = time_end - time_start
   
    print ""
    print("Model loading time: {} seconds".format(time_elapsed.total_seconds()))
    print ""
    
    time_start = datetime.utcnow() 
    output = None
    for i in range(repeat):
        predictions = predictor(
            {
                input_name: eval_data[:batch]
            }
        )
        
        output=[np.argmax(prediction) for prediction in predictions['softmax']]
    
    time_end = datetime.utcnow() 

    time_elapsed_sec = (time_end - time_start).total_seconds()
    
    print "Inference elapsed time: {} seconds".format(time_elapsed_sec)
    print ""
    
    print "Prediction produced for {} instances batch, repeated {} times".format(len(output), repeat)
    print "Average latency per batch: {} seconds".format(time_elapsed_sec/repeat)
    print ""

3. Test Prediction with SavedModel


In [34]:
saved_model_dir = os.path.join(
    export_dir, [f for f in os.listdir(export_dir) if f.isdigit()][0])
print(saved_model_dir)
inference_test(saved_model_dir)


models/mnist/keras_classifier/export/1538930452

Model loading time: 0.390716 seconds

Inference elapsed time: 36.744143 seconds

Prediction produced for 300 instances batch, repeated 100 times
Average latency per batch: 0.36744143 seconds

Describe GraphDef


In [13]:
def describe_graph(graph_def, show_nodes=False):
    
    print('Input Feature Nodes: {}'.format([node.name for node in graph_def.node if node.op=='Placeholder']))
    print("")
    print('Unused Nodes: {}'.format([node.name for node in graph_def.node if 'unused' in node.name]))
    print("")
    print('Output Nodes: {}'.format([node.name for node in graph_def.node if 'softmax' in node.name]))
    print("")
    print('Quantization Nodes: {}'.format([node.name for node in graph_def.node if 'quant' in node.name]))
    print("")
    print('Constant Count: {}'.format(len([node for node in graph_def.node if node.op=='Const'])))
    print("")
    print('Variable Count: {}'.format(len([node for node in graph_def.node if 'Variable' in node.op])))
    print("")
    print('Identity Count: {}'.format(len([node for node in graph_def.node if node.op=='Identity'])))
    print("")
    print('Total nodes: {}'.format(len(graph_def.node)))
    print('')
    
    if show_nodes:
        for node in graph_def.node:
            print('Op:{} - Name: {}'.format(node.op, node.name))

4. Describe the SavedModel Graph (before optimisation)

Load GraphDef from a SavedModel Directory


In [14]:
def get_graph_def_from_saved_model(saved_model_dir):
    
    print(saved_model_dir)
    print("")
    
    from tensorflow.python.saved_model import tag_constants
    
    with tf.Session() as session:
        meta_graph_def = tf.saved_model.loader.load(
            session,
            tags=[tag_constants.SERVING],
            export_dir=saved_model_dir
        )
        
    return meta_graph_def.graph_def

In [15]:
describe_graph(get_graph_def_from_saved_model(saved_model_dir))


models/mnist/keras_classifier/export/1538930452

Input Feature Nodes: [u'serving_input_image', u'input_image']

Unused Nodes: []

Output Nodes: [u'softmax/Softmax']

Quantization Nodes: []

Constant Count: 61

Variable Count: 97

Identity Count: 30

Total nodes: 308

Get model size


In [16]:
def get_size(model_dir):
    
    print(model_dir)
    print("")
    
    pb_size = os.path.getsize(os.path.join(model_dir,'saved_model.pb'))
    
    variables_size = 0
    if os.path.exists(os.path.join(model_dir,'variables/variables.data-00000-of-00001')):
        variables_size = os.path.getsize(os.path.join(model_dir,'variables/variables.data-00000-of-00001'))
        variables_size += os.path.getsize(os.path.join(model_dir,'variables/variables.index'))

    print "Model size: {} KB".format(round(pb_size/(1024.0),3))
    print "Variables size: {} KB".format(round( variables_size/(1024.0),3))
    print "Total Size: {} KB".format(round((pb_size + variables_size)/(1024.0),3))

In [17]:
get_size(saved_model_dir)


models/mnist/keras_classifier/export/1538930452

Model size: 57.457 KB
Variables size: 10691.978 KB
Total Size: 10749.435 KB

5. Freeze SavedModel

This function converts the SavedModel into a GraphDef file (freezed_model.pb), storing the variables as constants directly inside freezed_model.pb.

You need to specify the graph output nodes for freezing. We are only interested in the output of the softmax/Softmax node; a sketch for recovering this name from the serving signature follows.
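If you are unsure what the output node is called, one option (a minimal sketch, assuming the 'serving_default' signature exported above) is to read the node name from the SavedModel's signature instead of hard-coding it:

from tensorflow.python.saved_model import tag_constants

def get_output_node_names(saved_model_dir, signature='serving_default'):
    # load the MetaGraphDef to inspect its serving signature
    with tf.Session(graph=tf.Graph()) as session:
        meta_graph_def = tf.saved_model.loader.load(
            session, [tag_constants.SERVING], saved_model_dir)
    outputs = meta_graph_def.signature_def[signature].outputs
    # tensor names look like 'softmax/Softmax:0'; freeze_graph expects node names
    return [outputs[key].name.split(':')[0] for key in outputs]

# e.g. get_output_node_names(saved_model_dir) -> [u'softmax/Softmax']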


In [18]:
def freeze_graph(saved_model_dir):
    
    from tensorflow.python.tools import freeze_graph
    from tensorflow.python.saved_model import tag_constants
    
    output_graph_filename = os.path.join(saved_model_dir, "freezed_model.pb")
    output_node_names = "softmax/Softmax"
    initializer_nodes = ""

    freeze_graph.freeze_graph(
        input_saved_model_dir=saved_model_dir,
        output_graph=output_graph_filename,
        saved_model_tags = tag_constants.SERVING,
        output_node_names=output_node_names,
        initializer_nodes=initializer_nodes,

        input_graph=None, 
        input_saver=False,
        input_binary=False, 
        input_checkpoint=None, 
        restore_op_name=None, 
        filename_tensor_name=None, 
        clear_devices=False,
        input_meta_graph=False,
    )
    
    print "SavedModel graph freezed!"

In [19]:
freeze_graph(saved_model_dir)


SavedModel graph freezed!

In [20]:
%%bash
saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}


models/mnist/keras_classifier/export/1538930452
freezed_model.pb
saved_model.pb
variables

6. Describe the freezed_model.pb Graph (after freezing)

Load GraphDef from GraphDef File


In [21]:
def get_graph_def_from_file(graph_filepath):
    
    print(graph_filepath)
    print("")
    
    with tf.Graph().as_default():
        with tf.gfile.GFile(graph_filepath, "rb") as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            
            return graph_def

In [22]:
freezed_filepath=os.path.join(saved_model_dir,'freezed_model.pb')
describe_graph(get_graph_def_from_file(freezed_filepath))


models/mnist/keras_classifier/export/1538930452/freezed_model.pb

Input Feature Nodes: [u'serving_input_image']

Unused Nodes: []

Output Nodes: [u'softmax/Softmax']

Quantization Nodes: []

Constant Count: 34

Variable Count: 0

Identity Count: 27

Total nodes: 94

7. Optimise the freezed_model.pb

Optimise GraphDef


In [23]:
def optimize_graph(model_dir, graph_filename, transforms):
    
    from tensorflow.tools.graph_transforms import TransformGraph
    
    input_names = []
    output_names = ['softmax/Softmax']
    
    graph_def = get_graph_def_from_file(os.path.join(model_dir, graph_filename))
    optimised_graph_def = TransformGraph(graph_def, 
                                         input_names,
                                         output_names,
                                         transforms 
                                        )
    tf.train.write_graph(optimised_graph_def,
                        logdir=model_dir,
                        as_text=False,
                        name='optimised_model.pb')
    
    print "Freezed graph optimised!"

In [24]:
transforms = [
    'remove_nodes(op=Identity)', 
    'fold_constants(ignore_errors=true)',
    'fold_batch_norms',
#    'fuse_resize_pad_and_conv',
#    'quantize_weights',
#    'quantize_nodes',
    'merge_duplicate_nodes',
    'strip_unused_nodes', 
    'sort_by_execution_order'
]

optimize_graph(saved_model_dir, 'freezed_model.pb', transforms)


models/mnist/keras_classifier/export/1538930452/freezed_model.pb

Freezed graph optimised!

In [25]:
%%bash
saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)
echo ${saved_model_dir}
ls ${saved_model_dir}


models/mnist/keras_classifier/export/1538930452
freezed_model.pb
optimised_model.pb
saved_model.pb
variables

8. Describe the Optimised Graph


In [26]:
optimised_filepath=os.path.join(saved_model_dir,'optimised_model.pb')
describe_graph(get_graph_def_from_file(optimised_filepath))


models/mnist/keras_classifier/export/1538930452/optimised_model.pb

Input Feature Nodes: [u'serving_input_image']

Unused Nodes: []

Output Nodes: [u'softmax/Softmax']

Quantization Nodes: []

Constant Count: 29

Variable Count: 0

Identity Count: 0

Total nodes: 62

9. Convert Optimised graph (GraphDef) to SavedModel


In [27]:
def convert_graph_def_to_saved_model(graph_filepath):

    # note: relies on the global saved_model_dir defined earlier in the notebook
    export_dir=os.path.join(saved_model_dir,'optimised')

    if tf.gfile.Exists(export_dir):
        tf.gfile.DeleteRecursively(export_dir)

    graph_def = get_graph_def_from_file(graph_filepath)
    
    with tf.Session(graph=tf.Graph()) as session:
        tf.import_graph_def(graph_def, name="")
        tf.saved_model.simple_save(session,
                export_dir,
                inputs={
                    node.name: session.graph.get_tensor_by_name("{}:0".format(node.name)) 
                    for node in graph_def.node if node.op=='Placeholder'},
                outputs={
                    "softmax": session.graph.get_tensor_by_name("softmax/Softmax:0"),
                }
            )

        print("Optimised graph converted to SavedModel!")

In [28]:
optimised_filepath=os.path.join(saved_model_dir,'optimised_model.pb')
convert_graph_def_to_saved_model(optimised_filepath)


models/mnist/keras_classifier/export/1538930452/optimised_model.pb

Optimised graph converted to SavedModel!

Optimised SavedModel Size


In [29]:
optimised_saved_model_dir = os.path.join(saved_model_dir,'optimised') 
get_size(optimised_saved_model_dir)


models/mnist/keras_classifier/export/1538930452/optimised

Model size: 10701.56 KB
Variables size: 0.0 KB
Total Size: 10701.56 KB

In [30]:
%%bash

saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)/optimised
ls ${saved_model_dir}
saved_model_cli show --dir ${saved_model_dir} --all


saved_model.pb
variables

MetaGraphDef with tag-set: 'serve' contains the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['serving_input_image'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 28, 28)
        name: serving_input_image:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['softmax'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 10)
        name: softmax/Softmax:0
  Method name is: tensorflow/serving/predict

10. Prediction with the Optimised SavedModel


In [35]:
optimized_saved_model_dir = os.path.join(saved_model_dir,'optimised') 
print(optimized_saved_model_dir)
inference_test(saved_model_dir=optimized_saved_model_dir, signature='serving_default', input_name='serving_input_image')


models/mnist/keras_classifier/export/1538930452/optimised

Model loading time: 0.103157 seconds

Inference elapsed time: 31.120613 seconds

Prediction produced for 300 instances batch, repeated 100 times
Average latency per batch: 0.31120613 seconds

Cloud ML Engine Deployment and Prediction


In [ ]:
PROJECT = 'ksalama-gcp-playground'
BUCKET = 'ksalama-gcs-cloudml'
REGION = 'europe-west1'
MODEL_NAME = 'mnist_classifier'

os.environ['BUCKET'] = BUCKET
os.environ['PROJECT'] = PROJECT
os.environ['REGION'] = REGION
os.environ['MODEL_NAME'] = MODEL_NAME

1. Upload the model artefacts to Google Cloud Storage bucket


In [ ]:
%%bash

gsutil -m rm -r gs://${BUCKET}/tf-model-optimisation

In [ ]:
%%bash

saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)

echo ${saved_model_dir}

gsutil -m cp -r ${saved_model_dir} gs://${BUCKET}/tf-model-optimisation/original

In [ ]:
%%bash

saved_models_base=models/mnist/keras_classifier/export/
saved_model_dir=${saved_models_base}$(ls ${saved_models_base} | tail -n 1)/optimised

echo ${saved_model_dir}

gsutil -m cp -r ${saved_model_dir} gs://${BUCKET}/tf-model-optimisation

2. Deploy models to Cloud ML Engine

Don't forget to delete the model and the model version if they were previously deployed!
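For reference, a minimal sketch of cleaning up a previous deployment through the Cloud ML Engine REST API with the Python discovery client (the same client style used in section 3 below; the gcloud CLI offers equivalent versions delete and models delete commands). The helper names here are illustrative:

from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
ml = discovery.build('ml', 'v1', credentials=credentials)

def delete_version(project, model_name, version):
    # deletes a single model version; returns a long-running operation
    name = 'projects/{}/models/{}/versions/{}'.format(project, model_name, version)
    return ml.projects().models().versions().delete(name=name).execute()

def delete_model(project, model_name):
    # a model can only be deleted once all of its versions have been removed
    name = 'projects/{}/models/{}'.format(project, model_name)
    return ml.projects().models().delete(name=name).execute()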


In [ ]:
%%bash

echo ${MODEL_NAME}

gcloud ml-engine models create ${MODEL_NAME} --regions=${REGION}

Version: v_org is the original SavedModel (before optimisation)


In [ ]:
%%bash

MODEL_VERSION='v_org'
MODEL_ORIGIN=gs://${BUCKET}/tf-model-optimisation/original

gcloud ml-engine versions create ${MODEL_VERSION} \
            --model=${MODEL_NAME} \
            --origin=${MODEL_ORIGIN} \
            --runtime-version=1.10

Version: v_opt is the optimised SavedModel (after optimisation)


In [ ]:
%%bash

MODEL_VERSION='v_opt'
MODEL_ORIGIN=gs://${BUCKET}/tf-model-optimisation/optimised

gcloud ml-engine versions create ${MODEL_VERSION} \
            --model=${MODEL_NAME} \
            --origin=${MODEL_ORIGIN} \
            --runtime-version=1.10

3. Cloud ML Engine online predictions


In [ ]:
from googleapiclient import discovery
from oauth2client.client import GoogleCredentials

credentials = GoogleCredentials.get_application_default()
api = discovery.build(
    'ml', 'v1', 
    credentials=credentials, 
    discoveryServiceUrl='https://storage.googleapis.com/cloud-ml/discovery/ml_v1_discovery.json'
)

    
def predict(version, instances):

    request_data = {'instances': instances}

    model_url = 'projects/{}/models/{}/versions/{}'.format(PROJECT, MODEL_NAME, version)
    response = api.projects().predict(body=request_data, name=model_url).execute()

    class_ids = None
    
    try:
        class_ids = [item["class_ids"] for item in response["predictions"]]
    except Exception:
        print(response)
    
    return class_ids

In [ ]:
def inference_cmle(version, batch=100, repeat=10):
    
    # the instance key must match the input name of the deployed serving signature;
    # eval_data[img].tolist() yields the (28, 28) nested-list encoding the signature expects
    instances = [
        {'input_image_3': eval_data[img].tolist()}
        for img in range(batch)
    ]

    # warmup request (the API expects a list of instances)
    predict(version, instances[:1])
    print('Warm up request performed!')
    print('Timer started...')
    print('')
    
    time_start = datetime.utcnow() 
    output = None
    
    for i in range(repeat):
        output = predict(version, instances)
    
    time_end = datetime.utcnow() 

    time_elapsed_sec = (time_end - time_start).total_seconds()
    
    print "Inference elapsed time: {} seconds".format(time_elapsed_sec)
    print ""
    
    print "Prediction produced for {} instances batch, repeated {} times".format(len(output), repeat)
    print "Average latency per batch: {} seconds".format(time_elapsed_sec/repeat)
    print ""
    
    print "Prediction output for the last instance: {}".format(output[0])

In [ ]:
version='v_org'
inference_cmle(version)

In [ ]:
version='v_opt'
inference_cmle(version)

Happy serving!


In [ ]: