Optimize Trained CPU Model

Types of Optimizations Applied for Inference

  • Remove training-only operations (checkpoint saving, dropout)
  • Strip out unused nodes
  • Remove debug operations
  • Fold batch normalization ops into weights (super cool)
  • Round weights
  • Quantize weights
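
Rounding weights does not change how many bytes each weight occupies; the gain shows up only once the model file is compressed. The sketch below is a pure-Python illustration of that effect, not the Graph Transform Tool's implementation. The helper names `round_to_levels` and `compressed_size`, the 256-level default, and the random test weights are all assumptions made for the demo.

```python
import random
import struct
import zlib


def round_to_levels(weights, num_levels=256):
    """Snap each weight to the nearest of num_levels evenly spaced values."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return list(weights)
    step = (hi - lo) / (num_levels - 1)
    return [lo + round((w - lo) / step) * step for w in weights]


def compressed_size(weights):
    """Size in bytes of the float32-packed weights after zlib compression."""
    raw = struct.pack("<%df" % len(weights), *weights)
    return len(zlib.compress(raw))


# Hypothetical weights: 10,000 values drawn from a unit Gaussian.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10000)]
rounded = round_to_levels(weights)

print("compressed, original:", compressed_size(weights))
print("compressed, rounded: ", compressed_size(rounded))
```

Both lists pack to the same 40,000 raw bytes. The rounded list contains at most 256 distinct values, so a general-purpose compressor has far more repetition to exploit.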

Optimize Models

Summarize Graph Utility


In [8]:
%%bash 

which summarize_graph


/root/tensorflow/bazel-bin/tensorflow/tools/graph_transforms/summarize_graph

In [10]:
%%bash

## TODO: /root/models/linear/cpu/metagraph
## ls -l /root/models/optimize_me/

ls -l /root/models/linear/cpu/unoptimized


total 36
-rw-r--r-- 1 root root 33509 May  8 01:55 metagraph.pb

In [14]:
%%bash

freeze_graph


Input checkpoint '' doesn't exist!

In [13]:
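import os
import tempfile

from tensorflow.python.tools import freeze_graph

# Placeholder working directory (the original snippet used the test-only
# helper self.get_temp_dir()); point this at the directory that holds the
# graph definition and the checkpoint.
model_dir = tempfile.gettempdir()

checkpoint_prefix = os.path.join(model_dir, "saved_checkpoint")
checkpoint_state_name = "checkpoint_state"
input_graph_name = "input_graph.pb"
output_graph_name = "output_graph.pb"

# Assumes the checkpoint was saved under exactly this prefix
# (no global-step suffix such as "-0").
checkpoint_path = checkpoint_prefix

input_graph_path = os.path.join(model_dir, input_graph_name)
input_saver_def_path = ""
input_binary = False  # False: the input graph is a text-format GraphDef
output_node_names = "output_node"
restore_op_name = "save/restore_all"
filename_tensor_name = "save/Const:0"
output_graph_path = os.path.join(model_dir, output_graph_name)
clear_devices = False

freeze_graph.freeze_graph(input_graph_path,
                          input_saver_def_path,
                          input_binary,
                          checkpoint_path,
                          output_node_names,
                          restore_op_name,
                          filename_tensor_name,
                          output_graph_path,
                          clear_devices, "")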


Out[13]:
<module 'tensorflow.python.tools.freeze_graph' from '/opt/conda/lib/python3.5/site-packages/tensorflow/python/tools/freeze_graph.py'>

In [11]:
%%bash

## TODO: /root/models/linear/cpu/unoptimized/metagraph.pb
## summarize_graph --in_graph=/root/models/optimize_me/unoptimized_cpu.pb

summarize_graph --in_graph=/root/models/linear/cpu/unoptimized/metagraph.pb


[libprotobuf ERROR external/protobuf/src/google/protobuf/wire_format_lite.cc:621] String field 'tensorflow.NodeDef.op' contains invalid UTF-8 data when parsing a protocol buffer. Use the 'bytes' type if you intend to send raw bytes. 
[libprotobuf ERROR external/protobuf/src/google/protobuf/text_format.cc:299] Error parsing text-format tensorflow.GraphDef: 2:1: Interpreting non ascii codepoint 148.
[libprotobuf ERROR external/protobuf/src/google/protobuf/text_format.cc:299] Error parsing text-format tensorflow.GraphDef: 2:1: Expected identifier, got: �
2017-05-08 01:55:37.834259: E tensorflow/tools/graph_transforms/summarize_graph_main.cc:266] Loading graph '/root/models/linear/cpu/unoptimized/metagraph.pb' failed with Can't parse /root/models/linear/cpu/unoptimized/metagraph.pb as binary proto
	 (both text and binary parsing failed for file /root/models/linear/cpu/unoptimized/metagraph.pb)
2017-05-08 01:55:37.834328: E tensorflow/tools/graph_transforms/summarize_graph_main.cc:268] usage: summarize_graph
Flags:
	--in_graph=""                    	string	input graph file name

Strip Unused Nodes
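
As a rough mental model, stripping unused nodes is a backward walk from the requested outputs: anything the outputs do not depend on is dropped. The sketch below shows that idea on a toy dictionary-based graph, not on a real GraphDef. The function `strip_unused` and the nodes `mul`, `loss` and `save/restore_all` in the toy graph are hypothetical; only `x_observed`, `weights`, `bias` and `add` come from this notebook. The real transform also takes the `--inputs` flag into account, which this sketch ignores.

```python
def strip_unused(graph, outputs):
    """Keep only the nodes that the requested outputs depend on.

    graph maps each node name to the list of node names it reads from.
    """
    keep = set()
    stack = list(outputs)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        stack.extend(graph[name])  # visit this node's inputs next
    return {name: inputs for name, inputs in graph.items() if name in keep}


# Toy graph: add = mul(weights, x_observed) + bias, plus two training-only nodes.
toy_graph = {
    "x_observed": [],
    "weights": [],
    "bias": [],
    "mul": ["weights", "x_observed"],
    "add": ["mul", "bias"],
    "loss": ["add"],                          # training only
    "save/restore_all": ["weights", "bias"],  # checkpointing only
}

pruned = strip_unused(toy_graph, outputs=["add"])
print(sorted(pruned))  # ['add', 'bias', 'mul', 'weights', 'x_observed']
```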


In [ ]:
%%bash

# TODO:  shuffle_batch??  x_observed_batch??
transform_graph \
--in_graph=/root/models/optimize_me/unoptimized_cpu.pb \
--out_graph=/root/models/optimize_me/strip_unused_optimized_cpu.pb \
--inputs='x_observed,weights,bias' \
--outputs='add' \
--transforms='
strip_unused_nodes'

In [ ]:
%%bash

ls -l /root/models/optimize_me/

In [ ]:
%%bash

summarize_graph --in_graph=/root/models/optimize_me/strip_unused_optimized_cpu.pb

In [ ]:
%%bash

benchmark_model --graph=/root/models/optimize_me/strip_unused_optimized_cpu.pb --input_layer=weights,bias,x_observed --input_layer_type=float,float,float --input_layer_shape=:: --output_layer=add

Fold Constants
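
Constant folding evaluates, ahead of time, any sub-expression whose inputs are all constants, so the work is not repeated on every inference call. The sketch below illustrates this on a toy graph representation. The tuple-based node format, the `OPS` table and the node names `two`, `three`, `scale` and `scaled` are assumptions made for the example, and it is not how the `fold_constants` transform is implemented.

```python
import operator

# Binary ops understood by this toy folder (an assumption for the sketch).
OPS = {"add": operator.add, "mul": operator.mul}


def fold_constants(graph):
    """Replace every node whose inputs are all constants with one constant.

    graph maps node name -> ("const", value), ("placeholder",) or
    (op, input_a, input_b). Nodes must be listed after their inputs.
    """
    folded = {}
    for name, node in graph.items():
        op = node[0]
        if op in OPS:
            args = [folded[input_name] for input_name in node[1:]]
            if all(arg[0] == "const" for arg in args):
                node = ("const", OPS[op](args[0][1], args[1][1]))
        folded[name] = node
    return folded


toy_graph = {
    "x_observed": ("placeholder",),
    "two": ("const", 2.0),
    "three": ("const", 3.0),
    "scale": ("mul", "two", "three"),          # depends only on constants
    "scaled": ("mul", "x_observed", "scale"),  # depends on a runtime input
}

folded = fold_constants(toy_graph)
print(folded["scale"])   # ('const', 6.0)
print(folded["scaled"])  # ('mul', 'x_observed', 'scale')
```

After folding, `two` and `three` no longer feed anything, which is why a strip-unused-nodes pass pairs naturally with this step.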


In [ ]:
%%bash

transform_graph \
--in_graph=/root/models/optimize_me/unoptimized_cpu.pb \
--out_graph=/root/models/optimize_me/fold_constants_optimized_cpu.pb \
--inputs='x_observed,weights,bias' \
--outputs='add' \
--transforms='
fold_constants(ignore_errors=true)'

In [ ]:
%%bash

ls -l /root/models/optimize_me/

In [ ]:
%%bash

summarize_graph --in_graph=/root/models/optimize_me/fold_constants_optimized_cpu.pb

In [ ]:
%%bash

benchmark_model --graph=/root/models/optimize_me/fold_constants_optimized_cpu.pb --input_layer=x_observed,bias,weights --input_layer_type=float,float,float --input_layer_shape=:: --output_layer=add

Fold Batch Normalizations

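Run Fold Constants first! The batch-norm folding transforms can only match their pattern once the expression feeding the multiply has been collapsed into a single constant.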
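
Folding a batch normalization into the preceding layer is plain algebra: at inference time the normalization is a fixed scale and shift, so both can be absorbed into the layer's weight and bias. The scalar sketch below checks that identity numerically. The function names, the parameter values and the `eps` default are assumptions made for the illustration; the real transforms operate on Conv2D and MatMul weights in the graph.

```python
import math


def fold_batch_norm(w, b, gamma, beta, mean, variance, eps=1e-3):
    """Fold gamma * ((w*x + b) - mean) / sqrt(variance + eps) + beta into (w', b')."""
    scale = gamma / math.sqrt(variance + eps)
    return w * scale, (b - mean) * scale + beta


def unfused(x, w, b, gamma, beta, mean, variance, eps=1e-3):
    """Reference: linear layer followed by a separate batch normalization."""
    z = w * x + b
    return gamma * (z - mean) / math.sqrt(variance + eps) + beta


# Hypothetical trained parameters.
params = dict(w=0.5, b=0.1, gamma=1.5, beta=-0.2, mean=0.3, variance=4.0)
w_folded, b_folded = fold_batch_norm(**params)

for x in (-2.0, 0.0, 3.5):
    expected = unfused(x, **params)
    actual = w_folded * x + b_folded  # one multiply-add instead of two stages
    assert math.isclose(expected, actual, abs_tol=1e-12)
    print(x, expected, actual)
```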


In [ ]:
%%bash

transform_graph \
--in_graph=/root/models/optimize_me/fold_constants_optimized_cpu.pb \
--out_graph=/root/models/optimize_me/fold_batch_norms_optimized_cpu.pb \
--inputs='x_observed,weights,bias' \
--outputs='add' \
--transforms='
fold_batch_norms
fold_old_batch_norms'

In [ ]:
%%bash

ls -l /root/models/optimize_me/

In [ ]:
%%bash

summarize_graph --in_graph=/root/models/optimize_me/fold_batch_norms_optimized_cpu.pb

In [ ]:
%%bash

benchmark_model --graph=/root/models/optimize_me/fold_batch_norms_optimized_cpu.pb --input_layer=x_observed,bias,weights --input_layer_type=float,float,float --input_layer_shape=:: --output_layer=add

Quantize Weights

Run Fold Batch Norms first, so that the weights being quantized are the final, folded ones.
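
Weight quantization trades a small amount of precision for a smaller file: each float weight is stored as an eight-bit code plus a shared minimum and step size. The sketch below shows a simple min/max linear scheme in pure Python. The `quantize` and `dequantize` helpers and the sample weights are assumptions made for the example, and the exact scheme used by `quantize_weights` may differ.

```python
def quantize(weights):
    """Map floats to 8-bit codes using the min/max range of the weights."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255.0
    if scale == 0.0:
        scale = 1.0  # all weights equal; any step size works
    codes = bytes(round((w - lo) / scale) for w in weights)
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate float weights from the 8-bit codes."""
    return [lo + code * scale for code in codes]


# Hypothetical weights.
weights = [-1.0, -0.25, 0.0, 0.4, 1.0]
codes, lo, scale = quantize(weights)
restored = dequantize(codes, lo, scale)

max_error = max(abs(a - b) for a, b in zip(weights, restored))
print("bytes per weight: 1 (was 4 as float32)")
print("max round-trip error:", max_error, "(step size:", scale, ")")
```

The round-trip error is bounded by half a step, so the wider the weight range, the coarser the approximation.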


In [ ]:
%%bash

transform_graph \
--in_graph=/root/models/optimize_me/fold_batch_norms_optimized_cpu.pb \
--out_graph=/root/models/optimize_me/quantized_optimized_cpu.pb \
--inputs='x_observed,weights,bias' \
--outputs='add' \
--transforms='quantize_weights'

In [ ]:
%%bash

ls -l /root/models/optimize_me/

In [ ]:
%%bash

summarize_graph --in_graph=/root/models/optimize_me/quantized_optimized_cpu.pb

In [ ]:
%%bash

benchmark_model --graph=/root/models/optimize_me/quantized_optimized_cpu.pb --input_layer=x_observed,bias,weights --input_layer_type=float,float,float --input_layer_shape=:: --output_layer=add

Perform All Common Optimizations


In [ ]:
%%bash

transform_graph \
--in_graph=/root/models/optimize_me/unoptimized_cpu.pb \
--out_graph=/root/models/optimize_me/fully_optimized_cpu.pb \
--inputs='x_observed,weights,bias' \
--outputs='add' \
--transforms='
add_default_attributes
remove_nodes(op=Identity, op=CheckNumerics)
fold_constants(ignore_errors=true)
fold_batch_norms
fold_old_batch_norms
quantize_weights
quantize_nodes
strip_unused_nodes
obfuscate_names'

In [ ]:
%%bash

ls -l /root/models/optimize_me/

In [ ]:
%%bash

summarize_graph --in_graph=/root/models/optimize_me/fully_optimized_cpu.pb

In [ ]:
%%bash

benchmark_model --graph=/root/models/optimize_me/fully_optimized_cpu.pb --input_layer=weights,x_observed,bias --input_layer_type=float,float,float --input_layer_shape=:: --output_layer=add

Sort by Execution Order (DAG Topological Order)

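  • Minimizes inference overhead: a minimal inference engine can execute the nodes in the stored order
  • Inputs for a node are guaranteed to be available, because every node appears after its inputs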
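
A topological order can be computed with Kahn's algorithm: repeatedly emit a node whose inputs have all been emitted already. The sketch below runs it on a toy graph. The dictionary representation and the `mul` node are assumptions made for the illustration, and the function only borrows its name from the transform; it is not the tool's implementation.

```python
from collections import deque


def sort_by_execution_order(graph):
    """Kahn's algorithm. graph maps node name -> list of input node names."""
    pending = {name: len(inputs) for name, inputs in graph.items()}
    consumers = {name: [] for name in graph}
    for name, inputs in graph.items():
        for input_name in inputs:
            consumers[input_name].append(name)

    # Nodes with no inputs can run immediately.
    ready = deque(name for name, count in pending.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        for consumer in consumers[name]:
            pending[consumer] -= 1
            if pending[consumer] == 0:
                ready.append(consumer)

    if len(order) != len(graph):
        raise ValueError("graph contains a cycle")
    return order


# Deliberately listed output-first to show the reordering.
toy_graph = {
    "add": ["mul", "bias"],
    "mul": ["weights", "x_observed"],
    "bias": [],
    "weights": [],
    "x_observed": [],
}

order = sort_by_execution_order(toy_graph)
print(order)
```

An executor that walks `order` from left to right never reaches a node before its inputs have been computed.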

In [ ]:
%%bash

transform_graph \
--in_graph=/root/models/optimize_me/fully_optimized_cpu.pb \
--out_graph=/root/models/optimize_me/sort_by_execution_order_optimized_cpu.pb \
--inputs='x_observed,weights,bias' \
--outputs='add' \
--transforms='
sort_by_execution_order'

In [ ]:
%%bash

ls -l /root/models/optimize_me/

In [ ]:
%%bash

summarize_graph --in_graph=/root/models/optimize_me/sort_by_execution_order_optimized_cpu.pb

In [ ]:
%%bash

benchmark_model --graph=/root/models/optimize_me/sort_by_execution_order_optimized_cpu.pb --input_layer=weights,x_observed,bias --input_layer_type=float,float,float --input_layer_shape=:: --output_layer=add
