We wrapped all inference models into a single benchmark app. The benchmark app reads the proper dataset, preprocesses it, and interfaces with the backend. Traffic is generated by loadgen, which, depending on the desired mode, drives the desired traffic to the benchmark app.
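Under the hood the benchmark app registers two sets of callbacks with loadgen: a query sample library (QSL) that loads preprocessed samples into memory, and a system under test (SUT) that runs inference and reports completions. The sketch below is illustrative only and assumes the v0.5 mlperf_loadgen Python bindings (signatures changed in later versions); the real implementation lives in python/main.py.

# Rough sketch of how the benchmark app wires loadgen to a backend (illustrative).
import mlperf_loadgen as lg
import numpy as np

samples = {}                                     # index -> preprocessed image

def load_query_samples(indices):                 # QSL callback: bring samples into memory
    for i in indices:
        samples[i] = np.zeros((224, 224, 3), dtype=np.float32)

def unload_query_samples(indices):               # QSL callback: free samples
    for i in indices:
        samples.pop(i, None)

def issue_queries(query_samples):                # SUT callback: run inference, report back
    responses = []
    for qs in query_samples:
        img = samples[qs.index]                  # a real SUT runs the model on img here
        # the real app passes a pointer/size to the prediction buffer instead of 0, 0
        # so loadgen can record the response for accuracy mode
        responses.append(lg.QuerySampleResponse(qs.id, 0, 0))
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

def process_latencies(latencies_ns):             # hook expected by the v0.5 bindings
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(1024, 128, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)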
To run this notebook, pick a directory and clone the mlperf source tree:
cd /tmp
git clone https://github.com/mlperf/inference.git
cd inference/v0.5/classification_and_detection
jupyter notebook
In [1]:
import os
root = os.getcwd()
In [2]:
!cd ../../loadgen; CFLAGS="-std=c++14" python setup.py develop; cd {root}
!python setup.py develop
The benchmark app uses a shell script to simplify the command line options; the user can pick backend, model and device:
In [3]:
!./run_local.sh
Before running the benchmark, decide on the model and dataset and set the environment variables MODEL_DIR and DATA_DIR.
For this tutorial we use onnxruntime (tensorflow and pytorch will work as well), mobilenet and a fake imagenet dataset with a few images.
In [4]:
!pip install onnxruntime
In [5]:
!wget -q https://zenodo.org/record/3157894/files/mobilenet_v1_1.0_224.onnx
Normally you'd need to download imagenet2012/validation for image classification or coco2017/validation for object detection.
Links and instructions on how to download the datasets can be found in the README.
In [6]:
!tools/make_fake_imagenet.sh
In [9]:
import os
os.environ['MODEL_DIR'] = root
os.environ['DATA_DIR'] = os.path.join(root, "fake_imagenet")
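As an optional sanity check (not part of the original flow), you can verify that the downloaded model and the fake dataset are where the environment variables point before launching the benchmark:

import os
assert os.path.isfile(os.path.join(os.environ['MODEL_DIR'], "mobilenet_v1_1.0_224.onnx"))
assert os.path.isdir(os.environ['DATA_DIR'])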
For mlperf submission the number of queries, time, latencies and percentiles are given, and we default to those settings. But for this tutorial we pass in some extra options to make things go quicker. run_local.sh will look for the environment variable EXTRA_OPS and add its contents to the arguments. You can also add additional arguments on the command line. The options below limit the time the benchmark runs to 10 seconds; accuracy reporting is enabled with the --accuracy flag on the command line.
In [10]:
os.environ['EXTRA_OPS'] = "--queries-offline 20 --time 10 --max-latency 0.2"
In [11]:
!./run_local.sh onnxruntime mobilenet cpu --accuracy
The line Accuracy reports accuracy or mAP together with latencies at various percentiles so you can get insight into how this run went. Above, accuracy was 87.5%.
The line TestScenario.SingleStream-1.0 reports the latency and qps seen during the benchmark.
For submission the official logging is found in mlperf_log_summary.txt and mlperf_log_detail.txt.
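To inspect them after a run you can search for the files under the source tree; where exactly loadgen writes them (next to results.json in the output directory or in the working directory) depends on the loadgen version, so the find below is a safe way to locate them:

!find {root} -name "mlperf_log_summary.txt" | xargs cat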
If you read over the mlperf inference rules guide, you'll find multiple scenarios to be run for the inference benchmarks:
| scenario | description |
| --- | --- |
| SingleStream | The LoadGen sends the next query as soon as the SUT completes the previous one. |
| MultiStream | The LoadGen sends a new query every Latency Constraint, if the SUT has completed the prior query. Otherwise, the new query is dropped. Such an event is one overtime query. |
| Server | The LoadGen sends new queries to the SUT according to a Poisson distribution. Overtime queries must not exceed 2x the latency bound. |
| Offline | The LoadGen sends all queries to the SUT at one time. |
We can run those scenarios using the --scenario option on the command line, for example:
In [12]:
!./run_local.sh onnxruntime mobilenet cpu --scenario Offline
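The other scenarios from the table above work the same way; the scenario names below follow that table and are passed verbatim to --scenario:

!./run_local.sh onnxruntime mobilenet cpu --scenario Server
!./run_local.sh onnxruntime mobilenet cpu --scenario MultiStream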
We log some additional information here which can be used to plot graphs.
In case you wonder what run_local.sh does: it only assembles the command line for the python-based benchmark app. Command line options for the app are documented here.
Calling
!bash -x ./run_local.sh onnxruntime mobilenet cpu --accuracy
results in the following command line:
python python/main.py --profile mobilenet-onnxruntime --model /tmp/inference/cloud/image_classification/mobilenet_v1_1.0_224.onnx --dataset-path /tmp/inference/cloud/image_classification/fake_imagenet --output /tmp/inference/cloud/image_classification/output/mobilenet-onnxruntime-cpu/results.json --queries-offline 20 --time 10 --max-latency 0.2 --accuracy
During testing you can change some of the options to get faster test cycles, but for the final submission use the defaults.
In [13]:
!./run_and_time.sh onnxruntime mobilenet cpu