We wrapped all inference models into a single benchmark app. The benchmark app reads the proper dataset, preprocesses it, and interfaces with the backend. Traffic is generated by loadgen, which, depending on the desired mode, drives the desired traffic to the benchmark app.
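Under the hood the benchmark app registers two sets of callbacks with loadgen: a query sample library (QSL) that loads preprocessed samples into memory, and a system under test (SUT) that runs inference and reports completions. The sketch below is illustrative only and assumes the v0.5 mlperf_loadgen Python bindings (signatures changed in later versions); the real implementation lives in python/main.py.

# Rough sketch of how the benchmark app wires loadgen to a backend (illustrative).
import mlperf_loadgen as lg
import numpy as np

samples = {}                                     # index -> preprocessed image

def load_query_samples(indices):                 # QSL callback: bring samples into memory
    for i in indices:
        samples[i] = np.zeros((224, 224, 3), dtype=np.float32)

def unload_query_samples(indices):               # QSL callback: free samples
    for i in indices:
        samples.pop(i, None)

def issue_queries(query_samples):                # SUT callback: run inference, report back
    responses = []
    for qs in query_samples:
        img = samples[qs.index]                  # a real SUT runs the model on img here
        # the real app passes a pointer/size to the prediction buffer instead of 0, 0
        # so loadgen can record the response for accuracy mode
        responses.append(lg.QuerySampleResponse(qs.id, 0, 0))
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

def process_latencies(latencies_ns):             # hook expected by the v0.5 bindings
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.SingleStream
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries, process_latencies)
qsl = lg.ConstructQSL(1024, 128, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)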
To run this notebook, pick a directory and clone the mlperf source tree:
cd /tmp
git clone https://github.com/mlperf/inference.git
cd inference/v0.5/classification_and_detection
jupyter notebook
In [1]:
import os
root = os.getcwd()
In [2]:
!cd ../../loadgen; CFLAGS="-std=c++14" python setup.py develop; cd {root}
!python setup.py develop
The benchmark app uses a shell script to simplify the command line options; the user can pick backend, model and device:
In [3]:
!./run_local.sh
Before running the benchmark, decide on the model and dataset and set the environment variables MODEL_DIR and DATA_DIR.
For this tutorial we use onnxruntime (tensorflow and pytorch will work as well), mobilenet and a fake imagenet dataset with a few images.
In [4]:
!pip install onnxruntime
In [5]:
!wget -q https://zenodo.org/record/3157894/files/mobilenet_v1_1.0_224.onnx
Normally you'd need to download imagenet2012/validation for image classification or coco2017/validation for object detection.
Links and instructions on how to download the datasets can be found in the README.
In [6]:
!tools/make_fake_imagenet.sh
In [9]:
import os
os.environ['MODEL_DIR'] = root
os.environ['DATA_DIR'] = os.path.join(root, "fake_imagenet")
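As an optional sanity check (not part of the original flow), you can verify that the downloaded model and the fake dataset are where the environment variables point before launching the benchmark:

import os
assert os.path.isfile(os.path.join(os.environ['MODEL_DIR'], "mobilenet_v1_1.0_224.onnx"))
assert os.path.isdir(os.environ['DATA_DIR'])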
For mlperf submission the number of queries, time, latencies and percentiles are given, and we default to those settings. But for this tutorial we pass in some extra options to make things go quicker. run_local.sh will look for the environment variable EXTRA_OPS and add its contents to the arguments. You can also add additional arguments on the command line. The options below limit the time the benchmark runs to 10 seconds; accuracy reporting is enabled with the --accuracy flag on the command line.
In [10]:
os.environ['EXTRA_OPS'] = "--queries-offline 20 --time 10 --max-latency 0.2"
In [11]:
!./run_local.sh onnxruntime mobilenet cpu --accuracy
The line Accuracy reports accuracy or mAP together with latencies at various percentiles so you can get insight into how this run went. Above, accuracy was 87.5%.
The line TestScenario.SingleStream-1.0 reports the latency and qps seen during the benchmark.
For submission the official logging is found in mlperf_log_summary.txt and mlperf_log_detail.txt.
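To inspect them after a run you can search for the files under the source tree; where exactly loadgen writes them (next to results.json in the output directory or in the working directory) depends on the loadgen version, so the find below is a safe way to locate them:

!find {root} -name "mlperf_log_summary.txt" | xargs cat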
If you read over the mlperf inference rules guide, you'll find multiple scenarios to be run for the inference benchmarks:
| scenario | description |
| --- | --- |
| SingleStream | The LoadGen sends the next query as soon as the SUT completes the previous one. |
| MultiStream | The LoadGen sends a new query every Latency Constraint, if the SUT has completed the prior query. Otherwise, the new query is dropped. Such an event is one overtime query. |
| Server | The LoadGen sends new queries to the SUT according to a Poisson distribution. Overtime queries must not exceed 2x the latency bound. |
| Offline | The LoadGen sends all queries to the SUT at one time. |
We can run those scenarios using the --scenario option on the command line, for example:
In [12]:
!./run_local.sh onnxruntime mobilenet cpu --scenario Offline
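The other scenarios from the table above work the same way; the scenario names below follow that table and are passed verbatim to --scenario:

!./run_local.sh onnxruntime mobilenet cpu --scenario Server
!./run_local.sh onnxruntime mobilenet cpu --scenario MultiStream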
We log some additional information here which can be used to plot graphs.
In case you wonder what run_local.sh does: it only assembles the command line for the python-based benchmark app. Command line options for the app are documented here.
Calling
!bash -x ./run_local.sh onnxruntime mobilenet cpu --accuracy
results in the following command line:
python python/main.py --profile mobilenet-onnxruntime --model /tmp/inference/cloud/image_classification/mobilenet_v1_1.0_224.onnx --dataset-path /tmp/inference/cloud/image_classification/fake_imagenet --output /tmp/inference/cloud/image_classification/output/mobilenet-onnxruntime-cpu/results.json --queries-offline 20 --time 10 --max-latency 0.2 --accuracy
During testing you can change some of the options to get faster test cycles, but for the final submission use the defaults.
In [13]:
!./run_and_time.sh onnxruntime mobilenet cpu