End-to-End MLPerf Submission Example

This notebook follows the general MLPerf submission rules.

Get the MLPerf source code

Run this notebook from the root of the MLPerf inference tree, which you can clone with

git clone https://github.com/mlperf/inference.git --depth 1

Build loadgen


In [ ]:
# build loadgen
!pip install pybind11
!cd loadgen; CFLAGS="-std=c++14 -O3" python setup.py develop

In [ ]:
!cd v0.5/classification_and_detection; python setup.py develop

Set Working Directory


In [15]:
%cd v0.5/classification_and_detection


/home/gs/inference/v0.5/classification_and_detection

Download data

We need to download ImageNet and/or COCO for the benchmarks. In our setup the datasets live in /data, so we simply symlink them into the working directory. You might need to change these paths to match your setup.


In [ ]:
%%bash

mkdir -p data
ln -s  /data/imagenet2012 data/
ln -s  /data/coco data/
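Before running anything, it can be worth verifying the dataset layout. A minimal sketch in Python — the helper name `check_layout` is ours, and the file list only covers `val_map.txt`, which the accuracy tool below reads:

```python
import os

def check_layout(root, expected):
    """Return a list of missing dataset paths under root."""
    missing = []
    for rel, files in expected.items():
        d = os.path.join(root, rel)
        if not os.path.isdir(d):
            missing.append(d)
            continue
        missing += [os.path.join(d, f) for f in files
                    if not os.path.exists(os.path.join(d, f))]
    return missing

# imagenet2012 needs the val_map.txt label file used by tools/accuracy-imagenet.py
expected = {"imagenet2012": ["val_map.txt"], "coco": []}
print(check_layout("data", expected))
```

An empty list means the symlinks above resolved and the required files are in place.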

Download models


In [ ]:
%%bash

mkdir -p models

# resnet50
wget -q https://zenodo.org/record/2535873/files/resnet50_v1.pb -O models/resnet50_v1.pb 
wget -q https://zenodo.org/record/2592612/files/resnet50_v1.onnx -O models/resnet50_v1.onnx

# mobilenet
wget -q https://zenodo.org/record/2269307/files/mobilenet_v1_1.0_224.tgz -O models/mobilenet_v1_1.0_224.tgz
(cd models && tar zxvf mobilenet_v1_1.0_224.tgz)
wget -q https://zenodo.org/record/3157894/files/mobilenet_v1_1.0_224.onnx -O models/mobilenet_v1_1.0_224.onnx

# ssd-mobilenet
wget -q http://download.tensorflow.org/models/object_detection/ssd_mobilenet_v1_coco_2018_01_28.tar.gz -O models/ssd_mobilenet_v1_coco_2018_01_28.tar.gz
(cd models && tar zxvf ssd_mobilenet_v1_coco_2018_01_28.tar.gz && mv ssd_mobilenet_v1_coco_2018_01_28/frozen_inference_graph.pb ssd_mobilenet_v1_coco_2018_01_28.pb)
wget -q https://zenodo.org/record/3163026/files/ssd_mobilenet_v1_coco_2018_01_28.onnx -O models/ssd_mobilenet_v1_coco_2018_01_28.onnx 

# ssd-resnet34
wget -q https://zenodo.org/record/3345892/files/tf_ssd_resnet34_22.1.zip -O models/tf_ssd_resnet34_22.1.zip
(cd models && unzip tf_ssd_resnet34_22.1.zip && mv tf_ssd_resnet34_22.1/resnet34_tf.22.1.pb .)
wget -q https://zenodo.org/record/3228411/files/resnet34-ssd1200.onnx -O models/resnet34-ssd1200.onnx

Run benchmarks using the reference implementation

Let's prepare a submission for mobilenet using TensorFlow on a desktop machine with an NVIDIA GTX 1080 GPU.

The following script runs all four scenarios for that combination and prepares a submission directory, following the general submission rules documented here.
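It helps to know the directory layout the submission tree must follow before reading the script. A small Python sketch — the helper name `submission_paths` is ours; the division, org, system id, and benchmark names are the ones used in this example:

```python
import os

def submission_paths(root, division, org, system_id, benchmark, scenario):
    """Build the per-scenario paths a submission tree uses."""
    base = os.path.join(root, division, org)
    return {
        "results": os.path.join(base, "results", system_id, benchmark, scenario),
        "measurements": os.path.join(base, "measurements", system_id, benchmark, scenario),
        "systems": os.path.join(base, "systems"),
        "code": os.path.join(base, "code", benchmark),
    }

paths = submission_paths("/tmp/mlperf-submission", "closed", "mlperf-org",
                         "tf-gpu", "mobilenet", "SingleStream")
for kind, path in paths.items():
    print(kind, "->", path)
```

Each results/.../scenario directory then holds an accuracy/ subdirectory and one or more performance/run_N subdirectories, as the listing later in this notebook shows.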


In [1]:
import logging
import os
logger = logging.getLogger()
logger.setLevel(logging.CRITICAL)

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' 
os.environ['CUDA_VISIBLE_DEVICES'] = "0"

# final results go here
ORG = "mlperf-org"
DIVISION = "closed"
SUBMISSION_ROOT = "/tmp/mlperf-submission"
SUBMISSION_DIR = os.path.join(SUBMISSION_ROOT, DIVISION, ORG)
os.environ['SUBMISSION_ROOT'] = SUBMISSION_ROOT
os.environ['SUBMISSION_DIR'] = SUBMISSION_DIR
os.makedirs(SUBMISSION_DIR, exist_ok=True)
os.makedirs(os.path.join(SUBMISSION_DIR, "measurements"), exist_ok=True)
os.makedirs(os.path.join(SUBMISSION_DIR, "code"), exist_ok=True)

In [ ]:
%%bash

# where to find stuff
export DATA_ROOT=`pwd`/data
export MODEL_DIR=`pwd`/models

# options for official runs
gopt="--max-batchsize 8 --samples-per-query 40 --threads 2 --qps 145"


function one_run {
    # args: scenario count framework device model ...
    scenario=$1; shift
    count=$1; shift
    framework=$1
    device=$2
    model=$3
    system_id=$framework-$device
    echo "====== $model/$scenario ====="

    case $model in
    mobilenet)
        cmd="tools/accuracy-imagenet.py --imagenet-val-file $DATA_ROOT/imagenet2012/val_map.txt"
        official_name="mobilenet";;
    resnet50)
        cmd="tools/accuracy-imagenet.py --imagenet-val-file $DATA_ROOT/imagenet2012/val_map.txt"
        official_name="resnet";;
    ssd-mobilenet)
        cmd="tools/accuracy-coco.py --coco-dir $DATA_ROOT/coco"
        official_name="ssd-small";;
    ssd-resnet34)
        cmd="tools/accuracy-coco.py --coco-dir $DATA_ROOT/coco"
        official_name="ssd-large";;
    esac
    output_dir=$SUBMISSION_DIR/results/$system_id/$official_name

    # accuracy run
    ./run_local.sh $@ --scenario $scenario --accuracy --output $output_dir/$scenario/accuracy
    python $cmd --mlperf-accuracy-file $output_dir/$scenario/accuracy/mlperf_log_accuracy.json \
            > $output_dir/$scenario/accuracy/accuracy.txt
    cat $output_dir/$scenario/accuracy/accuracy.txt

    # performance runs ($count of them; Server requires 5)
    cnt=0
    while [ $cnt -lt $count ]; do
        let cnt=cnt+1
        ./run_local.sh $@ --scenario $scenario --output $output_dir/$scenario/performance/run_$cnt
    done

    # setup the measurements directory
    mdir=$SUBMISSION_DIR/measurements/$system_id/$official_name/$scenario
    mkdir -p $mdir
    cp ../mlperf.conf $mdir

    # the reference app uses command line options instead of user.conf
    echo "# empty" > $mdir/user.conf
    touch $mdir/README.md
    impid="reference"
    # NOTE: the starting weights URL below is hardcoded for mobilenet;
    # change it when submitting other models
    cat > $mdir/$system_id"_"$impid"_"$scenario".json" <<EOF
{
    "input_data_types": "fp32",
    "retraining": "none",
    "starting_weights_filename": "https://zenodo.org/record/2269307/files/mobilenet_v1_1.0_224.tgz",
    "weight_data_types": "fp32",
    "weight_transformations": "none"
}
EOF
}

function one_model {
    # args: framework device model ...
    one_run SingleStream 1 $@ --max-latency 0.0005
    one_run Server 5 $@
    one_run Offline 1 $@ --qps 1000
    one_run MultiStream 1 $@
}


# run image classifier benchmarks 
export DATA_DIR=$DATA_ROOT/imagenet2012
one_model tf gpu mobilenet $gopt

The runs may leave large mlperf_log_trace.json files in the submission directory; they are not needed for the submission, so we delete them.


In [33]:
!find {SUBMISSION_DIR}/ -name mlperf_log_trace.json -delete
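If you want to see how much space the cleanup reclaims, the same thing can be done from Python. A hedged sketch — the helper name `delete_traces` is ours:

```python
import os

def delete_traces(root, name="mlperf_log_trace.json"):
    """Delete all trace files under root; return total bytes freed."""
    freed = 0
    for dirpath, _, files in os.walk(root):
        if name in files:
            path = os.path.join(dirpath, name)
            freed += os.path.getsize(path)
            os.remove(path)
    return freed

# freed = delete_traces(SUBMISSION_DIR)
# print(f"freed {freed} bytes")
```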

Complete submission directory

Add the required metadata to the submission. A template for the metadata fields can be found here.


In [34]:
%%bash

#
# setup systems directory
#
if [ ! -d ${SUBMISSION_DIR}/systems ]; then
    mkdir ${SUBMISSION_DIR}/systems
fi

cat > ${SUBMISSION_DIR}/systems/tf-gpu.json <<EOF
{
        "division": "closed",
        "status": "available",
        "submitter": "mlperf-org",
        "system_name": "tf-gpu",
        
        "number_of_nodes": 1,
        "host_memory_capacity": "32GB",
        "host_processor_core_count": 1,
        "host_processor_frequency": "3.50GHz",
        "host_processor_model_name": "Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz",
        "host_processors_per_node": 1,
        "host_storage_capacity": "512GB",
        "host_storage_type": "SSD",
        
        "accelerator_frequency": "-",
        "accelerator_host_interconnect": "-",
        "accelerator_interconnect": "-",
        "accelerator_interconnect_topology": "-",
        "accelerator_memory_capacity": "8GB",
        "accelerator_memory_configuration": "none",
        "accelerator_model_name": "gtx-1080",
        "accelerator_on-chip_memories": "-",
        "accelerators_per_node": 1,

        "framework": "v1.14.0-rc1-22-gaf24dc9",
        "operating_system": "ubuntu-16.04",
        "other_software_stack": "cuda-10.1",
        "sw_notes": ""
}
EOF
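A quick sanity check on the system description can catch missing fields before the submission checker does. A minimal sketch — the helper name `missing_fields` is ours, and the field set below is just a subset taken from the json we wrote above; the authoritative list is in the submission rules template:

```python
import json

# subset of fields from the systems json above; not the full required set
REQUIRED = {"division", "status", "submitter", "system_name", "framework"}

def missing_fields(path, required=REQUIRED):
    """Return the required fields absent from a systems json file."""
    with open(path) as f:
        meta = json.load(f)
    return sorted(required - set(meta))

# print(missing_fields(SUBMISSION_DIR + "/systems/tf-gpu.json"))
```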

In [35]:
%%bash

#
# setup code directory
#
dir=${SUBMISSION_DIR}/code/mobilenet/reference
mkdir -p $dir
echo "git clone https://github.com/mlperf/inference.git" > $dir/VERSION.txt
git rev-parse HEAD >> $dir/VERSION.txt

What's in the submission directory now?


In [36]:
!find {SUBMISSION_ROOT}/ -type f


/tmp/mlperf-submission/closed/mlperf-org/systems/tf-gpu.json
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/Offline/tf-gpu_reference.json
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/Offline/README.md
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/Offline/mlperf.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/Offline/user.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/Server/tf-gpu_reference.json
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/Server/README.md
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/Server/mlperf.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/Server/user.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/SingleStream/tf-gpu_reference.json
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/SingleStream/README.md
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/SingleStream/mlperf.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/SingleStream/user.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/MultiStream/tf-gpu_reference.json
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/MultiStream/README.md
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/MultiStream/mlperf.conf
/tmp/mlperf-submission/closed/mlperf-org/measurements/tf-gpu/mobilenet/MultiStream/user.conf
/tmp/mlperf-submission/closed/mlperf-org/code/mobilenet/reference/VERSION.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Offline/accuracy/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Offline/accuracy/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Offline/accuracy/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Offline/accuracy/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Offline/accuracy/accuracy.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Offline/performance/run_1/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Offline/performance/run_1/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Offline/performance/run_1/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Offline/performance/run_1/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/accuracy/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/accuracy/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/accuracy/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/accuracy/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/accuracy/accuracy.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_1/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_1/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_1/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_1/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_2/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_2/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_2/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_2/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_3/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_3/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_3/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_3/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_4/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_4/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_4/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_4/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_5/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_5/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_5/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/Server/performance/run_5/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/SingleStream/accuracy/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/SingleStream/accuracy/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/SingleStream/accuracy/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/SingleStream/accuracy/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/SingleStream/accuracy/accuracy.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/SingleStream/performance/run_1/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/SingleStream/performance/run_1/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/SingleStream/performance/run_1/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/SingleStream/performance/run_1/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/MultiStream/accuracy/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/MultiStream/accuracy/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/MultiStream/accuracy/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/MultiStream/accuracy/results.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/MultiStream/accuracy/accuracy.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/MultiStream/performance/run_1/mlperf_log_accuracy.json
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/MultiStream/performance/run_1/mlperf_log_summary.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/MultiStream/performance/run_1/mlperf_log_detail.txt
/tmp/mlperf-submission/closed/mlperf-org/results/tf-gpu/mobilenet/MultiStream/performance/run_1/results.json

Let's look at a few of the files:


In [37]:
!echo "-- SingleStream Accuracy"; head {SUBMISSION_DIR}/results/tf-gpu/mobilenet/SingleStream/accuracy/accuracy.txt
!echo "\n-- SingleStream Summary"; head {SUBMISSION_DIR}/results/tf-gpu/mobilenet/SingleStream/performance/run_1/mlperf_log_summary.txt
!echo "\n-- Server Summary"; head {SUBMISSION_DIR}/results/tf-gpu/mobilenet/Server/performance/run_1/mlperf_log_summary.txt


-- SingleStream Accuracy
accuracy=71.676%, good=35838, total=50000

-- SingleStream Summary
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Single Stream
Mode     : Performance
90th percentile latency (ns) : 3405359
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Yes

-- Server Summary
================================================
MLPerf Results Summary
================================================
SUT name : PySUT
Scenario : Server
Mode     : Performance
Scheduled samples per second : 145.35
Result is : VALID
  Performance constraints satisfied : Yes
  Min duration satisfied : Yes

Run the submission checker

Finally, run the submission checker, which performs sanity checks on the submission tree. We run it last and include its output in the submission.


In [39]:
!python ../tools/submission/submission-checker.py --input {SUBMISSION_ROOT} > {SUBMISSION_DIR}/submission-checker.log 2>&1 
!cat {SUBMISSION_DIR}/submission-checker.log


WARNING:main:closed/mlperf-org/results/tf-gpu/mobilenet/Offline/performance/run_1/mlperf_log_detail.txt contains errors
INFO:main:closed/mlperf-org/results/tf-gpu/mobilenet/Offline OK
INFO:main:closed/mlperf-org/results/tf-gpu/mobilenet/Server OK
INFO:main:closed/mlperf-org/results/tf-gpu/mobilenet/SingleStream OK
INFO:main:closed/mlperf-org/results/tf-gpu/mobilenet/MultiStream OK
INFO:main:closed/mlperf-org/systems/tf-gpu.json OK
INFO:main:closed/mlperf-org/measurements/tf-gpu/mobilenet/Offline OK
INFO:main:closed/mlperf-org/measurements/tf-gpu/mobilenet/Server OK
INFO:main:closed/mlperf-org/measurements/tf-gpu/mobilenet/SingleStream OK
INFO:main:closed/mlperf-org/measurements/tf-gpu/mobilenet/MultiStream OK
INFO:main:SUMMARY: submission looks OK
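The log above contains one WARNING but ends with "submission looks OK". If you script this step, a small parser makes the pass/fail decision explicit. A sketch — the helper name `scan_checker_log` is ours, and it only keys off the log line prefixes seen above:

```python
def scan_checker_log(path):
    """Scan a submission-checker log; return (ok, errors, warnings)."""
    errors, warnings, ok = [], [], False
    with open(path) as f:
        for line in f:
            if line.startswith("ERROR"):
                errors.append(line.strip())
            elif line.startswith("WARNING"):
                warnings.append(line.strip())
            elif "SUMMARY: submission looks OK" in line:
                ok = True
    return ok, errors, warnings

# ok, errors, warnings = scan_checker_log(SUBMISSION_DIR + "/submission-checker.log")
# assert ok and not errors, "submission checker found problems"
```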
