Tune the Model Server

Open a Terminal through Jupyter Notebook

(Menu Bar -> Terminal -> New Terminal)

Load Test CPU Model

Start Load Test in the Terminal

loadtest high

The params are as follows:

  • 1: amount of load (low|medium|high)

Notice the throughput and avg/min/max latencies:

summary ... =  400.2/s Avg:   249 Min:   230 Max:   286 Err:     0 (0.00%)
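To compare runs without reading the numbers by eye, the summary line can be parsed into named fields. The following is a minimal sketch, assuming the line always ends in the throughput/Avg/Min/Max/Err layout shown above; the names SUMMARY_RE and parse_summary are illustrative and not part of the loadtest tool.

```python
import re

# Matches the tail of a summary line in the layout shown above:
#   = <throughput>/s Avg: <n> Min: <n> Max: <n> Err: <n> (...)
# The layout is assumed from the sample output, not from the tool's documentation.
SUMMARY_RE = re.compile(
    r"=\s*(?P<throughput>[\d.]+)/s\s+"
    r"Avg:\s*(?P<avg>\d+)\s+"
    r"Min:\s*(?P<min>\d+)\s+"
    r"Max:\s*(?P<max>\d+)\s+"
    r"Err:\s*(?P<err>\d+)"
)


def parse_summary(line):
    """Return the metrics of one summary line as a dict, or None if it does not match."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    return {
        "throughput": float(match.group("throughput")),  # per second
        "avg": int(match.group("avg")),  # latencies in the unit the tool reports
        "min": int(match.group("min")),
        "max": int(match.group("max")),
        "err": int(match.group("err")),  # error count
    }


line = "summary ... =  400.2/s Avg:   249 Min:   230 Max:   286 Err:     0 (0.00%)"
print(parse_summary(line))
```

Returning None for lines that do not match makes it easy to filter the summary lines out of a longer log.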

Load Test GPU Model

Restart Model Server with GPU Model

<ctrl-c> in the load test terminal

<ctrl-c> in the model server terminal

serve 9000 linear /root/models/linear/gpu/ false

The params are as follows:

  • 1: port number (int)

  • 2: model_name (anything)

  • 3: /path/to/model (base path above all version sub-directories)

  • 4: request batching (true|false)

Start Load Test in the Terminal

loadtest high

Notice the throughput and avg/min/max latencies:

summary ... =  318.5/s Avg:   313 Min:   287 Max:   409 Err:     0 (0.00%)
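A quick way to read this result against the CPU run earlier in this section is to express the difference as a percentage. The sketch below hard-codes the throughput and average latency from the two sample summary lines; percent_change is an illustrative helper, and the numbers will differ on your own hardware.

```python
def percent_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Figures copied from the sample CPU and GPU summary lines in this section.
cpu = {"throughput": 400.2, "avg": 249}
gpu = {"throughput": 318.5, "avg": 313}

throughput_delta = percent_change(cpu["throughput"], gpu["throughput"])
latency_delta = percent_change(cpu["avg"], gpu["avg"])

print(f"throughput:  {throughput_delta:+.1f}%")   # -20.4%
print(f"avg latency: {latency_delta:+.1f}%")      # +25.7%
```

In these sample runs the GPU model served about 20% fewer requests per second than the CPU model, with a correspondingly higher average latency.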

Watch GPU During Load Test

watch -n 1 nvidia-smi
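If you would rather log GPU utilization than watch it, nvidia-smi can emit machine-readable CSV through its --query-gpu and --format options. The sketch below takes one sample per GPU; the helper names are illustrative, and it assumes nvidia-smi is on the PATH (it prints a notice otherwise).

```python
import shutil
import subprocess

# One CSV row per GPU: utilization in percent, memory used in MiB, no header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def to_int(field):
    """Convert a CSV field to int, or None when the driver reports a non-numeric value."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_line(line):
    """Split one 'utilization, memory' CSV row into a (utilization, memory_used) tuple."""
    util, mem = line.split(",")
    return to_int(util), to_int(mem)


def sample_gpus():
    """Run nvidia-smi once and return one (utilization, memory_used) tuple per GPU."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return [parse_gpu_line(row) for row in out.splitlines() if row.strip()]


if shutil.which("nvidia-smi"):
    for index, (util, mem) in enumerate(sample_gpus()):
        print(f"GPU {index}: utilization={util}% memory_used={mem} MiB")
else:
    print("nvidia-smi not found on PATH; skipping GPU sample")
```

Calling sample_gpus() in a loop with a one-second sleep while the load test runs gives roughly the same view as the watch command, in a form that can be written to a file.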

Load Test CPU Model + Request Batching

Restart Model Server with CPU Model

<ctrl-c> in the load test terminal

<ctrl-c> in the model server terminal

serve 9000 linear /root/models/linear/cpu/ true

The params are as follows:

  • 1: port number (int)

  • 2: model_name (anything)

  • 3: /path/to/model (base path above all version sub-directories)

  • 4: request batching (true|false)

Start Load Test in the Terminal

loadtest high

Notice the throughput and avg/min/max latencies:

summary ... =  301.1/s Avg:   227 Min:     3 Max:   456 Err:     0 (0.00%)

Load Test GPU Model + Request Batching

Restart Model Server with GPU Model

<ctrl-c> in the load test terminal

<ctrl-c> in the model server terminal

serve 9000 linear /root/models/linear/gpu/ true

The params are as follows:

  • 1: port number (int)

  • 2: model_name (anything)

  • 3: /path/to/model (base path above all version sub-directories)

  • 4: request batching (true|false)

Start Load Test in the Terminal

loadtest high

Notice the throughput and avg/min/max latencies:

summary ... =  260.1/s Avg:   264 Min:     5 Max:   508 Err:     0 (0.00%)
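With all four runs complete, it helps to put the results side by side. The sketch below tabulates the sample figures from the four summary lines in this section and adds the latency spread (max minus min); the values are hard-coded from the sample output, so substitute your own.

```python
# (label, throughput per second, avg, min, max) copied from the four sample summary lines.
RUNS = [
    ("CPU", 400.2, 249, 230, 286),
    ("GPU", 318.5, 313, 287, 409),
    ("CPU + batching", 301.1, 227, 3, 456),
    ("GPU + batching", 260.1, 264, 5, 508),
]


def spread(run):
    """Latency spread of one run: max latency minus min latency."""
    return run[4] - run[3]


best = max(RUNS, key=lambda run: run[1])  # run with the highest throughput

print(f"{'run':<16}{'/s':>8}{'avg':>6}{'min':>6}{'max':>6}{'spread':>8}")
for run in RUNS:
    label, throughput, avg, low, high = run
    print(f"{label:<16}{throughput:>8.1f}{avg:>6}{low:>6}{high:>6}{spread(run):>8}")
print(f"highest throughput: {best[0]}")
```

In these sample runs, request batching brought the minimum latency down to single digits but widened the spread and lowered throughput, and the unbatched CPU model had the highest throughput overall.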