Tune the Model Server

Open a Terminal through Jupyter Notebook

loadtest high

The params are as follows:

Notice the throughput and avg/min/max latencies:

summary ... =  400.2/s Avg:   249 Min:   230 Max:   286 Err:     0 (0.00%)
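
To compare runs without copying numbers by hand, the summary line can be parsed into named fields. This is a minimal Python sketch and not part of the workshop tooling: `parse_summary` and its field names are hypothetical, and the pattern matches only the fields visible in the line above, leaving the elided `...` part alone.

```python
import re

# Matches the visible tail of a summary line:
#   "=  400.2/s Avg:   249 Min:   230 Max:   286 Err:     0 (0.00%)"
# The elided "..." portion before "=" is intentionally not interpreted.
SUMMARY_RE = re.compile(
    r"=\s*(?P<throughput>[\d.]+)/s\s+"
    r"Avg:\s*(?P<avg>\d+)\s+"
    r"Min:\s*(?P<min>\d+)\s+"
    r"Max:\s*(?P<max>\d+)\s+"
    r"Err:\s*(?P<err>\d+)\s+\((?P<err_pct>[\d.]+)%\)"
)


def parse_summary(line):
    """Return the throughput, latency, and error fields of one summary line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not a summary line: %r" % line)
    return {
        "throughput": float(m["throughput"]),  # requests per second
        "avg": int(m["avg"]),                  # latency, units as reported
        "min": int(m["min"]),
        "max": int(m["max"]),
        "err": int(m["err"]),                  # error count
        "err_pct": float(m["err_pct"]),        # error percentage
    }


stats = parse_summary(
    "summary ... =  400.2/s Avg:   249 Min:   230 Max:   286 Err:     0 (0.00%)"
)
print(stats)
```

The same function can be applied to the summary line of each later run to collect the results in one place.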

Press Ctrl-C in the load test terminal to stop the load test

Press Ctrl-C in the model server terminal to stop the model server

serve 9000 linear /root/models/linear/gpu/ false

The params are as follows:

loadtest high

Notice the throughput and avg/min/max latencies:

summary ... =  318.5/s Avg:   313 Min:   287 Max:   409 Err:     0 (0.00%)

watch -n 1 nvidia-smi
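
`watch -n 1 nvidia-smi` redraws the full `nvidia-smi` report every second. To record GPU load next to the load test numbers, a single sample can also be taken programmatically. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` and `--format=csv,noheader,nounits` options. The function names `parse_gpu_csv` and `sample_gpus` are hypothetical and not part of the workshop tooling.

```python
import subprocess

# One CSV row per GPU: utilization in percent, used memory in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Convert a CSV field to int; return None for values such as [N/A]."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_csv(text):
    """Parse nvidia-smi CSV output into one dict per GPU."""
    rows = []
    for line in text.strip().splitlines():
        util, mem = line.split(",")[:2]
        rows.append({"util_pct": _to_int(util), "mem_used_mib": _to_int(mem)})
    return rows


def sample_gpus():
    """Take one sample; return [] if nvidia-smi is missing or fails."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return []
    return parse_gpu_csv(result.stdout)


print(sample_gpus())
```

Calling `sample_gpus()` in a loop while `loadtest high` is running gives a rough view of whether the GPU is busy during the test.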

Press Ctrl-C in the load test terminal to stop the load test

Press Ctrl-C in the model server terminal to stop the model server

serve 9000 linear /root/models/linear/cpu/ true

The params are as follows:

loadtest high

Notice the throughput and avg/min/max latencies:

summary ... =  301.1/s Avg:   227 Min:     3 Max:   456 Err:     0 (0.00%)

Press Ctrl-C in the load test terminal to stop the load test

Press Ctrl-C in the model server terminal to stop the model server

serve 9000 linear /root/models/linear/gpu/ true

The params are as follows:

loadtest high

Notice the throughput and avg/min/max latencies:

summary ... =  260.1/s Avg:   264 Min:     5 Max:   508 Err:     0 (0.00%)
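
The four summary lines above can be put side by side to see how each server configuration changes throughput and latency. The following Python sketch hard-codes the numbers from those lines. The labels are shortened from the `serve` commands (the command for the first run is not shown in this section), and `compare` is a hypothetical helper. The in-flight estimate uses Little's law (requests in flight ≈ throughput × average latency) and assumes the latencies are reported in milliseconds, which the output itself does not state.

```python
# (label, throughput in requests/s, avg, min, max latency) copied from the
# four summary lines in this section.
RUNS = [
    ("run 1 (serve command not shown)", 400.2, 249, 230, 286),
    ("gpu/ false", 318.5, 313, 287, 409),
    ("cpu/ true", 301.1, 227, 3, 456),
    ("gpu/ true", 260.1, 264, 5, 508),
]


def compare(runs):
    """Derive per-run comparison figures relative to the first run."""
    base = runs[0][1]
    rows = []
    for label, tput, avg, lo, hi in runs:
        rows.append({
            "label": label,
            "throughput": tput,
            # Percent change in throughput relative to the first run.
            "vs_first_pct": round(100.0 * (tput - base) / base, 1),
            # Little's law estimate of concurrent requests.
            # ASSUMPTION: latency values are in milliseconds.
            "in_flight": round(tput * avg / 1000.0, 1),
            # Gap between slowest and fastest request in the run.
            "latency_spread": hi - lo,
        })
    return rows


for row in compare(RUNS):
    print(
        "{label:34s} {throughput:7.1f}/s {vs_first_pct:+6.1f}% "
        "in-flight~{in_flight:6.1f} spread={latency_spread}".format(**row)
    )
```

Reading the spread next to the average helps separate a configuration that is uniformly slower from one with a low minimum but a long tail.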