Calling SCS in Parallel

In this notebook, we set up a list of several SCS problems and map scs.solve over that list to solve each of the problems.

  • Our first attempt uses Python's builtin map function, which operates in serial, solving one problem at a time.
  • The second attempt uses concurrent.futures.ProcessPoolExecutor to solve the problems in parallel, using separate Python processes.
  • The final attempt uses concurrent.futures.ThreadPoolExecutor to solve in parallel, using separate threads.

When running arbitrary Python code, the ThreadPoolExecutor approach may suffer due to the Python GIL (global interpreter lock), which prevents multiple threads from executing Python bytecode at the same time. However, SCS releases the GIL while running its underlying C code, allowing it to achieve true parallelism and performance similar to ProcessPoolExecutor.

The ThreadPoolExecutor approach may be preferable to ProcessPoolExecutor because it doesn't require launching a separate Python interpreter for each worker, and does not need to serialize data to pass it between processes.
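
To see why releasing the GIL matters, here is a minimal sketch that does not use SCS at all. It assumes time.sleep as a stand-in for a C routine that releases the GIL while it runs; fake_solve and timed are hypothetical names introduced only for this illustration.

```python
import time
from concurrent import futures

def fake_solve(seconds):
    # Stand-in for a C solver call: time.sleep releases the GIL while waiting,
    # so other threads are free to run in the meantime.
    time.sleep(seconds)
    return seconds

def timed(map_fn, durations):
    # Run map_fn(fake_solve, durations) and return (results, wall time in seconds).
    start = time.perf_counter()
    results = list(map_fn(fake_solve, durations))
    return results, time.perf_counter() - start

durations = [0.2] * 4

# Serial: the four calls run back to back.
serial_results, serial_time = timed(map, durations)

# Threaded: the four calls overlap because the GIL is released during each one.
with futures.ThreadPoolExecutor(4) as ex:
    thread_results, thread_time = timed(ex.map, durations)

print(f"serial: {serial_time:.2f} s, threads: {thread_time:.2f} s")
```

A pure-Python loop in place of time.sleep would hold the GIL, and the threaded version would then show little or no improvement over the serial one.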

This notebook uses the concurrent.futures library, which was introduced in Python 3.2, but has been backported to Python 2.5 and above through the futures library on PyPI.

Generate data

We first generate a number of SCS problems.


In [1]:
import scs
from concurrent import futures

num_problems = 20
m = 1000 # size of L1 problem

data = [scs.examples.l1(m, seed=i) for i in range(num_problems)]

We define a solve function to map over our problem data. We set verbose=False because verbose printing can hinder performance: the GIL must be reacquired for each print. We define a named function instead of a lambda because ProcessPoolExecutor can't serialize lambdas.
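
The lambda restriction comes from pickle, which ProcessPoolExecutor uses to send work to its worker processes. The sketch below checks picklability directly; can_pickle is a hypothetical helper written for this illustration, not part of SCS or concurrent.futures.

```python
import pickle

def can_pickle(obj):
    # Return True if obj can be serialized with pickle, False otherwise.
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True

# Functions are pickled by reference to an importable name.
# A lambda has no such name, so pickling it fails.
print("lambda picklable: ", can_pickle(lambda x: x))

# A builtin such as len can be looked up by name, so it pickles fine.
print("builtin picklable:", can_pickle(len))
```

A function defined with def at the top level of a module has an importable name, which is why the solve function in the next cell can be handed to ProcessPoolExecutor.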

We set the number of workers to 4 in this example, which will set the number of threads or processes in the parallel examples. Setting the number of workers to be the number of processors on your system is a good first guess, but some experimentation may be required to find the optimal setting.
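
As a starting point for that first guess, the processor count can be queried from the standard library. This is only a sketch; default_workers is a hypothetical name, and the notebook itself hard-codes the value to 4.

```python
import os

# os.cpu_count() returns the number of logical CPUs, or None if it cannot be
# determined, so fall back to a single worker in that case.
default_workers = os.cpu_count() or 1

print("suggested number of workers:", default_workers)
```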


In [2]:
workers = 4 # number of threads/processes

def solve(x):
    return scs.solve(*x, verbose=False)

Serial solve with map

We measure the serial solve time, using the builtin Python map function.


In [3]:
%%time
a = list(map(solve, data))


CPU times: user 36.3 s, sys: 201 ms, total: 36.5 s
Wall time: 36.6 s

Parallel solve with processes

We measure the parallel solve time, using ProcessPoolExecutor.map().


In [4]:
%%time
with futures.ProcessPoolExecutor(workers) as ex:
    a = list(ex.map(solve, data))


CPU times: user 128 ms, sys: 154 ms, total: 282 ms
Wall time: 23.4 s

Parallel solve with threads

We measure the parallel solve time, using ThreadPoolExecutor.map().

We achieve similar performance to the processes example because SCS releases the GIL when calling its underlying C solver code. Threads can be more lightweight than processes because they do not need to launch separate Python interpreters, and do not need to serialize data to pass it between workers. However, in this case, the lighter-weight approach doesn't seem to help much.


In [5]:
%%time
with futures.ThreadPoolExecutor(workers) as ex:
    a = list(ex.map(solve, data))


CPU times: user 1min 25s, sys: 538 ms, total: 1min 25s
Wall time: 22.6 s

SCS Workspace in parallel

We can also form scs.Workspace objects in parallel, and use them to solve problems in parallel.

Below, we define two functions to form and solve with Workspace, which we'll use in our serial and parallel maps.


In [6]:
def form_workspace(x):
    return scs.Workspace(*x, verbose=False)

def workspace_solve(work):
    return work.solve()

Initialize Workspace

We can compare the initialization time (which involves a matrix factorization) for the Workspace objects when we perform it in serial and in parallel.


In [7]:
%%time
workspaces = list(map(form_workspace, data))


CPU times: user 9.36 s, sys: 169 ms, total: 9.53 s
Wall time: 9.54 s

In [8]:
%%time
with futures.ThreadPoolExecutor(workers) as ex:
    workspaces = list(ex.map(form_workspace, data))


CPU times: user 24.2 s, sys: 496 ms, total: 24.7 s
Wall time: 6.45 s

Workspace.solve()

We can also compare serial and parallel calls of workspace.solve().

Note that we can't use ProcessPoolExecutor here, because it can't serialize SCS Workspace objects. ThreadPoolExecutor eliminates the need for serialization.
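
The same pattern can be reproduced without SCS. The sketch below uses a hypothetical FakeWorkspace class that holds a threading.Lock, which pickle refuses to serialize, as a stand-in for an object wrapping native state. It is not how scs.Workspace is implemented; it only illustrates that threads can share such objects directly.

```python
import pickle
import threading
from concurrent import futures

class FakeWorkspace:
    """Hypothetical stand-in for an object that cannot be pickled."""

    def __init__(self, value):
        self.value = value
        self._handle = threading.Lock()  # lock objects cannot be pickled

    def solve(self):
        with self._handle:
            return self.value * 2

def fake_workspace_solve(work):
    return work.solve()

fake_workspaces = [FakeWorkspace(i) for i in range(5)]

# Sending the object to another process would require pickling it, which fails.
try:
    pickle.dumps(fake_workspaces[0])
    picklable = True
except (TypeError, pickle.PicklingError):
    picklable = False

# Threads live in the same process, so the objects are passed by reference
# and no serialization takes place.
with futures.ThreadPoolExecutor(4) as ex:
    fake_results = list(ex.map(fake_workspace_solve, fake_workspaces))

print("picklable:", picklable)
print("results:  ", fake_results)
```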


In [9]:
%%time
results = list(map(workspace_solve, workspaces))


CPU times: user 26.5 s, sys: 87.4 ms, total: 26.6 s
Wall time: 26.9 s

In [10]:
%%time
with futures.ThreadPoolExecutor(workers) as ex:
    results = list(ex.map(workspace_solve, workspaces))


CPU times: user 1min, sys: 215 ms, total: 1min 1s
Wall time: 16.6 s
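
To summarize the wall times reported above, the sketch below computes the speedup of each parallel run over its serial counterpart. The numbers are copied from the cell outputs in this notebook; the labels are chosen here for readability.

```python
# (serial wall time, parallel wall time) in seconds, taken from the outputs above.
wall_times = {
    "scs.solve, processes": (36.6, 23.4),
    "scs.solve, threads": (36.6, 22.6),
    "Workspace init, threads": (9.54, 6.45),
    "Workspace.solve, threads": (26.9, 16.6),
}

# Speedup = serial time / parallel time.
speedups = {
    name: serial / parallel for name, (serial, parallel) in wall_times.items()
}

for name, speedup in speedups.items():
    print(f"{name}: {speedup:.2f}x")
```

With 4 workers, all four speedups land at roughly 1.5x to 1.6x. The notebook does not record how many cores the machine had, so these figures should be read as specific to that run and not as a general expectation.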