In this notebook, we set up a list of several SCS problems and map scs.solve over that list to solve each of the problems.

We compare three approaches:

- the built-in map function, which operates in serial, solving one problem at a time
- concurrent.futures.ProcessPoolExecutor, to solve the problems in parallel, using separate Python processes
- concurrent.futures.ThreadPoolExecutor, to solve in parallel, using separate threads

When running arbitrary Python code, the ThreadPoolExecutor approach may suffer due to the Python GIL, which prevents multiple threads from executing at the same time. However, SCS is able to release the GIL when running its underlying C code, allowing it to achieve true parallelism and performance similar to ProcessPoolExecutor.
The ThreadPool approach may be preferable to ProcessPool because it doesn't require launching a separate Python interpreter for each worker, and does not need to serialize data to communicate between processes.
This notebook uses the concurrent.futures library, which is new in Python 3.2, but has been backported to Python 2.5 and above through the futures library on PyPI.
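For example, on Python 2 the backport can be installed from PyPI with pip install futures, after which the same import used below works unchanged; a minimal sketch:

# Python 2 only: first run `pip install futures`
from concurrent import futures  # same import as the Python 3 standard library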
In [1]:
import scs
from concurrent import futures
num_problems = 20
m = 1000 # size of L1 problem
data = [scs.examples.l1(m, seed=i) for i in range(num_problems)]
We define a solve function to map over our problem data.
We set verbose=False because verbose printing can hinder performance: the GIL needs to be reacquired for each print.
We define a named function instead of a lambda because ProcessPoolExecutor can't serialize (pickle) lambdas.
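As a small illustration (the exact error message varies by Python version), attempting to pickle a lambda fails:

import pickle
try:
    pickle.dumps(lambda x: scs.solve(*x, verbose=False))
except Exception as e:
    # e.g. PicklingError: Can't pickle <function <lambda> ...>
    print(type(e).__name__, e)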
We set the number of workers to 4 in this example; this sets the number of threads or processes used in the parallel examples. Setting the number of workers to the number of processors on your system is a good first guess, but some experimentation may be required to find the optimal setting.
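For instance, a first guess can be taken from the processor count; a small sketch (below we simply hard-code 4):

import multiprocessing
workers = multiprocessing.cpu_count()  # number of processors; tune from here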
In [2]:
workers = 4 # number of threads/processes
def solve(x):
    return scs.solve(*x, verbose=False)
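We observe the serial solve time, using the built-in map.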
In [3]:
%%time
a = list(map(solve, data))
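We observe the parallel solve time, using ProcessPoolExecutor.map().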
In [4]:
%%time
with futures.ProcessPoolExecutor(workers) as ex:
    a = list(ex.map(solve, data))
We observe the parallel solve time, using ThreadPoolExecutor.map().
We achieve similar performance to the processes example because SCS releases the GIL when calling its underlying C solver code. Threads can be more lightweight than processes because they do not need to launch separate Python interpreters, and do not need to serialize data to communicate between processes. However, in this case, it doesn't seem to help much.
In [5]:
%%time
with futures.ThreadPoolExecutor(workers) as ex:
    a = list(ex.map(solve, data))
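Finally, we cache the SCS workspace and reuse it across solves. form_workspace constructs a scs.Workspace for each problem, performing the setup once (for the direct method, this includes the matrix factorization), and workspace_solve calls the workspace's solve() method. We time each phase below, in serial and with a thread pool.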
In [6]:
def form_workspace(x):
    return scs.Workspace(*x, verbose=False)

def workspace_solve(work):
    return work.solve()
In [7]:
%%time
workspaces = list(map(form_workspace, data))
In [8]:
%%time
with futures.ThreadPoolExecutor(workers) as ex:
    workspaces = list(ex.map(form_workspace, data))
In [9]:
%%time
results = list(map(workspace_solve, workspaces))
In [10]:
%%time
with futures.ThreadPoolExecutor(workers) as ex:
    results = list(ex.map(workspace_solve, workspaces))