A short tutorial for parallel programming in Python

Parallel hardware: nodes and cores

Nowadays, hardware is increasingly parallel:

  • Computers contain several cores, which can perform computation independently.
  • Large computer clusters contain many computers (often thousands), a.k.a. nodes, connected by a fast network.
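
To check how many cores a given computer offers, Python's standard library can report it. Below is a minimal sketch using os.cpu_count(), which counts logical cores. Where it is available (Linux), it also uses os.sched_getaffinity(0), which returns the subset of cores the current process is allowed to run on. On a cluster node, a job scheduler may restrict that subset.

```python
import os

# Logical cores reported by the operating system (None if it cannot be determined).
n_cores = os.cpu_count()
print(f"Logical cores on this computer: {n_cores}")

# On Linux, the cores this process may actually use can be fewer,
# e.g. when a cluster job scheduler restricts the job to part of a node.
if hasattr(os, "sched_getaffinity"):
    usable_cores = len(os.sched_getaffinity(0))
    print(f"Cores available to this process: {usable_cores}")
```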

Using several cores and/or several nodes can often:

  • Speed up the execution of time-consuming computations
  • Make it possible to work on data that is too large to fit in the RAM of one computer

On the software side, however, this requires sharing the computational work among processes or threads.
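
One common way to share the work is to split the data into chunks and let each process handle one chunk. Below is a minimal sketch of this idea. It uses concurrent.futures, one of the two packages introduced below, to sum a list of numbers on 4 worker processes. The helper split_into_chunks is a hypothetical name chosen for this example, and the sum only stands in for a genuinely time-consuming computation.

```python
from concurrent.futures import ProcessPoolExecutor

def split_into_chunks(data, n_chunks):
    """Split a list into n_chunks contiguous pieces whose lengths differ by at most 1."""
    size, remainder = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = split_into_chunks(data, 4)

    # Each worker process receives one chunk and computes its partial sum.
    with ProcessPoolExecutor(max_workers=4) as executor:
        partial_sums = list(executor.map(sum, chunks))

    # The main process combines the partial results.
    total = sum(partial_sums)
    print(total)  # 499999500000
```

For a computation as cheap as a sum, copying the chunks to the worker processes typically costs more time than it saves. The pattern pays off when each chunk requires substantial work. The if __name__ == "__main__": guard is needed on platforms where worker processes start by re-importing the main module (e.g. Windows and macOS).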

Parallel execution: processes and threads

Processes and threads are independent executions of code.
Each process or thread can be performed by a separate core, and can thus run in parallel.
The difference between them is that:

  • Threads can simultaneously access and modify the same memory space in RAM.
  • Processes can only access and modify their own, exclusive memory space.

Thus, programming with threads is more flexible (e.g. two threads can simultaneously work on the same array) but also more dangerous (e.g. two threads can simultaneously try to modify the same number in memory, in which case the result is unpredictable).
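
The difference in memory sharing can be observed directly. In the minimal sketch below, a thread and a child process both append to the same module-level list. Afterwards, the thread's change is visible in the main program, but the process's change is not. The function names append_value and count_with_lock are hypothetical. The second half protects a shared counter with threading.Lock, a standard-library tool the text does not discuss. With the lock, four threads updating the same number produce a predictable result.

```python
import multiprocessing
import threading

# Module-level list: threads all see this object; a child process gets its own copy.
shared = []

def append_value(value):
    """Append value to the module-level list of whichever thread or process runs it."""
    shared.append(value)

def count_with_lock(counter, lock, n):
    """Add 1 to counter[0], n times, holding the lock during each update."""
    for _ in range(n):
        with lock:  # only one thread at a time performs the read-modify-write
            counter[0] += 1

if __name__ == "__main__":
    # A thread modifies the same memory as the main thread: the change is visible.
    t = threading.Thread(target=append_value, args=("from thread",))
    t.start()
    t.join()

    # A process modifies its own memory space: the parent's list is unchanged.
    p = multiprocessing.Process(target=append_value, args=("from process",))
    p.start()
    p.join()
    print(shared)  # ['from thread']

    # Four threads updating the same number; without the lock, updates could be lost.
    counter = [0]
    lock = threading.Lock()
    workers = [
        threading.Thread(target=count_with_lock, args=(counter, lock, 10_000))
        for _ in range(4)
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(counter[0])  # 40000
```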

For this reason, the standard Python interpreter (CPython) prevents threads from executing Python code simultaneously; this is known as the Global Interpreter Lock, or GIL. There are, however, exceptions: I/O operations and many numpy number-crunching routines release the GIL, and can thus run simultaneously with other threads.
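
The effect of the GIL can be sketched with concurrent.futures.ThreadPoolExecutor, assuming a standard CPython build. Here time.sleep stands in for an I/O operation, since it releases the GIL while waiting. A pure-Python loop stands in for computation that holds the GIL. The names fake_io and busy_loop are hypothetical. The numpy case is left out to keep the example free of dependencies, and exact timings depend on the machine.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(delay):
    """Stand-in for an I/O operation (e.g. reading a file); time.sleep releases the GIL."""
    time.sleep(delay)
    return delay

def busy_loop(n):
    """Pure-Python, CPU-bound loop: the thread running it holds the GIL."""
    total = 0
    for i in range(n):
        total += i
    return total

# Four simulated I/O tasks on four threads: their waiting periods overlap.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    io_results = list(executor.map(fake_io, [0.5] * 4))
io_elapsed = time.perf_counter() - start
print(f"I/O-like tasks on 4 threads: {io_elapsed:.2f} s (about 2 s if run one after another)")

# Four CPU-bound tasks, first one after another, then on four threads.
start = time.perf_counter()
sequential = [busy_loop(1_000_000) for _ in range(4)]
cpu_sequential = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    threaded = list(executor.map(busy_loop, [1_000_000] * 4))
cpu_threaded = time.perf_counter() - start
# With the GIL, only one thread runs Python code at a time, so no real speed-up is expected.
print(f"CPU-bound tasks: {cpu_sequential:.2f} s sequential, {cpu_threaded:.2f} s on 4 threads")
```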

Packages for parallel programming in Python

There are many Python packages that handle parallel execution on processes or threads, both general-purpose (e.g. celery, ipyparallel, multiprocessing, ...) and more specialized (e.g. dask, tensorflow, some numpy functions).
Here we will introduce two very different, and complementary, general-purpose packages: mpi4py and concurrent.futures.

mpi4py                                           | concurrent.futures
-------------------------------------------------+------------------------------------------------------------
Can run on several nodes                         | Runs only on one node
Only handles processes                           | Can handle processes and threads
No interactivity                                 | Some interactivity (e.g. integrates with Jupyter notebooks)
Allows elaborate communication between processes | No communication between processes or threads
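
concurrent.futures can handle both processes and threads because ThreadPoolExecutor and ProcessPoolExecutor share the same interface. In the minimal sketch below, sum_of_squares is a hypothetical stand-in workload. submit() returns a Future immediately, and the Future can be queried later. This non-blocking style is part of what makes the package convenient in interactive sessions. Switching between threads and processes only requires passing a different executor class.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor, as_completed

def sum_of_squares(n):
    """Toy CPU-bound task: 0**2 + 1**2 + ... + (n-1)**2."""
    return sum(i * i for i in range(n))

def run(executor_class, inputs):
    """Submit one task per input to a pool of 4 workers and gather the results."""
    results = {}
    with executor_class(max_workers=4) as executor:
        # submit() returns a Future immediately; the task runs in the background.
        futures = {executor.submit(sum_of_squares, n): n for n in inputs}
        # as_completed() yields each Future as soon as its task has finished.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results

if __name__ == "__main__":
    inputs = [100_000, 200_000, 300_000, 400_000]
    with_threads = run(ThreadPoolExecutor, inputs)     # same code, run on threads
    with_processes = run(ProcessPoolExecutor, inputs)  # same code, run on processes
    print(with_threads == with_processes)  # True
```

For CPU-bound pure-Python work like this, only the process pool can actually use several cores. Because of the GIL, the thread pool mainly helps with I/O-bound tasks.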

Outline of this tutorial

To illustrate parallel programming, we will look at the problem of classification of hand-written digits, and use three different methods