Parallelization

Another nice thing about Python is how easily you can parallelize your code. Here is an example using multiprocessing.

map

An often used function in Python is map. It applies a given function to every item in a list.


In [ ]:
def f(x):
    return x**2

l = list(range(8))
s = list(map(f, l))  # wrap in list(), since map returns an iterator in Python 3

print('input: ', l)
print('output:', s)

parallel map

Such a map function can be parallelized over all cores of your machine very easily. This is of course most useful when the function being called performs heavy computations, like this one:


In [ ]:
import time

def g(x):
    time.sleep(1) # simulate heavy computations
    return x**2

Unfortunately, this parallelization doesn't work from within IPython notebooks, so you need to run the following code from its own file:

import multiprocessing
import time

def g(x):
    time.sleep(1)  # simulate heavy computations
    return x**2

if __name__ == '__main__':

    # a list of 'problems' to solve in parallel
    problems = range(8)

    # start a multiprocessing pool and measure the execution time
    pool = multiprocessing.Pool()
    time_start = time.time()
    result = pool.map(g, problems)
    time_stop = time.time()

    # print the result
    execution_time = time_stop - time_start
    print('executed %d problems in %.2f seconds' % (len(problems), execution_time))

MapReduce

A simple map may seem like a quite limited approach to parallelization at first. However, every problem that can be re-formulated as a combination of a map and a subsequent reduce function can be parallelized easily. In fact, this scheme is known as MapReduce and is used for large-scale parallelization on the cloud-computing clusters of Google, Amazon, and others.
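As a small illustration of this scheme (a sketch using only the standard library, not a full MapReduce framework, and like the script above meant to be run from its own file): the sum of squares of a list can be computed as a parallel map step followed by a reduce step.

```python
import multiprocessing
import operator
from functools import reduce

def square(x):
    return x**2

if __name__ == '__main__':
    problems = range(8)

    # map step: compute the squares in parallel over all cores
    with multiprocessing.Pool() as pool:
        squares = pool.map(square, problems)

    # reduce step: combine the partial results into a single value
    total = reduce(operator.add, squares)
    print(total)  # sum of 0**2 .. 7**2 = 140
```

Only the map step runs in parallel here; the reduce step combines the already-computed partial results and is usually cheap by comparison.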

Other approaches

There are many other approaches and libraries for parallelization as well. For instance, there are joblib parallel and IPython parallel. The latter is pretty powerful and even lets you execute your jobs on different machines via SSH.
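To give an impression of the joblib variant (a minimal sketch, assuming joblib is installed as described in the note below):

```python
from joblib import Parallel, delayed

def g(x):
    return x**2

# run g on every item of the list, distributed over 2 worker processes
results = Parallel(n_jobs=2)(delayed(g)(x) for x in range(8))
print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Setting n_jobs=-1 would use all available cores, analogous to multiprocessing.Pool() without arguments.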

Caching

Another thing that can make your code much faster is caching, especially when you run scientific experiments repeatedly.

Note: If a library (in this case joblib) is not installed, you can install it in the CIP cluster via conda install joblib from the command prompt. On your own system, libraries are usually installed via pip install <library_name>.


In [ ]:
import joblib
import time

# use a raw string for the Windows path, and the current 'location' keyword
mem = joblib.Memory(location=r'C:\Windows\Temp')

@mem.cache
def my_cached_function(x):
    time.sleep(5)  # simulate an expensive computation
    return x**2

print(my_cached_function(2))  # first call: takes about 5 seconds
print(my_cached_function(2))  # subsequent calls: answered from the cache
print(my_cached_function(2))
print(my_cached_function(2))