In [ ]:
def f(x):
    return x**2

l = range(8)
s = map(f, l)
print 'input: ', l
print 'output:', s
In [ ]:
import time
def g(x):
    time.sleep(1)  # simulate heavy computations
    return x**2
Unfortunately, this parallelization does not work from within IPython notebooks, so you need to run the following code from its own file:
import multiprocessing
import time

def g(x):
    time.sleep(1)  # simulate heavy computations
    return x**2

if __name__ == '__main__':
    # a list of 'problems' to solve in parallel
    problems = range(8)

    # start a multiprocessing pool and measure the time
    pool = multiprocessing.Pool()
    time_start = time.time()
    result = pool.map(g, problems)
    time_stop = time.time()

    # print the result
    execution_time = time_stop - time_start
    print 'executed %d problems in %d seconds' % (len(problems), execution_time)
A simple map may seem like a quite limited approach to parallelization at first. However, every problem that can be re-formulated as a combination of a map and a subsequent reduce step can be parallelized easily. In fact, this scheme is known as MapReduce and is used for large-scale parallelization on the cloud-computing clusters of Google, Amazon, and others.
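As a small illustration of that idea (a sketch, not part of the original exercise; the names square and total are introduced here only for illustration): the squared values from the parallel map step can be combined with a reduce step, in this case a simple sum.

import multiprocessing

def square(x):
    return x**2

if __name__ == '__main__':
    problems = range(8)
    pool = multiprocessing.Pool()
    # map step: compute the squares in parallel
    squares = pool.map(square, problems)
    # reduce step: combine the partial results, here by summing them up
    total = reduce(lambda a, b: a + b, squares)
    print 'sum of squares:', total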
There are many other approaches and libraries for parallelization as well, for instance joblib's Parallel and IPython parallel. The latter is quite powerful and even lets you execute your jobs on different machines via SSH.
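For example, joblib's Parallel can distribute calls of a function over several cores in a single line (a minimal sketch; n_jobs=-1 means "use all available cores"):

from joblib import Parallel, delayed

def g(x):
    return x**2

# run g on 0..7, distributing the calls over all available cores
results = Parallel(n_jobs=-1)(delayed(g)(i) for i in range(8))
print results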
Another thing that can make your code much faster is caching, especially when you run scientific experiments repeatedly.
Note: If a library (in this case joblib) is not installed, you can install it on the CIP cluster via conda install joblib from the command prompt. On your own system, libraries are usually installed via pip install <library_name>.
In [ ]:
import joblib
import time

# cache results on disk so repeated calls with the same arguments are fast
mem = joblib.Memory(cachedir=r'C:\Windows\Temp')

@mem.cache
def my_cached_function(x):
    time.sleep(5)  # simulate heavy computations
    return x**2

print my_cached_function(2)
print my_cached_function(2)
print my_cached_function(2)
print my_cached_function(2)
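If everything works, only the very first call should take about 5 seconds; the remaining calls return the cached result from disk almost instantly. You can remove the cached results again with mem.clear().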