There are various ways to do this in Python; we went with the built-in multiprocessing module.
threading: blocks of code are handed to particular threads; since memory is shared between threads, managing memory and data I/O can be a mess (a minimal sketch is shown after this list)
multiprocessing: chunks of work are handed to a routine that distributes them over sub-processes for you, and each chunk's data is local to the process on which it runs
map: a versatile way to do multiprocessing with less to worry about; the work is distributed over the available processes and the output is returned in the same order as the input
process: lets you define a process (or processes) yourself and iterate over them, but you have to take care of memory I/O and collecting the output yourself
async: use this if you don't care about the order in which your sub-tasks are processed; whichever task finishes first is returned first, so the order of the output is not the same as the order of the input and you must use get() to make sense of the output
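For comparison, here is a minimal threading sketch of the shared-memory pattern mentioned above; it is not one of the timing runs, and the toy squaring function and array sizes are just illustrative:

import threading
import numpy as np

npoints = 5
input_arr = np.random.rand(npoints)
# threads share memory, so they can all write into the same output array
output_arr = np.zeros(npoints)

def makedistance(idx):
    # each thread writes its own slot; anything fancier would need a lock
    # so the threads don't step on each other
    output_arr[idx] = input_arr[idx]**input_arr[idx]

threads = [threading.Thread(target=makedistance, args=(i,)) for i in range(npoints)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(output_arr)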
In [16]:
import multiprocessing as mp
import time
import numpy as np

npoints = 5
input_arr = np.random.rand(npoints)
nproc = int(np.floor(npoints / 2.))

def makedistance(i):
    output_arr = i**i
    return output_arr

t1 = time.time()
pool = mp.Pool(processes=nproc)
# map distributes the entries of input_arr over the worker processes and
# returns the results in the same order as the input
out = pool.map(makedistance, input_arr)
pool.close()
pool.join()
print(out)
t2 = time.time()
print(t2 - t1)
In [17]:
import multiprocessing as mp
from multiprocessing import Pool
import time
import numpy as np

npoints = 10
input_arr = np.arange(0, npoints)
nproc = int(np.floor(npoints / 2.))

def makedistance(i):
    output_arr = i * i
    return output_arr

t1 = time.time()
p = Pool(processes=nproc)
# submit one async task per entry; the tasks finish in whatever order they like,
# so we call get() on each handle to recover the results
results = [p.apply_async(makedistance, args=(i,)) for i in input_arr]
results_out = [r.get() for r in results]
p.close()
p.join()
print(results_out)
t2 = time.time()
print(t2 - t1)
In [18]:
import multiprocessing as mp
import time
import numpy as np

npoints = 10
input_arr = np.arange(0, npoints)
output = mp.Queue()   # shared queue for collecting the results

def makedistance(i, output):
    output_arr = i * i
    output.put(output_arr)

t1 = time.time()
# one Process per array entry; we manage start/join and the output queue ourselves
procs = [mp.Process(target=makedistance, args=(i, output)) for i in input_arr]
for p in procs:
    p.start()
# drain the queue before joining; note the queue does not guarantee the
# results come back in input order
results = [output.get() for _ in procs]
for p in procs:
    p.join()
print(results)
t2 = time.time()
print(t2 - t1)
In a sample of n haloes, compute the distance of every halo from every other halo.
scipy vs Numba vs 'pure for loop'
scipy and Numba are close and, as expected, the pure for loop is excruciatingly slow.
We went with scipy, just because.
Of course, one can also do this in C or Cython.
https://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/
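As a rough sketch of what the two extremes look like (the halo positions here are random toy data, and the variable names are just illustrative), scipy.spatial.distance.cdist gives all pairwise distances in one vectorised call, while the pure loop computes the same matrix entry by entry; a Numba version would roughly amount to putting @numba.njit on top of the double loop:

import numpy as np
from scipy.spatial.distance import cdist

nhaloes = 1000
positions = np.random.rand(nhaloes, 3)   # toy halo positions

# scipy: all pairwise distances in one vectorised call
dist_scipy = cdist(positions, positions)

# pure for loop: same result, but painfully slow for large nhaloes
dist_loop = np.zeros((nhaloes, nhaloes))
for i in range(nhaloes):
    for j in range(nhaloes):
        dist_loop[i, j] = np.sqrt(np.sum((positions[i] - positions[j])**2))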
Below is a graph of their scaling statistics.
In [20]:
from IPython.display import Image
Image('distances_720.png')
Out[20]:
Chunking is where we divide the operation into sub-processes and pass those to be multiprocessed individually. This works well, but as one might expect, with larger data sets memory allocation becomes a problem: even after the division there won't be enough memory to hold all of these chunks at once, and the code could crash.
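A minimal sketch of what chunking could look like for the distance calculation (the chunk count, pool size and helper name are illustrative, not taken from the actual runs): the haloes are split into blocks and each pool task computes the distances from one whole block to the full sample, so each task carries a full chunk of the output in memory.

import numpy as np
from scipy.spatial.distance import cdist
import multiprocessing as mp

positions = np.random.rand(1000, 3)     # toy halo positions
chunks = np.array_split(positions, 8)   # split the sample into 8 blocks

def chunk_distances(chunk):
    # distances from every halo in this chunk to every halo in the sample;
    # with a fork start method (Linux) the workers see the parent's `positions`
    return cdist(chunk, positions)

pool = mp.Pool(processes=4)
out = pool.map(chunk_distances, chunks)   # one task (and one chunk of output) per block
pool.close()
pool.join()
distances = np.vstack(out)                # full (1000, 1000) distance matrix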
Here we instead keep the array essentially static in memory and hand out workers that each operate on a single entry of the array at a time. This can be slower, but in the long run it behaves much better in terms of memory allocation.
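A corresponding sketch of the one-entry-at-a-time variant, under the same toy assumptions as the chunked sketch above: the positions array stays put and each worker call is handed a single row index, so a task only ever holds one row of the distance matrix.

import numpy as np
from scipy.spatial.distance import cdist
import multiprocessing as mp

positions = np.random.rand(1000, 3)   # toy halo positions

def row_distances(i):
    # distances from halo i to every other halo: one small piece of work per task
    return cdist(positions[i:i + 1], positions)[0]

pool = mp.Pool(processes=4)
# one worker call per entry; slower due to task overhead, but each task's
# memory footprint is just a single row of the distance matrix
rows = pool.map(row_distances, range(len(positions)), chunksize=1)
pool.close()
pool.join()
distances = np.vstack(rows)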
In [3]:
from IPython.display import Image
Image(filename='time.png')
Out[3]:
In [4]:
from IPython.display import Image
Image(filename='memory.png')
Out[4]: