Parallel Distributed Processing with numba in Python3

  • njit applies JIT compilation in nopython mode
  • 8-node cluster, 4 cores per node

James Gaboardi


In [1]:
import numpy as np
import math
from numba import njit, vectorize
import multiprocessing as mp
cores = mp.cpu_count()
print('Using up to', cores, 'cores.')


Using up to 32 cores.

Create two numpy.ndarray objects


In [2]:
array_dimensions = 15000
np.random.seed(352)
thing_1 = np.random.random((array_dimensions,
                            array_dimensions))
np.random.seed(850)
thing_2 = np.random.random((array_dimensions,
                            array_dimensions))

Define a function without using numpy or the njit decorator


In [3]:
def no_njit(a, b):
    return math.sin(a**2) * math.exp(b)

This will not work: TypeError: only length-1 arrays can be converted to Python scalars


In [4]:
%timeit -o no_njit(thing_1, thing_2)



TypeError                                 Traceback (most recent call last)
<ipython-input-4-a6ffe2e3e7df> in <module>()
----> 1 get_ipython().magic('timeit -o no_njit(thing_1, thing_2)')

...

<ipython-input-3-50a3406b3f27> in no_njit(a, b)
      1 def no_njit(a, b):
----> 2     return math.sin(a**2) * math.exp(b)

TypeError: only length-1 arrays can be converted to Python scalars
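The failure comes from math.sin and math.exp accepting only a single Python scalar, while the equivalent numpy ufuncs broadcast elementwise over arrays. A minimal illustration (using a small hypothetical array, not the ones above):

```python
import math
import numpy as np

arr = np.array([0.1, 0.2, 0.3])

print(math.sin(0.5))   # math functions work on single scalars
print(np.sin(arr))     # numpy ufuncs apply elementwise to arrays

try:
    math.sin(arr)      # an array cannot be converted to one scalar
except TypeError as err:
    print('TypeError:', err)
```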

So decorate the function with njit and swap math for numpy to get a faster implementation.


In [5]:
@njit
def with_njit(a, b):  # named with_njit to avoid shadowing the imported njit decorator
    return np.sin(a**2) * np.exp(b)

In [6]:
%timeit -o with_njit(thing_1, thing_2)


1 loop, best of 3: 10.9 s per loop
Out[6]:
<TimeitResult : 1 loop, best of 3: 10.9 s per loop>

In order to take advantage of all the cores at our disposal, vectorize the no_njit function: declare the return type and the two argument types being passed in to no_njit, then set the keyword argument target='parallel'.


In [7]:
vect_no_njit = vectorize('float64(float64, float64)', target='parallel')(no_njit)

In this example we are utilizing 32 cores, so it is significantly faster than the njit-decorated function above.


In [8]:
%timeit -o vect_no_njit(thing_1, thing_2)


1 loop, best of 3: 658 ms per loop
Out[8]:
<TimeitResult : 1 loop, best of 3: 658 ms per loop>