Parallel Distributed Processing with numba in Python3

  • njit is nopython jit
  • 8-node cluster, 4 cores per node

James Gaboardi

In [1]:
import numpy as np
import math
from numba import njit, vectorize
import multiprocessing as mp
cores = mp.cpu_count()
print('Using up to', cores, 'cores.')

Using up to 32 cores.

Create 2 numpy.array objects

In [2]:
array_dimensions = 15000
thing_1 = np.random.random((array_dimensions,
thing_2 = np.random.random((array_dimensions,

Define a function without using numpy or the njit decorator

In [3]:
def no_njit(a, b):
    return math.sin(a**2) * math.exp(b)

This will not work: TypeError: only length-1 arrays can be converted to Python scalars

In [4]:
%timeit -o no_njit(thing_1, thing_2)

TypeErrorTraceback (most recent call last)
<ipython-input-4-a6ffe2e3e7df> in <module>()
----> 1 get_ipython().magic('timeit -o no_njit(thing_1, thing_2)')

TypeError: only length-1 arrays can be converted to Python scalars

So decorate the function with njit and change math to numpy for a 'faster' implementation.

In [5]:
def njit(a, b):
    return np.sin(a**2) * np.exp(b)

In [6]:
%timeit -o njit(thing_1, thing_2)

1 loop, best of 3: 10.9 s per loop
<TimeitResult : 1 loop, best of 3: 10.9 s per loop>

In order to take advantage of all the cores at disposal, vectorize the no_njit function declare the basic type of variable and the two types of arguments being passed in to no_njit, the set the keyword argument target='parallel.

In [7]:
vect_no_njit = vectorize('float64(float64, float64)', target='parallel')(no_njit)

In this example we are utilizing 32 cores, so it is significantly faster the the njit function above.

In [8]:
%timeit -o vect_no_njit(thing_1, thing_2)

1 loop, best of 3: 658 ms per loop
<TimeitResult : 1 loop, best of 3: 658 ms per loop>