Numba

Numba provides a just-in-time (JIT) compiler that generates optimized machine code to speed up numerical computations in Python. It is built on the LLVM compiler infrastructure. One advantage of Numba over Cython is that the Python code itself does not need to be modified: the speed-up is achieved by adding decorators, and removing the decorators gives back the pure Python code.
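
The point about decorators can be sketched as follows. This is a minimal illustration, not part of the original notebook: `sum_squares` and `fast_sum_squares` are hypothetical names, and the no-op `jit` fallback is an assumption added only so that the snippet also runs where Numba is not installed.

```python
# Sketch: the same function body runs with or without Numba.
try:
    from numba import jit            # the real Numba decorator
except ImportError:
    # Assumption: when Numba is absent, fall back to a decorator that
    # returns the function unchanged, i.e. plain Python.
    def jit(*args, **kwargs):
        return lambda func: func

def sum_squares(n):
    """Sum i*i for i in range(n) with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Apply the decorator by hand so the undecorated original stays available.
fast_sum_squares = jit(nopython=True)(sum_squares)

print(sum_squares(1000))        # plain Python
print(fast_sum_squares(1000))   # compiled (or fallback) version, same value
```

Because the decorator only wraps the function, both calls return the same value; only the execution speed differs.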


In [1]:
import numpy as np

def f(x, y):
    return x + y # A trivial example

print(f(1, 2))      # int
print(f(1.0, 2.0))  # float
print(f(1j, 2))     # complex
print(f([1, 2, 3], [4, 5, 6]))  # list
print(f(np.arange(1.0, 4.0), np.arange(4.0, 7.0)))  # numpy.ndarray


3
3.0
(2+1j)
[1, 2, 3, 4, 5, 6]
[ 5.  7.  9.]

In [2]:
a = np.array(np.arange(0, 100.1, 0.1), dtype=float)
print(a.shape, a.size)

%timeit a + a
%timeit f(a, a)


(1001,) 1001
1000000 loops, best of 3: 1.14 µs per loop
1000000 loops, best of 3: 1.43 µs per loop

In [3]:
from numba import jit

@jit
def fast_f(x, y):
    return x + y # A trivial example

%timeit fast_f(a, a)


1 loops, best of 3: 3.1 µs per loop

In [5]:
from numba import f8, jit

def sum1d(x):
    s = 0.0
    for i in range(x.shape[0]):
        s += x[i]
    return s

@jit(f8(f8[:]))
def fast_sum1d(x):
    s = 0.0
    for i in range(x.shape[0]):
        s += x[i]
    return s

x = np.linspace(0, 100, 1001)

%timeit sum1d(x)
%timeit fast_sum1d(x)


10000 loops, best of 3: 177 µs per loop
1000000 loops, best of 3: 1.25 µs per loop

Let us compute the approximate value of $\pi$ using the following sum:

$$\pi \approx \sqrt{6 \sum_{k=1}^{n} \frac{1}{k^2}}$$

In [6]:
def approx_pi(n=10000):
    s = 0
    for k in range(1, n+1):
        s += 1.0 / k**2
    return (6 * s) ** 0.5

%timeit approx_pi(1000)
%timeit approx_pi(10000)

print(approx_pi(1000))
print(approx_pi(10000))


10000 loops, best of 3: 160 µs per loop
1000 loops, best of 3: 1.56 ms per loop
3.14063805621
3.14149716395
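
As a rough check on these values (a sketch based on the standard Basel series, not something stated in the notebook itself): the infinite sum equals $\pi^2/6$, and the tail left out after $n$ terms is approximately $1/n$, so the estimate falls short of $\pi$ by about $3/(\pi n)$.

$$\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}, \qquad \sum_{k=n+1}^{\infty} \frac{1}{k^2} \approx \frac{1}{n}, \qquad \pi - \sqrt{6 \sum_{k=1}^{n} \frac{1}{k^2}} \approx \frac{3}{\pi n}$$

For $n = 1000$ this predicts an error of about $0.00095$, and for $n = 10000$ about $0.000095$, in line with the printed values 3.14063805621 and 3.14149716395.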

In [7]:
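from numba import jit

# autojit was deprecated and later removed from Numba; @jit without a
# signature infers the argument types on the first call instead.
@jit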
def fast_approx_pi(n):
    s = 0
    for k in range(1, n+1):
        s += 1.0 / k**2
    return (6 * s) ** 0.5

n = 10000
%timeit approx_pi(n)
%timeit fast_approx_pi(n)


1000 loops, best of 3: 1.57 ms per loop
10000 loops, best of 3: 45.5 µs per loop

Vectorize Functions

Numba can vectorize a function written for scalar arguments so that it can handle NumPy arrays element-wise as well as scalars.


In [8]:
from numba import vectorize, float64

@jit(float64(float64, float64))
def f(x, y):
    return x + y

%timeit f(10, 20)
%timeit f(10.0, 20.0)

# Raises TypeError: the scalar signature above does not accept arrays
a = np.array(np.arange(0, 100.1, 0.1), dtype=float)
%timeit f(a, a)


1000000 loops, best of 3: 387 ns per loop
1000000 loops, best of 3: 296 ns per loop
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-2a53adf9c2bd> in <module>()
     10 # ERROR - Will not work for arrays
     11 a = np.array(np.arange(0, 100.1, 0.1), dtype=float)
---> 12 get_ipython().magic(u'timeit f(a, a)')

[IPython and timeit internal frames omitted]

TypeError: No matching definition

In [9]:
@jit(float64[:](float64[:], float64[:]))
def vf(x, y):
    return x + y

%timeit vf(a, a)
%timeit vf(10, 20)


100000 loops, best of 3: 1.94 µs per loop
1000000 loops, best of 3: 314 ns per loop

In [10]:
@vectorize([float64(float64, float64)])
def vfast_f(x, y):
    return x + y # A trivial example

%timeit vfast_f(a, a)
%timeit vfast_f(10, 20)


1000000 loops, best of 3: 1.12 µs per loop
100000 loops, best of 3: 2.04 µs per loop
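
The timings above show that the vectorized function accepts both arrays and scalars. A minimal sketch of that calling behaviour follows. The names `make_ufunc` and `vadd` are hypothetical, and the `np.vectorize` fallback is an assumption added so that the snippet also runs without Numba; it reproduces the element-wise behaviour only, not the speed.

```python
import numpy as np

try:
    from numba import vectorize, float64
    # Same signature list as in the cell above.
    make_ufunc = vectorize([float64(float64, float64)])
except ImportError:
    # Assumption: without Numba, np.vectorize gives the same call
    # behaviour (element-wise, broadcasting) but no speed-up.
    def make_ufunc(func):
        return np.vectorize(func, otypes=[float])

@make_ufunc
def vadd(x, y):
    # Written for scalars; applied element-wise to arrays.
    return x + y

a = np.arange(0.0, 5.0)
print(vadd(a, a))        # array with array
print(vadd(a, 10.0))     # array with scalar (the scalar is broadcast)
print(vadd(1.0, 2.0))    # scalar with scalar
```

The function body never mentions arrays; the decorator supplies the loop over the elements and the broadcasting of the scalar argument.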

Other Projects

  • PyPy is a fast, compliant alternative implementation of the Python language (Python 2.7.x and Python 3.2.x). It is often faster and uses less memory than CPython (the reference implementation, written in C, and the de facto standard). It is now mostly compatible with CPython, but not all packages are available on PyPy. Work on implementing NumPy on PyPy is in progress and, once completed, PyPy could be a good alternative to Numba.
  • Pyston is based on LLVM and is in a very early stage of development.
  • Nuitka is a Python compiler for Python 2.6, 2.7, 3.2, 3.3 and 3.4 that attempts to translate Python to C++ in order to speed up execution.

References