I always enjoy showing people how much easier Numba makes it to speed up their NumPy-based technical codes. With Numba, you usually can just write the code with loops and then add a decorator to your function and get speed-ups equivalent to having written the code in another compiled language (like C or Fortran).

Tonight when I saw this question on Stack Exchange: http://scicomp.stackexchange.com/questions/5473/how-to-express-this-complicated-expression-using-numpy-slices it looked like a perfect opportunity to test Numba again.

So, I copied the looped_ver code from Nat Wilson (modified it slightly to make x[0] = 0) and then decorated it to let Numba compile the code. The result continues to impress me about the code that Mark Florisson, Jon Riehl, and Siu Kwan Lam have put together. Here is the equation that is being solved:

$$\displaystyle x_i = \sum_{j=0}^{i-1} k_{i-j} a_{i-j} a_{j}$$

In [1]:
import numpy as np
from numba import jit, autojit

def looped_ver(k, a):
    x = np.empty_like(a)
    x[0] = 0.0
    for i in range(1, x.size):
        sm = 0.0
        for j in range(0, i):
            sm += k[i-j,j] * a[i-j] * a[j]
        x[i] = sm
    return x

typed_ver = jit('f8[:](f8[:,:],f8[:])')(looped_ver)
auto_ver = autojit(looped_ver)

In [2]:
import time
import numpy as np
repeat = 3

def getbest(func, *args):
    import time
    best = 1e12
    for i in range(repeat):
        start = time.time()
        func(*args)
        current = time.time() - start
        if current < best:
            best = current
    return best
    

def timeit(N):
    res = {'looped':[], 'auto':[], 'typed':[]}
    for n in N:
        k = np.random.rand(n,n)
        a = np.random.rand(n)
        for version in ['looped', 'auto', 'typed']:
            func = eval('%s_ver' % version)
            res[version].append(getbest(func, k, a))
    return res

In [3]:
N = [100,200,500,1000,5000]
res = timeit(N)

In [4]:
%pylab inline


Welcome to pylab, a matplotlib-based Python environment [backend: module://IPython.zmq.pylab.backend_inline].
For more information, type 'help(pylab)'.

In [5]:
plot(N, log10(res['typed']), N, log10(res['auto']), N, log10(res['looped']))
legend(['Typed', 'Autojit', 'Python'], loc='upper left')
ylabel(r'$\log_{10}$(time) in seconds')
xlabel('Size (N)')


Out[5]:
<matplotlib.text.Text at 0x1099c0bd0>

In [6]:
[res['looped'][i]/res['auto'][i] for i in range(len(N))]


Out[6]:
[1127.4761904761904,
 1435.7611940298507,
 1317.400260756193,
 256.7791019225705,
 137.61178578348566]

In [10]:
from numba import _version
print _version.version_version


0.7.0

This was run on a Macbook Air. Running sysctl -n machdep.cpu.brand_string resulted in:

Intel(R) Core(TM) i7-2677M CPU @ 1.80GHz