I would be happy to hear your comments and suggestions.
Please feel free to drop me a note via twitter, email, or google+.


Day 4 - One Python Benchmark per Day

(C)Python vs. Cython vs. Numba



Quick note about bubble sort

I don't want to get into the details about sorting algorithms here, but there is a great report
"Sorting in the Presence of Branch Prediction and Caches - Fast Sorting on Modern Computers" written by Paul Biggar and David Gregg, where they describe and analyze elementary sorting algorithms in very nice detail (see chapter 4).

And for a quick reference, this website has a nice animation of this algorithm.

A long story short: The "worst-case" complexity of the bubble sort algorithm (i.e., "Big-O")
$\Rightarrow \pmb O(n^2)$



Bubble sort implemented in (C)Python


In [1]:
def python_bubblesort(a_list):
    """ Plain Python implementation of bubble sort (sorts in place) """
    count = len(a_list)
    for i in range(count):
        for j in range(1, count):
            if a_list[j] < a_list[j-1]:
                a_list[j-1], a_list[j] = a_list[j], a_list[j-1]
    return a_list

In [2]:
import random
random.seed(12345)

l = [random.randint(1,100) for num in range(1, 100)]
assert(sorted(l) == python_bubblesort(l))
print('bubble sort works correctly')


bubble sort works correctly



Bubble sort implemented in Cython

Maybe we can speed things up a little bit via Cython's C-extensions for Python. Cython is basically a hybrid between C and Python and can be pictured as compiled Python code with type declarations.
Since we are working in an IPython notebook here, we can make use of the very convenient IPython magic: It will take care of the conversion to C code, the compilation, and eventually the loading of the function.

If you want to know how to use Cython without the IPython magic, I have a short outline of the procedure here.

Note that the static type declarations that we add via cdef are note required for Cython to work, but it will speed things up tremendously.


In [3]:
%load_ext cythonmagic

In [4]:
%%cython
cimport cython
cpdef cython_bubblesort_notypes(a_list):
    """ 
    The Python implementation of bubble sort compiled
    via Cython without C type definitions.
    
    """
    count = len(a_list)
    for i in range(count):
        for j in range(1, count):
            if a_list[j] < a_list[j-1]:
                a_list[j-1], a_list[j] = a_list[j], a_list[j-1]
    return a_list

In [5]:
%%cython
cimport cython
cpdef cython_bubblesort_ctypes(a_list):
    """ 
    Cython implementation of bubble sort with static
    type declarations.
    
    """
    cdef int count, i, j # static type declarations
    count = len(a_list)
    for i in range(count):
        for j in range(1, count):
            if a_list[j] < a_list[j-1]:
                a_list[j-1], a_list[j] = a_list[j], a_list[j-1]
    return a_list

In [6]:
%%cython
cimport cython
from libc.stdlib cimport malloc, free

cpdef cython_bubblesort_clist(a_list):
    """ 
    The Cython implementation of bubble sort with internal
    conversion between Python list objects and C arrays.
    
    """
    cdef int *c_list
    c_list = <int *>malloc(len(a_list)*cython.sizeof(int))
    cdef int count, i, j # static type declarations
    count = len(a_list)
    
    # convert Python list to C array
    for i in range(count):
        c_list[i] = a_list[i]

    for i in range(count):
        for j in range(1, count):
            if c_list[j] < c_list[j-1]:
                c_list[j-1], c_list[j] = c_list[j], c_list[j-1]

    # convert C array back to Python list
    for i in range(count):
        a_list[i] = c_list[i]
        
    free(c_list)
    return a_list

In [7]:
%%cython
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False) 
@cython.wraparound(False)
cpdef cython_bubblesort_numpy(inp_ary):
    """ The Cython implementation of Bubblesort with NumPy memoryview."""
    cdef unsigned long length, i, swapped, ele, temp
    cdef long[:] np_ary = inp_ary
    length = np_ary.shape[0]
    swapped = 1
    for i in range(0, length):
        if swapped: 
            swapped = 0
            for ele in range(0, length-i-1):
                if np_ary[ele] > np_ary[ele + 1]:
                    temp = np_ary[ele + 1]
                    np_ary[ele + 1] = np_ary[ele]
                    np_ary[ele] = temp
                    swapped = 1
    return inp_ary



Bubble sort implemented in Numba

Numba is using the LLVM compiler infrastructure for compiling Python code to machine code. Its strength is to work with NumPy arrays to speed-up the code. If you want to read more about Numba, please see refer to the original website and documentation.


In [8]:
from numba import jit
@jit
def numba_bubblesort(a_list):
    """ Numba implementation of bubble sort (sorts in place) """
    count = len(a_list)
    for i in range(count):
        for j in range(1, count):
            if a_list[j] < a_list[j-1]:
                a_list[j-1], a_list[j] = a_list[j], a_list[j-1]
    return a_list

In [9]:
from numba import jit
@jit
def numba_bubblesort_numpy(np_ary):
    """ 
    Numba implementation of bubble sort (sorts in place),
    It is identical to numba_bubble_sort(), but we will feed
    a numpy array type to this function.
    
    """
    count = np_ary.shape[0]
    for i in range(count):
        for j in range(1, count):
            if np_ary[j] < np_ary[j-1]:
                np_ary[j-1], np_ary[j] = np_ary[j], np_ary[j-1]
    return np_ary



Verifying that all implementations work correctly


In [10]:
import random
import copy
import numpy as np
random.seed(12345)

l = np.asarray([random.randint(1,1000) for num in range(1, 1000)])
l_sorted = np.sort(l)
for f in [python_bubblesort, cython_bubblesort_ctypes, 
          cython_bubblesort_notypes, cython_bubblesort_clist, 
          cython_bubblesort_numpy, numba_bubblesort, 
          numba_bubblesort_numpy]:
    assert(l_sorted.all() == f(copy.copy(l)).all())
print('Bubblesort works correctly')


Bubblesort works correctly



Timing


In [11]:
import timeit
import random
import copy
import numpy as np
random.seed(4567)

def make_copy(l):
    return copy.deepcopy(l)

funcs = ['python_bubblesort', 'numba_bubblesort', 
         'cython_bubblesort_notypes', 'cython_bubblesort_ctypes', 
         'cython_bubblesort_clist', 'cython_bubblesort_numpy', 
         'numba_bubblesort_numpy'
         ]

orders_n = [10**n for n in range(1, 5)]
timings = {f:[] for f in funcs}


for n in orders_n:
    l = [np.random.randint(n) for num in range(n)]
    for f in funcs:
        l_copy = make_copy(l)
        if f in ['numba_bubblesort_numpy', 'cython_bubblesort_numpy']:
            l_copy = np.asarray(l_copy)
        timings[f].append(min(timeit.Timer('%s(l_copy)' %f, 
                      'from __main__ import %s, l_copy' %f)
                              .repeat(repeat=5, number=10)))



Preparing plots


In [12]:
import platform
import multiprocessing
from llvm import __version__ as llvm__version__
from numba import __version__ as numba__version__
from cython import __version__ as cython__version__

def print_sysinfo():
    
    print('\nPython version:', platform.python_version())
    print('compiler        :', platform.python_compiler())
    print('Cython version  :', cython__version__)
    print('NumPy version   :', np.__version__)
    print('Numba version   :', numba__version__)
    print('llvm version    :', llvm__version__)
    
    print('\nsystem   :', platform.system())
    print('release  :', platform.release())
    print('machine  :', platform.machine())
    print('processor:', platform.processor())
    print('CPU count  :', multiprocessing.cpu_count())
    print('interpreter:', platform.architecture()[0])
    print('\n\n')

In [13]:
%matplotlib inline

In [14]:
import matplotlib.pyplot as plt

def plot(timings, title, ranked_labels, labels, orders_n):
    plt.rcParams.update({'font.size': 12})

    fig = plt.figure(figsize=(11,10))
    for lb in ranked_labels:
        plt.plot(orders_n, timings[lb], alpha=0.5, label=labels[lb], 
                 marker='o', lw=3)
    plt.xlabel('sample size n (items in the list)')
    plt.ylabel('time per computation for 10 loops in seconds')
    plt.xlim([min(orders_n) / 10, max(orders_n)* 10])
    plt.legend(loc=2)
    plt.grid()
    plt.xscale('log')
    plt.yscale('log')
    plt.title(title)
    plt.show()

In [15]:
import prettytable

def summary_table(ranked_labels):
    fit_table = prettytable.PrettyTable(['n=%s' %orders_n[-1], 
                                         'bubblesort function' ,
                                         'time in sec.',
                                         'rel. performance gain'])
    fit_table.align['bubblesort function'] = 'l'
    for entry in ranked_labels:
        fit_table.add_row(['', labels[entry[1]], round(entry[0], 2), 
                           round(ranked_labels[0][0]/entry[0], 2)])
    print(fit_table)


In [16]:
title = 'Performance of bubblesort in Python, Cython, and Numba'

labels = {'python_bubblesort':'(C)Python - Python lists', 
          'numba_bubblesort': 'Numba - Python lists', 
          'cython_bubblesort_ctypes': 'Cython - with C types, Python lists',
          'cython_bubblesort_notypes': 'Cython - no type declarations, Python lists',
          'cython_bubblesort_clist': 'Cython - Python lists with to-C-array-conversion',
          'cython_bubblesort_numpy': 'Cython - NumPy arrays memoryview',
          'numba_bubblesort_numpy': 'Numba - NumPy arrays'
          }

ranked_by_time = sorted([(time[1][-1],time[0]) for time in timings.items()], reverse=True)



Results


In [17]:
print_sysinfo()
plot(timings, title, [l for t,l in ranked_by_time], labels, orders_n)
summary_table(ranked_by_time)


Python version: 3.4.1
compiler        : GCC 4.2.1 (Apple Inc. build 5577)
Cython version  : 0.20.1
NumPy version   : 1.8.1
Numba version   : 0.13.1
llvm version    : 0.12.4

system   : Darwin
release  : 13.1.0
machine  : x86_64
processor: i386
CPU count  : 2
interpreter: 64bit



+---------+--------------------------------------------------+--------------+-----------------------+
| n=10000 | bubblesort function                              | time in sec. | rel. performance gain |
+---------+--------------------------------------------------+--------------+-----------------------+
|         | (C)Python - Python lists                         |    258.22    |          1.0          |
|         | Numba - Python lists                             |    207.95    |          1.24         |
|         | Cython - no type declarations, Python lists      |    166.52    |          1.55         |
|         | Cython - with C types, Python lists              |     51.9     |          4.98         |
|         | Numba - NumPy arrays                             |     2.56     |         101.03        |
|         | Cython - Python lists with to-C-array-conversion |     0.91     |         283.61        |
|         | Cython - NumPy arrays memoryview                 |     0.0      |       1106177.69      |
+---------+--------------------------------------------------+--------------+-----------------------+

Note that the relative results also depend on what version of Python, Cython, and NumPy you are using. Also, the compiler choice for installing NumPy can account for differences in the results.