I would be happy to hear your comments and suggestions.
Please feel free to drop me a note via twitter, email, or google+.


Day 4.2 - One Python Benchmark per Day

Comparing (C)Python compilers - Cython vs. Numba vs. Parakeet on Bubblesort



I made some significant changes to the previous Day 4 benchmark, thus I decided to make it a separate article:

  • added the parakeet compiler
  • improved the Bubblesort algorithm to avoid comparing already-sorted pairs
  • improved the Cython implementation (avoiding redundant conversion from memory view to array object)
  • focussed on only the optimized ("best") Cython and Numba implementations of Bubblesort
  • used Python 2.7.6 instead of Python 3.4.0 (because of parakeet)



Quick note about Bubblesort

I don't want to get into the details about sorting algorithms here, but there is a great report
"Sorting in the Presence of Branch Prediction and Caches - Fast Sorting on Modern Computers" written by Paul Biggar and David Gregg, where they describe and analyze elementary sorting algorithms in very nice detail (see chapter 4).

And for a quick reference, this website has a nice animation of this algorithm.

A long story short: The "worst-case" complexity of the Bubblesort algorithm (i.e., "Big-O")
$\Rightarrow \pmb O(n^2)$



Bubble sort implemented in (C)Python


In [1]:
def python_bubblesort(a_list):
    """ Bubblesort in Python for list objects. """
    length = len(a_list)
    swapped = 1
    for i in xrange(0, length):
        if swapped: 
            swapped = 0
            for ele in xrange(0, length-i-1):
                if a_list[ele] > a_list[ele + 1]:
                    temp = a_list[ele + 1]
                    a_list[ele + 1] = a_list[ele]
                    a_list[ele] = temp
                    swapped = 1
    return a_list

In [2]:
def python_bubblesort_ary(np_ary):
    """ Bubblesort in Python for NumPy arrays. """
    length = np_ary.shape[0]
    swapped = 1
    for i in xrange(0, length):
        if swapped: 
            swapped = 0
            for ele in xrange(0, length-i-1):
                if np_ary[ele] > np_ary[ele + 1]:
                    temp = np_ary[ele + 1]
                    np_ary[ele + 1] = np_ary[ele]
                    np_ary[ele] = temp
                    swapped = 1
    return np_ary



Bubble sort implemented in Cython

Maybe we can speed things up a little bit via Cython's C-extensions for Python. Cython is basically a hybrid between C and Python and can be pictured as compiled Python code with type declarations.
Since we are working in an IPython notebook here, we can make use of the very convenient IPython magic: It will take care of the conversion to C code, the compilation, and eventually the loading of the function.

Note that the static type declarations that we add via cdef are note required for Cython to work, but it will speed things up tremendously.


In [3]:
%load_ext cythonmagic

In [4]:
%%cython
import numpy as np
cimport numpy as np
cimport cython
@cython.boundscheck(False) 
@cython.wraparound(False)
cpdef cython_bubblesort(inp_ary):
    """ The Cython implementation of Bubblesort with NumPy memoryview."""
    cdef unsigned long length, i, swapped, ele, temp
    cdef long[:] np_ary = inp_ary
    length = np_ary.shape[0]
    swapped = 1
    for i in xrange(0, length):
        if swapped: 
            swapped = 0
            for ele in xrange(0, length-i-1):
                if np_ary[ele] > np_ary[ele + 1]:
                    temp = np_ary[ele + 1]
                    np_ary[ele + 1] = np_ary[ele]
                    np_ary[ele] = temp
                    swapped = 1
    return inp_ary



Bubble sort implemented in Numba

Numba is using the LLVM compiler infrastructure for compiling Python code to machine code. Its strength is to work with NumPy arrays to speed-up the code. If you want to read more about Numba, please see refer to the original website and documentation.


In [5]:
from numba import jit as numba_jit
@numba_jit
def numba_bubblesort(np_ary):
    """ The NumPy implementation of Bubblesort on NumPy arrays."""
    length = np_ary.shape[0]
    swapped = 1
    for i in xrange(0, length):
        if swapped: 
            swapped = 0
            for ele in xrange(0, length-i-1):
                if np_ary[ele] > np_ary[ele + 1]:
                    temp = np_ary[ele + 1]
                    np_ary[ele + 1] = np_ary[ele]
                    np_ary[ele] = temp
                    swapped = 1
    return np_ary



Bubble sort implemented in parakeet

Similar to Numba, parakeet is a Python compiler that optimizes the runtime of numerical computations based on the NumPy data types, such as NumPy arrays.

The usage is also similar to Numba where we just have to put the jit decorator on top of the function we want to optimize.


In [6]:
from parakeet import jit as para_jit
@para_jit
def parakeet_bubblesort(np_ary):
    """ The parakeet implementation of Bubblesort on NumPy arrays."""
    length = np_ary.shape[0]
    swapped = 1
    for i in xrange(0, length):
        if swapped: 
            swapped = 0
            for ele in xrange(0, length-i-1):
                if np_ary[ele] > np_ary[ele + 1]:
                    temp = np_ary[ele + 1]
                    np_ary[ele + 1] = np_ary[ele]
                    np_ary[ele] = temp
                    swapped = 1
    return np_ary



Verifying that all implementations work correctly


In [7]:
import random
import copy
import numpy as np
random.seed(4354353)

l = np.asarray([random.randint(1,1000) for num in xrange(1, 1000)])
l_sorted = np.sort(l)
for f in [python_bubblesort, python_bubblesort_ary, cython_bubblesort, 
          numba_bubblesort, parakeet_bubblesort]:
    assert(l_sorted.all() == f(copy.copy(l)).all())
print('Bubblesort works correctly')


Bubblesort works correctly



Timing


In [8]:
import timeit
import copy
import numpy as np

funcs = ['python_bubblesort',
         'python_bubblesort_ary',
         'cython_bubblesort',
         'numba_bubblesort',
         'parakeet_bubblesort'
         ]

orders_n = [10**n for n in range(1, 6)]
timings = {f:[] for f in funcs}

for n in orders_n:
    l = [np.random.randint(n) for num in range(n)]
    for f in funcs:
        l_copy = copy.deepcopy(l)
        if f != 'python_bubblesort':
            l_copy = np.asarray(l_copy)
        timings[f].append(min(timeit.Timer('%s(l_copy)' %f, 
                      'from __main__ import %s, l_copy' %f)
                              .repeat(repeat=3, number=10)))



Setting up the plots


In [13]:
import platform
import multiprocessing
from cython import __version__ as cython__version__
from llvm import __version__ as llvm__version__
from numba import __version__ as numba__version__
from parakeet import __version__ as parakeet__version__

def print_sysinfo():
    
    print '\nPython version  :', platform.python_version()
    print 'compiler        :', platform.python_compiler()
    print 'Cython version  :', cython__version__
    print 'NumPy version   :', np.__version__
    print 'Numba version   :', numba__version__
    print 'llvm version    :', llvm__version__
    print 'parakeet version:', parakeet__version__
    
    print '\nsystem     :', platform.system()
    print 'release    :', platform.release()
    print 'machine    :', platform.machine()
    print 'processor  :', platform.processor()
    print 'CPU count  :', multiprocessing.cpu_count()
    print 'interpreter:', platform.architecture()[0]
    print '\n\n'

In [14]:
%matplotlib inline

In [15]:
import matplotlib.pyplot as plt

def plot(timings, title, ranked_labels, labels, orders_n):
    plt.rcParams.update({'font.size': 12})

    fig = plt.figure(figsize=(11,10))
    for lb in ranked_labels:
        plt.plot(orders_n, timings[lb], alpha=0.5, label=labels[lb], 
                 marker='o', lw=3)
    plt.xlabel('sample size n (items in the list)')
    plt.ylabel('time per computation in milliseconds')
    plt.xlim([min(orders_n) / 10, max(orders_n)* 10])
    plt.legend(loc=2)
    plt.grid()
    plt.xscale('log')
    plt.yscale('log')
    plt.title(title)
    plt.show()

In [16]:
import prettytable

def summary_table(ranked_labels):
    fit_table = prettytable.PrettyTable(['n=%s' %orders_n[-1], 
                                         'bubblesort function' ,
                                         'time in millisec.',
                                         'rel. performance gain'])
    fit_table.align['bubblesort function'] = 'l'
    for entry in ranked_labels:
        fit_table.add_row(['', labels[entry[1]], round(entry[0]*100, 3), 
                           round(ranked_labels[0][0]/entry[0], 2)])
        # times 100 for converting from seconds to milliseconds: (time*1000 / 10-loops)
    print(fit_table)



Results


In [17]:
title = 'Performance of Bubblesort in Python, Cython, parakeet, and Numba'

labels = {'python_bubblesort':'(C)Python Bubblesort - Python lists', 
          'python_bubblesort_ary':'(C)Python Bubblesort - NumPy arrays',  
          'cython_bubblesort': 'Cython Bubblesort - NumPy arrays',
          'numba_bubblesort': 'Numba Bubblesort - NumPy arrays',
          'parakeet_bubblesort': 'parakeet Bubblesort - NumPy arrays'
          }

ranked_by_time = sorted([(time[1][-1],time[0]) for time in timings.items()], reverse=True)

print_sysinfo()
plot(timings, title, [l for t,l in ranked_by_time], labels, orders_n)
summary_table(ranked_by_time)


Python version  : 2.7.6
compiler        : GCC 4.0.1 (Apple Inc. build 5493)
Cython version  : 0.20.1
NumPy version   : 1.8.0
Numba version   : 0.12.1
llvm version    : 0.12.3
parakeet version: 0.23.2

system     : Darwin
release    : 13.2.0
machine    : x86_64
processor  : i386
CPU count  : 4
interpreter: 64bit



+----------+-------------------------------------+-------------------+-----------------------+
| n=100000 | bubblesort function                 | time in millisec. | rel. performance gain |
+----------+-------------------------------------+-------------------+-----------------------+
|          | (C)Python Bubblesort - NumPy arrays |       42.019      |          1.0          |
|          | (C)Python Bubblesort - Python lists |       13.26       |          3.17         |
|          | Numba Bubblesort - NumPy arrays     |       0.307       |         136.69        |
|          | parakeet Bubblesort - NumPy arrays  |       0.277       |         151.63        |
|          | Cython Bubblesort - NumPy arrays    |       0.141       |         297.2         |
+----------+-------------------------------------+-------------------+-----------------------+

Note that the relative results also depend on what version of Python, Cython, Numba, parakeet, and NumPy you are using. Also, the compiler choice for installing NumPy can account for differences in the results.