This notebook was put together by [Jake Vanderplas]( for UW's [Astro 599]( course. Source and license info is on [GitHub](

Extending Python with Compiled Code

In a previous session, we talked about ways to get performance out of Python using vectorization strategies within NumPy. Today we're going to talk about approaches which involve interfacing Python to external compiled code, or converting Python code to compilable code. There are several different approaches. We'll give examples of solving a problem by several means:

  • Ctypes: the C-linking utilities included in the Python standard library
  • F2Py: the Fortran to Python utility which is bundled with NumPy
  • Cython: a module that allows conversion of Python and Python-like code to C
  • Numba: a Python to LLVM bytecode converter

There is no way to go into depth on all of these, but I mainly just want to show some examples so that you know the types of problems they solve. At the above links, you'll find more in-depth documentation and tutorials.

There are other options we won't talk about here, because they're not as useful for our purposes:

  • Python C API: Python is implemented in C, and you can actually write C code which will compile to a Python module! This is extremely low-level, and I would only recommend it if you enjoy pain (see Dan Foreman-Mackie's blog post for a concise-as-possible introduction)
  • SWIG: the Simplified Wrapper Interface Generator, can generate wrappers from a variety of low-level languages to a variety of high-level languages. It's used heavily by LSST and other projects.
  • Weave: included in SciPy, weave is a method of putting C snippets within a Python program. It's largely been superseded by Cython in practice.
  • PyPy: a JIT compiler for Python written in Python. PyPy doesn't support the C backend which most scientific tools rely on, so it's not extremely useful for scientific computing.

The optimal approach depends on the desired use-case. We'll go through a few examples, from lowest to highest level:

Why Extend Python?

There are a few reasons you might wish to use the tools outlined below:

  1. Sometimes vectorization of algorithms can lead to excessive memory usage, because it relies on temporary arrays to hold intermediate results. Directly implementing the routine in compiled code can lead to more efficient computation.

  2. Sometimes problems cannot be implemented as vectorized NumPy operations. An example is in tree-based algorithms. This is why I used cython when I wrote the k-neighbors and kernel density estimation code within sklearn.neighbors.

  3. Sometimes there are legacy code-bases that you don't want to have to re-implement: you'd much rather just write some sort of bindings to call the old code from within Python.

We'll do a brief demonstration of some of these tools. These tools do have a bit of a learning curve, so I'll provide references for where you might look to go deeper. The purpose of this lecture is to get a taste of some of the better options for addressing these problems.


Ctypes is part of the standard library, and offers tools to call routines from compiled libraries with Python. These can either be routines in your system libraries, or in shared libraries you compile yourself. We'll see examples of both.

One weakness of ctypes is that it can be very platform-specific. For example, shared-library extensions are different from platform to platform (e.g. *.so on Linux, *.dll on Windows, *.dylib on OSX). Other platform-by-platform issues like 32/64 bit can also make things difficult.

Interfacing to System Libraries

In Linux & Mac, you'll usually find the standard libraries here (remember that ! lets us execute shell commands within IPython):

In [1]:
!ls /usr/lib/

arc                               libmecabra.dylib
arch_tool                         libmenu.5.4.dylib
bundle1.o                         libmenu.dylib
c++                               libmx.A.dylib
charset.alias                     libmx.dylib
clang                             libncurses.5.4.dylib
cron                              libncurses.5.dylib
crt1.10.5.o                       libncurses.dylib
crt1.10.6.o                       libneon.27.dylib
crt1.o                            libneon.dylib
dtrace                            libnetsnmp.15.1.2.dylib
dyld                              libnetsnmp.15.dylib
dylib1.10.5.o                     libnetsnmp.25.dylib
dylib1.o                          libnetsnmp.5.2.1.dylib
freeradius                        libnetsnmp.5.dylib
gcrt1.o                           libnetsnmp.dylib
groff                             libnetsnmpagent.25.dylib
lazydylib1.o                      libnetsnmpagent.dylib
libACSClient.dylib                libnetsnmphelpers.25.dylib
libBSDPClient.A.dylib             libnetsnmphelpers.dylib
libBSDPClient.dylib               libnetsnmpmibs.25.dylib
libCRFSuite.dylib                 libnetsnmpmibs.dylib
libCRFSuite0.12.dylib             libnetsnmptrapd.25.dylib
libCoreStorage.dylib              libnetsnmptrapd.dylib
libDHCPServer.A.dylib             libobjc.A.dylib
libDHCPServer.dylib               libobjc.dylib
libDiagnosticMessagesClient.dylib libodbc.a
libIOKit.A.dylib                  libodfde.dylib
libIOKit.dylib                    libpam.1.dylib
libLTO.dylib                      libpam.2.dylib
libMatch.1.dylib                  libpam.dylib
libMatch.dylib                    libpanel.5.4.dylib
libOpenScriptingUtil.dylib        libpanel.dylib
libScreenReader.dylib             libpcap.A.dylib
libSystem.B.dylib                 libpcap.dylib
libSystem.B_debug.dylib           libpcre.0.dylib
libSystem.dylib                   libpcre.dylib
libSystem_debug.dylib             libpcreposix.0.dylib
libUniversalAccess.dylib          libpcreposix.dylib
libXplugin.1.dylib                libpgtypes.3.2.dylib
libXplugin.dylib                  libpgtypes.3.dylib
libalias.A.dylib                  libpgtypes.dylib
libalias.dylib                    libpoll.dylib
libapr-              libpq.5.4.dylib
libapr-1.0.dylib                  libpq.5.dylib
libapr-1.dylib                    libpq.dylib
libaprutil-         libproc.dylib
libaprutil-1.0.dylib              libprofile_rt.a
libaprutil-1.dylib                libprofile_rt.dylib
libarchive.2.dylib                libpthread.dylib
libarchive.dylib                  libpython.dylib
libatlas.dylib                    libpython2.5.dylib
libauditd.0.dylib                 libpython2.6.dylib
libauditd.dylib                   libpython2.7.dylib
libauto.dylib                     libquit.dylib
libblas.dylib                     libreadline.dylib
libbsm.0.dylib                    libresolv.9.dylib
libbsm.dylib                      libresolv.dylib
libbz2.1.0.5.dylib                librpcsvc.dylib
libbz2.1.0.dylib                  libruby.1.dylib
libbz2.dylib                      libruby.dylib
libc++.1.dylib                    libsandbox.1.dylib
libc++.dylib                      libsandbox.dylib
libc++abi.dylib                   libsasl2.2.0.1.dylib
libc.dylib                        libsasl2.2.0.15.dylib
libcblas.dylib                    libsasl2.2.0.21.dylib
libcharset.1.0.0.dylib            libsasl2.2.0.22.dylib
libcharset.1.dylib                libsasl2.2.dylib
libcharset.dylib                  libsasl2.dylib
libclang.dylib                    libsqlite3.0.dylib
libclapack.dylib                  libsqlite3.dylib
libcom_err.dylib                  libssl.0.9.7.dylib
libcpp_kext.a                     libssl.0.9.8.dylib
libcrypto.0.9.7.dylib             libssl.dylib
libcrypto.0.9.8.dylib             libstdc++.6.0.9.dylib
libcrypto.dylib                   libstdc++.6.dylib
libcsfde.dylib                    libstdc++.dylib
libcups.2.dylib                   libsvn_client-
libcups.dylib                     libsvn_client-1.0.dylib
libcupscgi.1.dylib                libsvn_client-1.dylib
libcupscgi.dylib                  libsvn_delta-
libcupsimage.2.dylib              libsvn_delta-1.0.dylib
libcupsimage.dylib                libsvn_delta-1.dylib
libcupsmime.1.dylib               libsvn_diff-
libcupsmime.dylib                 libsvn_diff-1.0.dylib
libcupsppdc.1.dylib               libsvn_diff-1.dylib
libcupsppdc.dylib                 libsvn_fs-
libcurl.3.dylib                   libsvn_fs-1.0.dylib
libcurl.4.dylib                   libsvn_fs-1.dylib
libcurl.dylib                     libsvn_fs_fs-
libcurses.dylib                   libsvn_fs_fs-1.0.dylib
libdbm.dylib                      libsvn_fs_fs-1.dylib
libdes425.dylib                   libsvn_fs_util-
libdl.dylib                       libsvn_fs_util-1.0.dylib
libdtrace.dylib                   libsvn_fs_util-1.dylib

libecpg.6.dylib                   libsvn_ra-1.0.dylib
libecpg.dylib                     libsvn_ra-1.dylib
libecpg_compat.3.3.dylib          libsvn_ra_local-
libecpg_compat.3.dylib            libsvn_ra_local-1.0.dylib
libecpg_compat.dylib              libsvn_ra_local-1.dylib
libedit.2.dylib                   libsvn_ra_neon-
libedit.3.0.dylib                 libsvn_ra_neon-1.0.dylib
libedit.3.dylib                   libsvn_ra_neon-1.dylib
libedit.dylib                     libsvn_ra_svn-
libexpat.1.5.2.dylib              libsvn_ra_svn-1.0.dylib
libexpat.1.dylib                  libsvn_ra_svn-1.dylib
libexpat.dylib                    libsvn_repos-
libexslt.0.dylib                  libsvn_repos-1.0.dylib
libexslt.dylib                    libsvn_repos-1.dylib
libf77lapack.dylib                libsvn_subr-
libffi.dylib                      libsvn_subr-1.0.dylib
libform.5.4.dylib                 libsvn_subr-1.dylib
libform.dylib                     libsvn_wc-
libgcc_s.1.dylib                  libsvn_wc-1.0.dylib
libgcc_s.10.4.dylib               libsvn_wc-1.dylib
libgcc_s.10.5.dylib               libsysmon.dylib
libgermantok.dylib                libtcl.dylib
libgmalloc.B.dylib                libtcl8.5.dylib
libgmalloc.dylib                  libtclstub8.5.a
libgssapi_krb5.dylib              libtermcap.dylib
libhunspell-       libtidy.A.dylib
libhunspell-1.2.0.dylib           libtidy.dylib
libhunspell-1.2.dylib             libtk.dylib
libiconv.2.4.0.dylib              libtk8.5.dylib
libiconv.2.dylib                  libtkstub8.5.a
libiconv.dylib                    libutil.dylib
libicucore.A.dylib                libutil1.0.dylib
libicucore.dylib                  libxar.1.dylib
libinfo.dylib                     libxar.dylib
libiodbc.2.1.18.dylib             libxml2.2.dylib
libiodbc.2.dylib                  libxml2.dylib
libiodbc.a                        libxsanmgrcommon.dylib
libiodbc.dylib                    libxslt.1.dylib
libiodbcinst.2.1.18.dylib         libxslt.dylib
libiodbcinst.2.dylib              liby.a
libiodbcinst.a                    libz.1.1.3.dylib
libiodbcinst.dylib                libz.1.2.5.dylib
libipconfig.dylib                 libz.1.dylib
libipsec.A.dylib                  libz.dylib
libipsec.dylib                    pam
libk5crypto.dylib                 php
libkmod.a                         pkgconfig
libkmodc++.a                      postgresql
libkrb4.dylib                     python2.5
libkrb5.dylib                     python2.6
libkrb524.dylib                   python2.7
libkrb5support.dylib              rpcsvc
libl.a                            ruby
liblangid.dylib                   sa
liblapack.dylib                   sasl2
liblber.dylib                     sqlite3
libldap.dylib                     system
libmecab.1.0.0.dylib              zsh

CTypes lets you link to any of these libraries and call the functions directly:

In [2]:
from ctypes import CDLL

libc_name = 'libc.dylib'  # OSX
# libc_name = ''  # Linux
# libc_name = 'libc.dll'  # Windows

libc = CDLL(libc_name)

In [3]:

<_FuncPtr object at 0x101f3d940>

In [ ]:
print "Seconds since January 1, 1970:"
print libc.time()

Wrapping your own functions

You can do more than simply wrapping the functionality of system libraries: you can also create your own C functions and wrap them with CTypes (this will require having a C compiler installed):

In [1]:
%%file my_sum.c
#include <stdio.h>

// sum all the values in the array x
// x is a pointer to a memory block 
// of length n
int sum(int *x, int n)
  int i, counter;
  counter = 0;
  for(i=0; i<n; i++)
      counter += x[i];
  return counter;

In [2]:
gcc -c my_sum.c
gcc -shared -o my_sum.o

In [3]:
!ls my_sum.*

my_sum.c  my_sum.o

Now we'll use the CTypes API to create Python objects that can be passed to this function:

In [4]:
from ctypes import CDLL, c_void_p
import numpy as np


a = np.arange(10, dtype=np.int32)
adata = a.ctypes.data_as(c_void_p)
asize = a.size

print my_sum.sum(adata, asize)
print a.sum()


User-defined Types

Here we've used just simple built-in numerical types: we can also do more complicated things like defining C Structures within Python:

In [5]:
%%file my_sum2.c
#include <stdio.h>

// Define a simple array struct, containing
// a pointer and a length
struct Array{
   int *x;
   int n;

// Sum the values in the array struct
int sum(struct Array a)
  int counter, i;
  counter = 0;
  for(i=0; i<a.n; i++)
      counter += a.x[i];
  return counter;

Writing my_sum2.c

In [6]:
gcc -c my_sum2.c
gcc -shared -o my_sum2.o

Now the function requires us to pass a structure as an argument. To handle this, we need to create a mirror of this structure within our Python script. This can be done via the CTypes API:

In [7]:
from ctypes import Structure, POINTER, c_int

# Here's our C structure defined in Python
class Array(Structure):
    _fields_ = ("x", c_void_p), ("n", c_int)
a = np.arange(10, dtype=np.int32)
arr = Array(a.ctypes.data_as(c_void_p), a.size)

print my_sum2.sum(arr)
print a.sum()


We're starting to see how this can be helpful, but what if we want something more involved?

Interfacing to LAPACK functions

Let's interface to the system LAPACK (Linear Algebra Package) routines and solve a linear equation. This is the type of function you might write in order to interface with a code someone else has written.

We'll use the DGESV routine from LAPACK: the Double precision GEneral SolVer for linear equations. This is a standard routine found on most computers. See for a description of the call signature.

In [8]:
from ctypes import CDLL, c_int, c_void_p, byref
import numpy as np

def solve(M, b):
    """Solve the linear equation M x = b using LAPACK's DGESV"""
    # Interface to the LAPACK system library
    # This may be different depending on your operating system
    lapack = CDLL('liblapack.dylib')
    # Make sure M is double precision fortran-ordered
    M = np.asarray(M, dtype=np.float64, order='F')
    # Make a copy of b.  It will be overwritten with the result
    b = np.array(b, dtype=np.float64, copy=True)
    n = b.size
    # We could generalize this, but we'll do a simple problem for now
    assert M.shape == (n, n)
    assert b.shape == (n,)
    # prepare the variables for dgesv.
    size = c_int(n)
    one = c_int(1)
    info = c_int(0)
    lapack.dgesv(byref(size), # size of the problem
                 byref(one), # number of columns of b
                 M.ctypes.data_as(c_void_p), # M data array
                 byref(size), # size of the problem
                 (n * c_int)(), # integer work array
                 b.ctypes.data_as(c_void_p), # b data array
                 byref(size), # size of b
                 byref(info)) # exit flag
    return b

In [9]:
X = np.random.random((5, 5))
b = np.random.random(5)

print solve(X, b)

[-0.65867455  5.16904569 -1.20729377  2.32694744 -2.23503501]

In [10]:
print np.linalg.solve(X, b)

[-0.65867455  5.16904569 -1.20729377  2.32694744 -2.23503501]

The routine np.linalg.solve actually uses this same routine (depending on your install options), so it should give precisely the same results. You'd never want to do this for LAPACK routines (they're already in numpy, after all), but this sort of thing can come in handy to call legacy C/Fortran code from Python.


F2Py is a submodule of NumPy, and is designed to create easy interfaces to Fortran code (though it will also work with C). We can do something similar to above, but use f2py rather than gcc to compile it (note that this will require you to have a Fortran compiler installed).

A simple example

We'll start with a simple example that computes the first N fibonacci numbers. Here's a Python version of the function:

In [45]:
import numpy as np

def fib_py(N):
    x = np.zeros(N, dtype=float)
    for i in range(N):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
            x[i] = x[i - 1] + x[i - 2]
    return x

In [46]:
print fib_py(10)

[  0.   1.   1.   2.   3.   5.   8.  13.  21.  34.]

In [47]:
%timeit fib_py(10000)

100 loops, best of 3: 3.95 ms per loop

Now we'll write the same routine in Fortran:

In [48]:
%%file fib.f

      subroutine fib(A,N)
      integer N
      double precision A(N)
      do i=1,N
         if (i.EQ.1) then
            A(i) = 0.0D0
         elseif (i.EQ.2) then
            A(i) = 1.0D0
            A(i) = A(i-1) + A(i-2)

Overwriting fib.f

We'll compile this from the command line using f2py:

In [49]:
!f2py -c -m fib fib.f

running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
building extension "fib" sources
f2py options: []
f2py:> /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fibmodule.c
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7
Reading fortran codes...
	Reading file 'fib.f' (format:fix,strict)
	Block: fib
			Block: fib
Post-processing (stage 2)...
Building modules...
	Building module "fib"...
		Constructing wrapper function "fib"...
	Wrote C/API module "fib" to file "/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fibmodule.c"
  adding '/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fortranobject.c' to sources.
  adding '/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7' to include_dirs.
copying /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/f2py/src/fortranobject.c -> /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7
copying /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/f2py/src/fortranobject.h -> /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7
build_src: building npy-pkg config files
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
customize Gnu95FCompiler
Found executable /usr/local/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build_ext
building 'fib' extension
compiling C sources
C compiler: /usr/bin/clang -fno-strict-aliasing -I/Users/jakevdp/anaconda/include -arch x86_64 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes

creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/var
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/var/folders
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/var/folders/_q
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p
creating /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7
compile options: '-I/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7 -I/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include -I/Users/jakevdp/anaconda/include/python2.7 -c'
clang: /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fortranobject.c
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fortranobject.c:2:
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fortranobject.h:13:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1804:
/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
1 warning generated.
clang: /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fibmodule.c
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fibmodule.c:18:
In file included from /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fortranobject.h:13:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/arrayobject.h:4:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:17:
In file included from /Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1804:
/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:15:2: warning: "Using deprecated NumPy API, disable it by "          "#defining NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it by " \
/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fibmodule.c:111:12: warning: unused function 'f2py_size' [-Wunused-function]
static int f2py_size(PyArrayObject* var, ...)
2 warnings generated.
compiling Fortran sources
Fortran f77 compiler: /usr/local/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -m64 -fPIC -O3 -funroll-loops
Fortran f90 compiler: /usr/local/bin/gfortran -Wall -g -fno-second-underscore -m64 -fPIC -O3 -funroll-loops
Fortran fix compiler: /usr/local/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -Wall -g -fno-second-underscore -m64 -fPIC -O3 -funroll-loops
compile options: '-I/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7 -I/Users/jakevdp/anaconda/lib/python2.7/site-packages/numpy/core/include -I/Users/jakevdp/anaconda/include/python2.7 -c'
gfortran:f77: fib.f
/usr/local/bin/gfortran -Wall -g -m64 -Wall -g -undefined dynamic_lookup -bundle /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fibmodule.o /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/src.macosx-10.5-x86_64-2.7/fortranobject.o /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p/fib.o -L/usr/local/lib/gcc/x86_64-apple-darwin12.4.0/4.8.1 -L/Users/jakevdp/anaconda/lib -lgfortran -o ./
Removing build directory /var/folders/_q/9lrsb4wj1b5gy71vm32kqjy80000gn/T/tmpZRXw2p

In [51]:
import numpy as np
import fib
a = np.zeros(10, dtype='d')
print a

[  0.   1.   1.   2.   3.   5.   8.  13.  21.  34.]

In [52]:
a = np.zeros(10000, dtype='d')
%timeit fib.fib(a)

10000 loops, best of 3: 26.6 µs per loop

The Fortran version is about 300x faster!

Using an interface file

But this is a bit awkward that we have to create the array that will hold the results... It would be nice if this could happen automatically. We can make this happen through the use of interface files, which have a .pyf extension.

An interface template can be generated automatically with f2py. We'll tell it that we want a module called fib2:

In [18]:
!rm -f _fib2.pyf
!f2py fib.f -m fib2 -h _fib2.pyf

Reading fortran codes...
	Reading file 'fib.f' (format:fix,strict)
	Block: fib2
			Block: fib
Post-processing (stage 2)...
Saving signatures to file "./_fib2.pyf"

In [19]:
!cat _fib2.pyf

!    -*- f90 -*-
! Note: the context of this file is case sensitive.

python module fib2 ! in 
    interface  ! in :fib2
        subroutine fib(a,n) ! in :fib2:fib.f
            double precision dimension(n) :: a
            integer, optional,check(len(a)>=n),depend(a) :: n=len(a)
        end subroutine fib
    end interface 
end python module fib2

! This file was auto-generated with f2py (version:2).
! See

Take a look at the default settings. We have our subroutine called fib(a, n), which lists the following:

double precision dimension(n) :: a

This lists a variable which is an input into the routine: an array of length n

integer, optional,check(len(a)>=n),depend(a) :: n=len(a)

This tells us what the routine expects n to be. It is optionally specified, with a default value which is the length of a. Also, there is a check in here that makes sure a is long enough. Let's check this out:

In [20]:
from fib import fib
a = np.zeros(10)
fib(a, 10)  # specify the optional keyword
print a

[  0.   1.   1.   2.   3.   5.   8.  13.  21.  34.]

In [21]:
a = np.zeros(10)
fib(a, 5)  # specify only part of the array
print a

[ 0.  1.  1.  2.  3.  0.  0.  0.  0.  0.]

In [22]:
a = np.zeros(10)
fib(a, 20)  # specify a length which is too long

error                                     Traceback (most recent call last)
<ipython-input-22-3aa9413db322> in <module>()
      1 a = np.zeros(10)
----> 2 fib(a, 20)  # specify a length which is too long

error: (len(a)>=n) failed for 1st keyword n: fib:n=20

What we want to do is to modify this interface so that the length n is an input, and the array a is an (automatically constructed) output. We'll do this below; for more details on the options available in F2Py interface files, see the F2Py documentation.

In [ ]:
%%file fib2.pyf
!    -*- f90 -*-
! Note: the context of this file is case sensitive.

python module fib2 ! in 
    interface  ! in :fib2
        subroutine fib(a,n) ! in :fib2:fib.f
            double precision dimension(n), intent(out), depend(n) :: a
            integer intent(in) :: n
        end subroutine fib
    end interface 
end python module fib2

We've specified the intent and the dependencies of each variable using the f2py specifications.

Here we compile fib2 using the interface file we've created:

In [ ]:
!f2py -c -m fib2 fib2.pyf fib.f

Now we can import and call the function like this:

In [ ]:
import fib2
print fib2.fib(10)

In [ ]:

In [ ]:
%timeit fib2.fib(10000)

From our simple FORTRAN function, along with an interface file and a compilation with f2py, we now have a Python module which is callable in a very intuitive way.

If you look into the source code of SciPy, you'll see that this is how much of its functionality is implemented, through wrapping pieces of the NETLIB repository.


CTypes and F2Py provide the ability to wrap Fortran, C, and C++ code so that it can be imported into Python. Cython enables this as well, though we will not focus on that part of it here. The biggest part of Cython is that it lets you convert Python code and Python-like code into compiled C code, which can run many times faster than the original code.

Let's see a quick example. Here's a Python function which computes the N^th fibonacci number:

In [53]:
def nth_fib(n):
    a, b = 0, 1
    for i in range(n):
        b, a = a + b, b
    return a

In [54]:
[nth_fib(i) for i in range(10)]

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

In [55]:
%timeit nth_fib(10000)

100 loops, best of 3: 2.8 ms per loop

Now we'll take the exact same code, and compile it with Cython. In general, this will be done by saving the code to file, and running cython on the command line. You can read about that in the documentation. Here we'll use IPython's Cython magic to streamline the process:

In [56]:
%load_ext cythonmagic

The cythonmagic extension is already loaded. To reload it, use:
  %reload_ext cythonmagic

In [57]:
def nth_fib2(n):
    a, b = 0, 1
    for i in range(n):
        b, a = a + b, b
    return a

In [25]:
[nth_fib2(i) for i in range(10)]

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

In [58]:
%timeit nth_fib2(10000)

100 loops, best of 3: 2.05 ms per loop

Just compiling the code in Cython gave us a ~10% speedup. But we can do better by adding type annotations.

See, the main reason Python is slow is because it has to do dynamic type checking each time it evaluates an expression. If we can tell Cython what the types are from the beginning, this step can be skipped, and we have large time savings. We do this through a cdef command. We also do the temporary assignment explicitly to remove the Python tuple assignment:

In [59]:
def nth_fib3(int n):
    cdef int a = 0
    cdef int b = 1
    cdef int tmp
    for i in range(n):
        tmp = b
        b = a + b
        a = tmp
    return a

In [60]:
[nth_fib3(i) for i in range(10)]

[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

In [61]:
print "Python only:"
%timeit nth_fib(10000)
print "Bare Cython:"
%timeit nth_fib(10000)
print "Typed Cython:"
%timeit nth_fib3(10000)

Python only:
100 loops, best of 3: 2.82 ms per loop
Bare Cython:
100 loops, best of 3: 2.88 ms per loop
Typed Cython:
100000 loops, best of 3: 5.99 µs per loop

By adding some type information to our script, we sped up the execution by several orders of magnitude! This shows how easy Cython is for simple problems.

One very useful trick to be aware of is cython -a. This will produce an annotated html document showing which lines of the program are causing problems. You can run it like this:

In [62]:
%%file nth_fib.pyx
# Slow version:
def nth_fib2(n):
    a, b = 0, 1
    for i in range(n):
        b, a = a + b, b
    return a

# Fast Version:
def nth_fib3(int n):
    cdef int a = 0
    cdef int b = 1
    cdef int tmp
    for i in range(n):
        tmp = b
        b = a + b
        a = tmp
    return a

Overwriting nth_fib.pyx

In [63]:
!cython -a nth_fib.pyx

If we open the resulting HTML file, we'll see a highlighted version of our code. The darkness of the highlight shows how many lines of C code were generated by the line of Python. More yellow lines generally means slower code. This can be very helpful when your code is not running as quickly as you'd expected. To see the results, open nth_fib.html after running the above code.

Using Cython with NumPy

Cython provides a really nice interface to numpy arrays via the Typed Memoryview syntax. Let's implement the same fib function as above, but using Cython.

First we'll simply compile our Python function again:

In [32]:
import numpy as np

def fib_cy(N):
    x = np.zeros(N, dtype=float)
    for i in range(N):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
            x[i] = x[i - 1] + x[i - 2]
    return x

In [33]:
print np.allclose(fib_py(10000), fib_cy(10000))
%timeit fib_py(10000)
%timeit fib_cy(10000)

100 loops, best of 3: 4.07 ms per loop
-c:1: RuntimeWarning: overflow encountered in double_scalars
-c:257: RuntimeWarning: overflow encountered in double_scalars
100 loops, best of 3: 3 ms per loop

Again, a very small improvement. Let's add some type information and see how we do:

In [34]:
import numpy as np
from numpy cimport float_t

def fib_cy2(int N):
    cdef int i
    cdef float_t[::1] x = np.zeros(N, dtype=float)
    for i in range(N):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
            x[i] = x[i - 1] + x[i - 2]
    return x

warning: /Users/jakevdp/anaconda/lib/python2.7/site-packages/Cython/Includes/numpy.pxd:869:17: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.
warning: /Users/jakevdp/anaconda/lib/python2.7/site-packages/Cython/Includes/numpy.pxd:869:24: Non-trivial type declarators in shared declaration (e.g. mix of pointers and values). Each pointer declaration should be on its own line.

In [35]:
%timeit fib_cy(10000)
%timeit fib_cy2(10000)

100 loops, best of 3: 2.89 ms per loop
10000 loops, best of 3: 36.6 µs per loop

Wow! Simply adding some type information for the Cython compiler made our code orders of magnitude faster! This is because the array can now be treated as a contiguous memory block rather than a Python object. This makes each of the indexing operations much more efficient, because they no longer have to go through the Python interface layer.

Learning More

There's a whole lot more to know about Cython, especially the details of working with numpy arrays. Here are some more resources:


You might wonder whether some of that type information could be inferred by the compiler, or whether the compilation phase is really necessary. This is where Numba comes in. Numba is an LLVM-based Just In Time (JIT) compiler for Python. It allows simple annotations of Python code which can lead to huge speedups for certain operations.

Numba comes from Continuum Analytics, a company founded by the creator of NumPy and SciPy, and the same people who are behind the Anaconda Python distribution. Numba is still relatively young, and can be a little finnicky for some problems, but the approach is really clean.

You can install numba from scratch, but because of the dependencies on LLVM and other difficult-to-install tools, I'd highly recommend just going with Anaconda.

Let's go back to our Python fib() function:

In [65]:
def fib(N):
    x = np.zeros(N, dtype=np.float64)
    for i in range(N):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
            x[i] = x[i - 1] + x[i - 2]
    return x

To speed this up with Numba, we simply use the jit decorator:

In [67]:
from numba import autojit
fib_nb = autojit(fib)

In [68]:

array([  0.,   1.,   1.,   2.,   3.,   5.,   8.,  13.,  21.,  34.])

In [69]:
%timeit fib(10000)
%timeit fib_nb(10000)

100 loops, best of 3: 3.85 ms per loop
10000 loops, best of 3: 31.2 µs per loop

Usually, numba is used as a decorator, which essentially does the same as we saw above. You'd write a decorated function this way:

In [70]:
def fib_nb(N):
    x = np.zeros(N, dtype=np.float64)
    for i in range(N):
        if i == 0:
            x[i] = 0
        elif i == 1:
            x[i] = 1
            x[i] = x[i - 1] + x[i - 2]
    return x

In [71]:
%timeit fib_nb(10000)

10000 loops, best of 3: 31.4 µs per loop

You can see that simply by passing our fib function through the autojit decorator, we've attained execution which is a few percent faster than what we got with Cython - and there's no fancy compilation stage required! The funcion is compiled on demand through LLVM, which makes numba very convenient to use.

Now for the bad news: Numba is still young, and there are a lot of corner cases where the compiler will just break without any useful error message. For more complicated problems and functions, the output code is not always faster than Python+NumPy, and can sometimes even be slower!

My recommendation right now is to keep Numba in mind, to try it in your problems, but not necessarily depend on it -- yet.

More Information on Numba

There's still not a lot of information out there, but here are a few resources:

Homework: Accelerating the Mandelbrot Set

One of the more interesting and beautiful calculations you can do with Python is the Mandelbrot Set, a straightforwardly-generated set of points whose boundary is a fractal. The computation of whether a point is in the mandelbrot set can be done as follows:

In [72]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

In [76]:
def mandel(x, y, max_iter):
    Given z = x + iy and max_iter, determine whether the candidate
    is in the mandelbrot set for the given number of iterations
    c = complex(x, y)
    z = 0.0j
    for i in range(max_iter):
        z = z*z + c
        if (z.real*z.real + z.imag*z.imag) >= 4:
            return i

    return max_iter

def create_fractal(Nx, Ny, xmin, xmax, ymin, ymax, max_iter):
    """Create and return a fractal image"""
    image = np.zeros((Ny, Nx), dtype=float)
    dx = (xmax - xmin) * 1. / Nx
    dy = (ymax - ymin) * 1. / Ny
    for x in range(Nx):
        rpart = xmin + x * dx
        for y in range(Ny):
            ipart = ymin + y * dy
            color = mandel(rpart, ipart, max_iter)
            image[y, x] = color
    return image

In [77]:
image = create_fractal(300, 200, -2, 1, -1, 1, 20)

AttributeError                            Traceback (most recent call last)
<ipython-input-77-897f8543ae71> in <module>()
----> 1 image = create_fractal(300, 200, -2, 1, -1, 1, 20)
      2 plt.imshow(image,

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/ in numba.numbawrapper.NumbaSpecializingWrapper.__call__ (numba/numbawrapper.c:2827)()

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/decorators.pyc in compile_function(args, kwargs)
    206                          **translator_kwargs)
--> 208             compiled_function = dec(f)
    209             return compiled_function

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/decorators.pyc in _jit_decorator(func)
    261         assert kwargs.get('llvm_ee') is None, "Engine should never be provided"
    262         sig, lfunc, wrapper = compile_function(env, func, argtys, restype=restype,
--> 263                                     nopython=nopython, **kwargs)
    264         return numbawrapper.NumbaCompiledWrapper(func, wrapper, sig, lfunc)

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/decorators.pyc in compile_function(env, func, argtypes, restype, **kwds)
    136     assert kwds.get('llvm_module') is None, kwds.get('llvm_module')
--> 138     func_env = pipeline.compile2(env, func, restype, argtypes, **kwds)
    140     function_cache.register_specialization(func_env)

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/pipeline.pyc in compile2(env, func, restype, argtypes, ctypes, compile_only, **kwds)
    476         pipeline = env.get_pipeline(kwds.get('pipeline_name', None))
    477         func_ast.pipeline = pipeline
--> 478         post_ast = pipeline(func_ast, env)
    479         func_signature = func_env.func_signature
    480         symtab = func_env.symtab

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/pipeline.pyc in __call__(self, ast, env)
    527         if self.is_composed:
--> 528             ast = self.transform(ast, env)
    529         else:
    530             try:

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/pipeline.pyc in transform(self, ast, env)
    905                 stage_tuple = (stage, utils.ast2tree(ast))
    906                 logger.debug(pprint.pformat(stage_tuple))
--> 907             ast = stage(ast, env)
    908         return ast

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/pipeline.pyc in _stage(ast, env)
    890             def _stage(ast, env):
    891                 stage_obj = getattr(env.pipeline_stages, name)
--> 892                 return _check_stage_object(stage_obj)(ast, env)
    893             _stage.__name__ = name
    894             stage = _stage

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/pipeline.pyc in __call__(self, ast, env)
    529         else:
    530             try:
--> 531                 ast = self.transform(ast, env)
    532             except error.NumbaError as e:
    533                 func_env = env.translation.crnt

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/pipeline.pyc in transform(self, ast, env)
    692         crnt = env.translation.crnt
    693         type_inferer = self.make_specializer(type_inference.TypeInferer,
--> 694                                              ast, env, **crnt.kwargs)
    695         type_inferer.infer_types()
    696         crnt.func_signature = type_inferer.func_signature

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/pipeline.pyc in make_specializer(self, cls, ast, env, **kws)
    520                    closure_scope=crnt.closure_scope,
    521                    env=env)
--> 522         return cls(env.context, crnt.func, ast, **kws)
    524     def __call__(self, ast, env):

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in __init__(self, context, func, ast, closure_scope, **kwds)
     87         self.function_level = kwds.get('function_level', 0)
---> 88         self.init_locals()
     89         ast.have_return = False

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in init_locals(self)
    160         if self.have_cfg:
    161             self.deferred_types = []
--> 162             self.resolve_variable_types()
    164         if debug and self.have_cfg:

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in resolve_variable_types(self)
    353         The dependencies are resolved after type inference completes.
    354         """
--> 355         self.analyse_assignments()
    357         for deferred_type in self.deferred_types:

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in analyse_assignments(self)
    264                 elif isinstance(stat, control_flow.NameAssignment):
    265                     # print "analysing", stat.lhs
--> 266                     assmnt = self.handle_NameAssignment(stat.assignment_node)
    267                     stat.assignment_node = assmnt

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in handle_NameAssignment(self, assignment_node)
    212             return self.visit_For(assignment_node)
    213         else:
--> 214             return self.visit(assignment_node)
    216     def handle_phi(self, node):

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in visit(self, node)
    454         if node is Ellipsis:
    455             node = ast.Ellipsis()
--> 456         result = super(TypeInferer, self).visit(node)
    457         return result

/Users/jakevdp/anaconda/lib/python2.7/ast.pyc in visit(self, node)
    239         method = 'visit_' + node.__class__.__name__
    240         visitor = getattr(self, method, self.generic_visit)
--> 241         return visitor(node)
    243     def generic_visit(self, node):

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in visit_Assign(self, node)
    531         node.inplace_op = getattr(node, 'inplace_op', None)
--> 533         node.value = self.visit(node.value)
    534         if len(node.targets) != 1 or isinstance(node.targets[0], (ast.List,
    535                                                                   ast.Tuple)):

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in visit(self, node)
    454         if node is Ellipsis:
    455             node = ast.Ellipsis()
--> 456         result = super(TypeInferer, self).visit(node)
    457         return result

/Users/jakevdp/anaconda/lib/python2.7/ast.pyc in visit(self, node)
    239         method = 'visit_' + node.__class__.__name__
    240         visitor = getattr(self, method, self.generic_visit)
--> 241         return visitor(node)
    243     def generic_visit(self, node):

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in visit_Call(self, node, visitchildren)
   1330             new_node = self._resolve_external_call(node, func_type,
-> 1331                                                    func, arg_types)
   1333         return new_node

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in _resolve_external_call(self, call_node, func_type, py_func, arg_types)
   1201             # signature = self.function_cache.get_signature(arg_types)
   1202             new_node = self._resolve_return_type(func_type, new_node,
-> 1203                                                  call_node, arg_types)
   1205         if llvm_func is not None:

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer.pyc in _resolve_return_type(self, func_type, new_node, node, argtypes)
   1256         return infer_call.infer_typefunc(self.context, node,
-> 1257                                          func_type, new_node)
   1259     def visit_Call(self, node, visitchildren=True):

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/infer_call.pyc in infer_typefunc(context, call_node, func_type, default_node)
     32         # Try the module type inferers
     33         result_node = module_type_inference.resolve_call_or_none(
---> 34                                     context, call_node, func_type)
     35         if result_node:
     36             return result_node

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/module_type_inference.pyc in resolve_call_or_none(context, call_node, func_type)
    263         # Try the module type inferers
    264         new_node = nodes.call_obj(call_node, None)
--> 265         return resolve_call(context, call_node, new_node, func_type)
    267 def can_handle_deferred(py_func):

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/module_type_inference.pyc in resolve_call(context, call_node, obj_call_node, func_type)
    247     Returns a new AST node that should replace the ast.Call node.
    248     """
--> 249     result = dispatch_on_value(context, call_node, func_type)
    251     if result is not None and not isinstance(result, ast.AST):

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/module_type_inference.pyc in dispatch_on_value(context, call_node, func_type)
    225         method_kwargs.clear()
--> 227     return inferer(*args, **method_kwargs)
    229 def resolve_call(context, call_node, obj_call_node, func_type):

/Users/jakevdp/anaconda/lib/python2.7/site-packages/numba/type_inference/modules/numpymodule.pyc in empty(shape, dtype, order)
    140         return None
--> 142     return typesystem.array(dtype.dtype, ndim)
    144 register_inferer(np, 'empty', empty)

AttributeError: 'NoneType' object has no attribute 'dtype'

Using this simple code, you can generate some absolutely stunning images. The problem is, to go deeper you need very high resolution and very many iterations, both of which are extremely slow in Python!

Your assignment is to make these faster:

Part 1

Speed up this example using either CTypes, F2Py, or Cython. With CTypes/F2Py, you'll have to write the above functionality as a C/Fortran routine, and use the tools above to wrap that functionality.

For Cython, it will require adding appropriate type information to each of the variables used in the computation. Also, check the Cython documentation and make sure to make mandel() a cdef-ed function. Once you're finished, use the cython -a trick to generate annotated code, and make sure there are no yellow-highlighted lines in the core of your algorithm: the loops should look completely clean!

Part 2

Speed up this example using Numba. Numba is much more of a black box, but you should be able to do it very quickly. You'll likely find that simply adding the decorator to the function doesn't work. I'll give you a hint: Numba doesn't like an array specified with dtype=float. Look back to our previous example for clues about how to fix this. (Also, you might find this notebook to be a useful resource: it is all about doing the mandelbrot set with Numba).

Part 3

Use %timeit to compare your three implementations side-by-side for the values passed above.

Part 4

Once you have these working, choose one of the methods and use it to zoom-in on a portion of the above plot. Go to high resolution and a large number of iterations (say 256). Experiment with different matplotlib color schemes (see some of the options here) and produce your most beautiful visualization of the Mandelbrot set.