I work with images from CCD cameras. We often want to sum up the pixel values within a region surrounding a star or galaxy in order to measure how bright the source is. The simplest way to do this is to draw a circle around the object and sum up all the pixels that fall within the circle. Of course, it can get more complicated when you consider partial pixels, other aperture shapes (ellipses) and associated noise and mask arrays, but for the purpose of this example, we'll keep it simple and just see how fast we can do this simple operation.
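The geometry_py module imported below is not reproduced in this notebook. The following is a minimal pure-Python sketch of what such a function might look like. It is not the module's actual code, and its conventions are assumptions: x is the column coordinate, y is the row coordinate, and a pixel counts as inside when its center (at integer coordinates) lies within r of (x, y). Like the simple version described above, it ignores partial pixels.

```python
import math

import numpy as np


def sum_circle(data, x, y, r):
    """Sum the pixels whose centers lie within distance r of (x, y).

    Hypothetical conventions (not taken from geometry_py): x is the
    column coordinate, y is the row coordinate, and the center of
    pixel data[i, j] sits at (x=j, y=i).
    """
    ny, nx = data.shape
    # Only visit the circle's bounding box, clipped to the array edges
    imin = max(math.floor(y - r), 0)
    imax = min(math.ceil(y + r), ny - 1)
    jmin = max(math.floor(x - r), 0)
    jmax = min(math.ceil(x + r), nx - 1)

    r2 = r * r
    total = 0.0
    for i in range(imin, imax + 1):
        dy = i - y
        for j in range(jmin, jmax + 1):
            dx = j - x
            # Whole-pixel test: include the pixel if its center is inside
            if dx * dx + dy * dy <= r2:
                total += data[i, j]
    return total


# A uniform image makes the result easy to check: it counts the included pixels
ones = np.ones((100, 100))
print(sum_circle(ones, 50.0, 50.0, 10.0))  # 317.0
```

Every pixel is visited by interpreted Python code, which is exactly the kind of loop the faster versions below try to avoid.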
In [1]:
import numpy as np
In [2]:
data = np.random.rand(100, 100)
x = 50.0
y = 50.0
r = 10.0
In [3]:
import geometry_py
In [4]:
geometry_py.sum_circle(data, x, y, r)
Out[4]:
In [5]:
%timeit geometry_py.sum_circle(data, x, y, r)
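The geometry_np module is not shown in the notebook. One plausible vectorized approach, sketched here as an assumption rather than the module's actual code, builds a boolean mask with NumPy broadcasting so the per-pixel arithmetic runs in compiled code instead of the interpreter. It assumes x is the column coordinate, y is the row coordinate, and a pixel counts when its center (at integer coordinates) lies within r of (x, y).

```python
import numpy as np


def sum_circle(data, x, y, r):
    """Vectorized sum of the pixels whose centers lie within r of (x, y).

    Hypothetical sketch: x is the column coordinate, y the row coordinate.
    """
    ny, nx = data.shape
    # ogrid returns a (ny, 1) column of row indices and a (1, nx) row of
    # column indices; broadcasting them yields the full (ny, nx) distance map
    i, j = np.ogrid[:ny, :nx]
    inside = (j - x) ** 2 + (i - y) ** 2 <= r * r
    # Boolean indexing selects the pixels inside the aperture
    return data[inside].sum()


ones = np.ones((100, 100))
print(sum_circle(ones, 50.0, 50.0, 10.0))  # 317.0
```

Note the trade-off: this version computes a distance for every pixel in the image, not just the ones near the aperture. For a small aperture on a large image, slicing out the circle's bounding box first would reduce that work.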
In [6]:
import geometry_np
In [7]:
geometry_np.sum_circle(data, x, y, r)
Out[7]:
In [8]:
%timeit geometry_np.sum_circle(data, x, y, r)
Cython is (very nearly) a superset of the Python language: almost any valid Python code is also valid Cython code. Cython adds its own syntax, such as static C type declarations, which gives the Cython compiler the extra information it needs to auto-generate efficient C code from your Cython source.
In [9]:
import geometry_cy
In [10]:
geometry_cy.sum_circle(data, x, y, r)
Out[10]:
In [11]:
%timeit geometry_cy.sum_circle(data, x, y, r)
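%timeit is an IPython magic, so it is not available in a plain .py script. The standard-library timeit module can make similar measurements. In this sketch, loop_sum and numpy_sum are hypothetical stand-ins rather than the geometry modules. They are chosen to show the same gap between an interpreted loop and compiled code, and actual timings depend on your machine.

```python
import timeit
from functools import partial

import numpy as np

data = np.random.rand(100, 100)


def loop_sum(a):
    # Pure-Python double loop: every addition runs through the interpreter
    total = 0.0
    for row in a:
        for value in row:
            total += value
    return total


def numpy_sum(a):
    # The same reduction, performed in compiled code by NumPy
    return a.sum()


NUMBER = 20  # calls per trial
results = {}
for func in (loop_sum, numpy_sum):
    # repeat() returns the total seconds for each of `repeat` trials;
    # the fastest trial is the least noisy estimate of the true cost
    trials = timeit.repeat(partial(func, data), number=NUMBER, repeat=5)
    results[func.__name__] = min(trials) / NUMBER
    print(f"{func.__name__}: {results[func.__name__] * 1e6:.1f} us per call")
```

Taking the minimum follows the advice in the timeit documentation: slower trials mostly reflect interference from other processes, not the code being measured.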
Have you ever wondered what Python is made from? Did it crawl out of the primordial ooze of assembly code directly into the beautiful high-level language that we love to use? Nope! In fact, the language itself is abstract: a set of rules governing syntax and behavior, plus a set of core libraries and concepts defined for each language version (e.g. 2.7 or 3.4). To actually use the language, someone has to implement Python according to those rules.
What we usually think of as Python is actually a C-based implementation of the language, called CPython. There are other implementations of Python: Jython (JVM-based), IronPython (.NET-based), and PyPy (written in RPython), but CPython is by far the most widely used.
CPython exposes its internals to C code through the Python/C API. One of the great things about Python as an open-source language is that the full API is freely available. In other words, you have access to exactly the same tools that the Python core developers do! In theory, nothing stops you from implementing your own version of Python! (Keep in mind, however, that dozens of core developers plus thousands of brilliant contributors have worked on CPython over 23 years... so you might want to hold off on starting over from scratch.)
This is where the idea of extending Python comes from. Using the C/API (or higher-level toolkits that handle some of its trickier parts for you), anybody can build their own C-extension that can then be imported and used just like any other Python library. In this demo, we'll very briefly touch on a couple of tools that let us extend Python in this manner.
As indicated above, you can learn the C/API yourself and write whatever C code you need to produce Python modules. The main advantage is that you have full access to the API, with no abstraction layers or "middle-men" between you and the C code. This direct access is a double-edged sword, however: you are responsible for everything the C/API requires of you. One of the worst parts is Python's memory management via reference counting, which you must maintain by hand with Py_INCREF and Py_DECREF. It is a huge pain to get right and very difficult to debug. A missing decrement leaks memory, while an extra one can free an object that is still in use and crash the interpreter.
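You can watch CPython's reference counting from the Python side with sys.getrefcount. This is a CPython implementation detail, and the absolute number may include a temporary reference held for the call's own argument, so this minimal sketch only looks at how the count changes. Each change corresponds to the kind of increment or decrement (Py_INCREF / Py_DECREF) that hand-written extension code must balance itself.

```python
import sys

# A fresh list object; small ints and None are poor choices because they are
# shared (and immortal in recent CPython), so their counts are not informative
obj = []

base = sys.getrefcount(obj)          # may include a temporary argument reference
alias = obj                          # a second name bound to the same object
after_alias = sys.getrefcount(obj)
container = [obj]                    # a list slot also holds a reference
after_container = sys.getrefcount(obj)
del alias, container                 # dropping references decrements the count
after_del = sys.getrefcount(obj)

print(after_alias - base, after_container - base, after_del - base)  # 1 2 0
```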
In [ ]:
#import geometry_c
In [ ]:
#geometry_c.sum_circle(data, x, y, r)
Cython handles the wrapping step by auto-generating C code from your Cython input file. This makes Cython relatively easy to start using: all you have to do is rename a .py file to .pyx and build the extension with distutils (or, on Python 3.12 and later, where distutils has been removed from the standard library, with setuptools).
It's not all cake and rainbows, however. Because all of the wrapping is auto-generated for you, you don't necessarily know what is going on in that step. As a result, the first time you use Cython you will likely not get the performance gains you were expecting. Code without C type information still goes through Python objects and the C/API, with frequent and costly switching between C and Python. Fortunately, there are tools to help you debug this and learn as you go, such as Cython's annotation mode (cython -a), which produces an HTML report highlighting the lines that interact with the Python runtime.
Becoming a truly proficient Cython programmer probably involves learning about as many new rules and as much new syntax as learning the C/API does, but for simple C-extensions Cython is probably your best bet for a quick start.
In [ ]:
!cat geometry_cy.c
Extending Python is a great solution when you don't want to give up the convenience, flexibility, and feature-richness of the Python environment but need to leverage some C capabilities as well. Some of the more common use-cases for extending Python include:
I recommend snakeviz for viewing profiling results produced by the standard-library Python profiler, cProfile.
To install:
pip install snakeviz
To profile, create a test script such as my_program.py and run something like:
python -m cProfile -o program.prof my_program.py
To view the profile:
snakeviz program.prof
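The command-line steps above can also be done from inside Python with the standard-library cProfile and pstats modules, which is handy when you only want to profile one part of a program. In this minimal sketch, slow_function is a hypothetical stand-in for your own code. The saved program.prof file uses the same format as the -o option, so snakeviz can open it.

```python
import cProfile
import io
import pstats


def slow_function(n):
    # Hypothetical workload: a pure-Python loop for the profiler to measure
    total = 0
    for k in range(n):
        total += k * k
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_function(200_000)
profiler.disable()

# Save in the same format as `python -m cProfile -o program.prof ...`
profiler.dump_stats("program.prof")

# Print the five entries with the largest cumulative time
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)
print(stream.getvalue())
```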
In [ ]:
%load_ext snakeviz
In [ ]:
%%snakeviz
geometry_np.sum_circle(data, x, y, r)
Other options that we won't be discussing in any detail today include:
It is important to note that all of the extension options (Cython, Boost.Python, SWIG, etc.) are built on the same C/API, but much effort has gone into simplifying the process of creating a C-extension. Many of these tools take parts of the "wrapping" process (e.g. reference counting and other error-prone C details) out of the user's hands and automate them instead.