Introduction

While Python is many things, it is generally not thought of as a performance-oriented language: we trade execution speed for ease of programming and rapid development. Despite this limitation, Python has largely become the standard programming language of science and has given rise to an immense array of toolsets and libraries. Unfortunately, we as scientists want it all. How can we leverage the ease of Python programming while maintaining performance?

There are a few primary approaches for making our Python code faster:

  • algorithmic improvements
  • profiling
  • performance libraries
  • parallelization
  • hardware changes (cpus/gpus/coprocessors, network/cluster improvements)

Improvements to the algorithmic approach, combined with profiling, should always be a scientific programmer's first tools for improving code execution times. If you can improve your code without introducing extra dependencies, you've achieved a win for the day. We will demonstrate the importance of profiling in the following tutorials.
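
As a quick preview, the standard library already ships with cProfile for function-level profiling. Below is a minimal sketch; the slow_sum workload is a stand-in for illustration, not part of the tutorial's actual code:

import cProfile

def slow_sum(n):
    """Stand-in workload: sum of squares via an explicit Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

# Run the function under the profiler and print stats sorted by cumulative time
cProfile.run('slow_sum(1_000_000)', sort='cumulative')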

Unfortunately, even with determined profiling, there is only so much we can do with vanilla Python. We are fortunate that the Python gurus of the world have been working hard to give us a number of libraries which, at one level or another, seek to make Python more performant:

  • Numpy
  • Numba
  • Cython
  • Dask

Each of these libraries has different use-cases and idioms, but all of them can be used to speed up Python code. We will explore the use of these libraries in a real, but relatively simple, example.
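
To give a flavor of the kind of speedup at stake, here is a small sketch contrasting a pure-Python loop with its NumPy-vectorized equivalent. The sum-of-squares workload is illustrative only; in a notebook, the %timeit magic makes the actual timing comparison easy:

import numpy as np

x = np.random.random(1_000_000)

def python_sum_of_squares(arr):
    # Interpreted loop: one bytecode dispatch per element
    total = 0.0
    for v in arr:
        total += v * v
    return total

# Vectorized equivalent: the loop runs in compiled C inside NumPy
numpy_result = np.dot(x, x)

assert np.isclose(python_sum_of_squares(x), numpy_result)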

Disclaimer 1 There is another approach to speeding up Python which we will not explore here: replacing the standard Python interpreter itself with a speedier version. There are a few notable examples of this (PyPy, Pyston), but we will omit these approaches, as maintaining multiple Python interpreters is likely outside the scope of the average scientist's interests.

Disclaimer 2 Performance comparisons via microbenchmarks are always inherently biased. A Java programmer comparing Java and C++ approaches is unlikely to write good, idiomatic, and performant C++ code, yet they will ultimately conclude that Java is better because their benchmarks say so. We have done our best below to program idiomatically in each chosen toolset, but these are unlikely to be perfect examples. Reach conclusions at your own risk.

One of my perennial issues with programming tutorials is that the examples used to demo a feature or codebase are often far removed from any real use-case. This makes it difficult to transfer the knowledge from the demo to your own workflow. The classic case is the ubiquitous "add" function example.

Here, I intend to use an example that is extremely relevant to me: the potential energy calculation of a molecular simulation. In the majority of molecular simulations, this calculation is easily identifiable as the primary performance bottleneck: implemented naively, it scales as $O(n^2)$ because the potential energy is a function of the pair distances between all of the atoms. A sketch of this naive approach follows.
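
To make the structure of that calculation concrete, here is a rough sketch of the naive double loop. The Lennard-Jones potential and the eps/sig parameters are assumptions for illustration, not necessarily the pair potential used in the tutorials below:

import numpy as np

def potential_energy(pos, eps=1.0, sig=1.0):
    """Naive O(n^2) pair-potential energy (Lennard-Jones shown)."""
    num = pos.shape[0]
    U = 0.0
    for i in range(num):
        for j in range(i + 1, num):  # visit each pair exactly once
            r = np.sqrt(np.sum((pos[i] - pos[j]) ** 2))
            U += 4.0 * eps * ((sig / r) ** 12 - (sig / r) ** 6)
    return U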

We start each individual notebook by loading some notebook-wide features. memory_profiler and snakeviz are profiling tools we will use to analyze our code. Bokeh and HoloViews are plotting libraries. The debugger was used in the development of this notebook; hopefully we won't need it here.


In [1]:
%load_ext memory_profiler  # enables the %memit magic for memory measurements
%load_ext snakeviz         # enables the %snakeviz magic for visualizing profiles
%load_ext cython           # enables %%cython cells for the Cython examples
import holoviews as hv
hv.extension('bokeh','matplotlib')
from IPython.core import debugger
ist = debugger.set_trace   # shorthand for dropping into the debugger
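
For reference, once loaded these extensions expose line magics that wrap a single statement. The toy expressions below are placeholders sketching the invocation syntax, not part of this tutorial's workflow:

%memit sum(i * i for i in range(10**6))     # report peak memory of one statement
%snakeviz sum(i * i for i in range(10**6))  # profile and open an interactive view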


We then generate arrays of position and simulation-box information. Rather than using 'real' data, we will just randomly generate positions. You should see below that the positions array is simply an N × 3 array, where the three columns correspond to the x, y, and z coordinates.

Warning Depending on the hardware you are using, you may want to scale down the num variable. This value was chosen for my specific laptop but may not be suitable for your machine.


In [2]:
import numpy as np

num = 5000
lmin = -25 #lower simulation box bound in x, y, and z
lmax = +25 #upper simulation box bound in x, y, and z

L = lmax - lmin
box = np.array([L,L,L])
pos = lmin + np.random.random((num,3))*(lmax-lmin)

print('Positions Array Shape:',pos.shape)
print(pos)
print('x min/max: {:+4.2f}/{:+4.2f}'.format(pos.min(0)[0],pos.max(0)[0]))
print('y min/max: {:+4.2f}/{:+4.2f}'.format(pos.min(0)[1],pos.max(0)[1]))
print('z min/max: {:+4.2f}/{:+4.2f}'.format(pos.min(0)[2],pos.max(0)[2]))


Positions Array Shape: (5000, 3)
[[-24.63101548  -7.30981693   5.15194202]
 [ -2.99617462  16.20578863   5.24900732]
 [ -7.48051512  -5.13482695   3.77152153]
 ..., 
 [ 17.36684967   9.21279082  15.76759257]
 [ -7.49335352   7.86669909  -5.18779992]
 [ 16.98217212   7.04702386   0.54704187]]
x min/max: -24.99/+25.00
y min/max: -25.00/+24.98
z min/max: -25.00/+24.99

We can plot the positions to get a visual picture of our dataset.


In [3]:
%%output backend='matplotlib'
hv.Scatter3D(pos)


Out[3]: [3D scatter plot of the randomly generated positions filling the simulation box]
Finally, we write the positions and box dimensions to files so that all of the following examples can calculate on the same data set.


In [4]:
np.savetxt('data/positions.dat',pos)
np.savetxt('data/box.dat',box)
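
For reference, the subsequent notebooks can read these files back in with the matching np.loadtxt calls (a sketch; the actual loading cells appear in those notebooks):

import numpy as np

pos = np.loadtxt('data/positions.dat')  # (num, 3) array of x, y, z coordinates
box = np.loadtxt('data/box.dat')        # (3,) array of box edge lengths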
