Pure Python Approach (Standard Libary Only)

To begin, we implement the calcuation in pure Python in order to get our baseline performance.


In [1]:
%load_ext memory_profiler
%load_ext snakeviz
%load_ext cython
import holoviews as hv
hv.extension('bokeh','matplotlib')
from IPython.core import debugger
ist = debugger.set_trace


We load in the position and box information created in the intro notebook. If you haven't run that notebook, this line will not work! (You don't have to read the wall of text, just run the cells...)


In [2]:
import numpy as np
pos = np.loadtxt('data/positions.dat')
box = np.loadtxt('data/box.dat')

#In order to make this test fair and 'pure python', 
#we need to convert the numpy position and box size arrays to python lists.
x,y,z = map(list,pos.T)
lx,ly,lz = box

print('Read {:d} positions.'.format(pos.shape[0]))
print('x min/max: {:+4.2f}/{:+4.2f}'.format(pos.min(0)[0],pos.max(0)[0]))
print('y min/max: {:+4.2f}/{:+4.2f}'.format(pos.min(0)[1],pos.max(0)[1]))
print('z min/max: {:+4.2f}/{:+4.2f}'.format(pos.min(0)[2],pos.max(0)[2]))


Read 5000 positions.
x min/max: -24.99/+24.97
y min/max: -25.00/+24.99
z min/max: -25.00/+24.99

We are going to use a soft potential without a hard-core here because our random generation of positions is going to leave us with overlaps. This potential doesn't diverge at the origin and will essentially represent our coordinate positions as soft blobs. We implement this here in pure python.


In [3]:
from math import cos,pi

def potentialEnergyFunk(r,width=1.0,height=10.0):
    '''
    Calculates the (soft) potential energy between two atoms
    
    Parameters
    ----------
    r: float
        separation distance between two atoms
    height: float
        breadth of the potential i.e. where the potential goes to zero
    width: float
        strength/height of the potential
    '''
    if r<width:
        return 0.5 * height * (1 + cos(pi*r/width))
    else:
        return 0

Let's plot the soft potential using some fancy HoloViews + Bokeh magic!


In [4]:
%%opts Curve [width=600,show_grid=True,height=350]

dr = 0.05          # spacing of r points
rmax = 10.0        # maximum r value
pts = int(rmax/dr) # number of r points
r = [dr*i for i in range(pts)]

def plotFunk(width,height,label='dynamic'):
    
    # Need to wrap potentialEnergyFunk for map call below to behave
    funk = lambda r: potentialEnergyFunk(r,width,height)
    
    U = list(map(funk,r))
    
    return hv.Curve((r,U),kdims=['Separation Distance'],vdims=['Potential Energy'],label=label)
    
dmap = hv.DynamicMap(plotFunk,kdims=['width','height'])
dmap = dmap.redim.range(width=((1.0,10.0)),height=((1.0,5.0)))
dmap*plotFunk(10.0,5.0,label='width: 10., height: 5.')*plotFunk(1.0,1.0,label='width: 1., height: 1.')


Out[4]:

In [5]:
from math import sqrt

def calcTotalEnergy(x,y,z,lx,ly,lz):
    '''
    Parameters
    ----------
    x,y,z: lists of floats
        1-D lists of cartesian coordinate positions
    
    lx,ly,lz: floats
        simulation box dimensions
    '''
    
    #sanity check
    assert len(x) == len(y) == len(z)
    
    # store half box lengths for minimum image convention
    lx2 = lx/2.0
    ly2 = ly/2.0
    lz2 = lz/2.0
    
    U = 0
    
    #The next two loops simply loop over every element in the x, y, and z arrays
    for i,(x1,y1,z1) in enumerate(zip(x,y,z)):
        for j,(x2,y2,z2) in enumerate(zip(x,y,z)):
            
            # We only need to consider each pair once
            if i>=j:
                continue
           
            dx = abs(x1 - x2)
            dy = abs(y1 - y2)
            dz = abs(z1 - z2)
                
            # The next few lines take care of applying the minimum image convention
            # This is neccesary because many/most molecular simulations use periodic boundary conditions
            if dx > lx2:
                dx -= lx
            if dy > ly2:
                dy -= ly
            if dz > lz2:
                dz -= lz
                
            dist = sqrt(dx*dx + dy*dy + dz*dz)
            
            U += potentialEnergyFunk(dist)
    
    return U

In the cell below we are setting up a profiling environment (via the cell magic %%prun) and storing the profiling information in prof/python.prof.

To visualize profiling output: From cmd line, in the same directory as this notebook run (after running the cell):

snakeviz --server prof/python.prof

We then run the calcTotalEnergy function to profile its execution performance.


In [6]:
%%prun -D prof/python.prof
energy = calcTotalEnergy(x,y,z,lx,ly,lz)


 
*** Profile stats marshalled to file 'prof/python.prof'. 

While the execution time is extremely important, equally as important is the memory usuage. While we are lucky enough to work on supercomputers which generally have massive amounts of memory, the memory usage in this calculation can scale as poorly as the execution time. In particular, we will see that while certain approaches run significantly faster than pure Python for these (relatively small) system sizes, they would not be sustainable at scale.

Below we profile the memory usage of the python function, and then write the max memory and increment to a file.


In [7]:
memprof = %memit -o calcTotalEnergy(x,y,z,lx,ly,lz)

usage = memprof.mem_usage[0]
incr = memprof.mem_usage[0] - memprof.baseline
with open('prof/python.memprof','w') as f:
    f.write('{}\n{}\n'.format(usage,incr))


peak memory: 150.44 MiB, increment: 0.00 MiB

In [8]:
with open('energy/python.dat','w') as f:
    f.write('{}\n'.format(energy))

In [ ]:


In [ ]: