Numpy

The best part about Numpy is that, not only do we get massive speedups because numpy can perform many of its operations at the C-level, the vectorized api makes the code simpler and (to some extent more pythonic). The only "downside" is that we have to learn to write our code using Numpy idioms rather than Python idioms.


In [ ]:
%load_ext memory_profiler
%load_ext snakeviz
%load_ext cython
import holoviews as hv
hv.extension('bokeh','matplotlib')
from IPython.core import debugger
ist = debugger.set_trace

We load in the position and box information created in the intro notebook. If you haven't run that notebook, this line will not work! (You don't have to read the wall of text, just run the cells...)


In [ ]:
import numpy as np
pos = np.loadtxt('data/positions.dat')
box = np.loadtxt('data/box.dat')

print('Read {:d} positions.'.format(pos.shape[0]))
print('x min/max: {:+4.2f}/{:+4.2f}'.format(pos.min(0)[0],pos.max(0)[0]))
print('y min/max: {:+4.2f}/{:+4.2f}'.format(pos.min(0)[1],pos.max(0)[1]))
print('z min/max: {:+4.2f}/{:+4.2f}'.format(pos.min(0)[2],pos.max(0)[2]))

Round 1: Vectorized Operations

We need to re-implement the potential energy function in numpy.


In [ ]:
import numpy as np

def potentialEnergyFunk(r,width=1.0,height=10.0):
    '''
    Calculates the (soft) potential energy between two atoms
    
    Parameters
    ----------
    r: ndarray (float)
        separation distances between two atoms
    height: float
        breadth of the potential i.e. where the potential goes to zero
    width: float
        strength/height of the potential
    '''
    U = np.zeros_like(r)
    mask = (r<width) #only do calculation below the cutoff width
    U[mask] = 0.5 * height * (1 + np.cos(np.pi*r[mask]/width))
    return U

We can plot the potential energy again just to make sure this function behaves as expected.


In [ ]:
%%opts Curve [width=600,show_grid=True,height=350]

dr = 0.05          # spacing of r points
rmax = 10.0        # maximum r value
pts = int(rmax/dr) # number of r points
r = np.arange(dr,rmax,dr)

def plotFunk(width,height,label='dynamic'):
    U = potentialEnergyFunk(r,width,height)
    return hv.Curve((r,U),kdims=['Separation Distance'],vdims=['Potential Energy'],label=label)
    
dmap = hv.DynamicMap(plotFunk,kdims=['width','height'])
dmap = dmap.redim.range(width=((1.0,10.0)),height=((1.0,5.0)))
dmap*plotFunk(10.0,5.0,label='width: 10., height: 5.')*plotFunk(1.0,1.0,label='width: 1., height: 1.')

In [ ]:
from math import sqrt

def calcTotalEnergy1(pos,box):
    '''
    Parameters
    ----------
    pos: ndarray, size (N,3), (float)
        array of cartesian coordinate positions
    
    box: ndarray, size (3), (float)
        simulation box dimensions
    '''
    
    #sanity check
    assert box.shape[0] == 3
    
    # This next line is rather unpythonic but essentially it convinces
    # numpy to perform a subtraction between the full Cartesian Product
    # of the positions array
    dr = np.abs(pos - pos[:,np.newaxis,:])
    
    #still need to apply periodic boundary conditions
    dr = np.where(dr>box/2.0,dr-box,dr)
        
    dist = np.sqrt(np.sum(np.square(dr),axis=-1))
    
    # calculate the full N x N distance matrix
    U = potentialEnergyFunk(dist)

    # extract the upper triangle from U
    U = np.triu(U,k=1) 
    
    return U.sum()

Runtime profiling!


In [ ]:
%%prun -D prof/numpy1.prof
energy = calcTotalEnergy1(pos,box)

In [ ]:
with open('energy/numpy1.dat','w') as f:
    f.write('{}\n'.format(energy))

Memory profiling!


In [ ]:
memprof = %memit -o calcTotalEnergy1(pos,box)

usage = memprof.mem_usage[0]
incr = memprof.mem_usage[0] - memprof.baseline
with open('prof/numpy1.memprof','w') as f:
    f.write('{}\n{}\n'.format(usage,incr))

Round 2: Less is More

This is good, but can we do better? With this implementation, we are actually calculating twice as potential energies as we need to! Let's reimplement the above to see if we can speed up this function (and possible reduce the memory usage).


In [ ]:
from math import sqrt

def calcTotalEnergy2(pos,box):
    '''
    Parameters
    ----------
    pos: ndarray, size (N,3), (float)
        array of cartesian coordinate positions
    
    box: ndarray, size (3), (float)
        simulation box dimensions
    '''
    
    #sanity check
    assert box.shape[0] == 3
    
    # This next line is rather unpythonic but essentially it convinces
    # numpy to perform a subtraction between the full Cartesian Product
    # of the positions array
    dr = np.abs(pos - pos[:,np.newaxis,:])
    
    #extract out upper triangle
    dr = dr[np.triu_indices(dr.shape[0],k=1)]  #<<<<<<<
    
    #still need to apply periodic boundary conditions
    dr = np.where(dr>box/2.0,dr-box,dr)
        
    dist = np.sqrt(np.sum(np.square(dr),axis=-1))
    
    # calculate the full N x N distance matrix
    U = potentialEnergyFunk(dist)
    
    return U.sum()

In [ ]:
%%prun -D prof/numpy2.prof
energy = calcTotalEnergy2(pos,box)

In [ ]:
with open('energy/numpy2.dat','w') as f:
    f.write('{}\n'.format(energy))

Memory profiling!


In [ ]:
memprof = %memit -o calcTotalEnergy2(pos,box)

usage = memprof.mem_usage[0]
incr = memprof.mem_usage[0] - memprof.baseline
with open('prof/numpy2.memprof','w') as f:
    f.write('{}\n{}\n'.format(usage,incr))

In [ ]: