Nesterov Gradient Descent in OpenMM

Nesterov's accelerated gradient descent is an an "optimal" first minimizer, in that it has $1/k^2$ convergence after $k$ steps, whereas standard gradient descent, even with fancy step sizes has $1/k$. The convergence rate of $1/k^2$ is optimal for first-order methods. Here's a nice blog post about the algorithm.


In [1]:
import os
import time
from pprint import pformat
import simtk.openmm as mm
from simtk.openmm import app
from simtk.unit import *

In [2]:
# set up a generic Minimizer that runs a minimization and
# gives a nice report with timing information.

class BaseMinimizer(object):
    def __init__(self):
        # should set self.context (in the subclass)
        pass
    
    def minimize(self):
        # should do the minimization (in the subclass)
        pass

    def benchmark(self):
        initialEnergy = self.context.getState(getEnergy=True)\
            .getPotentialEnergy().value_in_unit(kilojoule_per_mole)
        startTime = time.time()
        self.minimize()
        endTime = time.time()
        finalEnergy = self.context.getState(getEnergy=True)\
            .getPotentialEnergy().value_in_unit(kilojoule_per_mole)

        reportLines = [
            '{name} with {platform} platform and {numParticles:d} particles',
            '  ({details})',
            '  initial energy = {initial:>{width}.4f} kJ/mol',
            '  final energy   = {final:>{width}.4f} kJ/mol',
            '  elapsed time   = {time:.4f} s',
            '',
        ]
        
        platform = self.context.getPlatform()
        properties = {k: platform.getPropertyValue(self.context, k) for k in platform.getPropertyNames()}
        deviceNames = [v for k, v in properties.items() if 'DeviceName' in k]
        
        report = os.linesep.join(reportLines).format(
            name = self.__class__.__name__, width=12,
            details=('device = ' + deviceNames[0] if len(deviceNames) > 0 else ''),
            numParticles=self.context.getSystem().getNumParticles(),
            platform=self.context.getPlatform().getName(),
            initial=initialEnergy, final=finalEnergy, time=(endTime-startTime))
        return report

We start with the minimizer that's build into OpenMM, which uses L-BFGS


In [3]:
class BFGSMinimizer(BaseMinimizer):
    def __init__(self, system, initialPositions, tolerance=1*kilojoules_per_mole, maxIterations=0):
        self.context = mm.Context(system, mm.VerletIntegrator(0))
        self.context.setPositions(initialPositions)
        self.tolerance = tolerance
        self.maxIterations = maxIterations

    def minimize(self):
        mm.LocalEnergyMinimizer.minimize(self.context, self.tolerance, self.maxIterations)

Now, let's try to implement a different minimizer with Nesterov's accelerated gradient descent. OpenMM's custom integrators are really flexible (!), so we can actually implement the method entirely in python and OpenMM will create GPU kernels to execute it for us.


In [4]:
class NesterovMinimizer(BaseMinimizer):
    """Local energy minimzation with Nesterov's accelerated gradient descent
    
    Parameters
    ----------
    system : mm.System
        The OpenMM system to minimize
    initialPositions : 2d array
        The positions to start from
    numIterations : int
        The number of iterations of minimization to run
    stepSize : int
        The step size. This isn't in time units.
    """
    def __init__(self, system, initialPositions, numIterations=1000, stepSize=1e-6):
        self.numIterations = numIterations

        integrator = mm.CustomIntegrator(stepSize)
        integrator.addGlobalVariable('a_cur', 0)
        integrator.addGlobalVariable('a_old', 0)
        integrator.addPerDofVariable('y_cur', 0)
        integrator.addPerDofVariable('y_old', 0)
        integrator.addComputeGlobal('a_cur', '0.5*(1+sqrt(1+(4*a_old*a_old)))')
        integrator.addComputeGlobal('a_old', 'a_cur')
        integrator.addComputePerDof('y_cur', 'x + dt*f')
        integrator.addComputePerDof('y_old', 'y_cur')
        integrator.addComputePerDof('x', 'y_cur + (a_old - 1) / a_cur * (y_cur - y_old)')

        self.context = mm.Context(system, integrator)
        self.context.setPositions(initialPositions)

    def minimize(self):
        self.context.getIntegrator().step(self.numIterations)

Let's get some files from the RCSB to test these minimizers.


In [5]:
import gzip
from io import BytesIO
from six.moves.urllib.request import urlopen
from pdbfixer.pdbfixer import PDBFixer
from simtk.openmm.app.internal.pdbstructure import PdbStructure

def downloadFromRCSB(PDB_ID):
    "Download a PDB from the RCSB, run PDBFixer, and return a topology and positions"
    url = "ftp://ftp.wwpdb.org/pub/pdb/data/structures/all/pdb/pdb"+PDB_ID.lower()+".ent.gz"
    response = urlopen(url)
    content = gzip.GzipFile(fileobj=BytesIO(response.read())).read()
    pdb = PdbStructure(content.decode().splitlines())
    fixer = PDBFixer(pdb)
    fixer.findMissingResidues()
    fixer.findNonstandardResidues()
    fixer.replaceNonstandardResidues()
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    fixer.removeHeterogens(False)
    fixer.addMissingHydrogens(7.0)

    return fixer.topology, fixer.positions

I guess it's time to test these out...


In [6]:
for code in ['2EQQ', '2POR']:
    topology, positions = downloadFromRCSB(code)
    system = app.ForceField('amber99sbildn.xml').createSystem(topology)

    print('PDB_ID {}:'.format(code))
    print(NesterovMinimizer(system, positions, stepSize=1e-6, numIterations=500).benchmark())
    print(BFGSMinimizer(system, positions, maxIterations=500).benchmark())
    print('')


PDB_ID 2EQQ:
NesterovMinimizer with CUDA platform and 423 particles
  (device = GeForce GTX 660)
  initial energy =    -353.2573 kJ/mol
  final energy   =   -2891.6297 kJ/mol
  elapsed time   = 0.0181 s

BFGSMinimizer with CUDA platform and 423 particles
  (device = GeForce GTX 660)
  initial energy =    -353.2573 kJ/mol
  final energy   =   -3328.7600 kJ/mol
  elapsed time   = 0.4011 s


PDB_ID 2POR:
NesterovMinimizer with CUDA platform and 4291 particles
  (device = GeForce GTX 660)
  initial energy =  167722.4771 kJ/mol
  final energy   =   16021.9786 kJ/mol
  elapsed time   = 0.2871 s

BFGSMinimizer with CUDA platform and 4291 particles
  (device = GeForce GTX 660)
  initial energy =  167722.4771 kJ/mol
  final energy   =    5618.0848 kJ/mol
  elapsed time   = 1.1622 s



In [6]: