Replacing ligand parameters in an already-parametrized system

This example applies SMIRNOFF-format parameters to a BRD4 inhibitor from the living review on binding free energy benchmark systems by Mobley and Gilson. The BRD4 system comes from the accompanying GitHub repository.

This example uses ParmEd to take a protein-ligand system parameterized with an alternate force field, and replace the force field used for the ligand with an OpenFF force field. This example is meant to illustrate how to apply parameters to a single ligand, but it's also easy to process many ligands.

Loading the already-parametrized system


In [1]:
# Retrieve protein and ligand files for BRD4 and a docked inhibitor from the benchmark systems GitHub repository
# https://github.com/MobleyLab/benchmarksets
import requests
repo_url = 'https://raw.githubusercontent.com/MobleyLab/benchmarksets/master/input_files/'
sources = {
    'system.prmtop' : repo_url + 'BRD4/prmtop-coords/BRD4-1.prmtop',
    'system.crd'   : repo_url + 'BRD4/prmtop-coords/BRD4-1.crds',
    'ligand.sdf'   : repo_url + 'BRD4/sdf/ligand-1.sdf',
    'ligand.pdb'   : repo_url + 'BRD4/pdb/ligand-1.pdb'
}
for (filename, url) in sources.items():
    r = requests.get(url)
    open(filename, 'w').write(r.text)

In [2]:
#Read AMBER to ParmEd Structure object
import parmed
in_prmtop = 'system.prmtop'
in_crd = 'system.crd'
orig_structure = parmed.amber.AmberParm(in_prmtop, in_crd)

Let's inspect the unique molecules in the system


In [3]:
pieces = orig_structure.split()
for piece in pieces:
    print(f"There are {len(piece[1])} instance(s) of {piece[0]}")


There are 1 instance(s) of <AmberParm 2035 atoms; 121 residues; 2064 bonds; PBC (orthogonal); parametrized>
There are 1 instance(s) of <AmberParm 26 atoms; 1 residues; 28 bonds; PBC (orthogonal); parametrized>
There are 32 instance(s) of <AmberParm 1 atoms; 1 residues; 0 bonds; PBC (orthogonal); NOT parametrized>
There are 35 instance(s) of <AmberParm 1 atoms; 1 residues; 0 bonds; PBC (orthogonal); NOT parametrized>
There are 11000 instance(s) of <AmberParm 3 atoms; 1 residues; 3 bonds; PBC (orthogonal); parametrized>
  • The first molecule species has 2035 atoms, so it's probably the protein
  • The second molecule species has 26 atoms, which is the size of our ligand
  • The third and fourth molecule species have 32 and 35 copies, respectively, and one atom each. They are probably counterions
  • The fifth molecule species has 11,000 copies with three atoms each, so these are our waters.

We could drill into the ParmEd objects to find more about these if needed.

It's important to note that pieces[1] is the parameterized ligand, as we will be replacing it further down in this example. If you apply this notebook to a system with a different number of components, or with objects in a different order, you may need to change some of the code below accordingly.

Generating an Open Force Field Topology for the ligand

Here we assume a complicated scenario -- We have a SDF of our ligand available (ligand.sdf), containing bond orders and enough detail about the molecule for us to parameterize the ligand. However, this SDF does not necessarily have the same atom indexing or coordinates as the original ligand in system.prmtop and system.crd. If we mix up the ligand atom indices and try to use the original ligand coordinates, the ligand's initial geometry will be nonsense. So, we've also got a copy of the ligand as ligand.pdb (which we could have extracted from a dump of our system to PDB format, if desired), and we're going to use that as a reference to get the atom indexing right.

This example will use the simtk.openmm.app.PDBFile class to read ligand.pdb and then use Topology.from_openmm to create an OpenFF Topology that contains the ligand in the correct atom ordering.

If you know that this indexing mismatch will never occur for your data sources, and that your SDFs always contain the correct ordering, you can skip this step by simply running ligand_off_topology = ligand_off_molecule.to_topology()


In [4]:
from openforcefield.topology import Molecule, Topology
from simtk.openmm.app import PDBFile

ligand_off_molecule = Molecule('ligand.sdf')
ligand_pdbfile = PDBFile('ligand.pdb')
ligand_off_topology = Topology.from_openmm(ligand_pdbfile.topology, 
                                           unique_molecules=[ligand_off_molecule])

Parametrizing the ligand

Note: Even though we plan to constrain bond lengths to hydrogen, we load "openff_unconstrained-1.0.0.offxml". This is because our workflow will involve loading the OFF-parametrized ligand using ParmEd, which applies its own hydrogen bonds at a later time, and will fail if it attempts to maniuplate an OpenMM system that already contains them.

Here we begin by loading a SMIRNOFF force field -- in this case, the OpenFF-1.0 force field, "Parsley".

Once loaded, we create a new OpenMM system containing the ligand, then use ParmEd to create a Structure from that system. We'll re-combine this Structure object with those for the protein, ions, etc. later.


In [5]:
# Load the SMIRNOFF-format Parsley force field
from openforcefield.typing.engines.smirnoff import ForceField
force_field = ForceField('openff_unconstrained-1.0.0.offxml')

ligand_system = force_field.create_openmm_system(ligand_off_topology)
new_ligand_structure = parmed.openmm.load_topology(ligand_off_topology.to_openmm(),
                                                   ligand_system,
                                                   xyz=pieces[1][0].positions)

It's possible to save out ligand parameters at this point, if desired; here we do so to AMBER and GROMACS format just for inspection.


In [6]:
new_ligand_structure.save('tmp.prmtop', overwrite=True)
new_ligand_structure.save('tmp.inpcrd', overwrite=True)
new_ligand_structure.save('tmp.gro', overwrite=True)
new_ligand_structure.save('tmp.top', overwrite=True)

Check for discrepancies between the original ligand and its replacement

Here we check that the number of atoms are the same, and the same elements occur in the same order. This will catch many (but not all) errors where someone provided an SDF file for a different ligand than the one present in the system. It will miss errors where they happen to provide a different ligand with the same number of atoms, the same elements, in the same order -- which is unlikely to happen, but not impossible.


In [7]:
# Check how many atoms and which order elements are in the new ligand
n_atoms_new = len(new_ligand_structure.atoms)
elements_new = [atom.element for atom in new_ligand_structure.atoms]

# Check how many atoms and which order elements are in the old ligand
old_ligand_structure, n_copies = pieces[1]
n_atoms_old = len(old_ligand_structure.atoms)
elements_old = [atom.element for atom in old_ligand_structure.atoms]

print(f"There are {n_atoms_old} in the old ligand structure and {n_atoms_new} atoms "
      f"in the new ligand structure")

# Print out error message if number of atoms doesn't match
if n_atoms_new != n_atoms_old:
    print("Error: Number of atoms in input ligand doesn't match number extracted "
          "from prmtop file.")

if elements_new != elements_old:
    print("Error: Elements in input ligand don't match elements in the ligand "
          "from the prmtop file.")
    print(f"Old elements: {elements_old}")
    print(f"New elements: {elements_new}")


There are 26 in the old ligand structure and 26 atoms in the new ligand structure

That looks OK -- we're seeing a consistent number of atoms in both structures, and no errors about inconsistent elements. That means we're OK to proceed and start combining our ParmEd Structure objects.

Combine receptor and ligand structures

Now, we make a new ParmEd Structure for the complex, and begin adding the pieces of our system back together. Recall that above, we used ParmEd to split different portions of the system into a list of tuples called pieces, where the list items are tuples consisting of (Structure, N) where N denotes the number of times that piece occurs. We have just one protein, for example, but many water molecules.

Here, we begin by combining our original protein with our new ligand.

We also print out a lot of info as we do so just to check that we're ending up with the number of atom types we expect.


In [8]:
# Create a new, empty system
complex_structure = parmed.Structure()

# Add the protein
complex_structure += pieces[0][0]

print("BEFORE SYSTEM COMBINATION (just protein)")
print("Unique atom names:", sorted(list(set([atom.atom_type.name for atom in complex_structure]))))
print("Number of unique atom types:", len(set([atom.atom_type for atom in complex_structure])))
print("Number of unique epsilons:", len(set([atom.epsilon for atom in complex_structure])))
print("Number of unique sigmas:", len(set([atom.sigma for atom in complex_structure])))
print()

print("BEFORE SYSTEM COMBINATION (just ligand)")
print("Unique atom names:", sorted(list(set([atom.atom_type.name for atom in new_ligand_structure]))))
print("Number of unique atom types:", len(set([atom.atom_type for atom in new_ligand_structure])))
print("Number of unique epsilons:", len(set([atom.epsilon for atom in new_ligand_structure])))
print("Number of unique sigmas:", len(set([atom.sigma for atom in new_ligand_structure])))
print()

# Add the ligand
complex_structure += new_ligand_structure

print("AFTER LIGAND ADDITION (protein+ligand)")
print("Unique atom names:", sorted(list(set([atom.atom_type.name for atom in complex_structure]))))
print("Number of unique atom types:", len(set([atom.atom_type for atom in complex_structure])))
print("Number of unique epsilons:", len(set([atom.epsilon for atom in complex_structure])))
print("Number of unique sigmas:", len(set([atom.sigma for atom in complex_structure])))


BEFORE SYSTEM COMBINATION (just protein)
Unique atom names: ['2C', '3C', 'C', 'C*', 'C8', 'CA', 'CB', 'CC', 'CN', 'CO', 'CR', 'CT', 'CW', 'CX', 'H', 'H1', 'H4', 'H5', 'HA', 'HC', 'HO', 'HP', 'HS', 'N', 'N2', 'N3', 'NA', 'NB', 'O', 'O2', 'OH', 'S', 'SH']
Number of unique atom types: 33
Number of unique epsilons: 14
Number of unique sigmas: 14

BEFORE SYSTEM COMBINATION (just ligand)
Unique atom names: ['C1', 'C2', 'H1', 'H2', 'H3', 'N1']
Number of unique atom types: 6
Number of unique epsilons: 5
Number of unique sigmas: 5

AFTER LIGAND ADDITION (protein+ligand)
Unique atom names: ['2C', '3C', 'C', 'C*', 'C1', 'C2', 'C8', 'CA', 'CB', 'CC', 'CN', 'CO', 'CR', 'CT', 'CW', 'CX', 'H', 'H1', 'H2', 'H3', 'H4', 'H5', 'HA', 'HC', 'HO', 'HP', 'HS', 'N', 'N1', 'N2', 'N3', 'NA', 'NB', 'O', 'O2', 'OH', 'S', 'SH']
Number of unique atom types: 39
Number of unique epsilons: 19
Number of unique sigmas: 19

This looks good. We see that the protein alone has 33 atom types, which have 14 unique sigma/epsilon values, and the ligand has six atom types with five unique sigma/epsilon values. After combining, we end up with 39 atom types having 19 unique sigma and epsilon values, which is correct.

If you're astute, you'll notice the number of atom names doesn't add up. That's OK -- the atom names are just cosmetic attributes and don't affect the assigned parameters.

Add the ions and water back into the system

Remember, we split our system into protein + ligand + ions + water, and then we took out and replaced the ligand, generating a new Structure of the complex. Now we need to re-insert the ions and the water. First we'll handle the ions.

Here, ParmEd has a convenient overload of the multiplication operator, so that if we want a Structure with N copies of an ion, we just ask it to multiply the Structure of an individual ion by the number of occurrences of that ion.


In [9]:
# Add ions
just_ion1_structure = parmed.Structure()
just_ion1_structure += pieces[2][0]
just_ion1_structure *= len(pieces[2][1])

just_ion2_structure = parmed.Structure()
just_ion2_structure += pieces[3][0]
just_ion2_structure *= len(pieces[3][1])

complex_structure += just_ion1_structure
complex_structure += just_ion2_structure

print("AFTER ION ADDITION (protein+ligand+ions)")
print("Unique atom names:", sorted(list(set([atom.atom_type.name for atom in complex_structure]))))
print("Number of unique atom types:", len(set([atom.atom_type for atom in complex_structure])))
print("Number of unique epsilons:", len(set([atom.epsilon for atom in complex_structure])))
print("Number of unique sigmas:", len(set([atom.sigma for atom in complex_structure])))


AFTER ION ADDITION (protein+ligand+ions)
Unique atom names: ['2C', '3C', 'C', 'C*', 'C1', 'C2', 'C8', 'CA', 'CB', 'CC', 'CN', 'CO', 'CR', 'CT', 'CW', 'CX', 'Cl-', 'H', 'H1', 'H2', 'H3', 'H4', 'H5', 'HA', 'HC', 'HO', 'HP', 'HS', 'N', 'N1', 'N2', 'N3', 'NA', 'NB', 'Na+', 'O', 'O2', 'OH', 'S', 'SH']
Number of unique atom types: 41
Number of unique epsilons: 21
Number of unique sigmas: 21

Finally, we do that same thing for the water present in our system:


In [10]:
# Add waters

just_water_structure = parmed.Structure()
just_water_structure += pieces[4][0]
just_water_structure *= len(pieces[4][1])

complex_structure += just_water_structure

print("AFTER WATER ADDITION (protein+ligand+ions+water)")
print("Unique atom names:", sorted(list(set([atom.atom_type.name for atom in complex_structure]))))
print("Number of unique atom types:", len(set([atom.atom_type for atom in complex_structure])))
print("Number of unique epsilons:", len(set([atom.epsilon for atom in complex_structure])))
print("Number of unique sigmas:", len(set([atom.sigma for atom in complex_structure])))


AFTER WATER ADDITION (protein+ligand+ions+water)
Unique atom names: ['2C', '3C', 'C', 'C*', 'C1', 'C2', 'C8', 'CA', 'CB', 'CC', 'CN', 'CO', 'CR', 'CT', 'CW', 'CX', 'Cl-', 'H', 'H1', 'H2', 'H3', 'H4', 'H5', 'HA', 'HC', 'HO', 'HP', 'HS', 'HW', 'N', 'N1', 'N2', 'N3', 'NA', 'NB', 'Na+', 'O', 'O2', 'OH', 'OW', 'S', 'SH']
Number of unique atom types: 43
Number of unique epsilons: 22
Number of unique sigmas: 22

Now that we've re-combined the system, handle the coordinates and box vectors

The above dealt with the chemical topology and parameters for the system, which is most of what we need -- but not quite all. We still have to deal with the coordinates, and also with the information on the simulation box. So, our final stage of setup is to handle the coordinates and box vectors. This is straightforward -- we just need to copy the original coordinates and box vectors. Nothing fancy is needed:


In [11]:
# Copy over the original coordinates and box vectors
complex_structure.coordinates = orig_structure.coordinates
complex_structure.box_vectors = orig_structure.box_vectors

Export to AMBER and GROMACS formats

We started off in AMBER format, and presumably may want to continue in that format -- so let's write out to AMBER and GROMACS format:


In [12]:
# Export the Structure to AMBER files
complex_structure.save('complex.prmtop', overwrite=True)
complex_structure.save('complex.inpcrd', overwrite=True)

# Export the Structure to Gromacs files
complex_structure.save('complex.gro', overwrite=True)
complex_structure.save('complex.top', overwrite=True)

That should conclude our work in this example. However, perhaps we should just doublecheck by ensuring we can actually run some dynamics on the combined system without any trouble.

As a test, run some dynamics on the combined system

First, we create an OpenMM system, as we've done in other examples here. We can do this, in this case, using ParmEd's built-in createSystem functionality already attached to the combined Structure. We ask for a reasonable cutoff, constrained hydrogen bonds (note that this keyword argument overrides the fact that we use the unconstrained force field above; the ligand (and all other molecules in the system) will have covalent bonds to hydrogen constrainted), PME, and rigid water:


In [13]:
from simtk.openmm import app, unit, LangevinIntegrator
import numpy as np
from parmed.openmm import NetCDFReporter


system = complex_structure.createSystem(nonbondedMethod=app.PME,
                                         nonbondedCutoff=9*unit.angstrom,
                                         constraints=app.HBonds,
                                         rigidWater=True)

Next we'll set up the integrator, a reporter to write the trajectory, pick the timestep, and then go on to minimize the energy and run a very short amount of dynamics after setting the temperature to 300K:


In [14]:
integrator = LangevinIntegrator(300*unit.kelvin, 
                                1/unit.picosecond, 
                                0.001*unit.picoseconds)
simulation = app.Simulation(complex_structure.topology, system, integrator)

# Depending on where your system came from, you may want to 
# add something like (30, 30, 30)*Angstrom to center the protein 
# (no functional effect, just visualizes better)
#simulation.context.setPositions(complex_structure.positions + np.array([30, 30, 30])*unit.angstrom)

simulation.context.setPositions(complex_structure.positions)

nc_reporter = NetCDFReporter('trajectory.nc', 10)
simulation.reporters.append(nc_reporter)

In [15]:
simulation.minimizeEnergy()
minimized_coords = simulation.context.getState(getPositions=True).getPositions()

In [ ]:
simulation.context.setVelocitiesToTemperature(300*unit.kelvin)
simulation.step(1000)