This example applies SMIRNOFF-format parameters to a BRD4 inhibitor from the living review on binding free energy benchmark systems by Mobley and Gilson. The BRD4 system comes from the accompanying GitHub repository.
This example uses ParmEd to take a protein-ligand system parameterized with an alternate force field, and replace the force field used for the ligand with an OpenFF force field. This example is meant to illustrate how to apply parameters to a single ligand, but it's also easy to process many ligands.
In [1]:
# Retrieve protein and ligand files for BRD4 and a docked inhibitor from the benchmark systems GitHub repository
# https://github.com/MobleyLab/benchmarksets
import requests
repo_url = 'https://raw.githubusercontent.com/MobleyLab/benchmarksets/master/input_files/'
sources = {
'system.prmtop' : repo_url + 'BRD4/prmtop-coords/BRD4-1.prmtop',
'system.crd' : repo_url + 'BRD4/prmtop-coords/BRD4-1.crds',
'ligand.sdf' : repo_url + 'BRD4/sdf/ligand-1.sdf',
'ligand.pdb' : repo_url + 'BRD4/pdb/ligand-1.pdb'
}
for (filename, url) in sources.items():
r = requests.get(url)
open(filename, 'w').write(r.text)
In [2]:
#Read AMBER to ParmEd Structure object
import parmed
in_prmtop = 'system.prmtop'
in_crd = 'system.crd'
orig_structure = parmed.amber.AmberParm(in_prmtop, in_crd)
Let's inspect the unique molecules in the system
In [3]:
pieces = orig_structure.split()
for piece in pieces:
print(f"There are {len(piece[1])} instance(s) of {piece[0]}")
We could drill into the ParmEd objects to find more about these if needed.
It's important to note that pieces[1]
is the parameterized ligand, as we will be replacing it further down in this example. If you apply this notebook to a system with a different number of components, or with objects in a different order, you may need to change some of the code below accordingly.
Here we assume a complicated scenario -- We have a SDF of our ligand available (ligand.sdf
), containing bond orders and enough detail about the molecule for us to parameterize the ligand. However, this SDF does not necessarily have the same atom indexing or coordinates as the original ligand in system.prmtop
and system.crd
. If we mix up the ligand atom indices and try to use the original ligand coordinates, the ligand's initial geometry will be nonsense. So, we've also got a copy of the ligand as ligand.pdb
(which we could have extracted from a dump of our system to PDB format, if desired), and we're going to use that as a reference to get the atom indexing right.
This example will use the simtk.openmm.app.PDBFile
class to read ligand.pdb
and then use Topology.from_openmm
to create an OpenFF Topology that contains the ligand in the correct atom ordering.
If you know that this indexing mismatch will never occur for your data sources, and that your SDFs always contain the correct ordering, you can skip this step by simply running ligand_off_topology = ligand_off_molecule.to_topology()
In [4]:
from openforcefield.topology import Molecule, Topology
from simtk.openmm.app import PDBFile
ligand_off_molecule = Molecule('ligand.sdf')
ligand_pdbfile = PDBFile('ligand.pdb')
ligand_off_topology = Topology.from_openmm(ligand_pdbfile.topology,
unique_molecules=[ligand_off_molecule])
Here we begin by loading a SMIRNOFF force field -- in this case, the OpenFF-1.0 force field, "Parsley".
Once loaded, we create a new OpenMM system containing the ligand, then use ParmEd to create a Structure
from that system. We'll re-combine this Structure
object with those for the protein, ions, etc. later.
In [5]:
# Load the SMIRNOFF-format Parsley force field
from openforcefield.typing.engines.smirnoff import ForceField
force_field = ForceField('openff_unconstrained-1.0.0.offxml')
ligand_system = force_field.create_openmm_system(ligand_off_topology)
new_ligand_structure = parmed.openmm.load_topology(ligand_off_topology.to_openmm(),
ligand_system,
xyz=pieces[1][0].positions)
It's possible to save out ligand parameters at this point, if desired; here we do so to AMBER and GROMACS format just for inspection.
In [6]:
new_ligand_structure.save('tmp.prmtop', overwrite=True)
new_ligand_structure.save('tmp.inpcrd', overwrite=True)
new_ligand_structure.save('tmp.gro', overwrite=True)
new_ligand_structure.save('tmp.top', overwrite=True)
Here we check that the number of atoms are the same, and the same elements occur in the same order. This will catch many (but not all) errors where someone provided an SDF file for a different ligand than the one present in the system. It will miss errors where they happen to provide a different ligand with the same number of atoms, the same elements, in the same order -- which is unlikely to happen, but not impossible.
In [7]:
# Check how many atoms and which order elements are in the new ligand
n_atoms_new = len(new_ligand_structure.atoms)
elements_new = [atom.element for atom in new_ligand_structure.atoms]
# Check how many atoms and which order elements are in the old ligand
old_ligand_structure, n_copies = pieces[1]
n_atoms_old = len(old_ligand_structure.atoms)
elements_old = [atom.element for atom in old_ligand_structure.atoms]
print(f"There are {n_atoms_old} in the old ligand structure and {n_atoms_new} atoms "
f"in the new ligand structure")
# Print out error message if number of atoms doesn't match
if n_atoms_new != n_atoms_old:
print("Error: Number of atoms in input ligand doesn't match number extracted "
"from prmtop file.")
if elements_new != elements_old:
print("Error: Elements in input ligand don't match elements in the ligand "
"from the prmtop file.")
print(f"Old elements: {elements_old}")
print(f"New elements: {elements_new}")
That looks OK -- we're seeing a consistent number of atoms in both structures, and no errors about inconsistent elements. That means we're OK to proceed and start combining our ParmEd Structure
objects.
Now, we make a new ParmEd Structure
for the complex, and begin adding the pieces of our system back together. Recall that above, we used ParmEd to split different portions of the system into a list of tuples called pieces
, where the list items are tuples consisting of (Structure
, N
) where N
denotes the number of times that piece occurs. We have just one protein, for example, but many water molecules.
Here, we begin by combining our original protein with our new ligand.
We also print out a lot of info as we do so just to check that we're ending up with the number of atom types we expect.
In [8]:
# Create a new, empty system
complex_structure = parmed.Structure()
# Add the protein
complex_structure += pieces[0][0]
print("BEFORE SYSTEM COMBINATION (just protein)")
print("Unique atom names:", sorted(list(set([atom.atom_type.name for atom in complex_structure]))))
print("Number of unique atom types:", len(set([atom.atom_type for atom in complex_structure])))
print("Number of unique epsilons:", len(set([atom.epsilon for atom in complex_structure])))
print("Number of unique sigmas:", len(set([atom.sigma for atom in complex_structure])))
print()
print("BEFORE SYSTEM COMBINATION (just ligand)")
print("Unique atom names:", sorted(list(set([atom.atom_type.name for atom in new_ligand_structure]))))
print("Number of unique atom types:", len(set([atom.atom_type for atom in new_ligand_structure])))
print("Number of unique epsilons:", len(set([atom.epsilon for atom in new_ligand_structure])))
print("Number of unique sigmas:", len(set([atom.sigma for atom in new_ligand_structure])))
print()
# Add the ligand
complex_structure += new_ligand_structure
print("AFTER LIGAND ADDITION (protein+ligand)")
print("Unique atom names:", sorted(list(set([atom.atom_type.name for atom in complex_structure]))))
print("Number of unique atom types:", len(set([atom.atom_type for atom in complex_structure])))
print("Number of unique epsilons:", len(set([atom.epsilon for atom in complex_structure])))
print("Number of unique sigmas:", len(set([atom.sigma for atom in complex_structure])))
This looks good. We see that the protein alone has 33 atom types, which have 14 unique sigma/epsilon values, and the ligand has six atom types with five unique sigma/epsilon values. After combining, we end up with 39 atom types having 19 unique sigma and epsilon values, which is correct.
If you're astute, you'll notice the number of atom names doesn't add up. That's OK -- the atom names are just cosmetic attributes and don't affect the assigned parameters.
Remember, we split our system into protein + ligand + ions + water, and then we took out and replaced the ligand, generating a new Structure
of the complex. Now we need to re-insert the ions and the water. First we'll handle the ions.
Here, ParmEd has a convenient overload of the multiplication operator, so that if we want a Structure
with N copies of an ion, we just ask it to multiply the Structure
of an individual ion by the number of occurrences of that ion.
In [9]:
# Add ions
just_ion1_structure = parmed.Structure()
just_ion1_structure += pieces[2][0]
just_ion1_structure *= len(pieces[2][1])
just_ion2_structure = parmed.Structure()
just_ion2_structure += pieces[3][0]
just_ion2_structure *= len(pieces[3][1])
complex_structure += just_ion1_structure
complex_structure += just_ion2_structure
print("AFTER ION ADDITION (protein+ligand+ions)")
print("Unique atom names:", sorted(list(set([atom.atom_type.name for atom in complex_structure]))))
print("Number of unique atom types:", len(set([atom.atom_type for atom in complex_structure])))
print("Number of unique epsilons:", len(set([atom.epsilon for atom in complex_structure])))
print("Number of unique sigmas:", len(set([atom.sigma for atom in complex_structure])))
Finally, we do that same thing for the water present in our system:
In [10]:
# Add waters
just_water_structure = parmed.Structure()
just_water_structure += pieces[4][0]
just_water_structure *= len(pieces[4][1])
complex_structure += just_water_structure
print("AFTER WATER ADDITION (protein+ligand+ions+water)")
print("Unique atom names:", sorted(list(set([atom.atom_type.name for atom in complex_structure]))))
print("Number of unique atom types:", len(set([atom.atom_type for atom in complex_structure])))
print("Number of unique epsilons:", len(set([atom.epsilon for atom in complex_structure])))
print("Number of unique sigmas:", len(set([atom.sigma for atom in complex_structure])))
The above dealt with the chemical topology and parameters for the system, which is most of what we need -- but not quite all. We still have to deal with the coordinates, and also with the information on the simulation box. So, our final stage of setup is to handle the coordinates and box vectors. This is straightforward -- we just need to copy the original coordinates and box vectors. Nothing fancy is needed:
In [11]:
# Copy over the original coordinates and box vectors
complex_structure.coordinates = orig_structure.coordinates
complex_structure.box_vectors = orig_structure.box_vectors
In [12]:
# Export the Structure to AMBER files
complex_structure.save('complex.prmtop', overwrite=True)
complex_structure.save('complex.inpcrd', overwrite=True)
# Export the Structure to Gromacs files
complex_structure.save('complex.gro', overwrite=True)
complex_structure.save('complex.top', overwrite=True)
That should conclude our work in this example. However, perhaps we should just doublecheck by ensuring we can actually run some dynamics on the combined system without any trouble.
First, we create an OpenMM system, as we've done in other examples here. We can do this, in this case, using ParmEd's built-in createSystem
functionality already attached to the combined Structure
. We ask for a reasonable cutoff, constrained hydrogen bonds (note that this keyword argument overrides the fact that we use the unconstrained
force field above; the ligand (and all other molecules in the system) will have covalent bonds to hydrogen constrainted), PME, and rigid water:
In [13]:
from simtk.openmm import app, unit, LangevinIntegrator
import numpy as np
from parmed.openmm import NetCDFReporter
system = complex_structure.createSystem(nonbondedMethod=app.PME,
nonbondedCutoff=9*unit.angstrom,
constraints=app.HBonds,
rigidWater=True)
Next we'll set up the integrator, a reporter to write the trajectory, pick the timestep, and then go on to minimize the energy and run a very short amount of dynamics after setting the temperature to 300K:
In [14]:
integrator = LangevinIntegrator(300*unit.kelvin,
1/unit.picosecond,
0.001*unit.picoseconds)
simulation = app.Simulation(complex_structure.topology, system, integrator)
# Depending on where your system came from, you may want to
# add something like (30, 30, 30)*Angstrom to center the protein
# (no functional effect, just visualizes better)
#simulation.context.setPositions(complex_structure.positions + np.array([30, 30, 30])*unit.angstrom)
simulation.context.setPositions(complex_structure.positions)
nc_reporter = NetCDFReporter('trajectory.nc', 10)
simulation.reporters.append(nc_reporter)
In [15]:
simulation.minimizeEnergy()
minimized_coords = simulation.context.getState(getPositions=True).getPositions()
In [ ]:
simulation.context.setVelocitiesToTemperature(300*unit.kelvin)
simulation.step(1000)