This uses the state definition from [1]. 6 states named A,B,C,D,E and F
[1] W.-N. Du, K. A. Marino, and P. G. Bolhuis, “Multiple state transition interface sampling of alanine dipeptide in explicit solvent,” J. Chem. Phys., vol. 135, no. 14, p. 145102, 2011.
First we tell the ipython notebook to put figures inside the notebook
In [1]:
%matplotlib inline
%config InlineBackend.figure_format = 'svg'
And import lots of modules
In [2]:
# standard packages
import numpy as np
import mdtraj as md
import pandas as pd
import math
import random
import time
# helpers for phi-psi plotting
import alatools as ala
import matplotlib.pyplot as plt
# OpenMM
from simtk.openmm import app
import simtk.unit as unit
# OpenMMTools
import openmmtools as omt
# OpenPathSampling
import openpathsampling as paths
from openpathsampling.tools import refresh_output
# Visualization of PathTrees
from openpathsampling.visualize import PathTree
from IPython.display import SVG
# the openpathsampling OpenMM engine
import openpathsampling.engines.openmm as eng
Create an AlanineOpenMMSimulator for demonstration purposes
We will need a openmm.System
object and an openmm.Integrator
object.
To learn more about OpenMM, read the OpenMM documentation. The code we use here is based on output from the convenient web-based OpenMM builder.
We first create a snapshot using an existing PDB file. The contained topology is necessary for the OpenMM system object to compute all forces.
In [3]:
pdb_file = "../resources/AD_initial_frame.pdb"
In [4]:
template = eng.snapshot_from_pdb(pdb_file)
using AMBER96 as in the original paper with Tip3P water.
In [5]:
forcefield = app.ForceField('amber96.xml', 'tip3p.xml')
and we use the template to create the necessary OpenMM Topology
object. Note that an openmm topology is (a little bit) different from an mdtraj.Topology
so we need to convert.
In [6]:
pdb = app.PDBFile(pdb_file)
system = forcefield.createSystem(
pdb.topology,
nonbondedMethod=app.PME,
nonbondedCutoff=1.0*unit.nanometers,
constraints=app.HBonds,
rigidWater=True,
ewaldErrorTolerance=0.0005
)
In [7]:
integrator_low = omt.integrators.VVVRIntegrator(
temperature=300 * unit.kelvin, # temperature
timestep=2.0 * unit.femtoseconds # integration step size
)
integrator_low.setConstraintTolerance(0.00001)
we let OpenMM decide to use the fastest platform available. You can ask an OpenMM engine what platform it is using with the .platform
attribute. You can actually override this behaviour when you initialize the engine and pass either a string or a platform object.
In [8]:
print eng.Engine.available_platforms()
There are lots of options. For speed we pick mixed precision on GPUs. Most GPU programming libraries have difficulty with double precision. For our purposes this should be fine and it is faster than double precision. These are the options passed
In [9]:
platform = 'CUDA'
if platform == 'OpenCL':
openmm_properties = {
'OpenCLPrecision': 'mixed',
'OpenCLDeviceIndex': "2" # pick the correct index if you have several instances
}
elif platform == 'CUDA':
openmm_properties = {'CudaPrecision': 'mixed'}
elif platform == 'CPU':
openmm_properties = {}
else:
openmm_properties = {}
An engine in OpenPathSampling also has general options that are the same for any engine, like the number of steps until a frame is stored or the maximal number of steps until an engine will automatically stop.
In [10]:
engine_low_options = {
'n_frames_max': 5000,
'n_steps_per_frame': 10
}
Finally build the main engine from the parts
In [11]:
engine_low = eng.Engine(
template.topology,
system,
integrator_low,
openmm_properties=openmm_properties,
options=engine_low_options
)
engine_low.name = 'default'
For the exploration of state space and getting an initial trajectory we also want to simulate at a very high temperature. Here 750K. This is totally unphysical, but all we want is to generate conformations that are not completely wrong in the sense that they could exist at low temperatures.
Set a high temperature simulation for exploration.
For OpenMM all we need to change is the replace the integrator with a new temperature and half the stepsize. We will later pick only every second frame
In [12]:
integrator_high = omt.integrators.VVVRIntegrator(
temperature=1000 * unit.kelvin, # temperature
timestep=2.0 * unit.femtoseconds # integration step size
)
integrator_high.setConstraintTolerance(0.00001)
In [13]:
engine_high_options = {
'n_frames_max': 2000,
'n_steps_per_frame': 10 # twice as many steps with half stepsize
}
An clone the engine using a new integrator
In [14]:
engine_high = engine_low.from_new_options(
integrator=integrator_high,
options=engine_high_options)
engine_high.name = 'high'
For now we use the default engine at
In [15]:
engine_high.initialize(platform)
print 'High-Engine uses'
print 'platform `%s`' % engine_high.platform
print 'temperature `%6.2f K' % (
float(engine_high.integrator.getGlobalVariableByName('kT')) / 0.0083144621)
We open a new empty storage
In [16]:
storage_file = 'ala_mstis_bootstrap.nc'
storage = paths.Storage(storage_file, 'w')
And store both engines in it. Since these are named you can load these later using storage.engines[name]
.
In [17]:
storage.save(engine_low);
storage.save(engine_high);
And store a template for convenience.
In [18]:
storage.tag['template'] = template
We define states A-F. The specifications are taken from the paper [^1].
The list of state names
In [19]:
states = ['A', 'B', 'C', 'D', 'E', 'F']
If you just want to play around you might want to just simulate the first 4 states, which is much faster. You can do so by uncommenting the line below
In [20]:
states = ['A', 'B', 'C', 'D']
Define the centers.
In [21]:
state_centers = {
'A' : [-150, 150],
'B' : [-70, 135],
'C' : [-150, -65],
'D' : [-70, -50],
'E' : [50, -100],
'F' : [40, 65]
}
And the radii of the interfaces
In [22]:
interface_levels = {
'A' : [10, 20, 45, 65, 80],
'B' : [10, 20, 45, 65, 75],
'C' : [10, 20, 45, 60],
'D' : [10, 20, 45, 60],
'E' : [10, 20, 45, 65, 80],
'F' : [10, 20, 45, 65, 80],
}
And also save these information for later convenience
In [23]:
storage.tag['states'] = states
storage.tag['state_centers'] = state_centers
storage.tag['interface_levels'] = interface_levels
this generates an order parameter (callable) object named psi (so if we call psi(trajectory)
we get a list of the values of psi for each frame in the trajectory). This particular order parameter uses mdtraj's compute_dihedrals function, with the atoms in psi_atoms.
The .with_diskcache
will tell the CVs to also create some space to save the values of this CV in the same storage where the CV gets saved. Usually you want to save the direct CVs with diskcache while CVs that build upon these CVs should be fine. The reason is this:
Loading large snapshots from disk is expensive, while compute something from snapshots in memory is fast. So for analysis re-computing values is fine as long as we do not need to reload the snapshots into memory. In our case this means that the direct CVs (CVs that require accessing snapshots properties) are phi
and psi
since everything else is based upon this, while the distance to the state center opA
to opF
is an indirect CV and should not be stored.
In [24]:
psi_atoms = [6,8,14,16]
psi = paths.MDTrajFunctionCV(
name="psi",
f=md.compute_dihedrals,
topology=template.topology,
indices=[psi_atoms]
).with_diskcache()
phi_atoms = [4,6,8,14]
phi = paths.MDTrajFunctionCV(
name="phi",
f=md.compute_dihedrals,
topology=template.topology,
indices=[phi_atoms]
).with_diskcache()
storage.save([psi, phi]);
Define a function that defines a distance in periodic $\phi,\psi$-space.
In [25]:
def circle_degree(snapshot, center, phi, psi):
import numpy
p = numpy.array([phi(snapshot), psi(snapshot)]) / numpy.pi * 180.0
delta = numpy.abs(center - p)
delta = numpy.where(delta > 180.0, delta - 360.0, delta)
return numpy.hypot(delta[0], delta[1])
Create CVs for all states by using the circle_degree
function with different centers
In [26]:
cv_state = dict()
for state in state_centers:
op = paths.FunctionCV(
name = 'op' + state,
f=circle_degree,
center=np.array(state_centers[state]),
phi=phi,
psi=psi
)
cv_state[state] = op
Volume define regions in state space using given CVs. We define here the regions around the state centers. Their boundaries correspond to the interfaces as used for TIS. Crossing an interface thus corresponds to leaving an interface volume into the next larger one.
In [27]:
interface_sets = {}
for state, levels in interface_levels.iteritems():
interface_sets[state] = \
paths.VolumeInterfaceSet(cv_state[state], 0.0, levels)
Create Volume
objects for all states. In our setup we chose the lowest interface volume for the state definition .
In [28]:
vol_state = {}
for state, levels in interface_levels.iteritems():
vol_state[state] = interface_sets[state][0]
vol_state[state].name = state
We use a littl helper function specific for phi/psi plots to illustrate what is happening. This code will not affect the generation of data but will help in visualizing the system
In [29]:
# reload(ala)
plot = ala.TwoCVSpherePlot(
cvs=(phi, psi),
states=[vol_state[vol] for vol in states],
state_centers=[state_centers[vol] for vol in states],
interface_levels=[interface_levels[vol] for vol in states]
)
plot.new()
# plot the phi/psi plot and show all states and interfaces
plot.main()
# add the initial template
plot.add_snapshot(template, label='initial')
We want to simulate using multistate transition interface sampling and to alleviate
the effort of setting up all states, interfaces, appropriate pathensembles and
a scheme on how to generate samples we use some helper construct like in the other
examples. In our case the MSTISNetwork
comes first and contains the definitions of
all states, their interfaces as well as associated CVs.
To make use of the optional Multi-State Outer Interface MSOuterInterface
we need to define it. It will allow reversing samples that connect two states and overcomes the issue that in all present moves the initial state needs to remain the same.
In [30]:
ms_outers = paths.MSOuterTISInterface.from_lambdas(
{ifaces: max(ifaces.lambdas) + 5
for state, ifaces in interface_sets.items()}
)
In [31]:
mstis = paths.MSTISNetwork([
(vol_state[state], interface_sets[state]) # core, interface set
for state in states],
ms_outers=ms_outers
)
And save the network
In [32]:
storage.tag['network'] = mstis
Next, we need to specify the actual way in which we want to simulate the Network. This could be done by
constructing a suitable PathMover yourself or employing the power of a MoveScheme that will generate all nevessary movers for you. The MoveScheme also effectively knows which initial samples in which ensembles you need. Beside the ones indirectly defined by the network the movescheme might not need some of these because certain moves will not be used. It could also happen that a movescheme require an extra
ensemble to work for special moves that are not directly known from the network definition.
In our case we will use the DefaultScheme
that uses RepEx, Reversal, Shooting and MinusMoves.
In [33]:
scheme = paths.DefaultScheme(mstis)
Next, we want to generate initial pathways for all ensembles, e.g. The TISEnsembles
and MinusInterfaceEnsembles
. There are several ways to do so. A very easy approach is to explore the state space at a high temperature and wait until all states have been visited. Splitting the long trajectory appropriately would give us at least one valid trajectory for each TIS ensemble. In a second step we can use over sub parts of the long trajectory and extend these into minus ensembles.
So let's try this: We generate an ensemble that is True
if a trajectory is at least completely outside one state A-F. It will report false once all states have been hit, which is what we want. We will ues this as a stopping condition to generate a first long trajectory at high temperature.
In [34]:
hit_all_states_emsemble = paths.join_ensembles([paths.AllOutXEnsemble(vol_state[state]) for state in states])
Note that we can omit the .can_append
here since these are identical in their result and this is faster.
In [35]:
# initial_trajectories = engine_high.generate(template, [hit_all_states])
This will take some time, even when running at a high temperature. Since we are curious about the progress we will use an iterator version of this to monitor the progress while generating a trajectory. Note, that the two cells below will give the same result as the one above which we commented.
We will allow to interrupt the simulation and be restarted. So we setup the trajectory with one initial frame
In [36]:
trajectory = paths.Trajectory([template])
and extend it using iter_generate
. What this does (for the curious) is to run a normal simulation loop but interrupt at regular intervals returning the current (unfinished) trajectory. So you can check the status of the simulation and continue.
In [37]:
it = engine_high.iter_generate(
trajectory,
[hit_all_states_emsemble],
intervals=100)
# run the iterator until it is finished
for traj in it:
# get a list of found states for status report
found_states = ','.join(
state for state in states
if paths.PartInXEnsemble(vol_state[state])(traj))
visited_states = ','.join(
state for state in states
if paths.PartInXEnsemble(vol_state[state])(traj[-100:]))
# output status report
s = 'Ran %6d steps [%s]. Found states [%s] / visited [%s]\n' % (
len(traj) - 1,
(len(traj) - 1) * engine_high.snapshot_timestep,
found_states,
visited_states)
refresh_output(s)
# remember the last output for later. Python 2 will leak `traj` into
# local, but Python 3 does not and we want to be compatible
trajectory = traj
Output the long trajectory and see if we hit all states
In [38]:
plot.new()
plot.main()
plot.add_trajectory(trajectory)
It can be difficult to get initial samples for an ensemble for various reasons. And in fact for this example it might be the most tricky part in setting up the path simulation.
In theory you could just pick a frame and extend forward and backward until an ensemble reports cannot append anymore and check if that sample is in the ensemble. While this works in many cases it can be rather inefficient. For us it is better to reuse the previously computed trajectories as a basis to generate samples.
Bottomline: The next command just does what you would try to do with the long trajectory for you: Use split and extend to generate the required set of samples to start.
If you are interested in details keep reading, otherwise execute the next cell.
To do so, we use a number of functions from Ensemble
that will try to generate a sample from a list of initial trajectories in different ways:
ensemble.get_sample_from_trajectories
: directly search for candidate sample among the given trajectoriesensemble.split_sample_from_trajectories
: split the given trajectories in valid samplesensemble.extend_sample_from_trajectories
: try to extend suitable sub-trajectories into valid samplesInstead of calling these functions directly we make use of a function from MoveScheme
that uses its knowledge about the necessary ensembles to run these function for all necessary ensembles to create a suitable set of initial samples.
When doing this there are several aspects that we might want to take into account
The function scheme.initial_conditions_from_trajectories
can take care of all of these. We will run it with the default setting but also use the extend-complex
and extend-minimum
strategies besides split
. That means we try different strategies in a specific order.
split
: try to split the trajectory and look for valid samplesextend-complex
: try to look for suitable sub-trajectories that can be extended into valid samples. Currently this works only for MinusInterfaceEnsembles
and it will extend A-X-A
trajectories into Minus Ensembles.extend-minimum
: try to look for trajectories that cross an interface A-X
and try to extend these into a full ensemble. This one will work for TISEnsemble
and MinusInterfaceEnsemble
. We will use this last method to create minus ensembles for the last state found. Since we stop when we discover it, we will not have generated a sample of type A-X-A
. This might also happen if we only visit a state once.Note, that the creation of sample might fail: Extending might just not reach the desired target state. By default, it will make 3 attempts per extendable subtrajectory. The whole process may take some time. Basically because we need to split a very long trajectory and to use the shortest possible ones we need to really search for all possible sub-trajectories.
In [39]:
total_sample_set = scheme.initial_conditions_from_trajectories(
trajectory,
strategies=[
# 1. split and pick shortest and exclude for MinusInterfaceEnsembles
'split',
# 2. extend-complex (implemented for minus with segments A-X-A)
'extend-complex',
# 3. extend-minimal (implemented for minus and tis with crossings A-X)
'extend-minimal'],
engine=engine_high)
Let's see, if we were successful. If so, the next function should not show any missing or extra ensembles.
In [40]:
print scheme.initial_conditions_report(total_sample_set)
If it does, this means that our attempt to extend was probably not successful. So we can just try again until it does. To extend an existing sample_set you need to pass it to the function.
In [41]:
# loop until we are done
while scheme.check_initial_conditions(total_sample_set)[0]:
total_sample_set = scheme.initial_conditions_from_trajectories(
trajectory,
sample_set=total_sample_set, # this is new
strategies=[ # we omit the `split` strategy
'extend-complex',
'extend-minimal'],
engine=engine_high)
Done.
For the next steps we need the engine running at the desired (room) temperature. So we create a new engine. Since we might run on a local machine and do not have several instances of GPU available we will unload the existing engine and create a new one.
In [42]:
engine_high.unload_context() # delete the current openmm.Context object
engine_low.initialize(platform)
refresh_output('Low-Engine uses', refresh=False)
print 'platform `%s`' % engine_low.platform
print 'temperature `%6.2f K' % (
float(engine_low.integrator.getGlobalVariableByName('kT')) / 0.0083144621)
In molecular dynamics, you need to equilibrate if you don't start with an equilibrium frame (e.g., if you start with solvent molecules on a grid, your system should equilibrate before you start taking statistics). Similarly, if you start with a set of paths which are far from the path ensemble equilibrium, you need to equilibrate. This could either be because your trajectories are not from the real dynamics (generated with metadynamics, high temperature, etc.) or because your trajectories are not likely representatives of the path ensemble (e.g., if you put transition trajectories into all interfaces).
As with MD, running equilibration can be the same process as running the total simulation. However, in path sampling, it doesn't have to be: we can equilibrate without replica exchange moves or path reversal moves, for example. In the example below, we create a MoveScheme
that only includes shooting movers.
In [43]:
equil_scheme = paths.OneWayShootingMoveScheme(mstis, engine=engine_low)
Note, that some Schemes may change the actual required set of initial samples. For
the equilibration we apply shooting moves only to TIS ensembles and not to the
MinusInterfaceEnsemble
s.
Create the PathSampler object that can run the equilibration.
In [44]:
equilibration = paths.PathSampling(
storage=None,
sample_set=total_sample_set,
move_scheme=equil_scheme,
)
And run lots of equilibration steps. This is not really necessary, but we generated from the wrong path distribution and this should give more likeli paths for the right temperature.
In [45]:
equilibration.run(50) # increase to about 250 for better equilibration
equilibrated_sset = equilibration.sample_set
Let's have a quick look how many frames in TIS ensembles are already generated by the default engine at 300K
and how many were done using the high
engine at 1000K
.
In [46]:
engine_list = {}
unique_snapshots = set(sum(
[
samp.trajectory for samp in equilibrated_sset
if isinstance(samp.ensemble, paths.TISEnsemble)
],
[]))
for samp in equilibrated_sset:
if isinstance(samp.ensemble, paths.TISEnsemble):
for snap in samp.trajectory:
eng = snap.engine
engine_list[eng] = engine_list.get(eng, 0) + 1
for eng, counts in engine_list.items():
print '%20s : %6d' % (eng.name, counts)
Finally, save the list of samples for later without any reference to previous samples. This is necessary since usually a sample knows its origin sample and saving a sample as is, would trigger saving the whole history of all intermediate samples used. Since we are not interested in the history of equilibration, this saves a lot of storage.
In [47]:
storage.tag['sampleset'] = equilibrated_sset.copy_without_parents()
Let's have a final look at what we cover with all our initial samples.
In [48]:
plt.figure(21, (12,5))
plt.subplot(121)
plt.title('High temperature')
plot.main()
for s in total_sample_set:
plot.add_trajectory(s.trajectory, line=True)
plt.subplot(122)
plt.title('Equilibrated')
plot.main()
plot.zoom = 180/3.1415926
for s in equilibrated_sset:
plot.add_trajectory(s.trajectory, line=True)
A final check that we still have a valid sampleset
In [49]:
equilibrated_sset.sanity_check()
In [ ]:
print 'snapshots:', len(storage.snapshots)
print 'trajectories:', len(storage.trajectories)
print 'samples:', len(storage.samples)
print 'filesize:', storage.file_size_str
In [ ]:
storage.close()
In [ ]: