Tutorial 2 - AdaptiveMD Trajectory and Modelling Tasks

adaptivemd relies on ansynchronous simulation execution and analysis. The objects introduced in Tutorial 1 provide the basic interface used to create and organize this kind of work. In Tutorial 2 we will learn some more about the class Trajectory representing trajectory data and introduce a modeller task that uses the class PyEMMAAnalysis. We will also learn more about the task object representing any work, and worker objects who take & execute tasks.

Reopening a Project


In [1]:
from __future__ import print_function
from adaptivemd import Project, Trajectory

Let's open our tutorial project by its name. If you completed the previous example this should all work out of the box.


In [2]:
project = Project('tutorial')

This opened all connections to the MongoDB and Session so we can get started. Be sure to have an adaptivemdworker running to complete the tasks we queue in the project.

Let's see where we are. These numbers will depend on when you run this notebook relative to the tasks performed under the project name 'tutorial'. Unless you delete your project, it will accumulate models and files over time, as is our ultimate goal.


In [3]:
def print_summary(project):
    print("Quick summary of items in project")
    print("Tasks : %s" % project.tasks)
    print("Trajs : %s" % project.trajectories)
    print("Models: %s" % project.models)
    
print_summary(project)


Quick summary of items in project
Tasks : <StoredBundle for with 2 file(s) @ 0x10f95fb50>
Trajs : <ViewBundle for with 1 file(s) @ 0x10f95fd10>
Models: <StoredBundle for with 0 file(s) @ 0x10f95fa90>

Now restore our old python references to generate tasks by loading the previously used generators via their names.


In [4]:
engine = project.generators['openmm']
modeller = project.generators['pyemma']
pdb_file = project.files['initial_pdb']

Remember that we stored some files in the database. and of course you can look at them again, should that be important. File objects have a method get_file for printing the contents of text files.


In [5]:
print(pdb_file.get_file()[:1000] + ' [...]')


REMARK   1 CREATED WITH MDTraj 1.8.0, 2016-12-22
CRYST1   26.063   26.063   26.063  90.00  90.00  90.00 P 1           1 
MODEL        0
ATOM      1  H1  ACE A   1      -1.900   1.555  26.235  1.00  0.00          H   
ATOM      2  CH3 ACE A   1      -1.101   2.011  25.651  1.00  0.00          C   
ATOM      3  H2  ACE A   1      -0.850   2.954  26.137  1.00  0.00          H   
ATOM      4  H3  ACE A   1      -1.365   2.132  24.600  1.00  0.00          H   
ATOM      5  C   ACE A   1       0.182   1.186  25.767  1.00  0.00          C   
ATOM      6  O   ACE A   1       1.089   1.407  26.645  1.00  0.00          O   
ATOM      7  N   ALA A   2       0.302   0.256  24.807  1.00  0.00          N   
ATOM      8  H   ALA A   2      -0.588   0.102  24.354  1.00  0.00          H   
ATOM      9  CA  ALA A   2       1.498  -0.651  24.567  1.00  0.00          C   
ATOM     10  HA  ALA A   2       1.810  -0.944  25.570  1.00  0.00          H   
ATOM     11  CB  ALA A   2       1.054  -1.959  23.852 [...]

The Trajectory object

Before we talk about adaptivity, let's have a look at possibilities to generate trajectories. We assume that you successfully ran a first trajectory using a worker. Next, we talk about different ways to generate new trajectories.

Trajectories from a pdb

This will typically be the first step generating data in a project. Remember we already have a PDB stored from setting up the engine. if you want to start from this state, do as before:

  1. create the Trajectory object you want
  2. make a task to run the simulation
  3. submit the task so the simulation can run to create the data

A trajectory contains all necessary information to make itself from running the simulation program. It has

  1. a (hopefully unique) location: This will be the folder where all the associated files go
  2. an initial frame: the initial state to be used by the MD simulation package
  3. a length in frames to run
  4. the Engine: the actual MD package used to create the trajectory

Note, the Engine contains information about the topology and about which output files are generated. This is the essential information you will need for analysis, e.g. what is the filename of the trajectory file that contains the protein structure and what is its stride? Additionally, the Engine creates the call to the simulation program, so any other required arguments, file operations, and settings are provided in the object definition. The call is constructed in the run and extend methods, and contained in the returned task objects.

Let's first build a second Trajectory from scratch. In Tutorial 1 we used the helper method project.new_trajectory to create the trajectory.


In [6]:
file_name = next(project.traj_name)              # get a unique new filename 

trajectory = Trajectory(
    location=file_name,                          # this creates a new filename
    frame=pdb_file,                              # initial frame is the PDB
    length=100,                                  # length is 100 frames
    engine=engine                                # the engine to be used
)

To inspect data outside the adaptivemd package, we want to know where the project's trajectories are by their location. So when using the convenience function new_trajectory, this is set to a folder under the path resolving from {configuration.shared_path}/{project.name}/trajs/.


In [7]:
trajectory2 = project.new_trajectory(
    frame=pdb_file,
    length=100,
    engine=engine,
    number=1          # if more then one you get a list of trajectories
)
#trajectory2 = project.new_trajectory(pdb_file, 100, engine, number=1)

Like in the first example, now that we have the parameters of the Trajectory we can create the task to run simulations.

The Task object

First, an example


In [8]:
task_run = trajectory.run()

This was easy, but we can do some interesting stuff. Since we know the trajectory will exist now we can also extend by some frames. Remember, the trajectory does not really exist yet (not until we submit it and a worker executes it), but we can pretend that it does, since it's relevant propertier are set.


In [9]:
task_extend = trajectory.extend(50)
# Q: if you repeat, does this grow longer?
# H: consider that you are calling from same object

The only problem is to make sure the tasks are run in the correct order. This would not be a problem if the worker will run tasks in the order they are place in the queue, but that defeats the purpose of parallel runs. Therefore an extended tasks knows it is dependent on the existance of the source trajectory. The worker will hence only run a trajectory once the source exists.

Dependency in the project queue?

We might wonder at this point how we manage to construct the dependency graph between all tasks and how this is handled and optimized, etc...

Well, we don't. There is no dependency graph, at least not explicitely. Instead we check at some time among all tasks that should be run, which can be run. And this is easy to check, all dependent tasks need to be completed and must have succeeded. Then we can rely on their (the dependencies) results to exists and it makes sense to continue.

A real dependency graph would go even further and know about all future relations and you could identify bottleneck tasks which are necessary to allow other tasks to be run. We don't do that (yet). It could improve performance in the sense that you will run at optimal load balance and keep all workers as busy as possible. Consider our a attempt a first order dependency graph.


In [10]:
project.queue(task_run, task_extend)

A note on simulation length

Remember that we allow an engine to output multiple trajectory types with freely chosen strides? This could leads to some difficulty indexing frames in the trajectory. Imagine this (unrealistic) situation:

We have

  1. full trajectory with stride=10
  2. a reduced protein-only trajectory with stride=7

Now run a trajectory of length=300. We get

  1. 30+1 full (+1 for the initial frame) and
  2. 42+1 protein frames

That per se is no problem, but if you want to extend we only have a restart file for the very last frame and while this works for the full trajectory, for the protein trajectory you are 6 frames short. Just continuing and concatenating the two leads to a gap of 6+7=13 frames instead of 7. A small but potentially significant source of error.

So, compute the least common multiple of all strides using


In [11]:
engine.native_stride


Out[11]:
10

Chained Tasks

There is also a shorter way of writing this


In [12]:
#task = trajectory.run().extend(50)

This will run as two tasks that first runs the trajectory and then extend it by 50 frames (in native engine frames)

If you want to do that several times, you can pass a list of ints which is a shortcut for calling .extend(l1).extend(l2). ...


In [13]:
# TODO this doesnt work
# does it make sense? see Q/H above
task = trajectory.run().extend([10] * 10)


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-e779420a5801> in <module>()
      1 # TODO this doesnt work
      2 # does it make sense? see Q/H above
----> 3 task = trajectory.run().extend([10] * 10)

/Users/osz/adaptivemd/adaptivemd/engine/engine.pyc in extend(self, length, export_path)
    510 
    511         """
--> 512         t = self.generator.extend(self.trajectory, length, export_path=export_path)
    513 
    514         # this is not really necessary since we require internally that the

/Users/osz/adaptivemd/adaptivemd/engine/openmm/openmm.pyc in extend(self, source, length, resource_name, export_path, cpu_threads, gpu_contexts, mpi_rank)
    188         # create a new file, but with the same name, etc, just new length
    189         target = source.clone()
--> 190         target.length = len(source) + length
    191 
    192         t = TrajectoryExtensionTask(self, target, source, cpu_threads=cpu_threads,

TypeError: unsupported operand type(s) for +: 'int' and 'list'

This will create 10! tasks that each extend the previous one. Each of the task requires the previous one to finish, this way the dependency is preserved. You can use this to mimick using several restarts in between, we don't know which worker will actually start and which worker will continue or finish a trajectory.

Checking the results

Let's see if everything is going as we expect.


In [17]:
for t in project.trajectories:
    print(t.short, t.length)


sandbox:///{}/00000000/ 100
sandbox:///{}/00000001/ 150

If this works, then you should see one 100 frame trajectory from the setup (first example) and a second 150 length trajectory that we just generated by running 100 frames and extending it by another 50.

If not, there might be a problem or (more likely) the tasks are not finished yet. Just try the above cell again and see if it changes to the expected output.

project.trajectories will show you only existing trajectories. Not ones, that are planned or have been extended. If you want to see all the ones already in the project, you can look at project.files. Which is a bundle and bundles can be filtered. But first all files


In [18]:
for f in project.files:
    print(f)


file:///Users/osz/adaptivemd/adaptivemd/scripts/_run_.py
file:///Users/osz/adaptivemd/adaptivemd/engine/openmm/openmmrun.py
file:///Users/osz/adaptivemd/examples/files/alanine/alanine.pdb
file:///Users/osz/adaptivemd/examples/files/alanine/system.xml
file:///Users/osz/adaptivemd/examples/files/alanine/integrator.xml
sandbox:///projects/tutorial/trajs/00000000/
sandbox:///projects/tutorial/trajs/00000001/
sandbox:///projects/tutorial/trajs/00000001/

Now all files filtered by class Trajectory. DT is a little helper to convert time stamps into something readable.


In [19]:
from adaptivemd import DT

In [20]:
for t in project.files.c(Trajectory):
    print(t.short, t.length)
    if t.created:
        if t.created > 0:
            print('created  @ %s, %s' % (DT(t.created), t.exists))
        else:
            print('modified @ %s, %s' % (DT(-t.created), t.exists))
    else:
        print('not existent')


sandbox:///{}/00000000/ 100
created  @ 2018-08-02 13:25:55, True
sandbox:///{}/00000001/ 100
modified @ 2018-08-02 13:28:59, False
sandbox:///{}/00000001/ 150
created  @ 2018-08-02 13:28:59, True

We see the extended trajecory appears twice once with length 100 and once with length 150. This is correct, because the 100 frame trajectory was used and hence saved as a File in the database. But why does this one not appear in the list of trajectories? It was created first and had a timestamp of creation written to .created. This is the time when the worker finishes and was successful.

When a file is overwritten, it is marked as modified by setting a negative timestamp. So if

  1. trajectory.created is None, the file does not exist nor has it.
  2. trajectory.created > 0, the file exists
  3. trajectory.created < 0, the file existed but was overwritten to extend length

The project.trajectories ViewBundle contains files of type Trajectory with positive created index.


In [21]:
for t in project.trajectories:
    print("%s: %d, exists? %s" % (t.basename, t.length, t.exists))


00000000: 100, exists? True
00000001: 150, exists? True

The trajectory reference we have now refers to a non-existing directory, because it was overwritten and a concatenated trajectory object of length 150 is stored.

Dealing with errors

Just for fun let's produce an error by using a wrong initial pdb file.


In [22]:
trajectory = project.new_trajectory(engine['system_file'], 100)
task = engine.run(trajectory)
project.queue(task)

Well, nothing seems changed here but we expect the task to fail. So let's inspect what's happening.


In [24]:
task.state


Out[24]:
u'fail'

You might need to execute this cell several times. It will first become queued, then running and finally fail and stop there.

It failed as we knew it would. No suprise here, but why? Let's look at the stdout and stderr


In [25]:
print(task.stdout)


13:29:17 [worker:3] stdout from running task
GO...


In [26]:
print(task.stderr)


13:29:17 [worker:3] stderr from running task
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .dcd import DCDTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:5: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .binpos import BINPOSTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:6: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .xtc import XTCTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .trr import TRRTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:15: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .dtr import DTRTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:18: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .tng import TNGTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/__init__.py:54: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._lprmsd import lprmsd
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/geometry/angle.py:31: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from mdtraj.geometry import _geometry, distance
Traceback (most recent call last):
  File "openmmrun.py", line 255, in <module>
    args.system_xml, args.integrator_xml)
  File "openmmrun.py", line 118, in read_input
    returns.update({op_name: func(arg)})
  File "openmmrun.py", line 87, in get_pdbfile
    raise IOError("{} must end in '.pdb' for reading as PDB file".format(topology_pdb))
IOError: coordinates.xml must end in '.pdb' for reading as PDB file

We see what we expect. In openmmrun.py the openmm executable could not load the pdb file.

NOTE If your worker dies for some reason, it will not set a STDOUT or STDERR. If you think that your task should be able to execute, then you can do task.state = 'created' and reset it to be accessible to workers. This is NOT recommended, just to explain how this works.

More about the Trajectory Objects

If you have a Trajectory object and create the real trajectory file, you can also put the Trajectory directly into the queue. This is equivalent to call .run on the trajectory and submit the resulting Task to the queue. To check it's status, etc..., work with a corresponding Task object. One thing to note is that extending from a specific trajectory using the same trajectory object reference will currently result in overwritten extensions and lost data, along with incorrect storage of Trajectory files in project.trajectories. Be sure to either use task.extend or iterate through project.trajectories to extend existing trajectories, which will properly update the created attribute.


In [27]:
# project.queue(project.new_trajectory(pdb_file, 100, engine).run()) can be called as
project.queue(project.new_trajectory(pdb_file, 100, engine))

Trajectories from other trajectory data

This will be the most common case. At least in any remote kind of adaptivity you will not start always from the same position or extend. You want to pick any exisiting frame and continue from there. So, let's do that.

First we get a trajectory. Every Bundle in the project (e.g. .trajectories, .models, .files, .tasks) acts like an enhanced set. You can iterate over all entries as we did before, and you can get one element, which usually is the first stored, but not always. If you are interested in Bundles see the documentation. For now that is enough to know, that a bundle (as a set) has a .one function which is short for getting the first object if you iterate.


In [28]:
trajectory = project.trajectories.one

Good, at least 100 frames. We pick, say, frame at index 28 (which is the 29th frame, we start counting at zero) using the way you pick an element from a python list (which is almost what a Trajectory represents, a list of frames)


In [29]:
frame = trajectory[28]
print(frame, frame.exists)


Frame(sandbox:///{}/00000000/[28]) False

In [30]:
frame = trajectory[30]
print(frame, frame.exists)


Frame(sandbox:///{}/00000000/[30]) True

This part is important! We are running only one full atom trajectory with stride larger than one, so if we want to pick a frame from this trajectory you can pick in theory every frame, but only some of these really exist. If you want to restart from a frame this needs to be the case. Otherwise you run into trouble.

To run a trajectory just use the frame as the initial frame.


In [31]:
frame = trajectory[28]
task = project.new_trajectory(frame, 100, engine).run()
print(task)


None

In [32]:
frame = trajectory[30]
task = project.new_trajectory(frame, 100, engine).run()
print(task)


<adaptivemd.engine.engine.TrajectoryGenerationTask object at 0x10482aa50>

In [33]:
print(task.description)


Task: TrajectoryGenerationTask(OpenMMEngine) [created]

Sources
-- Unstaged
- sandbox:///{}/00000000/ [exists]
-- Staged
- staging:///integrator.xml 
- staging:///system.xml 
- staging:///openmmrun.py 
- staging:///alanine.pdb 

Targets
- sandbox:///{}/00000006/

Modified

Link('staging:///alanine.pdb' > 'worker://initial.pdb)
Link('staging:///system.xml' > 'worker://system.xml)
Link('staging:///integrator.xml' > 'worker://integrator.xml)
Link('staging:///openmmrun.py' > 'worker://openmmrun.py)
Link('sandbox:///{}/00000000/' > 'worker://source/)
mdconvert -o worker://input.pdb -i 3 -t worker://initial.pdb worker://source/master.dcd
Touch('worker://traj/')

j=0
tries=10
sleep=1

trajfile=traj/protein.dcd

while [ $j -le $tries ]; do if ! [ -s $trajfile ]; then python openmmrun.py -r --report-interval 1 -p CPU --types="{'protein':{'stride':1,'selection':'protein','name':null,'filename':'protein.dcd'},'master':{'stride':10,'selection':null,'name':null,'filename':'master.dcd'}}" -s system.xml -i integrator.xml -t worker://input.pdb --length 100 worker://traj/; fi; sleep 1; j=$((j+1)); done
Move('worker://traj/' > 'sandbox:///{}/00000006/)

See, how the actual frame picked in the mdconvert line is -i 3 meaning index 3 which represents frame 30 with stride 10.

Now, run the task.


In [34]:
project.queue(task)

One helpful tool you can use to wait until something happens is the method project.wait_until(condition). This is not so useful in notebooks, but in scripts it can be used to synchronize events in the workflow. condition here is a function that evaluates to True or False. It will be tested in regular intervals, and once it is True, the wait_until function returns.


In [35]:
trajectory.created


Out[35]:
1533230755.345511

And now wait until all events are finished. This method takes a condition and blocks until the condition is True.


In [36]:
project.wait_until(task.is_done)

In [37]:
task.state


Out[37]:
u'success'

In [38]:
print(task.stderr)


13:30:45 [worker:3] stderr from running task
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .dcd import DCDTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:5: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .binpos import BINPOSTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:6: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .xtc import XTCTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .trr import TRRTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:15: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .dtr import DTRTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:18: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .tng import TNGTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/__init__.py:54: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._lprmsd import lprmsd
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/geometry/angle.py:31: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from mdtraj.geometry import _geometry, distance
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .dcd import DCDTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:5: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .binpos import BINPOSTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:6: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .xtc import XTCTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .trr import TRRTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:15: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .dtr import DTRTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:18: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .tng import TNGTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/__init__.py:54: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._lprmsd import lprmsd
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/geometry/angle.py:31: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from mdtraj.geometry import _geometry, distance
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/utils/validation.py:116: TypeCastPerformanceWarning: Casting xyz dtype=float64 to <type 'numpy.float32'> 
  TypeCastPerformanceWarning)

Each Task has a function is_done that you can use. It will return if a task is done. That means it either failed or succeeded or was cancelled. Basically when it is not queued anymore.

If you want to run adaptively, all you need to do is to figure out where to start new simulations from and use the methods provided to run these.

Model Tasks

There are of course other things you can do besides creating new trajectories


In [39]:
from adaptivemd.analysis.pyemma import PyEMMAAnalysis

A model generating task works similar to trajectories. You create the generator with options and then you create a Task from passing it a list of trajectories to be analyzed.

The instance to compute an MSM model of trajectories that you pass it. It is initialized with a .pdb file that is used to define the topology. The features argument will be converted to a PyEMMA MDFeaturizer object. More on that later. The other two option chose which output type from the engine we want to analyse. We chose the protein trajectories since these are faster to load and have better time resolution.


In [40]:
modeller = PyEMMAAnalysis(
    engine=engine,
    outtype='protein',
    features={'add_inverse_distances': {'select_Backbone': None}}
).named('pyemma')

#modeller = project.generators['pyemma']

Again we name it pyemma for later reference. By the way, we now have a couple analysis generators with same parameters and name, but since they are different objects they can all be stored. Be careful with names...


In [41]:
task = modeller.execute(list(project.trajectories))
project.queue(task)

In [42]:
project.wait_until(task.is_done)

In [43]:
print(task.stderr)


13:31:51 [worker:3] stderr from running task
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/msmtools/estimation/dense/__init__.py:24: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import mle_trev
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/msmtools/estimation/dense/__init__.py:25: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import mle_trev_given_pi
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/msmtools/estimation/dense/tmatrix_sampler.py:38: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . sampler_rev import SamplerRev
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/msmtools/estimation/dense/tmatrix_sampler.py:39: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . sampler_revpi import SamplerRevPi
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/msmtools/estimation/sparse/__init__.py:32: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import mle_trev_given_pi
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/msmtools/estimation/sparse/__init__.py:33: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import mle_trev
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/bhmm/output_models/discrete.py:24: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from bhmm.output_models.impl_c import discrete as dc
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/bhmm/output_models/gaussian.py:24: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from bhmm.output_models.impl_c import gaussian as gc
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/bhmm/hidden/impl_c/__init__.py:3: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from bhmm.hidden.impl_c.hidden import *
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/thermotools/__init__.py:22: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import bar
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/thermotools/__init__.py:23: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import wham
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/thermotools/__init__.py:24: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import mbar
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/thermotools/__init__.py:25: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import tram
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/thermotools/__init__.py:26: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import dtram
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/thermotools/__init__.py:27: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import mbar_direct
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/thermotools/__init__.py:28: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from . import tram_direct
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/pyemma/thermo/estimators/TRAM_estimator.py:33: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from thermotools import trammbar as _trammbar
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/pyemma/thermo/estimators/TRAM_estimator.py:34: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from thermotools import trammbar_direct as _trammbar_direct
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:4: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .dcd import DCDTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:5: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .binpos import BINPOSTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:6: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .xtc import XTCTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:7: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .trr import TRRTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:15: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .dtr import DTRTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/formats/__init__.py:18: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from .tng import TNGTrajectoryFile
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/__init__.py:54: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from ._lprmsd import lprmsd
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/mdtraj/geometry/angle.py:31: RuntimeWarning: numpy.dtype size changed, may indicate binary incompatibility. Expected 96, got 88
  from mdtraj.geometry import _geometry, distance
/Users/osz/admd/miniconda3/envs/admdenv/lib/python2.7/site-packages/pyemma/__init__.py:134: UserWarning: Python 2.7 usage is deprecated. Future versions of PyEMMA will not support it. Please upgrade your Python installation.
  "Please upgrade your Python installation.", category=UserWarning)


In [44]:
print(task.stdout)


13:31:51 [worker:3] stdout from running task
02-08-18 13:31:49 pyemma.coordinates.data.featurization.featurizer.MDFeaturizer[0] WARNING  The 1D arrays input for add_inverse_distances() have been sorted, and index duplicates have been eliminated.
Check the output of describe() to see the actual order of the features
#trajectories : 4

The above modeller is used without storing first, and it becomes stored when queueing the task. Now, when you try to get a generator by name "pyemma", you can't be sure what you'll get. To show this, we use the c method on generator StoredBundle to see all the modellers.


In [45]:
print(modeller.__uuid__)
[print(g.name, g.__uuid__) for g in project.generators.c(PyEMMAAnalysis)]


144792017792079339759055817894162071954
pyemma 297636284461269466507679179728245227624
pyemma 144792017792079339759055817894162071954
Out[45]:
[None, None]

In [46]:
for m in project.models:
    print(m)


<adaptivemd.model.Model object at 0x104813b10>

So we generated one model. The Model objects contains (in the base version) a .data attribute, which is a dictionary of information about the generated model. As usual, a keys method can show you what data you can access. Let's look at the transition probability matrix $P$ and count matrix $C$ calculated from this data.


In [47]:
model = project.models.last

In [48]:
print(model['msm']['P'])


[[0.84444444 0.         0.0887199  0.         0.         0.
  0.06683566]
 [0.         0.78787879 0.         0.04363512 0.05329142 0.11519466
  0.        ]
 [0.09273342 0.         0.74074074 0.         0.01523732 0.0883101
  0.06297842]
 [0.         0.02489071 0.         0.77777778 0.         0.
  0.19733152]
 [0.         0.04493231 0.03567226 0.         0.75757576 0.16181967
  0.        ]
 [0.         0.05959917 0.12686411 0.         0.09929738 0.63157895
  0.08266039]
 [0.07129277 0.         0.06427081 0.12714431 0.         0.05872068
  0.67857143]]

In [49]:
model.data['msm']['C']


Out[49]:
array([[38.,  0.,  0.,  2.,  2.,  0.,  0.,  0.,  0.,  5.],
       [ 0., 52.,  0.,  0.,  0.,  2.,  0.,  3.,  9.,  0.],
       [ 0.,  0., 32.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 7.,  0.,  0., 40.,  0.,  0.,  0.,  0.,  4.,  3.],
       [ 0.,  0.,  0.,  0., 17.,  0.,  6.,  0.,  0.,  0.],
       [ 0.,  2.,  0.,  0.,  0., 35.,  0.,  0.,  0.,  8.],
       [ 0.,  0.,  0.,  0.,  6.,  0., 25.,  0.,  0.,  0.],
       [ 0.,  2.,  2.,  2.,  0.,  0.,  0., 25.,  4.,  0.],
       [ 0.,  2.,  0.,  8.,  0.,  0.,  0.,  7., 36.,  4.],
       [ 2.,  0.,  0.,  4.,  0.,  8.,  0.,  0.,  4., 38.]])

Pick frames automatically

The last thing that we are covering here is a function that can utilize models to decide which frames are better to start from. The simplest one will use the counts per state, take the inverse and use this as a distribution.


In [50]:
project.find_ml_next_frame(4)


Out[50]:
[Frame(sandbox:///{}/00000004/[10]),
 Frame(sandbox:///{}/00000006/[100]),
 Frame(sandbox:///{}/00000001/[80]),
 Frame(sandbox:///{}/00000004/[60])]

So you can pick states according to the newest (last) model. (This will be moved to the Brain). And since we want trajectories with these frames as starting points there is also a function for that


In [51]:
trajectories = project.new_ml_trajectory(length=100, number=4, engine=engine)
trajectories


Out[51]:
[Trajectory(Frame(sandbox:///{}/00000000/[50]) >> 00000007[0..100]),
 Trajectory(Frame(sandbox:///{}/00000006/[90]) >> 00000008[0..100]),
 Trajectory(Frame(sandbox:///{}/00000006/[20]) >> 00000009[0..100]),
 Trajectory(Frame(sandbox:///{}/00000006/[70]) >> 00000010[0..100])]

Let's submit these before we finish this notebook with a quick discussion of workers


In [52]:
project.queue(trajectories)

The Worker objects

Worker instances execute tasks for you. If you did not stop the worker in the command line it will still be running and you can check its state. Here we use the m method to filter stored objects


In [53]:
project.trigger()
for w in project.workers.m('state','running'):
#for w in project.workers:
#    if w.state == 'running':
        print('[%s- %s] %s:%s' % (w.state, DT(w.seen).time, w.hostname, w.cwd))


[running- 13:32:24] sampion:/Users/osz

Okay, the worker is running, was last reporting its heartbeat at ... and has a hostname and current working directory (where it was executed from). The generators specify which tasks from some generators are executed. If it is None then the worker runs all tasks it finds. You can use this to run specific workers for models and some for trajectory generation.

You can also control it remotely by sending it a command. shutdown will shut it down for you.


In [57]:
# project.workers.last.command = 'shutdown'
# Note that waiting for all trajs to exist is not
# generally a good progress mechanism
#  --> no break on task failure
project.wait_until(lambda: all([tj.exists for tj in trajectories]))
project.workers.all.execute("shutdown")


Out[57]:
[None, None, None]

Afterwards you need to restart you worker to continue with this examples.


In [55]:
project.close()

In [ ]: