AdaptiveMD

Example 1 - Setup

0. Imports


In [1]:
import sys, os

RP (RADICAL-Pilot) reports a lot of information by default. For this example we silence it by setting an environment variable so that only errors are shown. If you want to see what RP reports, change the value to REPORT.


In [2]:
# verbose = os.environ.get('RADICAL_PILOT_VERBOSE', 'REPORT')
os.environ['RADICAL_PILOT_VERBOSE'] = 'ERROR'
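If you only want to change the setting for part of a session, one option is a small context manager that restores the previous value afterwards. This is a sketch using only the standard library; the helper name rp_verbosity is hypothetical and not part of AdaptiveMD or RP, and whether RP re-reads the variable after it has been imported is not covered here.

```python
import os
from contextlib import contextmanager


@contextmanager
def rp_verbosity(level):
    """Temporarily set RADICAL_PILOT_VERBOSE (hypothetical helper)."""
    key = 'RADICAL_PILOT_VERBOSE'
    previous = os.environ.get(key)  # None if the variable was unset
    os.environ[key] = level
    try:
        yield
    finally:
        # restore the earlier state, whatever it was
        if previous is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = previous


with rp_verbosity('ERROR'):
    # code that should run with reduced reporting goes here
    pass
```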

We will import the appropriate parts from AdaptiveMD as we go along so it is clear what is needed at each stage. Usually you will have the block of imports at the beginning of your script or notebook, as suggested in PEP8.


In [3]:
from adaptivemd import Project


/Users/jan-hendrikprinz/anaconda/lib/python2.7/site-packages/radical/utils/atfork/stdlib_fixer.py:58: UserWarning: logging module already imported before fixup.
  warnings.warn('logging module already imported before fixup.')

Let's open a project with a UNIQUE name. This will be the name used in the DB, so make sure it is new and not too short. Opening a project will always create a project that does not exist yet and reopen an existing one. You cannot choose between opening modes as you would with a file. This is a precaution against accidentally deleting your project.


In [4]:
project = Project('test')

Now we have a handle for our project. The first step is to set it up to work on a resource.

1. Set the resource

What is a resource? A Resource specifies a shared filesystem with one or more clusters attached to it. This can be your local machine, a regular cluster, or even a group of clusters that can access the same FS (like Titan, Eos and Rhea do).

Once you have chosen where to store your results this way, the choice is fixed for the project and cannot (or at least should not) be altered, since all file references are made to match this resource. Currently you can use the FU Berlin Allegro cluster or run locally. There are two specific local adaptations that already include the path to your conda installation. This simplifies the use of openmm or pyemma.

Let us pick a local resource on a laptop for now.


In [5]:
# import the resource classes used in the next cell
from adaptivemd import LocalJHP, LocalSheep, AllegroCluster

In [6]:
resource_id = 'local.jhp'

if resource_id == 'local.jhp':
    project.initialize(LocalJHP())
elif resource_id == 'local.sheep':
    project.initialize(LocalSheep())
elif resource_id == 'fub.allegro':
    project.initialize(AllegroCluster())

TaskGenerators

TaskGenerators are instances whose purpose is to create tasks to be executed. This is similar to the way Kernels work. A TaskGenerator will generate Task objects for you which will be translated into a ComputeUnitDescription and executed. In simple terms:

The task generator creates the bash scripts for you that run a simulation or run pyemma.

A task generator is initialized with all the parameters it needs to work, and it knows which files need to be staged before it can be used.


In [7]:
from adaptivemd.engine.openmm import OpenMMEngine
from adaptivemd.analysis.pyemma import PyEMMAAnalysis

from adaptivemd import File, Directory

The engine

The engine is a task generator that creates jobs to run simulations. Currently it uses a little python script that executes OpenMM. It requires conda to be added to the PATH variable, or at least openmm to be installed on the cluster. If you set up your resource correctly, this should all happen automatically.

First we define a File object. These are used to represent files anywhere, on the cluster or on your local machine. A File, like any complex object in adaptivemd, can have a .name attribute that makes it easier to find later.


In [8]:
pdb_file = File('file://../files/alanine/alanine.pdb').named('initial_pdb')

Here we used a special prefix that can point to specific locations.

  • file:// points to files on your local machine.
  • unit:// specifies files in the current working directory of the executing node. Usually these are temporary files for a single execution.
  • shared:// specifies the root shared FS directory (e.g. NO_BACKUP/ on Allegro). Use this to import and export files that are already on the cluster.
  • staging:// a special scheduler-specific directory to which files are moved after they are completed on a node and which should be used for later access. Use this to refer to files that should be stored or reused. After an execution is done you usually move all important files to this place.
  • sandbox:// this should not concern you and is a special RP folder where all pilot/session folders are located.
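To make the prefix convention concrete, here is a minimal sketch that splits a location string into its prefix and path. The helper split_location is hypothetical and only illustrates the naming scheme listed above; it is not how AdaptiveMD resolves locations internally.

```python
# Prefixes taken from the list above; the helper itself is hypothetical.
KNOWN_PREFIXES = ('file', 'unit', 'shared', 'staging', 'sandbox')


def split_location(url):
    """Split 'prefix://path' into (prefix, path)."""
    prefix, sep, path = url.partition('://')
    if not sep or prefix not in KNOWN_PREFIXES:
        raise ValueError('unknown location: %r' % url)
    return prefix, path


print(split_location('file://../files/alanine/alanine.pdb'))
# ('file', '../files/alanine/alanine.pdb')
```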

So let's do an example for an OpenMM engine. This is simply a small python script that makes OpenMM look like an executable. It runs a simulation given an initial frame, OpenMM-specific system.xml and integrator.xml files, and some additional parameters like the platform name, how often to store simulation frames, etc.


In [9]:
engine = OpenMMEngine(
    pdb_file=pdb_file,
    system_file=File('file://../files/alanine/system.xml'),
    integrator_file=File('file://../files/alanine/integrator.xml'),
    args='-r --report-interval 1 -p CPU --store-interval 1'
).named('openmm')

To explain: we now have an OpenMMEngine that uses the previously created PDB File object and the location defined in it. The same goes for the File objects for the OpenMM XML files. The args tell the engine to store every frame (to keep the example fast) and to run on the CPU platform.

Last we name the engine openmm to find it later.


In [10]:
engine.name


Out[10]:
'openmm'

The modeller

The modeller is the instance that computes an MSM from the existing trajectories that you pass to it. It is initialized with a .pdb file that is used to create features between the $c_\alpha$ atoms. This implementation requires a PDB file, but in general this is not necessary. It is specific to my PyEMMAAnalysis showcase.


In [11]:
modeller = PyEMMAAnalysis(
    pdb_file=pdb_file
).named('pyemma')

Again we name it pyemma for later reference.

Add generators to project

The next step is to add these to the project for later use. We pick the .generators store and just add them. Think of a store as working like a set() in python: it contains each object only once and is not ordered. Therefore we need a name to find the objects later. Of course you can always iterate over all objects, but the order is not guaranteed.

To be precise, there is an order given by the creation time of each object, but it is only accurate to the second, and it records when the object was created, not when it was stored.


In [12]:
import datetime
datetime.datetime.fromtimestamp(modeller.__time__).strftime("%Y-%m-%d %H:%M:%S")


Out[12]:
'2017-02-27 00:24:04'

In [13]:
project.generators.add(engine)
project.generators.add(modeller)

In [14]:
print project.generators


<StoredBundle with 0 file(s) @ 0x11f942810>

Note that you cannot add the same engine twice. If you create a new engine, however, it is considered a different object and can be stored again.
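The set-like behaviour and the lookup by name can be illustrated with a toy container. This is only a sketch of the semantics described above; ToyStore and Named are hypothetical classes and do not reflect how AdaptiveMD implements its stores.

```python
class ToyStore(object):
    """Set-like container with lookup by .name (illustration only)."""

    def __init__(self):
        self._items = set()

    def add(self, obj):
        # a set keeps each object once; adding it again changes nothing
        self._items.add(obj)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, name):
        # unordered, so we search by name instead of by position
        for obj in self._items:
            if getattr(obj, 'name', None) == name:
                return obj
        raise KeyError(name)


class Named(object):
    """Stand-in for a generator that carries a name."""

    def __init__(self, name):
        self.name = name


store = ToyStore()
engine = Named('openmm')
store.add(engine)
store.add(engine)             # same object: stored only once
store.add(Named('openmm2'))   # a newly created object counts as different
store.add(Named('pyemma'))
print(len(store))             # 3
print(store['pyemma'].name)   # pyemma
```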

Create one initial trajectory

Finally we are ready to run a first trajectory that we will store as a point of reference in the project. Also it is nice to see how it works in general.

1. Open a scheduler

A scheduler is a job on the cluster that executes tasks.

The .get_scheduler function delegates to the resource and uses the get_scheduler function defined there. This is merely a convenience, since a Scheduler is responsible for opening queues on the resource for you.

You have the same options as the queue has in the resource. This is often the number of cores and walltime, but can be additional ones, too.

Let's open the default queue and use a single core for it since we only want to run one simulation.


In [15]:
scheduler = project.get_scheduler(cores=1)

Next we create the parameter for the engine to run the simulation. Since it seemed appropriate, we use a Trajectory object (a special File with an initial frame and a length) as the input. You could of course pass these things separately, but this way we can actually reference the not yet existing trajectory and work with it.

A Trajectory should have a unique name and so there is a project function to get you one. It uses numbers and makes sure that this number has not been used yet in the project.


In [16]:
trajectory = project.new_trajectory(engine['pdb_file'], 100)
trajectory


Out[16]:
Trajectory('alanine.pdb' >> 00000000.dcd[0..100])

This says that the initial frame is alanine.pdb, that the simulation runs for 100 frames, and that the trajectory is named xxxxxxxx.dcd.
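The numbered naming scheme can be sketched as follows. This is a hypothetical helper that only mimics the zero-padded names seen in the output above; how project.new_trajectory actually keeps track of used numbers may differ.

```python
def next_trajectory_name(used_names, extension='.dcd'):
    """Return the first zero-padded, 8-digit name not in used_names."""
    number = 0
    while True:
        name = '%08d%s' % (number, extension)
        if name not in used_names:
            return name
        number += 1


print(next_trajectory_name(set()))
# 00000000.dcd
print(next_trajectory_name({'00000000.dcd', '00000001.dcd'}))
# 00000002.dcd
```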

Now we want this trajectory to actually exist, so we have to make it (on the cluster, which is waiting for things to do). For that we need a Task object that runs a simulation. Since Task objects are very flexible, there are helper functions that set them up to do what you want, like the generators we created just before. Let's use the openmm engine to create an openmm task.


In [17]:
task = engine.task_run_trajectory(trajectory)

That's it: just take a trajectory description and turn it into a task that contains the shell commands, the needed files, etc.

The last step is to actually run the task. You can either call the scheduler like a function or use its .submit() method.


In [18]:
scheduler(task)


Out[18]:
[<adaptivemd.task.Task at 0x1214190d0>]

Now we have to wait. To see whether we are done, you can ask the scheduler if it is still running tasks.


In [26]:
scheduler.is_idle


* unit.000000  state Failed (None), out/err:  / 
task did not complete
Out[26]:
True

In [35]:
print scheduler.generators


<StoredBundle with 0 file(s) @ 0x11f942810>

Alternatively, you can wait until it becomes idle using .wait().


In [27]:
# scheduler.wait()
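If you prefer to poll yourself, for example to add a timeout, a loop like the following would do. It is a sketch that assumes only the is_idle attribute shown above; wait_until_idle, its timeout parameter and FakeScheduler are hypothetical and exist here just to make the example runnable without a cluster.

```python
import time


def wait_until_idle(scheduler, poll_interval=1.0, timeout=None):
    """Poll scheduler.is_idle until it is True; return False on timeout."""
    start = time.time()
    while not scheduler.is_idle:
        if timeout is not None and time.time() - start > timeout:
            return False
        time.sleep(poll_interval)
    return True


class FakeScheduler(object):
    """Stand-in that reports idle after a fixed number of checks."""

    def __init__(self, busy_checks):
        self._remaining = busy_checks

    @property
    def is_idle(self):
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


print(wait_until_idle(FakeScheduler(3), poll_interval=0.01))
# True
```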

If all went as expected we will now have our first trajectory.


In [28]:
print project.files
print project.trajectories


<StoredBundle with 0 file(s) @ 0x11f942850>
<ViewBundle with 0 file(s) @ 0x11f942890>

Excellent, so let's clean up and close our queue


In [67]:
scheduler.exit()

and close the project.


In [68]:
project.close()

The final project.close() will also shut down all open schedulers for you, so the exit command would not be necessary here. It is relevant if you want to exit the queue as soon as possible to save walltime.

Summary

You have now created an AdaptiveMD project and run your first trajectory. Since the project now exists, it is much easier to run more trajectories.

