First we cover some basics about adaptive sampling to get you going.
We will briefly talk about how to set up a resource, create files and task generators, and run a first simulation.
In [1]:
import sys, os
Alright, let's load the package and pick the Project class, since we want to start a project.
In [2]:
from adaptivemd import Project
Let's open a project with a UNIQUE name. This will be the name used in the DB, so make sure it is new and not too short. Opening a project will always create a non-existing project and reopen an existing one. You cannot choose between opening types as you would with a file. This is a precaution so you do not accidentally delete your project.
In [3]:
# Use this to completely remove the tutorial project from the database.
Project.delete('tutorial')
In [4]:
project = Project('tutorial')
Now we have a handle for our project. First thing is to set it up to work on a resource.
A Resource specifies a shared filesystem with one or more clusters attached to it. This can be your local machine, a regular cluster, or even a group of clusters that can access the same FS (like Titan, Eos and Rhea do).
Once you have chosen a place to store your results, it is set for the project and should not be altered afterwards, since all file references are made to match this resource.
Let us pick a local resource on your laptop or desktop machine for now; no cluster / HPC involved yet.
In [6]:
from adaptivemd import LocalResource
We now create the Resource object
In [7]:
resource = LocalResource()
Since this object defines the path where all files will be placed, let's get the path to the shared folder, the one that can be accessed from all workers. On your local machine this is trivially the case.
In [8]:
resource.shared_path
Out[8]:
Okay, files will be placed in $HOME/adaptivemd/. You can change this using an option when creating the Resource:
LocalResource(shared_path='$HOME/my/adaptive/folder/')
If you are interested in more information about Resource setup, consult the documentation about Resource.
Last, we save our configured Resource and initialize our empty project with it. This is done once for a project and should not be altered afterwards.
In [17]:
project.initialize(resource)
In [18]:
from adaptivemd import File, Directory
First we define a File object. Instead of just a string, these are used to represent files anywhere, on the cluster or in your local application. There are some subclasses or extensions of File that have additional meta information, like Trajectory or Frame. The underlying base object of a File is called a Location.
We start with a first PDB file that is located on this machine at a relative path.
In [21]:
pdb_file = File('file://../files/alanine/alanine.pdb')
File, like any complex object in adaptivemd, can have a .name attribute that makes it easier to find later. You can either set the .name property after creation, or use the little helper method .named() to get a one-liner. This function will set .name and return the object itself.
For more information about the possibilities to specify file locations, consult the documentation for File.
In [ ]:
pdb_file.name = 'initial_pdb'
The call to .load() is important. It causes the File object to load the content of the file, and if you save the File object, the actual file content is stored with it. This way it can simply be rewritten on the cluster or anywhere else.
In [ ]:
pdb_file.load()
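For reference, the creation, naming and loading can be chained into a single line. This sketch is equivalent to the separate cells above:
pdb_file = File('file://../files/alanine/alanine.pdb').named('initial_pdb').load()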
TaskGenerators are instances whose purpose is to create tasks to be executed. This is similar to the way Kernels work. A TaskGenerator will generate Task objects for you, which will be translated into a ComputeUnitDescription and executed. In simple terms: the task generator creates the bash scripts for you that run a simulation or run PyEMMA.
A task generator will be initialized with all parameters needed to make it work, and it will know what needs to be staged to be used.
In [48]:
from adaptivemd.engine.openmm import OpenMMEngine
A task generator that will create jobs to run simulations. Currently it uses a little Python script that will execute OpenMM. It requires conda to be added to the PATH variable, or at least OpenMM to be installed on the cluster. If you set up your resource correctly, this should all happen automatically.
So let's do an example for an OpenMM engine. This is simply a small Python script that makes OpenMM look like an executable. It runs a simulation given an initial frame, OpenMM-specific system.xml and integrator.xml files, and some additional parameters like the platform name, how often to store simulation frames, etc.
In [49]:
engine = OpenMMEngine(
pdb_file=pdb_file,
system_file=File('file://../files/alanine/system.xml').load(),
integrator_file=File('file://../files/alanine/integrator.xml').load(),
args='-r --report-interval 1 -p CPU'
).named('openmm')
We now have an OpenMMEngine which uses the previously made pdb File object and the location defined in there. The same goes for the OpenMM XML files, plus some args to run using the CPU platform, etc.
Last, we name the engine openmm to find it later.
In [50]:
engine.name
Out[50]:
Next, we need to set the output types we want the engine to generate. We choose a stride of 10 for the master trajectory without any selection, and a second trajectory with only protein atoms at native stride.
Note that the stride and all frame numbers ALWAYS refer to the native steps used in the engine. In our example the engine uses 2fs time steps, so master stores a frame every 20fs and protein every 2fs.
In [51]:
engine.add_output_type('master', 'master.dcd', stride=10)  # all atoms, every 10 steps (20fs)
engine.add_output_type('protein', 'protein.dcd', stride=1, selection='protein')  # protein atoms only, every step (2fs)
In [52]:
from adaptivemd.analysis.pyemma import PyEMMAAnalysis
The instance to compute an MSM model of existing trajectories that you pass to it. It is initialized with a .pdb file that is used to create features between the $c_\alpha$ atoms. This implementation requires a PDB, but in general this is not necessary; it is specific to this PyEMMAAnalysis show case.
In [53]:
modeller = PyEMMAAnalysis(
engine=engine,
outtype='protein',
features={'add_inverse_distances': {'select_Backbone': None}}
).named('pyemma')
Again we name it pyemma for later reference.
The other options choose which output type from the engine we want to analyse. We chose the protein trajectories since these are faster to load and have better time resolution.
The features dict expresses which features to use. In our case, all inverse distances between backbone c_alpha atoms.
The next step is to add these to the project for later usage. We pick the .generators store and just add them. Consider a store to work like a set() in Python: it contains each object only once and is not ordered. Therefore we need a name to find the objects later. Of course you can always iterate over all objects, but the order is not guaranteed.
To be precise, there is an order given by the time of creation of the object, but it is only accurate to seconds, and it really is the time the object was created, not stored.
In [54]:
project.generators.add(engine)
project.generators.add(modeller)
Note that you cannot add the same engine twice. But if you create a new engine, it will be considered different and hence you can store it again.
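Since a store is not ordered, the simplest way to get a generator back later is by the name we gave it. A minimal sketch using plain iteration over the store, as described above:
# retrieve the engine again by its name
engine_again = next(g for g in project.generators if g.name == 'openmm')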
Finally we are ready to run a first trajectory that we will store as a point of reference in the project. Also it is nice to see how it works in general.
We are using a Worker approach. This simply means that someone (in our case the user, from inside a script or a notebook) creates a list of tasks to be done, and some other instance (the worker) will actually do the work.
First we create the parameters for the engine to run the simulation. Since it seemed appropriate, we use a Trajectory object (a special File with an initial frame and a length) as the input. You could of course pass these things separately, but this way we can actually reference the not yet existing trajectory and do stuff with it.
A Trajectory should have a unique name, and so there is a project function to get you one. It uses numbers and makes sure that each number has not been used yet in the project.
In [56]:
trajectory = project.new_trajectory(engine['pdb_file'], 100, engine)
trajectory
Out[56]:
This says: the initial frame is alanine.pdb, it runs for 100 frames, and it is named xxxxxxxx.dcd.
You might wonder why a Trajectory object is necessary. You could just build a function that takes these parameters and runs a simulation. At the end it would return the trajectory object, the same object we created just now.
The main reason is to familiarize you with the general concept of asynchronous execution and so-called Promises. The trajectory object we built is similar to a Promise, so what is that exactly?
A Promise is a value (or an object) that represents the result of a function at some point in the future. In our case it represents a trajectory at some point in the future. Normal Promises have specific functions to deal with the unknown result; for us this is a little different, but the general concept stands. We create an object that represents the specification of a Trajectory and so, regardless of its existence, we can use the trajectory as if it already existed:
Get the length
In [61]:
print(trajectory.length)
and since the length is fixed, we know how many frames there are and can access them
In [64]:
print(trajectory[20])
ask for a way to extend the trajectory
In [65]:
print(trajectory.extend(100))
ask for a way to run the trajectory
In [66]:
print(trajectory.run())
We can ask to extend it, we can save it. We can reference specific frames in it before running a simulation. You could even build a whole set of related simulations this way without running a single frame. You might understand that this is pretty powerful, especially in the context of running asynchronous simulations.
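As a small sketch of what this could look like: the frame reference below points into a trajectory that does not exist yet, and the second trajectory description is built from it without running a single step (second_trajectory is just an illustrative name):
# pick frame 20 of the not-yet-existing trajectory and describe a
# follow-up trajectory of 100 frames starting from that frame
frame = trajectory[20]
second_trajectory = project.new_trajectory(frame, 100, engine)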
Last, we did not answer why we have two separate steps: create the trajectory first and then a task from it. The main reason is educational: it needs to be clear that a Trajectory can exist before running some engine or creating a task for it. The Trajectory is not the result of a simulation action.
Now, we want this trajectory to actually exist, so we have to make it. This requires a Task object that knows how to describe a simulation. Since Task objects are very flexible and can be complex, there are helper functions (i.e. factories) to create them in an easy manner, like the ones we already created just before. Let's use the OpenMM engine to create an OpenMM task now.
In [57]:
task = engine.run(trajectory)
As an alternative, you can directly use the trajectory (which knows its engine) and call .run():
In [58]:
task = trajectory.run()
That's it, just take a trajectory description and turn it into a task that contains the shell commands and needed files, etc.
Finally we need to add this task to the things we want to be done. This is easy and only requires saving the task to the project. This adds it to the project.tasks bundle, and once it has been stored it can be picked up by any worker to execute it.
In [32]:
project.queue(task) # shortcut for project.tasks.add(task)
That is all we can do from here. To execute the tasks you need to run a worker using
adaptivemdworker -l tutorial --verbose
Once this is done, come back here and check your results. If you want, you can block until the task has been completed, as sketched below.
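Such a blocking cell could look like this (a sketch, assuming project.wait_until and the task's is_done condition, as used in the adaptivemd examples):
project.wait_until(task.is_done)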
In [33]:
print(project.files)
print(project.trajectories)
and close the project.
In [27]:
project.close()
The final project.close() will close the DB connection.