This notebook is an introduction to handling PathMover and MoveChange instances. It mostly covers the questions on
Load OpenPathSampling
In [1]:
import openpathsampling as p
Let's open a file that contains a mover and lots of changes
In [2]:
st = p.storage.Storage('_toy_retis.nc', mode='r')
A Simulator creates simulation steps. So we load a single step.
In [3]:
mc = st.steps[3]
print mc
Each step possesses a MoveChange, which we will get
In [17]:
pmc = mc.change
print pmc
And the mover that was used to generate this change
In [18]:
pm = pmc.mover
print pm.treeprint()
Let's first check if the pmc was really created by the pathmover
In [19]:
pmc in pm
Out[19]:
This should be obvious, but under the hood some more fancy stuff is happening. The in keyword with pathmovers and changes actually works on trees of movers. These trees represent a specific order of movers being called, and these trees are unique for changes. This means there is a way to label each possible pmc that can be generated by a mover. Here "possible" only refers to choices between actual movers.
This is due to special PathMovers that can pick one or more movers among a set of possibilities. The simplest of these is the OneWayShootingMover, which chooses with equal probability between forward and backward shooting. So a OneWayShootingMover has two possible outcomes. Let's look at these
In [20]:
ow_mover = p.OneWayShootingMover([], [])  # we use dummy arguments since we are not going to use it
The property enum will enumerate the (potentially VERY large) set of all possible changes
In [21]:
list(ow_mover.enum)
Out[21]:
which is two in our case. Another example
In [22]:
ow_mover_2 = p.RandomChoiceMover([
    p.OneWayShootingMover([], []),
    p.OneWayShootingMover([], [])
])
In [23]:
list(ow_mover_2.enum)
Out[23]:
Now we have 4 choices as expected: 2 for the RandomChoiceMover times 2 for the OneWayShootingMover. Although effectively there are only 2 distinct ones. Why is that? The problem with ow_mover_2 is that we defined two separate instances of the OneWayShootingMover, and in general different instances are considered different. In this case it might even be possible to check for equality, but we decided to leave this to the user: if you create two instances, we assume there is a reason, maybe just to name them differently (although that makes no sense for the Monte Carlo scheme).
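The instance-identity point can be sketched with a toy enumeration in plain Python. These classes are illustrative stand-ins, not the OPS classes: they only mimic how distinct instances multiply the enumerated outcomes while a reused instance collapses them.

```python
class Shooter(object):
    """Toy stand-in for a OneWayShootingMover: two possible outcomes."""
    def enum(self):
        yield (self, 'forward')
        yield (self, 'backward')

class Choice(object):
    """Toy stand-in for a RandomChoiceMover: picks one of its submovers."""
    def __init__(self, movers):
        self.movers = movers
    def enum(self):
        # every possible (chooser, submover-outcome) combination
        for m in self.movers:
            for outcome in m.enum():
                yield (self, outcome)

shared = Shooter()
two_instances = Choice([Shooter(), Shooter()])
one_instance = Choice([shared, shared])

print(len(set(two_instances.enum())))  # 4: each instance counts separately
print(len(set(one_instance.enum())))   # 2: duplicate paths collapse
```

Because default Python objects compare by identity, the two-instance version yields four distinct outcome tuples while the shared-instance version yields only two, mirroring the ow_mover_2 vs ow_mover_3 behavior.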
The next example uses the same mover twice
In [24]:
ow_mover_3 = p.RandomChoiceMover([
    ow_mover,
    ow_mover
])
In [25]:
list(ow_mover_3.enum)
Out[25]:
And we get only two distinct changes, as we wanted.
Let's look at the real mover
In [26]:
all_changes = list(pm.enum)
print len(all_changes)
This one has 31. Let's see if we can reconstruct that number. The pathmover first chooses between 4 different move types.
In total we have 12 + 6 + 5 + 8 = 31 as enum predicted.
Note that this is only the maximal possible number of moves, not the number of really accessible moves. It could be that in a conditional mover some moves are always run in sequence. E.g. in our case the subtrajectory selection should always work, and hence the first 2 choices of the minus mover will never happen. It will always try a swap (which can fail if, by chance, the trajectory in the innermost interface crosses away from the state). So the effective number of choices is 29, not 31.
Let's do some more testing
First let's check if the keys we get for all changes in the storage are actually contained in the mover, or, put more simply, whether the mover could have generated these changes. This should be true for all changes that originate from our mover.
In [27]:
print [pc in pm for pc in st.movechanges[0:20]]
What happened here? Why were only some changes in the storage made by our mover? Shouldn't all of them (except the 2 in the beginning) be the result of our pathmover? Yes and no. They are, but we are checking whether the total change was made by the mover, and we are also loading all subchanges of subchanges.
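A toy sketch of why subchanges fail the test, using plain tuples instead of the OPS containment logic (the tree layout and names here are illustrative assumptions): containment is checked against the root of the mover tree, and a stored subchange starts one level further down.

```python
# Toy mover tree: a root with two submovers (strings stand in for movers).
root_tree = ('PathMover', ('Shooting',), ('Minus',))

total_change = ('PathMover', ('Shooting',))  # full step: starts at the root
sub_change = ('Shooting',)                   # stored subchange: starts below

def could_be_generated_by(change, mover_tree):
    # minimal stand-in check: the change's head must match the tree's head
    return change[0] == mover_tree[0]

print(could_be_generated_by(total_change, root_tree))  # True
print(could_be_generated_by(sub_change, root_tree))    # False
```

The real check walks the whole tree, but the effect is the same: a subchange loaded on its own is not "in" the root mover even though its parent change is.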
In [28]:
print st.movechanges[2]
print st.movechanges[2] in pm
print
print st.movechanges[5]
print st.movechanges[5] in pm
We will now cache all changes to see if the next steps are really fast or not.
In [29]:
_ = list(st.movechanges)
_ = list(st.steps)
Now: How do we get all the changes that are really important? One way is to scan all changes and use only the ones from the mover
In [30]:
real_changes = filter(lambda x : x in pm, st.movechanges)
print len(real_changes), 'of', len(st.movechanges)
While this works, it would be better to rely on the steps in the simulation and not on the changes themselves. We might store other changes for whatever reason, but we only want the ones associated with an MC step. The steps in the storage do exactly that and point to the changes used to generate the next sampleset.
In [31]:
step_changes = [step.change for step in st.steps[1:]]
print len(step_changes)
We exclude the first step since it is the initialization which is not generated by our pathmover.
Let's now check which of the 31 possible changes were run, and how often.
In [32]:
import collections
counter = collections.defaultdict(lambda: 0)
for ch in step_changes:
    counter[ch.unique] += 1
In [36]:
s = '%d of %d different changes run' % (len(counter), len(list(pm.enum)))
print s, '\n', '-' * len(s), '\n'
for y in sorted(counter.items(), key=lambda x : -x[1]):
    print
    print y[1], 'x'
    print y[0].treeprint()
We see that especially minus moves are underrepresented. This is due to the standard weighting of the minus move, which runs minus moves much less frequently than other moves.
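A minimal sketch of such a weighted random choice in plain Python (not the OPS implementation; the 30:1 shooting-to-minus weight ratio is an assumption for illustration):

```python
import random
import collections

random.seed(1)

# Assumed weights: shooting is heavily favored over the minus move.
movers = [('shooting', 30.0), ('minus', 1.0)]
total = sum(w for _, w in movers)

counts = collections.Counter()
for _ in range(10000):
    r = random.random() * total  # uniform draw in [0, total)
    acc = 0.0
    for name, w in movers:
        acc += w
        if r < acc:  # pick the mover whose cumulative interval contains r
            counts[name] += 1
            break

print(counts['shooting'] > counts['minus'])  # minus moves are underrepresented
```

With these weights roughly 1 in 31 trials picks the minus move, which is why it shows up so rarely in the counter above.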
Let's do that again, picking random changes among all the possibilities. This samples only from the possibilities, without regard for acceptance etc. We only assume that all possible choices for a single mover are equally likely.
In [39]:
pmc_list = [pm.random() for x in xrange(10000)]
In [46]:
counter2 = collections.defaultdict(lambda: 0)
for ch in pmc_list:
    counter2[ch] += 1
In [55]:
s = '%d of %d different changes run' % (len(counter2), len(list(pm.enum)))
print s, '\n', '-' * len(s), '\n'
for y in sorted(counter2.items(), key=lambda x : -x[1]):
    print (100.0 * y[1]) / len(pmc_list), '%', repr(y[0])
We realize that a specific shooter is called less often than other movers.
Not sure if this is helpful. We could have additional weighting factors for the RandomChoiceMover, but each mover would have to know about them. The problem is that in most movers these weighting probabilities depend on the context and hence make no sense here!
One solution is to let a mover give a formal expression for each possible choice, since each mover will only depend on
- results of accepted/rejected submoves, and ultimately on whether certain samples are accepted, which in turn depends on .valid and .proposal_probability
- some random numbers (choices)
So the SampleGeneratingMovers will return something like [acc(samp[]), acc(samp[])], while other movers will return something that depends on the submovers; it could just be a list of products.
Each mover has an expression for being accepted.
Each mover has an expression for picking a specific choice.
Finally, .random() will use the product of all probabilities and a specific value for accepting.
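The product-of-probabilities idea can be sketched in plain Python (an assumed design for illustration, not the actual OPS API):

```python
def path_probability(choices):
    """Multiply (chosen_weight / total_weight) along one path of choices.

    Each entry is a (chosen, total) pair for one choice point in the
    mover tree; the product is the probability of that specific change.
    """
    p = 1.0
    for chosen, total in choices:
        p *= float(chosen) / float(total)
    return p

# e.g. a RandomChoiceMover picks the shooter (weight 30 of 31), then the
# OneWayShooter picks forward (1 of 2):
prob = path_probability([(30.0, 31.0), (1.0, 2.0)])
print(prob)
```

Acceptance would enter as one more factor per mover, so the full expression stays a simple product along the path through the tree.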
In [57]:
print pm.treeprint()
Get the (unique) location of the RandomChoiceMover. You can search for Mover classes, Mover instances, or by the .name property of a mover, which is a string.
In [23]:
loc_rc = pm.locate('RootMover')
print loc_rc
The location object is effectively a tree of mover instances described by nested tuples. For convenience it is wrapped to make searching easier and to format the output.
In [24]:
print type(loc_rc)
print isinstance(loc_rc, tuple)
print repr(loc_rc)
print str(loc_rc)
In most cases you can use python tuples instead of TupleTree. The structure of a tree of tuples looks as follows:
Each node has the following structure
(head, (child1, ), (child2, ), ...)
where head is the node content (almost always a Mover instance) and each child is again a node. It is important to note that a tuple tree does not support choices between different children like a real pathmover can. Therefore we can generate a tuple tree of the content of a pathmover, but it will not reflect the possible choices in it. To get the associated tuple tree of a change (which has no choice in it) we can use .unique.
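A small sketch of this layout with plain Python tuples (the mover names here are illustrative strings, not real instances):

```python
# (head, (child1, ...), (child2, ...)) nodes, nested to form a tree
tree = ('RootMover',
        ('RandomChoiceMover',
         ('OneWayShootingMover', ('ForwardShootMover',)),
         ('OneWayShootingMover', ('BackwardShootMover',))))

def heads(node):
    """Collect node contents depth-first."""
    found = [node[0]]
    for child in node[1:]:
        found.extend(heads(child))
    return found

print(heads(tree))
```

Walking the tuples this way recovers the call order of the movers, which is exactly the information a change's .unique key encodes.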
In [25]:
print pmc
print repr(pmc.unique)
Instead of checking for a pmc directly we can also check for the tuple tree representation
In [26]:
print pmc
print pmc in pm # check if pmc could have been generated by pm
print pmc.unique in pm # check if the tuple tree representation could have been generated by pm
Now get the mover at the loc_rc location (which is of course a RandomChoiceMover).
In [27]:
rc = pm[loc_rc]
These are the weights by mover
In [28]:
dict(zip(rc.movers, rc.weights))
Out[28]:
So a shooting move is about 30x more likely than a minus move.
Finally do some more checks
In [29]:
print rc in pmc # check if the RandomChoiceMover was called in pmc
print loc_rc in pm # check if the pathmover has a RandomChoiceMover at that position in the tree
Note that certain movers can be arranged differently in a tree but result in the same possible steps, like
Sequential([Forward, Sequential([Forward, Forward])]) and Sequential([Forward, Forward, Forward]). We can regard this as associativity (placing arbitrary brackets), but we will not check for these types of equality. We assume that you arrange steps in a certain way for a reason and that your grouping of sequences reflects a certain logical idea. On the other hand, your locators / keys do depend on that choice!
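The associativity point can be sketched with plain tuple trees (strings stand in for movers; this is a toy, not the OPS containment logic):

```python
# Two arrangements of the same three Forward moves.
nested = ('Sequential',
          ('Forward',),
          ('Sequential', ('Forward',), ('Forward',)))
flat = ('Sequential', ('Forward',), ('Forward',), ('Forward',))

def leaf_moves(node):
    """Flatten a tuple tree to its sequence of leaf moves."""
    if len(node) == 1:          # a leaf: an actual move
        return [node[0]]
    moves = []
    for child in node[1:]:
        moves.extend(leaf_moves(child))
    return moves

print(leaf_moves(nested) == leaf_moves(flat))  # True: same effective moves
print(nested == flat)                          # False: different trees/keys
```

Both arrangements run Forward three times, but their tuple trees differ, so a locator built from one arrangement will not match the other.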
What else can we learn or ask for?
We can check if a particular mover was run. Let's pick the very last MinusExtensionDirectionChooser. To check for it, we can either just check for the instance in a pmc
In [30]:
loc_medc = pm.locate('MinusExtensionDirectionChooser')
medc = pm[loc_medc]
print medc
In [31]:
%%time
for pc in step_changes:
    if medc in pc:
        print repr(pc)
        print pc.unique
        print
The expression locator in pmc checks whether a specific mover was run at that particular position in the tree, while mover_instance in pmc checks for the appearance of that mover instance independent of the location. If a particular mover instance appears only once, both methods are equivalent.
In [32]:
print pmc
print loc_medc in pm
print medc in pm
print loc_medc in pmc
print medc in pmc
In this case the MinusExtensionDirectionChooser was not called in this particular pmc.
.unique
In [33]:
first_minus_change = filter(lambda x : p.MinusMover in x, step_changes)[0]
In [34]:
print first_minus_change.unique
will be interpreted as
In [36]:
print first_minus_change.unique
In [47]:
pm.map_tree(lambda x : len(x.name))
Out[47]:
In [38]:
%%timeit
pm.enum
In [39]:
%%timeit
pmc in pm
In [40]:
%%timeit
[p in pmc for p in pm.enum]
In [41]:
%%timeit
pm in pm
In [42]:
%%timeit
pmc.unique in pm