In [8]:
# if our large test file is available, use it. Otherwise, use file generated from toy_mstis_2_run.ipynb
import os
test_file = "../toy_mstis_1k_OPS1.nc"
filename = test_file if os.path.isfile(test_file) else "mstis.nc"
print('Using file `%s` for analysis' % filename)


Using file `../toy_mstis_1k_OPS1.nc` for analysis

Analyzing the MSTIS simulation

Included in this notebook:

  • Opening files for analysis
  • Rates, fluxes, total crossing probabilities, and conditional transition probabilities
  • Per-ensemble properties such as path length distributions and interface crossing probabilities
  • Move scheme analysis
  • Replica exchange analysis
  • Replica move history tree visualization
  • Replaying the simulation
  • MORE TO COME! Like free energy projections, path density plots, and more

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import openpathsampling as paths
import numpy as np

The best way to open storage depends on whether you are running production or analysis. For analysis, open the file as an AnalysisStorage object, which makes the analysis much faster.


In [3]:
%%time
storage = paths.AnalysisStorage(filename)


CPU times: user 18.8 s, sys: 797 ms, total: 19.5 s
Wall time: 20 s

In [4]:
mstis = storage.networks.first

Reaction rates

TIS methods are especially good at determining reaction rates, and OPS makes it extremely easy to obtain the rate from a TIS network.

Note that, although you can get the rate directly, it is very important to look at other results of the sampling (illustrated in this notebook and in notebooks referred to herein) in order to check the validity of the rates you obtain.

By default, the built-in analysis calculates histograms of the maximum value of some order parameter and of the path length for every sampled ensemble. You can add other quantities to this list, but you must always specify histogram parameters for these two. The path length is in units of frames.


In [5]:
mstis.hist_args['max_lambda'] = { 'bin_width' : 0.05, 'bin_range' : (0.0, 0.5) }
mstis.hist_args['pathlength'] = { 'bin_width' : 5, 'bin_range' : (0, 150) }
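
To make the two histogram parameters concrete, here is a minimal sketch of how a bin width and a bin range translate into bin edges. The helper `bin_edges` is hypothetical and not part of OPS; the way OPS bins values internally may differ in detail.

```python
def bin_edges(bin_width, bin_range):
    """Return evenly spaced bin edges covering bin_range (hypothetical helper)."""
    low, high = bin_range
    # round() guards against floating-point error in the division
    n_bins = int(round((high - low) / bin_width))
    return [low + i * bin_width for i in range(n_bins + 1)]

# the same parameters as in the cell above
max_lambda_edges = bin_edges(0.05, (0.0, 0.5))  # 10 bins, 11 edges
pathlength_edges = bin_edges(5, (0, 150))       # 30 bins, 31 edges
```

A smaller bin width resolves the crossing probability more finely, but each bin then holds fewer samples and is noisier.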

In [6]:
%%time
mstis.rate_matrix(storage.steps[:], force=True)


/Users/jan-hendrikprinz/Studium/git/openpathsampling/openpathsampling/numerics/wham.py:334: RuntimeWarning: invalid value encountered in divide
  addends_k = np.divide(numerator_byQ, sum_over_Z_byQ)
/Users/jan-hendrikprinz/Studium/git/openpathsampling/openpathsampling/numerics/wham.py:407: RuntimeWarning: invalid value encountered in double_scalars
  output[val] = sum_k_Hk_Q[val] / sum_w_over_Z
CPU times: user 12.2 s, sys: 234 ms, total: 12.4 s
Wall time: 12.6 s
Out[6]:
                          {x|opA(x) in [0.0, 0.2]}  {x|opB(x) in [0.0, 0.2]}  {x|opC(x) in [0.0, 0.2]}
{x|opA(x) in [0.0, 0.2]}                       NaN               3.34326e-05               0.000105517
{x|opB(x) in [0.0, 0.2]}                0.00105543                       NaN               0.000595478
{x|opC(x) in [0.0, 0.2]}               2.11191e-05                         0                       NaN

The self-rates (the rates of returning to the initial state) are undefined, and are reported as not-a-number (NaN).

The rate is calculated according to the formula:

$$k_{AB} = \phi_{A,0} P(B|\lambda_m) \prod_{i=0}^{m-1} P(\lambda_{i+1} | \lambda_i)$$

where $\phi_{A,0}$ is the flux from state A through its innermost interface, $P(B|\lambda_m)$ is the conditional transition probability (the probability that a path which crosses the interface at $\lambda_m$ ends in state B), and $\prod_{i=0}^{m-1} P(\lambda_{i+1} | \lambda_i)$ is the total crossing probability. We can look at each of these terms individually.
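
As a worked illustration of the formula, the sketch below multiplies the three factors together. The function `tis_rate` and all of the numbers are hypothetical values chosen for easy arithmetic; they are not taken from this simulation and this is not how OPS computes the rate internally.

```python
def tis_rate(flux, interface_crossing_probs, conditional_transition_prob):
    """Combine flux, per-interface crossing probabilities, and the
    conditional transition probability into a rate (illustrative only)."""
    # total crossing probability: product of P(lambda_{i+1} | lambda_i)
    total_crossing_prob = 1.0
    for prob in interface_crossing_probs:
        total_crossing_prob *= prob
    return flux * conditional_transition_prob * total_crossing_prob

# hypothetical inputs: flux of 0.02 per unit time, three interfaces
rate = tis_rate(
    flux=0.02,
    interface_crossing_probs=[0.5, 0.2, 0.1],
    conditional_transition_prob=0.25,
)
# 0.02 * 0.25 * (0.5 * 0.2 * 0.1) = 5e-05
```

Because the rate is a product, a poorly converged estimate of any single factor carries directly into the rate, which is why each term is examined separately.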

Total crossing probability


In [8]:
stateA = storage.volumes["A"]
stateB = storage.volumes["B"]
stateC = storage.volumes["C"]

In [9]:
tcp_AB = mstis.transitions[(stateA, stateB)].tcp
tcp_AC = mstis.transitions[(stateA, stateC)].tcp
tcp_BC = mstis.transitions[(stateB, stateC)].tcp
tcp_BA = mstis.transitions[(stateB, stateA)].tcp
tcp_CA = mstis.transitions[(stateC, stateA)].tcp
tcp_CB = mstis.transitions[(stateC, stateB)].tcp

plt.plot(tcp_AB.x, tcp_AB, '-r')
plt.plot(tcp_CA.x, tcp_CA, '-k')
plt.plot(tcp_BC.x, tcp_BC, '-b')
plt.plot(tcp_AC.x, tcp_AC, '-g') # same as tcp_AB in MSTIS


Out[9]:
[<matplotlib.lines.Line2D at 0x12bf24f50>]

We normally look at these on a log scale:


In [10]:
plt.plot(tcp_AB.x, np.log(tcp_AB), '-r')
plt.plot(tcp_CA.x, np.log(tcp_CA), '-k')
plt.plot(tcp_BC.x, np.log(tcp_BC), '-b')
plt.xlim(0.0, 1.0)


Out[10]:
(0.0, 1.0)

If you want to know the total crossing probability at each interface (for example, to use as a bias in an SRTIS calculation):


In [10]:
interface_locations = [0.04, 0.09, 0.16, 0.25]
print("Out of A: %s" % [tcp_AB(x) for x in interface_locations])
print("Out of B: %s" % [tcp_BA(x) for x in interface_locations])
print("Out of C: %s" % [tcp_CA(x) for x in interface_locations])


Out of A: [1.1997344600007338, 1.1373174412505045, 1.0499336150001832, 0.56600000000000006]
Out of B: [1.166925673644593, 1.1147614006306577, 1.0417314184111481, 0.50700000000000001]
Out of C: [1.5249046304153164, 1.3608719334105301, 1.1312261576038289, 0.77400000000000002]

Flux

Here we also calculate the flux contribution to each transition. The flux is calculated based on


In [11]:
import pandas as pd
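# rows are initial states, columns are final states
flux_matrix = pd.DataFrame(columns=mstis.states, index=mstis.states)
for state_pair in mstis.transitions:
    transition = mstis.transitions[state_pair]
    # DataFrame.set_value is deprecated in newer pandas; .at sets a single cell
    flux_matrix.at[state_pair[0], state_pair[1]] = transition._flux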

flux_matrix


Out[11]:
                          {x|opA(x) in [0.0, 0.2]}  {x|opB(x) in [0.0, 0.2]}  {x|opC(x) in [0.0, 0.2]}
{x|opA(x) in [0.0, 0.2]}                       NaN                 0.0208817                 0.0208817
{x|opB(x) in [0.0, 0.2]}                 0.0158172                       NaN                 0.0158172
{x|opC(x) in [0.0, 0.2]}                 0.0277998                 0.0277998                       NaN

Conditional transition probability


In [12]:
outer_ctp_matrix = pd.DataFrame(columns=mstis.states, index=mstis.states)
for state_pair in mstis.transitions:
    transition = mstis.transitions[state_pair]
    outer_ctp_matrix.set_value(state_pair[0], state_pair[1], transition.ctp[transition.ensembles[-1]])    

outer_ctp_matrix


Out[12]:
                          {x|opA(x) in [0.0, 0.2]}  {x|opB(x) in [0.0, 0.2]}  {x|opC(x) in [0.0, 0.2]}
{x|opA(x) in [0.0, 0.2]}                       NaN                     0.205                     0.647
{x|opB(x) in [0.0, 0.2]}                     0.514                       NaN                      0.29
{x|opC(x) in [0.0, 0.2]}                     0.006                         0                       NaN

In [13]:
ctp_by_interface = pd.DataFrame(index=mstis.transitions)
for state_pair in mstis.transitions:
    transition = mstis.transitions[state_pair]
    for ensemble_i in range(len(transition.ensembles)):
        ctp_by_interface.set_value(
            state_pair, ensemble_i,
            transition.conditional_transition_probability(
                storage.steps,
                transition.ensembles[ensemble_i]
        ))
    
    
ctp_by_interface


Out[13]:
                                                          0      1      2
({x|opC(x) in [0.0, 0.2]}, {x|opB(x) in [0.0, 0.2]})  0.000  0.000  0.000
({x|opC(x) in [0.0, 0.2]}, {x|opA(x) in [0.0, 0.2]})  0.000  0.000  0.006
({x|opA(x) in [0.0, 0.2]}, {x|opB(x) in [0.0, 0.2]})  0.000  0.000  0.205
({x|opA(x) in [0.0, 0.2]}, {x|opC(x) in [0.0, 0.2]})  0.000  0.000  0.647
({x|opB(x) in [0.0, 0.2]}, {x|opC(x) in [0.0, 0.2]})  0.002  0.014  0.290
({x|opB(x) in [0.0, 0.2]}, {x|opA(x) in [0.0, 0.2]})  0.109  0.226  0.514
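
Conceptually, the conditional transition probability for an ensemble is the fraction of its sampled paths that end in a given state. The sketch below shows that counting step on a hypothetical list of final-state labels; the function name and the data are assumptions for illustration, not the OPS implementation.

```python
from collections import Counter

def conditional_transition_probabilities(final_states):
    """Fraction of sampled paths ending in each state (illustrative only)."""
    counts = Counter(final_states)
    n_paths = float(len(final_states))
    return {state: counts[state] / n_paths for state in counts}

# hypothetical final states of eight paths sampled in one interface ensemble
final_states = ["A", "A", "B", "A", "C", "B", "A", "A"]
ctp = conditional_transition_probabilities(final_states)
# ctp["A"] is 0.625, ctp["B"] is 0.25, ctp["C"] is 0.125
```

With few sampled paths the estimate is coarse: an entry of exactly 0 in the tables above may simply mean that no sampled path made that transition.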

Path ensemble properties


In [14]:
hists_A = mstis.transitions[(stateA, stateB)].histograms
hists_B = mstis.transitions[(stateB, stateC)].histograms
hists_C = mstis.transitions[(stateC, stateB)].histograms

Interface crossing probabilities

We obtain the total crossing probability, shown above, by combining the individual crossing probabilities of each interface ensemble.


In [15]:
hists = {'A': hists_A, 'B': hists_B, 'C': hists_C}
plot_style = {'A': '-r', 'B': '-b', 'C': '-k'}

In [16]:
for hist in [hists_A, hists_B, hists_C]:
    for ens in hist['max_lambda']:
        normalized = hist['max_lambda'][ens].normalized()
        plt.plot(normalized.x, normalized)



In [17]:
# add visualization of the sum

In [18]:
for hist_type in hists:
    hist = hists[hist_type]
    for ens in hist['max_lambda']:
        reverse_cumulative = hist['max_lambda'][ens].reverse_cumulative()
        plt.plot(reverse_cumulative.x, reverse_cumulative, plot_style[hist_type])
plt.xlim(0.0, 1.0)


Out[18]:
(0.0, 1.0)
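
The reverse cumulative histogram plotted above can be sketched in plain Python. For each bin it gives the fraction of paths whose maximum order parameter value falls in that bin or beyond, which is the crossing probability for that ensemble. The function below and its input counts are hypothetical, and it assumes normalization so that the first bin equals 1; the OPS histogram class may differ in its conventions.

```python
def reverse_cumulative(counts):
    """Fraction of samples in each bin or any later bin (illustrative only)."""
    total = float(sum(counts))
    running = 0
    result = []
    # accumulate from the last bin toward the first
    for count in reversed(counts):
        running += count
        result.append(running / total)
    result.reverse()
    return result

# hypothetical max_lambda histogram counts, ordered by increasing lambda
max_lambda_counts = [4, 3, 2, 1]
crossing_prob = reverse_cumulative(max_lambda_counts)
# crossing_prob is approximately [1.0, 0.6, 0.3, 0.1]
```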

In [19]:
for hist_type in hists:
    hist = hists[hist_type]
    for ens in hist['max_lambda']:
        reverse_cumulative = hist['max_lambda'][ens].reverse_cumulative()
        plt.plot(reverse_cumulative.x, np.log(reverse_cumulative), plot_style[hist_type])
plt.xlim(0.0, 1.0)


Out[19]:
(0.0, 1.0)

Path length histograms


In [20]:
for hist in [hists_A, hists_B, hists_C]:
    for ens in hist['pathlength']:
        normalized = hist['pathlength'][ens].normalized()
        plt.plot(normalized.x, normalized)



In [21]:
for ens in hists_A['pathlength']:
    normalized = hists_A['pathlength'][ens].normalized()
    plt.plot(normalized.x, normalized)


Sampling properties

The properties illustrated above are properties of the path ensembles. If your path ensembles are sufficiently well sampled, these properties do not depend on how you sample them.

To judge whether the sampling itself went well, however, you often want to look at properties of the sampling process. OPS makes these easy to obtain as well.


Move scheme analysis


In [22]:
scheme = storage.schemes[0]

In [23]:
scheme.move_summary(storage.steps)


pathreversal ran 25.626% (expected 24.88%) of the cycles with acceptance 211/256 (82.42%)
ms_outer_shooting ran 5.806% (expected 4.98%) of the cycles with acceptance 37/58 (63.79%)
shooting ran 42.743% (expected 44.78%) of the cycles with acceptance 293/427 (68.62%)
minus ran 2.703% (expected 2.99%) of the cycles with acceptance 25/27 (92.59%)
repex ran 23.123% (expected 22.39%) of the cycles with acceptance 55/231 (23.81%)
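
If you want to post-process these printed summaries, for example to flag movers with low acceptance, one option is to parse the text. The sketch below is a hypothetical helper built only on the line format shown above; it is not an OPS function, and OPS may offer more direct access to these numbers.

```python
import re

# pattern matching one printed summary line of the form shown above
SUMMARY_LINE = re.compile(
    r"^(?P<name>.+?) ran (?P<ran>[\d.]+)% \(expected (?P<expected>[\d.]+)%\) "
    r"of the cycles with acceptance (?P<accepted>\d+)/(?P<trials>\d+) "
    r"\((?P<acceptance>[\d.]+)%\)$"
)

def parse_summary_line(line):
    """Turn one summary line into a dict of numbers (hypothetical helper)."""
    match = SUMMARY_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized summary line: %r" % line)
    accepted = int(match.group("accepted"))
    trials = int(match.group("trials"))
    return {
        "name": match.group("name"),
        "ran_percent": float(match.group("ran")),
        "expected_percent": float(match.group("expected")),
        "accepted": accepted,
        "trials": trials,
        # recompute the acceptance ratio from the raw counts
        "acceptance": accepted / float(trials),
    }

parsed = parse_summary_line(
    "repex ran 23.123% (expected 22.39%) of the cycles "
    "with acceptance 55/231 (23.81%)"
)
```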

In [24]:
scheme.move_summary(storage.steps, 'shooting')


OneWayShootingMover Out A 2 ran 4.404% (expected 4.98%) of the cycles with acceptance 22/44 (50.00%)
OneWayShootingMover Out B 0 ran 5.405% (expected 4.98%) of the cycles with acceptance 43/54 (79.63%)
OneWayShootingMover Out C 0 ran 3.303% (expected 4.98%) of the cycles with acceptance 28/33 (84.85%)
OneWayShootingMover Out B 1 ran 3.804% (expected 4.98%) of the cycles with acceptance 27/38 (71.05%)
OneWayShootingMover Out C 1 ran 5.506% (expected 4.98%) of the cycles with acceptance 42/55 (76.36%)
OneWayShootingMover Out B 2 ran 4.605% (expected 4.98%) of the cycles with acceptance 30/46 (65.22%)
OneWayShootingMover Out C 2 ran 5.305% (expected 4.98%) of the cycles with acceptance 25/53 (47.17%)
OneWayShootingMover Out A 0 ran 5.005% (expected 4.98%) of the cycles with acceptance 41/50 (82.00%)
OneWayShootingMover Out A 1 ran 5.405% (expected 4.98%) of the cycles with acceptance 35/54 (64.81%)

In [25]:
scheme.move_summary(storage.steps, 'minus')


Minus ran 0.901% (expected 1.00%) of the cycles with acceptance 9/9 (100.00%)
Minus ran 1.101% (expected 1.00%) of the cycles with acceptance 9/11 (81.82%)
Minus ran 0.701% (expected 1.00%) of the cycles with acceptance 7/7 (100.00%)

In [26]:
scheme.move_summary(storage.steps, 'repex')


ReplicaExchange ran 2.603% (expected 2.49%) of the cycles with acceptance 4/26 (15.38%)
ReplicaExchange ran 2.002% (expected 2.49%) of the cycles with acceptance 1/20 (5.00%)
ReplicaExchange ran 2.903% (expected 2.49%) of the cycles with acceptance 1/29 (3.45%)
ReplicaExchange ran 2.603% (expected 2.49%) of the cycles with acceptance 2/26 (7.69%)
ReplicaExchange ran 2.803% (expected 2.49%) of the cycles with acceptance 6/28 (21.43%)
ReplicaExchange ran 2.202% (expected 2.49%) of the cycles with acceptance 12/22 (54.55%)
ReplicaExchange ran 3.103% (expected 2.49%) of the cycles with acceptance 9/31 (29.03%)
ReplicaExchange ran 2.903% (expected 2.49%) of the cycles with acceptance 18/29 (62.07%)
ReplicaExchange ran 2.002% (expected 2.49%) of the cycles with acceptance 2/20 (10.00%)

In [27]:
scheme.move_summary(storage.steps, 'pathreversal')


PathReversal ran 2.503% (expected 2.49%) of the cycles with acceptance 25/25 (100.00%)
PathReversal ran 2.002% (expected 2.49%) of the cycles with acceptance 20/20 (100.00%)
PathReversal ran 2.102% (expected 2.49%) of the cycles with acceptance 21/21 (100.00%)
PathReversal ran 4.004% (expected 2.49%) of the cycles with acceptance 40/40 (100.00%)
PathReversal ran 2.402% (expected 2.49%) of the cycles with acceptance 19/24 (79.17%)
PathReversal ran 1.902% (expected 2.49%) of the cycles with acceptance 16/19 (84.21%)
PathReversal ran 3.203% (expected 2.49%) of the cycles with acceptance 12/32 (37.50%)
PathReversal ran 3.303% (expected 2.49%) of the cycles with acceptance 33/33 (100.00%)
PathReversal ran 2.202% (expected 2.49%) of the cycles with acceptance 22/22 (100.00%)
PathReversal ran 2.002% (expected 2.49%) of the cycles with acceptance 3/20 (15.00%)

Replica exchange sampling

See the notebook repex_networks.ipynb for more details on tools to study the convergence of replica exchange. However, a few simple examples are shown here. All of these are analyzed with a separate object, ReplicaNetwork.


In [28]:
repx_net = paths.ReplicaNetwork(scheme, storage.steps)

Replica exchange mixing matrix


In [29]:
repx_net.mixing_matrix()


Out[29]:
         12       11        6        3        7        4        8        5        9        2        1        0       10
12 0.000000 0.000000 0.034884 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
11 0.000000 0.000000 0.000000 0.027132 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
 6 0.034884 0.000000 0.000000 0.000000 0.015504 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
 3 0.000000 0.027132 0.000000 0.000000 0.000000 0.046512 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
 7 0.000000 0.000000 0.015504 0.000000 0.000000 0.000000 0.007752 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
 4 0.000000 0.000000 0.000000 0.046512 0.000000 0.000000 0.000000 0.007752 0.000000 0.000000 0.000000 0.000000 0.000000
 8 0.000000 0.000000 0.000000 0.000000 0.007752 0.000000 0.000000 0.000000 0.069767 0.000000 0.000000 0.000000 0.000000
 5 0.000000 0.000000 0.000000 0.000000 0.000000 0.007752 0.000000 0.000000 0.003876 0.000000 0.000000 0.000000 0.000000
 9 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.069767 0.003876 0.000000 0.003876 0.000000 0.000000 0.000000
 2 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.003876 0.000000 0.034884 0.000000 0.000000
 1 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.034884 0.000000 0.023256 0.000000
 0 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.023256 0.000000 0.034884
10 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.034884 0.000000

Replica exchange graph

The mixing matrix tells a story of how well various interfaces are connected to other interfaces. The replica exchange graph is essentially a visualization of the mixing matrix (actually, of the transition matrix -- the mixing matrix is a symmetrized version of the transition matrix).
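
To illustrate what "symmetrized" means here, the sketch below averages a small matrix with its transpose. Both the averaging rule and the numbers are assumptions made for illustration; the exact definition and normalization used by ReplicaNetwork may differ.

```python
def symmetrize(matrix):
    """Average a square matrix with its transpose (illustrative only)."""
    n = len(matrix)
    return [
        [0.5 * (matrix[i][j] + matrix[j][i]) for j in range(n)]
        for i in range(n)
    ]

# hypothetical transition matrix between three replicas:
# entry [i][j] is the fraction of accepted exchanges taking i to j
transition = [
    [0.00, 0.04, 0.00],
    [0.02, 0.00, 0.06],
    [0.00, 0.02, 0.00],
]
mixing = symmetrize(transition)
# mixing[0][1] and mixing[1][0] are both 0.03;
# mixing[1][2] and mixing[2][1] are both 0.04
```

A symmetric matrix treats an exchange from i to j the same as one from j to i, which matches the symmetric pattern of the mixing matrix shown above.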

Note: We're still developing better layout tools to visualize these.


In [30]:
repxG = paths.ReplicaNetworkGraph(repx_net)
repxG.draw('spring')


Replica exchange flow

Replica flow is defined as TODO

Flow is designed for calculations where the replica exchange graph is linear, which ours clearly is not. However, we can define the flow over a subset of the interfaces.


Replica move history tree


In [31]:
import openpathsampling.visualize as vis
from IPython.display import SVG

In [32]:
tree = vis.PathTree(
    storage.steps[0:500],
    vis.ReplicaEvolution(replica=2, accepted=True)
)

SVG(tree.svg())


Out[32]:
[SVG output: move history tree for replica 2, accepted moves only, over the first 500 steps, with "cor" and "step" columns]

In [33]:
decorrelated = tree.generator.decorrelated
print("We have " + str(len(decorrelated)) + " decorrelated trajectories.")


We have 4 decorrelated trajectories.

Visualizing trajectories


In [34]:
# we use the %run magic because this isn't in a package
%run ../resources/toy_plot_helpers.py
background = ToyPlot()
background.contour_range = np.arange(-1.5, 1.0, 0.1)
background.add_pes(storage.engines[0].pes)

In [35]:
xval = paths.FunctionCV("xval", lambda snap : snap.xyz[0][0])
yval = paths.FunctionCV("yval", lambda snap : snap.xyz[0][1])
visualizer = paths.StepVisualizer2D(mstis, xval, yval, [-1.0, 1.0], [-1.0, 1.0])
visualizer.background = background.plot()



In [36]:
visualizer.draw_samples(list(tree.samples))


Out[36]:

Histogramming data (TODO)


In [37]:
#! skip
# The skip directive tells our test runner not to run this cell
import time
max_step = 10
for step in storage.steps[0:max_step]:
    visualizer.draw_ipynb(step)
    time.sleep(0.1)


