TIS Analysis Framework Examples

This notebook provides an overview of the TIS analysis framework in OpenPathSampling. We start with the StandardTISAnalysis object, which will probably meet the needs of most users. Then we go into details of how to set up custom objects for analysis, and how to assemble them into a generic TISAnalysis object.

For an overview of TIS and this analysis framework, see http://openpathsampling.org/latest/topics/tis_analysis.html


In [1]:
# if our large test file is available, use it. Otherwise, use file generated from toy_mstis_2_run.ipynb
from __future__ import print_function
import os
test_file = "../toy_mstis_1k_OPS1.nc"
filename = "mstis.nc"  # this requires newer functionality that in the standard tests
#filename = test_file if os.path.isfile(test_file) else "mstis.nc"
print('Using file ' + filename + ' for analysis')


Using file mstis.nc for analysis

In [2]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import openpathsampling as paths
import pandas as pd

In [3]:
%%time
storage = paths.AnalysisStorage(filename)


CPU times: user 2min 22s, sys: 13.2 s, total: 2min 35s
Wall time: 2min 44s

In [4]:
network = storage.networks[0]
scheme = storage.schemes[0]

In [5]:
stateA = storage.volumes['A']
stateB = storage.volumes['B']
stateC = storage.volumes['C']
all_states = [stateA, stateB, stateC]  # all_states gives the ordering

Simplified Combined Analysis

The StandardTISAnalysis object makes it very easy to perform the main TIS rate analysis. Furthermore, it caches all the intemediate results, so they can also be analyzed.


In [6]:
from openpathsampling.analysis.tis import StandardTISAnalysis

In [7]:
# the scheme is only required if using the minus move for the flux
tis_analysis = StandardTISAnalysis(
    network=network,
    scheme=scheme,
    max_lambda_calcs={t: {'bin_width': 0.05, 'bin_range': (0.0, 0.5)}
                      for t in network.sampling_transitions}
)

In [8]:
#tis_analysis.progress = 'silent'

In [9]:
%%time
rate_matrix = tis_analysis.rate_matrix(steps=storage.steps).to_pandas(order=all_states)



CPU times: user 1h 7min 2s, sys: 5min 38s, total: 1h 12min 40s
Wall time: 1h 15min 1s

In [10]:
rate_matrix


Out[10]:
A B C
A NaN 0.00214965 0.00186714
B 0.00117906 NaN 0.00120443
C 0.00133534 0.00188307 NaN

The rate matrix is a pandas.DataFrame. pandas has conveniences to easily convert that into a LaTeX table:


In [11]:
print(rate_matrix.to_latex(float_format="{:.2e}".format))


\begin{tabular}{llll}
\toprule
{} &        A &        B &        C \\
\midrule
A &      NaN & 2.15e-03 & 1.87e-03 \\
B & 1.18e-03 &      NaN & 1.20e-03 \\
C & 1.34e-03 & 1.88e-03 &      NaN \\
\bottomrule
\end{tabular}

Note that there are many options for setting up the StandardTISAnalysis object. Most customizations to the analysis can be performed by changing the initialization parameters of that object; see its documentation for details.

Looking at the parts of the calculation

Once you run the rate calculation (or if you run tis_analysis.calculate(steps), you have already cached a large number of subcalculations. All of those are available in the results dictionary, although the analysis object has a number of conveniences to access some of them.

Looking at the keys of the results dictionary, we can see what has been cached:


In [12]:
tis_analysis.results.keys()


Out[12]:
dict_keys(['flux', 'max_lambda', 'total_crossing_probability', 'conditional_transition_probability', 'transition_probability', 'rate'])

In practice, however, we won't go directly to the results dictionary. We'd rather use the convenience methods that make it easier to get to the interesting results.

We'll start by looking at the flux:


In [13]:
tis_analysis.flux_matrix


Out[13]:
{(<openpathsampling.volume.CVDefinedVolume at 0x13475ccf8>,
  <openpathsampling.volume.CVDefinedVolume at 0x134232240>): 0.20017125763152918,
 (<openpathsampling.volume.CVDefinedVolume at 0x13475cd30>,
  <openpathsampling.volume.CVDefinedVolume at 0x131e54860>): 0.21013334525579258,
 (<openpathsampling.volume.CVDefinedVolume at 0x12f54f278>,
  <openpathsampling.volume.CVDefinedVolume at 0x13475ca58>): 0.2353906881858017}

In [14]:
s = paths.analysis.tis.flux_matrix_pd(tis_analysis.flux_matrix)
pd.DataFrame(s)


Out[14]:
Flux
State Interface
A 0.0<opA<0.2 0.200171
B 0.0<opB<0.2 0.210133
C 0.0<opC<0.2 0.235391

Next we look at the total crossing probability (i.e., the crossing probability, joined by WHAM) for each sampled transition. We could also look at this per physical transition, but of course $A\to B$ and $A\to C$ are identical in MSTIS -- only the initial state matters.


In [15]:
# if you don't like the "Flux" label in a separate line, fix it by hand
print(s.to_latex(float_format='{:.4f}'.format))


\begin{tabular}{llr}
\toprule
  &             &   Flux \\
State & Interface &        \\
\midrule
A & 0.0<opA<0.2 & 0.2002 \\
B & 0.0<opB<0.2 & 0.2101 \\
C & 0.0<opC<0.2 & 0.2354 \\
\bottomrule
\end{tabular}

Next we look at the total crossing probability (i.e., the crossing probability, joined by WHAM) for each sampled transition. We could also look at this per physical transition, but of course $A\to B$ and $A\to C$ are identical in MSTIS -- only the initial state matters.


In [16]:
for transition in network.sampling_transitions:
    label = transition.name
    tcp = tis_analysis.total_crossing_probability[transition]
    plt.plot(tcp.x, np.log(tcp), label=label)
plt.title("Total Crossing Probability")
plt.xlabel("$\lambda$")
plt.ylabel("$\ln(P(\lambda | X_0))$")
plt.legend();


We may want to look in more detail at one of these, by checking the per-ensemble crossing probability (as well at the total crossing probability). Here we select based on the $A\to B$ transition, we would get the same results if we selected the transition using either trans = network.from_state[stateA] or trans = network.transitions[(stateA, stateC)].


In [17]:
state_pair = (stateA, stateB)
trans = network.transitions[state_pair]
for ens in trans.ensembles:
    crossing = tis_analysis.crossing_probability(ens)
    label = ens.name
    plt.plot(crossing.x, np.log(crossing), label=label)
tcp = tis_analysis.total_crossing_probability[state_pair]
plt.plot(tcp.x, np.log(tcp), '-k', label="total crossing probability")
plt.title("Crossing Probabilities, " + stateA.name + " -> " + stateB.name)
plt.xlabel("$\lambda$")
plt.ylabel("$\ln(P_A(\lambda | \lambda_i))$")
plt.legend();


Finally, we look at the last part of the rate calculation: the conditional transition probability. This is calculated for the outermost interface in each interface set.


In [18]:
tis_analysis.conditional_transition_probability


Out[18]:
A B C
Out A 2 0.569225 0.230536 0.200239
Out B 2 0.214964 0.565445 0.219591
Out C 2 0.113278 0.159743 0.726979

Individual components of the analysis

The combined analysis is the easiest way to perform analysis, but if you need to customize things (or if you want to compare different calculation methods) you might want to create objects for components of the analysis individually. Note that unlike the StandardTISAnalysis object, these do not cache their intermediate results.

Flux from the minus move


In [19]:
from openpathsampling.analysis.tis import MinusMoveFlux

In [20]:
flux_calc = MinusMoveFlux(scheme)

To calculate the fluxes, we use the .calculate method of the MinusMoveFlux object:


In [21]:
%%time
fluxes = flux_calc.calculate(storage.steps)


CPU times: user 1min 53s, sys: 562 ms, total: 1min 53s
Wall time: 1min 54s

In [22]:
fluxes


Out[22]:
{(<openpathsampling.volume.CVDefinedVolume at 0x13475ccf8>,
  <openpathsampling.volume.CVDefinedVolume at 0x134232240>): 0.20017125763152918,
 (<openpathsampling.volume.CVDefinedVolume at 0x13475cd30>,
  <openpathsampling.volume.CVDefinedVolume at 0x131e54860>): 0.21013334525579258,
 (<openpathsampling.volume.CVDefinedVolume at 0x12f54f278>,
  <openpathsampling.volume.CVDefinedVolume at 0x13475ca58>): 0.2353906881858017}

This is in the same format as the flux_matrix given above, and we can convert it to a pandas.DataFrame in the same way. This can then be converted to a LaTeX table to copy-paste into an article in the same way as was done earlier.


In [23]:
pd.DataFrame(paths.analysis.tis.flux_matrix_pd(fluxes))


Out[23]:
Flux
State Interface
A 0.0<opA<0.2 0.200171
B 0.0<opB<0.2 0.210133
C 0.0<opC<0.2 0.235391

The minus move flux calculates some intermediate information along the way, which can be of use for further analysis. This is cached when using the StandardTISAnalysis, but can always be recalculated. The intermediate maps each (state, interface) pair to a dictionary. For details on the structure of that dictionary, see the documentation of TrajectoryTransitionAnalysis.analyze_flux.


In [24]:
%%time
flux_dicts = flux_calc.intermediates(storage.steps)[0]


CPU times: user 2min 9s, sys: 1.44 s, total: 2min 11s
Wall time: 2min 20s

In [25]:
flux_dicts


Out[25]:
{(<openpathsampling.volume.CVDefinedVolume at 0x13475ccf8>,
  <openpathsampling.volume.CVDefinedVolume at 0x134232240>): {'in': <openpathsampling.analysis.trajectory_transition_analysis.TrajectorySegmentContainer at 0x15b8aca90>,
  'out': <openpathsampling.analysis.trajectory_transition_analysis.TrajectorySegmentContainer at 0x15b8ac748>},
 (<openpathsampling.volume.CVDefinedVolume at 0x13475cd30>,
  <openpathsampling.volume.CVDefinedVolume at 0x131e54860>): {'in': <openpathsampling.analysis.trajectory_transition_analysis.TrajectorySegmentContainer at 0x15b8c3fd0>,
  'out': <openpathsampling.analysis.trajectory_transition_analysis.TrajectorySegmentContainer at 0x15b8c3a20>},
 (<openpathsampling.volume.CVDefinedVolume at 0x12f54f278>,
  <openpathsampling.volume.CVDefinedVolume at 0x13475ca58>): {'in': <openpathsampling.analysis.trajectory_transition_analysis.TrajectorySegmentContainer at 0x15b8c39e8>,
  'out': <openpathsampling.analysis.trajectory_transition_analysis.TrajectorySegmentContainer at 0x15b8c3860>}}

Flux from existing dictionary

The DictFlux class (which is required for MISTIS, and often provides better statistics than the minus move flux in other cases) takes a pre-calculated flux dictionary for initialization, and always returns that dictionary. The dictionary is in the same format as the fluxes returned by the MinusMoveFlux.calculate method; here, we'll just use the results we calculated above:


In [26]:
from openpathsampling.analysis.tis import DictFlux

In [27]:
dict_flux = DictFlux(fluxes)

In [28]:
dict_flux.calculate(storage.steps)


Out[28]:
{(<openpathsampling.volume.CVDefinedVolume at 0x13475ccf8>,
  <openpathsampling.volume.CVDefinedVolume at 0x134232240>): 0.20017125763152918,
 (<openpathsampling.volume.CVDefinedVolume at 0x13475cd30>,
  <openpathsampling.volume.CVDefinedVolume at 0x131e54860>): 0.21013334525579258,
 (<openpathsampling.volume.CVDefinedVolume at 0x12f54f278>,
  <openpathsampling.volume.CVDefinedVolume at 0x13475ca58>): 0.2353906881858017}

Note that DictFlux.calculate just echoes back the dictionary we gave it, so it doesn't actually care if we give it the steps argument or not:


In [29]:
dict_flux.calculate(None)


Out[29]:
{(<openpathsampling.volume.CVDefinedVolume at 0x13475ccf8>,
  <openpathsampling.volume.CVDefinedVolume at 0x134232240>): 0.20017125763152918,
 (<openpathsampling.volume.CVDefinedVolume at 0x13475cd30>,
  <openpathsampling.volume.CVDefinedVolume at 0x131e54860>): 0.21013334525579258,
 (<openpathsampling.volume.CVDefinedVolume at 0x12f54f278>,
  <openpathsampling.volume.CVDefinedVolume at 0x13475ca58>): 0.2353906881858017}

This object can be used to provide the flux part of the TIS calculation, in exactly the same way a MinusMoveFlux object does.

Total crossing probability function

To calculate the total crossing probability, we must first calculate the individual ensemble crossing probabilities. This is done by creating a histogram of the maximum values of the order parameter. The class to do that is FullHistogramMaxLambdas. Then we'll create the TotalCrossingProbability.


In [30]:
transition = network.sampling_transitions[0]

In [31]:
print(transition)


TISTransition: Out A
A -> A or all states except A
Interface: 0.0<opA<0.2
Interface: 0.0<opA<0.30000000000000004
Interface: 0.0<opA<0.4


In [32]:
from openpathsampling.analysis.tis import FullHistogramMaxLambdas, TotalCrossingProbability
from openpathsampling.numerics import WHAM

In [33]:
max_lambda_calc = FullHistogramMaxLambdas(
    transition=transition,
    hist_parameters={'bin_width': 0.05, 'bin_range': (0.0, 0.5)}
)

We can also change the function used to calculate the maximum value of the order parameter with the max_lambda_func parameter. This can be useful to calculate the crossing probabilities along some other order parameter.

To calculate the total crossing probability function, we also need a method for combining the ensemble crossing probability functions. We'll use the default WHAM here; see its documentation for details on how it can be customized.


In [34]:
combiner = WHAM(interfaces=transition.interfaces.lambdas)

Now we can put these together into the total crossing probability function:


In [35]:
total_crossing = TotalCrossingProbability(
    max_lambda_calc=max_lambda_calc,
    combiner=combiner
)

In [36]:
tcp = total_crossing.calculate(storage.steps)





In [37]:
plt.plot(tcp.x, np.log(tcp))

plt.title("Total Crossing Probability, exiting " + transition.stateA.name)
plt.xlabel("$\lambda$")
plt.ylabel("$\ln(P_A(\lambda | \lambda_i))$");


Conditional transition probability

The last part of the standard calculation is the conditional transition probability. We'll make a version of this that works for all ensembles:


In [38]:
from openpathsampling.analysis.tis import ConditionalTransitionProbability

In [39]:
outermost_ensembles = [trans.ensembles[-1] for trans in network.sampling_transitions]

In [40]:
cond_transition = ConditionalTransitionProbability(
    ensembles=outermost_ensembles,
    states=network.states
)

In [41]:
ctp = cond_transition.calculate(storage.steps)




In [42]:
ctp


Out[42]:
{<openpathsampling.ensemble.TISEnsemble at 0x12ec076d8>: {<openpathsampling.volume.CVDefinedVolume at 0x13475ccf8>: 0.5692254116710611,
  <openpathsampling.volume.CVDefinedVolume at 0x13475cd30>: 0.2305357942390926,
  <openpathsampling.volume.CVDefinedVolume at 0x12f54f278>: 0.20023879408984627},
 <openpathsampling.ensemble.TISEnsemble at 0x1316d3390>: {<openpathsampling.volume.CVDefinedVolume at 0x13475cd30>: 0.5654445052484951,
  <openpathsampling.volume.CVDefinedVolume at 0x12f54f278>: 0.21959106512113824,
  <openpathsampling.volume.CVDefinedVolume at 0x13475ccf8>: 0.21496442963036666},
 <openpathsampling.ensemble.TISEnsemble at 0x13475c7b8>: {<openpathsampling.volume.CVDefinedVolume at 0x13475cd30>: 0.15974329635341525,
  <openpathsampling.volume.CVDefinedVolume at 0x12f54f278>: 0.7269787572757574,
  <openpathsampling.volume.CVDefinedVolume at 0x13475ccf8>: 0.11327794637082732}}

StandardTISAnalysis.conditional_transition_probability converts this into a pandas.DataFrame, which gives prettier printing. However, the same data in included in this dict-of-dict structure.

Assembling a TIS analysis from scratch

If you're using the "standard" TIS approach, then the StandardTISAnalysis object is the most efficient way to do it. However, if you want to use another analysis approach, it can be useful to see how the "standard" approach can be assembled.

This won't have all the shortcuts or saved intermediates that the specialized object does, but it will use the same algorithms to get the same results.


In [43]:
from openpathsampling.analysis.tis import StandardTransitionProbability, TISAnalysis

Some of the objects that we created in previous sections can be reused here. In particular, there is only only one flux calculation and only one conditional transitional transition probability per reaction network. However, the total crossing probability method is dependent on the transition (different order parameters might have different histrogram parameters). So we need to associate each transition with a different TotalCrossingProbability object. In this example, we take the default behavior of WHAM (instead of specifying in explicitly, as above).


In [44]:
tcp_methods = {
    trans: TotalCrossingProbability(
        max_lambda_calc=FullHistogramMaxLambdas(
            transition=trans,
            hist_parameters={'bin_width': 0.05, 'bin_range': (0.0, 0.5)}
        )
    )
    for trans in network.transitions.values()
}

The general TISAnalysis object makes the most simple splitting: flux and transition probability. A single flux calculation is used for all transitions, but each transition has a different transition probability (since each transition can have a different total crossing probability). We make those objects here.


In [45]:
transition_probability_methods = {
    trans: StandardTransitionProbability(
        transition=trans,
        tcp_method=tcp_methods[trans],
        ctp_method=cond_transition
    )
    for trans in network.transitions.values()
}

Finally we put this all together into a TISAnalysis object, and calculate the rate matrix.


In [46]:
analysis = TISAnalysis(
    network=network,
    flux_method=dict_flux,
    transition_probability_methods=transition_probability_methods
)

In [47]:
analysis.rate_matrix(storage.steps)








Out[47]:
            A           B           C
A         NaN  0.00214965  0.00186714
B  0.00117906         NaN  0.00120443
C  0.00133534  0.00188307         NaN

This is the same rate matrix as we obtained with the StandardTISAnalysis. It is a little faster because we used the precalculated DictFlux object in this instead of the MinusMoveFlux, otherwise this would be slower.


In [ ]: