Using PairID to extract PSA data

Here we will use the convenience class PSAIdentifier to extract Hausdorff pair analysis data generated by PSA.


In [1]:
%matplotlib inline
%load_ext autoreload
%autoreload 2

# Suppress FutureWarning about element-wise comparison to None
# Occurs when calling PSA plotting functions
import warnings
warnings.filterwarnings('ignore')

Set up input data for PSA using MDAnalysis


In [2]:
from MDAnalysis import Universe
from MDAnalysis.analysis.psa import PSAnalysis
from pair_id import PairID

Initialize lists for the methods on which to perform PSA. PSA will be performed for four different simulations methods with three runs for each: DIMS, FRODA, rTMD-F, and rTMD-S. Also initialize a PSAIdentifier object to keep track of the data corresponding to comparisons between pairs of simulations.


In [3]:
method_names = ['DIMS','FRODA','rTMD-F','rTMD-S']
labels = [] # Heat map labels (not plotted in this example)
simulations = [] # List of simulation topology/trajectory filename pairs
universes = [] # List of MDAnalysis Universes representing simulations

For each method, get the topology and each of three total trajectories (per method). Each simulation is represented as a (topology, trajectory) pair of file names, which is appended to a master list of simulations.


In [4]:
for method in method_names:
    # Note: DIMS uses the PSF topology format
    topname = 'top.psf' if 'DIMS' in method or 'TMD' in method else 'top.pdb'
    pathname = 'fitted_psa.dcd'
    method_dir = 'methods/{}'.format(method)
    if method is not 'LinInt':
        for run in xrange(1, 4): # 3 runs per method
            run_dir = '{}/{:03n}'.format(method_dir, run)
            topology = '{}/{}'.format(method_dir, topname)
            trajectory = '{}/{}'.format(run_dir, pathname)
            labels.append(method + '(' + str(run) + ')')
            simulations.append((topology, trajectory))
    else: # only one LinInt trajectory
        topology = '{}/{}'.format(method_dir, topname)
        trajectory = '{}/{}'.format(method_dir, pathname)
        labels.append(method)
        simulations.append((topology, trajectory))

Generate a list of universes from the list of simulations.


In [5]:
for sim in simulations:
    universes.append(Universe(*sim))

Perform a path similarity analysis

Initialize a PSA comparison from the universe list using a C$_\alpha$ trajectory representation.


In [6]:
psa_hpa = PSAnalysis(universes, path_select='name CA', labels=labels)

Generate PSA Paths from the trajectories


In [7]:
psa_hpa.generate_paths()

Perform a Hausdorff pairs analysis on all of the Paths


In [8]:
psa_hpa.run_pairs_analysis(hausdorff_pairs=True, neighbors=True)

Extract specific data from PSA


In [9]:
identifier = PairID()
for name in method_names:
    identifier.add_sim(name, [1,2,3])

Get the PSA ID for the second DIMS simulation (DIMS 2) and third rTMD-F simulation (rTMD-F 3)


In [10]:
pid = identifier.get_pair_id('DIMS 2', 'rTMD-F 3')

Use the PSA ID to locate the Hausdorff analysis data for the DIMS 2/rTMD-F 3 comparison:

Get the indices of the frames for the DIMS 2 and rTMD-F 3 paths corresponding to the Hausdorff pair


In [11]:
psa_hpa.hausdorff_pairs[pid]


Out[11]:
{'distance': 1.8656396, 'frames': (48, 97)}

In [12]:
psa_hpa.hausdorff_pairs[pid]['frames']


Out[12]:
(48, 97)

Get the rmsd separating the Hausdorff pair (this is the Hausdorff distance!)


In [13]:
psa_hpa.hausdorff_pairs[pid]['distance']


Out[13]:
1.8656396

Get the indices of the nearest neighbor frames for the DIMS 2 and rTMD-F 3 paths


In [14]:
psa_hpa.nearest_neighbors[pid]['frames']


Out[14]:
(array([  3,   3,   7,   8,  13,  13,  17,  17,  19,  22,  22,  22,  28,
         29,  30,  36,  36,  36,  37,  42,  42,  43,  49,  51,  52,  56,
         57,  57,  57,  57,  63,  63,  68,  68,  68,  69,  76,  80,  80,
         83,  84,  84,  89,  90,  90,  91,  91,  93,  97, 102, 102, 105,
        105, 109, 110, 110, 115, 116, 116, 116, 116, 116, 117, 125, 127,
        127, 127, 129, 129, 129, 129, 130, 129, 133, 133, 133, 133, 133,
        133, 134, 133, 133, 137, 138, 134, 138, 138, 138, 138, 139, 138, 139]),
 array([ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  2,  3,  3,
         3,  3,  3,  3,  3,  3,  4,  4,  4, 10, 10, 10, 10, 10, 10, 12, 12,
        14, 14, 14, 14, 15, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17, 17,
        17, 17, 17, 17, 22, 22, 26, 29, 29, 29, 29, 29, 29, 29, 29, 34, 34,
        34, 34, 35, 35, 35, 35, 35, 35, 35, 35, 35, 35, 36, 37, 37, 42, 42,
        42, 42, 42, 42, 44, 44, 44, 45, 45, 45, 45, 45, 45, 45, 45, 52, 52,
        54, 58, 59, 61, 61, 61, 61, 61, 87, 87, 87, 87, 87, 87, 87, 87, 87,
        87, 89, 89, 89, 89, 89, 89, 89, 89, 89, 91, 91, 90, 90, 90, 90, 90,
        90, 90, 91, 91]))

Get the nearest neighbor rmsds for the paths


In [15]:
psa_hpa.nearest_neighbors[pid]['distances']


Out[15]:
(array([ 0.61047804,  0.72486514,  0.78072226,  0.83813328,  0.93167013,
         1.01148224,  1.06094921,  1.11567676,  1.15830064,  1.18572509,
         1.19752169,  1.24201703,  1.28473079,  1.3059119 ,  1.32284486,
         1.34145117,  1.36143172,  1.36866331,  1.45567334,  1.54358423,
         1.5964545 ,  1.63075888,  1.64360261,  1.68257856,  1.72496045,
         1.71841514,  1.71702886,  1.72773194,  1.76599729,  1.72124612,
         1.77787733,  1.76189709,  1.75216496,  1.74139595,  1.7261765 ,
         1.73064542,  1.76115847,  1.7780937 ,  1.8280741 ,  1.81312251,
         1.81970263,  1.79710293,  1.77227151,  1.7905525 ,  1.76451468,
         1.77083576,  1.80586505,  1.83057499,  1.86563957,  1.85736096,
         1.86288309,  1.82903266,  1.7895925 ,  1.78329408,  1.74680781,
         1.72502816,  1.72101748,  1.66805542,  1.63964784,  1.60171819,
         1.57724261,  1.5326587 ,  1.55232775,  1.51061881,  1.47200549,
         1.44320154,  1.38229811,  1.31344652,  1.28408015,  1.24469936,
         1.19692671,  1.17440617,  1.11309803,  1.0645541 ,  1.02498984,
         0.99263138,  0.93730855,  0.91037399,  0.86864704,  0.82913756,
         0.79435873,  0.76110363,  0.73129284,  0.68911695,  0.64772373,
         0.60788435,  0.58289891,  0.55806595,  0.52546072,  0.4935227 ,
         0.4636465 ,  0.45959264], dtype=float32),
 array([ 0.63109344,  0.62719834,  0.61601019,  0.61047804,  0.62155843,
         0.63865268,  0.66227907,  0.67063749,  0.69626302,  0.72733957,
         0.75623101,  0.7901932 ,  0.83100504,  0.85794991,  0.88361055,
         0.90336204,  0.92149943,  0.93565702,  0.97083294,  0.99374866,
         1.03374887,  1.06611073,  1.08965945,  1.13067162,  1.17012823,
         1.20783472,  1.22551477,  1.23616135,  1.23949754,  1.25503623,
         1.27730811,  1.29305518,  1.31983471,  1.32525945,  1.33354521,
         1.33722591,  1.33787072,  1.35226214,  1.3751452 ,  1.39359438,
         1.39420414,  1.39986932,  1.40463161,  1.41709375,  1.4386462 ,
         1.45734608,  1.47568536,  1.49810481,  1.51763892,  1.53081644,
         1.55413353,  1.57935131,  1.6103493 ,  1.65109909,  1.68054557,
         1.69345903,  1.7044816 ,  1.71702886,  1.72613454,  1.72539341,
         1.72467172,  1.72730267,  1.72880113,  1.72190535,  1.73856974,
         1.75418186,  1.7464515 ,  1.73684514,  1.7261765 ,  1.72830319,
         1.73841536,  1.74597847,  1.74871075,  1.74822044,  1.75148296,
         1.74700916,  1.74772978,  1.75728726,  1.77193272,  1.77165914,
         1.7767179 ,  1.78658187,  1.79900873,  1.78998554,  1.78027713,
         1.78107965,  1.77923036,  1.77341914,  1.7738812 ,  1.76738906,
         1.76451468,  1.76607394,  1.77978122,  1.7792269 ,  1.79973316,
         1.80040133,  1.80270863,  1.81065071,  1.82419837,  1.83073819,
         1.83595932,  1.82261348,  1.80349576,  1.79456735,  1.77086103,
         1.74088705,  1.70968091,  1.68400657,  1.67217898,  1.64538348,
         1.62183571,  1.56865823,  1.50714755,  1.4620024 ,  1.39992738,
         1.34150422,  1.28506386,  1.23358619,  1.1904099 ,  1.13146091,
         1.08869743,  1.04396403,  0.98481226,  0.92711079,  0.8842231 ,
         0.83658326,  0.77844107,  0.72915876,  0.68647271,  0.63805151,
         0.60030681,  0.56835544,  0.54370099,  0.50965744,  0.48632449,
         0.47754329,  0.47594741,  0.46904811,  0.45977202,  0.45959264], dtype=float32))

Display the pandas DataFrame containing the set of simulations


In [16]:
df = identifier.data
df


Out[16]:
Sim ID
Name Run ID
DIMS 1 0
2 1
3 2
FRODA 1 3
2 4
3 5
rTMD-F 1 6
2 7
3 8
rTMD-S 1 9
2 10
3 11

Get the simulation IDs for DIMS simulations 1, 2, and 3


In [17]:
df.loc[('DIMS',[1,2,3]), 'Sim ID']


Out[17]:
Name  Run ID
DIMS  1         0
      2         1
      3         2
Name: Sim ID, dtype: int64