# Performing Hausdorff pairs analysis using PSA

In this example, the trajectories have been pre-aligned (as in psa_short.py and psa_short.ipynb) using the fitting scheme described in:

S.L. Seyler, A. Kumar, M.F. Thorpe, and O. Beckstein, Path Similarity Analysis: a Method for Quantifying Macromolecular Pathways. arXiv:1505.04807v1_ [q-bio.QM], 2015.



In [1]:

%matplotlib inline

# Suppress FutureWarning about element-wise comparison to None
# Occurs when calling PSA plotting functions
import warnings
warnings.filterwarnings('ignore')



## 1) Set up input data for PSA using MDAnalysis



In [2]:

from MDAnalysis import Universe
from MDAnalysis.analysis.psa import PSAnalysis
from pair_id import PairID



Initialize lists for the methods on which to perform PSA. PSA will be performed for four different simulations methods with three runs for each: DIMS, FRODA, rTMD-F, and rTMD-S. Also initialize a PSAIdentifier object to keep track of the data corresponding to comparisons between pairs of simulations.



In [3]:

method_names = ['DIMS', 'FRODA', 'GOdMD', 'MDdMD', 'rTMD-F', 'rTMD-S',
'ANMP', 'iENM', 'MAP', 'MENM-SD', 'MENM-SP',
'Morph', 'LinInt']
labels = [] # Heat map labels
simulations = [] # List of simulation topology/trajectory filename pairs
universes = [] # List of MDAnalysis Universes representing simulations



For each method, get the topology and each of three total trajectories (per method). Each simulation is represented as a (topology, trajectory) pair of file names, which is appended to a master list of simulations.



In [4]:

for method in method_names:
# Note: DIMS uses the PSF topology format
topname = 'top.psf' if 'DIMS' in method or 'TMD' in method else 'top.pdb'
pathname = 'fitted_psa.dcd'
method_dir = 'methods/{}'.format(method)
if method is not 'LinInt':
for run in xrange(1, 4): # 3 runs per method
run_dir = '{}/{:03n}'.format(method_dir, run)
topology = '{}/{}'.format(method_dir, topname)
trajectory = '{}/{}'.format(run_dir, pathname)
labels.append(method + '(' + str(run) + ')')
simulations.append((topology, trajectory))
else: # only one LinInt trajectory
topology = '{}/{}'.format(method_dir, topname)
trajectory = '{}/{}'.format(method_dir, pathname)
labels.append(method)
simulations.append((topology, trajectory))



Generate a list of universes from the list of simulations.



In [5]:

for sim in simulations:
universes.append(Universe(*sim))



## 2) Compute and plot nearest neighbor distances using PSA

Initialize a PSA comparison from the universe list using a C$_\alpha$ trajectory representation, then generate PSA Paths from the universes.



In [6]:

psa_hpa = PSAnalysis(universes, path_select='name CA', labels=labels)
psa_hpa.generate_paths()



### Compute full Hausdorff pairs analysis for all unique pairs of paths

Generate the Hausdorff nearest neighbors and Hausdorff pairs (frames and distances)



In [7]:

psa_hpa.run_pairs_analysis(neighbors=True, hausdorff_pairs=True)



Plot clustered heat maps using Ward hierarchical clustering. The first heat map is plotted with the corresponding dendrogram and is fully labeled by the method names; the second heat map is annotated by the Hausdorff distances.

### Use PSA IDs to obtain data for the comparisons we want

Get the Simulation IDs and PSA ID for the second DIMS simulation (DIMS 2) and third rTMD-F simulation (rTMD-F 3).

Note: The comparison between a pair of simulations is assigned a unique PSA ID. Given the order in which simulations are added to PSA, the comparison between a pair of simulations can be identified by (distance) matrix indices. The PSA ID is the index in the corresponding distance vector of a given pair of simulations.



In [8]:

identifier = PairID()
for name in method_names:
run_ids = [1] if 'LinInt' in name else [1,2,3]




In [9]:

s1, s2, s3 = 'DIMS 1', 'DIMS 2', 'rTMD-F 3'
pid1 = identifier.get_pair_id(s1, s2)
pid2 = identifier.get_pair_id(s2, s3)



Plotting nearest neighbors as a function of normalized progress frame for:

DIMS 1 and DIMS 2



In [10]:

psa_hpa.plot_nearest_neighbors(filename='nn_dims1_dims2.pdf', idx=pid1, labels=(s1, s2))




<matplotlib.figure.Figure at 0x7faeb8fc7b50>



DIMS 1 and rTMD-F 3



In [11]:

psa_hpa.plot_nearest_neighbors(filename='nn_dims2_tmds3.pdf', idx=pid2, labels=(s2, s3))




<matplotlib.figure.Figure at 0x7faebfac9ed0>