The contact map classes use an equivalent to mdtraj.atom_slice() to cut down the trajectory or frame to only contain atoms that are part of either the query or haystack. It does not use this optimization if there are no atoms to be slices.
This is more performant for most real world cases, as we tested it for a protein system without water (only hydrogens are sliced) and a system of only tip4p water (hydrogens and dummy atoms are sliced away).
It does involve some overhead mainly consisting of the mapping of sliced indices back to the real indices after the contactmap or frequency has been calculated. This overhead could be significant in the case where the resulting contactmap/frequency is dense (resulting in a lot of keys to be mapped) and very little is sliced away (triggering the optimization).
This notebook will show such a worst case scenario and how to disable this optimization.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import mdtraj as md
traj = md.load("5550217/kras.xtc", top="5550217/kras.pdb")
topology = traj.topology
In [2]:
from contact_map import ContactMap, ContactFrequency, ContactDifference
In [3]:
# Select all atoms
atoms = range(topology.n_atoms)
# Slice away a single atom so the optimization is triggered
used_atoms = atoms [1:]
In [4]:
%%time
frame_contacts = ContactMap(traj[0], query=used_atoms, haystack=used_atoms)
In [5]:
%%time
trajectory_contacts = ContactFrequency(traj, query=used_atoms,
haystack=used_atoms)
In [6]:
# Check if the optimization was triggered
trajectory_contacts.use_atom_slice
Out[6]:
In [7]:
ContactMap._class_use_atom_slice = False
ContactFrequency._class_use_atom_slice = False
ContactDifference._class_use_atom_slice = False
In [8]:
%%time
frame_contacts = ContactMap(traj[0], query=used_atoms, haystack=used_atoms)
In [9]:
%%time
trajectory_contacts = ContactFrequency(traj, query=used_atoms,
haystack=used_atoms)
In [10]:
trajectory_contacts.use_atom_slice
Out[10]:
In [25]:
switch1 = topology.select("resSeq 32 to 38 and symbol != 'H'")
cations = topology.select("resname NA or resname MG")
In [26]:
# Set class_atom_slice to True
ContactMap._class_use_atom_slice = True
ContactFrequency._class_use_atom_slice = True
ContactDifference._class_use_atom_slice = True
In [27]:
%%time
cations_switch1 = ContactFrequency(trajectory=traj, query=cations, haystack=switch1)
In [28]:
# Set class_atom_slice to False
ContactMap._class_use_atom_slice = False
ContactFrequency._class_use_atom_slice = False
ContactDifference._class_use_atom_slice = False
In [29]:
%%time
cations_switch1 = ContactFrequency(trajectory=traj, query=cations, haystack=switch1)