## Computing native contacts with MDTraj

Using the definition from Best, Hummer, and Eaton, "Native contacts determine protein folding mechanisms in atomistic simulations" PNAS (2013) 10.1073/pnas.1311599110

Eq. (1) of the SI defines the expression for the fraction of native contacts, $Q(X)$:

$$Q(X) = \frac{1}{|S|} \sum_{(i,j) \in S} \frac{1}{1 + \exp[\beta(r_{ij}(X) - \lambda r_{ij}^0)]},$$

where

• $X$ is a conformation,
• $r_{ij}(X)$ is the distance between atoms $i$ and $j$ in conformation $X$,
• $r^0_{ij}$ is the distance from heavy atom i to j in the native state conformation,
• $S$ is the set of all pairs of heavy atoms $(i,j)$ belonging to residues $\theta_i$ and $\theta_j$ such that $|\theta_i - \theta_j| > 3$ and $r^0_{i,} < 4.5 \unicode{x212B}$,
• $\beta=5 \unicode{x212B}^{-1}$,
• $\lambda=1.8$ for all-atom simulations


In [ ]:

import numpy as np
import mdtraj as md
from itertools import combinations

def best_hummer_q(traj, native):
"""Compute the fraction of native contacts according the definition from
Best, Hummer and Eaton 

Parameters
----------
traj : md.Trajectory
The trajectory to do the computation for
native : md.Trajectory
The 'native state'. This can be an entire trajecory, or just a single frame.
Only the first conformation is used

Returns
-------
q : np.array, shape=(len(traj),)
The fraction of native contacts in each frame of traj

References
----------
.. Best, Hummer, and Eaton, "Native contacts determine protein folding
mechanisms in atomistic simulations" PNAS (2013)
"""

BETA_CONST = 50  # 1/nm
LAMBDA_CONST = 1.8
NATIVE_CUTOFF = 0.45  # nanometers

# get the indices of all of the heavy atoms
heavy = native.topology.select_atom_indices('heavy')
# get the pairs of heavy atoms which are farther than 3
# residues apart
heavy_pairs = np.array(
[(i,j) for (i,j) in combinations(heavy, 2)
if abs(native.topology.atom(i).residue.index - \
native.topology.atom(j).residue.index) > 3])

# compute the distances between these pairs in the native state
heavy_pairs_distances = md.compute_distances(native, heavy_pairs)
# and get the pairs s.t. the distance is less than NATIVE_CUTOFF
native_contacts = heavy_pairs[heavy_pairs_distances < NATIVE_CUTOFF]
print("Number of native contacts", len(native_contacts))

# now compute these distances for the whole trajectory
r = md.compute_distances(traj, native_contacts)
# and recompute them for just the native state
r0 = md.compute_distances(native, native_contacts)

q = np.mean(1.0 / (1 + np.exp(BETA_CONST * (r - LAMBDA_CONST * r0))), axis=1)
return q




In [ ]:

# pull a random protein from the PDB
# the unitcell info in this PDB happens to be wrong, so lets
# just remove it
traj.unitcell_vectors = None

# just for example, use the first frame as the 'native' conformation
q = best_hummer_q(traj, traj)




In [ ]:

%matplotlib inline
import matplotlib.pyplot as plt
plt.plot(q)
plt.xlabel('Frame', fontsize=14)
plt.ylabel('Q(X)', fontsize=14)
plt.show()




In [ ]: