Computing native contacts with MDTraj

Using the definition from Best, Hummer, and Eaton, "Native contacts determine protein folding mechanisms in atomistic simulations" PNAS (2013) 10.1073/pnas.1311599110

Eq. (1) of the SI defines the expression for the fraction of native contacts, $Q(X)$:

$$ Q(X) = \frac{1}{|S|} \sum_{(i,j) \in S} \frac{1}{1 + \exp[\beta(r_{ij}(X) - \lambda r_{ij}^0)]}, $$


  • $X$ is a conformation,
  • $r_{ij}(X)$ is the distance between atoms $i$ and $j$ in conformation $X$,
  • $r^0_{ij}$ is the distance from heavy atom i to j in the native state conformation,
  • $S$ is the set of all pairs of heavy atoms $(i,j)$ belonging to residues $\theta_i$ and $\theta_j$ such that $|\theta_i - \theta_j| > 3$ and $r^0_{i,} < 4.5 \unicode{x212B}$,
  • $\beta=5 \unicode{x212B}^{-1}$,
  • $\lambda=1.8$ for all-atom simulations

In [ ]:
import numpy as np
import mdtraj as md
from itertools import combinations

def best_hummer_q(traj, native):
    """Compute the fraction of native contacts according the definition from
    Best, Hummer and Eaton [1]
    traj : md.Trajectory
        The trajectory to do the computation for
    native : md.Trajectory
        The 'native state'. This can be an entire trajecory, or just a single frame.
        Only the first conformation is used
    q : np.array, shape=(len(traj),)
        The fraction of native contacts in each frame of `traj`
    ..[1] Best, Hummer, and Eaton, "Native contacts determine protein folding
          mechanisms in atomistic simulations" PNAS (2013)
    BETA_CONST = 50  # 1/nm
    LAMBDA_CONST = 1.8
    NATIVE_CUTOFF = 0.45  # nanometers
    # get the indices of all of the heavy atoms
    heavy = native.topology.select_atom_indices('heavy')
    # get the pairs of heavy atoms which are farther than 3
    # residues apart
    heavy_pairs = np.array(
        [(i,j) for (i,j) in combinations(heavy, 2)
            if abs(native.topology.atom(i).residue.index - \
                   native.topology.atom(j).residue.index) > 3])
    # compute the distances between these pairs in the native state
    heavy_pairs_distances = md.compute_distances(native[0], heavy_pairs)[0]
    # and get the pairs s.t. the distance is less than NATIVE_CUTOFF
    native_contacts = heavy_pairs[heavy_pairs_distances < NATIVE_CUTOFF]
    print("Number of native contacts", len(native_contacts))
    # now compute these distances for the whole trajectory
    r = md.compute_distances(traj, native_contacts)
    # and recompute them for just the native state
    r0 = md.compute_distances(native[0], native_contacts)
    q = np.mean(1.0 / (1 + np.exp(BETA_CONST * (r - LAMBDA_CONST * r0))), axis=1)
    return q

In [ ]:
# pull a random protein from the PDB
traj = md.load_pdb('')
# the unitcell info in this PDB happens to be wrong, so lets
# just remove it
traj.unitcell_vectors = None

# just for example, use the first frame as the 'native' conformation
q = best_hummer_q(traj, traj[0])

In [ ]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.xlabel('Frame', fontsize=14)
plt.ylabel('Q(X)', fontsize=14)

In [ ]: