Featurizing Ligand-Protein trajectories

Trajectories containing a protein and ligand can now be featurized in several ways. A reference frame with at least two chains, one of which is the protein and one of which is the ligand, is required for all featurizations. These chains can be manually specified by their indexes or MSMBuilder can guess which trajectory is the protein (by choosing the longest CA-containing chain) and which is the ligand (by choosing the longest chain containing up to 200 atoms; tie goes to the lower index).

Here we explore Ligand-Protein contact featurizations and their binary transforms as well as RMSD calculations with customizable alignment and calcuation indices.

Generate a toy trajectory


In [ ]:
%matplotlib inline
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
import mdtraj as md

top = md.Topology()
c = top.add_chain()

r0 = top.add_residue('HET', c)
r1 = top.add_residue('HET', c)
r2 = top.add_residue('HET', c)
r3 = top.add_residue('HET', c)
r4 = top.add_residue('HET', c)
r5 = top.add_residue('HET', c)
r6 = top.add_residue('HET', c)
r7 = top.add_residue('HET', c)
r8 = top.add_residue('HET', c)
r9 = top.add_residue('HET', c)
residues = [r0,r1,r2,r3,r4,r5,r6,r7,r8,r9]

c_ligand = top.add_chain()
r_ligand = top.add_residue('HET', c_ligand)

for _ in range(10):
    for _, res in enumerate(residues):
        top.add_atom('CA', md.element.carbon, res)
for _ in range(10):
    top.add_atom('CA', md.element.carbon, r_ligand)
traj = md.Trajectory(xyz=np.random.uniform(size=(100, 110, 3)),
                     topology=top,
                     time=np.arange(100))
ref = md.Trajectory(xyz=np.random.uniform(size=(1, 110, 3)),
                    topology=top,
                    time=np.arange(1))

Identify residues characterizing a binding pocket with respect to a reference structure


In [ ]:
from msmbuilder.featurizer import LigandContactFeaturizer
from msmbuilder.featurizer import BinaryLigandContactFeaturizer

feat = LigandContactFeaturizer(reference_frame=ref, binding_pocket=0.1)
df = pd.DataFrame(feat.describe_features(ref))
df

In [ ]:
pocket_contacts = feat.transform(traj)
print("Number of frames is {}".format(len(pocket_contacts)))
print("Number of features is {}".format(pocket_contacts[0].shape[1]))

Create a histogram of instances each residue is within a certain cutoff distance of the ligand


In [ ]:
feat = BinaryLigandContactFeaturizer(reference_frame=ref, cutoff=0.1)
pocket_bins = feat.transform(traj)
print("Number of residues is {}".format(pocket_bins[0].shape[1]))

In [ ]:
count_list = []
for res in range(pocket_bins[0].shape[1]):
    count_list.append(sum([pocket_bins[i][0][res]
                       for i in range(len(pocket_bins))]))

In [ ]:
fig = plt.figure(figsize=(6,5))
plt.title('Instances within {} nm cutoff'.format(feat.cutoff),fontsize=18)
plt.bar(range(pocket_bins[0].shape[1]),count_list)
plt.ylabel('Counts', fontsize=16)
plt.xlabel('Residue index', fontsize=16)
plt.xticks(np.linspace(0.4,10.4,pocket_bins[0].shape[1]+1),range(pocket_bins[0].shape[1]))
plt.tight_layout()

Using the LigandRMSDFeaturizer

Compute the RMSD of each frame in the trajectory to each frame in a reference trajectory for any set of alignment indices and any set of indices to use for the RMSD calculation. By default, structures are aligned by the protein atoms and the RMSD is calculated for ligand atoms.


In [ ]:
from msmbuilder.featurizer import LigandRMSDFeaturizer
feat = LigandRMSDFeaturizer(reference_frame=ref, reference_traj=traj[0:2])
rmsds = feat.transform([traj])
print(rmsds[0][:2])

Specific indices of the ligand and protein can be specified for alignment and calculation. If no reference trajectory is provided, the reference frame is used.


In [ ]:
feat_indices = LigandRMSDFeaturizer(reference_frame=ref, align_indices=range(50),
                                   calculate_indices=[105])
rmsds_indices = feat_indices.transform([traj])
print(rmsds_indices[0][:2])

Custom indices can also be provided. For example, here we have aligned by the protein (this is the default option but has been enumerated here for clarity) but calculated the RMSD based on all atoms in the reference frame.


In [ ]:
feat_custom = LigandRMSDFeaturizer(reference_frame=ref, align_by='protein',
                        calculate_for='custom', calculate_indices=range(ref.n_atoms))
rmsds_custom = feat_custom.transform([traj])
print(rmsds_custom[0][:2])

Using all atoms for both aligning and calculating RMSD is equivalent to mdtraj's implementation of RMSD calculations.


In [ ]:
feat_mdtraj = LigandRMSDFeaturizer(reference_frame=ref, align_by='custom',
                                  align_indices=range(ref.n_atoms), calculate_for='custom',
                                  calculate_indices=range(ref.n_atoms))
rmsds_mdtraj = feat_mdtraj.transform([traj])
real_mdtraj = md.rmsd(traj, ref, frame=0)
print("multichain implementation:\t {}, ...".format((rmsds_mdtraj[0][0][0],
                                                   rmsds_mdtraj[0][1][0])))
print("mdtraj implementation:\t\t {}, ...".format((real_mdtraj[0], real_mdtraj[1])))

In [ ]: