Markov state models (MSMs) are kinetic models that describe the conformational dynamics of proteins in terms of jumps between conformational states. An MSM can be thought of as a map of the conformational space a molecule explores, and it consists of
(1) a set of conformational states, corresponding to the local minima in the free energy landscape, and
(2) a transition probability matrix, which gives the probability of crossing the barrier separating each pair of states within a given lag time.
Recently, MSMs have been used successfully to model long-timescale statistical dynamics.
In [9]:
# load simulation data
import mdtraj as md
trajs = [md.load('bound_trajs/liganded_%d.pdb' % i) for i in range(1, 14)]
top = trajs[0].topology
ref = md.load('bound_trajs/bound.pdb')
In [10]:
trajs
Out[10]:
In [11]:
ref
Out[11]:
In [12]:
# featurize simulation data
from msmbuilder.featurizer import SuperposeFeaturizer
indices = [atom.index for atom in top.atoms if atom.element.symbol in ['C', 'O', 'N']]
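# SuperposeFeaturizer aligns each frame to the reference structure and uses the
# distances of the selected heavy atoms from their reference positions as features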
featurizer = SuperposeFeaturizer(indices, ref)
sequences = featurizer.transform(trajs)
The transition count matrix C(τ) records the number of observed transitions from state i at time t to state j at time t + τ, where τ is the lag time of the model.
The transition probability matrix T(τ), whose element T_ij gives the probability of transitioning from state i to state j within the lag time τ, is obtained by normalizing each row of C(τ) by the total number of transitions out of state i.
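As a toy illustration of this normalization (the 2x2 count matrix below is made up and is not taken from the simulation data), C(τ) is divided row by row by the total number of outgoing transitions to give T(τ), and an implied relaxation timescale is then obtained from the second eigenvalue as t = -τ/ln(λ2), which is the quantity reported by the timescales_ attribute used below.
In [ ]:
# Toy example: row-normalize a hypothetical count matrix C(tau) into T(tau)
# and compute the implied relaxation timescale from its second eigenvalue.
import numpy as np
tau = 10                                    # hypothetical lag time (in frames)
C = np.array([[90., 10.],
              [ 5., 95.]])                  # hypothetical transition counts C(tau)
T = C / C.sum(axis=1, keepdims=True)        # transition probability matrix T(tau)
eigvals = np.sort(np.linalg.eigvals(T).real)[::-1]
print(T)
print(-tau / np.log(eigvals[1]))            # implied timescale, -tau / ln(lambda_2)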
In [15]:
from msmbuilder.cluster import KCenters
from msmbuilder.msm import MarkovStateModel

msmts0, msmts1 = {}, {}
lag_times = [1, 10, 20, 30, 40]
n_states = [4, 8, 16, 32, 64]

for n in n_states:
    msmts0[n] = []
    msmts1[n] = []
    for lag_time in lag_times:
        assignments = KCenters(n_clusters=n).fit_predict(sequences)
        msm = MarkovStateModel(lag_time=lag_time, verbose=False).fit(assignments)
        timescales = msm.timescales_
        msmts0[n].append(timescales[0])
        msmts1[n].append(timescales[1])
        print('n_states=%d\tlag_time=%d\ttimescales=%s' % (n, lag_time, timescales[0:2]))
    print()
In [16]:
%matplotlib inline
from matplotlib.pyplot import *

figure(figsize=(14, 3))
for i, n in enumerate(n_states):
    subplot(1, len(n_states), 1 + i)
    plot(lag_times, msmts0[n])
    plot(lag_times, msmts1[n])
    if i == 0:
        ylabel('Relaxation Timescale')
    xlabel('Lag Time')
    title('%d states' % n)
show()
The next step is to look at the transition kinetics between the microstates in order to merge microstates separated by low energy barriers (i.e., those with fast inter-microstate transitions) into a single metastable state (macrostate). In this way, by first partitioning the conformations into microstates and then lumping the microstates into macrostates, the complete energy landscape can be divided into a small set of metastable states. Such a macrostate division not only represents the underlying kinetics of the system, but also reduces the number of states to an interpretable size.
Each eigenvector of the transition probability matrix corresponds to a particular distribution over states that gives rise to a sustained mode of transitions between groups of states, with the sign structure of the eigenvector indicating the two groups between which the transition occurs.
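As a rough sketch of this lumping step (assuming msmbuilder's PCCAPlus module, which is not used elsewhere in this notebook; the choice of 4 macrostates is arbitrary), the microstate MSM built above can be lumped into macrostates based on the sign structure of its slowest eigenvectors:
In [ ]:
# Lump microstates into macrostates with PCCA+, which groups states using the
# sign structure of the slowest eigenvectors; 4 macrostates is an arbitrary choice.
from msmbuilder.lumping import PCCAPlus
pcca = PCCAPlus.from_msm(msm, n_macrostates=4)
macro_assignments = pcca.transform(assignments)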
In the adaptive sampling scheme, MSMs are built after an initial dataset is obtained in order to identify new conformational states of the protein, which are then used as starting points for future simulations.
Adaptive sampling procedures seek to identify the starting configurations for future simulations such that the conformational landscape can be sampled efficiently.
This procedure ensures that the future simulations are started from the edges of the conformational landscape, thereby avoiding the already well-sampled stable states.
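One simple way to realize this (a least-counts selection rule, given here as an illustrative assumption rather than the scheme used in any particular study) is to restart simulations from the microstates that have been visited least often so far:
In [ ]:
# Least-counts adaptive sampling sketch: pick the microstates with the fewest
# observed transition counts as starting points for the next round of simulations.
import numpy as np
state_counts = np.asarray(msm.countsmat_).sum(axis=1)   # total outgoing counts per microstate
restart_states = np.argsort(state_counts)[:5]           # 5 restart states is an arbitrary choice
print('Start new simulations from microstates:', restart_states)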