Here we run through most of the things that can be done with this package using a simple two-state model. More sophisticated examples explore further possibilities.
The first thing to do is to download the data from the following link. Once this is done, we import a number of libraries that we will need as we run through this example.
In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
import math
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style="ticks", color_codes=True, font_scale=1.5)
sns.set_style({"xtick.direction": "in", "ytick.direction": "in"})
We start by loading the simulation data using the trajectory module. For this we use the external library MDtraj, which contains all sorts of methods for parsing and calculating interesting properties of our time-series data.
In [2]:
import mdtraj as md
from mastermsm.trajectory import traj
In [3]:
tr = traj.TimeSeries(top='data/alaTB.gro', traj=['data/protein_only.xtc'])
print (tr.mdt)
Does what we have calculated look like a Ramachandran map? We can make a quick plot to check.
In [4]:
phi = md.compute_phi(tr.mdt)
psi = md.compute_psi(tr.mdt)
res = [x for x in tr.mdt.topology.residues]
In [5]:
fig,ax = plt.subplots(figsize=(4,4))
ax.plot(180./math.pi*phi[1],180./math.pi*psi[1],'o', markersize=1)
ax.set_xlim(-180,180)
ax.set_ylim(-180,180)
ax.xaxis.set_ticks(range(-180,181,90))
ax.yaxis.set_ticks(range(-180,181,90))
ax.set_xlabel(r'$\phi$', fontsize=18)
ax.set_ylabel(r'$\psi$', fontsize=18)
Out[5]:
Next we proceed to discretize the trajectory based on the Ramachandran angles.
In [6]:
tr.discretize(states=['A', 'E'])
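To give an idea of what this assignment involves, below is a minimal sketch of a Ramachandran-based two-state assignment; the angle cutoffs are purely illustrative and not necessarily those used internally by discretize.
def assign_state(phi_deg, psi_deg):
    # Illustrative cutoffs only: call a frame helical ('A') when phi is negative
    # and psi falls in a broad helical band; otherwise call it extended ('E').
    if phi_deg < 0 and -100 < psi_deg < 50:
        return 'A'
    return 'E'
# phi[1] and psi[1] are the dihedral angles (in radians) computed above with MDtraj
states = [assign_state(f, s) for f, s in
          zip(np.degrees(phi[1][:, 0]), np.degrees(psi[1][:, 0]))]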
For plotting we map helical configurations to 1 and extended (coil) configurations to 0.
In [7]:
y = [1 if x == 'A' else 0 for x in tr.distraj]  # helical ('A') -> 1, extended ('E') -> 0
fig, (ax1, ax2) = plt.subplots(2,1, sharex=True)
ax1.plot(psi[1]*180/math.pi,'o', markersize=1)
ax2.plot(y)
ax1.set_ylabel(r'$\psi$', fontsize=14)
ax1.set_xlim(0,2000)
ax1.set_ylim(-180,180)
ax1.yaxis.set_ticks(range(-180,181,90))
ax2.set_ylabel('State')
ax2.set_xlim(0,2000)
ax2.set_ylim(-0.2,1.2)
ax2.yaxis.set_ticks([0,1])
labels = ['c', 'h']  # 0 -> coil, 1 -> helix
ax2.set_yticklabels(labels)
ax2.set_xlabel('Time [ps]')
Out[7]:
In the plot we see how the time series of continuous torsion angles is converted into a time series of discrete states. We can obtain a list of states in the following way.
In [8]:
tr.find_keys()
tr.keys
tr.file_name
Out[8]:
After having loaded our trajectory using the functionalities from the trajectory module, we start building the master equation model. For this we make use of the msm module. There are two steps, corresponding to the two main classes within that module. First we create an instance of the SuperMSM class, which can be used to direct the whole process of constructing and validating the MSM.
In [9]:
from mastermsm.msm import msm
msm_alaTB = msm.SuperMSM([tr])
Then, using the do_msm method, we produce instances of the MSM class at a desired lag time, $\Delta t$. Each of these contains an MSM built at a specific lag time. These are stored as a dictionary in the msms attribute of the SuperMSM class.
In [10]:
lagt = 1
msm_alaTB.do_msm(lagt)
msm_alaTB.msms[lagt].do_trans()
msm_alaTB.msms[lagt].boots()
The resulting model contains a number of quantities of interest, such as the eigenvalue spectrum (in this case limited to a single relaxation time, corresponding to the exchange between helix and coil) and the equilibrium probabilities of the microstates.
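For reference, the relaxation times reported here are related to the transition-matrix eigenvalues through the standard MSM relation $\tau_n = -\Delta t / \ln\lambda_n$ for the non-stationary eigenvalues ($\lambda_n < 1$). A minimal sketch of this relation, assuming the transition matrix is accessible as the trans attribute of the MSM instance (the attribute name is an assumption here):
# Implied relaxation times from the transition-matrix eigenvalues:
# tau_n = -lagt / ln(lambda_n) for the eigenvalues with lambda_n < 1.
# The `trans` attribute is assumed; check the MSM object for the actual name.
evals = np.sort(np.linalg.eigvals(msm_alaTB.msms[lagt].trans).real)[::-1]
print(-lagt / np.log(evals[1:]))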
In [11]:
fig, ax = plt.subplots(1, 2, figsize=(5,2.5))
ax[0].errorbar([1], msm_alaTB.msms[lagt].tau_ave, msm_alaTB.msms[lagt].tau_std ,fmt='o-', markersize=10)
ax[1].errorbar([1,2], msm_alaTB.msms[lagt].peq_ave, msm_alaTB.msms[lagt].peq_std ,fmt='o-', markersize=10)
ax[0].set_ylabel(r'$\tau$ [ps]', fontsize=18)
ax[0].set_xlabel(r'$\lambda_1$', fontsize=18)
ax[1].set_ylabel(r'$P_{eq}$', fontsize=18)
ax[0].set_xticks([])
ax[1].set_xticks([1,2])
ax[1].set_xticklabels(labels[:2])
ax[1].set_xlim(0.5,2.5)
ax[0].set_ylim(0,50)
ax[1].set_ylim(0,1)
plt.tight_layout(w_pad=1)
However, from simply calculating these quantities we do not know how informative they really are. In order to understand whether the values we calculate truly reflect the properties of the underlying system, we resort to validation of the MSM. The two-level structure that we have described, consisting of the SuperMSM and MSM classes, allows the user to first test some global convergence properties (at the level of the SuperMSM).
In [12]:
msm_alaTB.convergence_test(time=[1, 2, 5, 7, 10, 20, 50, 100], error=True)
In [13]:
tau_vs_lagt = np.array([[x,msm_alaTB.msms[x].tauT[0],msm_alaTB.msms[x].tau_std[0]] \
for x in sorted(msm_alaTB.msms.keys())])
fig, ax = plt.subplots()
ax.errorbar(tau_vs_lagt[:,0],tau_vs_lagt[:,1],fmt='o-', yerr=tau_vs_lagt[:,2], markersize=10)
#ax.plot(tau_vs_lagt[:,0],tau_vs_lagt[:,0])
ax.fill_between(10**np.arange(-0.2,3,0.2), 1e-1, 10**np.arange(-0.2,3,0.2), facecolor='lightgray', alpha=0.5)
ax.set_xlabel(r'$\Delta$t [ps]', fontsize=16)
ax.set_ylabel(r'$\tau$ [ps]', fontsize=16)
ax.set_xlim(0.8,200)
ax.set_ylim(0,70)
_ = ax.set_xscale('log')
#ax.set_yscale('log')
Here we see that from the very beginning the relaxation times are independent of the lag time ($\Delta t$) used in the construction of the model. This convergence is a good indicator of the Markovianity of the model and results from the use of transition-based assignment. The shaded area corresponds to the range of lag times where the information we obtain is largely unreliable, because the lag time itself is longer than the relaxation time.
In [14]:
pMSM_E, pMD_E, epMD_E = msm_alaTB.ck_test(time=[1, 2, 5, 7, 10, 20, 50], init=['E'])
pMSM_A, pMD_A, epMD_A = msm_alaTB.ck_test(time=[1, 2, 5, 7, 10, 20, 50], init=['A'])
In [15]:
fig, ax = plt.subplots(1,2, figsize=(8,3.5), sharex=True, sharey=True)
ax[0].errorbar(pMD_E[:,0], pMD_E[:,1], epMD_E, fmt='o')
for p in pMSM_E:
    ax[0].plot(p[0], p[1], label=r"$\Delta t$=%g"%p[0][0])
ax[0].legend(fontsize=10)
ax[1].errorbar(pMD_A[:,0], pMD_A[:,1], epMD_A, fmt='o')
for p in pMSM_A:
    ax[1].plot(p[0], p[1])
ax[0].set_xscale('log')
ax[0].set_ylabel('P(t)')
ax[0].set_xlabel('Time (ps)')
ax[1].set_xlabel('Time (ps)')
plt.tight_layout()
These plots show the decay of the population from a given initial condition. Here the left and right panels correspond to starting in the E and A basins, respectively. In both cases we compare the calculation from the simulation data (circles) with the propagation of MSMs calculated at different lag times (lines). The agreement between the simulation data and the model predictions confirms the result of the convergence analysis.
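The Chapman-Kolmogorov idea can also be stated in matrix form: the transition matrix estimated at lag $\Delta t$ and propagated $k$ times should be close to the transition matrix estimated directly at lag $k\Delta t$. A rough sketch of this check, again assuming the transition matrices are stored in a trans attribute:
# T(dt)^k should approximately equal T(k*dt) for a Markovian model.
# The `trans` attribute is an assumption about where the matrix is stored.
T_1 = msm_alaTB.msms[1].trans   # transition matrix at a 1 ps lag
T_5 = msm_alaTB.msms[5].trans   # transition matrix at a 5 ps lag
print(np.linalg.matrix_power(T_1, 5) - T_5)   # entries should be small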
The MSM can also be validated against the autocorrelation functions (ACFs) of the eigenmodes. If the simulation data is projected onto the eigenmodes, then the ACF for mode $n$ should decay with a timescale equal to $-1/\lambda_n$. In this case there is only one mode to reproduce.
In [16]:
msm_alaTB.msms[2].do_trans(evecs=True)
acf = msm_alaTB.msms[2].acf_mode()
time = np.arange(len(acf[1]))*msm_alaTB.data[0].dt
fig, ax = plt.subplots()
ax.plot(time, acf[1], 'o')
ax.plot(time,np.exp(-time*1./msm_alaTB.msms[2].tauT[0]))
ax.set_xlim(0,200)
ax.set_ylim(0,1)
ax.set_xlabel('Time [ps]')
ax.set_ylabel('C$_{11}$(t)')
Out[16]:
From the transition matrix we can calculate the rate matrix. One possibility is to use an approximate method based on a simple Taylor expansion (De Sancho, Mittal and Best, JCTC, 2013). We can check whether this approximate method gives a good result. We use short lag times, since we have already checked that they are sufficient for obtaining converged relaxation times in this case.
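As a reminder of the idea behind this kind of approximation: since $T(\Delta t) = \exp(K \Delta t) \approx I + K \Delta t$ to lowest order, one can estimate $K \approx (T - I)/\Delta t$. The sketch below illustrates this lowest-order estimate only; do_rate implements the approach of the reference above, and the trans attribute is again assumed to hold the transition matrix.
# Lowest-order estimate of the rate matrix from the transition matrix,
# K ~ (T - I) / dt, for illustration only (do_rate follows the published method).
dt = 2
K_approx = (msm_alaTB.msms[dt].trans - np.identity(2)) / dt   # 2 states
print(K_approx)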
In [17]:
fig, ax = plt.subplots(1,2, figsize=(7.5,3.5))
for i in [1, 2, 5, 7, 10, 20]:
    msm_alaTB.msms[i].do_rate()
    ax[0].errorbar(msm_alaTB.msms[i].tauT, msm_alaTB.msms[i].tauK, fmt='o', xerr=msm_alaTB.msms[i].tau_std, markersize=10, label=str(i))
    ax[1].errorbar(msm_alaTB.msms[i].peqT, msm_alaTB.msms[i].peqK, fmt='o', xerr=msm_alaTB.msms[i].peq_std, markersize=10, label=str(i))
ax[0].plot([0,100],[0,100],'--', color='lightgray')
ax[0].set_xlabel(r'$\tau_T$ [ps]', fontsize=20)
ax[0].set_ylabel(r'$\tau_K$ [ps]', fontsize=20)
ax[0].set_xlim(0,60)
ax[0].set_ylim(0,60)
ax[1].plot([0.1,1],[0.1,1],'--', color='lightgray')
ax[1].set_xlabel(r'$p_T$', fontsize=20)
ax[1].set_ylabel(r'$p_K$', fontsize=20)
ax[1].set_xlim(0.2,0.8)
ax[1].set_ylim(0.2,0.8)
ax[0].legend(fontsize=9, bbox_to_anchor=(1.0, 0.65))
plt.tight_layout(pad=0.4, w_pad=3)
The method produces acceptable solutions at short lag times (up to 5-10 ps), although the result rapidly diverges from the transition-matrix relaxation time at longer lag times. Equilibrium probabilities are recovered correctly from the rate matrices at all lag times.