In this tutorial, we use Bayesian Energy Landscape Tilting (BELT) to infer the conformational ensemble of tri-alanine. This calculation is similar to the one done in [1], but uses only a single experiment (for speed reasons).
Performing an BELT calculation requires three inputs:
A set of conformations to be used as a "starting" guess for the ensemble: $x_i$
A set of experimental measurements
and their uncertainties
The predicted experimental observables (predictions
) at each conformation.
BELT combines measurements
, uncertainties
, and predictions
using Bayesian inference to produce a conformational ensemble that best agrees with your data.
We first import some Python libraries used in the tutorial
In [1]:
import numpy as np
import matplotlib.pyplot as plt
from fitensemble import belt, example_loader
We will use a single $^3J(H^N H^\alpha )$ $ \;$ NMR measurement to reweight molecular dynamics simulations performed in the Amber ff96 forcefield. Note that a single experiment is not sufficient to correct the forcefield bias in ff96. However, the single-experiment calculation illustrates the key points and runs in under a minute.
We next load the data used to connect simulation and experiment. The trialanine data is loaded by a helper function (load_alanine
) included in FitEnsemble for the purpose of simplifying the tutorial.
In [2]:
predictions, measurements, uncertainties = example_loader.load_alanine_numpy()
The inputs to FitEnsemble should be Numpy arrays. Be careful to get the input shapes correct: measurements
and uncertainties
should have shape (num_measurements
), while predictions should have shape (
num_frames,
num_measurements`). Thus, measurements[i] and uncertainties[i] give the ith experimental measurement and uncertainty. Similarly, predictions[j,i] gives the ith experiment predicted for the jth conformation.
In [3]:
num_frames, num_measurements = predictions.shape
print(measurements.shape)
print(uncertainties.shape)
print(predictions.shape)
In addition to the quantities necessary for the BELT calculation, it's also useful to examine additional structural properties such as the $\phi$ and $\psi$ backbone torsions. The follow code loads the backbone torsion angles $\phi$ and $\psi$ for the internal alanine in tri-alanine. In addition, we have also calculated the conformation states of each snapshot ($\alpha_r$, $\beta$, $PP_{II}$ $\;$, or $\alpha_l$), using the state definitions from [2].
In [4]:
phi, psi, assignments, indicators = example_loader.load_alanine_dihedrals()
In [5]:
num_samples = 20000 # Generate 20,000 MCMC samples
thin = 25 # Subsample (i.e. thin) the MCMC traces by 25X to ensure independent samples
burn = 5000 # Discard the first 5000 samples as "burn-in"
We also need to define the amount of regularization to use. We will use maximum entropy regularization, which prefers ensembles that weight each conformation uniformly. With a large value of regularization_strength
($\lambda \;$), we strongly prefer the raw MD ensemble, which gives equal population to each conformation.
In [6]:
regularization_strength = 3.0
In [7]:
belt_model = belt.MaxEntBELT(predictions, measurements, uncertainties, regularization_strength)
We are good to go! We now sample the likelihood of models given our data:
In [8]:
belt_model.sample(num_samples, thin=thin, burn=burn)
Any calculation performed using Markov chain Monte Carlo should be checked for convergence. There are many sophisticated tests of convergence, but a powerful heuristic is to graphically examine the trace of model parameters $\alpha$. A properly sampled and thinned MCMC trace should look like white noise:
In [9]:
a0 = belt_model.mcmc.trace("alpha")[:,0]
plt.plot(a0)
plt.title(r"MCMC trace of $\alpha$")
Out[9]:
The key outputs of an BELT calculation are the posterior-averaged conformational populations. The accumulate_populations()
member function calculates these populations for you:
In [10]:
p = belt_model.accumulate_populations()
In [11]:
print("Reduced chi squared of raw MD: %f" % np.mean(((predictions.mean(0) - measurements) / uncertainties)**2))
print("Reduced chi squared of BELT: %f" % np.mean(((predictions.T.dot(p) - measurements) / uncertainties)**2))
The above verifies that our BELT model gives excellent agreement with the available experimental data--much more so than the raw MD. At this point, we can dig deeper into understanding the results of our calculation. Because we are looking at a single residue, we can visualize the relevant populations using a Ramachandran plot of the $\phi$ and $\psi$ torsion angles. We first look at the raw MD simulation.
In [12]:
h,x,y = np.histogram2d(phi,psi, bins=100)
extent = [-180,180,-180,180]
plt.figure()
plt.xlabel("$\phi$")
plt.ylabel("$\psi$")
plt.title(r"MD Population Density")
plt.imshow(h.T,origin='lower',extent=extent)
plt.xlim(-180,180);
plt.ylim(-180,180);
We now look at the reweighted (i.e. BELT) ensemble. You will notice that the $PP_{II}$ $\;$ basin (top center-right) has increased its population, while the $\beta$ basin (top left) has decreased its population. We therefore conclude that the Amber ff96 forcefield is overly $\beta$-prone; the BELT model has partially corrected this bias.
In [13]:
h,x,y = np.histogram2d(phi,psi,bins=100,weights=p)
plt.figure()
plt.xlabel("$\phi$")
plt.ylabel("$\psi$")
plt.title(r"Reweighted Population Density")
plt.imshow(h.T,origin='lower',extent=extent)
plt.xlim(-180,180);
plt.ylim(-180,180);
Congratulations! You've finished the first tutorial. Continue on to the next tutorial here [X].
Beauchamp, K. A., Das, R. , and Pande, V. S. Inferring Conformational Ensembles from Noisy Experiments. In Prep.
Sosnick et al. Helix, sheet, and polyproline II frequencies and strong nearest neighbor effects in a restricted coil library. Biochemistry, 2005.