Modelling Conformational Ensembles with FitEnsemble

Introduction

In this tutorial, we use Bayesian Energy Landscape Tilting (BELT) to infer the conformational ensemble of tri-alanine. This calculation is similar to the one done in [1], but uses only a single experiment (for speed reasons).

Performing an BELT calculation requires three inputs:

A set of conformations to be used as a "starting" guess for the ensemble: $x_i$
A set of experimental measurements and their uncertainties
The predicted experimental observables (predictions) at each conformation.

BELT combines measurements, uncertainties, and predictions using Bayesian inference to produce a conformational ensemble that best agrees with your data.

Importing FitEnsemble

We first import some Python libraries used in the tutorial



In [1]:

    
import numpy as np
import matplotlib.pyplot as plt
from fitensemble import belt, example_loader

Load Input Data

We will use a single $^3J(H^N H^\alpha )$ $ \;$ NMR measurement to reweight molecular dynamics simulations performed in the Amber ff96 forcefield. Note that a single experiment is not sufficient to correct the forcefield bias in ff96. However, the single-experiment calculation illustrates the key points and runs in under a minute.

We next load the data used to connect simulation and experiment. The trialanine data is loaded by a helper function (load_alanine) included in FitEnsemble for the purpose of simplifying the tutorial.



In [2]:

    
predictions, measurements, uncertainties = example_loader.load_alanine_numpy()

The inputs to FitEnsemble should be Numpy arrays. Be careful to get the input shapes correct: measurements and uncertainties should have shape (num_measurements), while predictions should have shape (num_frames,num_measurements`). Thus, measurements[i] and uncertainties[i] give the ith experimental measurement and uncertainty. Similarly, predictions[j,i] gives the ith experiment predicted for the jth conformation.



In [3]:

    
num_frames, num_measurements = predictions.shape

print(measurements.shape)
print(uncertainties.shape)
print(predictions.shape)









    



(1,)
(1,)
(295189, 1)

In addition to the quantities necessary for the BELT calculation, it's also useful to examine additional structural properties such as the $\phi$ and $\psi$ backbone torsions. The follow code loads the backbone torsion angles $\phi$ and $\psi$ for the internal alanine in tri-alanine. In addition, we have also calculated the conformation states of each snapshot ($\alpha_r$, $\beta$, $PP_{II}$ $\;$, or $\alpha_l$), using the state definitions from [2].



In [4]:

    
phi, psi, assignments, indicators = example_loader.load_alanine_dihedrals()

Set Run Parameters

We also need to define a number of parameters for the BELT calculation. These include the number of MCMC samples to generate, the amount by which to thin (i.e. subsample) the traces, and the number of samples to discard during the equilibration:



In [5]:

    
num_samples = 20000  # Generate 20,000 MCMC samples
thin = 25  # Subsample (i.e. thin) the MCMC traces by 25X to ensure independent samples
burn = 5000  # Discard the first 5000 samples as "burn-in"

We also need to define the amount of regularization to use. We will use maximum entropy regularization, which prefers ensembles that weight each conformation uniformly. With a large value of regularization_strength ($\lambda \;$), we strongly prefer the raw MD ensemble, which gives equal population to each conformation.



In [6]:

    
regularization_strength = 3.0

Create BELT Model

At this point, we are almost ready to go. We create an BELT object using our simulation and experimental data:



In [7]:

    
belt_model = belt.MaxEntBELT(predictions, measurements, uncertainties, regularization_strength)

We are good to go! We now sample the likelihood of models given our data:



In [8]:

    
belt_model.sample(num_samples, thin=thin, burn=burn)









    



[****************100%******************]  25000 of 25000 complete

Check Convergence

Any calculation performed using Markov chain Monte Carlo should be checked for convergence. There are many sophisticated tests of convergence, but a powerful heuristic is to graphically examine the trace of model parameters $\alpha$. A properly sampled and thinned MCMC trace should look like white noise:



In [9]:

    
a0 = belt_model.mcmc.trace("alpha")[:,0]
plt.plot(a0)
plt.title(r"MCMC trace of $\alpha$")









    Out[9]:





<matplotlib.text.Text at 0x4a409d0>

The key outputs of an BELT calculation are the posterior-averaged conformational populations. The accumulate_populations() member function calculates these populations for you:



In [10]:

    
p = belt_model.accumulate_populations()

Evaluate Model Quality

You should now check the agreement between the simulation and experiment. We check both the raw MD simulation and the BELT model.



In [11]:

    
print("Reduced chi squared of raw MD: %f" % np.mean(((predictions.mean(0) - measurements) / uncertainties)**2))
print("Reduced chi squared of BELT: %f" % np.mean(((predictions.T.dot(p) - measurements) / uncertainties)**2))









    



Reduced chi squared of raw MD: 6.429270
Reduced chi squared of BELT: 0.020964

Examine your Results

The above verifies that our BELT model gives excellent agreement with the available experimental data--much more so than the raw MD. At this point, we can dig deeper into understanding the results of our calculation. Because we are looking at a single residue, we can visualize the relevant populations using a Ramachandran plot of the $\phi$ and $\psi$ torsion angles. We first look at the raw MD simulation.



In [12]:

    
h,x,y = np.histogram2d(phi,psi, bins=100)
extent = [-180,180,-180,180]
plt.figure()
plt.xlabel("$\phi$")
plt.ylabel("$\psi$")
plt.title(r"MD Population Density")
plt.imshow(h.T,origin='lower',extent=extent)
plt.xlim(-180,180);
plt.ylim(-180,180);

We now look at the reweighted (i.e. BELT) ensemble. You will notice that the $PP_{II}$ $\;$ basin (top center-right) has increased its population, while the $\beta$ basin (top left) has decreased its population. We therefore conclude that the Amber ff96 forcefield is overly $\beta$-prone; the BELT model has partially corrected this bias.



In [13]:

    
h,x,y = np.histogram2d(phi,psi,bins=100,weights=p)
plt.figure()
plt.xlabel("$\phi$")
plt.ylabel("$\psi$")
plt.title(r"Reweighted Population Density")
plt.imshow(h.T,origin='lower',extent=extent)
plt.xlim(-180,180);
plt.ylim(-180,180);

Congratulations! You've finished the first tutorial. Continue on to the next tutorial here [X].

References

Beauchamp, K. A., Das, R. , and Pande, V. S. Inferring Conformational Ensembles from Noisy Experiments. In Prep.
Sosnick et al. Helix, sheet, and polyproline II frequencies and strong nearest neighbor effects in a restricted coil library. Biochemistry, 2005.