File Input and Output

In this tutorial, we discuss saving an BELT model to disk for later use. We repeat the same calculations as in the previous tutorials, but this time save the model to disk for later use.



In [6]:

    
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from fitensemble import belt, example_loader

predictions, measurements, uncertainties = example_loader.load_alanine_numpy()
phi, psi, assignments, indicators = example_loader.load_alanine_dihedrals()

num_samples = 20000  # Generate 20,000 MCMC samples
thin = 25  # Subsample (i.e. thin) the MCMC traces by 25X to ensure independent samples
burn = 5000  # Discard the first 5000 samples as "burn-in"

regularization_strength = 3.0  # How strongly do we prefer a "uniform" ensemble (the "raw" MD)?

Saving to HDF5

The following code builds a model, just like before. However, we now pass a filename argument when we begin the MCMC sampling. This tells PyMC to save the results to disk as an HDF5 database. This is useful for situations where the entire MCMC trace cannot fit in system memory.



In [2]:

    
belt_model = belt.MaxEntBELT(predictions, measurements, uncertainties, regularization_strength)

pymc_filename = "trialanine.h5"
belt_model.sample(num_samples, thin=thin, burn=burn,filename=pymc_filename)









    



[****************100%******************]  25000 of 25000 complete

The HDF5 database file trialanine.h5 contains all components necessary to work with your BELT model. This includes

predictions
measurements
uncertainties
The MCMC trace

Loading from HDF5

To load an HDF5 file from disk, use the load() function:



In [3]:

    
belt_model = belt.MaxEntBELT.load("./trialanine.h5")

As we did previously, this model can be used to calculate the MAP conformational populations (p) or a trace (state_pops_trace) of an arbitrary observable:



In [7]:

    
p = belt_model.accumulate_populations()
state_pops_trace = belt_model.trace_observable(indicators.T)

state_pops = state_pops_trace.mean(0)
pd.DataFrame([state_pops],columns=[r"$PP_{II}$",r"$\beta$",r"$\alpha_r$",r"$\alpha_l$"],index=["BELT"])









    Out[7]:






  
    
      
      $PP_{II}$
      $\beta$
      $\alpha_r$
      $\alpha_l$
    
  
  
    
      BELT
       0.556701
       0.389624
       0.050868
       0.002806

As before, we see the conformational populations of our BELT ensemble.