Week 5 Tutorial

Checking and Comparing Models

Introduction

In this tutorial we'll look at some simple, realistic, simulated data, and do some model evaluation:

  • We'll fit a simple model, and then do a posterior predictive model check of the adequacy of the fit
  • We'll quantify the generalized predictive accuracy of the model with the DIC
  • We'll calculate the Bayesian evidence for the model

For homework, you will:

  • Devise an alternative model, repeat the above steps, and compare your more complex model with the simple starting one.

Since you'll need to repeat the above steps, we will aim to write this notebook so that, once completed, it provides a module that you can extend at home.

Reminder!

After pulling down the tutorial notebook, immediately make a copy. Then do not modify the original. Do your work in the copy. This will prevent the possibility of git conflicts should the version-controlled file change at any point in the future.


In [ ]:
class SolutionMissingError(Exception):
    def __init__(self):
        Exception.__init__(self,"You need to edit the code provided for it to run!")
def REPLACE_WITH_YOUR_SOLUTION():
    raise SolutionMissingError

The Dataset

Our data is just a list of numbers. Each one represents a measured distance $y$ between two different estimates of the center of a galaxy cluster: the location of the brightest galaxy and a centroid of the emissive, diffuse gas. The context here is that automated algorithms sometimes fail to choose the central galaxy correctly (because of image artifacts or other problems), whereas the gas centroid is more reliable but also more expensive to measure. Therefore, we'd like to use this data set to characterize the distribution of mis-centerings, so that the galaxy-based centers can be used for large samples, with the resulting errors propagated forward through future processing, e.g., weak lensing estimates.

Let's load up the data and have a look.


In [ ]:
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

plt.rcParams['xtick.labelsize'] = 'large'
plt.rcParams['ytick.labelsize'] = 'large'
%matplotlib inline
# or %matplotlib notebook

In [ ]:
# Load data into global variable y. 
# You'll need to make sure of the data file's location! A good guess would be:
# y = np.loadtxt('../../data/model_comparison.dat')
y = REPLACE_WITH_YOUR_SOLUTION()

In [ ]:
plt.rcParams['figure.figsize'] = (8.0, 6.0)

bins = np.linspace(0,1000,20)
plt.hist(y, bins=bins, color="lightgray")

plt.xlabel("Measured distance $y$")
plt.ylabel("Frequency");

Preamble: Pre-registering a Hypothesis

The hypothesis we will test in this tutorial is the model outlined in the next section - but how well that model fits the data is a question we will answer in part using a test statistic.

Having understood what the data represent (and had a quick look at them), what feature in the data do you want your model to explain well?

With this in mind, what is a good test statistic to summarize the data? Spend a few minutes thinking about this and discussing it before writing it down below. You'll then use this "pre-registered" test statistic in a Bayesian model check later in the tutorial.

"Registering a hypothesis" in this way is a good idea (and standard practice in clinical trials in medical research) - it helps you avoid falling into the trap of a posteriori reasoning. Here's an interesting article in Science magazine on the topic, to read at home.


In [ ]:
# Test statistic: ...

Evaluating a Simple Model

First, let's assume a simple model $H_1$, that the sampling distribution is an exponential:

Model 1: $P(y|a_1, H_1) = \frac{1}{a_1}e^{-y/a_1}$; $y\geq0$

We'll then want to fit that model to the data, and then evaluate the model using the methods we saw in the model evaluation lesson. These include:

  • A visual check using replica datasets drawn from the posterior predictive distribution
  • A quantitative posterior predictive model check using a suitable test statistic $T(y)$
  • Calculating the Deviance Information Criterion (DIC) to assess the model's (relative) generalized predictive accuracy
  • Calculating the Bayesian Evidence to provide insight on the (relative) probability of the model given the data.

Notice that each of these bulleted operations can be coded as a function of the model (e.g. a visual check of the model, the evidence of the model, and so on). That suggests that we should write a class that completely describes the model, and then a set of functions that act on model objects passed to them.

Look carefully through the SimpleModel class below. It contains methods for evaluating its various PDFs, and also for drawing samples (or replica datasets) from them.

Note that some methods have lines that need to be replaced with your own code before they will run! The first part of the tutorial will involve fixing these lines. Each time you edit the class, you'll need to re-run its cell, and also re-make any instances of the class that you are using.

When you are done reading the code, scroll down to the exercises.



In [ ]:
class SimpleModel(object):
    """
    Parameter inference and model evaluation in a simple cluster mis-centering analysis.
    """
    def __init__(self):
        # Sometimes it will be convenient to compute many log_likelihood values at once:
        self.vectorized_log_likelihood = np.vectorize(self.log_likelihood)
        # NB Higher dimension models would need an additional kwarg, signature="(Ndim)->()"
        # Dimensionality of the parameter space:
        self.Ndim = 1
        # Every prior needs some hyperparameters to define it:
        self.hyperparameters = REPLACE_WITH_YOUR_SOLUTION()
        return
  
    def __repr__(self):
        return "Simple model assuming exponential sampling distribution with mean a, and a <to be named> prior for a, with hyperparameters {} and {}.".format(self.hyperparameters[0], self.hyperparameters[1])

    def log_prior(self, a):
        """
        Evaluate the log prior PDF P(a|H)
        """
        value = REPLACE_WITH_YOUR_SOLUTION()
        return value

    def draw_samples_from_prior(self, Ns):
        """
        Draw samples from the prior PDF P(a|H)
        """
        samples = REPLACE_WITH_YOUR_SOLUTION() 
        return samples

    def log_likelihood(self, a):
        """
        Evaluate the log of the likelihood function L(a) = P(y|a,H)
        
        May need to guard against values of a<0...
        """
        value = np.sum(st.expon.logpdf(y, scale=a))
        return value
    
    def log_posterior(self, a):
        """
        Evaluate the log of the (unnormalized) posterior PDF P(a|y,H)
        
        Notes
        ----
        * We'll use this with an MCMC sampler, so it should call the non-vectorized likelihood.
        """
        value = REPLACE_WITH_YOUR_SOLUTION()
        return value

    def sampling_distribution(self, yy, a):
        """
        Evaluate the sampling distribution P(yy|a,H) at a point in data space yy given parameter a
            
        Notes
        -----
        This is useful for making plots of "the model" overlaid on the histogram of the data
        """
        value = st.expon.pdf(yy, scale=a)
        return value 
        
    def generate_replica_dataset(self, a):
        """
        Draw a replica dataset y_rep from the sampling distribution P(y_rep|a,H)
        """
        Nd = len(y)
        y_rep = st.expon.rvs(size=Nd, scale=a) 
        return y_rep
   
    def draw_samples_from_posterior(self, guess=None, nwalkers=None, nsteps=None, burn=None, thinby=None):
        """
        Use emcee to draw samples from P(a|y,H)
        """
        # Deal with unset inputs:
        if guess is None: print("You need to specify a starting point in parameter space with the `guess=` kwarg...")
        if nwalkers is None: print("You need to specify the `nwalkers=` kwarg...")
        if nsteps is None: print("You need to specify the chain length `nsteps=` kwarg...")
        if burn is None: print("You need to specify the length of burnin `burn=` kwarg...")
        if thinby is None: print("You need to specify the thinning factor `thinby=` kwarg...")
      
        import emcee
        # The density to sample is this model's own posterior PDF:
        lnprob = self.log_posterior
        npars = self.Ndim
        
        self.sampler = emcee.EnsembleSampler(nwalkers, npars, lnprob) 
        # You could add e.g. threads=4 to speed things up with multiprocessing
        
        # Generate an ensemble of walkers within +/-1% of the guess:
        theta_0 = np.array([guess * (1.0 + 0.01*np.random.randn(npars)) for j in range(nwalkers)])
        # Note that the initial parameter array theta_0 should have dimensions nwalkers × npars
        
        # Evolve the ensemble:
        self.sampler.run_mcmc(theta_0, nsteps)
        
        # Plot the raw samples:
        plt.rcParams['figure.figsize'] = (12.0, 6.0)
        plt.subplot(211)
        for j in range(nwalkers): 
            plt.plot(self.sampler.chain[j,:,0], 'o', alpha=0.2)
        plt.title("Raw Markov chains")
            
        # Extract the chain, remove burnin, merge, and thin:
        samples = self.sampler.chain[:, burn:, :].reshape((-1, npars))
        samples = samples[::thinby, :]
        
        # Keep the samples with the model for future use!
        self.samples = samples
        self.Nsamples = len(samples)
        
        # Plot the thinned chain
        plt.subplot(212)
        plt.plot(samples[:,0], 'o')
        plt.title("Thinned, post-burnin chains");

        return samples

1. Select a prior distribution for the parameter in Model 1.

Make sure it's a proper (normalizable) distribution. We don't want to deal with improper distributions when calculating the evidence later on.

Now implement your prior PDF in the class above. For example, to adopt a uniform prior, you could just use two hyperparameters to define the upper and lower limits of a uniform distribution. Note that you need to write code both to evaluate the log prior and draw samples from the prior.
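
To make that concrete, here is a minimal sketch of what a uniform prior looks like when implemented with scipy.stats. The limits (0 and 4000) are arbitrary placeholders, not a recommendation; your own prior, hyperparameters, and code can differ.


In [ ]:
# Minimal sketch only, with made-up hyperparameters -- your prior may well be different:
lower, upper = 0.0, 4000.0                                    # hypothetical (lower, upper) limits
print("log prior at a=500: ", st.uniform.logpdf(500.0, loc=lower, scale=upper - lower))
print("log prior at a=-10: ", st.uniform.logpdf(-10.0, loc=lower, scale=upper - lower))   # -inf outside the support
print("12 prior draws: ", st.uniform.rvs(loc=lower, scale=upper - lower, size=12))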

Once you have completed your functions, instantiate a model and draw a dozen samples from its prior, like this:


In [ ]:
Model1 = SimpleModel()
print(Model1)

a = Model1.draw_samples_from_prior(12)
print("12 sample a values drawn from the prior: ",a)

2. Write the log posterior function for Model 1, and test it.

Edit the class code, and then re-run its cell and the Model1 assignment cell. Do some simple tests of the Model1 log posterior method to make sure it does what you want.
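
As a hedged sketch of the kind of test you might run (it assumes your log_posterior simply returns the sum of the log prior and the log likelihood, and that Model1 has been re-made after your edits):


In [ ]:
# Sketch of a quick sanity check -- the exact numbers depend on your prior:
for a_test in [100.0, 500.0, 2000.0]:
    print("a =", a_test, " -> log posterior =", Model1.log_posterior(a_test))
# A value of a outside your prior's support should come back as -np.inf (you may need to
# guard against a <= 0 before calling the likelihood, as noted in the class docstring).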


In [ ]:


In [ ]:

3. Fit Model 1 to the data.

The draw_samples_from_posterior method carries out a parameter inference with emcee, displaying its Markov chains, removing burn-in, thinning, and concatenating the chains. Since this step isn't really the point of this problem, the code is given to you, but you'll still need to experiment with the keyword argument ("kwarg") inputs (and read the code to see what they do) in order to get good results. (The suggestions in the cell below are pretty terrible.)

NB. For future reference, the MCMC samples are also stored in the model object, as (e.g.) Model1.samples.

If you do not already have emcee installed, you should be able to pip install it. The documentation is available here.


In [ ]:
samples = Model1.draw_samples_from_posterior(guess=1000.0, nwalkers=100, nsteps=10, burn=0, thinby=100)
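
The settings above really are poor (too few steps, no burn-in, over-aggressive thinning). As a hedged starting point only, something like the following tends to behave better for a one-parameter problem; it re-runs the sampler, and you should still tune the numbers by inspecting the chain plots.


In [ ]:
# Illustrative settings only -- not tuned values:
samples = Model1.draw_samples_from_posterior(guess=1000.0, nwalkers=16, nsteps=1000, burn=200, thinby=10)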

It will be useful later on to know the posterior mean:


In [ ]:
pm = np.mean(Model1.samples, axis=0)
print("Posterior mean value of a = ",pm)

4. Visually compare the predictions of Model 1 with the data.

First, let's just plot the posterior-mean model over the data.


In [ ]:
plt.rcParams['figure.figsize'] = (8.0, 6.0)

# First the histogram of observed data, as backdrop:
plt.hist(y, bins=bins, color="lightgray", density=True, label="Observed")

# Now overlay a curve following the sampling distribution conditioned on the posterior mean value of a: 
pp = REPLACE_WITH_YOUR_SOLUTION()
plt.plot(bins, pp, linestyle="dashed", color="red", label="Posterior mean model")

plt.xlabel("Measured distance $y$")
plt.ylabel("Normalized Frequency")
plt.legend();

This kind of plot should be familiar: it's often a good idea to evaluate model adequacy in data space. You should already be able to see telling differences between this model's sampling distribution and the histogram of the data.

Now, let's compare a random predicted ("replica") data set, drawn from the posterior predictive distribution, with the data. To do this we first draw a parameter value from the posterior PDF, and then generate a dataset from the sampling distribution conditioned on that value. The result is a sample from $P(y_{rep}|y)$.
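
As a minimal sketch of that two-step recipe, using only methods the class already provides (the cell below asks you to do the same thing inside the plotting code):


In [ ]:
# Sketch: one draw from the posterior predictive distribution P(y_rep|y)
j = np.random.randint(Model1.Nsamples)                # 1. pick a stored posterior sample of a, at random
a_j = Model1.samples[j]
y_rep = Model1.generate_replica_dataset(a_j)          # 2. draw y_rep from the sampling distribution given that a
print("Replica dataset of", len(y_rep), "distances; mean =", np.mean(y_rep))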


In [ ]:
plt.rcParams['figure.figsize'] = (8.0, 6.0)

# First the histogram of observed data, as backdrop:
plt.hist(y, bins=bins, color="lightgray", density=True, label="Observed")

# Choose a posterior sample "at random" and generate a replica dataset, and show its histogram
j = REPLACE_WITH_YOUR_SOLUTION()
a = Model1.samples[j]
mock = REPLACE_WITH_YOUR_SOLUTION()
plt.hist(mock, bins=bins, alpha=1.0, histtype="step", color="red", density=True, label="Sample posterior prediction")

plt.xlabel("Measured distance $y$")
plt.ylabel("Normalized Frequency")
plt.legend();

This plot is nice because it is comparing apples with apples: do mock datasets drawn from our model sampling distribution with any plausible parameter value "look like" the real data?

To best evaluate this, we want to visualize the posterior predictive distribution of replica datasets. We can do this by plotting many replica datasets drawn from the posterior predictive PDF, e.g. one for each of our posterior samples. Let's put this plot in a function, so we can re-use it later.


In [ ]:
def visual_check(Model, Nreps=None):
    plt.rcParams['figure.figsize'] = (8.0, 6.0)

    # First the histogram of observed data, as backdrop:
    bins = np.linspace(0,1000,20)
    plt.hist(y, bins=bins, color="lightgray", density=True, label="Observed")

    # Compute the posterior mean parameter (vector)
    pm = np.mean(Model.samples, axis=0)

    # Make a large number of replica datasets, and overlay histograms of them all 
    if Nreps is None: Nreps = len(Model.samples)
    alpha = min(1.0, 5.0 / Nreps)  # cap at 1 in case Nreps is small
    for j in range(Nreps): 
        if j==0:  
            # Plot a dataset drawn using a = the posterior mean a, to give a good legend
            mock = REPLACE_WITH_YOUR_SOLUTION()
            plt.hist(mock, bins=bins, histtype="step", alpha=1.0, color="red", density=True, label="Sample posterior predictions")
        else:     
            # Take the next posterior sample a and generate a replica dataset
            mock = REPLACE_WITH_YOUR_SOLUTION()
            plt.hist(mock, bins=bins, histtype="step", alpha=alpha, color="red", density=True)

    # Include the posterior mean model for comparison
    pp = REPLACE_WITH_YOUR_SOLUTION()
    plt.plot(bins, pp, linestyle="dashed", color="red", label="Posterior mean model")

    plt.xlabel("Measured distance $y$")
    plt.ylabel("Normalized Frequency")
    plt.legend();
    
    return

In [ ]:
visual_check(Model1)

5. Quantitative posterior predictive model check

Now let's quantify the (in)adequacy of the fit with a quantitative posterior predictive model check. We define a test statistic $T(y)$ that summarizes a dataset $y$, and look at its posterior predictive distribution.

Code up the test statistic you came up with when looking at the data at the start of this tutorial - that relates to your "pre-registered hypothesis."
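
If you would like to see the shape of such a function before committing to your own choice, here is an illustration with a made-up statistic (the 90th percentile of the distances); it is not a recommendation, and you should implement the statistic you pre-registered.


In [ ]:
# Illustration only -- substitute the statistic you pre-registered above:
def example_T(y):
    return np.percentile(y, 90)     # e.g. a tail-sensitive summary of the distances
print("example_T(y_obs) =", example_T(y))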


In [ ]:
def T(y):
    """
    Compute a test statistic to summarize a single dataset y
    """
    T = REPLACE_WITH_YOUR_SOLUTION()
    return T

To sample the posterior predictive distribution of test statistics $P(T(y_{rep})|y)$, we need to generate replica datasets from the model:


In [ ]:
def distribution_of_T(Model):
    """
    Compute T(yrep) for each yrep drawn from the posterior predictive distribution, using samples stored in Model
    """
    TT = np.array([T(Model.generate_replica_dataset(a)) for a in Model.samples])
    return TT

We can now do the following:

  • plot a histogram of TT
  • compare that distribution with T(real data)
  • compute and report the p-value for T

And we want all of that packaged in functions of the model, so that we can re-use it later (on different models!).

First let's write a function to compute the p-value, $P(T > T(y)|y,H)$:


In [ ]:
def pvalue(Model):
    """
    Compute the posterior predictive p-value, P(T > T(y)|y,H):
    """
    p = REPLACE_WITH_YOUR_SOLUTION()
    # Print out the result and return it
    print("The (approximate) posterior predictive probability P(T > T(y)|y,H) = {}".format(p))
    return p
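
If you are unsure about the estimator itself, it is just the fraction of replica test statistics that exceed the observed one; here it is demonstrated on made-up numbers, so you can see the pattern before filling in the function above.


In [ ]:
# Monte Carlo estimate of P(T > T_obs), shown on fake numbers:
fake_TT = np.array([1.0, 2.0, 3.0, 4.0, 5.0])         # pretend these are T(y_rep) values
fake_T_obs = 3.5                                      # pretend this is T(y_observed)
print("p =", np.mean(fake_TT > fake_T_obs))           # fraction of replicas exceeding T_obs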

And then write a function that plots the distribution of T, and reports the p-value:


In [ ]:
def posterior_predictive_check(Model):
    """
    Compute the posterior predictive distribution of the test statistic T(y_rep), and compare with T(y_obs) 
    """
    # Compute distribution of T(yrep):
    TT = distribution_of_T(Model)
    
    # Plot:
    plt.rcParams['figure.figsize'] = (8.0, 6.0)
    plt.hist(TT, bins=50, histtype="step", color="red", label="$P(T(y_{\\rm rep})|y)$")
    
    # Overlay T(y_obs):
    plt.axvline(x=T(y), color="gray", linestyle="dashed", label="$T(y_{\\rm observed})$")
    plt.xlabel("Test statistic T(y)")
    plt.ylabel("Posterior predictive probability density")
    plt.legend();
    
    # Compute p-value:
    p = pvalue(Model)

    return p

In [ ]:
p1 = posterior_predictive_check(Model1)

6. Calculate the DIC for Model 1.

We saw in class that the Deviance Information Criterion is given by:

$\mathrm{DIC} = \langle D(\theta) \rangle + p_D; \quad p_D = \langle D(\theta) \rangle - D(\langle\theta\rangle)$

where the deviance $D(\theta)=-2\log P(\mathrm{data}|\theta)$, and averages $\langle\rangle$ are over the posterior.

In the homework we'll compare these to the corresponding values for a different model - so again, we need a re-usable function that takes a model object as input and returns the DIC and pD. Let's write this function, and execute it for the simple model.
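
Before filling in the function below, note that the model stores its posterior samples and provides a vectorized log likelihood, so both posterior averages in the formula above reduce to simple means over the samples. A minimal sketch of the two ingredients (assuming Model1 has been fitted as above; how you assemble them into pD and the DIC is left to your function):


In [ ]:
# Sketch of the ingredients, using the fitted Model1:
a_samples = Model1.samples[:, 0]                                   # posterior samples of a
D_of_a = -2.0 * Model1.vectorized_log_likelihood(a_samples)        # D(theta) at each posterior sample
D_at_mean = -2.0 * Model1.log_likelihood(np.mean(a_samples))       # D(<theta>)
print("<D> =", np.mean(D_of_a), "  D(<theta>) =", D_at_mean)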


In [ ]:
def DIC(Model):
    """
    Compute the Deviance Information Criterion for the given model
    """
    
    pD = REPLACE_WITH_YOUR_SOLUTION()
    DIC = REPLACE_WITH_YOUR_SOLUTION()
    
    print("The effective number of parameters fitted, pD = {:.3f}. The value of the DIC is {:.1f}".format(pD,DIC))
    
    return DIC, pD

In [ ]:
DIC1, pD1 = DIC(Model1)

Does your value of $p_D$ make sense?

7. Compute the evidence for Model 1.

To do this, note that

$P(D|H)=\int P(D|\theta,H) \, P(\theta|H) d\theta$

can be approximated by an average over samples from the prior

$P(D|H) \approx \frac{1}{m}\sum_{k=1}^m P(D|\theta_k,H)$; $\theta_k\sim P(\theta|H)$.

This approach is not especially efficient or numerically stable in general (because the likelihood typically is large in only a small fraction of the prior volume), but it will do for this exercise.

In a function, draw a large number of samples from the prior and use them to calculate the evidence. To avoid numerical over/underflows, we'll use the logsumexp function from scipy.special to do the sum.

Roughly how precisely do we need to know the log Evidence, to be able to compare models?
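
As a hedged sketch of the estimator described above (your log_evidence function below should wrap up the same idea; the number of draws here is arbitrary):


In [ ]:
# Sketch of the simple Monte Carlo estimate of the log evidence:
from scipy.special import logsumexp
a_k = Model1.draw_samples_from_prior(1000)                  # theta_k ~ P(theta|H)
logL_k = Model1.vectorized_log_likelihood(a_k)              # log P(D|theta_k, H)
print("sketch log evidence:", logsumexp(logL_k) - np.log(len(a_k)))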


In [ ]:
def log_evidence(Model, N=1000):
    """
    Compute the log evidence for the model
    """
    
    from scipy.special import logsumexp
    
    log_evidence = REPLACE_WITH_YOUR_SOLUTION()
    
    print("The log Evidence for the model is {}".format(log_evidence))
    return log_evidence

In [ ]:
%%time
logE1 = log_evidence(Model1, N=1000)

Summary

We now have a lot of useful machinery for evaluating models for our data. We can do both visual and quantitative posterior predictive model checks to assess the adequacy of the model fit, and we can compute the DIC and the Bayesian evidence for the model, for use in model comparison. In the homework you will use this machinery to investigate a more complex model of your own design. Go to "File" -> "Download as..." -> "Python (.py)" to export the code from this notebook to a module you can %load into your homework notebook.