${\rm Pr}(\theta|d,H) = \frac{1}{{\rm Pr}(d|H)}\;{\rm Pr}(d|\theta,H)\;{\rm Pr}(\theta|H)$
In [1]:
# import cluster_pgm
# cluster_pgm.inverse()
from IPython.display import Image
Image(filename="cluster_pgm_inverse.png")
Out[1]:
This PGM illustrates the joint PDF for the parameters and the data, which can be factorised as:
$\prod_k \; {\rm Pr}(N_k\;|\;\mu_k(\theta),{\rm ex}_k,{\rm pb}_k,H) \; {\rm Pr}(\,\theta\,|H)$
Note the huge product over pixel values!
It can also be factorised to:
${\rm Pr}(\,\theta\,|\{N_k\},H) \; {\rm Pr}(\{N_k\}\,|H)$
which is, up to the normalizing constant ${\rm Pr}(\{N_k\}|H)$, the posterior PDF for the model parameters given all the data $\{N_k\}$. This is just Bayes' Theorem rearranged, with the normalizing denominator appearing on the left-hand side instead.
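Equating the two factorisations of the joint PDF makes the rearrangement explicit:

```latex
{\rm Pr}(\theta\,|\,\{N_k\},H)\;{\rm Pr}(\{N_k\}\,|\,H)
  = \prod_k {\rm Pr}(N_k\,|\,\mu_k(\theta),{\rm ex}_k,{\rm pb}_k,H)\;{\rm Pr}(\theta\,|\,H)
```

so that dividing through by ${\rm Pr}(\{N_k\}|H)$ recovers the posterior in exactly the form of the first equation above.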
In [2]:
%load_ext autoreload
%autoreload 2
In [3]:
from __future__ import print_function
import numpy as np
import cluster
In [4]:
lets = cluster.XrayData()
lets.read_in_data()
lets.set_up_maps()
In [5]:
x0,y0 = 328,328 # The center of the image is 328,328
S0,b = 0.001,1e-6 # Cluster and background surface brightness, arbitrary units
beta = 2.0/3.0 # Canonical value is beta = 2/3
rc = 12 # Core radius, in pixels
logprob = lets.evaluate_unnormalised_log_posterior(x0,y0,S0,rc,beta,b)
In [6]:
print (logprob)
Good. Here's the code that is being run, inside the "XrayData" class:
def evaluate_log_prior(self):
    # Uniform in all parameters...
    return 0.0 # HACK

def evaluate_log_likelihood(self):
    self.make_mean_image()
    # Return un-normalized Poisson sampling distribution:
    # log (\mu^N e^{-\mu} / N!) = N log \mu - \mu + constant
    return np.sum(self.im * np.log(self.mu) - self.mu)

def evaluate_unnormalised_log_posterior(self,x0,y0,S0,rc,beta,b):
    self.set_pars(x0,y0,S0,rc,beta,b)
    return self.evaluate_log_likelihood() + self.evaluate_log_prior()
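As a sanity check on the likelihood term, here is a minimal self-contained sketch of the same un-normalised Poisson log-likelihood, applied to a hypothetical toy counts image (the names `N`, `mu` and `unnormalised_poisson_loglike` are illustrative, not part of the `XrayData` class):

```python
import numpy as np

def unnormalised_poisson_loglike(N, mu):
    # log L = sum_k [ N_k log(mu_k) - mu_k ] + constant,
    # dropping the parameter-independent log(N_k!) term.
    return np.sum(N * np.log(mu) - mu)

# Toy 2x2 counts image and two candidate mean images:
N = np.array([[3.0, 1.0], [0.0, 2.0]])
good_mu = np.array([[3.0, 1.0], [0.5, 2.0]])  # close to the data
bad_mu = np.full((2, 2), 10.0)                # far from the data

good = unnormalised_poisson_loglike(N, good_mu)
bad = unnormalised_poisson_loglike(N, bad_mu)
# A mean image close to the data scores higher.
```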
Now let's try evaluating the 2D posterior PDF for cluster position, conditioned on reasonable values of the cluster and background flux, cluster size and beta:
In [7]:
npix = 15
# Initial guess at the interesting range of cluster position parameters:
#xmin,xmax = 310,350
#ymin,ymax = 310,350
# Refinement, found by fiddling around a bit:
xmin,xmax = 327.7,328.3
ymin,ymax = 346.4,347.0
x0grid = np.linspace(xmin,xmax,npix)
y0grid = np.linspace(ymin,ymax,npix)
logprob = np.zeros([npix,npix])
for i,x0 in enumerate(x0grid):
    for j,y0 in enumerate(y0grid):
        logprob[j,i] = lets.evaluate_unnormalised_log_posterior(x0,y0,S0,rc,beta,b)
    print ("Done column",i)
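The `logprob[j,i]` indexing convention (rows indexed by y, columns by x) is what lets `imshow` display the map with the right orientation. A toy version of the same double-loop pattern, with a hypothetical quadratic log-posterior peaking at (1, -2):

```python
import numpy as np

def toy_log_posterior(x0, y0):
    # Hypothetical stand-in for evaluate_unnormalised_log_posterior:
    return -0.5 * ((x0 - 1.0)**2 + (y0 + 2.0)**2)

xgrid = np.linspace(0.0, 2.0, 5)
ygrid = np.linspace(-3.0, -1.0, 5)
logprob = np.zeros([len(ygrid), len(xgrid)])
for i, x0 in enumerate(xgrid):
    for j, y0 in enumerate(ygrid):
        logprob[j, i] = toy_log_posterior(x0, y0)  # row = y, column = x

# The maximum lands at the grid point nearest the true peak:
j_max, i_max = np.unravel_index(np.argmax(logprob), logprob.shape)
```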
In [8]:
print (logprob[0:5,0])
To normalize this, we need to take care not to exponentiate any very large or very negative numbers. Subtracting the maximum log probability before exponentiating keeps every term in range (the largest becomes $e^0 = 1$), and the shift cancels when we divide by the sum.
In [9]:
Z = np.max(logprob)
prob = np.exp(logprob - Z)
norm = np.sum(prob)
prob /= norm
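Subtracting the maximum before exponentiating is the standard log-sum-exp stabilisation. A minimal sketch (`normalise_from_log` is a hypothetical helper name) showing it working on log-probabilities far too negative to exponentiate directly:

```python
import numpy as np

def normalise_from_log(logp):
    # Shift so the largest term becomes exp(0) = 1; the shift
    # cancels when we divide by the sum.
    p = np.exp(logp - np.max(logp))
    return p / p.sum()

# np.exp(-1000.0) underflows to exactly zero, so naive
# normalisation would give 0/0; the shifted version is fine:
logp = np.array([-1000.0, -1001.0, -1002.0])
p = normalise_from_log(logp)
# Relative weights survive: p[0]/p[1] = e, p[0]/p[2] = e^2.
```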
In [10]:
print (prob[0:5,0])
Let's plot this as a 2D probability density map.
In [11]:
import astropy.visualization as viz
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 10.0)
In [12]:
plt.imshow(prob, origin='lower', cmap='Blues', interpolation='none', extent=[xmin,xmax,ymin,ymax])
plt.xlabel('x / pixels')
plt.ylabel('y / pixels')
Out[12]:
The posterior mean, $\langle x \rangle = \int\;x\,{\rm Pr}(x|d,H)\;dx$
The posterior median, $x_{50}$ s.t. $\int_{x_{50}}^{\infty}\;{\rm Pr}(x|d,H)\;dx = 0.5$
The $N^{\rm th}$ percentile, $x_{N}$ s.t. $\int_{-\infty}^{x_{N}}\;{\rm Pr}(x|d,H)\;dx = N\%$
The posterior probability that $x > a$, $\int_{a}^{\infty}\;{\rm Pr}(x|d,H)\;dx$
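On a grid, each of these summaries reduces to sums over pixel values. A sketch with a hypothetical 1D Gaussian posterior (mean 4, unit width) standing in for a marginalised PDF:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 1001)
pdf = np.exp(-0.5 * (x - 4.0)**2)  # toy Gaussian posterior
pdf /= np.sum(pdf)                 # normalise on the grid

mean = np.sum(x * pdf)                   # posterior mean <x>
cdf = np.cumsum(pdf)
median = x[np.searchsorted(cdf, 0.5)]    # posterior median
p16 = x[np.searchsorted(cdf, 0.16)]      # 16th percentile
prob_gt_5 = np.sum(pdf[x > 5.0])         # Pr(x > 5)
```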
In [13]:
# First, double-check that the posterior PDF sums
# (i.e. approximately integrates) to 1:
print (np.sum(prob))
In [16]:
# Now, sort the pixel values of the PDF map, and
# find the cumulative distribution:
sorted_prob = np.sort(prob.flatten())
C = sorted_prob.cumsum()

# Find the pixel values that lie at the levels that contain
# 68% and 95% of the probability:
lvl68 = np.min(sorted_prob[C > (1.0 - 0.68)])
lvl95 = np.min(sorted_prob[C > (1.0 - 0.95)])

plt.imshow(prob, origin='lower', cmap='Blues', interpolation='none', extent=[xmin,xmax,ymin,ymax])
plt.contour(prob, [lvl95,lvl68], colors='black', extent=[xmin,xmax,ymin,ymax])
plt.xlabel('x / pixels')
plt.ylabel('y / pixels')
plt.savefig("figures/cluster_2D-inferred-xy.png")
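The sorted-cumulative-sum trick generalises into a small helper (`contour_level` is a hypothetical function name, not part of the cluster module): the level enclosing a fraction f of the probability is the smallest pixel value whose cumulative sorted sum exceeds 1 - f.

```python
import numpy as np

def contour_level(prob, fraction):
    # prob must be a normalised probability map (sums to 1).
    values = np.sort(prob.flatten())
    cumulative = values.cumsum()
    return np.min(values[cumulative > (1.0 - fraction)])

# Toy normalised 4-pixel map: the top pixel alone holds 50%
# of the probability, the top three hold 90%.
p = np.array([[0.5, 0.2], [0.2, 0.1]])
lvl68 = contour_level(p, 0.68)  # pixels >= 0.2 enclose >= 68%
lvl40 = contour_level(p, 0.40)  # the 0.5 pixel alone encloses >= 40%
```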
In the next example, we'll look at some 1D marginalised distributions.
In [ ]: