After pulling down the tutorial notebook, immediately make a copy. Then do not modify the original. Do your work in the copy. This will prevent the possibility of git conflicts should the version-controlled file change at any point in the future. (The same exhortation applies to homeworks.)
Alas, we return (to the X-ray imaging data from Week 2)! Refer to that tutorial for a basic introduction to the nature of the data products.
In case you don't still have the data sitting around, you can download them again from here:
(Remember that these are not to be committed to the repository.)
Previously, we ignored the big, honkin' cluster of galaxies in the center of the field, and concentrated on building a generative model for the point-like sources. This time, we'll simply mask out those sources, and instead try to fit a model to the extended emission from the cluster (Abell 1835).
The basic problem is one of parameter inference, which you're familiar with by now. What distinguishes this assignment from previous ones is the size of the parameter space: our model will have six free parameters, enough that simple approaches like evaluating the posterior over a grid are no longer practical.
Consequently, this is a good excuse for you (in the homework) to figure out how to work with one of the pre-packaged MCMC algorithms that exist. We have a non-exhaustive list here, but anything is fair game apart from the following restrictions:
- emcee is off limits, since you previously saw it in homework, and we use it again in this tutorial.

We'll give you a lot of code below. It isn't intentionally buggy, but do look through it and make sure you're happy with what it does.
These are all the imports I think we might need...
In [ ]:
import astropy.io.fits as pyfits
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from astropy.visualization import LogStretch
logstretch = LogStretch()
import scipy.stats as stats
from io import StringIO
import daft
from matplotlib import rc
import emcee
import corner
class SolutionMissingError(Exception):
    def __init__(self):
        Exception.__init__(self, "You need to complete the solution for this code to work!")

def REPLACE_WITH_YOUR_SOLUTION():
    raise SolutionMissingError

REMOVE_THIS_LINE = REPLACE_WITH_YOUR_SOLUTION
Here's a class to deal with reading in and handling the data.
In [ ]:
class XrayData:
    def __init__(self, imagefile, expmapfile):
        """
        Read in the data from `imagefile' and `expmapfile'.
        Store the actual arrays from these in `im' and `ex'.
        Create helper arrays of the x and y IMAGE coordinates of each pixel, stored in `xs' and `ys'.
        - Do remember that IMAGE coordinates are not the same thing as array indices!
        Create a `mask' array indicating whether a given image pixel is actually illuminated (exptime>0).
        - More generally, this can be used to indicate whether we include a given pixel in the analysis.
        - This is real valued with 0's and 1's indicating exclusion and inclusion. There were once reasons for this.
        """
        ### read in the data
        self.imfits = pyfits.open(imagefile)
        self.im = self.imfits[0].data
        self.exfits = pyfits.open(expmapfile)
        self.ex = self.exfits[0].data
        ### make helper arrays of x and y coordinates
        # (note that array axis 0 is the row, i.e. y, so shape[1] sets the range of x values)
        self.xs = np.array([np.arange(self.im.shape[1]) for j in np.arange(self.im.shape[0])])
        self.ys = np.array([[j for i in np.arange(self.im.shape[1])] for j in np.arange(self.im.shape[0])])
        ### create a mask
        self.mask = self.ex * 0.0
        self.mask[self.ex > 0.0] = 1.0

    def mask_circle(self, x, y, r):
        """
        Change the `mask' array such that a circular region with radius `r' (in pixels) centered on `x' and `y'
        (in IMAGE coordinates) will be flagged to be ignored. For good measure, set `ex' to 0.0 in the same region.
        """
        there = ( (self.xs-x)**2 + (self.ys-y)**2 <= r**2 )
        self.mask[there] = 0.0
        self.ex[there] = 0.0

    def show(self):
        """Plot all the arrays we're holding onto, applying the current mask."""
        plt.rcParams['figure.figsize'] = (16.0, 10.0)
        fig, axs = plt.subplots(2, 3)
        axs[0,0].imshow(logstretch(self.im*self.mask), cmap='gray', origin='lower');
        axs[0,0].set_title("Masked image");
        axs[0,1].imshow(self.ex*self.mask, cmap='gray', origin='lower');
        axs[0,1].set_title("Masked exposure map");
        axs[0,2].imshow(self.mask, cmap='gray', origin='lower');
        axs[0,2].set_title("Mask");
        axs[1,0].imshow(self.xs*self.mask, cmap='gray', origin='lower');
        axs[1,0].set_title("Masked X ramp");
        axs[1,1].imshow(self.ys*self.mask, cmap='gray', origin='lower');
        axs[1,1].set_title("Masked Y ramp");
        axs[1,2].set_title("Nothing");
Try reading in the data and displaying it as a sanity check.
In [ ]:
try:
    exec(open('Solution/where_the_data_at.py').read())
except IOError:
    data_path = REPLACE_WITH_YOUR_SOLUTION() # relative or absolute path to the folder
                                             # where the data files are, e.g. "./"
data = XrayData(data_path+"P0098010101M2U009IMAGE_3000.FTZ", data_path+"P0098010101M2U009EXPMAP3000.FTZ")
In [ ]:
data.show()
Now let's define the point source masks, using the list of sources we saw previously.
In [ ]:
regions_string = \
u"""232 399 9.6640149
188 418 6.6794089
362 474 7.8936824
336 417 5.8364482
381 359 3.8626665
391 418 5.5885783
398 294 3.538914
417 209 5.2474726
271 216 5.3269609
300 212 6.0974003
286 162 3.7078355
345 153 4.8141418
168 361 5.6344116
197 248 4.6760734
277 234 5.0308471
241 212 4.1267598
251 379 4.4363759
310 413 2.5081459
460 287 5.9048854
442 353 4.6259039
288 268 4.4204645
148 317 4.7704631
151 286 7.9281045
223 239 5.561259
490 406 4.0450217
481 318 4.7402437
"""
regions = np.loadtxt(StringIO(regions_string))
In [ ]:
# apply each region as a mask
for reg in regions:
    data.mask_circle(reg[0], reg[1], reg[2])
In [ ]:
# double check that it worked as expected (compare with above)
data.show()
We will use a common parametric model for the surface brightness of galaxy clusters: the azimuthally symmetric beta model.
$S_\beta(r) = S_0 \left[1 + \left(\frac{r}{r_c}\right)^2\right]^{-3\beta + 1/2}$,
where $r$ is projected distance from the cluster center. Note that this model describes a 2D surface brightness distribution, with $r^2 = (x-x_0)^2 + (y-y_0)^2$ and $(x_0,y_0)$ being the cluster center. Also note that pixels are perfectly respectable units for coordinates and projected distance for our purposes.
The model should also include background, which for simplicity we will take to be uniform. So the total model surface brightness is
$S(r) = S_0 \left[1 + \left(\frac{r}{r_c}\right)^2\right]^{-3\beta + 1/2} + b$,
with model parameters $x_0$, $y_0$, $S_0$, $r_c$, $\beta$ and $b$. From experience, we know that the beta model parameters are degenerate such that $\ln(S_0)$ is much easier to sample than $S_0$, so we'll do that below.
Finally, we'll work with $S$ in units of counts/second/pixel, such that
$S(x_i,y_i)$ $\times$ ex$_i$ is the expected number of counts in the $i$th pixel, i.e. the mean of a Poisson likelihood function for the data im$_i$.
When working with this data set previously, we worried about the point spread function (PSF) blurring the image before we recorded it. Since we're now looking at an object which is significantly more extended than the PSF, we'll simplify by neglecting the blurring in this problem (not a great idea in real life, but it will suit us fine).
Useful functions follow:
In [ ]:
def betaModelFun(r, S0, rc, beta):
    '''
    The beta model function itself.
    '''
    return S0 * (1.0 + (r/rc)**2)**(-3.0*beta + 0.5)

def betaModelImage(xs, ys, x0, y0, S0, rc, beta):
    '''
    Returns an image with values given by the (azimuthally symmetric) beta model parameters.
    The inputs `xs' and `ys' should be arrays of x and y coordinate values (in the same coordinate
    system as x0 and y0), and the rest are beta model parameters.
    The shape of the output is the same as that of `xs' and `ys' (which had better be the same).
    '''
    return betaModelFun(np.sqrt( (xs-x0)**2 + (ys-y0)**2 ), S0, rc, beta)

def modelImage(data, x0, y0, S0, rc, beta, bg):
    '''
    Returns an image of the whole model (beta model plus background), multiplied by the exposure
    map, i.e. in units of expected counts per pixel. `data' should be of XrayData type.
    Note that the output is *not* masked according to `data.mask'.
    '''
    return (betaModelImage(data.xs, data.ys, x0, y0, S0, rc, beta) + bg) * data.ex
And even a PGM:
In [ ]:
plt.rcParams['figure.figsize'] = 14, 8
rc("font", family="serif", size=12)
rc("text", usetex=True)
pgm = daft.PGM([8,5], observed_style="inner", label_params={'fontsize':15})
pgm.add_node(daft.Node("b", r"$b$", 0.5, 0.5))
pgm.add_node(daft.Node("beta", r"$\beta$", 0.5, 1.25))
pgm.add_node(daft.Node("rc", r"$r_c$", 0.5, 2.0))
pgm.add_node(daft.Node("s0", r"$S_0$", 0.5, 2.75))
pgm.add_node(daft.Node("y0", r"$y_0$", 0.5, 3.5))
pgm.add_node(daft.Node("x0", r"$x_0$", 0.5, 4.35))
pgm.add_plate(daft.Plate([1.0, 1.0, 5.0, 3.0], position="bottom right", label=r"unmasked pixels $i$"))
pgm.add_node(daft.Node("Sbeta", r"$S_\beta$", 2.0, 3.0, fixed=True, offset=[5.0, 0.0]))
pgm.add_node(daft.Node("xi", r"$x_i$", 1.5, 2.0, fixed=True, offset=[0.0, -20.0]))
pgm.add_node(daft.Node("yi", r"$y_i$", 2.25, 2.0, fixed=True, offset=[0.0, -20.0]))
pgm.add_node(daft.Node("Stot", r"$S$", 4.0, 2.5, fixed=True))
pgm.add_node(daft.Node("ex", r"ex$_i$", 5.0, 3.25, fixed=True))
pgm.add_node(daft.Node("im", r"im$_i$", 5.0, 2.25, observed=True))
pgm.add_edge("x0", "Sbeta")
pgm.add_edge("y0", "Sbeta")
pgm.add_edge("s0", "Sbeta")
pgm.add_edge("rc", "Sbeta")
pgm.add_edge("beta", "Sbeta")
pgm.add_edge("xi", "Sbeta")
pgm.add_edge("yi", "Sbeta")
pgm.add_edge("Sbeta", "Stot")
pgm.add_edge("b", "Stot")
pgm.add_edge("Stot", "im")
pgm.add_edge("ex", "im")
pgm.render();
As we have emphasized in class, the prior encodes what information we bring to the analysis before gathering the data. As such, this is the appropriate time to decide what priors we'd like to use. Even though you may not have any relevant domain knowledge (and we don't expect you to gain any just for this problem), the model itself motivates certain prior constraints. For example, $r_c$ is a scale for the radius, and therefore can't be negative or zero. A (weak) statement similar to this can be made about most of the model parameters.
Beyond such physically/mathematically motivated bounds, the next task is to decide on a specific prior distribution for each parameter. Please don't kill yourselves trying to come up with the most sophisticated or information-theoretically motivated thing. Imagine trying to justify your choice to a thesis advisor who, we'll say hypothetically, has no appreciation for such subtleties and just wants to hear something that sounds uninformative.
Note that one of the tasks in this week's HW will be to justify the choice of priors.
In [ ]:
Because the likelihood in this problem (or at least my naive implementation of it) is relatively slow to evaluate, it is well worth spending some time coming up with a decent ballpark guess at the solution. You do not want to be sitting around for an hour only to find that your chain was started in such a poor position that it still hasn't made it anywhere useful.
Broadly sensible values for most of the parameters can be estimated by looking at the data in various ways. The least obvious in this context is probably $\beta$, so we'll mention that the value $\beta \sim 2/3$ is a canonical one, coming from the simple spherical collapse model of cluster formation.
The cell below provides code to inspect the residual image (data-model) for a given set of parameters. This isn't the only, nor probably the best, way to inspect the data for this purpose, but it's one that we have to hand already. If you can find model parameters that remove most of the structure from the residual image (it doesn't have to be perfect), then you'll be broadly in the right neighborhood.
In [ ]:
try:
    exec(open('Solution/starting_point.py').read())
except IOError:
    start = REPLACE_WITH_YOUR_SOLUTION() # a list, NOT a numpy array
x0, y0, lnS0, rc, beta, bg = start
model_image = modelImage(data, x0, y0, np.exp(lnS0), rc, beta, bg)
plt.rcParams['figure.figsize'] = (16.0, 5.0)
fig, axs = plt.subplots(1, 3)
axs[0].imshow(logstretch(data.im*data.mask), cmap='gray', origin='lower');
axs[0].set_title("Data");
axs[1].imshow(logstretch(model_image*data.mask), cmap='gray', origin='lower');
axs[1].set_title("Model");
axs[2].imshow(logstretch((data.im-model_image)*data.mask), cmap='gray', origin='lower');
axs[2].set_title("Residual");
In [ ]:
try:
    exec(open('Solution/prior.py').read())
except IOError:
    REMOVE_THIS_LINE()
    def ln_prior(params):
        x0, y0, lnS0, rc, beta, bg = params
        S0 = np.exp(lnS0)
        # ...
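If you're unsure where to begin, here is a minimal sketch of one possible choice: broad uniform priors within physically motivated bounds. The specific limits below are placeholders, not a recommendation; picking and justifying your own bounds is part of the exercise.
In [ ]:
def ln_prior(params):
    # A sketch only: uniform priors within loose, physically motivated bounds.
    # The numerical limits here are placeholders -- adjust and justify your own!
    x0, y0, lnS0, rc, beta, bg = params
    ny, nx = data.im.shape # relies on the global `data' object, as elsewhere in this notebook
    if not (0.0 <= x0 <= nx and 0.0 <= y0 <= ny):
        return -np.inf # the cluster center should lie on the image
    if rc <= 0.0:
        return -np.inf # the core radius is a scale, so it must be positive
    if not (0.0 < beta < 5.0):
        return -np.inf # placeholder upper bound on beta
    if bg < 0.0:
        return -np.inf # a negative uniform background makes no physical sense
    return 0.0 # flat in everything else (including lnS0, improperly) within the bounds above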
Similarly, the likelihood function. Note that we will eventually need a sum over pixels that includes only the unmasked ones, according to data.mask.
In [ ]:
try:
    exec(open('Solution/likelihood.py').read())
except IOError:
    REMOVE_THIS_LINE()
    def ln_likelihood(params, data): # `data' of XrayData type
        x0, y0, lnS0, rc, beta, bg = params
        S0 = np.exp(lnS0)
        # ...
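Again as a sketch: since im$_i$ are counts with Poisson mean $S(x_i,y_i) \times$ ex$_i$ (see above), a direct implementation might look like the following. It uses the modelImage helper defined earlier and drops the parameter-independent $\ln(\mathrm{im}_i!)$ term; the guard against bad values is one possible choice, not the only one.
In [ ]:
def ln_likelihood(params, data): # `data' of XrayData type
    # A sketch only: independent Poisson likelihoods over the unmasked pixels
    x0, y0, lnS0, rc, beta, bg = params
    mu = modelImage(data, x0, y0, np.exp(lnS0), rc, beta, bg) # expected counts per pixel
    there = data.mask > 0.0 # include only unmasked pixels in the sum
    mu = mu[there]
    if not np.all(np.isfinite(mu)) or np.any(mu <= 0.0):
        return -np.inf # guard against numerically unreasonable parameter values
    # Poisson log-likelihood, dropping the parameter-independent ln(im!) term
    return np.sum(data.im[there] * np.log(mu) - mu)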
Similarly, the posterior:
In [ ]:
try:
    exec(open('Solution/posterior.py').read())
except IOError:
    REMOVE_THIS_LINE()
    def ln_posterior(params, data): # `data' of XrayData type
        pass # ...
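One common pattern, sketched here, is to evaluate the prior first, so that the (slow) likelihood is never computed for disallowed parameter values:
In [ ]:
def ln_posterior(params, data): # `data' of XrayData type
    # A sketch: log-posterior = log-prior + log-likelihood, up to a constant
    lnp = ln_prior(params)
    if not np.isfinite(lnp):
        return -np.inf # don't waste time on the likelihood outside the prior support
    return lnp + ln_likelihood(params, data)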
Test these functions out with some reasonable parameter values before moving on. Also, make sure that unreasonable values (in terms of numerical evaluation) will be caught and dealt with somehow in your implementations above.
In [ ]:
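For example, something along these lines (start is the ballpark guess from earlier; the particular "unreasonable" value is just an illustration):
In [ ]:
# a finite number is expected for a reasonable guess...
print(ln_posterior(start, data))
# ...and -inf for something unreasonable, e.g. a negative core radius
bad = list(start)
bad[3] = -1.0 # rc
print(ln_posterior(bad, data))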
The package authors advise us to start all walkers in a clump around a single starting point, so let's do that, using the "reasonable fit" parameters you found above. Maybe it would make sense to expand the clump if we're not so sure about how good the starting position is? (I haven't tried it.)
In [ ]:
nwalkers = 20 # an arbitrary number I chose
ndim = len(start) # dimension of the parameter space
# Starting points for all walkers, the 1% Gaussian offsets being another arbitrary choice
p0 = np.array([start*(1.0 + 0.01*np.random.randn(ndim)) for j in range(nwalkers)])
More setup. Set nthreads appropriately for your machine to take advantage of parallelism.
In [ ]:
nthreads = 1 # Set for your system
sampler = emcee.EnsembleSampler(nwalkers, ndim, ln_posterior, args=[data], threads=nthreads)
Run the sampler. Use a small Nsteps at first, just to get an idea of how long a real run will take. (On my laptop, it's ~1s per step, for my particular likelihood implementation and nwalkers=20.) Then do a longer run to see what happens as far as convergence. Eventually, run long enough to get real results.
Note that subsequent calls to the same sampler will continue an existing run, which is convenient. If you want to start over with a new starting point, for example, it makes sense to re-instantiate sampler with the cell above.
In [ ]:
Nsteps = 10 # Adjust this
%time sampler.run_mcmc(p0, Nsteps);
ens = sampler.chain
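For instance, to tack more steps onto an existing run, I believe the emcee (v2) API accepts None as the starting position, meaning "resume from wherever the last call finished":
In [ ]:
# continue the existing run for another Nsteps (emcee v2 behavior)
%time sampler.run_mcmc(None, Nsteps);
ens = sampler.chain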
Plot traces
In [ ]:
burn = 0 # initial inspection with burn=0, then make it larger
N2show = 8 # number of walkers to plot
param_names = [r'$x_0$', r'$y_0$', r'$\ln S_0$', r'$r_c$', r'$\beta$', r'$b$']
symbol = ""
if Nsteps < 100: symbol = 'o'
plt.rcParams['figure.figsize'] = (12.0, 20.0)
for i in range(ndim):
    plt.subplot(ndim, 1, i+1)
    for j in range(N2show):
        plt.plot(ens[j,burn:,i], symbol);
    plt.ylabel(param_names[i], fontsize=20);
Make a corner plot, because why not. (If your chain is extremely long, you might want to do some thinning.)
In [ ]:
corner.corner(ens[:,burn:,:].reshape((-1, ndim)), labels=param_names,
              color='blue', show_titles=True, title_fmt='.2g');
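If thinning is called for, a stride on the step axis does the job, e.g.:
In [ ]:
# same corner plot, but using only every 10th post-burn-in step
corner.corner(ens[:,burn::10,:].reshape((-1, ndim)), labels=param_names,
              color='blue', show_titles=True, title_fmt='.2g');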
This is the end of the tutorial! Hopefully you've got good results from a long, converged chain. If so, it would be good to compare them with what you get in the homework portion, e.g. with contours from both in the same corner plot.
You might, for example, want to save the flattened ensemble ("chain"), i.e. the sliced and reshaped ens that's passed to corner above, with np.savetxt or similar.
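For instance (the filename here is arbitrary):
In [ ]:
chain = ens[:,burn:,:].reshape((-1, ndim)) # shape: (samples, parameters)
np.savetxt('abell1835_tutorial_chain.txt', chain)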