REMINDER!!!

Before doing anything,

  1. Make a copy of this notebook in your clone of the private course repository. Edit that copy, not the version in the public repository clone.

  2. Remember that you will submit your solution by pull request to the private repository.

Once you've submitted your solution, don't forget to also fill out the very quick (and required!) weekly feedback form.

Week 2 Homework

Inference on a Grid

In the tutorial we learned about the properties of images taken with X-ray telescopes, and generated a (small!) mock X-ray image dataset. In this problem, you will infer the parameters of your model from your simulated data by computing their posterior PDF on a regular grid of parameter values.

1. Forced Photometry: Inferring the AGN Flux Given a Fixed PSF Model and Source Position

The simplest inference we can do is one where we assume that almost all our model parameters are fixed, and the only free parameter to infer is the flux. This measurement process is sometimes known as "forced photometry".

  • Draw the PGM for this situation, showing a separate node for each model parameter.
  • Write out the expression for the factorized joint PDF for all data and parameters.
  • What should we assume for the form of $P(N_k|\mu_k(S))$, the sampling distribution for the observed pixel values $N_k$ given predicted counts $\mu_k(S)$? This is also the likelihood for the AGN flux $S$.
  • What are you going to assign for $p(S)$, the prior PDF for $S$?

Set up a 1D array of 1000 values of $S$, spanning the range of your assumed prior PDF.

  • Compute the value of the log prior at each point on the grid.
  • Compute the value of the log likelihood at each point on the grid. How do you combine the contributions from all the pixels?
  • Combine them and re-normalize the result to give the posterior PDF for $S$ given all the data $N$, $p(S|N)$. (A sketch of one possible workflow appears after this list.)
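
For concreteness, here is a minimal sketch of that workflow. Purely for illustration it assumes a uniform prior on $S$ and a Poisson sampling distribution (your own answers to the questions above may differ), and S_min, S_max, N and mean_counts(S) are placeholders for your prior range, your array of observed pixel counts, and your model prediction function from the tutorial.

import numpy as np
from scipy.stats import poisson

S_grid = np.linspace(S_min, S_max, 1000)    # S_min, S_max: placeholders for your prior range

log_prior = np.zeros_like(S_grid)           # log of a constant (uniform) prior, up to an additive constant

# log likelihood: for each S, sum the per-pixel log probabilities over all pixels
log_like = np.array([poisson.logpmf(N, mean_counts(S)).sum() for S in S_grid])

# unnormalized log posterior; subtract the maximum before exponentiating, for numerical stability
log_post = log_prior + log_like
post = np.exp(log_post - log_post.max())

# re-normalize so the grid approximation to the integral of p(S|N) dS equals 1
post /= post.sum() * (S_grid[1] - S_grid[0])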

Plot the posterior PDF. What are the posterior mode and highest-posterior-density 68% credible interval of $S$ (subject to your grid spacing, of course)?
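
Continuing the sketch above (the details are again placeholders, not the required approach), the plot and the summary statistics can be read straight off the grid:

import numpy as np
import matplotlib.pyplot as plt

plt.plot(S_grid, post)
plt.xlabel('S')
plt.ylabel('p(S|N)')

dS = S_grid[1] - S_grid[0]

# posterior mode: the grid point with the highest posterior density
S_mode = S_grid[np.argmax(post)]

# 68% HPD interval: accumulate grid points from highest density downwards
# until they enclose 68% of the posterior probability
order = np.argsort(post)[::-1]
enclosed = np.cumsum(post[order]) * dS
in_hpd = order[:np.searchsorted(enclosed, 0.68) + 1]
S_lo, S_hi = S_grid[in_hpd].min(), S_grid[in_hpd].max()
print(f"S = {S_mode:.3f} (+{S_hi - S_mode:.3f} / -{S_mode - S_lo:.3f})")

For a unimodal posterior the highest-density grid points form a single contiguous interval, so its edges are just the minimum and maximum of the selected points.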

You'll probably want to re-use some code from the tutorial. You could do this by copying and pasting, or by saving the tutorial notebook as Python code (.py) and re-executing all of it here with a line like

exec(open('my_tutorial.py').read())

If you do this, then make sure to submit the additional code file as part of your solution.


In [ ]:

2. Inference in Multiple Dimensions

What if you now let the PSF width $\sigma$ be a free parameter, subject to the prior constraint that $\sigma = 1.0 \pm 0.3$ (Gaussian)? Repeat the inference, now on a 2D, $1000 \times 1000$ grid of $S$ and $\sigma$. Make a simple 2D contour or heatmap plot of the posterior PDF, then marginalize over $\sigma$ to get $p(S|N)$. Compute the posterior mode and credible interval of the marginalized distribution again. What do you notice about the width of the 1D posterior distribution compared to the version in part 1?

Note: the marginalization integral over $\sigma$ can be well approximated by a simple sum over the grid; the constant grid spacing cancels when you re-normalize.
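
As a rough sketch of what this might look like, carrying over the illustrative Poisson likelihood and uniform prior on $S$ from part 1, with mean_counts(S, sigma) again a placeholder for your model prediction:

import numpy as np
from scipy.stats import norm, poisson

sigma_grid = np.linspace(0.1, 1.9, 1000)    # any range wide enough to contain the Gaussian prior

log_post_2d = np.empty((len(S_grid), len(sigma_grid)))
for i, S in enumerate(S_grid):
    for j, sigma in enumerate(sigma_grid):
        log_post_2d[i, j] = (norm.logpdf(sigma, loc=1.0, scale=0.3)          # Gaussian prior on sigma
                             + poisson.logpmf(N, mean_counts(S, sigma)).sum())

post_2d = np.exp(log_post_2d - log_post_2d.max())

# marginalize over sigma: the integral becomes a sum along the sigma axis
post_S = post_2d.sum(axis=1)
post_S /= post_S.sum() * (S_grid[1] - S_grid[0])    # re-normalize as a PDF in S

The double loop evaluates the likelihood $10^6$ times, so expect it to be much slower than part 1; vectorizing over one of the axes helps.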


In [ ]:

Bonus: The Curse of Dimensionality

(To be clear, "bonus" problems don't count for anything grade-wise. They're just for fun.)

If you are feeling energetic, go ahead and let all of your model parameters be free, redo your inference on an N-dimensional grid, and again plot the 1D marginalized posterior PDF for $S$. What do you notice about the width of the distribution compared to the version in part 1? What do you notice about the time it takes to compute $p(S|N)$?

Note: putting %time at the start of a line will make Jupyter print the execution time of the code on that line.
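
For example, with log_posterior_grid() a purely hypothetical stand-in for whatever function fills your N-dimensional array of log posterior values (with $S$ along the first axis):

import numpy as np

# time how long it takes to fill the N-dimensional grid
%time log_post_nd = log_posterior_grid()

post_nd = np.exp(log_post_nd - log_post_nd.max())

# marginalize over every parameter except S by summing over all the other axes
post_S = post_nd.sum(axis=tuple(range(1, post_nd.ndim)))
post_S /= post_S.sum() * (S_grid[1] - S_grid[0])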


In [ ]: