In [ ]:
exec(open('../code/bayes_theorem.py').read()) # see code here for later demos
%matplotlib inline
You've just been introduced to probabilistic graphical models (PGMs) - a visual representation of how data are generated.
Recall Bayes' Theorem, which tells us how the prior, $P(\mathrm{params})$, is updated into the posterior by the data:

$P(\mathrm{params}|\mathrm{data}) = \frac{P(\mathrm{data}|\mathrm{params})~P(\mathrm{params})}{P(\mathrm{data})}$
Say we want to measure the flux of a galaxy. In a given integration time, $T$, the number of counts, $N$, that we collect in our fancy CCD will be Poisson distributed
$N|\mu \sim \mathrm{Poisson}(\mu)$
where $\mu=FAT$ is the average number of counts we would expect in time $T$, the product of the integration time, the source flux ($F$, counts per unit time and area), and the collecting area of our telescope ($A$).
Presumably we know $A$ and $T$ well, so for convenience we can make $\mu$ rather than $F$ the free parameter of our model.
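As a quick sketch (using an arbitrary, made-up value of $\mu$, and scipy.stats rather than the bayes_theorem.py helpers), the sampling distribution $P(N|\mu)$ looks like this:

In [ ]:
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

mu = 5.0                           # assumed expected number of counts, F*A*T
N = np.arange(0, 21)
plt.bar(N, st.poisson.pmf(N, mu))  # P(N|mu) for each possible count
plt.xlabel('N (counts)'); plt.ylabel(r'$P(N|\mu)$');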
We'll talk more about how to choose a prior in a few minutes. For now, we'll make a common choice, the uniform distribution (for $\mu\geq0$ in this case).
What about the evidence, $P(N)$?
Conjugate distributions are like eigenfunctions of Bayes' Theorem: special cases for which, given a particular sampling distribution, the posterior has the same functional form as the prior.
The Gamma distribution is the conjugate prior for the Poisson distribution:
$P(x) = \frac{1}{\Gamma(\alpha)}\beta^\alpha x^{\alpha-1} e^{-\beta x}$ for $x\geq0$
Our uniform prior is a limiting case of the Gamma distribution (with $\alpha=1$ and $\beta\rightarrow0$), so we can take advantage of this.
If we take the prior $\mu \sim \mathrm{Gamma}(\alpha_0,\beta_0)$, the posterior will be $\mu|N \sim \mathrm{Gamma}(\alpha_0+N,\beta_0+1)$.
In [ ]:
plt.rcParams['figure.figsize'] = (10.0,7.0)
# returns alpha,beta describing the posterior
bayesDemo(alpha0=1.0, beta0=0.001, N=5, showLikelihood=False)
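For reference, here is a minimal sketch of the same conjugate update written out with scipy.stats directly (the hyperparameter and data values are assumed to match the call above). Note that scipy parametrizes the Gamma distribution by a shape $\alpha$ and a scale $=1/\beta$:

In [ ]:
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

alpha0, beta0, N = 1.0, 0.001, 5
mu = np.linspace(0.0, 20.0, 500)

prior = st.gamma(alpha0, scale=1.0/beta0)             # Gamma(alpha0, beta0)
post = st.gamma(alpha0 + N, scale=1.0/(beta0 + 1.0))  # Gamma(alpha0+N, beta0+1)

plt.plot(mu, prior.pdf(mu), label='prior')
plt.plot(mu, post.pdf(mu), label='posterior')
plt.xlabel(r'$\mu$'); plt.ylabel('probability density'); plt.legend();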
Bayes' Theorem also tells us how to update our inference as new data arrive:

$P(\mathrm{params}|\mathrm{data}) = \frac{P(\mathrm{data}|\mathrm{params})~P(\mathrm{params})}{P(\mathrm{data})}$

Assuming independent measurements, $P(\theta|x_1,x_2) \propto P(x_2|\theta)~P(x_1|\theta)~P(\theta) = P(x_2|\theta)~P(\theta|x_1)$; that is, the posterior from the first measurement simply becomes the prior for the second.
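For instance, in the Poisson-Gamma example (with made-up counts $N_1$ and $N_2$ from two integrations of equal length), updating sequentially gives the same posterior as using both measurements at once:

In [ ]:
alpha0, beta0 = 1.0, 0.001
N1, N2 = 5, 8

# one measurement at a time: the posterior after N1 is the prior for N2
alpha1, beta1 = alpha0 + N1, beta0 + 1.0
alpha2, beta2 = alpha1 + N2, beta1 + 1.0

# both measurements at once
alpha_both, beta_both = alpha0 + N1 + N2, beta0 + 2.0

print((alpha2, beta2), (alpha_both, beta_both))  # identical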
X-ray imaging data for 361 galaxy clusters were analyzed, and 57 of them were found to be morphologically "relaxed" according to some metric (arxiv:1502.06020). We want to constrain the fraction of relaxed clusters in the Universe (the probability that a randomly chosen cluster is relaxed), $f$, assuming that the data set is representative.
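Here the sampling distribution is Binomial. As a quick sketch (again with scipy.stats; this is not the full exercise solution below), the likelihood of the observed data as a function of $f$ is:

In [ ]:
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

n_relaxed, n_total = 57, 361
f = np.linspace(0.0, 1.0, 500)

plt.plot(f, st.binom.pmf(n_relaxed, n_total, f))  # P(57 of 361 relaxed | f)
plt.xlabel('f (relaxed fraction)'); plt.ylabel('likelihood');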
The prior distribution encodes "what we know before doing the experiment at hand".
Common choices are
The last 2 are not seen as often in astrophysics, but are worth reading up on.
The posterior for model parameters may be "the answer" to an inference problem, but we often want a summary of the form $\theta = \theta_0 \pm \Delta\theta$.
There are several conventions in use for how to come up with an estimate and credible interval:
And some things you probably shouldn't do (valid only in certain limits)
Adapt the code in bayes_theorem.py and above for the "flux of a source" demo to produce an equivalent demo for the "relaxed galaxy cluster" example. It should take as input the prior hyperparameters and data (number of relaxed clusters and total number of clusters), produce plots of the prior and posterior (optionally also the likelihood), and return the parameters describing the posterior distribution.
In [ ]:
So far, we've used examples where the posterior is a standard distribution. In general this is not the case, and we resort to methods which produce samples from the posterior in order to estimate its density and/or integrate it (e.g. to marginalize some parameters). Sampling can be useful even in simple cases, especially for downstream propagation of uncertainties. This exercise will get you some practice working with scipy, and introduce the simplest possible sampling method.
For either the Poisson or Binomial examples above, make up some reasonable data and generate $10^4$ samples from the posterior distribution. Then do some post-processing:
Visualize the distributions for these derived parameters by plotting histograms of the samples.
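As a starting point, here is a minimal sketch for the Poisson case (with made-up data, and assuming the Gamma posterior derived above); the post-processing for derived parameters is then just arithmetic on the sample array:

In [ ]:
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

alpha0, beta0, N = 1.0, 0.001, 5
samples = st.gamma.rvs(alpha0 + N, scale=1.0/(beta0 + 1.0), size=10**4)  # posterior samples of mu

plt.hist(samples, bins=50, density=True);
plt.xlabel(r'$\mu$'); plt.ylabel('posterior density');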
In [ ]:
Write a function for determining the maximum-probability best fit and 1-dimensional confidence interval at a given level from an array of samples. (You can also write one that takes a posterior function as input instead of samples, but in the future we'll be using samples a lot.) Note that some kind of intelligent smoothing or kernel density estimation is generally needed to suppress numerical noise in histograms. Feel free to do the same for the percentile-based confidence intervals described above.
MegaBonus: Write a function for generating 2-dimensional confidence contours the same way. (Note that the corner package provides a nifty way to do this already.)
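One possible starting point (a sketch only, and just one of several conventions): the percentile-based interval is a one-liner with numpy, and scipy's gaussian_kde provides the smoothing needed to locate the maximum-probability point; turning this into a maximum-probability interval is left as the exercise intends.

In [ ]:
import numpy as np
from scipy.stats import gaussian_kde

def percentile_summary(samples, level=0.683):
    """Median and equal-tailed credible interval containing a fraction `level` of the samples."""
    lo, med, hi = np.percentile(samples, [50.0*(1.0-level), 50.0, 50.0*(1.0+level)])
    return med, med - lo, hi - med

def kde_mode(samples, grid_size=1000):
    """Approximate maximum-probability point from a Gaussian KDE of the samples."""
    kde = gaussian_kde(samples)
    grid = np.linspace(samples.min(), samples.max(), grid_size)
    return grid[np.argmax(kde(grid))]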
In [ ]: