This notebook is adapted from a lesson from the 2019 KIPAC/StatisticalMethods course, (c) 2019 Adam Mantz and Phil Marshall, licensed under the GPLv2.
You've already seen Bayes' Theorem and the roles of the prior, sampling, and posterior distributions.
Fully specifying these requires us to write down (or at least approximate) how the data set comes to exist, e.g.
A generative model is formally the joint distribution of all our data and model parameters,
$P(\mathrm{data},\mathrm{params}) = P(\mathrm{params}) P(\mathrm{data}|\mathrm{params})$
It encodes the modelling information needed for both inference and creating mock data.
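For instance, here is a minimal sketch of that factorization in action, using a made-up one-parameter model (a Poisson rate with a uniform prior; neither choice comes from the notes above). Sampling from the joint distribution is just sampling the prior, then the sampling distribution.
In [ ]:
import scipy.stats as st

# Draw from the joint distribution P(mu, data) = P(mu) P(data|mu) in two steps.
mu = st.uniform.rvs(loc=0.0, scale=10.0)   # params ~ P(params): the prior (illustrative choice)
data = st.poisson.rvs(mu=mu, size=100)     # data ~ P(data|params): the sampling distribution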
What are generative models useful for?
$P(\mathrm{data},\mathrm{params})$ is an abstract beast, but usually it factorizes in helpful ways.
This is where probabilistic graphical models (PGMs) come in. A PGM is a sketch that encodes the conditional dependences within a generative model.
PGMs are to inference what free-body diagrams are to kinematics. Everyone hates having to draw them, yet everyone makes fewer mistakes when they do.
Let's do a simple example, first specifying a problem in words and then building the PGM.
Here's an image (and a zoom-in):
Our measurement is the number of counts in each pixel. Here is a generative model in words:
Notice that the model was described in terms of conditional relationships.
For every pixel $k$:
NB: $\Leftarrow$ indicates a deterministic dependence, $\sim$ means "is distributed as".
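As a concrete, hypothetical translation of this structure into code (assuming, purely for illustration, that $F_k$ is computed deterministically from $\theta$ by some function `model_flux`, that $\mu_k = F_k T$, and that the counts are Poisson distributed), mock pixel counts could be generated like this:
In [ ]:
import numpy as np
import scipy.stats as st

def mock_image(theta, T, npix, model_flux):
    """Generate mock counts pixel by pixel: F_k <= theta; mu_k <= F_k, T; N_k ~ Poisson(mu_k).
    `model_flux` is a hypothetical function returning the flux in pixel k given theta."""
    F = np.array([model_flux(theta, k) for k in range(npix)])  # F_k <= theta (deterministic)
    mu = F * T                                                  # mu_k <= F_k, T (deterministic)
    N = st.poisson.rvs(mu=mu)                                   # N_k ~ Poisson(mu_k)
    return N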
The PGM shows most of the same information, visually:
Ingredients of a PGM (sometimes also called a directed acyclic graph):
Types of nodes:
Q: According to this PGM, how can we factorize $p(\theta,T,\{F_k, \mu_k, N_k\})$?
What does all this imply in the context of
The key here is that the PGM shows conditional dependences; therefore, it also shows (by omission) where parameters are conditionally independent.
That feature, plus the directional aspect, means that the PGM is a map to the most logical sequence of steps (lines in code) for either generating mock data or evaluating the posterior density of real data.
Q: How are these PGMs different, and what does the difference mean?
In this case, some PDFs are delta functions, so we can straightforwardly marginalize over such deterministic variables:
$p(\theta,\{N_k\}) = \int dF_k\,d\mu_k\,dT\; p(\theta)\,p(T) \prod_k P(N_k|\mu_k)\,p(\mu_k|F_k,T)\,p(F_k|\theta)$
$\quad = \underbrace{p(\theta)}_{\mathrm{prior}(\theta)} ~ \underbrace{\prod_k P\left(N_k|\mu_k(\theta,T)\right)}_{\mathrm{sampling~distribution~of~}\vec{N}}$
We could have written/drawn everything without explicitly mentioning $F_k$ (or even $\mu_k$). Like all simplifications, this is sometimes helpful and sometimes a pitfall.
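For instance, here is a sketch of how the factorized form above maps onto code that evaluates the (unnormalized) log posterior; the uniform prior and the hypothetical `model_flux` function are placeholders, not choices made in the notes:
In [ ]:
import numpy as np
import scipy.stats as st

def log_posterior(theta, N, T, model_flux):
    """log p(theta) + sum_k log P(N_k | mu_k(theta, T)), up to a constant."""
    lnprior = st.uniform.logpdf(theta, loc=0.0, scale=10.0)      # placeholder prior on theta
    if not np.isfinite(lnprior):
        return -np.inf
    F = np.array([model_flux(theta, k) for k in range(len(N))])  # F_k(theta), deterministic
    mu = F * T                                                    # mu_k(theta, T), deterministic
    return lnprior + st.poisson.logpmf(N, mu).sum()               # Poisson sampling distribution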
Note: the daft Python package can be useful for making PGMs programmatically, though it's no substitute for paper.
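For example, here is a rough (untested) daft sketch of the pixel-counts PGM above; the node positions, the plate geometry, and the choice to mark $T$ as fixed and $N_k$ as observed are layout assumptions, not part of the notes:
In [ ]:
import daft

pgm = daft.PGM([4.5, 2.5], origin=[0.0, 0.0])
pgm.add_node(daft.Node("theta", r"$\theta$", 0.5, 1.8))
pgm.add_node(daft.Node("T", r"$T$", 0.5, 0.8, fixed=True))
pgm.add_node(daft.Node("F", r"$F_k$", 1.8, 1.8))
pgm.add_node(daft.Node("mu", r"$\mu_k$", 2.7, 1.3))
pgm.add_node(daft.Node("N", r"$N_k$", 3.6, 1.3, observed=True))
pgm.add_plate(daft.Plate([1.3, 0.6, 2.8, 1.7], label=r"pixels $k$"))
pgm.add_edge("theta", "F")
pgm.add_edge("F", "mu")
pgm.add_edge("T", "mu")
pgm.add_edge("mu", "N")
pgm.render()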
Your data is a list of $\{x_k,y_k,\sigma_k\}$ triplets, where $\sigma_k$ is some estimate of the "error" on $y_k$. You think a linear model, $y(x)=a+bx$, might explain these data. To start exploring this idea, you decide to generate some simulated data, to compare with your real dataset.
In the absence of any better information, assume that $\vec{x}$ and $\vec{\sigma}$ are (somehow) known precisely, and that the "error" on $y_k$ is Gaussian (mean of $a+bx_k$ and standard deviation $\sigma_k$).
Draw the PGM, and write down the corresponding probability expressions, for this problem.
What (unspecified) assumptions, if any, would you have to make to actually generate data? Which assumptions do you think are unlikely to hold in practice? Choose one (or more) of these assumptions and work out how to generalize the PGM/generative model to avoid making it.
In [ ]:
'''
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt
plt.rcParams['xtick.labelsize'] = 'x-large'
plt.rcParams['ytick.labelsize'] = 'x-large'
%matplotlib inline
''';
In [ ]:
"""
# Choose some linear model parameters, somehow
a =
b =
# Choose some x and sigma values... somehow
n = 10 # Number of data points. Feel free to change.
x = np.array([
sigma = np.array([
# Work out the values for any intermediate nodes in your PGM
# generate the "observed" y values
y = st.norm.rvs(
""";
In [ ]:
"""
# plot x, y and sigma in the usual way
plt.rcParams['figure.figsize'] = (12.0, 5.0)
plt.errorbar(x, y, yerr=sigma, fmt='none');
plt.plot(x, y, 'bo');
plt.xlabel('x', fontsize=14);
plt.ylabel('y', fontsize=14);
""";
You've taken several images of a particular field, in order to record the transit of an exoplanet in front of a star (resulting in a temporary decrease in its brightness). Some kind of model, parametrized by $\theta$, describes the time series of the resulting flux. Before we get to measure a number of counts, however, each image is affected by time-specific variables, e.g. related to changing weather. To account for these, you've also measured a second star in the same field in every exposure. The assumption is that the average intrinsic flux of this second star is constant in time, so that it can be used to correct for photometric variations, putting the multiple measurements of the target star on the same scale.
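One possible reading of this setup as a generative model in code (this is not the PGM solution; the function name `transit_flux`, the per-exposure transmission factors $c_i$, their uniform distribution, and the Poisson noise model are all assumptions made for illustration):
In [ ]:
import numpy as np
import scipy.stats as st

def mock_transit_counts(theta, times, transit_flux, F_ref=1000.0):
    """For each exposure i, an unknown transmission factor c_i scales both stars;
    the observed counts are Poisson draws around the scaled fluxes."""
    c = st.uniform.rvs(loc=0.5, scale=0.5, size=len(times))     # c_i: per-exposure transmission (assumed)
    F_targ = np.array([transit_flux(theta, t) for t in times])  # intrinsic target flux vs. time, given theta
    N_targ = st.poisson.rvs(mu=c * F_targ)                      # counts from the target star
    N_ref = st.poisson.rvs(mu=c * F_ref)                        # counts from the constant reference star
    return N_targ, N_ref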
Draw a PGM and write down the corresponding probability expressions for this problem.
Thanks to Anja von der Linden for inspiring (and then correcting) the above problem.
Note: Sketchy solutions for the PGM-drawing exercises can be found with the corresponding material from DSFP Session 4.