Including nuisance parameters in a model provides a way to explicitly account for and marginalize over systematic uncertainties.
This can be as simple as assigning a prior distribution to some quantity that would otherwise remain fixed, or it can involve a more explicit expansion of the model.
Often nuisance parameters represent latent variables - things that are logically or physically part of a model, but which cannot be directly measured.
Recall, from the Bayes Theorem chunk, our simple measurement of a galaxy's flux based on a number of detected counts:
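In sketch form, with any conversion from flux to expected counts absorbed into a known factor (written here as $C$ purely for illustration), that model might be summarized as

$F \sim$ some prior

$\mu = C\,F$

$N \sim \mathrm{Poisson}(\mu)$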
Now let's expand the model to account for the fact that we actually measure counts from both the galaxy and the background. Being good astronomers, we also remembered to take a second measurement of a source-free (background-only) patch of sky.
Subscripts: $b$ for background, $g$ for galaxy, $s$ for the science observation that includes both.
The background quantities could be regarded as nuisance parameters here - we need to account for our uncertainty in $F_b$, but measuring it isn't the point of the analysis.
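In the same sketch notation, and again writing the assumed-known flux-to-counts conversions for the two exposures as $C_s$ and $C_b$ (illustrative names), the expanded model might read

$F_g \sim$ some prior, $\quad F_b \sim$ some prior

$\mu_s = C_s\,(F_g + F_b)$, $\quad N_s \sim \mathrm{Poisson}(\mu_s)$

$\mu_b = C_b\,F_b$, $\quad N_b \sim \mathrm{Poisson}(\mu_b)$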
Often, especially in physics, the model for a data set naturally takes a hierarchical structure.
In statistics, this is related to the concept of exchangeability - as far as we know, individual sources of a given class are equivalent (until we measure them).
In practice, the hierarchy usually takes the form of common priors for the individual measured sources.
Let's modify the previous example as follows:
$n(x) = \phi^\ast x^\alpha e^{-x}; \quad x=\frac{L}{L^\ast}$
Here $n$ is the number density of galaxies as a function of luminosity (this is the Schechter function).
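As a concrete (Python) illustration of how this distribution could generate a galaxy population, note that $x^\alpha e^{-x}$ is a Gamma density with shape $\alpha+1$, which is normalizable only for $\alpha > -1$; the parameter values below are made up:

```python
import numpy as np

# Draw galaxy luminosities from the distribution n(x) ~ x**alpha * exp(-x),
# with x = L/Lstar. For alpha > -1 this is a Gamma distribution with shape
# alpha + 1; for alpha <= -1 (common in real luminosity functions) a faint-end
# cutoff in L would be needed to make the distribution proper.
alpha = -0.5          # illustrative value, not from the text
Lstar = 1.0e10        # illustrative characteristic luminosity (arbitrary units)
rng = np.random.default_rng(17)

x = rng.gamma(shape=alpha + 1.0, scale=1.0, size=1000)
L = Lstar * x         # mock sample of galaxy luminosities
```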
Now - what does the PGM for our experiment look like?
The exercises below will get you working with PGMs and model expressions (the $A\sim B$ type relationships) for a few scenarios that illustrate advanced (but common) features of data analysis.
When drawing a PGM, do note which parameters would need to have a prior defined, but don't worry about explicitly including prior parameters that are not part of the described model.
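If you prefer to typeset PGMs rather than sketch them by hand, the daft Python package can be used. Below is a minimal sketch of the galaxy-plus-background model from above; node names and positions are arbitrary choices, and the constructor details may differ between daft versions:

```python
import daft
import matplotlib.pyplot as plt

# Toy PGM for the galaxy + background counts model: shaded nodes are observed.
pgm = daft.PGM()
pgm.add_node("Fg", r"$F_g$", 1, 2)                 # latent galaxy flux
pgm.add_node("Fb", r"$F_b$", 2, 2)                 # latent background flux
pgm.add_node("Ns", r"$N_s$", 1, 1, observed=True)  # science counts
pgm.add_node("Nb", r"$N_b$", 2, 1, observed=True)  # background counts
pgm.add_edge("Fg", "Ns")
pgm.add_edge("Fb", "Ns")
pgm.add_edge("Fb", "Nb")
pgm.render()
plt.show()
```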
Produce the PGM and model expressions for the following scenario:
Type Ia supernova luminosities are "standardizable". This means that, using observable properties of the lightcurve (color $c$ and "stretch" $s$), SNIa luminosities can be corrected to a scale where they are almost equal. Almost, but not quite - there is still a residual intrinsic scatter, even after this correction.
The problem, described in a non-generative way, is as follows. We measure a number of counts for each of many SNIa, and can use the known telescope properties to convert that to a flux. The flux is related to the SNIa's intrinsic luminosity through a distance that is a function of its redshift, $z$. The color and stretch of the SNIa are also measured; these relate the actual luminosity to the "standard" SNIa luminosity. However, there is a scatter, which we assume to be Gaussian, between the actual luminosities of SNIa and this standard luminosity. The parameters that we ultimately care about are the standard luminosity, $\bar{L}$, and the intrinsic scatter, $\sigma$.
In practice, a common use of SNIa involves adding free cosmological parameters in the distance-redshift relation, but we can leave that out here.
It's pretty common for multiple quantities that can be derived from the same observation to have correlated measurement errors. X-ray spectroscopy of galaxy clusters provides a good example. From this type of data, we can measure gas temperatures and estimate the total gravitating mass (assuming the gas is in equilibrium with the gravitational potential). However, temperature is one of the observables needed in the hydrostatic equation to compute the mass (the other being gas density), so any uncertainty in the temperature measurement propagates to the mass.
Naturally, determining the gas temperature as a function of total mass for galaxy clusters is interesting. A standard model for this is a power law, $T\propto M^\beta$, with log-normal intrinsic scatter, $\tau$ (see the previous exercise). Equivalently, and slightly more simply, we could write $t=\alpha+\beta\,m$, with Gaussian scatter $\tau$, for $t=\log(T)$ and $m=\log(M)$.
The new wrinkle is that we have correlated measurement errors on $t$ and $m$. We'll follow the common practice of assuming that the measurement errors are Gaussian and that we can use a fixed estimate of them - but instead of simply having some $\sigma_t$ and $\sigma_m$ in the problem, we actually have an estimated $2\times2$ covariance matrix, $\Sigma$.
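To make this concrete, here is a short simulation sketch (all numbers are made up) of how correlated Gaussian measurement errors on $t$ and $m$ with a shared covariance matrix could arise:

```python
import numpy as np

# Joint Gaussian measurement errors on (t, m) = (log T, log M); the
# off-diagonal element of Sigma encodes the correlation between them.
t_true, m_true = 0.8, 14.5           # assumed true values (illustrative)
Sigma = np.array([[0.010, 0.012],
                  [0.012, 0.020]])   # assumed measurement covariance
rng = np.random.default_rng(3)

t_obs, m_obs = rng.multivariate_normal([t_true, m_true], Sigma)
```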
Draw the PGM and write down the model for the above scenario.
Future time-domain photometric surveys will discover too many supernovae to follow up and type spectroscopically. This means that if we want to do our SNIa experiment from the "intrinsic scatter" example, our data set will be contaminated by other supernova types.
Here we simplify things a little by eliminating the standardization process, and just assume that each of the $m$ SN types has luminosities that follow its own distribution. We also introduce group membership as a latent parameter.
Based on the model written out below, draw a PGM and come up with a more complete narration of what's going on than is provided above.
Some priors on $q_i$, $\bar{L}_i$, $\sigma_i$, for $i=1,\ldots,m$.
For each SN observed,
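(The per-SN expressions below are a plausible sketch rather than a unique prescription: they assume Gaussian per-type luminosity distributions and use $g_j$, an illustrative symbol, for the latent type of the $j$th SN.)

$g_j \sim \mathrm{Categorical}(q_1,\ldots,q_m)$, i.e. $P(g_j = i) = q_i$

$L_j \sim \mathrm{Normal}(\bar{L}_{g_j}, \sigma_{g_j})$

with $L_j$ then connected to the observed data (fluxes or counts) as in the previous exercises.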
An extremely common task is fitting a linear model to data with measurement uncertainties in both the $x$ and $y$ values, as well as intrinsic scatter in the underlying relation. Non-Bayesian approaches were developed in the 1990s, but have serious flaws (e.g., they cannot fit for the scatter), whereas this problem is neatly addressed by hierarchical modeling (see the exercises).
For the also-common assumptions of Gaussian uncertainties and scatter, generic codes that fit these models using Gibbs sampling (see Efficient Monte Carlo) have been written.
Code | Language | multiple $x$'s | multiple $y$'s
---|---|---|---
linmix_err | IDL | no | no
mlinmix_err | IDL | yes | no
linmix | Python | no | no
lrgs | R | yes | yes
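For orientation, and independent of any particular package's interface, here is a sketch (with made-up numbers) of the generative model these codes address, namely a latent linear relation with intrinsic scatter observed through Gaussian errors in both variables:

```python
import numpy as np

# Forward-simulate the generic linear-regression-with-scatter model:
# a latent straight line plus Gaussian intrinsic scatter, observed with
# Gaussian measurement errors in both x and y. All numbers are illustrative.
rng = np.random.default_rng(8)
n = 50
alpha, beta, scatter = 1.0, 0.5, 0.2             # intercept, slope, intrinsic scatter

x_true = rng.normal(0.0, 1.0, size=n)            # latent x values
y_true = alpha + beta * x_true + rng.normal(0.0, scatter, size=n)

x_err, y_err = 0.1, 0.15                         # assumed measurement uncertainties
x_obs = x_true + rng.normal(0.0, x_err, size=n)  # what we actually measure
y_obs = y_true + rng.normal(0.0, y_err, size=n)
```

A fit with one of the codes above would then aim to recover the slope, intercept, and intrinsic scatter while marginalizing over the latent true values.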