A Short Note About Detection

  • Source detection is different in character from parameter estimation: we are less interested in the flux of the source than we are in its existence.
  • Detection is a model comparison problem:

    • $H_0$: there is no source present

    • $H_1$: there is a source present, with flux $f$ and position $(x,y)$

  • One way to quantify the significance of the source's detection is to calculate and compare the evidences for the two models, ${\rm Pr}(d\,|\,H_0)$ and ${\rm Pr}(d\,|\,H_1)$ (the first sketch after this list computes these numerically for a toy problem).
  • Notice that these will involve marginalizing over the prior PDFs of the model parameters that you assigned in writing down $H_1$: the evidence is ${\rm Pr}(d\,|\,H_1) = \int {\rm Pr}(d\,|\,f,x,y,H_1)\,{\rm Pr}(f,x,y\,|\,H_1)\,df\,dx\,dy$, so the probability of getting the data given model $H_1$ depends on your prior uncertainty about its parameters (because that uncertainty is part of $H_1$).
  • What does this mean? Increasing the prior ranges on the source position and flux makes any given point in parameter space less probable a priori, so the detected model looks ever more contrived; this decreases the evidence for $H_1$ and makes the detection less significant (but only linearly in the prior range, remember).
  • At the other extreme, the maximum value that the evidence for $H_1$ can take occurs when the prior PDF is a delta function at the maximum likelihood point; there, the evidence ratio equals the likelihood ratio used in classical statistics.
  • The frequentist procedure is to approximate the (maximum) likelihood ratio test statistic as being drawn from a $\chi^2$ distribution with 3 degrees of freedom (the difference in the number of free parameters between the two models), and ask for the probability of getting a likelihood ratio larger than the observed value by chance, if in fact no source is present (that is, if $H_0$ is true); the second sketch after this list carries this calculation out. This calculation involves the same notion of a distribution of replica datasets that we saw in the posterior predictive model checks, but in frequentism only the sampling distribution is used (not the posterior PDF), because the only question being asked is "What's the probability of getting the data if the truth is exactly X?"
  • The Bayesian evidence ratio is more conservative, because it takes into account the uncertainties on the source parameters, both those present before the data were taken and those remaining after the data have been included. However, it is expensive to compute, and it depends on the prior assignment made; we will see later that the Fermi group opted for the classical route because of this second factor.
  • In the limit of high signal to noise, the conclusions about detection significance reached by analysts in the two camps should agree: in both approaches the result is dominated by the narrow sampling distribution, which will overwhelm almost any prior assigned.
  • Project Pitch: explore the relationship between Bayesian model selection and classical hypothesis testing in more detail, by setting up some simple toy problems and computing both the evidences and the frequentist $p$-values.
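
As a starting point for that project, here is a minimal numerical sketch of the Bayesian calculation, on a hypothetical 1D toy problem (one spatial coordinate standing in for $(x,y)$; the names, priors, and parameter values below, such as `model`, `log_like`, and `f_max`, are illustrative assumptions rather than anything fixed by the note above). It computes the evidence for $H_0$ directly, marginalizes the likelihood over uniform priors on flux and position to obtain the evidence for $H_1$, and then shows the evidence dropping as the flux prior is widened:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import trapezoid

rng = np.random.default_rng(42)

# Toy 1D "image": one point source with a Gaussian PSF, plus Gaussian pixel noise.
npix, sigma_noise, psf_width = 50, 1.0, 2.0
pix = np.arange(npix)

def model(f, x0):
    """Predicted counts for a source of flux f centred at pixel x0."""
    return f * np.exp(-0.5 * ((pix - x0) / psf_width) ** 2)

f_true, x_true = 5.0, 23.0
data = model(f_true, x_true) + rng.normal(0.0, sigma_noise, size=npix)

def log_like(f, x0):
    """Gaussian log-likelihood, log Pr(d | f, x0, H1)."""
    return norm.logpdf(data, loc=model(f, x0), scale=sigma_noise).sum()

# Evidence for H0 (no source): there are no free parameters to marginalize
# over, so Pr(d | H0) is just the likelihood evaluated at zero flux.
logZ0 = log_like(0.0, 0.0)

def log_evidence_H1(f_max, x_min=0.0, x_max=float(npix), ngrid=200):
    """log Pr(d | H1): the likelihood marginalized over uniform priors
    f ~ U(0, f_max) and x0 ~ U(x_min, x_max), by brute-force quadrature."""
    fgrid = np.linspace(0.0, f_max, ngrid)
    xgrid = np.linspace(x_min, x_max, ngrid)
    loglike = np.array([[log_like(f, x) for x in xgrid] for f in fgrid])
    shift = loglike.max()  # factor out the peak for numerical stability
    integral = trapezoid(trapezoid(np.exp(loglike - shift), xgrid, axis=1), fgrid)
    return shift + np.log(integral) - np.log(f_max * (x_max - x_min))  # uniform prior density

logZ1 = log_evidence_H1(f_max=10.0)
print(f"log evidence ratio, log[Pr(d|H1)/Pr(d|H0)] = {logZ1 - logZ0:.2f}")

# Widening the flux prior dilutes the prior density, and hence the evidence,
# roughly linearly once f_max is well beyond any plausible flux:
for f_max in (20.0, 40.0):
    print(f"f_max = {f_max:5.1f}:  log(Z1/Z0) = {log_evidence_H1(f_max) - logZ0:.2f}")
```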
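
And the frequentist counterpart, continuing in the same notebook session (it reuses `np`, `log_like`, `logZ0`, and `npix` from the cell above): maximize the likelihood over the source parameters, form the likelihood ratio test statistic, and calibrate it against a $\chi^2$ distribution. One hedge: this 1D toy has only 2 extra parameters, so 2 degrees of freedom stand in for the 3 quoted above for the 2D case.

```python
from scipy.optimize import minimize
from scipy.stats import chi2

# Maximum likelihood under H1: a coarse grid search, refined with an optimizer.
fgrid, xgrid = np.linspace(0.0, 10.0, 60), np.linspace(0.0, npix, 120)
coarse = np.array([[log_like(f, x) for x in xgrid] for f in fgrid])
i, j = np.unravel_index(coarse.argmax(), coarse.shape)
opt = minimize(lambda p: -log_like(p[0], p[1]), x0=[fgrid[i], xgrid[j]],
               method="Nelder-Mead")
logLmax = -opt.fun

# Test statistic: twice the log maximum-likelihood ratio. Under H0 it is
# asymptotically chi^2 distributed (Wilks' theorem), with degrees of freedom
# equal to the number of extra parameters: 2 here (f, x0), 3 for (f, x, y).
# (The f >= 0 boundary makes the approximation imperfect, but this is a sketch.)
T = 2.0 * (logLmax - logZ0)  # logZ0 is exactly log L(H0): H0 has no parameters
p_value = chi2.sf(T, df=2)
print(f"T = {T:.1f},  p-value = {p_value:.3e}")
```

Comparing the two outputs as you vary `f_max` makes the point above concrete: the $p$-value is indifferent to the prior, while the evidence ratio is not.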
