$R = \frac{{\rm Pr}(d\,|\,H_1)}{{\rm Pr}(d\,|\,H_0)}$
The evidence captures both a model's accuracy and its efficiency, because it summarizes all the information we put into our inference: both the data and our prior beliefs.
You can see this by inspecting the integral for the evidence, also known as the fully marginalized likelihood (FML):
${\rm Pr}(d\,|\,H) = \int\;{\rm Pr}(d\,|\,\theta,H)\;{\rm Pr}(\theta\,|\,H)\;d\theta$
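As a concrete illustration, here is a minimal simple Monte Carlo sketch of this integral, assuming a made-up toy problem: data drawn from a Gaussian of unknown mean $\theta$ and known width $\sigma$, with a uniform prior on $\theta$. Everything in this cell (the model, the numbers, the names) is hypothetical and just for illustration; averaging the likelihood over draws from the prior gives an estimate of ${\rm Pr}(d\,|\,H)$.
In [ ]:
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical toy setup (not part of the notes above): N data points
# drawn from a Gaussian with unknown mean theta and known width sigma,
# with a uniform prior on theta over an interval of width w.
sigma, N, w = 1.0, 20, 10.0
d = rng.normal(0.5, sigma, size=N)   # simulated dataset

def log_like(theta):
    # log Pr(d | theta, H) for an array of theta values, including the
    # Gaussian normalization so the evidence is a proper density.
    return (-0.5 * ((d[:, None] - theta[None, :]) / sigma)**2
            - np.log(sigma * np.sqrt(2.0 * np.pi))).sum(axis=0)

# Simple Monte Carlo estimate of the FML: draw theta from the prior and
# average the likelihood, Pr(d|H) ~ (1/M) sum_i Pr(d | theta_i, H).
M = 100_000
theta = rng.uniform(-w / 2, w / 2, size=M)
logL = log_like(theta)
log_evidence = np.logaddexp.reduce(logL) - np.log(M)
print(f"log Pr(d|H) ~ {log_evidence:.2f}")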
In [1]:
from IPython.display import Image
Image('evidence.png')
Out[1]: [image output: evidence.png]
The illustration above shows us a few things:
1) The evidence can be made arbitrarily small by increasing the prior volume: the evidence is more conservative than focusing on the goodness of fit ($L_{\rm max}$) alone. Of course, if you assign a prior you don't believe, then you should not expect to get a meaningful answer for ${\rm Pr}(d\,|\,H)$.
2) The evidence is linearly sensitive to prior volume ($f$), but exponentially sensitive to goodness of fit ($L_{\rm max}$). It's still a likelihood, after all.
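To see both of these points numerically, we can reuse the toy setup from the sketch above and vary the prior width $w$: the maximum likelihood found is essentially unchanged, while the evidence falls roughly in proportion to the prior volume (again, all numbers here are made up).
In [ ]:
# Reusing rng, d, sigma, log_like and M from the toy cell above.
# Doubling the prior width roughly halves the evidence once the prior
# comfortably contains the likelihood peak, while L_max stays put:
# linear sensitivity to prior volume, exponential sensitivity to fit.
for w in [2.0, 4.0, 8.0, 16.0]:
    theta = rng.uniform(-w / 2, w / 2, size=M)
    logL = log_like(theta)
    log_Z = np.logaddexp.reduce(logL) - np.log(M)
    print(f"w = {w:4.1f}:  log Pr(d|H) ~ {log_Z:7.2f},  log L_max ~ {logL.max():7.2f}")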
The evidence ratio can, in principle, be combined with the ratio of priors for each model to give us the relative probability for each model being true, given the data:
$\frac{{\rm Pr}(H_1|d)}{{\rm Pr}(H_0|d)} = \frac{{\rm Pr}(d|H_1)}{{\rm Pr}(d|H_0)} \; \frac{{\rm Pr}(H_1)}{{\rm Pr}(H_0)}$
Prior probabilities for models are very difficult to assign in most practical problems (notice that no theorist ever provides them). So, one way to interpret the evidence ratio is to note that:
If you think that, having seen the data, the two models are still equally probable,
then the evidence ratio in favor of $H_1$ is the odds that you would have had to be willing to take against $H_1$ before seeing the data.
That is: the evidence ratio updates the prior ratio into a posterior one, as usual.
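As a quick worked example (with made-up numbers): if the evidence ratio comes out as $R = 7$ and we assigned equal prior probabilities to the two models, then the posterior odds in favor of $H_1$ are 7 to 1, i.e. ${\rm Pr}(H_1\,|\,d) = 7/8 \approx 0.88$.
In [ ]:
# Purely illustrative, made-up numbers: update the prior odds on the
# two models with the evidence ratio R = Pr(d|H1) / Pr(d|H0).
R = 7.0            # hypothetical evidence ratio in favor of H1
prior_odds = 1.0   # Pr(H1) / Pr(H0): models equally probable a priori
posterior_odds = R * prior_odds
prob_H1 = posterior_odds / (1.0 + posterior_odds)
print(f"Posterior odds on H1: {posterior_odds:.1f}, i.e. Pr(H1|d) = {prob_H1:.3f}")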