One important aspect of Bayesian inference has not yet been discussed in this tutorial: prior distributions. In Bayesian statistics, one has to assign probability (density) values to every possible parameter value before taking the data at hand into account. The prior distribution thus reflects all knowledge about the system under investigation that is available before the data are considered. If no prior knowledge is available, a non-informative prior in the form of the so-called Jeffreys prior allows one to minimize the effect of the prior on the results. The next two sub-sections discuss how to set custom prior distributions for the parameters of the observation model, and for hyper-parameters in a hyper-study or change-point study.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt # plotting
import seaborn as sns # nicer plots
sns.set_style('whitegrid') # plot styling
import numpy as np
import bayesloop as bl
# prepare study for coal mining data
S = bl.Study()
S.loadExampleData()
bayesloop employs a forward-backward algorithm that is based on Hidden Markov models. This inference algorithm iteratively produces a parameter distribution for each time step, but it has to start these iterations from a specified probability distribution: the parameter prior. All built-in observation models already have a predefined prior, stored in the attribute `prior`. Here, the prior distribution is stored as a Python function that takes as many arguments as there are parameters in the observation model. The prior distributions can be looked up directly within `observationModels.py`. For the `Poisson` model discussed in this tutorial, the default prior distribution is defined in a method called `jeffreys` as

def jeffreys(x):
    return np.sqrt(1. / x)
corresponding to the non-informative Jeffreys prior, $p(\lambda) \propto 1/\sqrt{\lambda}$. This type of prior can also be determined automatically for arbitrary user-defined observation models, see here.
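For reference, this functional form follows directly from the general definition of the Jeffreys prior, $p(\theta) \propto \sqrt{I(\theta)}$, where $I(\theta)$ denotes the Fisher information of the likelihood: for the Poisson distribution, $I(\lambda) = 1/\lambda$, and therefore $p(\lambda) \propto 1/\sqrt{\lambda}$.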
To change the predefined prior of a given observation model, one can add the keyword argument `prior` when defining the observation model. There are different ways of defining a parameter prior in bayesloop:

- If `prior=None` is set, bayesloop assigns equal probability to all parameter values, resulting in a uniform prior distribution within the specified parameter boundaries.
- One can directly supply a NumPy array with prior probability (density) values. The shape of the array must match the shape of the parameter grid!
- One can provide a function that takes exactly as many arguments as there are parameters in the defined observation model. bayesloop then evaluates the function for all parameter values and assigns the corresponding probability values.
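As a quick sketch of the array variant (the weights below are hypothetical and only illustrate the required shape), one could put extra prior weight on small accident rates:

# hypothetical array-based prior: one value per point of the 1000-point parameter grid
custom_prior = np.ones(1000)    # start from a flat prior...
custom_prior[:500] *= 2.        # ...and double the weight of rates below 3 (first half of the grid)
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000), prior=custom_prior))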
Next, we illustrate the difference between the Jeffreys prior and a flat, uniform prior with a very simple inference example: we fit the coal mining example data set using the `Poisson` observation model and further assume the rate parameter to be static:
In [2]:
# we assume a static rate parameter for simplicity
S.set(bl.tm.Static())
print('Fit with built-in Jeffreys prior:')
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000)))
S.fit()
jeffreys_mean = S.getParameterMeanValues('accident_rate')[0]
print('-----\n')
print('Fit with custom flat prior:')
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000),
                    prior=lambda x: 1.))
# alternatives: prior=None, prior=np.ones(1000)
S.fit()
flat_mean = S.getParameterMeanValues('accident_rate')[0]
First, note that the model evidence indeed changes slightly due to the different choices of parameter prior. Second, one may notice that the posterior mean value of the flat-prior fit does not exactly match the arithmetic mean of the data. This small deviation shows that a flat/uniform prior is not completely non-informative for a Poisson model! The fit using the Jeffreys prior, in contrast, reproduces the frequentist estimate, i.e. the arithmetic mean:
In [3]:
print('arithmetic mean = {}'.format(np.mean(S.rawData)))
print('flat-prior mean = {}'.format(flat_mean))
print('Jeffreys prior mean = {}'.format(jeffreys_mean))
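The model evidence values referred to above are part of the fit output; they can also be read off programmatically. A minimal sketch, assuming the `logEvidence` attribute that a study object carries after a fit:

print('log-evidence = {}'.format(S.logEvidence))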
The second option is based on the SymPy module that introduces symbolic mathematics to Python. Its sub-module `sympy.stats` covers a wide range of discrete and continuous random variables. The keyword argument `prior` also accepts a list of `sympy.stats` random variables, one for each parameter (if there is only one parameter, the list can be omitted). The multiplicative joint probability density of these random variables is then used as the prior distribution. The following example defines an exponential prior for the `Poisson` model, favoring small values of the rate parameter:
In [4]:
import sympy.stats
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000),
                    prior=sympy.stats.Exponential('expon', 1)))
S.fit()
Note that one needs to assign a name to each `sympy.stats` variable. In this case, the output of bayesloop shows the mathematical formula that defines the prior. This is possible because of the symbolic representation of the prior by SymPy.
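The symbolic density behind such a prior can also be inspected directly with SymPy's own tools; a minimal sketch, using the exponential variable from above:

import sympy
import sympy.stats

lam = sympy.Symbol('lambda', positive=True)
expon = sympy.stats.Exponential('expon', 1)
print(sympy.stats.density(expon)(lam))   # exp(-lambda), the density used as the prior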
As shown before, hyper-studies and change-point studies can be used to determine the full distribution of hyper-parameters (the parameters of the transition model). Just as for the time-varying parameters of the observation model, one may have prior knowledge about the values of certain hyper-parameters that can be included into the study to refine the resulting hyper-parameter distribution. Hyper-parameter priors can be defined just like regular priors, either by an arbitrary function or by a list of `sympy.stats` random variables.
In a first example, we return to the simple change-point model of the coal mining data set and perform two fits of the change-point: first, we specify no hyper-prior for the time step of the change-point, assuming equal probability for each year in the data set. Second, we define a Normal distribution centered on the year 1920 with a (rather unrealistically narrow) standard deviation of 5 years as the hyper-prior, using a SymPy random variable. For both fits, we plot the change-point distribution to show the differences induced by the different priors:
In [5]:
print('Fit with flat hyper-prior:')
S = bl.ChangepointStudy()
S.loadExampleData()
L = bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000))
T = bl.tm.ChangePoint('tChange', 'all')
S.set(L, T)
S.fit()
plt.figure(figsize=(8,4))
S.plot('tChange', facecolor='g', alpha=0.7)
plt.xlim([1870, 1930])
plt.show()
print('-----\n')
print('Fit with custom normal hyper-prior:')
T = bl.tm.ChangePoint('tChange', 'all', prior=sympy.stats.Normal('norm', 1920, 5))
S.set(T)
S.fit()
plt.figure(figsize=(8,4))
S.plot('tChange', facecolor='g', alpha=0.7)
plt.xlim([1870, 1930]);
Since we used a quite narrow prior (containing a lot of information) in the second case, the resulting distribution is strongly shifted towards the prior. The following example revisits the two-break-point model from here, in which a linear decrease with varying slope serves as a hyper-parameter. This time, we define a Gaussian prior for the slope hyper-parameter, centered on the value -0.2 with a standard deviation of 0.4, via a lambda function. For simplicity, we set the break-points to fixed years.
In [6]:
S = bl.HyperStudy()
S.loadExampleData()
L = bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000))
T = bl.tm.SerialTransitionModel(bl.tm.Static(),
                                bl.tm.BreakPoint('t_1', 1880),
                                bl.tm.Deterministic(lambda t, slope=np.linspace(-2.0, 0.0, 30): t*slope,
                                                    target='accident_rate',
                                                    # Gaussian density with mean -0.2 and standard deviation 0.4
                                                    prior=lambda slope: np.exp(-0.5*((slope + 0.2)/0.4)**2) / (0.4*np.sqrt(2*np.pi))),
                                bl.tm.BreakPoint('t_2', 1900),
                                bl.tm.Static()
                                )
S.set(L, T)
S.fit()
Finally, note that you can mix SymPy- and function-based hyper-priors for nested transition models.
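A sketch of such a mixed setup is given below. It reuses the serial model from above and is hypothetical in one respect: it assumes that `BreakPoint`, analogous to `ChangePoint` earlier, accepts a set of candidate years as a hyper-parameter grid together with a prior.

# sketch: function-based hyper-prior for the slope, SymPy-based hyper-prior for 't_2'
T = bl.tm.SerialTransitionModel(bl.tm.Static(),
                                bl.tm.BreakPoint('t_1', 1880),
                                bl.tm.Deterministic(lambda t, slope=np.linspace(-2.0, 0.0, 30): t*slope,
                                                    target='accident_rate',
                                                    prior=lambda slope: np.exp(-0.5*((slope + 0.2)/0.4)**2)),
                                bl.tm.BreakPoint('t_2', np.arange(1890, 1910), prior=sympy.stats.Normal('norm', 1900, 5)),
                                bl.tm.Static()
                                )
S.set(L, T)
S.fit()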