One important aspect of Bayesian inference has not yet been discussed in this tutorial: prior distributions. In Bayesian statistics, one has to assign probability (density) values to every possible parameter value before taking the data at hand into account. The prior distribution thus reflects all knowledge about the system under investigation that is available before the data are considered. If no prior knowledge is available, a non-informative prior in the form of the so-called Jeffreys prior allows one to minimize the effect of the prior on the results. The next two sub-sections discuss how to set custom prior distributions for the parameters of the observation model, and for hyper-parameters in a hyper-study or change-point study.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt # plotting
import seaborn as sns # nicer plots
sns.set_style('whitegrid') # plot styling
import numpy as np
import bayesloop as bl
# prepare study for coal mining data
S = bl.Study()
S.loadExampleData()
bayesloop employs a forward-backward algorithm that is based on Hidden Markov models. This inference algorithm iteratively produces a parameter distribution for each time step, but it has to start these iterations from a specified probability distribution: the parameter prior. All built-in observation models already have a predefined prior, stored in the attribute `prior`. Here, the prior distribution is stored as a Python function that takes as many arguments as there are parameters in the observation model. The prior distributions can be looked up directly within `observationModels.py`. For the `Poisson` model discussed in this tutorial, the default prior distribution is defined in a method called `jeffreys` as

def jeffreys(x):
    return np.sqrt(1. / x)
corresponding to the non-informative Jeffreys prior, $p(\lambda) \propto 1/\sqrt{\lambda}$. This type of prior can also be determined automatically for arbitrary user-defined observation models, see here.
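For reference, this functional form follows directly from the general definition of the Jeffreys prior, $p(\theta) \propto \sqrt{I(\theta)}$, where $I(\theta)$ denotes the Fisher information of the likelihood: for the Poisson distribution, $I(\lambda) = 1/\lambda$, and therefore $p(\lambda) \propto 1/\sqrt{\lambda}$.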
To change the predefined prior of a given observation model, one can add the keyword argument `prior` when defining the observation model. There are different ways of defining a parameter prior in bayesloop:

- If `prior=None` is set, bayesloop assigns equal probability to all parameter values, resulting in a uniform prior distribution within the specified parameter boundaries.
- One can directly supply a NumPy array with prior probability (density) values. The shape of the array must match the shape of the parameter grid!
- One can provide a function that takes exactly as many arguments as there are parameters in the defined observation model. bayesloop then evaluates the function for all parameter values and assigns the corresponding probability values.
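As a quick sketch of the array variant (the weights below are hypothetical and only illustrate the required shape), one could put extra prior weight on small accident rates:

# hypothetical array-based prior: one value per point of the 1000-point parameter grid
custom_prior = np.ones(1000)    # start from a flat prior...
custom_prior[:500] *= 2.        # ...and double the weight of rates below 3 (first half of the grid)
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000), prior=custom_prior))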
Next, we illustrate the difference between the Jeffreys prior and a flat, uniform prior with a very simple inference example: we fit the coal mining example data set using the `Poisson` observation model and further assume the rate parameter to be static:
In [2]:
# we assume a static rate parameter for simplicity
S.set(bl.tm.Static())
print('Fit with built-in Jeffreys prior:')
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000)))
S.fit()
jeffreys_mean = S.getParameterMeanValues('accident_rate')[0]
print('-----\n')
print('Fit with custom flat prior:')
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000),
                    prior=lambda x: 1.))
# alternatives: prior=None, prior=np.ones(1000)
S.fit()
flat_mean = S.getParameterMeanValues('accident_rate')[0]
First, note that the model evidence indeed changes slightly due to the different choices of parameter prior. Second, one may notice that the posterior mean value of the flat-prior fit does not exactly match the arithmetic mean of the data. This small deviation shows that a flat/uniform prior is not completely non-informative for a Poisson model! The fit using the Jeffreys prior, in contrast, reproduces the frequentist estimate, i.e. the arithmetic mean:
In [3]:
print('arithmetic mean = {}'.format(np.mean(S.rawData)))
print('flat-prior mean = {}'.format(flat_mean))
print('Jeffreys prior mean = {}'.format(jeffreys_mean))
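The model evidence values referred to above are part of the fit output; they can also be read off programmatically. A minimal sketch, assuming the `logEvidence` attribute that a study object carries after a fit:

print('log-evidence = {}'.format(S.logEvidence))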
The second option is based on the SymPy module that introduces symbolic mathematics to Python. Its sub-module `sympy.stats` covers a wide range of discrete and continuous random variables. The keyword argument `prior` also accepts a list of `sympy.stats` random variables, one for each parameter (if there is only one parameter, the list can be omitted). The multiplicative joint probability density of these random variables is then used as the prior distribution. The following example defines an exponential prior for the `Poisson` model, favoring small values of the rate parameter:
In [4]:
import sympy.stats
S.set(bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000),
                    prior=sympy.stats.Exponential('expon', 1)))
S.fit()
Note that one needs to assign a name to each `sympy.stats` variable. In this case, the output of bayesloop shows the mathematical formula that defines the prior. This is possible because of the symbolic representation of the prior by SymPy.
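The symbolic density behind such a prior can also be inspected directly with SymPy's own tools; a minimal sketch, using the exponential variable from above:

import sympy
import sympy.stats

lam = sympy.Symbol('lambda', positive=True)
expon = sympy.stats.Exponential('expon', 1)
print(sympy.stats.density(expon)(lam))   # exp(-lambda), the density used as the prior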
As shown before, hyper-studies and change-point studies can be used to determine the full distribution of hyper-parameters (the parameters of the transition model). Just as for the time-varying parameters of the observation model, one may have prior knowledge about the values of certain hyper-parameters that can be included into the study to refine the resulting hyper-parameter distribution. Hyper-parameter priors can be defined just like regular priors, either by an arbitrary function or by a list of `sympy.stats` random variables.
In a first example, we return to the simple change-point model of the coal mining data set and perform two fits of the change-point: first, we specify no hyper-prior for the time step of the change-point, assuming equal probability for each year in the data set. Second, we define a Normal distribution centered on the year 1920 with a (rather unrealistically narrow) standard deviation of 5 years as the hyper-prior, using a SymPy random variable. For both fits, we plot the change-point distribution to show the differences induced by the different priors:
In [5]:
print('Fit with flat hyper-prior:')
S = bl.ChangepointStudy()
S.loadExampleData()
L = bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000))
T = bl.tm.ChangePoint('tChange', 'all')
S.set(L, T)
S.fit()
plt.figure(figsize=(8,4))
S.plot('tChange', facecolor='g', alpha=0.7)
plt.xlim([1870, 1930])
plt.show()
print('-----\n')
print('Fit with custom normal hyper-prior:')
T = bl.tm.ChangePoint('tChange', 'all', prior=sympy.stats.Normal('norm', 1920, 5))
S.set(T)
S.fit()
plt.figure(figsize=(8,4))
S.plot('tChange', facecolor='g', alpha=0.7)
plt.xlim([1870, 1930]);
Since we used a quite narrow prior (containing a lot of information) in the second case, the resulting distribution is strongly shifted towards the prior. The following example revisits the two-break-point model from here, in which a linear decrease with varying slope serves as a hyper-parameter. This time, we define a Gaussian prior for the slope hyper-parameter, centered on the value -0.2 with a standard deviation of 0.4, via a lambda function. For simplicity, we set the break-points to fixed years.
In [6]:
S = bl.HyperStudy()
S.loadExampleData()
L = bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000))
T = bl.tm.SerialTransitionModel(bl.tm.Static(),
                                bl.tm.BreakPoint('t_1', 1880),
                                bl.tm.Deterministic(lambda t, slope=np.linspace(-2.0, 0.0, 30): t*slope,
                                                    target='accident_rate',
                                                    # Gaussian density with mean -0.2 and standard deviation 0.4
                                                    prior=lambda slope: np.exp(-0.5*((slope + 0.2)/0.4)**2) / (0.4*np.sqrt(2*np.pi))),
                                bl.tm.BreakPoint('t_2', 1900),
                                bl.tm.Static()
                                )
S.set(L, T)
S.fit()
Finally, note that you can mix SymPy- and function-based hyper-priors for nested transition models.
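A sketch of such a mixed setup is given below. It reuses the serial model from above and is hypothetical in one respect: it assumes that `BreakPoint`, analogous to `ChangePoint` earlier, accepts a set of candidate years as a hyper-parameter grid together with a prior.

# sketch: function-based hyper-prior for the slope, SymPy-based hyper-prior for 't_2'
T = bl.tm.SerialTransitionModel(bl.tm.Static(),
                                bl.tm.BreakPoint('t_1', 1880),
                                bl.tm.Deterministic(lambda t, slope=np.linspace(-2.0, 0.0, 30): t*slope,
                                                    target='accident_rate',
                                                    prior=lambda slope: np.exp(-0.5*((slope + 0.2)/0.4)**2)),
                                bl.tm.BreakPoint('t_2', np.arange(1890, 1910), prior=sympy.stats.Normal('norm', 1900, 5)),
                                bl.tm.Static()
                                )
S.set(L, T)
S.fit()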