The model evidence can not only be used to compare different kinds of time series models, but also to optimize the hyper-parameters of a given transition model by maximizing its evidence value. The Study class of bayesloop contains a method optimize, which relies on the minimize function of the scipy.optimize module. Since bayesloop has no gradient information about the hyper-parameters, the optimization routine is based on the gradient-free COBYLA algorithm. The following two sections introduce the optimization of hyper-parameters using bayesloop and further describe how to selectively optimize specific hyper-parameters in nested transition models.
In [1]:
%matplotlib inline
import matplotlib.pyplot as plt # plotting
import seaborn as sns # nicer plots
sns.set_style('whitegrid') # plot styling
import numpy as np
import bayesloop as bl
# prepare study for coal mining data
S = bl.Study()
S.loadExampleData()
L = bl.om.Poisson('accident_rate', bl.oint(0, 6, 1000))  # Poisson observation model for count data
S.set(L)
The optimize method supports all currently implemented transition models with continuous hyper-parameters, as well as combinations of multiple models. The change-point model and the serial transition model represent exceptions here, as their parameters tChange and tBreak, respectively, are discrete. These discrete parameters are ignored by the optimization routine. See the tutorial on change-point studies for further information on how to analyze structural breaks and change-points. By default, all continuous hyper-parameters of the transition model are optimized; bayesloop further allows the user to selectively optimize specific hyper-parameters, see below. The parameter values set by the user when defining the transition model are used as starting values. During optimization, only the log-evidence of the model is computed. When finished, a full fit is done to provide the parameter distributions and mean values for the optimal model setting.
We take up the coal mining example again, and stick with the serial transition model defined here. This time, however, we optimize the slope of the linear decrease from 1885 to 1895 and the magnitude of the fluctuations afterwards (i.e. the standard deviation of the Gaussian random walk):
In [2]:
# define linear decrease transition model
def linear(t, slope=-0.2):
    return slope*t

T = bl.tm.SerialTransitionModel(bl.tm.Static(),
                                bl.tm.BreakPoint('t_1', 1885),
                                bl.tm.Deterministic(linear, target='accident_rate'),
                                bl.tm.BreakPoint('t_2', 1895),
                                bl.tm.GaussianRandomWalk('sigma', 0.1, target='accident_rate'))

S.set(T)
S.optimize()
plt.figure(figsize=(8, 4))

# raw data: annual number of coal mining disasters
plt.bar(S.rawTimestamps, S.rawData, align='center', facecolor='r', alpha=.5)

# inferred disaster rate with optimized hyper-parameters
S.plot('accident_rate')

plt.xlim([1851, 1962])
plt.xlabel('year');
The optimal value for the standard deviation of the varying disaster rate is determined to be $\approx 0.23$; the initial guess of $\sigma = 0.1$ is therefore too restrictive. The value of the slope changes only slightly during optimization, resulting in an optimal value of $\approx -0.22$. The optimal hyper-parameter values are displayed in the output during optimization, but can also be inspected directly:
In [3]:
print('slope =', S.getHyperParameterValue('slope'))
print('sigma =', S.getHyperParameterValue('sigma'))
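The corresponding model evidence can be printed alongside, assuming, as above, that the log-evidence of the last fit is stored in the study's logEvidence attribute:

print('log-evidence =', S.logEvidence)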
The previous section introduced the optimize method of the Study class. By default, all (continuous) hyper-parameters of the chosen transition model are optimized. In some applications, however, only specific hyper-parameters may be subject to optimization. Therefore, a list of parameter names (or a single name) may be passed to optimize, specifying which parameters to optimize. Note that all hyper-parameters have to be given a unique name. An example of a (quite ridiculously) nested transition model is defined below. Note that the deterministic transition models are defined via lambda functions.
In [4]:
T = bl.tm.SerialTransitionModel(bl.tm.CombinedTransitionModel(
                                    bl.tm.GaussianRandomWalk('early_sigma', 0.05, target='accident_rate'),
                                    bl.tm.RegimeSwitch('pmin', -7)
                                    ),
                                bl.tm.BreakPoint('first_break', 1885),
                                bl.tm.Deterministic(lambda t, slope_1=-0.2: slope_1*t, target='accident_rate'),
                                bl.tm.BreakPoint('second_break', 1895),
                                bl.tm.CombinedTransitionModel(
                                    bl.tm.GaussianRandomWalk('late_sigma', 0.25, target='accident_rate'),
                                    bl.tm.Deterministic(lambda t, slope_2=0.0: slope_2*t, target='accident_rate')
                                    )
                                )
S.set(T)
This transition model assumes a combination of gradual and abrupt changes until 1885, followed by a deterministic decrease of the annual disaster rate until 1895. Afterwards, the disaster rate is modeled by a combination of a decreasing trend and random fluctuations. Instead of discussing exactly how meaningful the proposed transition model really is, we focus on how to specify different (groups of) hyper-parameters that we might want to optimize.
All hyper-parameter names occur only once within the transition model and may simply be stated by their name: S.optimize('pmin'). Note that you may also pass a single hyper-parameter or multiple hyper-parameters as a list: S.optimize(['pmin']), S.optimize(['pmin', 'slope_2']). For deterministic models, the argument name also represents the hyper-parameter name:
In [5]:
S.optimize(['slope_2'])
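To optimize several hyper-parameters from different parts of the nested model in a single run, the names are simply collected in one list, as stated above:

# jointly optimize the regime-switch probability and the late slope
S.optimize(['pmin', 'slope_2'])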
Although the optimization of hyper-parameters helps to objectify the choice of hyper-parameter values and may even be used to gain new insights into the dynamics of systems, optimization alone does not provide any measure of uncertainty for the optimal hyper-parameter values it yields. The next tutorial discusses an approach to infer not only the time-varying parameter distributions, but also the distribution of hyper-parameters.