ここで事前分布は、
$ \bf{\beta} \sim \it{Normal}( \bf{\mu}_\pi = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \bf{\sigma}_\pi = \begin{bmatrix} 1000 & 0 \\ 0 & 1000 \end{bmatrix} ) $
$\sigma^2 \sim InverseGamma(\alpha_\pi = 0.001, \beta_\pi = 0.001)$
where $ \bf{\beta} = (\beta_0, \beta_1)^\T $, $\bf{X}$ は第1列に切片ベクトルを持ち、第2列以降に$x$ベクトルをもつ。$\beta_1$ と$\sigma^2$は、主要な関心は事後分布に基づいて各パラメーターの推定を行うことである。事後分布に基づいたパラメータである。この例における計算的な問題は、推論は直接的に結合事後分布から得られないということである。なぜなら because of its nonstandard form, derived below up to a constant of proportionality. しかも、βとσの具体的な推定を行うためにはさらにその周辺確率密度関数を求めなくてはならない。
$ p(\bf{\beta}, \sigma^2 | \bf{y}) \propto p(\bf{y} | \bf{\beta}, \sigma^2) p(\bf{\beta}) p(\sigma^2) $
$ \propto \left(\sigma^2\right)^{-n/2} \exp\left\{-\frac{1}{2 \sigma^2} (\bf{y} - \bf{X} \bf{\beta})^\top (\bf{y} - \bf{X} \bf{\beta}) \right\} $
$ \quad \times \exp\left\{-\frac{1}{2} (\bf{\beta} - \bf{\mu}_\pi)^\top \bf{\Sigma}_\pi^{-1} (\bf{\beta} - \bf{\mu}_\pi) \right\} \left(\sigma^2\right)^{-\alpha_\pi - 1} \exp\left\{-\beta_\pi / \sigma^2\right\} $
代替的な手法としては、MCMCを用いて事後分布からパラメーターの値についての近似的な推論を行うことである。
Mamba パッケージの中では、ベイジアンモデルの推定の際に現れる用語は"nodes"として参照される。Nodesは次の3つのタイプのどれかに分類される。
尤度や事前分布推定をされた値である。この回帰の例では、$\bf{y}$, $\bf{\beta}$、と $\sigma^2$ がstochastic nodesである。
are terms, like :math:\bm{\mu}, that are deterministic functions of other nodes.
are any remaining model terms (:math:\bm{X}) and are considered to be fixed quantities in the analysis.
Note that the :math:\bm{y} node has both a distributional specification and is a fixed quantity. It is designated a stochastic node in Mamba because of its distributional specification. This allows for the possibility of model terms with distributional specifications that are a mix of observed and unobserved elements, as in the case of missing values in response vectors.
Model implementation begins by instantiating stochastic and logical nodes using the Mamba--supplied constructors Stochastic and Logical. Stochastic and logical nodes for a model are defined with a call to the Model constructor. The model constructor formally defines and assigns names to the nodes. User-specified names are given on the left-hand sides of the arguments to which Logical and Stochastic nodes are passed.
In [1]:
using Mamba
## Model Specification
model = Model(
y = Stochastic(1,
(mu, s2) -> MvNormal(mu, sqrt(s2)),
false
),
mu = Logical(1,
(xmat, beta) -> xmat * beta,
false
),
beta = Stochastic(1,
() -> MvNormal(2, sqrt(1000))
),
s2 = Stochastic(
() -> InverseGamma(0.001, 0.001)
)
)
Out[1]:
A single integer value for the first Stochastic constructor argument indicates that the node is an array of the specified dimension. Absence of an integer value implies a scalar node. The next argument is a function that may contain any valid julia code. Functions should be defined to take, as their arguments, the inputs upon which their nodes depend and, for stochastic nodes, return distribution objects or arrays of objects compatible with the Distributions package [2]. Such objects represent the nodes’ distributional specifications. An optional boolean argument after the function can be specified to indicate whether values of the node should be monitored (saved) during MCMC simulations (default: true).
Stochastic functions must return a single distribution object that can accommodate the dimensionality of the node, or return an array of (univariate or multivariate) distribution objects of the same dimension as the node. Examples of alternative, but equivalent, prior distributional specifications for the beta node are shown below.
In [2]:
# Case 1: Multivariate Normal with independence covariance matrix
beta = Stochastic(1,
() -> MvNormal(2, sqrt(1000))
)
# Case 2: One common univariate Normal
beta = Stochastic(1,
() -> Normal(0, sqrt(1000))
)
# Case 3: Array of univariate Normals
beta = Stochastic(1,
() -> UnivariateDistribution[Normal(0, sqrt(1000)), Normal(0, sqrt(1000))]
)
# Case 4: Array of univariate Normals
beta = Stochastic(1,
() -> UnivariateDistribution[Normal(0, sqrt(1000)) for i in 1:2]
)
Out[2]:
Case 1 is one of the multivariate normal distributions available in the Distributions package, and the specification used in the example model implementation. In Case 2, a single univariate normal distribution is specified to imply independent priors of the same type for all elements of beta. Cases 3 and 4 explicitly specify a univariate prior for each element of beta and allow for the possibility of differences among the priors. Both return arrays of Distribution objects, with the last case automating the specification of array elements.
In summary, y and beta are stochastic vectors, s2 is a stochastic scalar, and mu is a logical vector. We note that the model could have been implemented without mu. It is included here primarily to illustrate use of a logical node. Finally, note that nodes y and mu are not being monitored.
The package provides a flexible system for the specification of schemes to sample stochastic nodes. Arbitrary blocking of nodes and designation of block-specific samplers is supported. Furthermore, block-updating of nodes can be performed with samplers provided, defined by the user, or available from other packages. Schemes are specified as vectors of Sampler objects. Constructors are provided for several popular sampling algorithms, including adaptive Metropolis, No-U-Turn (NUTS), and slice sampling. Example schemes are shown below. In the first one, NUTS is used for the sampling of beta and slice for s2. The two nodes are block together in the second scheme and sampled jointly with NUTS.
In [4]:
## Hybrid No-U-Turn and Slice Sampling Scheme
scheme1 = [NUTS(:beta),
Slice(:s2, 3.0)]
## No-U-Turn Sampling Scheme
scheme2 = [NUTS([:beta, :s2])]
Out[4]:
Additionally, users are free to create their own samplers with the generic Sampler constructor. This is particularly useful in settings were full conditional distributions are of standard forms for some nodes and can be sampled from directly. Such is the case for the full conditional of \bm{\beta} which can be written as
p(\bm{\beta} | \sigma^2, \bm{y}) &\propto p(\bm{y} | \bm{\beta}, \sigma^2) p(\bm{\beta}) \ &\propto \exp\left{-\frac{1}{2} (\bm{\beta} - \bm{\mu})^\top \bm{\Sigma}^{-1} (\bm{\beta} - \bm{\mu})\right},
where \bm{\Sigma} = \left(\frac{1}{\sigma^2} \bm{X}^\top \bm{X} + \bm{\Sigma}\pi^{-1}\right)^{-1} and \bm{\mu} = \bm{\Sigma} \left(\frac{1}{\sigma^2} \bm{X}^\top \bm{y} + \bm{\Sigma}\pi^{-1} \bm{\mu}_\pi\right), and is recognizable as multivariate normal. Likewise,
p(\sigma^2 | \bm{\beta}, \mathbf{y}) &\propto p(\bm{y} | \bm{\beta}, \sigma^2) p(\sigma^2) \ &\propto \left(\sigma^2\right)^{-(n/2 + \alpha\pi) - 1} \exp\left{-\frac{1}{\sigma^2} \left(\frac{1}{2} (\bm{y} - \bm{X} \bm{\beta})^\top (\bm{y} - \bm{X} \bm{\beta}) + \beta\pi \right) \right},
whose form is inverse gamma with n / 2 + \alpha\pi shape and (\bm{y} - \bm{X} \bm{\beta})^\top (\bm{y} - \bm{X} \bm{\beta}) / 2 + \beta\pi scale parameters. A user-defined sampling scheme to generate draws from these full conditionals is constructed below. Another example is given in the Pollution example for the sampling of multiple nodes.
In [5]:
## User-Defined Samplers
Gibbs_beta = Sampler([:beta],
(beta, s2, xmat, y) ->
begin
beta_mean = mean(beta.distr)
beta_invcov = invcov(beta.distr)
Sigma = inv(xmat' * xmat / s2 + beta_invcov)
mu = Sigma * (xmat' * y / s2 + beta_invcov * beta_mean)
rand(MvNormal(mu, Sigma))
end
)
Gibbs_s2 = Sampler([:s2],
(mu, s2, y) ->
begin
a = length(y) / 2.0 + shape(s2.distr)
b = sumabs2(y - mu) / 2.0 + scale(s2.distr)
rand(InverseGamma(a, b))
end
)
## User-Defined Sampling Scheme
scheme3 = [Gibbs_beta, Gibbs_s2]
Out[5]:
In these samplers, the respective MvNormal(2, sqrt(1000)) and InverseGamma(0.001, 0.001) priors on stochastic nodes beta and s2 are accessed directly through the distr fields. Features of the Distributions objects returned by beta.distr and s2.distr can, in turn, be extracted with method functions defined in that package or through their own fields. For instance, mean(beta.distr) and invcov(beta.distr) apply method functions to extract the mean vector and inverse-covariance matrix of the beta prior. Whereas, shape(s2.distr) and scale(s2.distr) extract the shape and scale parameters from fields of the inverse-gamma prior. Distributions method functions can be found in that package’s documentation; whereas, fields are found in the source code.
When possible to do so, direct sampling from full conditions is often preferred in practice because it tends to be more efficient than general-purpose algorithms. Schemes that mix the two approaches can be used if full conditionals are available for some model parameters but not for others. Once a sampling scheme is formulated in Mamba, it can be assigned to an existing model with a call to the setsamplers! function.
In [6]:
## Sampling Scheme Assignment
setsamplers!(model, scheme1)
Out[6]:
Alternatively, a predefined scheme can be passed in to the Model constructor at the time of model implementation as the value to its samplers argument.
One of the internal structures created by Model is a graph representation of the model nodes and their associations. Graphs are managed internally with the Graphs package [96]. Like OpenBUGS, JAGS, and other BUGS clones, Mamba fits models whose nodes form a directed acyclic graph (DAG). A DAG is a graph in which nodes are connected by directed edges and no node has a path that loops back to itself. With respect to statistical models, directed edges point from parent nodes to the child nodes that depend on them. Thus, a child node is independent of all others, given its parents.
The DAG representation of a Model can be printed out at the command-line or saved to an external file in a format that can be displayed with the Graphviz software.
In [7]:
## Graph Representation of Nodes
draw(model)
In [9]:
draw(model, filename="lineDAG.dot")
In [10]:
draw(model)
In [24]:
using GraphViz
display(Graph(graph2dot(model)))
Stochastic, logical, and input nodes are represented by ellipses, diamonds, and rectangles, respectively. Gray-colored nodes are ones designated as unmonitored in MCMC simulations. The DAG not only allows the user to visually check that the model specification is the intended one, but is also used internally to check that nodal relationships are acyclic.
In [11]:
## Data
line = Dict{Symbol, Any}(
:x => [1, 2, 3, 4, 5],
:y => [1, 3, 3, 3, 5]
)
line[:xmat] = [ones(5) line[:x]]
Out[11]:
In [12]:
line
Out[12]:
In [13]:
line[:xmat]
Out[13]:
A julia vector of dictionaries containing initial values for all stochastic nodes must be created. The dictionary keys should match the node names, and their values should be vectors whose elements are the same type of structures as the nodes. Three sets of initial values for the regression example are created in with the following code.
In [14]:
## Initial Values
inits = [
Dict{Symbol, Any}(
:y => line[:y],
:beta => rand(Normal(0, 1), 2),
:s2 => rand(Gamma(1, 1))
)
for i in 1:3
]
Out[14]:
In [25]:
Dict(
:y => line[:y],
:beta => rand(Normal(0, 1), 2),
:s2 => rand(Gamma(1, 1))
)
Out[25]:
In [15]:
inits
Out[15]:
Initial values for y are those in the observed response vector. Likewise, the node is not updated in the sampling schemes defined earlier and thus retains its initial values throughout MCMC simulations. Initial values are generated for beta from a normal distribution and for s2 from a gamma distribution.
MCMC simulation of draws from the posterior distribution of a declared set of model nodes and sampling scheme is performed with the mcmc() function. As shown below, the first three arguments are a Model object, a dictionary of values for input nodes, and a dictionary vector of initial values. The number of draws to generate in each simulation run is given as the fourth argument. The remaining arguments are named such that burnin is the number of initial values to discard to allow for convergence; thin defines the interval between draws to be retained in the output; and chains specifies the number of times to run the simulator. Results are retuned as Chains objects on which methods for posterior inference are defined.
In [16]:
## MCMC Simulations
setsamplers!(model, scheme1)
sim1 = mcmc(model, line, inits, 10000, burnin=250, thin=2, chains=3)
setsamplers!(model, scheme2)
sim2 = mcmc(model, line, inits, 10000, burnin=250, thin=2, chains=3)
setsamplers!(model, scheme3)
sim3 = mcmc(model, line, inits, 10000, burnin=250, thin=2, chains=3)
Out[16]:
In [17]:
model, line, inits
Out[17]:
In [18]:
setsamplers!(model, scheme1)
Out[18]:
In [21]:
setsamplers!
Out[21]:
The simulation of multiple chains will be executed in parallel automatically if julia is running in multiprocessor mode on a multiprocessor system. Multiprocessor mode can be started with the command line argument julia -p n, where n is the number of available processors. See the julia documentation on parallel computing for details.
Checks of MCMC output should be performed to assess convergence of simulated draws to the posterior distribution. Checks can be performed with a variety of methods. The diagnostic of Gelman, Rubin, and Brooks [32][11] is one method for assessing convergence of posterior mean estimates. Values of the diagnostic’s potential scale reduction factor (PSRF) that are close to one suggest convergence. As a rule-of-thumb, convergence is rejected if the 97.5 percentile of a PSRF is greater than 1.2.
In [23]:
gelmandiag(sim1, mpsrf=true, transform=true) |> showall
The diagnostic of Geweke [36] tests for non-convergence of posterior mean estimates. It provides chain-specific test p-values. Convergence is rejected for significant p-values, like those obtained for s2.
In [30]:
gewekediag(sim1) |> showall
The diagnostic of Heidelberger and Welch [45] tests for non-convergence (non-stationarity) and whether ratios of estimation interval halfwidths to means are within a target ratio. Stationarity is rejected (0) for significant test p-values. Halfwidth tests are rejected (0) if observed ratios are greater than the target, as is the case for s2 and beta[1].
In [31]:
heideldiag(sim1) |> showall
The diagnostic of Raftery and Lewis [72][73] is used to determine the number of iterations required to estimate a specified quantile within a desired degree of accuracy. For example, below are required total numbers of iterations, numbers to discard as burn-in sequences, and thinning intervals for estimating 0.025 quantiles so that their estimated cumulative probabilities are within 0.025±0.005 with probability 0.95.
In [32]:
rafterydiag(sim1) |> showall
More information on the diagnostic functions can be found in the Convergence Diagnostics section.
Once convergence has been assessed, sample statistics may be computed on the MCMC output to estimate features of the posterior distribution. The StatsBase package [51] is utilized in the calculation of many posterior estimates. Some of the available posterior summaries are illustrated in the code block below.
In [33]:
## Summary Statistics
describe(sim1)
In [34]:
## Highest Posterior Density Intervals
hpd(sim1)
Out[34]:
In [35]:
## Cross-Correlations
cor(sim1)
Out[35]:
In [36]:
## Lag-Autocorrelations
autocor(sim1)
Out[36]:
In [37]:
## State Space Change Rate (per Iteration)
changerate(sim1)
Out[37]:
In [38]:
## Deviance Information Criterion
dic(sim1)
Out[38]:
In [39]:
## Subset Sampler Output
sim = sim1[1000:5000, ["beta[1]", "beta[2]"], :]
describe(sim)
In [40]:
## Restart the Sampler
sim = mcmc(sim1, 5000)
describe(sim)
In [41]:
## Default summary plot (trace and density plots)
p = plot(sim1)
## Write plot to file
draw(p, filename="summaryplot.svg")
In [42]:
draw(p)
The plot function can also be used to make autocorrelation and running means plots. Legends can be added with the optional legend argument. Arrays of plots can be created and passed to the draw function. Use nrow and ncol to determine how many rows and columns of plots to include in each drawing.
In [44]:
## Autocorrelation and running mean plots
p = plot(sim1, [:autocor, :mean], legend=true)
draw(p, nrow=3, ncol=2, filename="autocormeanplot.svg")
In [46]:
draw(p)
In [47]:
## Pairwise contour plots
p = plot(sim1, :contour)
draw(p, nrow=2, ncol=2, filename="contourplot.svg")
In [48]:
draw(p)
In [50]:
:c => [1, 2]
Out[50]:
In [51]:
Dict{}
Out[51]:
In [35]:
collect(range(1, 2, 10))
Out[35]:
In [36]:
collect(1:2:10)
Out[36]:
In [38]:
a = [1 for i in 1:10]
Out[38]:
In [ ]: