Sigma relies on measure theory as a mathematical framework to describe probability and probabilistic inference. We have defined our probability measure $\mathbb{P}$ as a uniform measure. This does not preclude non-uniform distributions, because random variables can be non-linear functions of $\omega$.
Density functions are not the same as random variables, but often we want to construct a random variable that is distributed according to some density. There are many ways to do this, many of which apply only to a particular density function - for instance the Box-Muller transform for normal distributions. One general method is inverse transform sampling, which we can use when we have access to the inverse of the cumulative distribution function, known as the quantile function.
If we can generate random numbers uniformly between $0$ and $1$, we can generate samples from an arbitrary distribution by applying its quantile function to the uniformly distributed samples.
We can demonstrate this using the Distributions package in Julia, which provides quantile functions for many common distributions such as the Normal:
In [1]:
using Distributions
using Gadfly
# Draw uniform samples on [0,1], then map them through the standard
# normal quantile function to obtain normally distributed samples
samples = rand(10000)
normal_quantile(x) = quantile(Normal(0,1), x)
normal_samples = map(normal_quantile, samples)
plot(x = normal_samples, Geom.density)
Out[1]: (density plot of the transformed samples, approximating a standard normal)
This method of defining non-uniform distributions invoked sampling, and hence randomness. More generally, we can define a non-uniformly distributed random variable without involving any randomness at all. In Sigma, since $\Omega = [0,1]$ and $\mathbb{P}$ is a uniform measure, given a quantile function we define a random variable as:
$$X(\omega) = quantile(\omega)$$

If you have read section (?) you will know that Sigma does inference by evaluating random variables with infinite sets, often represented as intervals. It should be possible to treat the quantile function like any other function and use abstract interpretation or SMT to evaluate the quantile with a set (e.g. interval) input. However, quantile functions can be extremely complex, and this approach would in many cases lead to a great deal of imprecision. Also, from a practical perspective, many quantile functions are written using a variety of imperative tricks and numerical hacks to improve performance or accuracy; reimplementing them as pure functions would be cumbersome. It would be preferable if we could treat the quantile functions as black-boxes.
Black-box functions are usually anathema to Sigma, because given no additional information it is impossible to assert anything meaningful about the output of a function applied to an input set. Fortunately, in the case of quantile functions, we know that they are (1) monotonic and (2) total. To evaluate a quantile function on an interval $A = [a,b]$ we simply evaluate it on the lower and upper bounds to yield $B = [quantile(a), quantile(b)]$ as the result.
This is valid because monotonicity ensures that $B$ is a sound over-approximation - there can exist no $x \in A$ where $x < b$ and $quantile(x) > quantile(b)$. Similarly there is no $x \in A$ where $x > a$ and $quantile(x) < quantile(a)$. That the quantile is total ensures that the approximation is precise, i.e. there are no spurious elements in the interval $B$ that do not belong to the true image of $A$ under the quantile function.
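As a minimal sketch of this idea, using a Distributions.jl quantile as the black box and a hypothetical Interval type (Sigma's actual abstract-domain types may differ):

using Distributions

# Hypothetical interval type, for illustration only
struct Interval
    lo::Float64
    hi::Float64
end

# A random variable as a pure function of ω ∈ [0,1]; no randomness involved
X(ω) = quantile(Normal(0, 1), ω)

# Sound image of an interval under a monotonic, total quantile function:
# evaluate the black-box quantile at the endpoints only
quantile_interval(d, p::Interval) = Interval(quantile(d, p.lo), quantile(d, p.hi))

X(0.5)                                                 # 0.0, the median
quantile_interval(Normal(0, 1), Interval(0.25, 0.75))  # ≈ Interval(-0.674, 0.674)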
In inference, especially when taking a fully Bayesian approach, the parameters of our random variables are not fixed values but are themselves random variables. Often the inference task is to infer the posterior distribution of the parameter random variables after observing evidence on random variables "lower down".
Supporting parameters as random variables in Sigma poses some challenges, which depend upon the actual distributions of interest. In general, when the random variable belongs to a family of distributions parameterised by a scaling (multiplicative) and a shifting (additive) parameter, we can generate arbitrary distributions within the family by transforming a standard random variable. For example, in the uniform case this standard random variable is defined on the unit interval; in the normal case it has mean 0 and variance 1. We illustrate this by sampling from a normal distribution using inverse transform sampling followed by scaling and shifting:
In [4]:
# Generate samples from normal(5,3)
samples = rand(10000)
std_normal_quantile(x) = quantile(Normal(0,1),x)
std_normal_samples = map(std_normal_quantile, samples)
# Scale by the standard deviation (3) and shift by the mean (5)
shifted_normal_samples = std_normal_samples * 3 + 5
plot(x = shifted_normal_samples, Geom.density)
Out[4]: (density plot of the scaled and shifted samples, approximating Normal(5,3))
Because shifting and scaling involve only multiplication and addition, both of which are implemented in Sigma, we can implement this approach directly.
In [6]:
using Sigma
A = normal(5,2)
rand(A)
Out[6]: (a sample drawn from the normal(5,2) random variable)
Unfortunately there are many distributions which do not belong to this scale-shift family, but for which we would still like to use random variables as parameters.
The gamma, beta, categorical, geometric, and Poisson distributions are examples in this category; we would like to be able to write categorical(dirichlet([0.2,0.5,0.3])) or geometric(beta(0.5,1.0)).
We shall call the parameter random variable (dirichlet and beta in the above examples) $\Phi$. For a random variable $X$ parameterised by $\Phi$, to evaluate $X(A)$ where $A \in \Sigma$, we first consider the quantile function as a function of both the parameters and $p$: $quantile(\phi,p)$. Then $X(A) = [a,b]$, where
$$a = \min_{\phi \in \Phi^\rightarrow(A),\, p \in A} quantile(\phi,p)$$

$$b = \max_{\phi \in \Phi^\rightarrow(A),\, p \in A} quantile(\phi,p)$$

For reasons which depend upon the continuity and totality arguments given above (TODO: Describe), it suffices to consider only the bounds of $p$. If $p = [i,j]$, we can find $B$ by instead finding
$$a = \min_{\phi \in \Phi^\rightarrow(A)} quantile(\phi,i)$$

$$b = \max_{\phi \in \Phi^\rightarrow(A)} quantile(\phi,j)$$
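As a concrete illustration of these bounds, here is a sketch for geometric(Φ) using Distributions.jl quantiles. It relies on the additional assumption - true for the geometric family, but not in general - that the quantile is also monotonic in the parameter, so the extrema are attained at corners of the box $\Phi^\rightarrow(A) \times [i,j]$:

using Distributions

# Hypothetical sketch: bounds on geometric(Φ) over an event A, assuming
# Φ's image is the interval [φlo, φhi] and A induces p ∈ [i, j].
# Geometric's quantile is decreasing in φ and increasing in p, so the
# extrema lie at corners of the box [φlo, φhi] × [i, j].
function geometric_bounds(φlo, φhi, i, j)
    a = quantile(Geometric(φhi), i)  # smallest quantile: largest φ, smallest p
    b = quantile(Geometric(φlo), j)  # largest quantile: smallest φ, largest p
    (a, b)
end

geometric_bounds(0.2, 0.8, 0.1, 0.9)  # => (0, 10)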
The categorical family takes a real-valued weight vector $w$ as its parameter. It returns an integer-valued random variable which takes value $j$ with probability $w_j$. In the Bayesian setting $w$ is replaced with a multivariate random variable $\Phi$, a RandArray in Sigma. One common example is a Dirichlet random variable, e.g.:
X = categorical(dirichlet([1,2]))
Φ = dirichlet([1,2]) returns a RandArray in Sigma. If we apply it to an event $A$, i.e., $B_{\Phi} = \Phi(A)$, the result is a subset of $\mathbb{R}^2$:
In [6]:
A = [Interval(0,0.5),Interval(0.5,0.9)]
Φ = dirichlet(1,[1.,2.])
call(Φ, A)
Out[6]: (a pair of intervals over-approximating $\Phi(A)$)
Because it is represented by a pair of intervals, this subset is rectangular. In higher dimensions - when the parameter vector is longer - it will be a high-dimensional rectangle.
It's important to remember that this subset is an over-approximation. For a categorical distribution each $w_j$ must lie between 0 and 1, and $w$ must sum to 1. These constraints define the probability simplex, a portion of a hyperplane within the unit hypercube; only points on this simplex are valid parameters. Rather than consider the entire simplex, we need only consider its intersection with the parameter over-approximation $B_{\Phi}$:

$$B'_{\Phi} = \left\{w \in \Phi(A) : 0 \le w_i \le 1,\ \sum_i w_i = 1 \right\}$$
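As a sketch of how these constraints can tighten the rectangle (plain interval constraint propagation, not necessarily what Sigma does):

# Hypothetical sketch: tighten a rectangular over-approximation of the
# parameter set against the simplex constraints 0 ≤ wᵢ ≤ 1, Σᵢ wᵢ = 1
function tighten_simplex(lo::Vector{Float64}, hi::Vector{Float64})
    lo = clamp.(lo, 0.0, 1.0)
    hi = clamp.(hi, 0.0, 1.0)
    for k in 1:length(lo)
        rest_lo = sum(lo) - lo[k]   # least mass the other components need
        rest_hi = sum(hi) - hi[k]   # most mass the other components can take
        lo[k] = max(lo[k], 1.0 - rest_hi)
        hi[k] = min(hi[k], 1.0 - rest_lo)
    end
    lo, hi
end

# The Dirichlet box from above: w₁ ∈ [0, 0.5], w₂ ∈ [0.5, 0.9]
tighten_simplex([0.0, 0.5], [0.5, 0.9])  # => ([0.1, 0.5], [0.5, 0.9])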
Given a probability $p$, the categorical quantile function returns the smallest index whose cumulative weight reaches $p$:

$$quantile(\phi,p) = \min \left\{ m : \sum_{k=1}^{m} \phi_k \ge p \right\}$$

Recalling that $A$ also induces an interval $p = [i,j]$, to find $X(A) = [a,b]$ we must then find
$$a = \min_{\phi \in B'_{\Phi}} quantile(\phi,i)$$

$$b = \max_{\phi \in B'_{\Phi}} quantile(\phi,j)$$

To find $a$, my idea was basically enumeration: enumerate $m$ from $1$ to $n$, each time using a linear solver to determine whether there exists a point in $B'_{\Phi}$ such that the sum of its first $m$ components is at least $i$. The enumeration implements the minimum over indices, and the linear solver implements the minimisation over all of $B'_{\Phi}$.
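A minimal sketch of this enumeration, assuming the same coordinate-wise bound representation of $B'_{\Phi}$ as above. For a box intersected with the simplex, the linear program "maximise the sum of the first $m$ components" has a simple closed form, which the sketch uses in place of a general linear solver:

# Hypothetical sketch of the enumeration for a = min quantile(φ, i) over
# B'_Φ, represented as coordinate-wise bounds lo[k] ≤ w[k] ≤ hi[k] plus
# the implicit constraint sum(w) == 1 (assumed non-empty)

# Largest feasible prefix sum over the first m components: push them to
# their upper bounds, but leave enough mass for the rest to reach their
# lower bounds. This closed form replaces a call to a linear solver.
function max_prefix_sum(lo, hi, m)
    min(sum(hi[1:m]), 1.0 - sum(lo[m+1:end]))
end

# Enumerate m = 1..n; the first m for which some φ ∈ B'_Φ has
# sum(φ[1:m]) ≥ i is the smallest achievable quantile, i.e. a
function min_quantile(lo, hi, i)
    for m in 1:length(lo)
        max_prefix_sum(lo, hi, m) >= i && return m
    end
    return length(lo)
end

# The tightened Dirichlet box from above: w₁ ∈ [0.1, 0.5], w₂ ∈ [0.5, 0.9]
min_quantile([0.1, 0.5], [0.5, 0.9], 0.3)  # => 1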
To find $b$, we might think to ask a series of complementary questions in the same way - "for all the possible parameters in this set, do any have a quantile of $n$? of $n-1$?" - and stop at the first point where the answer is "yes". This is valid, but the approach breaks down when we try to implement it through the quantile function as before, that is, if we ask "do any of the possible parameters in this set have a sum of their first $n$ components greater than $p$? of their first $n-1$?" (the sum of all $n$ components is always 1, so the answer from the top is trivially "yes").
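One possible repair - a sketch under the same box-representation assumptions as above, not necessarily the approach Sigma takes - is to flip the existential question into a universal one: since $quantile(\phi,j) \le m$ exactly when the first $m$ components of $\phi$ sum to at least $j$, $b$ is the first $m$ for which this holds for every $\phi \in B'_{\Phi}$:

# Smallest feasible prefix sum: pull the first m components to their
# lower bounds, but they must still absorb whatever mass the remaining
# components cannot take
function min_prefix_sum(lo, hi, m)
    max(sum(lo[1:m]), 1.0 - sum(hi[m+1:end]))
end

# b = max quantile(φ, j): the first m at which *every* φ ∈ B'_Φ has
# sum(φ[1:m]) ≥ j - the universal rather than existential question
function max_quantile(lo, hi, j)
    for m in 1:length(lo)
        min_prefix_sum(lo, hi, m) >= j && return m
    end
    return length(lo)
end

max_quantile([0.1, 0.5], [0.5, 0.9], 0.8)  # => 2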
In [64]:
# CDF of a two-category categorical distribution with weights [0.99, 0.01]
r = 0.99
plot([p -> cdf(Categorical([r, 1-r]), p)], 0, 10)
Out[64]: (step plot of the categorical CDF)
In [46]:
# Median of the geometric distribution as a function of the success probability ϕ
quantile_f = ϕ -> quantile(Geometric(ϕ), 0.5)
plot([quantile_f], 0.2, 0.3)
Out[46]: (plot of the geometric median over ϕ ∈ [0.2, 0.3])
In [30]:
# Normal quantile functions for a range of means (unit variance)
quantile_fs = [x -> quantile(Normal(i, 1), x) for i = 0.5:0.5:10.0]
plot(quantile_fs, 0.0001, 0.99999)
Out[30]: (family of normal quantile curves)
In [32]:
# Normal quantile functions over a grid of means and standard deviations
quantiles = [x -> quantile(Normal(i, j), x) for i = 0.5:3:10.0, j = 0.5:3:10.0]
plot(quantiles[:], 0.0001, 0.99999)
Out[32]: (grid of normal quantile curves)
In [36]:
# Beta quantile functions for a range of second shape parameters
quantiles = [x -> quantile(Beta(0.5, i), x) for i = 0.5:0.5:10.0]
plot(quantiles[:], 0.0001, 0.99999)
Out[36]: (family of beta quantile curves)
In [45]:
# Geometric quantile functions for a range of success probabilities
quantiles = [x -> quantile(Geometric(i), x) for i = 0.5:0.1:0.9999]
plot(quantiles[:], 0.0001, 0.99999)
Out[45]: (family of geometric quantile curves)