In [ ]:
using Sigma
loadvis()

Sampling Algorithms

Often all we want is a single sample or set of samples from a random variable $X$ conditioned on $Y$ being true. Additionally, other statistics such as conditional expectation, variance and conditional probability can be estimated from sets of samples.

Unconditional Samples

In Sigma, to sample from $X$ unconditionally means to

  1. Sample an $\omega \in \Omega$ according to $\mathbb{P}$
  2. Evaluate $X(\omega)$

In [4]:
X = uniform(1,0.0,1.0)   # a uniform random variable over [0.0, 1.0]
ω = [rand()]             # a sampled ω ∈ Ω
call(X,ω)                # evaluate X(ω)


Out[4]:
0.6341940443630785

This is achieved more conveniently using rand(X).


In [5]:
rand(X)


Out[5]:
0.7908536833896622

Expectations of real-valued random variables can be estimated using sample_mean(X), which takes an additional keyword argument nsamples.

sample_mean(X; nsamples = 1000)

Conditional Samples

Conditional sampling is a little more complex. To conditionally sample from $X$ given $Y$, simply call rand(X,Y).


In [ ]:
X = normal(0,1)                  # normal random variable with mean 0, variance 1
Y = X > 0                        # condition: a Bool-valued random variable
samples = rand(X,Y,1000)         # 1000 samples of X conditioned on Y
plot(x=samples,Geom.histogram)   # histogram of the conditional samples

rand(X,Y) is just a convenient synonym for cond_sample_bfs(X,Y), which draws exact samples. The alternative is cond_sample_mh(X,Y). We shall explain the difference between these two; for many purposes it will suffice to know that cond_sample_bfs samples are exact, whereas cond_sample_mh uses a variant of the Metropolis Hastings algorithm to handle higher-dimensional problems, but as a result does not guarantee exact samples. To understand the differences, and how Sigma generates these samples, we should first ask: what does it even mean to conditionally sample?
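For the example above, both samplers can be called directly (the values drawn will of course differ from run to run):

In [ ]:
cond_sample_bfs(X,Y)   # exact conditional sample
cond_sample_mh(X,Y)    # approximate sample via abstract Metropolis Hastings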

Conditional Sampling Theory

Recall $\mathbb{P}:\Sigma \to [0,1]$ is our probability measure, i.e., a function which assigns probabilities to subsets of $\Omega$. When we unconditionally sampled from $X$ using rand(X) above, we were really sampling an infinitesimally small (but non-null) subset of $\Omega$. To conditionally sample we first construct a new probability measure $\mathbb{P}_Y$, which renormalizes $\mathbb{P}$ onto the subset of $\Omega$ consistent with the condition:

$$\mathbb{P}_Y(A) = \frac{\mathbb{P}(A \cap Y^{-1}(\text{true}))}{\mathbb{P}(Y^{-1}(\text{true}))}$$

To answer a conditional query, Sigma approximates the preimage $Y^{-1}(\text{true})$ with a set of disjoint subsets of $\Omega$ (abstractions). Given such a set, we can sample by first sampling an abstraction and then sampling a point within it. Intuitively this is correct because larger regions carry more probability mass, and hence should be chosen proportionally more often.
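A minimal sketch of this idea (not Sigma's internals): suppose the preimage is covered by disjoint boxes, each given as a (lower corner, upper corner) pair of vectors. We can sample a box with probability proportional to its volume, then sample uniformly within it. The function name and representation here are illustrative assumptions.

In [ ]:
# Hedged sketch, not Sigma's API: sample a point from a union of
# disjoint boxes, weighting each box by its volume.
function sample_from_boxes(boxes)
    vols = [prod(hi .- lo) for (lo, hi) in boxes]
    r = rand() * sum(vols)                  # categorical draw by volume
    i = 1
    while r > vols[i]
        r -= vols[i]
        i += 1
    end
    lo, hi = boxes[i]
    lo .+ rand(length(lo)) .* (hi .- lo)    # uniform point within the chosen box
end

sample_from_boxes([([0.0,0.0],[0.5,0.5]), ([0.5,0.0],[1.0,0.1])])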

Abstract Metropolis Hastings

Making better proposals

A major problem with abstract Metropolis Hastings is that a poor choice of where to split an abstraction can lead to rejected proposals. Consider the following preimage plot. If we split the box down the middle along both dimensions to create four sub-boxes, and use a uniform transition probability of 1/4 for each sub-box, the tiny preimage region in the upper-left corner will receive the same amount of proposal mass as each of the three other regions, which are all fully satisfiable. The result is that the small box (or some subset of it) will be proposed approximately 1/4 of the time and almost always rejected.

This behaviour is a combination of two causes:

  1. our refinement strategy has no backtracking, and hence once a sub-box is chosen, we will only further refine that subregion until a sat box is found
  2. an arbitrary splitting strategy will not in general split according to the actual underlying volumes.

In [3]:
X = uniform(0,1)
Y = uniform(0,1)
Z = (X > 0.5) | (Y < 0.5) | ((X ∈ [0.25, 0.26]) & (Y ∈ [0.75, 0.76]))
plot_preimage(Z)


Reached Max iterations - 1000
Out[3]:
[Gadfly plot: preimage of Z over the unit square (axes xmin, ymin), colored by satisfiability]

The fundamental problem is found in all MCMC algorithms: our proposal distribution poorly reflects the distribution of interest. In our case it can be off by many (tens or hundreds of) orders of magnitude. We first address this with methods which are non-adaptive, that is, where the proposal distribution does not change. Later we consider adaptive methods, which carry additional constraints to ensure correctness.

Estimating Relative Volumes

Consider a subdivision of an abstraction into sub-abstractions $\tilde{A}$ and $\tilde{B}$. A good proposal distribution will assign mass to these according to the volume of the true preimage each sub-abstraction subsumes. This introduces a circularity: if we knew the volumes within each subregion, there would be no need for Metropolis Hastings at all. Instead, we look for cheap statistics that approximate the relative volumes.

Bounding box as volume statistic

The idea here is to find a bounding box of the preimage within each sub-abstraction. We can do this using a binary search on each dimension individually.
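A minimal sketch of the binary search, assuming a hypothetical predicate issat(lo, hi) that reports whether the sub-box with corners lo and hi may contain part of the preimage (and assuming the box does contain some of it):

In [ ]:
# Hedged sketch: tighten a box around the satisfiable region by bisecting
# each face. `issat` is an assumed satisfiability check, not Sigma's API.
function bounding_box(lo, hi, issat; iters=20)
    lo, hi = copy(lo), copy(hi)
    for d in 1:length(lo)
        a, b = lo[d], hi[d]
        for _ in 1:iters                         # bisect for the lower boundary
            mid = (a + b) / 2
            slab_lo = copy(lo); slab_lo[d] = a
            slab_hi = copy(hi); slab_hi[d] = mid
            issat(slab_lo, slab_hi) ? (b = mid) : (a = mid)
        end
        new_lo = a
        a, b = new_lo, hi[d]
        for _ in 1:iters                         # bisect for the upper boundary
            mid = (a + b) / 2
            slab_lo = copy(lo); slab_lo[d] = mid
            slab_hi = copy(hi); slab_hi[d] = b
            issat(slab_lo, slab_hi) ? (a = mid) : (b = mid)
        end
        lo[d], hi[d] = new_lo, b
    end
    lo, hi
end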

Refine each sub-abstraction further to get an estimate of relative volumes

Refine each box a few times and see how many boxes we can remove; the volume of the boxes that survive estimates the preimage volume within that sub-abstraction (sketched below).
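A sketch, again assuming a hypothetical issat(lo, hi) check: split each box along its widest dimension a few times, discard halves that fail the check, and sum the volume of the survivors.

In [ ]:
# Hedged sketch: estimate the preimage volume inside a box by refining
# a few times and discarding sub-boxes that fail the assumed `issat` check.
function refined_volume(lo, hi, issat; depth=4)
    boxes = [(lo, hi)]
    for _ in 1:depth
        next = typeof(boxes)()
        for (l, h) in boxes
            d = argmax(h .- l)                 # split the widest dimension
            mid = (l[d] + h[d]) / 2
            h1 = copy(h); h1[d] = mid
            l2 = copy(l); l2[d] = mid
            issat(l, h1) && push!(next, (l, h1))   # keep possibly-sat halves
            issat(l2, h) && push!(next, (l2, h))
        end
        boxes = next
    end
    isempty(boxes) ? 0.0 : sum(prod(h .- l) for (l, h) in boxes)
end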

RRT as a volume statistic

Rapidly exploring random trees (RRTs) and their variants are sample-based methods which work well for motion planning in high-dimensional spaces. The fact that they are able to fill high-dimensional spaces with a tree is motivation for their use in this case.

Basic algorithm:

  1. Sample a point in the coordinate space
  2. Find the node in the current tree nearest to that point
  3. Generate a new node as close to the point from step 1 as possible, under the constraint that an edge can be drawn between it and the node from step 2 (sketched below)
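A minimal sketch of one extension step, ignoring the edge-feasibility constraint from step 3 (which the challenges below discuss); names and the step size are illustrative:

In [ ]:
# Hedged sketch of one RRT extension step over the box [lo, hi].
function rrt_extend!(tree, lo, hi; stepsize=0.1)
    x_rand = lo .+ rand(length(lo)) .* (hi .- lo)       # 1. sample a point
    dists = [sum(abs2, n .- x_rand) for n in tree]
    near = tree[argmin(dists)]                          # 2. nearest tree node
    dir = x_rand .- near
    len = sqrt(sum(abs2, dir))
    # 3. grow toward the sample, at most `stepsize` away; a full RRT would
    #    also check that the edge near -> x_new is feasible.
    x_new = len <= stepsize ? x_rand : near .+ (stepsize / len) .* dir
    push!(tree, x_new)
    x_new
end

tree = [[0.5, 0.5]]
for _ in 1:100 rrt_extend!(tree, [0.0, 0.0], [1.0, 1.0]) end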

The challenges in adapting this method to our needs are:

  1. RRT relies on being able to draw points uniformly over the space. The spaces used in motion planning have feasible regions large enough that simple rejection sampling suffices to get points. This property does not hold for us, where the feasible regions are likely tiny.

  2. RRT is primarily concerned with expansion into continuous regions. If there are discontinuities in the state space, they are not of importance, because they mean the robot cannot possibly reach the target.

  3. When a new node is sampled, an edge is grown from the closest node in the tree, as close as possible to the sample. This requires knowing that the edge does not intersect the unfeasible region. I'm not entirely sure how they check this; for us it could be done with an SMT solver, but that seems expensive. The goal is to have a fast approximation.

  • Whether a line in the state space collides with the unfeasible region can be framed as an SMT problem: does there exist a point on this line which is in the unfeasible region?
  • Alternatively, one could just check points. Checking points is easy, and one could check many points along the line, as in the sketch below.
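A sketch of the point-checking alternative, with a hypothetical isfeasible predicate:

In [ ]:
# Hedged sketch: declare the segment a -> b feasible if n+1 evenly spaced
# sample points all pass an assumed `isfeasible` check.
feasible_segment(a, b, isfeasible; n=50) =
    all(isfeasible(a .+ t .* (b .- a)) for t in (0:n) ./ n)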

Heuristic to try:

When splitting a box $A$ into $A_1,\dots,A_n$, for each $A_i$:

  1. Sample a valid point using SMT or abstract interpretation
  2. Construct an n-ball or n-cube of radius $r$ around it, truncated to its subregion, and sample within that to obtain $x_{rand}$ (see the sketch after this list)
  3. Find the nearest node in the tree to $x_{rand}$
  4. Grow a node as near as possible to $x_{rand}$, using either SMT or simply sampling along the line to ensure it is fully contained.
  5. Periodically, try to find a point outside of the convex hull of the tree.
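A sketch of step 2 using an n-cube: intersect the radius-$r$ cube around the seed point with the sub-box [lo, hi] and sample uniformly in the intersection (names are illustrative):

In [ ]:
# Hedged sketch of step 2: sample uniformly in the radius-r cube around
# `seed`, truncated to the sub-box [lo, hi] (assumes seed lies in the box).
function sample_near(seed, lo, hi; r=0.05)
    a = max.(seed .- r, lo)    # intersection of the r-cube and the sub-box
    b = min.(seed .+ r, hi)
    a .+ rand(length(seed)) .* (b .- a)
end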