In Sigma, random variables are functions; to answer inference queries we compute preimages of sets under these functions. Computing preimages directly, i.e. running the function backwards, is difficult; we instead compute images of sets. That is, given a function $f:A \to B$ and a subset $A' \subseteq A$, we compute $f^\rightarrow(A') = \{f(a) : a \in A'\}$.
Unfortunately even this image function is difficult to compute exactly, and the best we can do is over-approximate it. That is, we define an $f^\sharp$ such that $f^\sharp(A') \supseteq f^\rightarrow(A')$. Imprecision occurs whenever $f^\sharp$ deviates from $f^\rightarrow$.
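As a concrete illustration (a plain-Julia sketch, independent of Sigma's actual abstract domains), consider over-approximating the image of `round` over $[0,3]$ with a single interval:

```julia
# Exact image of round over [0, 3], computed by brute force on a fine grid:
exact = Set(round(x) for x in 0:0.01:3)        # {0.0, 1.0, 2.0, 3.0}

# A simple interval over-approximation: round the endpoints and keep
# the whole interval in between.
lo, hi = round(0.0), round(3.0)                # the interval [0.0, 3.0]

# Soundness: every value of the true image lies in the approximation --
# but so do values like 0.5 that round can never produce. That gap is
# exactly the imprecision of f♯ relative to f→.
@assert all(lo <= v <= hi for v in exact)
```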
There are numerous causes of imprecision currently in Sigma.
Merging in Discontinuous Functions: Currently in Sigma
Treating Dependent Variables Independently:
Here I review the problem of dependencies between variables, random or otherwise.
In [ ]:
using Gadfly
using PyPlot
using Sigma
Consider the example:
In [2]:
function f(x)
    a = round(x)
    b = round(x)  # identical to a, but bound to a separate variable
    c = a + b     # the last expression is returned
end
Out[2]:
Suppose we want to evaluate the image of a set, i.e. $f^\rightarrow(X)$, where for example $X = [0, 3]$.
In [3]:
f(Interval(0,3))
Out[3]:
The true image of $[0,3]$ under $f$ is $\{0,2,4,6\}$. We can show this in a roundabout way using Sigma.
In [6]:
x = uniform(0,0.0,3.0)
y = f(x)
prob((y == 0) | (y == 2) | (y == 4) | (y == 6))
Out[6]:
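The same image can also be brute-forced without Sigma by evaluating $f$ on a fine grid over $[0,3]$ (a sanity check only, not part of Sigma's machinery):

```julia
# f from the earlier cell, inlined so this snippet is self-contained.
g(x) = round(x) + round(x)

# Evaluate on a fine grid of [0, 3]; only even values appear, since
# both summands are the same rounded x.
image = Set(g(x) for x in 0:0.001:3)
sort(collect(image))    # [0.0, 2.0, 4.0, 6.0]
```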
However, to show this we did several things, some of which are suboptimal.
First, we represent a and b with ordinary Intervals. Typically an Interval denotes a closed interval on the real line, but it is perfectly fine to let it represent an interval of integers. If we wanted to, we could parameterise the Interval type to make this explicit.
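One way that parameterisation could look (a hypothetical sketch; `Iv` is a toy stand-in for Sigma's `Interval`, whose real definition may differ):

```julia
# A toy interval type parameterised by its element type.
struct Iv{T<:Real}
    lo::T
    hi::T
end

# round maps a real interval to an integer interval; the type parameter
# makes that change of domain explicit.
round_iv(x::Iv{Float64}) = Iv{Int}(round(Int, x.lo), round(Int, x.hi))

round_iv(Iv(0.0, 3.0))   # Iv{Int64}(0, 3)
```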
Suppose for the sake of argument we represent a and b as nonrelational sets $\{0,1,2,3\}$.
When we evaluate c we must consider all combinations of a and b, because we have no information relating them.
In [9]:
using Iterators
c = [+(xy...) for xy in product([0,1,2,3],[0,1,2,3])]
@show length(c)
Set(c)
Out[9]:
c is now an imprecise over-approximation of the true image of $[0, 3]$: it contains every integer from 0 to 6, including the odd values that can never actually occur.
There are two main objectives. How can we:
The extraneous values (for example 3 in c above) arise because treating the domains non-relationally, when in fact there are dependencies between the variables, forces us to consider combinations that cannot occur in reality.
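In the running example the dependency is simply $b = a$; keeping it versus dropping it gives different images (plain Julia):

```julia
# Relational: remember that b equals a, so c = a + a.
relational = Set(a + a for a in 0:3)               # {0, 2, 4, 6}

# Non-relational: a and b range independently over {0, 1, 2, 3},
# so impossible odd sums such as 3 appear.
nonrelational = Set(a + b for a in 0:3, b in 0:3)  # {0, 1, 2, 3, 4, 5, 6}

# The non-relational image soundly contains the relational one.
@assert relational ⊆ nonrelational
```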
What are these impossible scenarios, exactly? Considering the variables $a = \mathrm{round}(x)$ with $x \in [0,3]$, we can express them as a subset $A$ of $\mathbb{R}^2$, where each axis corresponds to a variable.
We can visualise this subset:
In [15]:
Gadfly.plot(round,0,3)
Out[15]:
When we consider a variable non-relationally, we effectively project $A$ onto the corresponding component; e.g., $b = \mathrm{proj}_b(A) = \{0,1,2,3\}$. Equivalently, it is like over-approximating $A$ by a rectangular (box-shaped) set.
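Numerically (a plain-Julia sketch), sampling $A$ on a grid and projecting onto the $b$-axis recovers exactly that non-relational set:

```julia
# Sample A = {(x, round(x)) : x in [0, 3]} on a grid.
A = [(x, round(x)) for x in 0:0.25:3]

# Projecting onto the b component forgets the relation to x ...
proj_b = Set(last.(A))                  # {0.0, 1.0, 2.0, 3.0}

# ... so the box [0, 3] × proj_b contains A, but also impossible
# points such as (0.1, 3.0).
@assert all(b in proj_b for (x, b) in A)
```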
In [24]:
layers = [Gadfly.layer(x=[0,3], y=[i,i], Geom.line) for i = 0:3]
Gadfly.plot(layers...)
Out[24]:
Another way to think of applying a function to variables is as updating $A$ to $A'$, where $A'$ is embedded in a higher-dimensional space that has a new dimension for the new variable.
That is, if $A \subseteq \mathbb{R}^i$, then $A' \subseteq \mathbb{R}^{i+1}$: writing $a \in A$ as $(a_1,\dots,a_i)$, the corresponding point of $A'$ is $a' = (a_1,\dots,a_i,f(a))$.
Impossible scenarios occur when, in constructing $A'$ by function application, we consider points that are not in the true $A$.
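Concretely, tracking the relational set through each assignment of `f` (sampled on a grid; a plain-Julia sketch of the embedding, not Sigma's implementation):

```julia
X  = 0.0:0.25:3.0                                # sample of [0, 3]
A1 = [(x, round(x))    for x in X]               # new dimension for a
A2 = [(x, a, round(x)) for (x, a) in A1]         # new dimension for b
A3 = [(x, a, b, a + b) for (x, a, b) in A2]      # new dimension for c

# Keeping the full relational set at every step recovers the true image:
Set(last.(A3))                                   # {0.0, 2.0, 4.0, 6.0}
```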