In [ ]:
using Sigma
Sigma.loadvis()
There are various forms of independence. Starting with a probability space $(\Omega, \Sigma, \mathbb{P})$, we need a few extra definitions to make things precise.
Sigma Fields
Let $\cal{C}$ be a collection of subsets of $\Omega$. The $\sigma$-field generated by $\cal{C}$, denoted $\sigma(\cal{C})$, is a $\sigma$-field satisfying: (i) $\cal{C} \subseteq \sigma(\cal{C})$, and (ii) if $\cal{B}$ is any $\sigma$-field with $\cal{C} \subseteq \cal{B}$, then $\sigma(\cal{C}) \subseteq \cal{B}$.
I.e. $\sigma(\cal{C})$ is the minimal $\sigma$-field containing the class $\cal{C}$. For example, if $\cal{C} = \{A\}$ for a single set $A$, then $\sigma(\cal{C}) = \{\emptyset, A, A^c, \Omega\}$.
Borel Sets
Suppose $\Omega = \mathbb{R}$ and let $\cal{C} = \{(a,b] : -\infty \leq a \leq b \leq \infty\}$. Then $\Sigma(\mathbb{R}) := \sigma(\cal{C})$ is the collection of Borel subsets of $\mathbb{R}$. That is, $\sigma(\cal{C})$ is the $\sigma$-field generated by the set of all intervals that are open on the left and closed on the right. This may seem like a peculiar choice, but however we permute the choice of open or closed for the left and right endpoints of the intervals in $\cal{C}$, the generated $\sigma$-field is the same. For example, $(a,b) = \bigcup_{n \geq 1} (a, b - \tfrac{1}{n}]$, so the open intervals already lie in $\sigma(\cal{C})$.
Independent Events
Events $A, B \in \Sigma$ are independent iff $\mathbb{P}(A \cap B) = \mathbb{P}(A)\mathbb{P}(B)$. Equivalently (when the conditional probabilities are defined), $\mathbb{P}(A) = \mathbb{P}(A \vert B)$ and $\mathbb{P}(B) = \mathbb{P}(B \vert A)$. That is, conditioning on one does not affect the probability of the other. With Sigma we can see both visually and numerically that, given a pair of independent standard uniformly distributed random variables $X, Y \sim \cal{U}(0,1)$, the events $Y > X$ and $X + Y < 1$ are independent.
In [40]:
set_default_plot_size(20cm, 8cm)
X = uniform(0,0.,1.)
Y = uniform(1,0.,1.)
plot_preimage(Y>X)
Out[40]:
In [34]:
plot_preimage(((Y + X) < 1.))
Out[34]:
In [35]:
plot_preimage((Y > X) & ((Y + X) < 1.))
Out[35]:
When the events are independent, $\mathbb{P}(\{Y > X\} \cap \{X + Y < 1\}) - \mathbb{P}(Y > X)\,\mathbb{P}(X + Y < 1) = 0$:
In [37]:
prob((Y > X) & ((Y + X) < 1.)) - prob(Y > X) * prob((Y + X) < 1.)
Out[37]:
Note this independence may not (and probably will not) hold under a different choice of $\mathbb{P}$.
Independent Sets of Events:
A finite set of events $A_1,\dots,A_n$ is mutually independent iff
$ \mathbb{P}(\bigcap_{i \in I} A_i) = \prod_{i \in I} \mathbb{P}(A_i) \; \; \text{for all non-empty } I \subseteq \{1,\dots,n\} $
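As a quick numerical sketch of this, reusing the uniform, prob and & constructs from the cells above (the use of a third component index 2 is an assumption, by analogy with the first two): three events built on three separate uniform components should satisfy the product rule for every subset of indices.
In [ ]:
A1 = uniform(0, 0., 1.) > 0.5
A2 = uniform(1, 0., 1.) > 0.5
A3 = uniform(2, 0., 1.) > 0.5
# each difference should be (approximately) zero; the same holds for the
# pairs, e.g. prob(A1 & A2) - prob(A1) * prob(A2)
prob(A1 & A2 & A3) - prob(A1) * prob(A2) * prob(A3)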
Independent Classes:
Let $\cal{C}_i \subseteq \Sigma$, $i=1,\dots,n$, be a finite collection of classes (sets of events). The classes $\cal{C}_i$ are independent if, for any choice $A_1,\dots,A_n$ with $A_i \in \cal{C}_i$ for $i=1,\dots,n$, the events $A_1,\dots,A_n$ are mutually independent. That is, we can take any event in class $1$, any event in class $2$, ..., any event in class $n$, and this set of events will be independent. Note that this says nothing about taking several events from within the same class.
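As a rough illustration, again only using the primitives shown above: take two classes of events, each about a different uniform component of $\Omega$. Every pairing of one event from the first class with one from the second should then factorise.
In [ ]:
X = uniform(0, 0., 1.)
Y = uniform(1, 0., 1.)
C1 = [X > 0.3, X > 0.7]                # a class of events about X
C2 = [Y > 0.5, (Y > 0.1) & (Y < 0.9)]  # a class of events about Y
# every entry of this matrix should be (approximately) zero
[prob(a & b) - prob(a) * prob(b) for a in C1, b in C2]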
Two Random Variables $X$ and $Y$ are independent iff the elements of the $\pi$-systems generated by them are independent; that is to say, for every $a$ and $b$, the events $\{X \leq a\}$ and $\{Y \leq b\}$ are independent events.
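A sketch of that criterion in Sigma, for one particular pair of thresholds (strict and non-strict inequalities coincide here since the uniforms are continuous; only constructs already used above appear):
In [ ]:
X = uniform(0, 0., 1.)
Y = uniform(1, 0., 1.)
a, b = 0.25, 0.6
# should be (approximately) zero for every choice of a and b
prob((X < a) & (Y < b)) - prob(X < a) * prob(Y < b)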
Independent Random Variables:
Random Variables are independent if their induced $\sigma$-fields are independent. The information provided by any individual random variable should not affect the behaviour of the other random variables in the family.
If $X:\Omega \to T$ is a random variable, the $\sigma$-algebra generated by $X$, denoted $\sigma(X)$, is defined as
$ \sigma(X) = X^{\leftarrow}(\Sigma(T)) $
where $\Sigma(T)$ is the $\sigma$-field on the codomain $T$ (e.g. the Borel sets $\Sigma(\mathbb{R})$ when $T = \mathbb{R}$), and where, for a class $\cal{C}'$ of subsets of $T$, we use the notation
$ X^{\leftarrow}(\cal{C}') = \{X^{\leftarrow}(C') : C' \in \cal{C}'\} $
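As a concrete picture, reusing plot_preimage from the earlier cells: the region it draws for an event such as $X > 0.5$ is exactly the preimage $X^{\leftarrow}((0.5, \infty))$, i.e. one element of $\sigma(X)$ viewed as a subset of $\Omega$.
In [ ]:
X = uniform(0, 0., 1.)
# the set X⁻¹((0.5, ∞)), an element of σ(X), drawn as a subset of Ω
plot_preimage(X > 0.5)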
Constructing Independent Random Variables:
Random Variables can be made independent by making them map from disjoint components of $\Omega$. We can interpret this geometrically: such random variables correspond to events which are rectangles. Rectangular events are always independent; by their nature, knowing something about one component tells us nothing about the other. Note that the definition of independence corresponds to the method for finding the volume of a rectangle: take the product of the side lengths.
In [49]:
plot_preimage((uniform(0,0.,1.) > .4) & (uniform(1,0.,1.) > .3))
Out[49]:
Conditional Independence
Contextual Independence
Independence is useful if we can exploit it to make inference more efficient. This occurs when the events $A$ and $B$ are more easily approximated individually than the event $A \cap B$. This may happen, for example, when several random variables share a common parent and are conditionally independent given its value, as in the following model:
A = Beta(0, 0.1)
B = flip(A)
C = flip(A)
D = flip(A)
E = flip(A)
Suppose we want to find the probability of some set of observations. $P(C = \text{true}, D = \text{false} \mid A = 0.3) = P(C = \text{true} \mid A = 0.3)\,P(D = \text{false} \mid A = 0.3)$
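A minimal numerical sketch of this factorisation: if $A$ is pinned to the constant $0.3$, each child behaves like an independent weight-$0.3$ coin. Representing such a coin as a thresholded uniform on its own component of $\Omega$ (so that only the primitives used above are needed), the joint probability factors into the product of the marginals; it is checked here for $C=\text{true}, D=\text{true}$, and the other truth-value assignments factor the same way.
In [ ]:
C = uniform(1, 0., 1.) < 0.3   # stands in for C = flip(A) with A fixed at 0.3
D = uniform(2, 0., 1.) < 0.3   # stands in for D = flip(A) with A fixed at 0.3
# ≈ 0: the joint probability equals the product of the marginals
prob(C & D) - prob(C) * prob(D)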
The problem here is that this only factors for equality constraints. Suppose it did hold for any subset. When refining, I would find $(C = 0.3)^{\leftarrow}(\{\text{true}\})$; this would live in two dimensions of $\Omega$, I'd do the normal refinement, and then I …
In [71]:
X = normal(0,3.0,1.0)
plot_preimage(uniform(1,X,X) > 3.0)
In [46]:
prob(normal(1,3.0,4.0) > 5.0)
Out[46]: