Question: find the "posterior" probabilities of various outcomes $A_j$ for $j=1,\ldots,d$ given some data $B$, given the likelihood of the data $B$ conditioned on each outcome, and given prior information on the outcomes.
Answer (Bayes' theorem):
$$\mathbf{P}[A_j ~|~ B] = \frac{\displaystyle \mathbf{P}[B ~|~ A_j] ~ \mathbf{P}[A_j]}{\text{normalizing factor}}$$where the $\text{normalizing factor} = \displaystyle \sum_{i=1}^d \mathbf{P}[B~|~A_i] ~ \mathbf{P}[A_i]$
Note: for this to work, the $A_j$ have to span all possibilities and be pairwise disjoint. For example, with $d=2$ we typically use the notation $A_1 = A$ and $A_2=A^c$.
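A minimal numerical sketch of the formula; the priors and likelihoods below are made-up values for illustration (they are not from any particular problem):

```python
import numpy as np

# Hypothetical priors P[A_j] and likelihoods P[B | A_j] for d = 3 outcomes
prior = np.array([0.5, 0.3, 0.2])        # P[A_j], must sum to 1
likelihood = np.array([0.9, 0.5, 0.1])   # P[B | A_j]

# Bayes' theorem: posterior is proportional to likelihood * prior
unnormalized = likelihood * prior
normalizing_factor = unnormalized.sum()  # sum_i P[B | A_i] P[A_i]
posterior = unnormalized / normalizing_factor

print(posterior, posterior.sum())        # posterior probabilities sum to 1
```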
$\mathbf{P}[A^c] = 1 - \mathbf{P}[A]$
$\mathbf{P}[A\cup B] = \mathbf{P}[A] + \mathbf{P}[B] - \mathbf{P}[A \cap B]$
$\mathbf{P}[A\cup B\cup C] = \mathbf{P}[A] + \mathbf{P}[B] + \mathbf{P}[C] - \mathbf{P}[A \cap B] - \mathbf{P}[A\cap C] - \mathbf{P}[B\cap C] + \mathbf{P}[A\cap B\cap C]$
If $A$ and $B$ are independent: $\mathbf{P}[A \cap B] = \mathbf{P}[A]~ \mathbf{P}[B]$ (this is in fact the definition of independence)
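A quick Monte Carlo sanity check of inclusion-exclusion and independence; the die-roll events below are my own illustrative choice (for a fair die, "roll $\le 2$" and "roll is even" happen to be independent):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)  # fair die rolls (illustrative example)

A = rolls <= 2       # event A: roll is 1 or 2
B = rolls % 2 == 0   # event B: roll is even

# Inclusion-exclusion: P[A u B] = P[A] + P[B] - P[A n B]
lhs = np.mean(A | B)
rhs = np.mean(A) + np.mean(B) - np.mean(A & B)
print(lhs, rhs)      # agree up to Monte Carlo error

# Independence: P[A n B] ~ P[A] P[B] for these two events
print(np.mean(A & B), np.mean(A) * np.mean(B))
```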
For discrete RV: we have probability mass function (PMF) $p_X(x) = \mathbf{P}[X=x]$
For the continuous case: we have the density $f_X(x) = \mathbf{P}[x-dx \le X \le x] ~ \frac{1}{dx}$
In both cases, the cumulative distribution function (CDF) is $F_X(x) = \mathbf{P}[X\le x]$
For the continuous case, if the density exists, then $\displaystyle f_X(x) = \frac{\displaystyle d~F_X(x)}{\displaystyle dx} = F_X'(x)$
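A small numerical check of $f_X = F_X'$, using an $Exp(1)$ random variable as an illustrative choice and a finite-difference approximation of the derivative:

```python
import numpy as np
from scipy import stats

# Check f_X = F_X' numerically for Exp(1) (illustrative choice of distribution)
x = np.linspace(0.1, 5, 200)
dx = 1e-5
numerical_density = (stats.expon.cdf(x + dx) - stats.expon.cdf(x)) / dx
print(np.max(np.abs(numerical_density - stats.expon.pdf(x))))  # ~0 up to finite-difference error
```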
Discrete: $\mathbf{E}[X] = \displaystyle \sum_x x~p_X(x)$
sum over all values of $x$ such that $p_X(x)\ne 0$
Continuous: $\mathbf{E}[X] =\displaystyle \int_{-\infty}^\infty x ~f_X(x) ~ dx$
Note: integrate over the range of the random variable which is the same as the interval(s) where $f_X$ is non-zero
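A sketch computing $\mathbf{E}[X]$ both ways; the fair die and the $Exp(\lambda=2)$ density are illustrative choices, not from the notes:

```python
import numpy as np
from scipy import integrate

# Discrete: E[X] = sum_x x p_X(x), for a fair die
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
print(np.sum(values * pmf))   # 3.5

# Continuous: E[X] = integral of x f_X(x) dx, for Exp(lambda=2) so E[X] = 1/2
lam = 2.0
mean, _ = integrate.quad(lambda x: x * lam * np.exp(-lam * x), 0, np.inf)
print(mean)                   # ~0.5
```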
Chebyshev's inequality: it works for every random variable.
Assume that the 1st moment of the random variable $X$ exists; i.e., $\mathbf{E}[|X|] < \infty$.
Then,
$$\mathbf{P}[|X| > u] \le \frac{\mathbf{E}[|X|]}{u}$$
Remark: If $\mathbf{E}[Y^2] < \infty$, then
$$\mathbf{P}[|Y|> a] = \mathbf{P}[|Y|^2 > a^2] \le \frac{\mathbf{E}[|Y|^2]}{a^2}$$
(apply Chebyshev's inequality to $X=|Y|^2$ with $u=a^2$).
Assume $X_1,X_2,\ldots,X_n$ are $iid$ with $\mathbf{E}[|X_i|^2]<\infty$.
Then, compute $\displaystyle \mathbf{P}\left[\left|\frac{X_1 + X_2 + \cdots +X_n}{n} -\mathbf{E}[X_1]\right| > \epsilon\right]=?$
Answer: We use Chebyshev's inequality with $Y = \frac{X_1 + X_2 + \cdots +X_n}{n} -\mathbf{E}[X_1]$:
$$\mathbf{P}[|Y| > \epsilon] \le \frac{\mathbf{E}[|Y|^2]}{\epsilon^2} = \frac{\mathbf{Var}[Y]}{\epsilon^2} = \frac{\mathbf{Var}[X_1]}{n\,\epsilon^2}$$
We used the fact that $\mathbf{E}[Y] = 0$, that $\mathbf{E}[|Y|^2] = \mathbf{Var}[Y] + (\mathbf{E}[Y])^2$, and that $\mathbf{Var}\left[\frac{X_1+\cdots+X_n}{n}\right] = \frac{\mathbf{Var}[X_1]}{n}$ for $iid$ variables.
We see that when $\epsilon$ is fixed and $n\to \infty$, this bound goes to $0$, so the empirical mean converges to the expectation of the random variable.
This means (by definition) that $$\frac{X_1 + X_2 + ... + X_n}{n} \to \mathbf{E}[X_i] \ \ \text{ converges in probability}$$
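A small simulation illustrating this; the $Uniform(0,1)$ samples, the value of $\epsilon$, and the number of repeated experiments are all illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
eps = 0.05
true_mean, var = 0.5, 1 / 12    # Uniform(0,1): E[X] = 1/2, Var[X] = 1/12 (illustrative choice)

for n in [100, 1_000, 10_000]:
    samples = rng.uniform(0, 1, size=(1_000, n))   # 1,000 independent experiments of size n
    empirical_means = samples.mean(axis=1)
    prob = np.mean(np.abs(empirical_means - true_mean) > eps)  # estimate of P[|mean - E[X]| > eps]
    chebyshev_bound = var / (n * eps**2)                        # Var[X_1] / (n eps^2)
    print(n, prob, min(chebyshev_bound, 1.0))                   # probability <= bound, both -> 0
```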
Binomial$(n,p)$: $\mathbf{P}[X = k] = \binom{n}{k} p^k (1-p)^{n-k}$
When $n$ is large (like $n=10000$), we can approximate the binomial probability using two methods:
Central limit theorem (normal approximation)
Poisson approximation: Let $X_n\sim Binomial(n, \frac{\lambda}{n})$; then $X_n$ is approximately $\sim Poisson(\lambda)$ for large $n$. (This is convergence in distribution, i.e. their CDFs will converge.)
This approximation means that $X_n \to X$ in distribution, which really means $F_{X_n}(x) \to F_X(x)$ at any value $x$ where $F_X$ is continuous.
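A quick numerical check of the Poisson approximation; the values of $\lambda$ and $n$ below are illustrative:

```python
import numpy as np
from scipy import stats

# Poisson approximation: Binomial(n, lambda/n) is close to Poisson(lambda) for large n
lam, n = 3.0, 10_000   # illustrative values
k = np.arange(0, 15)
binom_pmf = stats.binom.pmf(k, n, lam / n)
poisson_pmf = stats.poisson.pmf(k, lam)
print(np.max(np.abs(binom_pmf - poisson_pmf)))  # small for large n
```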
There are two ways to prove this.
Normal approximation Let $X_n\sim Binomial(n,p)$.
Standardize $X_n$; then for large $n$ we have the following convergence in distribution:
$$\frac{X_n -np}{\sqrt{np(1-p)}} \to \mathcal{N}(0,1) \ \ \text{in distribution}$$
Proof by CLT
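A quick numerical check of the normal approximation; the values of $n$ and $p$ are illustrative, and the comparison is done at a few standardized points:

```python
import numpy as np
from scipy import stats

# Normal approximation: (X_n - np) / sqrt(np(1-p)) is approximately N(0, 1)
n, p = 10_000, 0.3                        # illustrative values
z = np.linspace(-3, 3, 13)                # standardized points
k = n * p + z * np.sqrt(n * p * (1 - p))  # corresponding points on the binomial scale
binom_cdf = stats.binom.cdf(k, n, p)
normal_cdf = stats.norm.cdf(z)
print(np.max(np.abs(binom_cdf - normal_cdf)))  # small for large n
```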
Poisson process: $N(t) = \text{\# of events (arrivals) in the interval } [0,t]$, as $t$ varies.
Moreover, let $X_1,X_2, \ldots$ be the times between consecutive arrivals. Then these are $iid$ Exponential $\sim Exp(\lambda)$.
Note: $\lambda = \mathbf{E}[N(1)] = \frac{1}{\mathbf{E}[X_i]}$
This means the average frequency of arrivals is $\frac{1}{\text{average time between arrivals}}$
Now let $Y_n = X_1 + X_2 + \cdots +X_n$; this is the time of the $n$th arrival. Then
$$Y_n \sim \Gamma(\alpha=n, \theta=\frac{1}{\lambda})$$
Indeed, each $X_i \sim Exp(\lambda)$ and $X_i \sim \Gamma(\alpha=1, \theta=\frac{1}{\lambda})$.
That means $Exp(\lambda)$ is equivalent to $\Gamma(\alpha=1, \theta=\frac{1}{\lambda})$.
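A simulation sketch of these facts; the rate $\lambda$, the arrival index $n$, and the number of simulated paths are illustrative choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
lam, n = 2.0, 5   # rate and arrival index (illustrative)

# Inter-arrival times X_i ~ Exp(lambda); Y_n = X_1 + ... + X_n is the n-th arrival time
X = rng.exponential(scale=1 / lam, size=(50_000, n))
Y_n = X.sum(axis=1)

# Compare with Gamma(alpha=n, theta=1/lambda): the mean should be n/lambda
print(Y_n.mean(), n / lam)
# Kolmogorov-Smirnov test against Gamma (scipy's 'gamma' takes shape a, loc, scale theta)
print(stats.kstest(Y_n, "gamma", args=(n, 0, 1 / lam)))

# E[N(1)] should be ~ lambda: count arrival times that fall in [0, 1]
arrival_times = np.cumsum(rng.exponential(scale=1 / lam, size=(50_000, 20)), axis=1)
print((arrival_times <= 1.0).sum(axis=1).mean(), lam)
```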
In general: the density of $\Gamma(\alpha,\theta)$ is $$f_X(x) = \frac{1}{\theta} \frac{1}{\Gamma(\alpha)} \left(\frac{x}{\theta}\right)^{\alpha-1} e^{-x/\theta}$$
when the scale parameter $\theta=1$: the density of $\displaystyle \Gamma(\alpha,1)$ is $f_X(x) = \frac{1}{\Gamma(\alpha)} \left(x\right)^{\alpha-1} e^{-x}$
A quick tip to remember the general formula: use the formula for scale parameter $1$, and when the scale parameter is $\theta$, divide $x$ by $\theta$ everywhere it appears (and multiply by the extra factor $\frac{1}{\theta}$ in front so the density still integrates to $1$).
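A check that the density written above matches the scale parameterization in `scipy.stats.gamma`; the values of $\alpha$ and $\theta$ are illustrative:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as gamma_fn

# The Gamma(alpha, theta) density as written above (theta is a scale parameter)
def gamma_pdf(x, alpha, theta):
    return (1 / theta) * (1 / gamma_fn(alpha)) * (x / theta) ** (alpha - 1) * np.exp(-x / theta)

alpha, theta = 3.0, 2.0   # illustrative parameters
x = np.linspace(0.1, 20, 100)
print(np.max(np.abs(gamma_pdf(x, alpha, theta) - stats.gamma.pdf(x, a=alpha, scale=theta))))  # ~0
```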