Bayes Rule (theorem)

Question: find the "posterior" probabilities if various outcomes $A_j$ for $j=1..d$ given some data $B$ and given lieklihood of this data $B$ for each outcome condition on each outcome and given prior information on the outcomes.

  • outcomes are $A_j$s (events)
  • data is $B$ (event which actually occurred)
  • Posterior probability: $\mathbf{P}[A_j ~|~ B]$ for $j=1..d$
  • Likelihoods: $\mathbf{P}[B ~|~ A_j]$ for $j=1..d$
  • Priors: $\mathbf{P}[A_j]$ for $j=1..d$

Answer: (Bayes theorem)

$$\mathbf{P}[A_j ~|~ B] = \frac{\displaystyle \mathbf{P}[B ~|~ A_j] ~ \mathbf{P}[A_j]}{\text{normalizing factor}}$$

where the $\text{normalizing factor} = \displaystyle \sum_{i=1}^d \mathbf{P}[B~|~A_i] ~ \mathbf{P}[A_i]$

Note: for this to work, the $A_j$ have to span all possibilities and be disjoint (i.e., form a partition of the sample space). For example, with $d=2$ we typically use the notation $A_1 = A$ and $A_2=A^c$.
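A minimal numerical sketch of the formula above (the priors and likelihoods are made-up numbers, purely for illustration):

```python
# Bayes' rule with d = 3 outcomes A_1, A_2, A_3.
# The priors and likelihoods are hypothetical values chosen for illustration.
priors      = [0.5, 0.3, 0.2]   # P[A_j]
likelihoods = [0.1, 0.4, 0.7]   # P[B | A_j]

normalizing_factor = sum(l * p for l, p in zip(likelihoods, priors))
posteriors = [l * p / normalizing_factor for l, p in zip(likelihoods, priors)]

print(posteriors)        # P[A_j | B] for j = 1..3
print(sum(posteriors))   # sanity check: the posteriors sum to 1
```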

Other basic probability rules

  • $\mathbf{P}[A^c] = 1 - \mathbf{P}[A]$

  • $\mathbf{P}[A\cup B] = \mathbf{P}[A] + \mathbf{P}[B] - \mathbf{P}[A \cap B]$

  • $\mathbf{P}[A\cup B\cup C] = \mathbf{P}[A] + \mathbf{P}[B] + \mathbf{P}[C] - \mathbf{P}[A \cap B] - \mathbf{P}[A\cap C] - \mathbf{P}[B\cap C] + \mathbf{P}[A\cap B\cap C]$

  • If $A$ and $B$ are independent: $\mathbf{P}[A \cap B] = \mathbf{P}[A]~ \mathbf{P}[B]$ (this is in fact the definition of independence)

Example: "Chevalia de Mere" problem
$$ \mathbf{P}[\text{at least one "six" in 5 die tosses}] = 1 - \mathbf{P}[\text{no "six" in 4 dies tosses}] = 1 - \left(\frac{5}{6}\right)^4 \approx 0.5177$$$$ \mathbf{P}[\text{at least one "six-six" in 24 tosses of two dice}] = 1 - \mathbf{P}[\text{nop "six-six" in 24 tosses of two dice}] = 1 - \left(\frac{35}{36}\right)^{24} \approx 0.4914$$
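A quick check of both numbers, by direct computation and by simulation (a sketch, assuming `numpy` is available):

```python
import numpy as np

# Direct computation of the two de Méré probabilities
print(1 - (5/6)**4)       # at least one "six" in 4 tosses of one die,  ~0.5177
print(1 - (35/36)**24)    # at least one "six-six" in 24 tosses of two dice, ~0.4914

# Monte Carlo confirmation
rng = np.random.default_rng(0)
n_sim = 100_000
rolls = rng.integers(1, 7, size=(n_sim, 4))
print(np.mean((rolls == 6).any(axis=1)))               # ~0.5177
pairs = rng.integers(1, 7, size=(n_sim, 24, 2))
print(np.mean((pairs == 6).all(axis=2).any(axis=1)))   # ~0.4914
```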

Discrete vs. Continuous Random Variable

  • For discrete RV: we have probability mass function (PMF) $p_X(x) = \mathbf{P}[X=x]$

  • For the continuous case: we have the density $f_X(x) = \mathbf{P}[x-dx \le X \le x] ~ \frac{1}{dx}$ for an infinitesimal $dx$

In both cases, the cumulative distribution function (CDF) is $F_X(x) = \mathbf{P}[X\le x]$

  • The CDF always exists.

For the continuous case, if the density exists, then $\displaystyle f_X(x) = \frac{\displaystyle d~F_X(x)}{\displaystyle dx} = F_X'(x)$
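A small sketch of these relationships, assuming `scipy` is available (the Binomial and standard normal examples are arbitrary choices):

```python
import numpy as np
from scipy import stats

# Discrete case: the CDF is a running sum of the PMF (Binomial(10, 0.3) here)
X = stats.binom(10, 0.3)
print(X.cdf(4), X.pmf(np.arange(5)).sum())   # both ~0.8497

# Continuous case: the density is the derivative of the CDF (standard normal here)
Z = stats.norm()
dx = 1e-5
print(Z.pdf(1.0), (Z.cdf(1.0) - Z.cdf(1.0 - dx)) / dx)   # both ~0.2420
```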

Expectations

  • Discrete: $\mathbf{E}[X] = \displaystyle \sum_x x~p_X(x)$
    sum over all values of $x$ such that $p_X(x)\ne 0$

  • Continuous: $\mathbf{E}[X] =\displaystyle \int_{-\infty}^\infty x ~f_X(x) ~ dx$
    Note: integrate over the range of the random variable which is the same as the interval(s) where $f_X$ is non-zero
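A sketch of both computations on arbitrary examples, assuming `numpy`/`scipy` are available:

```python
import numpy as np
from scipy import integrate, stats

# Discrete: X ~ Binomial(10, 0.3), so E[X] = sum_x x p_X(x) = n p = 3
xs = np.arange(11)
print(np.sum(xs * stats.binom.pmf(xs, 10, 0.3)))   # 3.0

# Continuous: X ~ Exp(lambda = 2), so E[X] = integral of x f_X(x) dx = 1/2
integrand = lambda x: x * 2 * np.exp(-2 * x)       # x * f_X(x)
value, _ = integrate.quad(integrand, 0, np.inf)
print(value)                                       # ~0.5
```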

Chebyshev Inequality

  • It works for every random variable

  • Assume that the 1st moment of the random variable $X$ exists; i.e., $\mathbf{E}[|X|] < \infty$

Then,

$$\mathbf{P}[|X| > u] \le \frac{\mathbf{E}[|X|]}{u}$$

Remark: If $\mathbf{E}[Y^2] < \infty$, then

$$\mathbf{P}[|Y|> a] = \mathbf{P}[|Y|^2 > a^2] \le \frac{\mathbf{E}[|Y|^2]}{a^2}$$

where we applied Chebyshev to $X=|Y|^2$ and $u=a^2$.
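A quick numerical check of this squared form on a standard normal $Y$ (an arbitrary choice; `numpy` assumed available):

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.normal(size=1_000_000)
for a in (1.0, 2.0, 3.0):
    lhs = np.mean(np.abs(Y) > a)      # empirical P[|Y| > a]
    bound = np.mean(Y**2) / a**2      # E[|Y|^2] / a^2  (here E[Y^2] = 1)
    print(a, lhs, bound)              # lhs <= bound in every case
```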

Application: Weak Law of Large Numbers

Assume $X_1,X_2,...X_n$ are $iid$ with $\mathbf{E}[|X_i|^2]<\infty$.

Then, compute $\displaystyle \mathbf{P}\left[\left|\frac{X_1 + X_2 + \cdots +X_n}{n} -\mathbf{E}[X_1]\right| > \epsilon\right]=?$

Answer: We use Chebyshev's inequality: Let's use $Y = \frac{X_1 + X_2 + ... +X_n}{n} -\mathbf{E}[X_1]$

$$\displaystyle \mathbf{P}[|Y| > \epsilon] \le \frac{\mathbf{E}[|Y|^2]}{\epsilon^2} = \frac{\mathbf{Var}[Y] + 0}{\epsilon^2} = \frac{\displaystyle \frac{n\mathbf{Var}[X_1]}{n^2}}{\epsilon^2} = \frac{\mathbf{Var}[X_1]}{n\,\epsilon^2}$$

We used the fact that $\mathbf{E}[Y] = 0$ and $\mathbf{E}[|Y|^2] = \mathbf{Var}[Y] + (\mathbf{E}[Y])^2$

We see that, for fixed $\epsilon$, the bound goes to $0$ as $n\to \infty$, so the empirical mean converges to the expectation of that random variable.

This means (by definition) that $$\frac{X_1 + X_2 + \cdots + X_n}{n} \to \mathbf{E}[X_1] \ \ \text{ in probability}$$
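A simulation sketch of this convergence, using iid $Exp(1)$ samples (so $\mathbf{E}[X_1]=\mathbf{Var}[X_1]=1$); `numpy` is assumed available:

```python
import numpy as np

rng = np.random.default_rng(2)
eps, reps = 0.1, 1000
for n in (10, 100, 1000, 10000):
    means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
    print(n,
          np.mean(np.abs(means - 1.0) > eps),   # empirical P[|mean - E[X_1]| > eps]
          1.0 / (n * eps**2))                   # Chebyshev bound Var[X_1]/(n eps^2)
```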

Special Discrete Random Variables

Bernoulli

Parameter $p$: $\mathbf{P}[X=1]=p$ and $\mathbf{P}[X=0]=1-p$

Binomial:

  • Definition: sum of Bernoulli random variables

$\mathbf{P}[X=k] = \displaystyle \binom{n}{k} p^k (1-p)^{n-k}$ for $k=0,1,\dots,n$

When $n$ is large (e.g., $n=10000$), we can approximate the binomial probability using two methods (see the numerical sketch after this list):

  • Poisson approximation
  • Central limit theorem (normal approximation)

    Poisson approximation: Let $X_n\sim Binomial(n, \frac{\lambda}{n})$; then $X_n$ is approximately $\sim Poisson(\lambda)$ for large $n$. (This is convergence in distribution, i.e. their CDFs will converge.)

    This approximation means that $X_n \to X$ in distribution, which really means $F_{X_n}(x) \to F_X(x)$ at every value $x$ where $F_X$ is continuous

    There are two ways to prove this:

    • write the CDFs of both and compute their limits
    • proof via moment generating functions and the $\lim_{n\to\infty} (1+\frac{x}{n})^n = e^x$

    Normal approximation: Let $X_n\sim Binomial(n,p)$.
    Standardize $X_n$; then, for large $n$, we have the following convergence in distribution: $$\frac{X_n -np}{\sqrt{np(1-p)}} \to \mathcal{N}(0,1) \ \ \text{in distribution}$$

    Proof by CLT
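A sketch of both approximations, assuming `scipy` is available (the parameter values are arbitrary):

```python
import numpy as np
from scipy import stats

# Poisson approximation: Binomial(n, lambda/n) vs Poisson(lambda), lambda = 3
n, lam = 10_000, 3.0
k = np.arange(10)
print(stats.binom.pmf(k, n, lam / n))   # the two rows are nearly identical
print(stats.poisson.pmf(k, lam))

# Normal approximation: P[X_n <= x] vs Phi((x - np) / sqrt(np(1-p))), p = 0.4
n, p, x = 10_000, 0.4, 4_050
print(stats.binom.cdf(x, n, p))
print(stats.norm.cdf((x - n * p) / np.sqrt(n * p * (1 - p))))
```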

Recall on Mixtures

Probability Generating Functions (PGF)

$$P_X(z) = \mathbf{E}[z^X] = \mathbf{E}[e^{X~\ln z}] = M_X\left(\ln z\right)$$
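A sketch checking this identity by Monte Carlo for a $Poisson(2)$ random variable, whose PGF is known in closed form as $e^{\lambda(z-1)}$ (`numpy` assumed available):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, z = 2.0, 0.7
X = rng.poisson(lam, size=1_000_000)

print(np.mean(z**X))                    # Monte Carlo estimate of P_X(z) = E[z^X]
print(np.mean(np.exp(X * np.log(z))))   # the same quantity written as M_X(ln z)
print(np.exp(lam * (z - 1)))            # closed-form PGF of Poisson(lambda)
```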

Modeling with Poisson

  • Poisson process: $N(t) = \text{\# of events (arrivals) in the interval } [0,t]$, as $t$ varies

    Moreover, let $X_1,X_2, ...$ be the times between each arrival. Then, these are $iid$ Exponential $\sim Exp(\lambda)$.

    Note: $\lambda = \mathbf{E}[N(1)] = \frac{1}{\mathbf{E}[X_i]}$

    This means the average frequency of arrivals is $\frac{1}{\text{average time between arrivals}}$

    Now let $Y_n = X_1 + X_2 + \cdots +X_n$; this is the time of the $n$th arrival. Then

    $$Y_n \sim \Gamma(\alpha=n, \theta=\frac{1}{\lambda})$$

    • $\alpha = \text{shape parameter}$
    • $\theta = \text{scale parameter}$

    Indeed, each $X_i \sim \Gamma(\alpha=1, \theta=\frac{1}{\lambda})$ and $X_i \sim Exp(\lambda)$

    That means $Exp(\lambda)$ is equivalent to $\Gamma(\alpha=1, \theta=\frac{1}{\lambda})$

In general: the density of $\Gamma(\alpha,\theta)$ is $$f_X(x) = \frac{1}{\theta} \frac{1}{\Gamma(\alpha)} \left(\frac{x}{\theta}\right)^{\alpha-1} e^{-x/\theta}$$

when the scale parameter $\theta=1$: $\displaystyle \Gamma(\alpha,1)$ has density $f_X(x) = \frac{1}{\Gamma(\alpha)} x^{\alpha-1} e^{-x}$

A quick tip to remember the general formula: take the formula for scale parameter $1$, and when the scale parameter is $\theta$, divide $x$ by $\theta$ everywhere it appears and multiply by an overall $\frac{1}{\theta}$ (the Jacobian of the change of variable), so the density still integrates to 1

  • $\mathbf{E}[X] = \alpha \theta$
  • $\mathbf{Var}[X] = \alpha \theta^2$
  • Meditate on these formulas (expectation, variance): you can see that $\theta$ is a scale parameter, with $\theta^2$ entering the variance much as $\sigma^2$ does in the normal distribution (a numerical check follows below)
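A numerical check of the arrival-time claim and of these moment formulas, assuming `numpy`/`scipy` are available (note that in `scipy.stats.gamma` the shape $\alpha$ is the argument `a` and the scale $\theta$ is `scale`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
lam, n = 2.0, 5                       # rate lambda and arrival index n (arbitrary choices)
# Y_n = X_1 + ... + X_n with X_i ~ Exp(lambda)
Y = rng.exponential(scale=1/lam, size=(200_000, n)).sum(axis=1)

alpha, theta = n, 1/lam
print(Y.mean(), alpha * theta)        # E[Y_n] = alpha * theta
print(Y.var(), alpha * theta**2)      # Var[Y_n] = alpha * theta^2
# empirical CDF at one point vs the Gamma(alpha, theta) CDF
print(np.mean(Y <= 2.0), stats.gamma.cdf(2.0, a=alpha, scale=theta))
```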

Geometric

Negative Binomial

The sum of iid geometric random variables (see the sketch below)
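A sketch of this fact, assuming `numpy`/`scipy` are available. Conventions matter here: `numpy`'s generator counts trials up to and including each success, while `scipy.stats.nbinom` counts only the failures before the $r$-th success, so we subtract $r$ before comparing:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
r, p = 3, 0.25                        # number of successes and success probability (arbitrary)
# sum of r geometric variables, shifted to count failures before the r-th success
G = rng.geometric(p, size=(200_000, r)).sum(axis=1) - r

k = 5
print(np.mean(G == k))                # empirical P[G = k]
print(stats.nbinom.pmf(k, r, p))      # Negative Binomial PMF
print(G.mean(), r * (1 - p) / p)      # mean check: r (1 - p) / p
```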

