Binomial Distribution

  • Efficient way of computing $Binom(k;N,p)$

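One possible way to do this efficiently (a minimal sketch I added; the function names are mine) is to build the whole PMF from the ratio $\frac{\mathbf{P}[X=k+1]}{\mathbf{P}[X=k]} = \frac{N-k}{k+1}\cdot \frac{p}{1-p}$, instead of evaluating factorials from scratch every time:

In [ ]:

from math import comb, isclose

def binom_pmf(k, N, p):
    """P[X = k] for X ~ Binom(N, p), via the direct formula."""
    return comb(N, k) * p**k * (1 - p)**(N - k)

def binom_pmf_all(N, p):
    """All of P[X = 0], ..., P[X = N] in O(N) steps, using the ratio
    P[k+1] / P[k] = (N - k) / (k + 1) * p / (1 - p)."""
    probs = [(1 - p) ** N]          # P[X = 0]
    for k in range(N):
        probs.append(probs[-1] * (N - k) / (k + 1) * p / (1 - p))
    return probs

probs = binom_pmf_all(20, 0.3)
print(probs[5], binom_pmf(5, 20, 0.3))   # the two methods agree
assert isclose(sum(probs), 1.0)          # the PMF sums to 1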

Hypergeometric Distributions

We have $R$ red balls, and $B$ black balls. Sample $n$ balls without replacement. Let $X=\#\text{ of red balls in sample }$.

Total number of balls: $N = R + B$

We find that $$\mathbf{P}[X = k] = \frac{\left(\begin{array}{c}n\\k\end{array}\right) \left(\begin{array}{c}N-n\\R-k\end{array}\right)}{\left(\begin{array}{c}N\\R\end{array}\right)}$$

This is the hypergeometric distribution. As we mentioned before, $\mathbf{E}[X] = np$ and $\mathbf{Var}[X] = np(1-p) \frac{N-n}{N-1}$ where $p=\frac{R}{R+B}$

Note: if $n\ll N$, $N\to \infty$, and $p$ is fixed, then the hypergeometric distribution tends to the binomial distribution $Binom(n,p)$.


For discrete distributions, we say that one distribution converges to another (as a parameter $N$ goes to infinity) if the probability mass function of the first one converges pointwise to that of the second.
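
As a quick numerical illustration of this convergence (a sketch I added, with arbitrary parameter values), we can compare the hypergeometric PMF above with the binomial PMF as $N$ grows while the fraction of red balls stays near $p$:

In [ ]:

from math import comb

def hypergeom_pmf(k, n, R, N):
    """P[X = k] using the formula above: k red balls in a sample of size n,
    drawn without replacement from N balls of which R are red."""
    return comb(n, k) * comb(N - n, R - k) / comb(N, R)

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fix n, p and k, and let N grow while R / N stays close to p.
n, p, k = 10, 0.3, 3
for N in (50, 500, 5000):
    R = round(p * N)
    print(N, hypergeom_pmf(k, n, R, N), binom_pmf(k, n, p))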

Geometric and Negative Binomial Distributions

Let $r$ be a fixed integer. Let $X_1,X_2,\ldots$ be an infinite sequence of i.i.d. Bernoulli random variables with success probability $p$.

Let $Y_r$ be the time at which the "rth" success occurs.

We say that $Y_r$ has the Negative Binomial Distribution with parameters $r$ and $p$.

Example:

Let $11000110000100$ be a sequence of coin tosses.

  • $r=5 \ \ \Rightarrow\ Y_5 = 12$
  • $r=2 \ \ \Rightarrow\ Y_2 = 2$
  • $r=3 \ \ \Rightarrow\ Y_3 = 6$

Special case: $r=1$

$Y_1$ is a geometric random variable. Recall $\mathbf{P}[Y_1 = k] = (1-p)^{k-1} \ p$

  • What about $\mathbf{P}[Y_r = k]$?
$$\mathbf{P}[Y_r = k] = \star \star \star (1-p)^{k-r} p^r$$

We need to find $\star \star \star$, which is the number of ways to place the first $r-1$ successes among the first $k-1$ slots: $\left(\begin{array}{c}k-1\\r-1\end{array}\right)$

Therefore

$$\mathbf{P}[Y_r = k] = \left(\begin{array}{c}k-1\\r-1\end{array}\right) (1-p)^{k-r} p^r $$

Note: this argument works because each such configuration of successes and failures has probability $(1-p)^{k-r}p^r$, and these configurations are all mutually exclusive.
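
Here is a small sanity check of this formula by simulation (my own sketch; the parameter values are arbitrary):

In [ ]:

from math import comb
import random

def neg_binom_pmf(k, r, p):
    """P[Y_r = k]: the r-th success occurs exactly on trial k."""
    return comb(k - 1, r - 1) * (1 - p)**(k - r) * p**r

def sample_Y_r(r, p):
    """Run Bernoulli(p) trials until the r-th success; return the trial number."""
    successes, t = 0, 0
    while successes < r:
        t += 1
        if random.random() < p:
            successes += 1
    return t

random.seed(0)
r, p, k, trials = 3, 0.4, 7, 200_000
hits = sum(sample_Y_r(r, p) == k for _ in range(trials))
print(hits / trials, neg_binom_pmf(k, r, p))   # the two numbers should be close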

How to compute $\mathbf{E}[Y_r]$ and $\mathbf{Var}[Y_r]$ efficiently?

  • Recall that $\mathbf{E}[Y_1] = \frac{1}{p}$ and $\mathbf{Var}[Y_1] = \frac{1-p}{p^2}$

    • Intuition: if the chance of winning is $0.05$, how many times should you expect to play before winning for the first time? $\frac{1}{0.05} = 20$
  • Then, think of it this way for general $r$. It takes $Y_1$ trials to get the first success. After that, we start fresh, and from this time it takes a fresh $Geom(p)$ number of trials to get the next success, and so on.

    • Therefore, $Y_r$ is essentially the sum of how long it takes to get the first success, plus how long it takes to get from the 1st to the 2nd success, ..., plus how long it takes to get from the $(r-1)$-th to the $r$-th success.

    • Since these times are independent of each other and all have the Geometric distribution $Geom(p)$, we can write

    $$Y_r = X_1 + X_2 + ... X_r\text{ where }X_i\text{'s are i.i.d. } \ Geom(p)$$

    Therefore,

    $$\mathbf{E}[Y_r] = r\frac{1}{p}\ \ \ \ \ \ \ \ \ \ \ \ \ \mathbf{Var}[Y_r] = r\frac{1-p}{p^2}$$
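
This decomposition is easy to check numerically (a sketch I added, not part of the lecture): simulate $Y_r$ as a sum of $r$ independent $Geom(p)$ times and compare the sample mean and variance with the formulas above.

In [ ]:

import random
from statistics import mean, variance

def geom_sample(p):
    """One Geom(p) sample: the number of Bernoulli(p) trials up to and
    including the first success."""
    t = 1
    while random.random() >= p:
        t += 1
    return t

random.seed(1)
r, p, trials = 4, 0.25, 100_000
samples = [sum(geom_sample(p) for _ in range(r)) for _ in range(trials)]
print(mean(samples), r / p)                     # both close to 16
print(variance(samples), r * (1 - p) / p**2)    # both close to 48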

Poisson Distribution

Definition:

We say that $X$ is Poisson distributed with parameter $\lambda$ and we write $X\sim Poi(\lambda)$ if we have $$\mathbf{P}[X=k] = e^{-\lambda} \frac{\lambda^k}{k!}$$ for any $k=0,1,2,...$.

  • Let's compute $\mathbf{E}[X]$:

    $$\mathbf{E}[X] = \sum_{k=0}^{+\infty} k\, \mathbf{P}[X=k] = \sum_{k=1}^{+\infty} k\, e^{-\lambda} \frac{\lambda^k}{k!} = \\ e^{-\lambda} \sum_{k=1}^{+\infty} \frac{\lambda^k}{(k-1)!} = \lambda e^{-\lambda} \sum_{k=1}^{+\infty} \frac{\lambda^{k-1}}{(k-1)!} = \\ \lambda e^{-\lambda} \sum_{j=0}^{+\infty} \frac{\lambda^{j}}{j!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$

    Therefore, $\mathbf{E}[X] = \lambda$

  • Similarly, we find $\mathbf{Var}[X] = \lambda$
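
Both facts can be confirmed numerically by truncating the infinite sums (my own sketch; $\lambda = 2.5$ is arbitrary):

In [ ]:

from math import exp, factorial

lam = 2.5
# Truncate the infinite sums; the terms beyond k = 100 are negligible here.
pmf = [exp(-lam) * lam**k / factorial(k) for k in range(100)]
mean = sum(k * pmf[k] for k in range(100))
second_moment = sum(k**2 * pmf[k] for k in range(100))
print(mean, lam)                      # E[X] = lambda
print(second_moment - mean**2, lam)   # Var[X] = E[X^2] - E[X]^2 = lambda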

Very important property of Poisson distribution: If $X_n\sim Binom(n, \frac{\lambda}{n})$, then, as $n\to \infty$, the distribution of $X_n \to Poi(\lambda)$, i.e.:

$$\lim_{n\to \infty} \mathbf{P}[X_n = k] = e^{-\lambda} \frac{\lambda^k}{k!} \ \ \ \text{ for }k=0,1,2,..$$

Remember that $$\mathbf{P}[X_n = k] = \left(\begin{array}{c}n\\k\end{array}\right) (\frac{\lambda}{n})^k (1-\frac{\lambda}{n})^{n-k}$$

Note: for large $n$, the binomial PMF is extremely unwieldy to compute directly. That makes this theorem a terrific approximation.
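
A quick numerical illustration of the theorem (a sketch I added; the parameter values are arbitrary): fix $\lambda$ and $k$, and watch the binomial probability approach the Poisson one as $n$ grows.

In [ ]:

from math import comb, exp, factorial

lam, k = 3.0, 4
poisson = exp(-lam) * lam**k / factorial(k)
for n in (10, 100, 1000, 10_000):
    p = lam / n
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    print(n, binom, poisson)   # the binomial probability approaches the Poisson one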

Idea behind this theorem:

The time axis from $t=0$ to $t=1$ is divided into $n$ small pieces. Now let $p=\text{chance of success}$ in any one of these small intervals $=\frac{\lambda}{n}$, with $\lambda$ fixed. So there is a small chance of success each time.

Total number of successes has average value=$\mathbf{E}[Binom(n,\frac{\lambda}{n})] = n\times \frac{\lambda}{n} = \lambda$

The theorem says that the total number of successes is actually pretty much Poisson, with average value $\lambda$.

$\lambda=\text{average number of successes in my interval of length 1}$

This makes the Poisson distribution a good model for the number of events of a certain type occurring in a fixed interval of time.

Interesting properties of Poisson

  • If $X_1 \sim Poi(\lambda_1)$ and $X_2 \sim Poi(\lambda_2)$ and $X_1,X_2$ are independent, then $X_1+X_2\sim Poi(\lambda_1+\lambda_2)$

Intuition:

Assume we have a time axis with two adjacent segments, one of length $\lambda_1$ and the other of length $\lambda_2$. Divide each segment into tiny intervals of length $\frac{1}{n}$, each with probability of success $p=\frac{1}{n}$. The number of successes in the first segment tends to $Poi(\lambda_1)$, the number in the second tends to $Poi(\lambda_2)$, and the total number over the whole axis of length $\lambda_1+\lambda_2$ tends to $Poi(\lambda_1+\lambda_2)$.

Corollary:

Let $X_1,X_2,\ldots,X_n$ be independent with $X_i \sim Poi(\lambda_i)$; then $X_1+X_2+\cdots+X_n \sim Poi(\lambda_1+\lambda_2+\cdots+\lambda_n)$
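
This can be checked numerically by convolving the two PMFs (a sketch I added; the parameter values are arbitrary):

In [ ]:

from math import exp, factorial

def poi_pmf(k, lam):
    return exp(-lam) * lam**k / factorial(k)

lam1, lam2, k = 1.2, 2.3, 4
# P[X1 + X2 = k] by summing over all the ways the two counts can add up to k
convolution = sum(poi_pmf(j, lam1) * poi_pmf(k - j, lam2) for j in range(k + 1))
print(convolution, poi_pmf(k, lam1 + lam2))   # the two values agree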

Poisson Process

We have a time axis. In each tiny interval of length $\frac{1}{n}$, we have probability of success $p=\frac{\lambda}{n}$. For any given time $t\in [0,+\infty)$, let $N(t)="\#\text{ of successes}"$ up to time $t$. This $N$ is actually a constructive way of defining the Poisson process.

In particular, average number of successes in the interval $[0,t)$ is $\lambda t$. And by the convergence theorem, $N(t) \sim Poi(\lambda t)$.

Another interesting property: consider $0 < s < t$. The number of successes in $[0,s]$ is $N(s)\sim Poi(\lambda s)$

Now, number of successes in $[s,t]$ is $N(t) - N(s)$ (not $N(t-s)$), and $N(t) - N(s) \sim Poi(\lambda (t-s))$

And moreover, $N(s)$ and $N(t)-N(s)$ are independent.

Note: The property that if $[s,t]$ & $[u,v]$ are two time intervals which share at most one point, then $N(v) - N(u)$ and $N(t) - N(s)$ are independent, is called independence of increments.
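
The constructive definition above is easy to simulate (a rough sketch I added; the discretization and sample sizes are arbitrary): divide time into intervals of length $\frac{1}{n}$, flip a coin with success probability $\frac{\lambda}{n}$ in each, and look at the counts $N(s)$ and $N(t)-N(s)$.

In [ ]:

import random
from statistics import mean

random.seed(2)
lam, n = 2.0, 200          # rate, and number of tiny intervals per unit of time
s, t, reps = 1.0, 2.0, 5000

def increments(s, t):
    """One run of the discretized process: return (N(s), N(t) - N(s))."""
    first = sum(random.random() < lam / n for _ in range(int(s * n)))
    second = sum(random.random() < lam / n for _ in range(int((t - s) * n)))
    return first, second

samples = [increments(s, t) for _ in range(reps)]
print(mean(a + b for a, b in samples), lam * t)   # E[N(t)] = lambda * t
print(mean(a for a, _ in samples), lam * s)       # E[N(s)] = lambda * s
cov = (mean(a * b for a, b in samples)
       - mean(a for a, _ in samples) * mean(b for _, b in samples))
print(cov)   # close to 0: the two increments are (nearly) uncorrelated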

Example:

Average number of murders in a year: 540

(a):

  • average # of murders in a day: $\frac{540}{365}\approx \frac{3}{2}$

    $\mathbf{P}[N(1)\ge 2] = 1 - \mathbf{P}[N(1)=0 \text{ or } 1] = 1- \sum_{k=0}^1 e^{-\lambda }\frac{\lambda^k}{k!} = 1 - e^{-\lambda}(1+\lambda)$, with $\lambda=\frac{3}{2}$

(b): $()^3$

(c): $\mathbf{P}[N(5) = 0]$, where $N(5)\sim Poi(5\times \frac{3}{2})$, so $\mathbf{P}[N(5) = 0] = e^{-5\times \frac{3}{2}}$
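
Parts (a) and (c) can be evaluated numerically (a sketch I added, using $\lambda = \frac{540}{365}$ per day):

In [ ]:

from math import exp

lam = 540 / 365          # average number of murders per day, roughly 3/2

# (a) probability of at least 2 murders in a given day
print(1 - exp(-lam) * (1 + lam))

# (c) probability of no murders during 5 days, since N(5) ~ Poi(5 * lam)
print(exp(-5 * lam))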

Preview of the Wiener Process, also known as Brownian Motion

Write $X = X_1+X_2+\cdots+X_n$, where the $X_i$'s are i.i.d. $Bernoulli(p)$, so that $X\sim Binom(n,p)$. By the law of large numbers,

$$\frac{X}{n} = \frac{X_1+ X_2+\cdots+X_n}{n} \to p$$

However, $\frac{X - np}{\sqrt{n}}$ converges in distribution to a non-degenerate random variable.

$\mathbf{Var}\left[\frac{X - np}{\sqrt{n}} \right] = \frac{np(1-p)}{(\sqrt{n})^2} = p(1-p)$

$$\frac{X - np}{\sqrt{n}} \to \mathcal{N}(0, p(1-p))$$

This is the central limit theorem.
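
A quick simulation of this (a sketch I added; $n$, $p$, and the number of repetitions are arbitrary) shows the standardized variable settling down to mean $0$ and variance $p(1-p)$:

In [ ]:

import random
from statistics import mean, variance

random.seed(3)
n, p, reps = 1000, 0.3, 5000

def standardized_binomial():
    x = sum(random.random() < p for _ in range(n))   # X ~ Binom(n, p)
    return (x - n * p) / n**0.5

samples = [standardized_binomial() for _ in range(reps)]
print(mean(samples))                    # close to 0
print(variance(samples), p * (1 - p))   # close to p(1 - p) = 0.21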

Modeling Ideas

  • If the knowledge of how many events have occurred in a fixed time period in the past gives us no information about how many events will occur in a future fixed time period, then the Poisson process is usually an excellent model for the number of events as they accumulate over time.
    Based on independence of increments

Preview of modeling exponential random variables

They are good models for the times at which the Poisson process jumps.

Here, the "inter-arrival times" $\tau_1, \tau_2, ... \tau_5$ are independent sequential random variables.

We will see why next time.

