We have $R$ red balls, and $B$ black balls. Sample $n$ balls without replacement. Let $X=\#\text{ of red balls in sample }$.
Total number of balls: $N = R + B$
We find that $$\mathbf{P}[X = k] = \frac{\left(\begin{array}{c}n\\k\end{array}\right) \left(\begin{array}{c}N-n\\R-k\end{array}\right)}{\left(\begin{array}{c}N\\R\end{array}\right)}$$
This is the hypergeometric distribution. As we mentioned before, $\mathbf{E}[X] = np$ and $\mathbf{Var}[X] = np(1-p) \frac{N-n}{N-1}$ where $p=\frac{R}{R+B}$
Note: if $n\ll N$ and $N\to \infty$ and $p$ is fixed, then the hypergeometric distribution tends to the binomial distribution.
For discrete distributions, we say that one distribution converges to another (as a parameter $N$ goes to infinity) if the probability mass function of the first converges pointwise to that of the second.
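A minimal numerical sketch of this convergence (not from the lecture; the values $n=5$, $p=0.3$ are just illustrative): for fixed $n$ and $p = \frac{R}{N}$, the hypergeometric PMF gets close to the $Binom(n,p)$ PMF as $N$ grows.

```python
import numpy as np
from scipy.stats import hypergeom, binom

n, p = 5, 0.3                            # sample size and success fraction (illustrative)
k = np.arange(n + 1)

for N in [20, 200, 2000, 20000]:
    R = int(p * N)                       # number of red balls
    hg = hypergeom.pmf(k, N, R, n)       # scipy argument order: k, total, # red, sample size
    bi = binom.pmf(k, n, R / N)
    print(f"N = {N:6d}   max |hypergeom - binom| = {np.max(np.abs(hg - bi)):.5f}")
```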
Let $r$ be a fixed integer. Let $X_1,X_2,\ldots$ be an infinite sequence of i.i.d. Bernoulli random variables with success probability $p$.
Let $Y_r$ be the time at which the $r$th success occurs.
We say that $Y_r$ has the Negative Binomial Distribution with parameters $r$ and $p$.
For example, let $11000110000100$ be a sequence of coin tosses (1 = success). The successes occur at tosses $1,2,6,7,12$, so $Y_1=1$, $Y_2=2$, $Y_3=6$, $Y_4=7$, $Y_5=12$.
$Y_1$ is a geometric random variable. Recall $\mathbf{P}[Y_1 = k] = (1-p)^{k-1} \ p$
To find $\mathbf{P}[Y_r = k]$, we need to count the number of ways to place the first $r-1$ successes among the first $k-1$ slots (the $k$th toss must be the $r$th success): $\left(\begin{array}{c}k-1\\r-1\end{array}\right)$
Therefore
$$\mathbf{P}[Y_r = k] = \left(\begin{array}{c}k-1\\r-1\end{array}\right) (1-p)^{k-r} p^r $$Note: this argument works because each such configuration of successes and failures has probability $(1-p)^{k-r}p^r$, and these configurations are all mutually exclusive.
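As a sanity check of this formula (the values $r=3$, $p=0.4$ are arbitrary), we can compare it with scipy's `nbinom`, keeping in mind that scipy counts the number of failures before the $r$th success rather than the total number of tosses:

```python
import numpy as np
from scipy.special import comb
from scipy.stats import nbinom

r, p = 3, 0.4                            # illustrative values
for k in range(r, r + 6):
    formula = comb(k - 1, r - 1) * (1 - p) ** (k - r) * p ** r
    # scipy's nbinom counts failures before the r-th success, so shift k by r
    scipy_val = nbinom.pmf(k - r, r, p)
    print(k, round(float(formula), 6), round(float(scipy_val), 6))
```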
Recall that $\mathbf{E}[Y_1] = \frac{1}{p}$ and $\mathbf{Var}[Y_1] = \frac{1-p}{p^2}$
Then, think of it this way for general $r$: it takes $Y_1$ tosses to get the first success. From that time on, the process starts fresh, and it takes an independent $Geom(p)$ amount of time to get the next success.
Therefore, $Y_r$ is essentially the sum of how long it takes to get the first success, plus how long it takes to get from the 1st success to the 2nd, and so on, up to how long it takes to get from the $(r-1)$th success to the $r$th.
Since these times are independent of each other and all have the Geometric distribution $Geom(p)$, we can write
$$Y_r = X_1 + X_2 + \cdots + X_r\text{ where the }X_i\text{'s are i.i.d. } \ Geom(p)$$
Therefore,
$$\mathbf{E}[Y_r] = r\frac{1}{p}\ \ \ \ \ \ \ \ \ \ \ \ \ \mathbf{Var}[Y_r] = r\frac{1-p}{p^2}$$
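A small simulation sketch of this representation (the values $r=5$, $p=0.3$ are illustrative): draw $Y_r$ as a sum of $r$ i.i.d. $Geom(p)$ variables and compare the sample mean and variance with $\frac{r}{p}$ and $\frac{r(1-p)}{p^2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
r, p, trials = 5, 0.3, 200_000
# each row is X_1 + ... + X_r with X_i i.i.d. Geom(p) (support 1, 2, ...)
Y = rng.geometric(p, size=(trials, r)).sum(axis=1)

print("sample mean:", Y.mean(), "   theory r/p:       ", r / p)
print("sample var: ", Y.var(),  "   theory r(1-p)/p^2:", r * (1 - p) / p**2)
```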
We say that $X$ is Poisson distributed with parameter $\lambda$ and we write $X\sim Poi(\lambda)$ if we have $$\mathbf{P}[X=k] = e^{-\lambda} \frac{\lambda^k}{k!}$$ for any $k=0,1,2,...$.
Let's compute $\mathbf{E}[X]$:
$$\mathbf{E}[X] = \sum_{k=0}^{+\infty} k \, \mathbf{P}[X=k] = \sum_{k=1}^{+\infty} k \, e^{-\lambda} \frac{\lambda^k}{k!} = e^{-\lambda} \sum_{k=1}^{+\infty} \frac{\lambda^k}{(k-1)!} = \\ \lambda e^{-\lambda} \sum_{k=1}^{+\infty} \frac{\lambda^{k-1}}{(k-1)!} = \lambda e^{-\lambda} \sum_{j=0}^{+\infty} \frac{\lambda^j}{j!} = \lambda e^{-\lambda} e^{\lambda} = \lambda$$
Therefore, $\mathbf{E}[X] = \lambda$
Very important property of Poisson distribution: If $X_n\sim Binom(n, \frac{\lambda}{n})$, then, as $n\to \infty$, the distribution of $X_n \to Poi(\lambda)$, i.e.:
$$\lim_{n\to \infty} \mathbf{P}[X_n = k] = e^{-\lambda} \frac{\lambda^k}{k!} \ \ \ \text{ for }k=0,1,2,\ldots$$Remember that $$\mathbf{P}[X_n = k] = \left(\begin{array}{c}n\\k\end{array}\right) (\frac{\lambda}{n})^k (1-\frac{\lambda}{n})^{n-k}$$
Note: for large $n$, the binomial PMF is extremely unwieldy to compute directly. That makes this theorem a terrific approximation.
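A quick illustration of this limit (the value $\lambda = 2$ is arbitrary): the $Binom(n, \frac{\lambda}{n})$ PMF approaches the $Poi(\lambda)$ PMF as $n$ grows.

```python
import numpy as np
from scipy.stats import binom, poisson

lam = 2.0                                # illustrative rate
k = np.arange(11)
for n in [10, 100, 1000, 10000]:
    err = np.max(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)))
    print(f"n = {n:6d}   max |Binom(n, lam/n) - Poi(lam)| = {err:.6f}")
```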
Idea behind this theorem:
Divide the time axis from $t=0$ to $t=1$ into $n$ small pieces. Now let $p=\text{chance of success}$ in any one of these small intervals $=\frac{\lambda}{n}$, with $\lambda$ fixed. So there is a small chance of success each time.
Total number of successes has average value=$\mathbf{E}[Binom(n,\frac{\lambda}{n})] = n\times \frac{\lambda}{n} = \lambda$
The theorem says that the total number of successes is actually pretty much Poisson with average value $=\lambda$.
$\lambda=\text{average number of successes in my interval of length 1}$
This makes Poisson a good model for the number of events of a certain type occurring in a fixed interval of time.
Assume we have a time axis with two parts, one of length $\lambda_1$ and the other of length $\lambda_2$. We divide them into $2n$ components. For each interval, the probability of success is $p=\frac{1}{n}$.
We have a time axis. In each tiny interval of length $\frac{1}{n}$, we have probability of success $p=\frac{\lambda}{n}$. For any given time $t\in [0,+\infty)$, let $N(t)="\#\text{ of successes}"$ up to time $t$. This $N$ is actually a constructive way of defining the Poisson process.
In particular, average number of successes in the interval $[0,t)$ is $\lambda t$. And by the convergence theorem, $N(t) \sim Poi(\lambda t)$.
Another interesting property: consider $0 < s < t$. The number of successes in $[0,s]$ is $N(s)\sim Poi(\lambda s)$
Now, number of successes in $[s,t]$ is $N(t) - N(s)$ (not $N(t-s)$), and $N(t) - N(s) \sim Poi(\lambda (t-s))$
And moreover, $N(s)$ and $N(t)-N(s)$ are independent.
Note: The property that if $[s,t]$ & $[u,v]$ are two time intervals which share at most one point, then $N(v) - N(u)$ and $N(t) - N(s)$ are independent, is called independence of increments.
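A simulation sketch of this construction (the values $\lambda = 1.5$, $s=1$, $t=3$, and the discretization into $n=500$ slots per unit time are assumptions for illustration): flip a coin with success probability $\frac{\lambda}{n}$ in each slot of length $\frac{1}{n}$, count successes, and check that the increment over $(s,t]$ has mean and variance close to $\lambda(t-s)$ and is essentially uncorrelated with $N(s)$.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, trials = 1.5, 500, 20_000        # rate, slots per unit time, independent runs
s, t = 1.0, 3.0                          # two time points, 0 < s < t

# one Bernoulli(lam/n) flip per slot of length 1/n, for each run
flips = rng.random((trials, int(t * n))) < lam / n
N_s = flips[:, : int(s * n)].sum(axis=1)               # successes in (0, s]
inc = flips[:, int(s * n):].sum(axis=1)                # successes in (s, t]

print("mean of N(s):          ", N_s.mean(), "  theory:", lam * s)
print("mean of N(t) - N(s):   ", inc.mean(), "  theory:", lam * (t - s))
print("var  of N(t) - N(s):   ", inc.var(),  "  theory:", lam * (t - s))
print("corr(N(s), N(t)-N(s)): ", round(np.corrcoef(N_s, inc)[0, 1], 4))  # ~ 0
```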
Average number of murders in a year: 540
(a):
average # of murders in a day: $\frac{540}{365}\approx \frac{3}{2}$
$\mathbf{P}[N(1)\ge 2] = 1 - \mathbf{P}[N(1)=0 \text{ or } 1] = 1- \sum_{k=0}^1 e^{-\lambda }\frac{\lambda^k}{k!}$, with $\lambda=\frac{3}{2}$
(b): $()^3$
(c): $\mathbf{P}[N(5) = 0]$, where $N(5)\sim Poi(5\times \frac{3}{2})$, so $\mathbf{P}[N(5) = 0] = e^{-5\times \frac{3}{2}}$
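The numbers for parts (a) and (c), using $\lambda = \frac{540}{365} \approx \frac{3}{2}$ murders per day (a minimal sketch with scipy):

```python
import numpy as np
from scipy.stats import poisson

lam = 540 / 365                          # ~ 3/2 murders per day

# (a): P[N(1) >= 2] = 1 - P[N(1) <= 1]
print("P[N(1) >= 2] =", 1 - poisson.cdf(1, lam))

# (c): P[N(5) = 0] with N(5) ~ Poi(5 * lam), i.e. exp(-5 * lam)
print("P[N(5) = 0]  =", poisson.pmf(0, 5 * lam), "=", np.exp(-5 * lam))
```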
Let $X\sim Binom(n,p)$. Then $\frac{X - np}{\sqrt{n}}$ will converge in distribution to a non-trivial (non-degenerate) random variable.
$\mathbf{Var}\left[\frac{X - np}{\sqrt{n}} \right] = \frac{np(1-p)}{(\sqrt{n})^2} = p(1-p)$
$$\frac{X - np}{\sqrt{n}} \to \mathcal{N}(0, p(1-p)) \ \text{ in distribution}$$This is because of the central limit theorem.
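A quick numerical check of this normal approximation (the values $n = 2000$, $p = 0.3$ are assumptions for illustration): compare $\mathbf{P}\left[\frac{X-np}{\sqrt{n}} \le x\right]$ with the $\mathcal{N}(0, p(1-p))$ CDF at a few points.

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 2000, 0.3                         # assumed values for illustration
sigma = np.sqrt(p * (1 - p))

for x in [-1.0, -0.5, 0.0, 0.5, 1.0]:
    exact = binom.cdf(n * p + x * np.sqrt(n), n, p)     # P[X <= np + x*sqrt(n)]
    approx = norm.cdf(x, loc=0, scale=sigma)            # N(0, p(1-p)) CDF
    print(f"x = {x:+.1f}   binomial: {exact:.4f}   normal: {approx:.4f}")
```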