Lecture 20: Expected distance between Normals, Multinomial, Cauchy

Stat 110, Prof. Joe Blitzstein, Harvard University


In Lecture 19, we saw how to find the expected distance between two i.i.d. Uniform random variables. How would we do the same with the Normal distribution?

Example: expected distance between 2 Normally distributed random points $\mathbb{E} | Z_1 - Z_2 |$

Given $Z_1, Z_2 \sim \mathcal{N}(0,1)$ with $Z_1$ and $Z_2$ i.i.d., can we find $\mathbb{E} | Z_1 - Z_2 |$?

Now, we could just jump in and try using 2D LOTUS, but let's take a step back and recall what we have seen earlier in Lecture 14 about the linearity of Normals, and see if we can apply what we know about MGFs.

Theorem: If $X \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $Y \sim \mathcal{N}(\mu_2, \sigma_2^2)$ are independent, then $X + Y \sim \mathcal{N}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$

Recall from Lecture 17 that the MGF of a sum of independent r.v.s is just the product of their respective MGFs.

\begin{align} M_{X+Y}(t) &= \mathbb{E}(e^{t(X+Y)}) \\ &= \mathbb{E}(e^{tX}) \, \mathbb{E}(e^{tY}) & \quad \text{by independence} \\ &= e^{\mu_1 t + \frac{1}{2} \sigma_1^2 t^2 } \, e^{\mu_2 t + \frac{1}{2} \sigma_2^2 t^2 } \\ &= e^{(\mu_1 + \mu_2) t + \frac{1}{2} (\sigma_1^2 + \sigma_2^2) t^2 } & \quad \text{which is the MGF of } \mathcal{N}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2) \quad \blacksquare \end{align}
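
As a quick sanity check (not part of the lecture), here is a small simulation sketch assuming NumPy is available; the parameter choices $\mu_1 = 1, \sigma_1 = 2, \mu_2 = -3, \sigma_2 = 1.5$ are arbitrary illustration values.

```python
import numpy as np

# Monte Carlo sanity check: X ~ N(1, 4) and Y ~ N(-3, 2.25), independent.
# The sum should have mean 1 + (-3) = -2 and variance 4 + 2.25 = 6.25.
rng = np.random.default_rng(0)
mu1, sigma1 = 1.0, 2.0
mu2, sigma2 = -3.0, 1.5

x = rng.normal(mu1, sigma1, size=1_000_000)
y = rng.normal(mu2, sigma2, size=1_000_000)
s = x + y

print(s.mean(), mu1 + mu2)               # both close to -2
print(s.var(), sigma1**2 + sigma2**2)    # both close to 6.25
```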

Returning now to our original question, we want to find $\mathbb{E} |Z_1 - Z_2|$, with $Z_1, Z_2 \sim \mathcal{N}(0,1)$.

Keep in mind that $Z_1 - Z_2 \sim \mathcal{N}(0, 2)$. Since this is just a standard Normal scaled by $\sqrt{2}$, we can do this:

\begin{align} \mathbb{E} | Z_1 - Z_2 | &= \mathbb{E} | \sqrt{2} Z | & \quad \text{, where } Z \sim \mathcal{N}(0,1) \\ &= \sqrt{2} \, \mathbb{E}|Z| \\ &= \sqrt{2} \int_{-\infty}^{\infty} |z| \, \frac{1}{\sqrt{2 \pi}} \, e^{-\frac{z^2}{2}} \, dz \\ &= \sqrt{2} \cdot 2 \int_{0}^{\infty} z \, \frac{1}{\sqrt{2 \pi}} \, e^{-\frac{z^2}{2}} \, dz & \quad \text{, by symmetry of the integrand} \\ &= \sqrt{2} \cdot \sqrt{\frac{2}{\pi}} \\ &= \frac{2}{\sqrt{\pi}} \end{align}
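
A minimal Monte Carlo sketch (NumPy assumed) to check the value $\frac{2}{\sqrt{\pi}} \approx 1.128$:

```python
import numpy as np

# Monte Carlo check of E|Z1 - Z2| for i.i.d. standard Normals.
rng = np.random.default_rng(0)
z1 = rng.standard_normal(1_000_000)
z2 = rng.standard_normal(1_000_000)

print(np.mean(np.abs(z1 - z2)))   # simulated value, roughly 1.128
print(2 / np.sqrt(np.pi))         # exact value 2/sqrt(pi) ≈ 1.1284
```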

Multinomial Distribution

A generalization of the Binomial distribution, $\operatorname{Mult}(n, \vec{p})$ has parameter $n$ for the number of objects observed, with probability vector $\vec{p} = (p_1, p_2, \dots , p_k)$ such that $p_j \ge 0$ and $\sum_{j} p_j = 1$.

So while the Binomial has 2 classes, success with probability $p$ and failure with probability $q = 1 - p$, the Multinomial has $k$ classes, each with its own probability $p_j$.

We have $n$ objects which we are independently putting into $k$ categories.

\begin{align} \vec{X} &\sim \operatorname{Mult}_k(n, \vec{p}) \quad \text{, with } \vec{X} = ( X_1, X_2, \dots , X_k) \\ \\ p_j &= P(\text{an object falls in category } j) \\ \\ X_j &= \text{number of objects in category } j \end{align}

Joint PMF of Multinomial

Because this is a joint distribution, $\operatorname{Mult}(n, \vec{p})$ has a joint PMF.

\begin{align} P(X_1=n_1, X_2=n_2, \dots , X_k=n_k) &= \frac{n!}{n_{1}!n_{2}! \dots n_{k}!} \, p_1^{n_1} p_2^{n_2} \dots p_k^{n_k} & \quad \text{ for } n_1 + n_2 + \dots + n_k = n \text{ (and 0 otherwise)} \end{align}
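
If you want to verify the formula numerically, here is a small sketch assuming SciPy is available; the counts $(2, 3, 5)$ and probabilities $(0.2, 0.3, 0.5)$ are made-up illustration values.

```python
import math
import numpy as np
from scipy.stats import multinomial

# Evaluate the joint PMF from the formula and compare with SciPy's multinomial.
n = 10
p = np.array([0.2, 0.3, 0.5])       # probability vector (sums to 1)
counts = np.array([2, 3, 5])        # category counts, must sum to n

coef = math.factorial(n) / np.prod([math.factorial(int(c)) for c in counts])
pmf_by_hand = coef * np.prod(p ** counts)

print(pmf_by_hand)                          # formula above
print(multinomial.pmf(counts, n=n, p=p))    # should agree
```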

Marginal distribution $X_j$ of Multinomial

Given $\vec{X} \sim \operatorname{Mult_k}(n, \vec{p})$, what is the marginal distribution of $X_j$?

Well, since we are interested in only the two cases of an object being in category $j$ or not, this is Binomial.

So for the marginal distribution $X_j$ of $\operatorname{Mult}$,

\begin{align} &X_j \sim \operatorname{Bin}(n, p_j) \\ \\ &\mathbb{E}(X_j) = n \, p_j \\ \\ &\operatorname{Var}(X_j) = n \, p_j \, (1 - p_j) \end{align}
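
A simulation sketch (NumPy/SciPy assumed) that checks the marginal of one component against $\operatorname{Bin}(n, p_j)$; $n = 20$ and $\vec{p} = (0.1, 0.3, 0.6)$ are arbitrary choices.

```python
import numpy as np
from scipy.stats import binom

# Check that the marginal X_1 of a Multinomial(n, p) behaves like Bin(n, p_1).
rng = np.random.default_rng(0)
n = 20
p = np.array([0.1, 0.3, 0.6])
draws = rng.multinomial(n, p, size=200_000)

x1 = draws[:, 0]                                 # counts in category 1
print(x1.mean(), n * p[0])                       # mean, should be near 2.0
print(x1.var(), n * p[0] * (1 - p[0]))           # variance, near 1.8
print(np.mean(x1 == 3), binom.pmf(3, n, p[0]))   # spot check of P(X_1 = 3)
```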

Lumping Property

Given a situation where there are $n$ voters in a population sample, and there are 10 political parties, $\vec{X} = (X_1, X_2, \dots, X_{10}) \sim \operatorname{Mult}(n, (p_1, p_2, \dots, p_{10}))$...

Let us say that political parties 3 through 10 are relatively minor compared to parties 1 and 2. We can describe the case where we lump parties 3 through 10 together with another multinomial $\vec{Y}$ such that:

\begin{align} \vec{Y} = (X_1, X_2, X_3 + \dots + X_{10}) \sim \operatorname{Mult}_3(n, (p_1, p_2, p_3 + \dots + p_{10})) \end{align}

Here, we gather up the counts/probabilities for parties 3 through 10, and now we have a multinomial with essentially three classes.
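
Here is a simulation sketch of the lumping property (NumPy assumed); the ten party probabilities below are invented purely for illustration.

```python
import numpy as np

# Lumping: collapse categories 3-10 of a 10-party multinomial into one count,
# and compare with a direct 3-category multinomial using the summed probability.
rng = np.random.default_rng(0)
n = 1000
p = np.array([0.40, 0.35, 0.05, 0.05, 0.04, 0.04, 0.03, 0.02, 0.01, 0.01])

full = rng.multinomial(n, p, size=100_000)
lumped = np.column_stack([full[:, 0], full[:, 1], full[:, 2:].sum(axis=1)])

p_lumped = np.array([p[0], p[1], p[2:].sum()])
direct = rng.multinomial(n, p_lumped, size=100_000)

print(lumped.mean(axis=0))   # about n * (0.40, 0.35, 0.25)
print(direct.mean(axis=0))   # should match the line above
```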

Conditional Joint PMF

Say that we have a $k$-class $\vec{X} \sim \operatorname{Mult}_k(n, \vec{p})$, and we know that $X_1 = n_1$ but don't know anything about the other $k-1$ classes...

Given $X_1 = n_1$

\begin{align} (X_2, \dots , X_k) &\sim \operatorname{Mult}_{k-1}(n-n_1, (p'_2, \dots , p'_k )) & \quad \text{ where } p'_j = P(\text{in class } j \mid \text{not in class } 1) \\ &\sim \operatorname{Mult}_{k-1}\left(n - n_1, \left(\frac{p_2}{1-p_1}, \dots , \frac{p_k}{1-p_1}\right)\right) & \quad \text{ or equivalently... } \\ &\sim \operatorname{Mult}_{k-1}\left(n - n_1, \left(\frac{p_2}{p_2 + \dots + p_k}, \dots , \frac{p_k}{p_2 + \dots + p_k}\right)\right) \end{align}

All we are doing here is simply re-normalizing the probability vector to take into account that we have information that $X_1 = n_1$. Multinomials are simple and intuitive!
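
A small simulation sketch (NumPy assumed) of this re-normalization: condition on $X_1 = n_1$ and compare the remaining counts' means with $(n - n_1) \, p'_j$; all parameter choices below are arbitrary.

```python
import numpy as np

# Condition on X_1 = n_1 and compare the remaining counts with
# Mult(n - n_1, p_j / (1 - p_1)).
rng = np.random.default_rng(0)
n = 10
p = np.array([0.5, 0.3, 0.2])
n1 = 4

draws = rng.multinomial(n, p, size=500_000)
conditioned = draws[draws[:, 0] == n1][:, 1:]   # (X_2, X_3) given X_1 = n1

p_renorm = p[1:] / (1 - p[0])                   # (0.6, 0.4)
print(conditioned.mean(axis=0))                 # empirical conditional means
print((n - n1) * p_renorm)                      # theoretical means (3.6, 2.4)
```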

The Cauchy Interview Problem

Or a good example of working with a continuous joint PDF

The Cauchy distribution is the distribution of $T = \frac{X}{Y}$, where $X, Y$ are i.i.d. $\mathcal{N}(0,1)$.

It looks simple enough and appears to be quite useful, but it does have some weird properties.

  • it has no mean
  • it has no variance
  • it defies the Law of Large Numbers: the sample mean of i.i.d. Cauchy r.v.s does not converge to a constant; in fact, it has the same Cauchy distribution as a single observation (see the simulation sketch after this list)!
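
Here is the simulation sketch referenced in the last bullet (NumPy assumed): the running mean of simulated Cauchy draws keeps jumping around instead of settling down.

```python
import numpy as np

# The running mean of i.i.d. Cauchy draws (simulated as ratios of Normals)
# never settles down, unlike the sample mean of Normals.
rng = np.random.default_rng(0)
n = 1_000_000
t = rng.standard_normal(n) / rng.standard_normal(n)   # i.i.d. Cauchy draws

running_mean = np.cumsum(t) / np.arange(1, n + 1)
print(running_mean[[10**3 - 1, 10**4 - 1, 10**5 - 1, 10**6 - 1]])  # keeps jumping
```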

Can you find the PDF of $T$?

Using double integrals

Let us try to find the CDF, and derive the PDF from that.

\begin{align} \text{CDF: } P\left(\frac{X}{Y} \le t\right) &= P\left(\frac{X}{|Y|} \le t\right) & \quad \text{ following from the symmetry of the Normal} \\ &= P(X \le t \, |Y| ) \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{t|y|} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, \frac{1}{\sqrt{2\pi}} e^{-\frac{y^2}{2}} \, dx \, dy \\ &= \frac{1}{\sqrt{2\pi}} \, \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}} \, \int_{-\infty}^{t|y|} \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \, dx \, dy & \quad \text{ integrating over } x \text{ first} \\ &= \frac{1}{\sqrt{2\pi}} \, \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}} \, \Phi(t|y|) \, dy & \quad \text{ and since the integrand is even in } y \\ &= \sqrt{ \frac{2}{\pi}} \, \int_{0}^{\infty} e^{-\frac{y^2}{2}} \, \Phi(ty) \, dy \\ \\ \text{PDF: } F'(t) &= \sqrt{\frac{2}{\pi}} \int_{0}^{\infty} y \, e^{-\frac{y^2}{2}} \, \frac{1}{\sqrt{2\pi}} \, e^{-\frac{t^2y^2}{2}} \, dy & \quad \text{ differentiating under the integral sign} \\ &= \frac{1}{\pi} \int_{0}^{\infty} y \, e^{-\frac{(1+t^2)y^2}{2}} \, dy & \quad \text{ and with } u = \frac{(1+t^2)y^2}{2} \text{, } du = (1+t^2) \, y \, dy \\ &= \frac{1}{\pi} \int_{0}^{\infty} \frac{(1 + t^2)}{(1 + t^2)} \, y \, e^{-\frac{(1+t^2)y^2}{2}} \, dy \\ &= \frac{1}{\pi} \int_{0}^{\infty} \frac{1}{1 + t^2} \, e^{-u} \, du \\ &= \boxed{ \frac{1}{\pi (1+t^2)} } & \quad \text{ and integrating this recovers the CDF} \\ \\ \text{CDF: } F(t) &= \boxed{ \frac{\tan^{-1}(t)}{\pi} + \frac{1}{2} } \end{align}
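
As a numerical cross-check of the boxed PDF (not from the lecture; SciPy assumed), we can evaluate the one-dimensional integral for $F'(t)$ directly and compare it with $\frac{1}{\pi(1+t^2)}$ at a few points.

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

# Recompute F'(t) from the one-dimensional integral in the derivation above
# and compare with 1 / (pi * (1 + t^2)) at a few values of t.
def cauchy_pdf_by_integral(t):
    integrand = lambda y: np.sqrt(2 / np.pi) * y * np.exp(-y**2 / 2) * norm.pdf(t * y)
    value, _ = integrate.quad(integrand, 0, np.inf)
    return value

for t in [-2.0, 0.0, 0.7, 3.0]:
    print(t, cauchy_pdf_by_integral(t), 1 / (np.pi * (1 + t**2)))
```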

Using the Law of Total Probability

Recall that the PDF of $\mathcal{N}(0,1)$ is $\phi(y)$ and its CDF is $\Phi(y)$.

\begin{align} P(X \le t|Y|) &= \int_{-\infty}^{\infty} P\left(X \le t \lvert Y \rvert \mid Y=y\right) \, \phi(y) \, dy \\ &= \int_{-\infty}^{\infty} \Phi\left(t|y|\right) \, \phi(y) \, dy & \quad \text{ by independence of } X \text{ and } Y \text{, which is pretty much what we did above } \end{align}
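
This expression can also be checked numerically (SciPy assumed): integrate $\Phi(t|y|)\,\phi(y)$ over $y$ and compare with the Cauchy CDF $\frac{1}{2} + \frac{\tan^{-1}(t)}{\pi}$.

```python
import numpy as np
from scipy import integrate
from scipy.stats import cauchy, norm

# Integrate Phi(t|y|) * phi(y) over y and compare with the Cauchy CDF
# arctan(t)/pi + 1/2 (scipy.stats.cauchy is the standard Cauchy).
def cauchy_cdf_by_lotp(t):
    integrand = lambda y: norm.cdf(t * abs(y)) * norm.pdf(y)
    value, _ = integrate.quad(integrand, -np.inf, np.inf)
    return value

for t in [-1.0, 0.0, 1.0, 5.0]:
    print(t, cauchy_cdf_by_lotp(t), cauchy.cdf(t))   # the two columns should agree
```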