Lecture 22: Transformations, Log-Normal, Convolutions, Proving Existence

Stat 110, Prof. Joe Blitzstein, Harvard University


Variance of Hypergeometric, con't

Returning to where we left off in Lecture 21, recall that we are considering $X \sim \operatorname{HGeom}(w, b, n)$ where $p = \frac{w}{w+b}$ and $w + b = N$. Write $X = \sum_{j=1}^{n} X_j$, where $X_j$ is the indicator that the $j^{\text{th}}$ draw is white; the $X_j$ are Bernoulli but not independent, since we sample without replacement.

\begin{align} \operatorname{Var}\left( \sum_{j=1}^{n} X_j \right) &= \operatorname{Var}(X_1) + \dots + \operatorname{Var}(X_n) + 2 \, \sum_{i<j} \operatorname{Cov}(X_i, X_j) \\ &= n \, \operatorname{Var}(X_1) + 2 \, \binom{n}{2} \operatorname{Cov} (X_1, X_2) & \quad \text{by symmetry} \\ &= n \, p \, (1-p) + 2 \, \binom{n}{2} \left( \frac{w}{w+b} \, \frac{w-1}{w+b-1} - p^2 \right) & \quad \text{since } P(X_1 = X_2 = 1) = \frac{w}{w+b} \, \frac{w-1}{w+b-1} \\ &= \frac{N-n}{N-1} \, n \, p \, (1-p) & \quad \text{after simplifying the algebra} \\ \\ \text{where } \frac{N-n}{N-1} &\text{ is known as the finite population correction} \end{align}

Note how this closely resembles the variance for a binomial distribution, except for scaling by that finite population correction.

Let's idiot-check this:

\begin{align} \text{let } n &= 1: \\ \\ \operatorname{Var}(X) &= \frac{N-1}{N-1} \, 1 \, p \, (1-p) \\ &= p \, (1-p) & \quad \text{ ... just a Bernoulli, since we only sample once!} \\ \\ \text{let } N &\gg n: \\ \Rightarrow \frac{N-n}{N-1} &\approx 1 \\ \operatorname{Var}(X) &= \frac{N-n}{N-1} \, n \, p \, (1-p) \\ &\approx n \, p \, (1-p) & \quad \text{ ... Binomial, since we will probably never sample the same element twice!} \\ \end{align}
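As a quick check beyond the lecture itself, here is a minimal NumPy simulation sketch of the finite population correction; the parameter values $w = 18$, $b = 12$, $n = 7$ are arbitrary choices.

```python
import numpy as np

# Simulate X ~ HGeom(w, b, n) and compare the sample variance against
# the formula Var(X) = ((N - n)/(N - 1)) * n * p * (1 - p).
rng = np.random.default_rng(0)
w, b, n = 18, 12, 7              # arbitrary example values
N, p = w + b, w / (w + b)

samples = rng.hypergeometric(w, b, n, size=1_000_000)
theory = (N - n) / (N - 1) * n * p * (1 - p)

print(f"simulated variance:   {samples.var():.4f}")
print(f"theoretical variance: {theory:.4f}")   # the two should agree closely
```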

Transformations

Also known as a change of variables

A function of an r.v. is itself an r.v., and we can use LOTUS to find mean and variance. But what if we want more than just the mean and variance? What if we want to know the entire distribution (PDF)?

Theorem

Let $X$ be a continuous r.v. with PDF $f_X$, and let $Y = g(X)$. Given that $g$ is differentiable and strictly increasing (at least on the region in which we are interested), the PDF of $Y$ is given by

\begin{align} f_Y(y) &= f_X(x) \, \frac{dx}{dy} & \quad \text{ where } y = g(x) \text{ , } x = g^{-1}(y) \end{align}

And since we know from the Chain Rule that $\frac{dx}{dy} = \left( \frac{dy}{dx} \right)^{-1}$, you can substitute $\left( \frac{dy}{dx} \right)^{-1}$ for $\frac{dx}{dy}$ if that makes things easier.

Proof

\begin{align} &\text{starting from the CDF...} \\ \\ F_Y(y) &= P(Y \le y) \\ &= P \left(g(X) \le y \right) \\ &= P \left(X \le g^{-1}(y) \right) & \quad \text{since } g \text{ is strictly increasing} \\ &= F_X \left( g^{-1}(y) \right) \\ &= F_X(x) \\ \\ &\text{and now differentiating w.r.t. } y \text{ (Chain Rule) to get the PDF...} \\ \\ \Rightarrow f_{Y}(y) &= f_{X}(x) \, \frac{dx}{dy} \end{align}
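To see the theorem in action numerically, here is a small simulation sketch (my own addition, not from the lecture). It takes $X \sim \operatorname{Expo}(1)$ and $Y = g(X) = \sqrt{X}$, so $x = y^2$ and $\frac{dx}{dy} = 2y$, giving $f_Y(y) = 2y \, e^{-y^2}$; the histogram of simulated draws should match that curve.

```python
import numpy as np
import matplotlib.pyplot as plt

# Change of variables check: Y = sqrt(X) with X ~ Expo(1).
# Here x = y^2 and dx/dy = 2y, so f_Y(y) = f_X(y^2) * 2y = 2y * exp(-y^2).
rng = np.random.default_rng(0)
y = np.sqrt(rng.exponential(1.0, size=500_000))

grid = np.linspace(0.01, 3, 200)
pdf = 2 * grid * np.exp(-grid**2)      # the transformed PDF

plt.hist(y, bins=100, density=True, alpha=0.5, label="simulated $Y$")
plt.plot(grid, pdf, label="$f_Y(y) = 2y e^{-y^2}$")
plt.legend()
plt.show()
```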

Log-Normal

Now let's try applying what we now know about transformations to get the PDF of a Log-Normal distribution.

Let $Y = e^{Z}$, where $Z \sim \mathcal{N}(0,1)$; then $Y$ has the Log-Normal distribution (so named because its log is Normal). Find the PDF of $Y$.

Note that $\frac{dy}{dz} = e^z = y$.

\begin{align} f_Y(y) &= f_Z(z) \, \frac{dz}{dy} \\ &= \frac{1}{\sqrt{2\pi}} \, e^{-\frac{(\ln y)^2}{2}} \, \frac{1}{y} & \quad \text{where } y \gt 0 \end{align}
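The same kind of simulation check works here; the sketch below (my own addition) compares a histogram of $e^Z$ draws against the PDF just derived.

```python
import numpy as np
import matplotlib.pyplot as plt

# Log-Normal check: histogram of Y = e^Z, Z ~ N(0, 1), versus the derived
# PDF f_Y(y) = exp(-(ln y)^2 / 2) / (y * sqrt(2 * pi)) for y > 0.
rng = np.random.default_rng(0)
y = np.exp(rng.standard_normal(500_000))

grid = np.linspace(0.05, 8, 400)
pdf = np.exp(-np.log(grid)**2 / 2) / (grid * np.sqrt(2 * np.pi))

plt.hist(y, bins=200, range=(0, 8), density=True, alpha=0.5, label="simulated $Y$")
plt.plot(grid, pdf, label="derived $f_Y$")
plt.legend()
plt.show()
```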

Transformations in $\mathbb{R}^n$

Multi-dimensional Example

Let $\vec{X} = (X_1, \dots , X_n)$ be a continuous random vector with joint PDF $f_X$, and let $\vec{Y} = g(\vec{X})$, where $g \colon \mathbb{R}^n \rightarrow \mathbb{R}^n$ is invertible and differentiable.

What is the joint PDF of $\vec{Y}$ in terms of the joint PDF of $\vec{X}$?

\begin{align} f_Y(\vec{y}) &= f_X(\vec{x}) \, \left| \frac{d\vec{x}}{d\vec{y}} \right| \\ \\ \text{where } \frac{d\vec{x}}{d\vec{y}} &= \begin{bmatrix} \frac{\partial x_1}{\partial y_1} & \cdots & \frac{\partial x_1}{\partial y_n} \\ \vdots&\ddots&\vdots \\ \frac{\partial x_n}{\partial y_1}& \cdots &\frac{\partial x_n}{\partial y_n} \end{bmatrix} & \text{... is the Jacobian matrix} \\ \\ \text{and } \left| \frac{d\vec{x}}{d\vec{y}} \right| &= \left| \, \det \frac{d\vec{x}}{d\vec{y}} \, \right| & \quad \text{... absolute value of the determinant of the Jacobian} \end{align}

Similar to the one-dimensional case, you can substitute $\left| \, \det \frac{d\vec{y}}{d\vec{x}} \, \right|^{-1}$ for $\left| \, \det \frac{d\vec{x}}{d\vec{y}} \, \right|$ if that makes things easier.
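A concrete sketch (my own, not from the lecture): for a linear map $\vec{Y} = A\vec{X}$ with $\vec{X} \sim \mathcal{N}(0, I_2)$, the formula gives $f_Y(\vec{y}) = f_X(A^{-1}\vec{y}) / |\det A|$, which must agree with the known $\mathcal{N}(0, AA^T)$ density. The matrix $A$ below is an arbitrary choice.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Multivariate change of variables for Y = A X, X ~ N(0, I_2):
#   f_Y(y) = f_X(A^{-1} y) * |det(dx/dy)| = f_X(A^{-1} y) / |det A|,
# which must match the N(0, A A^T) density, since linear maps of
# Gaussians are Gaussian.
A = np.array([[2.0, 1.0],
              [0.5, 1.5]])                       # arbitrary invertible matrix
f_X = multivariate_normal(mean=[0, 0], cov=np.eye(2)).pdf
f_Y = multivariate_normal(mean=[0, 0], cov=A @ A.T).pdf

for y in ([0.3, -1.2], [1.0, 2.0]):
    via_jacobian = f_X(np.linalg.solve(A, y)) / abs(np.linalg.det(A))
    print(via_jacobian, f_Y(y))                  # the two columns should agree
```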

Convolutions

Distribution for a Sum of Random Variables

Let $T = X + Y$, where $X,Y$ are independent. The distribution of $T$ is called the convolution of the distributions of $X$ and $Y$.

\begin{align} P(T=t) &= \sum_{x} P(X=x) \, P(Y=t-x) & \quad \text{discrete case}\\ \\ f_T(t) &= \int_{-\infty}^{\infty} f_X(x) \, f_Y(t-x) \, dx & \quad \text{continuous case} \\ \end{align}
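As an illustration of the discrete case (my own example), the PMF of the sum of two fair dice is exactly this convolution sum, which `np.convolve` computes directly:

```python
import numpy as np

# Discrete convolution: T = X + Y for two independent fair dice.
# np.convolve(a, b)[k] = sum_x a[x] * b[k - x], exactly the formula above.
die = np.full(6, 1 / 6)            # P(X = 1), ..., P(X = 6)
pmf_T = np.convolve(die, die)      # P(T = 2), ..., P(T = 12)

for t, prob in enumerate(pmf_T, start=2):
    print(f"P(T = {t:2d}) = {prob:.4f}")   # peaks at 6/36 when t = 7
```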

Proof of continuous case

\begin{align} &\text{starting from the CDF...} \\ \\ F_T(t) &= P(T \le t) \\ &= \int_{-\infty}^{\infty} P(X + Y \le t \, | \, X=x) \, f_X(x) \, dx & \quad \text{law of total probability} \\ &= \int_{-\infty}^{\infty} P(Y \le t - x) \, f_X(x) \, dx & \quad \text{by independence of } X, Y \\ &= \int_{-\infty}^{\infty} F_Y(t-x) \, f_X(x) \, dx \\ \\ &\text{and now differentiating w.r.t. } t \text{ ...} \\ \\ \Rightarrow f_{T}(t) &= \int_{-\infty}^{\infty} f_Y(t-x) \, f_X(x) \, dx \end{align}
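For the continuous case, a quick numerical sketch (again my own addition): with $X, Y$ i.i.d. $\operatorname{Expo}(1)$, the convolution integral should reproduce the $\operatorname{Gamma}(2, 1)$ PDF $t e^{-t}$.

```python
import numpy as np
from scipy.integrate import quad

# Continuous convolution check: T = X + Y, with X, Y i.i.d. Expo(1).
# The integral of f_X(x) f_Y(t - x) dx should equal t * exp(-t).
f = lambda x: np.exp(-x) * (x >= 0)   # Expo(1) PDF

for t in [0.5, 1.0, 2.5]:
    conv, _ = quad(lambda x: f(x) * f(t - x), 0, t)
    print(f"t = {t}: convolution = {conv:.6f}, t*exp(-t) = {t * np.exp(-t):.6f}")
```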

Proving Existence

Using Probability to Prove the Existence of Objects with Desired Properties

Let us say that $A$ is our desired property.

Can we show that $P(A) \gt 0$ for a random object? For if $P(A) \gt 0$, it follows that there must be at least one object with property $A$.

Suppose each object has some associated "score", and let $X$ be the score of a randomly picked object. Then there must be an object whose score is $\ge \mathbb{E}(X)$, since a random variable cannot always be strictly below its mean.

Suppose we have:

  • 100 people
  • 15 committees
  • each committee has 20 people
  • assume that each person is on 3 committees

Show that there exist 2 committees with an overlap of at least 3 people, i.e., 3 people who serve on both committees.

Rather than trying to enumerate all possible committee assignments, find the average overlap of 2 randomly chosen committees using indicator random variables.

Proof

\begin{align} \text{let } \, I_j &= \text{indicator that person } j \text{ is on both of the randomly chosen committees} \\ \\ \text{then } \, \mathbb{E}(I_j) = P(I_j = 1) &= \frac{\binom{3}{2}}{\binom{15}{2}} & \quad \text{... person } j \text{'s } \binom{3}{2} \text{ committee pairs, out of } \binom{15}{2} \text{ possible pairs} \\ \\ \mathbb{E}(\text{overlap}) &= 100 \, \frac{\binom{3}{2}}{\binom{15}{2}} & \quad \text{... by symmetry and linearity} \\ &= 100 \cdot \frac{3}{105} \\ &= \frac{20}{7} \approx 2.86 \\ \end{align}
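Here is the same computation done exactly in a few lines (my own sketch), using `fractions` to avoid any rounding:

```python
from fractions import Fraction
from math import comb

# Expected overlap of two randomly chosen committees, by symmetry and
# linearity over the 100 per-person indicator variables.
p_both = Fraction(comb(3, 2), comb(15, 2))   # P(person j is on both committees)
expected_overlap = 100 * p_both

print(expected_overlap)           # 20/7
print(float(expected_overlap))    # ~2.857, which is > 2
```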

But if the average overlap is $\frac{20}{7}$, some pair of committees must have overlap at least as large as the average; and since overlap is an integer, that pair has overlap $\ge \lceil \frac{20}{7} \rceil = 3$. And so we conclude that there must be at least one pair of committees where the overlap is $\ge 3$.

This is similar to how Shannon proved the noisy channel coding theorem: he showed that good codes exist by showing that a randomly chosen code performs well on average.