Lecture 25: Beta-Gamma (bank-post office), order statistics, conditional expectation, two envelope paradox

Stat 110, Prof. Joe Blitzstein, Harvard University

Connecting the Gamma and Beta Distributions

Say you have to visit both the bank and the post office today. What can we say about the total times you have to wait in the lines?

Let $X \sim \operatorname{Gamma}(a, \lambda)$ be the total time you wait in line at the bank, given that there are $a$ people in line in front of you and their waiting times are i.i.d. $\operatorname{Expo}(\lambda)$; recall the analogies of geometric $\rightarrow$ negative binomial, and of exponential $\rightarrow$ gamma. Each person's individual waiting time at the bank is $\operatorname{Expo}(\lambda)$, and as the $(a+1)^{\text{th}}$ person, your time in line is the sum of those $a$ $\operatorname{Expo}(\lambda)$ times.

Similarly, let $Y \sim \operatorname{Gamma}(b, \lambda)$ be the total time you wait in line at the post office, given that there are $b$ people in line in front of you.

Assume that $X, Y$ are independent.

Questions

What is the distribution of $T = X + Y$?
Given $T = X + Y$ and $W = \frac{X}{X+Y}$, what is the joint distribution?
Are $T, W$ independent?

What is the distribution of $T$?

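Since $X$ and $Y$ are independent and share the same rate $\lambda$, $T$ is a sum of $a+b$ i.i.d. $\operatorname{Expo}(\lambda)$ waiting times, so we immediately know that the total time you spend waiting in the lines is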

\begin{align} T &= X + Y \\ &\sim \operatorname{Gamma}(a+b, \lambda) \end{align}
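The claim can be checked empirically. The following is a minimal simulation sketch, not part of the lecture: the helper names `wait_time` and `simulate_total` and the values of `a`, `b`, `lam` are illustrative assumptions. It builds each wait as a sum of exponential service times and compares the sample mean and variance of $T$ with the $\operatorname{Gamma}(a+b, \lambda)$ values $(a+b)/\lambda$ and $(a+b)/\lambda^2$.

```python
import random
import statistics

def wait_time(people_ahead, lam, rng):
    """Total wait: sum of `people_ahead` i.i.d. Expo(lam) service times."""
    return sum(rng.expovariate(lam) for _ in range(people_ahead))

def simulate_total(a, b, lam, trials, seed=0):
    """Simulate T = X + Y for independent bank and post office waits."""
    rng = random.Random(seed)
    return [wait_time(a, lam, rng) + wait_time(b, lam, rng)
            for _ in range(trials)]

# Illustrative (assumed) parameters: 3 people at the bank, 5 at the post office.
a, b, lam = 3, 5, 2.0
totals = simulate_total(a, b, lam, trials=100_000)

sample_mean = statistics.fmean(totals)
sample_var = statistics.pvariance(totals)

# Gamma(a+b, lam) has mean (a+b)/lam and variance (a+b)/lam**2.
print("mean:", sample_mean, "theory:", (a + b) / lam)
print("var: ", sample_var, "theory:", (a + b) / lam**2)
```

Matching the first two moments does not prove the distributions agree, but a mismatch here would flag an error in the reasoning.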

What is the joint distribution of $T,W$?

Let $\lambda = 1$, to make the calculation simpler. We do not lose any generality, since we can scale by $\lambda$ later.

So we are looking for the joint PDF of $T,W$

\begin{align} \text{joint PDF } f_{T,W}(t,w) &= f_{X,Y}(x,y) \, \left| \frac{\partial(x,y)}{\partial(t,w)} \right| \\ &= \frac{1}{\Gamma(a) \Gamma(b)} \, x^a \, e^{-x} \, y^b \, e^{-y} \, \frac{1}{xy} \, \left| \frac{\partial(x,y)}{\partial(t,w)} \right| \\\\ \\ \text{for the Jacobian, let } x + y &= t \\ \frac{x}{x+y} &= w \\ \\ \Rightarrow x &= tw \\ \\ 1 - \frac{x}{x+y} &= 1 - w \\ \frac{x + y - x}{t} &= 1 - w \\ \\ \Rightarrow y &= t(1-w) \\\\ \\ \frac{\partial(x,y)}{\partial(t,w)} &= \begin{bmatrix} \frac{\partial x}{\partial t} & \frac{\partial x}{\partial w} \\ \frac{\partial y}{\partial t} & \frac{\partial y}{\partial w} \end{bmatrix} \\ &= \begin{bmatrix} w & t \\ 1-w & -t \end{bmatrix} \\ \det \frac{\partial(x,y)}{\partial(t,w)} &= -tw - t(1-w) \\ &= -t \\ \Rightarrow \left| \frac{\partial(x,y)}{\partial(t,w)} \right| &= |-t| = t &\quad \text{ absolute value of the determinant, with } t > 0 \\\\ \\ \text{returning to PDF } f_{T,W}(t,w) &= \frac{1}{\Gamma(a) \Gamma(b)} \, x^a \, e^{-x} \, y^b \, e^{-y} \, \frac{1}{xy} \, \left| \frac{\partial(x,y)}{\partial(t,w)} \right| \\ &= \frac{1}{\Gamma(a) \Gamma(b)} \, (tw)^a \, e^{-(tw)} \, (t(1-w))^b \, e^{-t(1-w)} \, \frac{1}{tw \, t(1-w)} \, t \\ &= \frac{1}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \,\, t^{a+b} \, e^{-t} \, \frac{1}{t} &\quad \text{ collecting the } w \text{ terms and the } t \text{ terms} \\ &= \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \,\, \frac{t^{a+b} \, e^{-t} \, \frac{1}{t}}{\Gamma(a+b)} &\quad \text{ multiplying by } 1 = \frac{\Gamma(a+b)}{\Gamma(a+b)} \end{align}

Since $f_{T,W}(t,w)$ factors into a function of $w$ alone, proportional to the $\operatorname{Beta}(a,b)$ PDF, times the $\operatorname{Gamma}(a+b, 1)$ PDF in $t$, we have $T \sim \operatorname{Gamma}(a+b, 1)$ and $W \sim \operatorname{Beta}(a,b)$. The factorization also answers the third question: $T,W$ are independent.
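Independence of $T$ and $W$ can be probed by simulation. This is a rough sketch under assumed values $a = 2$, $b = 3$, $\lambda = 1$, with the hypothetical helpers `sample_t_w` and `correlation`. It checks two symptoms of independence: the sample correlation is near 0, and the average of $W$ is about the same whether $T$ is small or large. These checks are necessary but not sufficient for independence.

```python
import random
from statistics import fmean, median

def sample_t_w(a, b, trials, seed=0):
    """Draw (T, W) pairs with X ~ Gamma(a, 1), Y ~ Gamma(b, 1) independent."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(trials):
        x = rng.gammavariate(a, 1.0)  # shape a, scale 1, i.e. lambda = 1
        y = rng.gammavariate(b, 1.0)
        pairs.append((x + y, x / (x + y)))
    return pairs

def correlation(pairs):
    """Sample (Pearson) correlation of the two coordinates."""
    ts = [t for t, _ in pairs]
    ws = [w for _, w in pairs]
    mt, mw = fmean(ts), fmean(ws)
    cov = fmean((t - mt) * (w - mw) for t, w in pairs)
    var_t = fmean((t - mt) ** 2 for t in ts)
    var_w = fmean((w - mw) ** 2 for w in ws)
    return cov / (var_t * var_w) ** 0.5

a, b = 2, 3  # assumed illustrative shapes
pairs = sample_t_w(a, b, trials=100_000)

cut = median(t for t, _ in pairs)
w_when_t_small = fmean(w for t, w in pairs if t <= cut)
w_when_t_large = fmean(w for t, w in pairs if t > cut)
corr = correlation(pairs)

print("corr(T, W):", corr)
print("mean W | T small:", w_when_t_small, " mean W | T large:", w_when_t_large)
```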

Unexpected Discovery: Normalizing Constant for Beta

Now say we are interested in finding the marginal PDF for $W$

\begin{align} f_{W}(w) &= \int_{0}^{\infty} f_{T,W}(t,w) \, dt \\ &= \int_{0}^{\infty} \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \,\, \frac{t^{a+b} \, e^{-t} \, \frac{1}{t}}{\Gamma(a+b)} \, dt \\ &= \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \, \int_{0}^{\infty} \frac{t^{a+b} \, e^{-t} \, \frac{1}{t}}{\Gamma(a+b)} \, dt\\ &= \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} &\quad \text{ since the } \operatorname{Gamma}(a+b, 1) \text{ PDF integrates to } 1 \end{align}

But notice that since the marginal PDF $f_{W}(w)$ must integrate to 1, $\frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)}$ must be the normalizing constant for the Beta distribution! If it were not, $f_{W}(w)$ could not be a valid PDF.
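A numerical sanity check of this constant is easy to sketch. The code below is an illustration, not part of the lecture: it uses a plain midpoint rule in a hypothetical helper `beta_kernel_integral`, with assumed values $a = 2.5$, $b = 4$ (both at least 1, so the integrand stays bounded). It compares $\int_0^1 w^{a-1}(1-w)^{b-1} \, dw$ with $\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$.

```python
import math

def beta_kernel_integral(a, b, steps=100_000):
    """Midpoint-rule approximation of the integral over (0, 1) of
    w**(a-1) * (1-w)**(b-1). Assumes a >= 1 and b >= 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += w ** (a - 1) * (1 - w) ** (b - 1)
    return total * h

a, b = 2.5, 4.0  # assumed illustrative parameters
numeric = beta_kernel_integral(a, b)
closed_form = math.gamma(a) * math.gamma(b) / math.gamma(a + b)

# The normalizing constant is the reciprocal of this integral.
print(numeric, closed_form)
```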

Example Usage: Finding $\mathbb{E}(W), W \sim \operatorname{Beta}(a,b)$

There are two ways you could find $\mathbb{E}(W)$.

You could use the definition of expectation (LOTUS with $g(w) = w$), where you would simply do:

\begin{align} \mathbb{E}(W) &= \int_{0}^{1} \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \, w \, dw \\ &= \int_{0}^{1} \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a} \, (1-w)^{b-1} \, dw \\ \end{align}

... which is not so hard to handle, since the integrand is proportional to a $\operatorname{Beta}(a+1, b)$ PDF.

Or, since we are continuing on the topic of $W = \frac{X}{X+Y}$, we have:

\begin{align} \mathbb{E}(W) &= \mathbb{E}\left( \frac{X}{X+Y} \right) \\ &= \frac{\mathbb{E}(X)}{\mathbb{E}(X+Y)} \quad \text{ which is true, under certain conditions} \end{align}

So why is $\mathbb{E}\left( \frac{X}{X+Y} \right) = \frac{\mathbb{E}(X)}{\mathbb{E}(X+Y)}$?

Facts

since $T$ is independent of $W$, $\frac{X}{X+Y}$ is independent of $X+Y$
since independence implies they are uncorrelated, $\frac{X}{X+Y}$ and $X+Y$ are therefore uncorrelated
by definition, uncorrelated means \begin{align} \mathbb{E}(AB) - \mathbb{E}(A) \, \mathbb{E}(B) &= 0 \\ \mathbb{E}(AB) &= \mathbb{E}(A) \, \mathbb{E}(B) \\\\ \\ \mathbb{E} \left( \frac{X}{X+Y} \, (X+Y) \right) &= \mathbb{E}\left( \frac{X}{X+Y} \right) \, \mathbb{E}(X+Y) \\ \mathbb{E}(X) &= \mathbb{E}\left( \frac{X}{X+Y} \right) \, \mathbb{E}(X+Y) \\ \Rightarrow \mathbb{E}\left( \frac{X}{X+Y} \right) &= \frac{\mathbb{E}(X)}{\mathbb{E}(X+Y)} \\\\ \\ \therefore \mathbb{E}(W) &= \mathbb{E} \left( \frac{X}{X+Y} \right) \\ &= \frac{\mathbb{E}(X)}{\mathbb{E}(X+Y)} \\ &= \frac{a}{a+b} &\quad \text{ since } \mathbb{E}(X) = \frac{a}{\lambda} \text{ and } \mathbb{E}(X+Y) = \frac{a+b}{\lambda} \end{align}
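To see both the result and why the "certain conditions" matter, here is a small sketch. The first part estimates $\mathbb{E}\left(\frac{X}{X+Y}\right)$ by simulation for independent Gammas, with assumed values $a = 2$, $b = 5$, $\lambda = 1$ and a hypothetical helper `mean_ratio_gamma`. The second part is a made-up toy example, not from the lecture, in which $X$ is 1 or 3 with equal probability and $Y = 1$ is constant. There the ratio and the sum are not independent, and the shortcut fails.

```python
import random
from fractions import Fraction

def mean_ratio_gamma(a, b, trials=200_000, seed=1):
    """Monte Carlo estimate of E[X/(X+Y)], X ~ Gamma(a,1), Y ~ Gamma(b,1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.gammavariate(a, 1.0)  # shape a, scale 1
        y = rng.gammavariate(b, 1.0)
        total += x / (x + y)
    return total / trials

a, b = 2, 5  # assumed illustrative shapes
estimate = mean_ratio_gamma(a, b)
print(estimate, a / (a + b))  # the two numbers should be close

# Toy counterexample (hypothetical): X is 1 or 3 with probability 1/2, Y = 1.
xs = [Fraction(1), Fraction(3)]
y = Fraction(1)
mean_of_ratio = sum(x / (x + y) for x in xs) / len(xs)
mean_x = sum(xs) / len(xs)
ratio_of_means = mean_x / (mean_x + y)
print(mean_of_ratio, ratio_of_means)  # prints 5/8 2/3
```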

Order Statistics

Let $X_1, X_2, \dots , X_n$ be i.i.d. The order statistics are $X_{(1)} \le X_{(2)}\le \dots \le X_{(n)}$, where

\begin{align} X_{(1)} &= \min(X_1, X_2, \dots , X_n) \\ X_{(n)} &= \max(X_1, X_2, \dots , X_n) \\ \\ \text{if } n \text{ is odd, } &\text{ median is } X_{( \frac{n+1}{2} )} \end{align}

Other order statistics include the quartiles, etc.

Order statistics are difficult to work with because they are dependent; knowing $X_{(1)}$ gives you information about $X_{(n)}$, for example
In the discrete case, things are tricky since you need to consider what to do in case of ties

Distribution of Order Statistics

Let $X_1, X_2, \dots , X_n$ be i.i.d. with PDF $f$ and CDF $F$. Find the marginal CDF and PDF of $X_{(j)}$ (we focus on a single $j$).

CDF: $P(X_{(j)} \le x)$

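Picture the $n$ values as points on a number line: $X_{(j)} \le x$ exactly when at least $j$ of the $X_i$ fall at or to the left of $x$, and each $X_i$ does so independently with probability $F(x)$.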

\begin{align} \text{marginal CDF } P(X_{(j)} \le x) &= P(\text{at least } j \text{ of } X_i \le x) \\ &= \sum_{k=j}^n \binom{n}{k} \, F(x)^k \, \left( 1-F(x) \right)^{n-k} \\ \end{align}
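The binomial-sum formula translates directly into code. The sketch below uses the illustrative names `order_stat_cdf` and `empirical_cdf`, with assumed values $n = 5$, $j = 2$, $x = 0.3$. It evaluates the formula for i.i.d. $\operatorname{Unif}(0,1)$ variables and compares it with the fraction of simulated samples whose $j^{\text{th}}$ smallest value is at most $x$.

```python
import math
import random

def order_stat_cdf(x, j, n, F):
    """P(X_(j) <= x) = P(at least j of the n draws are <= x)."""
    p = F(x)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(j, n + 1))

def empirical_cdf(x, j, n, trials=100_000, seed=2):
    """Fraction of simulated Unif(0,1) samples whose j-th smallest is <= x."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = sorted(rng.random() for _ in range(n))
        if sample[j - 1] <= x:  # j-th order statistic (1-indexed)
            hits += 1
    return hits / trials

n, j, x = 5, 2, 0.3  # assumed illustrative values
exact = order_stat_cdf(x, j, n, F=lambda u: u)  # Unif(0,1): F(u) = u on [0, 1]
approx = empirical_cdf(x, j, n)
print(exact, approx)
```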

PDF: $f_{X_{(j)}}(x)$

Rather than differentiating the CDF (and working with tedious sums), let's think about this pictorially.

Imagine a tiny interval of width $dx$ about $x$. Multiplying the PDF by this infinitesimally small width gives the probability that the $j^{\text{th}}$ order statistic lands in the interval.

pick which one of the $n$ variables $X_i$ lands inside of $dx$; there are $n$ choices
the probability that the chosen $X_i$ lands inside of $dx$ is $f(x)dx$
choose which $j-1$ of the remaining $n-1$ variables fall to the left of $dx$, each doing so with probability $F(x)$
the remaining $n-j$ fall to the right of $dx$, each with probability $1-F(x)$

\begin{align} f_{X_{(j)}}(x) \, dx &= n \, \binom{n-1}{j-1} \left( f(x)dx \right) \, F(x)^{j-1} \, \left( 1-F(x) \right)^{n-j} \\ \\ \text{marginal PDF } f_{X_{(j)}}(x) &= n \, \binom{n-1}{j-1} \, F(x)^{j-1} \, \left( 1-F(x) \right)^{n-j} \, f(x) \\ \end{align}
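One way to gain confidence in the counting argument is to compare the PDF formula with a numerical derivative of the CDF formula. This is a sketch under assumptions: the underlying distribution is taken to be $\operatorname{Expo}(1)$, the values $n = 6$, $j = 3$, $x = 0.8$ are arbitrary, and the function names are illustrative.

```python
import math

def order_stat_cdf(x, j, n, F):
    """Marginal CDF of the j-th order statistic (binomial-sum formula)."""
    p = F(x)
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(j, n + 1))

def order_stat_pdf(x, j, n, F, f):
    """Marginal PDF of the j-th order statistic (counting argument)."""
    return (n * math.comb(n - 1, j - 1)
            * F(x) ** (j - 1) * (1 - F(x)) ** (n - j) * f(x))

# Assumed example: Expo(1), with CDF 1 - e^{-x} and PDF e^{-x} for x > 0.
F = lambda x: 1 - math.exp(-x)
f = lambda x: math.exp(-x)

n, j, x, h = 6, 3, 0.8, 1e-5
# Central difference approximates d/dx of the CDF at x.
numeric_derivative = (order_stat_cdf(x + h, j, n, F)
                      - order_stat_cdf(x - h, j, n, F)) / (2 * h)
formula_value = order_stat_pdf(x, j, n, F, f)
print(numeric_derivative, formula_value)
```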

Example: $\mathbb{E}|U_1 - U_2|$

Let $U_1, U_2, \dots , U_n$ be i.i.d. $\operatorname{Unif}(0,1)$.

Then the corresponding marginal PDF $f_{U_{(j)}}(x)$ is

\begin{align} f_{U_{(j)}}(x) &= n \, \binom{n-1}{j-1} \, x^{j-1} \, (1-x)^{n-j} \quad \text{for } 0 \le x \le 1 \\ \\ \Rightarrow U_{(j)} &\sim \operatorname{Beta}(j, n-j+1) \end{align}
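As a consistency check with the Beta normalizing constant $\frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}$ found earlier, the constant in front of this PDF can be rewritten using $\Gamma(m) = (m-1)!$ for positive integers $m$:

\begin{align} n \, \binom{n-1}{j-1} &= \frac{n \, (n-1)!}{(j-1)! \, (n-j)!} \\ &= \frac{n!}{(j-1)! \, (n-j)!} \\ &= \frac{\Gamma(n+1)}{\Gamma(j) \, \Gamma(n-j+1)} \\ &= \frac{\Gamma(a+b)}{\Gamma(a) \, \Gamma(b)} \quad \text{with } a = j, \, b = n-j+1 \end{align}

The exponents match as well, since $x^{j-1} = x^{a-1}$ and $(1-x)^{n-j} = (1-x)^{b-1}$.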

Recall an earlier discussion of $\mathbb{E}|U_1 - U_2| = \mathbb{E}\left( \max(U_1,U_2) \right) - \mathbb{E}\left( \min(U_1,U_2) \right)$.

The maximum and minimum are order statistics, so with $n = 2$ and the Beta mean $\frac{a}{a+b}$ from above:

\begin{align} \max(U_1,U_2) &= U_{(2)} \, \text{ so } n = 2, j = 2 \\ U_{(2)} &\sim \operatorname{Beta}(2, 2-2+1) = \operatorname{Beta}(2, 1) \\ \mathbb{E}\left( \max(U_1,U_2) \right) &= \frac{2}{2+1} \\ &= \frac{2}{3} \\ \\ \min(U_1,U_2) &= U_{(1)} \, \text{ so } n = 2, j = 1 \\ U_{(1)} &\sim \operatorname{Beta}(1, 2-1+1) = \operatorname{Beta}(1, 2) \\ \mathbb{E}\left( \min(U_1,U_2) \right) &= \frac{1}{1+2} \\ &= \frac{1}{3} \\ \\ \Rightarrow \mathbb{E}|U_1 - U_2| &= \frac{2}{3} - \frac{1}{3} \\ &= \boxed{\frac{1}{3}} \end{align}
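A quick Monte Carlo sketch can confirm the three expectations. The helper name `uniform_pair_means`, the trial count, and the seed are arbitrary choices, not part of the lecture.

```python
import random

def uniform_pair_means(trials=200_000, seed=3):
    """Estimate E|U1-U2|, E max(U1,U2), E min(U1,U2) for i.i.d. Unif(0,1)."""
    rng = random.Random(seed)
    total_abs = total_max = total_min = 0.0
    for _ in range(trials):
        u1, u2 = rng.random(), rng.random()
        total_abs += abs(u1 - u2)
        total_max += max(u1, u2)
        total_min += min(u1, u2)
    return total_abs / trials, total_max / trials, total_min / trials

mean_abs, mean_max, mean_min = uniform_pair_means()
# Theory: 1/3, 2/3 and 1/3 respectively.
print(mean_abs, mean_max, mean_min)
```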

Conditional Expectation

If you understand conditional probability, then you can extend that to conditional expectation.

\begin{align} \text{consider } \mathbb{E}(X|A) &\text{where } A \text{ is an event } \\\\ \\ \mathbb{E}(X) &= \mathbb{E}(X|A)P(A) + \mathbb{E}(X|A^{\complement})P(A^{\complement}) \\ \\ \mathbb{E}(X) &= \sum_{x} x \, P(X=x) &\text{where you expand } P(X=x) \text{ with LOTP } \end{align}
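A small exact example may help make the first identity concrete. The setup is an assumption chosen for illustration, not taken from the lecture: $X$ is a fair six-sided die roll and $A$ is the event that the roll is even. The helper `conditional_expectation` is likewise hypothetical.

```python
from fractions import Fraction

# PMF of a fair die (assumed example).
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def conditional_expectation(pmf, event):
    """Return (E(X | event), P(event)) for a discrete PMF."""
    p_event = sum(p for x, p in pmf.items() if event(x))
    weighted = sum(x * p for x, p in pmf.items() if event(x))
    return weighted / p_event, p_event

def is_even(x):
    return x % 2 == 0

e_given_a, p_a = conditional_expectation(pmf, is_even)
e_given_not_a, p_not_a = conditional_expectation(pmf, lambda x: not is_even(x))

# E(X) = E(X|A) P(A) + E(X|A^c) P(A^c)
total = e_given_a * p_a + e_given_not_a * p_not_a
direct = sum(x * p for x, p in pmf.items())
print(e_given_a, e_given_not_a, total, direct)  # prints 4 3 7/2 7/2
```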

We will go more into this next time...

Two-envelope Paradox

Now consider this paradox before we leave off.

There are two envelopes with cash inside them. You do not know how much is inside, only that one envelope has twice as much as the other.

Let's say you open up one of the envelopes and find $100 inside.

Should you switch?

Well, the other envelope could contain either 50 or 200. The mean of those two amounts is $125, so wouldn't that mean you should switch?

But then again, it doesn't matter that the envelope you opened contained 100: it could have been any amount $n$. So the other envelope could hold $\frac{n}{2}$ or $2n$, the average being $\frac{5n}{4}$, so you should switch. But then the same argument applies, so you should switch back. But then the same argument applies, so you should switch again...? And again...? And again...? Ad infinitum, ad nauseam.
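A simulation does not explain where the argument goes wrong, but it does show what happens in one concrete setup. The sketch below rests on a loud assumption that is not part of the paradox as stated: the smaller amount is drawn uniformly from 1 to 100, and you pick one of the two envelopes at random. It compares the average payoff of always keeping with that of always switching.

```python
import random

def simulate_envelopes(trials=200_000, seed=4):
    """Average payoff of 'always keep' versus 'always switch'."""
    rng = random.Random(seed)
    keep_total = switch_total = 0
    for _ in range(trials):
        small = rng.randint(1, 100)  # ASSUMPTION: prior on the smaller amount
        envelopes = [small, 2 * small]
        rng.shuffle(envelopes)  # you pick an envelope at random
        keep_total += envelopes[0]  # payoff if you keep your pick
        switch_total += envelopes[1]  # payoff if you switch
    return keep_total / trials, switch_total / trials

keep_avg, switch_avg = simulate_envelopes()
print(keep_avg, switch_avg)
```

Under this assumed setup the two averages come out nearly equal, which is what makes the $\frac{5n}{4}$ argument puzzling.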

To be continued.

View Lecture 25: Order Statistics and Conditional Expectation | Statistics 110 on YouTube.