Say you have to visit both the bank and the post office today. What can we say about the total time you will have to wait in the two lines?
Let $X \sim \operatorname{Gamma}(a, \lambda)$ be the total time you wait in line at the bank, given that there are $a$ people in line in front of you and that their waiting times are i.i.d. $\operatorname{Expo}(\lambda)$; recall the analogies of geometric $\rightarrow$ negative binomial and of exponential $\rightarrow$ gamma. Each individual's waiting time at the bank is $\operatorname{Expo}(\lambda)$, and as the $(a+1)$st person in line, your time in line is the sum of those $a$ i.i.d. $\operatorname{Expo}(\lambda)$ times.
Similarly, let $Y \sim \operatorname{Gamma}(b, \lambda)$ be the total time you wait in line at the post office, given that there are $b$ people in line in front of you.
Assume that $X, Y$ are independent.
We immediately know that the total time you spend waiting in the lines is
\begin{align} T &= X + Y \\ &\sim \operatorname{Gamma}(a+b, \lambda) \end{align}Let $\lambda = 1$, to make the calculation simpler. We do not lose any generality, since we can scale by $\lambda$ later.
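A quick simulation sketch of the story so far, keeping a general $\lambda$ to emphasize that nothing is lost by setting $\lambda = 1$ (the parameter values, seed, and sample size below are arbitrary illustrative choices):

```python
# Simulation check: Gamma(a, lam) as a sum of a i.i.d. Expo(lam) waits,
# and Gamma(a, lam) + Gamma(b, lam) ~ Gamma(a+b, lam) for independent X, Y.
# Parameters are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
a, b, lam, n = 4, 6, 2.0, 10**6

# Your wait at the bank: total of the a Expo(lam) times in front of you.
x = rng.exponential(scale=1/lam, size=(n, a)).sum(axis=1)
# Your wait at the post office, drawn directly as a Gamma(b, lam).
y = rng.gamma(shape=b, scale=1/lam, size=n)

t = x + y
print(t.mean(), (a + b) / lam)     # both approx 5.0
print(t.var(), (a + b) / lam**2)   # both approx 2.5
```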
Now define $W = \frac{X}{X+Y}$, the fraction of the total waiting time that you spend at the bank. We want the joint PDF of $T$ and $W$, their marginal distributions, and whether $T$ and $W$ are independent.
\begin{align} \text{joint PDF } f_{T,W}(t,w) &= f_{X,Y}(x,y) \, \left| \frac{\partial(x,y)}{\partial(t,w)} \right| \\ &= \frac{1}{\Gamma(a) \Gamma(b)} \, x^a \, e^{-x} \, y^b \, e^{-y} \, \frac{1}{xy} \, \left| \frac{\partial(x,y)}{\partial(t,w)} \right| \\ \\ \text{for the Jacobian, let } x + y &= t \\ \frac{x}{x+y} &= w \\ \Rightarrow x &= tw \\ \\ 1 - \frac{x}{x+y} &= 1 - w \\ \frac{x + y - x}{t} &= 1 - w \\ \Rightarrow y &= t(1-w) \\ \\ \frac{\partial(x,y)}{\partial(t,w)} &= \begin{vmatrix} \frac{\partial x}{\partial t} & \frac{\partial x}{\partial w} \\ \frac{\partial y}{\partial t} & \frac{\partial y}{\partial w} \end{vmatrix} \\ &= \begin{vmatrix} w & t \\ 1-w & -t \end{vmatrix} \\ &= -tw - t(1-w) \\ &= -t \\ \Rightarrow \left| \frac{\partial(x,y)}{\partial(t,w)} \right| &= t \\ \\ \text{returning to the PDF, } f_{T,W}(t,w) &= \frac{1}{\Gamma(a) \Gamma(b)} \, x^a \, e^{-x} \, y^b \, e^{-y} \, \frac{1}{xy} \, \left| \frac{\partial(x,y)}{\partial(t,w)} \right| \\ &= \frac{1}{\Gamma(a) \Gamma(b)} \, (tw)^a \, e^{-tw} \, \left( t(1-w) \right)^b \, e^{-t(1-w)} \, \frac{1}{tw \, t(1-w)} \, t \\ &= \frac{1}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \,\, t^{a+b-1} \, e^{-t} \\ &= \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \,\, \frac{t^{a+b-1} \, e^{-t}}{\Gamma(a+b)} &\quad \text{ multiplying and dividing by } \Gamma(a+b) \end{align}The joint PDF factors into a function of $w$ alone (the $\operatorname{Beta}(a,b)$ PDF) times a function of $t$ alone (the $\operatorname{Gamma}(a+b, 1)$ PDF). So $T \sim \operatorname{Gamma}(a+b, 1)$, $W \sim \operatorname{Beta}(a,b)$, and we have also answered the third question: $T$ and $W$ are independent.
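As a sanity check, here is a minimal simulation sketch (arbitrary shape parameters, with $\lambda = 1$ as in the text):

```python
# Simulate T = X + Y and W = X/(X+Y) for independent Gammas with lambda = 1.
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 3, 5, 10**6
x = rng.gamma(shape=a, size=n)   # scale defaults to 1, i.e. lambda = 1
y = rng.gamma(shape=b, size=n)
t, w = x + y, x / (x + y)

# W should match Beta(a, b): mean a/(a+b), variance ab/((a+b)^2 (a+b+1))
print(w.mean(), a / (a + b))
print(w.var(), a * b / ((a + b) ** 2 * (a + b + 1)))
# Independence check (necessary, not sufficient): correlation near zero
print(np.corrcoef(t, w)[0, 1])
```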
Now say we are interested in finding the marginal PDF for $W$
\begin{align} f_{W}(w) &= \int_{0}^{\infty} f_{T,W}(t,w) \, dt \\ &= \int_{0}^{\infty} \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \,\, \frac{t^{a+b-1} \, e^{-t}}{\Gamma(a+b)} \, dt \\ &= \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \, \int_{0}^{\infty} \frac{t^{a+b-1} \, e^{-t}}{\Gamma(a+b)} \, dt \\ &= \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \quad \text{for } 0 < w < 1 \end{align}since the last integrand is the $\operatorname{Gamma}(a+b, 1)$ PDF, which integrates to 1 over $(0, \infty)$. But notice that since the marginal PDF $f_{W}(w)$ must integrate to 1, $\frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)}$ is the normalizing constant for the Beta distribution! If this were not true, then $f_{W}(w)$ could not be a valid PDF.
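A quick numerical confirmation of that normalizing constant (a sketch assuming SciPy is available; the values of $a$ and $b$ are arbitrary):

```python
# Confirm that the Beta(a, b) kernel integrates to Gamma(a)Gamma(b)/Gamma(a+b).
from scipy.integrate import quad
from scipy.special import gamma

a, b = 2.5, 4.0
integral, _ = quad(lambda w: w**(a - 1) * (1 - w)**(b - 1), 0, 1)
print(integral)                            # the two printed values agree
print(gamma(a) * gamma(b) / gamma(a + b))
```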
There are two ways you could find $\mathbb{E}(W)$.
You could use LOTUS, where you would simply do:
\begin{align} \mathbb{E}(W) &= \int_0^1 w \, \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a-1} \, (1-w)^{b-1} \, dw \\ &= \int_0^1 \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \, w^{a} \, (1-w)^{b-1} \, dw \\ \end{align}... and this is not hard to handle, since the integrand is again of the Beta form.
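For completeness, the integrand is an unnormalized $\operatorname{Beta}(a+1, b)$ PDF, so the normalizing constant we just derived finishes the evaluation:
\begin{align} \mathbb{E}(W) &= \frac{\Gamma(a+b)}{\Gamma(a) \Gamma(b)} \cdot \frac{\Gamma(a+1) \, \Gamma(b)}{\Gamma(a+b+1)} \\ &= \frac{a \, \Gamma(a) \, \Gamma(a+b)}{\Gamma(a) \, (a+b) \, \Gamma(a+b)} \\ &= \frac{a}{a+b} \end{align}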
Or, continuing with $W = \frac{X}{X+Y}$, we have:
\begin{align} \mathbb{E}(W) &= \mathbb{E}\left( \frac{X}{X+Y} \right) \\ &= \frac{\mathbb{E}(X)}{\mathbb{E}(X+Y)} \quad \text{ which is true, under certain conditions} \end{align}So why is $\mathbb{E}\left( \frac{X}{X+Y} \right) = \frac{\mathbb{E}(X)}{\mathbb{E}(X+Y)}$?
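The "why" is deferred, but a quick simulation confirms the identity does hold in this Gamma setting (a minimal sketch; the parameters, seed, and sample size are arbitrary choices):

```python
# Compare E(X/(X+Y)) with E(X)/E(X+Y) by simulation for independent Gammas.
import numpy as np

rng = np.random.default_rng(0)
a, b, n = 5, 3, 10**6
x = rng.gamma(shape=a, size=n)   # scale = 1, i.e. lambda = 1
y = rng.gamma(shape=b, size=n)

print(np.mean(x / (x + y)))          # approx 0.625
print(np.mean(x) / np.mean(x + y))   # a/(a+b) = 5/8 = 0.625
```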
Facts
Let $X_1, X_2, \dots , X_n$ be i.i.d. The order statistics are $X_{(1)} \le X_{(2)}\le \dots \le X_{(n)}$, where
\begin{align} X_{(1)} &= \min(X_1, X_2, \dots , X_n) \\ X_{(n)} &= \max(X_1, X_2, \dots , X_n) \\ \\ \text{if } n \text{ is odd, } &\text{ the median is } X_{( \frac{n+1}{2} )} \end{align}Other order statistics include the quartiles, etc.
Let $X_1, X_2, \dots , X_n$ be i.i.d. with PDF $f$ and CDF $F$. Find the marginal CDF and PDF of $X_{(j)}$ (we focus on a single $j$).
Picture the $n$ sample points on the real line: the event $X_{(j)} \le x$ happens exactly when at least $j$ of the $X_i$ land at or to the left of $x$. So
\begin{align} \text{marginal CDF } P(X_{(j)} \le x) &= P(\text{at least } j \text{ of the } X_i \le x) \\ &= \sum_{k=j}^n \binom{n}{k} \, F(x)^k \, \left( 1-F(x) \right)^{n-k} \\ \end{align}Rather than taking the derivative of this CDF (and working with a tedious sum), let's picture the situation once more and think about the PDF directly...
Imagine a tiny interval around $x$ of width $dx$. Multiplying the PDF by this infinitesimally small interval gives the probability that the order statistic of interest, $X_{(j)}$, lands in the tiny interval: one of the $n$ observations must fall in the interval, $j-1$ of the remaining $n-1$ must fall below $x$, and the other $n-j$ must fall above it. So
\begin{align} f_{X_{(j)}}(x) \, dx &= n \, \binom{n-1}{j-1} \, F(x)^{j-1} \, \left( 1-F(x) \right)^{n-j} \, f(x) \, dx \end{align}
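Here is a simulation sketch checking both formulas. The choice of $\operatorname{Expo}(1)$ for the underlying distribution, and the values of $n$, $j$, and $x$, are arbitrary illustrations; SciPy is assumed to be available.

```python
# Check the order-statistic CDF and PDF formulas against simulation.
import numpy as np
from math import comb
from scipy.stats import expon, binom

rng = np.random.default_rng(0)
n, j, x0 = 7, 3, 0.8                      # illustrative values
samples = np.sort(rng.exponential(size=(10**6, n)), axis=1)
xj = samples[:, j - 1]                    # the j-th order statistic

F, f = expon.cdf(x0), expon.pdf(x0)

# CDF formula: P(at least j of the X_i are <= x0)
cdf_formula = sum(binom.pmf(k, n, F) for k in range(j, n + 1))
print((xj <= x0).mean(), cdf_formula)

# PDF formula: n * C(n-1, j-1) * F(x)^(j-1) * (1-F(x))^(n-j) * f(x)
pdf_formula = n * comb(n - 1, j - 1) * F**(j - 1) * (1 - F)**(n - j) * f
dx = 0.01                                 # estimate the density from a small bin
print((np.abs(xj - x0) < dx / 2).mean() / dx, pdf_formula)
```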
Let $U_1, U_2, \dots , U_n$ be i.i.d. $\operatorname{Unif}(0,1)$.
Then the corresponding marginal PDF $f_{U_{(j)}}(x)$ is
\begin{align} f_{U_{(j)}}(x) &= n \, \binom{n-1}{j-1} \, x^{j-1} \, (1-x)^{n-j} \quad \text{for } 0 \le x \le 1 \\ \\ \Rightarrow U_{(j)} &\sim \operatorname{Beta}(j, n-j+1) \end{align}Recall an earlier discussion of $\mathbb{E}|U_1 - U_2| = \mathbb{E}\left( \max(U_1,U_2) \right) - \mathbb{E}\left( \min(U_1,U_2) \right)$
Since
\begin{align} \max(U_1,U_2) &= U_{(2)} \, \text{ so } n = 2, j = 2 \\ \mathbb{E}\left( \max(U_1,U_2) \right) &= \mathbb{E} \left(\operatorname{Beta}(2, 2-2+1) \right) \\ &= \mathbb{E} \left( \operatorname{Beta}(2, 1) \right) \\ &= \frac{2}{2+1} \\ &= \frac{2}{3} \\ \\ \min(U_1,U_2) &= U_{(1)} \, \text{ so } n = 2, j = 1 \\ \mathbb{E}\left( \min(U_1,U_2) \right) &= \mathbb{E} \left(\operatorname{Beta}(1, 2-1+1) \right) \\ &= \mathbb{E} \left( \operatorname{Beta}(1,2) \right) \\ &= \frac{1}{2+1} \\ &= \frac{1}{3} \\ \\ \Rightarrow \mathbb{E}|U_1 - U_2| &= \frac{2}{3} - \frac{1}{3} \\ &= \boxed{\frac{1}{3}} \end{align}
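A quick simulation sketch of these last results (sample size and seed are arbitrary):

```python
# Check E(max(U1, U2)) = 2/3, E(min(U1, U2)) = 1/3, and E|U1 - U2| = 1/3.
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=(10**6, 2))

print(u.max(axis=1).mean())              # approx 2/3, mean of Beta(2, 1)
print(u.min(axis=1).mean())              # approx 1/3, mean of Beta(1, 2)
print(np.abs(u[:, 0] - u[:, 1]).mean())  # approx 1/3
```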
If you understand conditional probability, then you can extend that to conditional expectation.
\begin{align} \text{consider } \mathbb{E}(X|A) & \text{ where } A \text{ is an event} \\ \\ \mathbb{E}(X) &= \mathbb{E}(X|A)P(A) + \mathbb{E}(X|A^{\complement})P(A^{\complement}) \\ \\ \mathbb{E}(X) &= \sum_{x} x \, P(X=x) &\text{where, for discrete } X \text{, you expand } P(X=x) \text{ with LOTP} \end{align}We will go more into this next time...
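For now, one tiny concrete instance of the first identity (a made-up example): let $X$ be a fair die roll and $A$ the event that the roll is even. Then
\begin{align} \mathbb{E}(X|A) = 4, \quad \mathbb{E}(X|A^{\complement}) = 3, \quad \mathbb{E}(X) = 4 \cdot \tfrac{1}{2} + 3 \cdot \tfrac{1}{2} = 3.5 \end{align}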
Now consider this paradox before we leave off.
There are two envelopes with cash inside them. You do not know how much is inside, only that one envelope has twice as much as the other.
Let's say you open up one of the envelopes and find \$100 inside.
Should you switch?
Well, the other envelope could contain either \$50 or \$200. The mean of those two amounts is \$125, so wouldn't that mean you should switch?
But then again, it doesn't matter that the envelope you opened contained \$100: it could have been any amount $n$. So the other envelope could hold $\frac{n}{2}$ or $2n$, the average being $\frac{5n}{4}$, so you should switch. But then the same argument applies, so you should switch back. And then it applies again, so you should switch again...? And again...? And again...? Ad infinitum, ad nauseam.
To be continued.