Lecture 17: Moment Generating Functions (MGFs), hybrid Bayes' rule, Laplace's rule of succession

Stat 110, Prof. Joe Blitzstein, Harvard University

$\operatorname{Expo}(\lambda)$ and the Memorylessness Property

Theorem: If $X$ is a positive, continuous r.v. with the memorylessness property, then $X \sim \operatorname{Expo}(\lambda)$ for some $\lambda$.

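Let $F$ be the CDF of $X$, and let $G(x) = P(X \ge x) = 1 - F(x)$ be its survival function.

Memorylessness says $P(X \ge s + t \mid X \ge s) = P(X \ge t)$ for all $s, t \ge 0$. Since $\{X \ge s + t\} \subseteq \{X \ge s\}$, this is $\frac{G(s+t)}{G(s)} = G(t)$, i.e., $G(s + t) = G(s) \, G(t)$.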

This time, rather than solving for $s$ or $t$, we solve for the function $G$ itself, to show that (in the continuous case) only the exponential function has the memorylessness property.

\begin{align} & \text{let } s = t & \quad \\ & \Rightarrow & G(2t) &= G(t + t) = G(t) \, G(t) = G(t)^{2} & \quad \\ & & G(3t) &= G(2t) \, G(t) = G(t)^{2} \, G(t) = G(t)^{3} & \quad \\ & &\dots & \quad \\ & & G(kt) &= G(t)^{k} ~~~~ \text{ for every positive integer } k & \quad \\ \\ & \text{case where } k = \frac{1}{n} & \quad \\ & \Rightarrow & G\left(2 \, \frac{t}{2}\right) &= G\left(\frac{t}{2}\right)^{2} \text{ so } G\left(\frac{t}{2}\right) = \sqrt{G(t)} = G(t)^{1/2} & \quad \\ & & G\left(\frac{t}{3}\right) &= G(t)^{1/3} & \quad \\ & &\dots & \quad \\ & & G\left(\frac{t}{n}\right) &= G(t)^{1/n} & \quad \\ \\ & \text{case where } k = \frac{m}{n} & \quad \\ & \Rightarrow & G\left(\frac{m}{n} \, t \right) &= G\left(\frac{t}{n}\right)^{m} = G(t)^{m/n} & \quad \\ \\ & \text{let } x \in \mathbb{Q}, \ x \ge 0 & \quad \\ & \Rightarrow & G(x \, t) &= G(t)^{x} & \quad \\ & \text{since } G = 1 - F \text{ is continuous} & \quad \\ & \Rightarrow & G(x \, t) &= G(t)^{x} ~~~~ \text{ for all real } x \ge 0 & \quad \\ \\ & \text{now let } t = 1 & \quad \\ & \Rightarrow & G(x) &= G(1)^{x} & \quad \\ & & &= e^{x \ln G(1)} ~~~~ \text{ where } 0 < G(1) < 1 \text{, so } \ln G(1) \text{ is some negative real number} \\ & & &= e^{-\lambda x} ~~~~ \text{ with } \lambda = -\ln G(1) > 0 & \quad \blacksquare \\ \end{align}

And so now we see that in the continuous case, $\operatorname{Expo}(\lambda)$ is the only distribution with the memorylessness property.
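As an informal illustration (not part of the proof), the Python sketch below estimates $P(X \ge s + t \mid X \ge s)$ and $P(X \ge t)$ by simulation. The rate `lam = 1.5`, the times `s` and `t`, the seed, and the comparison distribution $\operatorname{Unif}(0, 2)$ are all arbitrary assumptions. For the Exponential, both estimates should be near $e^{-\lambda t}$; for the Uniform, they should disagree.

```python
import math
import random


def survival(samples, x):
    """Estimate G(x) = P(X >= x) from a list of samples."""
    return sum(1 for v in samples if v >= x) / len(samples)


def conditional_survival(samples, s, t):
    """Estimate P(X >= s + t | X >= s) using only samples that reached s."""
    past_s = [v for v in samples if v >= s]
    return survival(past_s, s + t)


random.seed(110)
n_samples = 200_000
lam = 1.5        # assumed rate, chosen only for illustration
s, t = 0.5, 0.4  # assumed times

expo = [random.expovariate(lam) for _ in range(n_samples)]
unif = [random.uniform(0, 2) for _ in range(n_samples)]  # not memoryless

print("Expo:", conditional_survival(expo, s, t), survival(expo, t), math.exp(-lam * t))
print("Unif:", conditional_survival(unif, s, t), survival(unif, t))
```

For $\operatorname{Unif}(0, 2)$ the exact values are $P(X \ge 0.9 \mid X \ge 0.5) = 1.1/1.5 \approx 0.733$ versus $P(X \ge 0.4) = 0.8$. Having already waited until time $s$ therefore changes the answer.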

Moment Generating Function (MGF)

Moment generating functions are an alternative way to describe a distribution.

Definition

A random variable $X$ has MGF $M(t) = \mathbb{E}(e^{tX})$, as a function of $t$, if this is finite on some $(-a, a)$ where $a > 0$.

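Note that for each fixed $t$, $e^{tX}$ is a function of the random variable $X$ and is therefore itself a random variable, so it makes sense to take its expected value $\mathbb{E}(e^{tX})$. Here $t$ is a dummy variable: $M$ is a function of $t$, not of $X$.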

But why is this called moment-generating?

\begin{align} \mathbb{E}(e^{tX}) &= \mathbb{E}\left(\sum_{n=0}^{\infty} \frac{X^{n} \, t^{n}}{n!} \right) &\quad \text{ Taylor expansion of } e^{tX} \\ &= \sum_{n=0}^{\infty} \frac{\mathbb{E}(X^{n}) \, t^{n}}{n!} &\quad \text{ swapping } \mathbb{E} \text{ and the sum (justified for } |t| < a \text{); } \mathbb{E}(X^{n}) \text{ is called the } n^{th} \text{ moment} \\ \end{align}
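To see the series identity numerically, the Python sketch below uses an assumed example r.v., a fair six-sided die $X$ uniform on $\{1, \dots, 6\}$. It compares $\mathbb{E}(e^{tX})$ computed directly from the PMF with a truncated version of $\sum_{n} \mathbb{E}(X^n) \, t^n / n!$. The cutoff of 60 terms is an arbitrary choice, but it is ample for the small $|t|$ used here.

```python
import math

# Assumed example r.v.: a fair die, X uniform on {1, ..., 6}
VALUES = range(1, 7)
PROB = 1 / 6


def moment(n):
    """n-th moment E(X^n) of the die."""
    return sum(PROB * x**n for x in VALUES)


def mgf_direct(t):
    """M(t) = E(e^{tX}), computed straight from the PMF."""
    return sum(PROB * math.exp(t * x) for x in VALUES)


def mgf_series(t, terms=60):
    """Partial sum of sum_{n >= 0} E(X^n) t^n / n!."""
    return sum(moment(n) * t**n / math.factorial(n) for n in range(terms))


for t in (-0.5, 0.1, 0.7):
    print(f"t={t:+.1f}  direct={mgf_direct(t):.12f}  series={mgf_series(t):.12f}")
```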

Moments

the mean $\mathbb{E}(X)$ of a random variable $X$ is its first moment
the second moment of $X$ is $\mathbb{E}(X^{2})$, which helps us derive $\operatorname{Var}(X) = \mathbb{E}(X^{2}) - (\mathbb{E}X)^{2}$
higher moments $\mathbb{E}(X^{n})$ can be generated from the MGF in the same way

Three reasons why the MGF is important

Let $X$ have MGF $M(t)$.

The $n^{th}$ moment $\mathbb{E}(X^{n})$ is the coefficient of $\frac{t^{n}}{n!}$ in the Taylor series of $M$, i.e., $M^{(n)}(0) = \mathbb{E}(X^{n})$
The MGF determines the distribution: if $X$ and $Y$ have the same MGF, then they have the same CDF
Sums of independent random variables (convolutions) are difficult to work with directly, but with MGFs they are easy

MGF of a Sum

If $X$ and $Y$ are independent r.v.s with known MGFs $M_X$ and $M_Y$, then the MGF of $X + Y$ is easy to find: it is the product of the two MGFs.

\begin{align} M_{X+Y}(t) &= \mathbb{E}(e^{t(X+Y)}) \\ &= \mathbb{E}(e^{tX} \, e^{tY}) \\ &= \mathbb{E}(e^{tX}) \, \mathbb{E}(e^{tY}) &\quad \text{ by independence} \\ &= M_X(t) \, M_Y(t) \end{align}
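The Python sketch below checks this product rule numerically. For two independent discrete r.v.s, the PMF of $X + Y$ requires a convolution (a double loop over both PMFs), whereas its MGF is simply $M_X(t) \, M_Y(t)$. The distributions are assumed examples: $X \sim \operatorname{Bern}(0.3)$ and $Y$ uniform on $\{1, \dots, 6\}$.

```python
import math
from collections import defaultdict


def mgf(pmf, t):
    """M(t) = E(e^{tX}) for a PMF given as {value: probability}."""
    return sum(prob * math.exp(t * x) for x, prob in pmf.items())


def convolve(pmf_x, pmf_y):
    """PMF of X + Y for independent discrete X and Y."""
    out = defaultdict(float)
    for x, px in pmf_x.items():
        for y, py in pmf_y.items():
            out[x + y] += px * py
    return dict(out)


# Assumed example distributions: X ~ Bern(0.3), Y uniform on {1, ..., 6}
pmf_x = {0: 0.7, 1: 0.3}
pmf_y = {k: 1 / 6 for k in range(1, 7)}
pmf_sum = convolve(pmf_x, pmf_y)

for t in (-1.0, 0.25, 0.8):
    # MGF of the convolved PMF vs. product of the individual MGFs
    print(t, mgf(pmf_sum, t), mgf(pmf_x, t) * mgf(pmf_y, t))
```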

MGF for $\operatorname{Bern}(p)$

Given $X \sim \operatorname{Bern}(p)$, we obtain the MGF with

\begin{align} M(t) &= \mathbb{E}(e^{tX}) \\ &= p \, e^{t \cdot 1} + q \, e^{t \cdot 0} \\ &= p \, e^t + q &\quad \text{ where } q = 1-p \end{align}

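MGF for $\operatorname{Bin}(n,p)$

Given $X \sim \operatorname{Bin}(n,p)$, we obtain the MGF with

\begin{align} M(t) &= \mathbb{E}(e^{t(X_1 + \cdots + X_n)}) &\quad \text{ writing } X = X_1 + \cdots + X_n \text{ with } X_j \text{ i.i.d. } \operatorname{Bern}(p) \\ &= \mathbb{E}(e^{tX_1}) \cdots \mathbb{E}(e^{tX_n}) &\quad \text{ by independence} \\ &= \left( p \, e^t + q \right)^n &\quad \text{ each factor is the } \operatorname{Bern}(p) \text{ MGF} \end{align}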
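A minimal Python sketch, assuming example values $n = 10$ and $p = 0.3$, does two things. First, it checks the closed form against $\sum_k e^{tk} \binom{n}{k} p^k q^{n-k}$ computed from the Binomial PMF. Second, it uses $M'(0) = \mathbb{E}(X)$ and $M''(0) = \mathbb{E}(X^2)$, approximating both derivatives with central finite differences. The step size `h` is an arbitrary choice. The results should be close to $\mathbb{E}(X) = np = 3$ and $\mathbb{E}(X^2) = npq + (np)^2 = 11.1$.

```python
import math

n, p = 10, 0.3  # assumed example parameters
q = 1 - p


def bin_mgf_closed(t):
    """Closed form (p e^t + q)^n for Bin(n, p)."""
    return (p * math.exp(t) + q) ** n


def bin_mgf_from_pmf(t):
    """E(e^{tX}) summed directly over the Bin(n, p) PMF."""
    return sum(math.exp(t * k) * math.comb(n, k) * p**k * q ** (n - k) for k in range(n + 1))


h = 1e-4  # finite-difference step (arbitrary small value)
first_moment = (bin_mgf_closed(h) - bin_mgf_closed(-h)) / (2 * h)                        # ~ M'(0)
second_moment = (bin_mgf_closed(h) - 2 * bin_mgf_closed(0) + bin_mgf_closed(-h)) / h**2  # ~ M''(0)

print(bin_mgf_closed(0.5), bin_mgf_from_pmf(0.5))
print(first_moment, n * p)                          # E(X) = np
print(second_moment - first_moment**2, n * p * q)   # Var(X) = npq
```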

MGF for standard normal $Z \sim \mathcal{N}(0,1)$

Given standard normal $Z \sim \mathcal{N}(0,1)$, we obtain the MGF with

\begin{align} M(t) &= \mathbb{E}(e^{tZ}) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tz - z^2/2} \, dz \\ &= \frac{1}{\sqrt{2\pi}} ~~ e^{t^2/2} \int_{-\infty}^{\infty} e^{-\frac{1}{2}\,(z-t)^2} \, dz &\quad \text{ completing the square} \\ &= \frac{1}{\sqrt{2\pi}} ~~ e^{t^2/2} ~~ \sqrt{2\pi} &\quad \text{ the } \mathcal{N}(t, 1) \text{ PDF integrates to 1 (recall the normal PDF, Lec. 13)} \\ &= e^{t^2/2} \end{align}

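And just in case you've forgotten how to complete the square: $tz - \frac{z^2}{2} = -\frac{1}{2}\left(z^2 - 2tz + t^2\right) + \frac{t^2}{2} = -\frac{1}{2}(z - t)^2 + \frac{t^2}{2}$.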

MGF for normal $X \sim \mathcal{N}(\mu, \sigma^2)$

\begin{align} M(t) &= \mathbb{E}(e^{tX}) \\ &= \int_{-\infty}^{\infty} e^{tx} \, \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left(\frac{x - \mu}{\sigma}\right)^2} \, dx \\ &= \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{x^2 - 2x\mu + \mu^2}{2\sigma^2} + tx} \, dx \\ &= \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{x^2 - 2x\mu - 2\sigma^{2}tx + \mu^2}{2\sigma^2}} \, dx \\ &= \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2\sigma^2} \left( x^2 - 2x(\mu + \sigma^{2}t) + \mu^2 \right)} \, dx \\ &= \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2\sigma^2} \left( (x - (\mu + \sigma^{2}t))^2 - (\mu + \sigma^{2}t)^2 + \mu^2 \right)} \, dx &\quad \text{ completing the square} \\ &= e^{-\frac{1}{2\sigma^2} \left( - (\mu + \sigma^{2}t)^2 + \mu^2 \right)} \int_{-\infty}^{\infty} \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - (\mu + \sigma^{2}t)}{\sigma} \right)^2} \, dx \\ &= e^{-\frac{1}{2\sigma^2} \left( - \mu^2 - 2 \mu \sigma^{2}t - \sigma^{4}t^2 + \mu^2 \right)} &\quad \text{ the integrand is the } \mathcal{N}(\mu + \sigma^{2}t, \sigma^2) \text{ PDF, so the integral is 1} \\ &= e^{\frac{2 \mu \sigma^{2}t + \sigma^{4}t^2}{2\sigma^2} } \\ &= e^{\mu t + \frac{\sigma^2 t^2}{2} } \end{align}
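As a sanity check on this derivation, the Python sketch below approximates $\int e^{tx} \, \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx$ with a plain midpoint rule and compares it with $e^{\mu t + \sigma^2 t^2 / 2}$. The derivation shows that the integrand peaks at $\mu + \sigma^2 t$, so the sketch integrates over $\mu + \sigma^2 t \pm 15\sigma$. That window, the number of grid points, and the test parameters are assumptions. The case $\mu = 0$, $\sigma = 1$ also checks the standard normal result $e^{t^2/2}$.

```python
import math


def normal_pdf(x, mu, sigma):
    """PDF of N(mu, sigma^2)."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def normal_mgf_numeric(t, mu, sigma, half_width=15.0, steps=20_000):
    """Midpoint-rule approximation of E(e^{tX}) for X ~ N(mu, sigma^2).

    The integrand e^{tx} * pdf(x) is centered at mu + sigma^2 t, so we
    integrate over mu + sigma^2 t +/- half_width * sigma.
    """
    center = mu + sigma**2 * t
    a = center - half_width * sigma
    dx = 2 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * dx
        total += math.exp(t * x) * normal_pdf(x, mu, sigma)
    return total * dx


def normal_mgf_closed(t, mu, sigma):
    """Closed form e^{mu t + sigma^2 t^2 / 2}."""
    return math.exp(mu * t + sigma**2 * t**2 / 2)


for mu, sigma, t in [(0.0, 1.0, 0.7), (1.5, 2.0, -0.3), (-1.0, 0.5, 1.1)]:
    print(mu, sigma, t, normal_mgf_numeric(t, mu, sigma), normal_mgf_closed(t, mu, sigma))
```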

Laplace's Rule of Succession

If we have observed the sun rising for the past $n$ days in succession, then what is the probability that the sun will rise tomorrow?

Let $p$ be the probability that the sun rises on any given day, and let $X_k$ be the indicator that it rises on day $k$. Conditional on $p$, we model the days as $X_1, X_2, \dots$ i.i.d. $\operatorname{Bern}(p)$. But for the question above, we do not know what $p$ is, so, as Bayesians do, we treat $p$ itself as an r.v.

Problem structure

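Let $p \sim \operatorname{Unif}(0,1)$ be our prior; we choose $\operatorname{Unif}(0,1)$ because $p$ could be any value in $(0,1)$ and we have no reason to favor one value over another
Let $S_n = X_1 + X_2 + \cdots + X_n$, the number of days (out of the first $n$) on which the sun rose
So our model is $S_n \mid p \sim \operatorname{Bin}(n,p)$, with $p \sim \operatorname{Unif}(0,1)$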

Questions

What is the posterior $p | S_n$?
What is $P(X_{n+1} = 1 \mid S_n = n)$, the probability that the sun will rise tomorrow given that it has risen on each of the past $n$ days?

Solution

We write $f$ for the PDF of $p$: $f(p)$ is the prior and $f(p \mid S_n = k)$ is the posterior. We start with the general case of $k$ sunrises in $n$ days:

\begin{align} f( p \mid S_n=k) &= \frac{P(S_n=k \mid p) f(p)}{P(S_n=k)} &\quad \text{ hybrid Bayes' rule (PMF for } S_n \text{, PDF for } p \text{)} \\ &\propto p^k \, (1-p)^{n-k} \end{align}

Since

the prior is $f(p) = 1$ on $(0,1)$, because it is Uniform;
$P(S_n = k)$ does not depend on $p$;
the binomial coefficient in $P(S_n = k \mid p) = \binom{n}{k} p^k \, (1-p)^{n-k}$ also does not depend on $p$, and can be treated as a constant;

we can work with $f(p \mid S_n=k)$ up to proportionality and normalize at the end to obtain a valid PDF.
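The proportionality argument can be mirrored numerically. In the Python sketch below, a midpoint grid of assumed size stands in for exact integration. The unnormalized posterior $p^k (1-p)^{n-k}$ is the likelihood times the flat prior $f(p) = 1$, with every factor that does not involve $p$ dropped. The sketch evaluates it on a grid over $(0, 1)$ and divides by its numerical integral to obtain a valid PDF. The data $n = 10$, $k = 7$ are arbitrary example values.

```python
def posterior_on_grid(n, k, steps=100_000):
    """Grid approximation of f(p | S_n = k) under a Unif(0, 1) prior.

    Returns (grid, density, dp), where density integrates to 1 over (0, 1).
    """
    dp = 1.0 / steps
    grid = [(i + 0.5) * dp for i in range(steps)]              # midpoints in (0, 1)
    unnormalized = [p**k * (1 - p) ** (n - k) for p in grid]   # likelihood x prior (prior = 1)
    normalizer = sum(unnormalized) * dp                        # approx. integral over (0, 1)
    density = [u / normalizer for u in unnormalized]
    return grid, density, dp


n, k = 10, 7  # assumed example data: the sun rose on 7 of 10 days
grid, density, dp = posterior_on_grid(n, k)
print("integrates to:", sum(density) * dp)
print("posterior mean of p:", sum(p * f for p, f in zip(grid, density)) * dp)
```

With $k = n$, the grid density should match the $(n+1) \, p^n$ obtained below by exact normalization.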

Now let's consider the case of our question, where the sun has risen for $n$ days straight:

\begin{align} \text{with } k = n \text{: } \int_{0}^{1} p^n \, dp &= \frac{1}{n+1} \\ \\ \text{so } f(p \mid S_n=n) &= \boxed{(n+1) \, p^n} &\quad \text{ normalizing for a valid PDF}\\ \\ \text{and } P(X_{n+1}=1 \mid S_n=n) &= \int_{0}^{1} p \, (n+1) \, p^n \, dp &\quad \text{ Fundamental Bridge: this is } \mathbb{E}(p \mid S_n=n) \\ &= \int_{0}^{1} (n+1) \, p^{n+1} \, dp \\ &= \boxed{\frac{n+1}{n+2}} \end{align}
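The answer $\frac{n+1}{n+2}$ can also be checked by simulating the Bayesian model directly. The Python sketch below draws $p \sim \operatorname{Unif}(0,1)$ and then i.i.d. $\operatorname{Bern}(p)$ sunrises. It keeps only the runs in which the sun rose on all of the first $n$ days and records how often it also rose on day $n+1$. The choice $n = 5$, the number of trials, and the seed are assumptions for illustration.

```python
import random


def rule_of_succession_sim(n, trials=200_000, seed=17):
    """Estimate P(X_{n+1} = 1 | S_n = n) under p ~ Unif(0, 1) by rejection."""
    rng = random.Random(seed)
    kept = 0       # runs where the sun rose on all of days 1..n
    rose_next = 0  # ... and also on day n + 1
    for _ in range(trials):
        p = rng.random()                              # prior draw p ~ Unif(0, 1)
        if all(rng.random() < p for _ in range(n)):   # condition on S_n = n
            kept += 1
            if rng.random() < p:                      # X_{n+1} given p
                rose_next += 1
    return rose_next / kept


n = 5
print(rule_of_succession_sim(n), (n + 1) / (n + 2))
```

Rejection sampling is used here only because it mirrors the conditioning literally. It discards about $n/(n+1)$ of the runs, since $P(S_n = n) = \int_0^1 p^n \, dp = \frac{1}{n+1}$ under this prior.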
