Moment Generating Functions and Limit Theory

MGF: Moment Generating Function

Definition

The MGF of a random variable $X$ is:

$$M_X(t) = \mathbf{E}[e^{tX}]$$

for any $t\in \mathbb{R}$ where the expectation exists.

Some people call $M_X(.)$ the Laplace transform of the distribution of $X$ (up to the sign convention $t \mapsto -t$).

Definition

The characteristic function of $X$ is $C_X(t) = \displaystyle \mathbf{E}[e^{itX}]$ where $i=\sqrt{-1}$ and $t\in \mathbb{R}$.

Some call $C_X(.)$ the Fourier transform of the distribution of $X$.

Note: Algebraically, $C_X(t)=M_X(it)$


Example:

$X\sim Exponential(\lambda=1)$

$$M_X(t) = \mathbf{E}[e^{tX}] = \int_0^\infty e^{-x} e^{tx}dx = \int_0^\infty e^{-(1-t)x} dx$$

In order for the above integral to exist, we need $t < 1$,

and as a result, $\displaystyle M_X(t) = \frac{1}{1-t} ~\ \forall t<1$

Note: If $Y\sim Exponential(\lambda)$, then $Y = \frac{1}{\lambda} X$ therefore:

$$M_Y(t) = \mathbf{E}[e^{t\frac{X}{\lambda}}] = M_X\!\left(\frac{t}{\lambda}\right) = \frac{1}{1 - \displaystyle \frac{t}{\lambda}} \ \ \forall t<\lambda$$
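
As a quick numerical sanity check, here is a minimal sketch (assuming numpy is available; the seed and the values of $\lambda$ and $t$ are arbitrary) comparing a Monte Carlo estimate of $\mathbf{E}[e^{tY}]$ with the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t = 2.0, 0.5                                  # hypothetical values; need t < lam
y = rng.exponential(scale=1/lam, size=1_000_000)   # Y ~ Exponential(lam), mean 1/lam

mc_estimate = np.mean(np.exp(t * y))               # Monte Carlo estimate of E[e^{tY}]
closed_form = 1 / (1 - t / lam)                    # the formula derived above
print(mc_estimate, closed_form)                    # the two numbers should be close
```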

For Poisson: $N \sim Poisson(\lambda)$

$$M_N(t) = \mathbf{E}[e^{tN}] = \sum_{n=0}^\infty e^{tn} ~ e^{-\lambda} ~ \frac{\lambda^n}{n!} = e^{-\lambda} \sum_{n=0}^\infty \frac{(\lambda e^t)^n}{n!} = e^{-\lambda}\sum_{n=0}^\infty \frac{x^n}{n!} \text{ where }x=\lambda e^t$$
  • Reminder: $\displaystyle \sum_{n=0}^\infty \frac{x^n}{n!} = e^x$

So finally, we get: $$M_N(t) = e^{-\lambda}e^x = e^{\displaystyle \lambda(e^t - 1)} \ \ \forall t\in \mathbb{R}$$

Trick for finding the value of $\displaystyle \sum_{n=0}^\infty \frac{x^n}{n!}$:

$\sum_{n=0}^\infty \frac{x^n}{n!} = \sum_{n=0}^\infty e^{-x}\frac{x^n}{n!}\, e^{x}$, and we know that $e^{-x}\frac{x^n}{n!}$ is the PMF of a Poisson random variable with parameter $x$, so the sum of the PMF over all $n$ must be $1$. Therefore, $\displaystyle \sum_{n=0}^\infty \frac{x^n}{n!} = e^x$.
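
To see the derivation numerically, a short sketch (plain Python; the values of $\lambda$ and $t$ are hypothetical) compares a truncated version of the sum above with $e^{\lambda(e^t-1)}$:

```python
import math

lam, t = 3.0, 0.4                     # hypothetical values
# partial sum of e^{tn} e^{-lam} lam^n / n!  (terms decay fast, 100 terms is plenty)
series = sum(math.exp(t * n) * math.exp(-lam) * lam**n / math.factorial(n)
             for n in range(100))
closed_form = math.exp(lam * (math.exp(t) - 1))
print(series, closed_form)            # should agree to high precision
```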

Proposition

If $M_X(t)$ is finite for every $t$ in an open interval containing $0$, then for any integer $k\ge 1$: $$\mathbf{E}[X^k] = \frac{d^k~M_X}{dt^k}(t=0)$$

The $k$th derivative of the function $M_X$ evaluated at $t=0$ is denoted $M_X^{(k)}(0)$.

$\mathbf{E}[X^k]$ is called the moment of order $k$ of the distribution of $X$.

Proof:
$$\frac{d^kM_X}{dt^k}(t) = \frac{d^k}{dt^k} \mathbf{E}[e^{tX}] \\ = \mathbf{E}\!\left[\frac{d^k}{dt^k} e^{tX}\right] \\ = \mathbf{E}[X^k e^{tX}]$$

(exchanging the derivative and the expectation, which is justified when $M_X$ is finite in an open interval around $0$)

Then, we evaluate that at $t=0$: $$\frac{d^k M_X}{dt^k}(t=0) = \mathbf{E}[X^k e^0] = \mathbf{E}[X^k]$$

Example: $k$th moment of Exponential

For $X\sim Exponential(1)$ we know $M_X(t) = \frac{1}{1-t}$

$$M'_X(t) = \frac{1}{(1-t)^2}, \qquad M''_X(t) = \frac{2}{(1-t)^3}, \qquad M^{(k)}_X(t) = \frac{k!}{(1-t)^{k+1}}$$

Therefore, by that proposition:

$$\mathbf{E}[X^k] = M_X^{(k)}(0) = k!$$
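
A quick symbolic check of the proposition (a sketch assuming sympy is available): differentiate $M_X(t)=\frac{1}{1-t}$ $k$ times and evaluate at $t=0$.

```python
import sympy as sp

t = sp.symbols('t')
M = 1 / (1 - t)                             # MGF of Exponential(1)

for k in range(1, 5):
    kth_moment = sp.diff(M, t, k).subs(t, 0)
    print(k, kth_moment, sp.factorial(k))   # the k-th moment should equal k!
```
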
Example: how about $Exponential(\lambda)$

$Y = \frac{X}{\lambda}$

$$\mathbf{E}[Y^k]= \mathbf{E}\!\left[\left(\frac{1}{\lambda}X\right)^k\right] = \frac{1}{\lambda^k}\mathbf{E}[X^k] = \frac{k!}{\lambda^k}$$
Example: $N \sim Poisson(\lambda)$


$\mathbf{E}[N] = M'_N(0)$ and we know that $M_N(t) = e^{\displaystyle \lambda(e^t-1)}$

$M'_N(t) = \lambda e^t e^{\displaystyle \lambda(e^t-1)}$

$$\mathbf{E}[N] = \lambda e^0 e^{\displaystyle \lambda(e^0-1)} = \lambda$$

$\displaystyle M''_N(t) = \lambda e^t \lambda e^t e^{\displaystyle \lambda(e^t-1)} + \lambda e^t e^{\displaystyle \lambda(e^t-1)}$

$$\mathbf{E}[N^2] = \lambda^2 + \lambda$$

So variance of $N$ is:

$$\mathbf{Var}[N] = \lambda^2 + \lambda - \lambda^2 = \lambda$$
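
The same symbolic differentiation (again a sketch assuming sympy) recovers $\mathbf{E}[N]$, $\mathbf{E}[N^2]$, and $\mathbf{Var}[N]$ from $M_N$:

```python
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = sp.exp(lam * (sp.exp(t) - 1))     # MGF of Poisson(lambda)

EN  = sp.diff(M, t, 1).subs(t, 0)     # E[N]
EN2 = sp.diff(M, t, 2).subs(t, 0)     # E[N^2]
var = sp.simplify(EN2 - EN**2)        # Var[N]
print(sp.simplify(EN), sp.simplify(EN2), var)   # lambda, lambda**2 + lambda, lambda
```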


Proposition

  • $M_{aX+b}(t) =e^{tb}~M_X(ta)$

  • If $X$ and $Y$ are independent, then $\displaystyle M_{X+Y}(t) = M_X(t)\, M_Y(t)$

    Indeed, $$\mathbf{E}[e^{t(X+Y)}] = \mathbf{E}[e^{tX} e^{tY}] = \mathbf{E}[e^{tX}] \mathbf{E}[e^{tY}]$$

This works for a sum of $n$ independent terms.

Theorem

$M_X$ characterizes the law of $X$. In other words, if $X$ and $Y$ are two random variables and $M_X(t) = M_Y(t)$ for all $t$ near $0$, then $X$ and $Y$ have the same distribution.

Even better: if $M_Y = M_{X_1}~M_{X_2}$ near $0$, then $Y$ has the same distribution as $X_1 + X_2$ where $X_1$ and $X_2$ are taken to be independent.

Example:

Let $X_1\sim Bernoulli(p)$, then $M_{X_1}(t) = 1-p + pe^t$.

Consequently, let $X\sim Binomial(n,p)$. We know that $X$ can be represented as $X=X_1+X_2+...+X_n$ where the $X_i$'s are $i.i.d.$ Bernoulli random variables with parameter $p$. Therefore, $M_X(t) = \prod_{i=1}^n M_{X_i}(t) = \left(1-p+pe^t\right)^n$

Note: suppose instead we had proved independently that $M_X(t)$ has the above formula. Then the "even better" statement above shows that $X$ has the same distribution as $\sum_i X_i$ with $iid$ Bernoulli($p$) random variables $X_i$.
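
A minimal numerical sketch (assuming numpy; the values of $n$, $p$, $t$ are arbitrary) comparing the empirical MGF of a Binomial sample with the product formula $(1-p+pe^t)^n$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, t = 10, 0.3, 0.25                        # hypothetical values
x = rng.binomial(n, p, size=1_000_000)         # X ~ Binomial(n, p)

mc_estimate = np.mean(np.exp(t * x))           # empirical E[e^{tX}]
closed_form = (1 - p + p * np.exp(t)) ** n     # product of n Bernoulli MGFs
print(mc_estimate, closed_form)
```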

Example: Normal

Let $Z\sim \mathcal{N}(0,1)$. Then, we know that the density of $Z$ is $f_Z(z) = \displaystyle \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$. Then it's not too hard to show that $M_Z(t) = e^{t^2/2}$: complete the square, $tz - \frac{z^2}{2} = -\frac{(z-t)^2}{2} + \frac{t^2}{2}$, and the remaining integral is that of an $\mathcal{N}(t,1)$ density, which equals $1$.

$$Z\sim \mathcal{N}(0,1) \Longrightarrow M_Z(t) = \displaystyle e^{\frac{t^2}{2}} \ \forall t$$

By the scaling rule ($M_{aX+b}(t) = e^{tb}M_X(ta)$, writing $X = \mu + \sigma Z$):

$$X\sim \mathcal{N}(\mu,\sigma^2) \Longrightarrow M_X(t) = \displaystyle e^{\mu t + \frac{\sigma^2 t^2}{2}} \ \forall t$$
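
If you want to check the standard normal MGF without doing the algebra, here is a sketch (assuming scipy; the value of $t$ is arbitrary) that evaluates $\int e^{tz} f_Z(z)\,dz$ numerically and compares it with $e^{t^2/2}$:

```python
import numpy as np
from scipy.integrate import quad

t = 0.7                                              # hypothetical value
integrand = lambda z: np.exp(t * z - z**2 / 2) / np.sqrt(2 * np.pi)
mgf_numeric, _ = quad(integrand, -np.inf, np.inf)    # E[e^{tZ}] by numerical integration
print(mgf_numeric, np.exp(t**2 / 2))                 # should match e^{t^2/2}
```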


Convergence Types

Central Limit Theorem

Note: Let $X_i$ be $iid\ \ \sim \mathcal{N}(0,1)$. Let $S_n= \sum_{i=1}^n X_i$.

Let $Y_n = \frac{S_n}{\sqrt n}$. (we divided by the standard deviation of $S_n$)

We know that $S_n \sim \mathcal{N}(0,n)$ therefore, $Y_n \sim \mathcal{N}(0,1)$

The standardized partial sums of the $X_i$'s are exactly normal, i.e. $\sim \mathcal{N}(0,1)$.

More generally, if the $X_i$'s are not normal, the nice conclusion remains true, but only approximately.

Theorem (Central Limit Theorem)

This is arguably the most important theorem in probability.

Let $X_i$, $i=1,2,\dots,n,\dots$ be $iid$ and assume $\sigma^2 = \mathbf{Var}[X_i]< \infty$. Notation: $\mu=\mathbf{E}[X_i]$

Then, let $S_n=X_1+X_2+...+X_n - n\mu$

Let $Z_n=\displaystyle \frac{S_n}{\sigma \sqrt{n}}$

So we say "$Z_n$ is the standardized partial sum of the sequence of $X_i$'s".

Conclusion: the CDF of $Z_n$ converges to the CDF of $\mathcal{N}(0,1)$.

We say $Z_n$ converges in distribution to $\mathcal{N}(0,1)$. We can write:

$$\lim_{n\to \infty} \mathbf{P}[Z_n\le a] = \int_{-\infty}^a \frac{1}{\sqrt{2\pi}} e^{-z^2/2} dz$$
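
A minimal simulation sketch of this statement (assuming numpy and scipy; the choice of Exponential(1) summands, $n$, and $a$ is arbitrary): standardize sums of iid variables and compare the empirical $\mathbf{P}[Z_n\le a]$ with the standard normal CDF.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, reps, a = 100, 100_000, 1.0
x = rng.exponential(scale=1.0, size=(reps, n))         # iid Exponential(1): mu = sigma = 1

z_n = (x.sum(axis=1) - n * 1.0) / (1.0 * np.sqrt(n))   # standardized partial sums
print(np.mean(z_n <= a), norm.cdf(a))                  # empirical P[Z_n <= a] vs N(0,1) CDF
```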

Some example applications of the central limit theorem

Example:

Let $Y_n \sim \Gamma(n,\theta)$. Let $S_n=Y_n-n\theta$. Indeed, we can represent $Y_n$ as

$$Y_n = X_1 + X_2 +... + X_n \text{ where }X_i\text{'s are iid }Exponential(\lambda=\frac{1}{\theta})$$

Therefore, $S_n$ is just like in the central limit theorem. So by the CLT, the CDF of $\frac{Y_n-n\theta}{\sigma \sqrt{n}}$ (where $\sigma = \sqrt{\mathbf{Var}[X_i]} = \sqrt{1/\lambda^2} = \theta$) is approximately the standard normal CDF.

$$\lim_{n\to \infty } \mathbf{P}\left[\frac{Y_n - n\theta}{\sigma \sqrt{n}}\le a\right] = \int_{-\infty}^a \frac{1}{\sqrt{2\pi}} e^{-z^2/2}~dz = F_{\mathcal{N}(0,1)}(a)$$

Of course, in this case $\sigma=\theta$, so you can substitute that in the formula above.
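
A quick check of this approximation (a sketch assuming scipy; $n$, $\theta$, and $a$ are hypothetical values): compare the exact Gamma CDF of $Y_n$ with the normal approximation.

```python
import numpy as np
from scipy.stats import gamma, norm

n, theta, a = 200, 2.0, 0.5                # hypothetical values
y = n * theta + a * theta * np.sqrt(n)     # the point where (Y_n - n*theta)/(sigma*sqrt(n)) = a

exact  = gamma.cdf(y, n, scale=theta)      # P[Y_n <= y] for Y_n ~ Gamma(n, theta)
approx = norm.cdf(a)                       # CLT approximation F_{N(0,1)}(a)
print(exact, approx)
```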

Example:

Let $Y_n\sim Binomial(n,p)$

We know that $Y_n$ can be represented as a sum of Bernoulli random variables: $Y_n = \sum_i X_i$ with $X_i\ iid \sim Bernoulli(p)$. Also, $\mathbf{E}[X_i] = p$ and $\mathbf{Var}[X_i] = p(1-p)$

Letting $\sigma = \sqrt{p(1-p)}$, we get that the CDF of $\frac{Y_n - np}{\sigma \sqrt{n}}$ is $\approx$ the CDF of $\mathcal{N}(0,1)$
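
The same kind of check for the binomial case (a sketch assuming scipy; $n$, $p$, and $a$ are hypothetical values):

```python
import numpy as np
from scipy.stats import binom, norm

n, p, a = 400, 0.3, 1.0                   # hypothetical values
sigma = np.sqrt(p * (1 - p))
y = n * p + a * sigma * np.sqrt(n)        # the point where (Y_n - np)/(sigma*sqrt(n)) = a

exact  = binom.cdf(y, n, p)               # P[Y_n <= y] for Y_n ~ Binomial(n, p)
approx = norm.cdf(a)                      # CLT approximation
print(exact, approx)
```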

Example:

A pretty good rule of thumb for simulating an $\mathcal{N}(0,1)$ random variable.

Let $X_i$ be $iid\ Uniform(0,1)$. So, we have $\mathbf{E}[X_i]=\frac{1}{2}$ and $\mathbf{Var}[X_i] = \frac{1}{12}$.

Let $S_{12} = X_1+X_2+\dots+X_{12}$. Then, $\mathbf{E}[S_{12}] = 6$ and $\mathbf{Var}[S_{12}] = 1$

Therefore, $\frac{S_{12}-6}{\sqrt{1}} = S_{12}-6$ is approximately $\mathcal{N}(0,1)$

A slightly more accurate version is $$\frac{S_{48}-24}{\sqrt{4}} \approx \mathcal{N}(0,1)$$
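
Here is a sketch of the rule of thumb in code (assuming numpy and scipy; the seed and sample size are arbitrary): generate many samples of $S_{12}-6$ and compare a few empirical quantiles with those of $\mathcal{N}(0,1)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
u = rng.uniform(0, 1, size=(200_000, 12))
z_approx = u.sum(axis=1) - 6.0                        # S_12 - 6, approximately N(0,1)

for q in (0.1, 0.5, 0.9):
    print(q, np.quantile(z_approx, q), norm.ppf(q))   # empirical vs exact N(0,1) quantiles
```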

Major Question:

We want the CDF of $\frac{S_n - n\mu}{\sigma\sqrt{n}}$ to be within an error $\epsilon$ of the CDF of $\mathcal{N}(0,1)$. How big does $n$ need to be to achieve this?

Theorem (Berry-Esseen): If $\mathbf{E}[|X_i - \mu|^3] = C < \infty$, then the size of the error satisfies $\epsilon \le C_{univ}\, \frac{C}{\sigma^3 \sqrt{n}}$

$C_{univ}$ is a universal constant $\le \frac{1}{2}$.

To be safe, we should choose $n$ such that $C_{univ}\frac{C}{\sigma^3 \sqrt{n}} \le \text{our preferred error tolerance, let's call it }\delta$.

So we need $\sqrt{n} \ge \frac{C\, C_{univ}}{\sigma^3 \delta}$ and therefore $n \ge \frac{C^2 C_{univ}^2}{\sigma^6 \delta^2}$

So, if $C$ and $\sigma$ are of order $1$, for example, we see that to get a tolerance $\delta$ of about $\frac{1}{1000}$ we will need on the order of $10^6$ terms.
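
For concreteness, a tiny helper (hypothetical, just rearranging the inequality above) that returns the required $n$ for given $C$, $\sigma$, and tolerance $\delta$:

```python
import math

def required_n(C, sigma, delta, C_univ=0.5):
    """Smallest n guaranteeing C_univ * C / (sigma**3 * sqrt(n)) <= delta."""
    return math.ceil((C_univ * C / (sigma**3 * delta)) ** 2)

# with C and sigma of order 1 and delta = 1/1000, n lands in the 10^5 - 10^6 range
print(required_n(C=1.0, sigma=1.0, delta=1e-3))   # 250000
```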

