Let $F$ be a continuous, strictly increasing CDF.
Then
\begin{align} &X = F^{-1}(U) \sim F & &\text{if } U \sim \operatorname{Unif}(0,1) & &\text{(1)}\\ \\ &F(X) \sim \operatorname{Unif}(0,1) & &\text{if } X \sim F & &\text{(2)} \end{align}

$F$ is just a function, and hence we can treat it like an operator.
And like $X$, $F(X)$ is just another r.v.
How should we interpret this universality?
\begin{align} &F(x) = P(X \le x) & &\text{does not imply that } \\ \\ \require{cancel} &\bcancel{F(X) = P(X \le X) = 1} \\ \\ \\ &F(x) = 1 - e^{-x} & &\text{where } x \gt 0 \\ &F(X) = 1 - e^{-X} \end{align}

The right way to read this: first evaluate $F$ at $x$, and then replace $x$ with $X$.
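To see statement (2) in action, here is a minimal sketch (assumptions: NumPy's default generator and SciPy's Kolmogorov-Smirnov test as a convenient uniformity check): draw $X \sim \operatorname{Expo}(1)$ and verify that $F(X) = 1 - e^{-X}$ looks $\operatorname{Unif}(0,1)$.

In [ ]:
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(110)               # arbitrary seed
x = rng.exponential(scale=1.0, size=100_000)   # X ~ Expo(1)
fx = 1 - np.exp(-x)                            # F(X) = 1 - e^{-X}

# KS test against Unif(0,1): a large p-value is consistent with
# F(X) ~ Unif(0,1)
print(kstest(fx, 'uniform'))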
These two approaches to universality of the Uniform are quite useful, for example when you want to simulate draws from another r.v.
Let $F(x) = 1 - e^{-x}$, where $x \gt 0$. This is the CDF of the Exponential distribution $\operatorname{Expo}(1)$.
Say we have $U \sim \operatorname{Unif}(0,1)$.
How can we leverage universality of $\operatorname{Unif}(0,1)$ to simulate $X \sim F$?
\begin{align} u &= 1 - e^{-x} & &\text{set } u = F(x) \text{ and solve for } x \text{ to obtain } F^{-1} \\ e^{-x} &= 1 - u \\ -x &= \ln(1 - u) \\ x &= -\ln(1-u) \\ \\ \Rightarrow F^{-1}(u) &= -\ln(1-u) \\ \Rightarrow F^{-1}(U) &= -\ln(1-U) \sim F & &\text{by universality} \end{align}

And so we could simulate 10 i.i.d. draws from $F(x) = 1 - e^{-x}$ by drawing $U \sim \operatorname{Unif}(0,1)$ 10 times and calculating $-\ln(1-U)$ each time.
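As a minimal sketch of this recipe in code (assuming NumPy; the seed is arbitrary):

In [ ]:
import numpy as np

rng = np.random.default_rng(42)     # arbitrary seed
u = rng.uniform(0, 1, size=10)      # 10 i.i.d. draws of U ~ Unif(0,1)
x = -np.log(1 - u)                  # F^{-1}(U) = -ln(1-U) ~ Expo(1)
print(x)

Since each draw of $U$ is fresh, the 10 values of $-\ln(1-U)$ are i.i.d. $\operatorname{Expo}(1)$.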
Note that
\begin{align} 1 - U &\sim \operatorname{Unif}(0,1) & &\text{by symmetry of the Uniform} \\ \\ a + bU &\sim \operatorname{Unif} & &\text{linear transformations of a Uniform are also Uniform, on a different interval} \end{align}

Non-linear transformations will, in general, not be $\operatorname{Unif}$.
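A small numerical illustration of these facts (a sketch; the third value follows from $P(U^2 \le t) = P(U \le \sqrt{t}) = \sqrt{t}$):

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=1_000_000)

# For a Unif r.v., the probability of landing in the lower half
# of its support is 0.5
print(np.mean(1 - u <= 0.5))          # ~ 0.5  : 1 - U is Unif(0,1)
print(np.mean(0.5 + 0.2*u <= 0.6))    # ~ 0.5  : a + bU is Unif(a, a+b)
print(np.mean(u**2 <= 0.5))           # ~ 0.707: U^2 is not Unif(0,1)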
Say we have r.v.s $X_1, X_2, \dots , X_n$.
$X_1, X_2, \dots , X_n$ are independent if, for all $x_1, x_2, \dots, x_n$,
\begin{align} P(X_1 \le x_1, X_2 \le x_2, \dots , X_n \le x_n) &= P(X_1 \le x_1) P(X_2 \le x_2) \cdots P(X_n \le x_n) \end{align}
Note that in this general case, $P(X_1 \le x_1, X_2 \le x_2, \dots , X_n \le x_n)$ is called the joint CDF.
Discrete r.v. $X_1, X_2, \dots , X_n$ are independent if, for all $x_1, x_2, \dots, x_n$
\begin{align} P(X_1 = x_1, X_2 = x_2, \dots , X_n = x_n) &= P(X_1 = x_1) P(X_2 = x_2) \cdots P(X_n = x_n) \end{align}
In the discrete case, $P(X_1 = x_1, X_2 = x_2, \dots , X_n = x_n)$ is called the joint PMF.
Simply put, in both the general and discrete cases, knowing the values of any subset of the r.v.s gives us no information about the rest.
This is stronger than pair-wise independence.
Consider a penny-matching game where 2 players each flip a penny. If the faces showing match (HH or TT), then one player wins; otherwise the other player wins.
Let
\begin{align} X_1, X_2 &\sim \operatorname{Bern}\left(\frac{1}{2}\right) ~~ \text{, i.i.d.} \\ \\ X_3 &= \begin{cases} 1 & \quad \text{if } X_1 = X_2 \\ 0 & \quad \text{otherwise}\\ \end{cases} \end{align}

$X_1, X_2, X_3$ are pair-wise independent (any one of them alone tells us nothing about any other), but not independent: knowing $X_1$ and $X_2$ completely determines $X_3$.
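A quick simulation sketch of the penny-matching game (sample size arbitrary) shows that the pairwise probabilities factor while the joint probability does not:

In [ ]:
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
x1 = rng.integers(0, 2, size=n)      # Bern(1/2)
x2 = rng.integers(0, 2, size=n)      # Bern(1/2), independent of x1
x3 = (x1 == x2).astype(int)          # 1 if the pennies match

# pairwise: P(X1=1, X3=1) = P(X1=1) P(X3=1) = 1/4
print(np.mean((x1 == 1) & (x3 == 1)))               # ~ 0.25
# joint: P(X1=1, X2=1, X3=1) = 1/4, but the product of the
# marginals is (1/2)^3 = 1/8, so the triple does not factor
print(np.mean((x1 == 1) & (x2 == 1) & (x3 == 1)))   # ~ 0.25, not 0.125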
The Normal is the most important distribution in all of statistics, mostly due to the Central Limit Theorem.
The Central Limit Theorem is surprising: the distribution of a sum of a large number of i.i.d. random variables will always look like a bell-shaped curve. This holds regardless of whether the underlying r.v.s are continuous or discrete, beautiful or ugly.
$X \sim \mathcal{N}(\mu, \sigma^2)$
In [1]:
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import (MultipleLocator, FormatStrFormatter,
                               AutoMinorLocator)
from scipy.stats import norm
%matplotlib inline
plt.xkcd()
_, ax = plt.subplots(figsize=(12,8))

# some Normal parameters
mu_values = [0., 0., 0., -2.]
var_values = [0.4, 1.0, 2.0, 0.5]
params = list(zip(mu_values, var_values))

# qualitative color scheme
colors = ['#66c2a5', '#fc8d62', '#8da0cb', '#e78ac3']

x = np.linspace(-6, 6, 500)
for i, (mu, var) in enumerate(params):
    # scipy.stats.norm is parameterized by the standard deviation,
    # so pass sqrt(var) as the scale
    pdf = norm.pdf(x, loc=mu, scale=np.sqrt(var))
    ax.plot(x, pdf, color=colors[i], lw=3.2,
            label=r'$\mu = {}, \sigma^2 = {}$'.format(mu, var))

# legend styling
legend = ax.legend()
for label in legend.get_texts():
    label.set_fontsize('large')
for label in legend.get_lines():
    label.set_linewidth(1.5)

# y-axis
ax.set_ylim([-0.01, 1.0])
ax.set_ylabel(r'$f(x)$')
ax.set_yticks(np.arange(0, 1.1, .1))

# x-axis
ax.set_xlim([-5.0, 5.0])
ax.set_xlabel(r'$x$')

# x-axis tick formatting
majorLocator = MultipleLocator(1)
majorFormatter = FormatStrFormatter('%d')
minorLocator = MultipleLocator(1)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)

ax.grid(color='grey', linestyle='-', linewidth=0.3)

plt.suptitle(r'Normal PDF: $f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$')
plt.show()
For the Standard Normal $\mathcal{N}(0,1)$,
\begin{align} f(z) &= c e^{-\frac{z^{2}}{2}} \end{align}

where $c$ is the normalizing constant that makes $f(z)$ integrate to 1.
But what is $c$?
If we can figure out the value of the integral of $e^{-\frac{z^{2}}{2}}$ over the whole real line, then its reciprocal is the constant $c$ that makes $f(z) = c e^{-\frac{z^{2}}{2}}$ integrate to 1.
Consider the integral of the $\mathcal{N}(0,1)$ PDF's kernel over the whole real line:
\begin{align} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}} ~~ dz \end{align}

There is a theorem stating that the antiderivative of $e^{-\frac{z^2}{2}}$ cannot be written in closed form, so the indefinite integral is a dead end. But this definite integral can be solved...
First, we start by multiplying the integral by itself...
\begin{align} \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}} ~ dz \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}} ~ dz &= \int_{-\infty}^{\infty} e^{-\frac{x^2}{2}} ~ dx \int_{-\infty}^{\infty} e^{-\frac{y^2}{2}} ~ dy \\ &= \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-\frac{x^2 + y^2}{2}} ~ dx \, dy \\ &= \int_{0}^{2\pi} \int_{0}^{\infty} e^{-\frac{r^2}{2}}~ \underbrace{r}_{\text{Jacobian}} ~ dr \, d\theta & \quad \text{switch to polar coordinates} \\ &= \int_{0}^{2\pi} \left( \int_{0}^{\infty} e^{-u} ~ du \right) d\theta & \quad \text{let } u = \frac{r^2}{2} \text{ , } du = r \, dr \\ &= \int_{0}^{2\pi} \left( -e^{-u} \Big\vert_{0}^{\infty} \right) d\theta \\ &= \int_{0}^{2\pi} 1 ~ d\theta \\ &= 2\pi \\ \\ \Rightarrow \int_{-\infty}^{\infty} e^{-\frac{z^2}{2}} ~ dz &= \sqrt{2\pi} \\ \\ \therefore c &= \boxed{\frac{1}{\sqrt{2\pi}} } \end{align}
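As a numerical sanity check of this constant (a quick sketch using scipy.integrate.quad):

In [ ]:
import numpy as np
from scipy.integrate import quad

# integrate e^{-z^2/2} over the whole real line
integral, abserr = quad(lambda z: np.exp(-z**2 / 2), -np.inf, np.inf)
print(integral, np.sqrt(2 * np.pi))   # both ~ 2.5066

For the standard normal distribution, $\mathbb{E}(Z) = 0$. Let's prove that...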
\begin{align} \mathbb{E}(Z) &= \int_{-\infty}^{\infty} z ~ \frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}} ~ dz \\ &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} z ~ e^{-\frac{z^2}{2}} ~ dz \\ &= \frac{1}{\sqrt{2\pi}} (0) & \quad \text{since the integrand is an odd function} \\ &= 0 & \quad \blacksquare \end{align}
In [2]:
x = np.linspace(-5, 5, 500)
y = x * (np.e ** ((-x**2)/2))
_, ax = plt.subplots(figsize=(12,8))
ax.plot(x, y, color='black')
ax.fill_between(x, y, where=y >= 0, facecolor='blue', interpolate=True)
ax.fill_between(x, y, where=y < 0, facecolor='red', interpolate=True)
ax.axhline(y=0, color='black')
ax.axvline(x=0, color='black')
# y-axis
ax.set_ylim([-0.7, 0.7])
ax.set_ylabel(r'$g(x)$')
ax.set_yticks(np.arange(-.7,.8,.1))
# x-axis tick formatting
majorLocator = MultipleLocator(1)
majorFormatter = FormatStrFormatter('%d')
minorLocator = MultipleLocator(1)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)
plt.title("$g(x)$ is an odd function")
plt.show()
Recall that if $g(x)$ is an odd function, then $g(x) + g(-x) = 0$.
Integrating an odd function $g(x)$ from $-a$ to $a$ yields 0, since the area above the x-axis cancels out the area below the x-axis.
For the standard normal distribution, $\operatorname{Var}(Z) = 1$. Let's prove this as well...
\begin{align} \operatorname{Var}(Z) &= \mathbb{E}(Z^2) - \left(\mathbb{E}Z\right)^2 \\ &= \int_{-\infty}^{\infty} z^2 ~ \frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}} ~ dz - (0)^2 & \quad \text{by LOTUS; and } \mathbb{E}Z = 0 \text{ from above} \\ &= 2 \int_{0}^{\infty} z^2 ~ \frac{e^{-\frac{z^2}{2}}}{\sqrt{2\pi}} ~ dz & \quad \text{the integrand is an even function} \\ &= \frac{2}{\sqrt{2\pi}} \int_{0}^{\infty} z ~ z ~ e^{-\frac{z^2}{2}} ~ dz & \quad \text{set up for integration by parts} \\ &= \frac{2}{\sqrt{2\pi}} \int_{0}^{\infty} u ~ dv & \quad \text{let } u = z \text{, } \quad dv = z ~ e^{-\frac{z^2}{2}} ~ dz \\ & & \quad du = dz \text{, } \quad v = -e^{-\frac{z^2}{2}} \\ &= \frac{2}{\sqrt{2\pi}} \left( \left.(uv)\right\vert_{0}^{\infty} - \int_{0}^{\infty} v ~ du \right) \\ &= \frac{2}{\sqrt{2\pi}} \left( 0 + \int_{0}^{\infty} e^{-\frac{z^2}{2}} ~ dz \right) \\ &= \frac{2}{\sqrt{2\pi}} \left( \frac{\sqrt{2\pi}}{2} \right) & \quad \text{from the calculation of the normalizing constant }c \\ &= 1 & \quad \blacksquare \end{align}
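And a matching numerical check of the LOTUS integral (again a sketch with scipy.integrate.quad):

In [ ]:
import numpy as np
from scipy.integrate import quad

# E(Z^2) = integral of z^2 e^{-z^2/2} / sqrt(2*pi) over the real line,
# which equals Var(Z) since E(Z) = 0
second_moment, _ = quad(lambda z: z**2 * np.exp(-z**2 / 2) / np.sqrt(2 * np.pi),
                        -np.inf, np.inf)
print(second_moment)   # ~ 1.0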
In [3]:
x = np.linspace(-5, 5, 500)
y = (x**2) * (np.e ** ((-x**2)/2))
_, ax = plt.subplots(figsize=(12,8))
ax.plot(x, y, color='black')
ax.fill_between(x, y, where=y >= 0, facecolor='blue', interpolate=True)
ax.axhline(y=0, color='black')
ax.axvline(x=0, color='black')
# y-axis
ax.set_ylim([-0.01, .8])
ax.set_ylabel(r'$g(x)$')
ax.set_yticks(np.arange(0,.9,.1))
# x-axis tick formatting
majorLocator = MultipleLocator(1)
majorFormatter = FormatStrFormatter('%d')
minorLocator = MultipleLocator(1)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)
plt.title("$g(x)$ is an even function")
plt.show()
This standard normal distribution $\mathcal{N}(0,1)$ is so important that it has its own name and notation.
$Z \sim \mathcal{N}(0,1)$, where mean $\mu = 0$ and variance $\sigma^2 = 1$.
The standard normal CDF is denoted by $\Phi$, and is given by
\begin{align} \Phi(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-\frac{t^2}{2}} ~~ dt \end{align}
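There is no closed form for $\Phi$, but it is easy to evaluate numerically; here is a quick sketch cross-checking scipy.stats.norm.cdf against direct numerical integration:

In [ ]:
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

z = 1.96
# Phi(z) by numerical integration of the standard normal PDF
phi_z, _ = quad(lambda t: np.exp(-t**2 / 2) / np.sqrt(2 * np.pi), -np.inf, z)
print(phi_z, norm.cdf(z))   # both ~ 0.975

View Lecture 13: Normal distribution | Statistics 110 on YouTube.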