There are two basic types of random variables.
Discrete random variables are random variables whose values can be enumerated, such as $a_1, a_2, \dots , a_n$. Here, discrete does not necessarily mean only integers.
Continuous random variables, on the other hand, take values that cannot be enumerated: they can take on any value in $\mathbb{R}$, or in some interval of $\mathbb{R}$.
Note that there are also random variables that mix the two types.
For a random variable $X$ and any real number $x$, $X \le x$ is an event, so we may define the function $F(x) = P(X \le x)$.
Then $F$ is known as the cumulative distribution function (CDF) of $X$. In contrast with the probability mass function, the CDF is more general, and is applicable for both discrete and continuous random variables.
The probability mass function (PMF) is $p_j = P(X=a_j)$, for all $j$. A PMF must satisfy the following conditions:
\begin{align} p_j &\ge 0 \\ \sum_j p_j &= 1 \end{align}In practice, a PMF is easier to work with than a CDF, but a PMF applies only to discrete random variables.
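To make the PMF/CDF relationship concrete, here is a minimal sketch (not from the lecture; the fair die is an arbitrary example) that checks the two PMF conditions and builds the CDF by accumulating the PMF:

import numpy as np

# PMF of a fair six-sided die: six equally likely values
values = np.arange(1, 7)
pmf = np.full(6, 1/6)

# the two PMF conditions: non-negativity and summing to 1
assert np.all(pmf >= 0)
assert np.isclose(pmf.sum(), 1.0)

# the CDF F(x) = P(X <= x) is the running total of the PMF
cdf = np.cumsum(pmf)
for v, c in zip(values, cdf):
    print('P(X <= {}) = {:.3f}'.format(v, c))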
$X \sim \operatorname{Bin}(n, p)$, where $n$ is the number of independent trials and $p$ is the probability of success on each trial.
Here are a few different ways to explain the Binomial distribution.
$X$ is the number of successes in $n$ independent $\operatorname{Bern}(p)$ trials.
Another explanation is via the PMF: $P(X=k) = \binom{n}{k} p^k q^{n-k}$ for $k \in \{0, 1, \dots, n\}$, where $q = 1 - p$. A valid probability mass function must sum to 1, and indeed
\begin{align} \sum_{k=0}^n \binom{n}{k} p^k q^{n-k} &= (p + q)^n & &\text{by the Binomial Theorem} \\ &= (p + 1 - p)^n \\ &= 1^n \\ &= 1 \end{align}so the explanation by PMF holds. It is from this relation to the Binomial Theorem that the distribution gets its name.
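As a numeric sanity check (a small sketch; $n=10$ and $p=0.3$ are arbitrary choices), scipy.stats.binom confirms that the Binomial PMF sums to 1:

import numpy as np
from scipy.stats import binom

n, p = 10, 0.3
k = np.arange(0, n+1)

# P(X=k) = C(n,k) p^k q^(n-k) over k = 0, ..., n
pmf = binom.pmf(k, n, p)
print(pmf.sum())  # 1.0, up to floating-point error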
Recall that we left Lecture 7 with this parting thought:
\begin{align} & X \sim \operatorname{Bin}(n,p) \text{, } Y \sim \operatorname{Bin}(m,p) \rightarrow X+Y \sim \operatorname{Bin}(n+m, p) \end{align}Let's see how this is true by using the three explanation approaches we used earlier.
$X$ and $Y$ are functions defined on the same sample space $S$, so their sum $X+Y$ is also a function on $S$: for each outcome $s$, $(X+Y)(s) = X(s) + Y(s)$.
$X$ is the number of successes in $n$ independent $\operatorname{Bern}(p)$ trials.
$Y$ is the number of successes in $m$ independent $\operatorname{Bern}(p)$ trials.
$\therefore X + Y$ is the number of successes in $n+m$ independent $\operatorname{Bern}(p)$ trials.
We compute the PMF of $X+Y$ directly, as a convolution: for each $k$,
\begin{align} P(X+Y=k) &= \sum_{j=0}^k P(X+Y=k|X=j)P(X=j) & &\text{wishful thinking: condition on } X \text{ and apply LOTP} \\ &= \sum_{j=0}^k P(Y=k-j|X=j) \binom{n}{j} p^j q^{n-j} \\ &= \sum_{j=0}^k P(Y=k-j) \binom{n}{j} p^j q^{n-j} & &\text{since } X \text{ and } Y \text{ are independent} \\ &= \sum_{j=0}^k \binom{m}{k-j} p^{k-j} q^{m-k+j} \binom{n}{j} p^j q^{n-j} \\ &= p^k q^{m+n-k} \underbrace{\sum_{j=0}^k \binom{m}{k-j}\binom{n}{j}}_{\text{Vandermonde's Identity}} & &\text{see Lecture 2, Story Proofs, Ex.3} \\ &= \binom{m+n}{k} p^k q^{m+n-k} ~~~~ \blacksquare \end{align}
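As an empirical check (a simulation sketch, not part of the lecture; the parameters $n=8$, $m=5$, $p=0.4$, the seed, and the number of trials are arbitrary), we can compare the empirical PMF of $X+Y$ with the theoretical $\operatorname{Bin}(n+m, p)$ PMF:

import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(42)  # arbitrary seed
n, m, p = 8, 5, 0.4              # arbitrary parameters
trials = 100000

# independent draws of X ~ Bin(n, p) and Y ~ Bin(m, p)
x = rng.binomial(n, p, size=trials)
y = rng.binomial(m, p, size=trials)

# empirical PMF of X + Y vs. the theoretical Bin(n+m, p) PMF
k = np.arange(n + m + 1)
empirical = np.bincount(x + y, minlength=n+m+1) / trials
theoretical = binom.pmf(k, n+m, p)
print(np.abs(empirical - theoretical).max())  # small, on the order of 1e-3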
A common mistake in trying to use the Binomial distribution is forgetting that the trials must be independent and have the same probability of success. Consider dealing a 5-card hand from a standard 52-card deck: what is the distribution (PMF or CDF) of the number of aces?
Let $X = (\text{# of aces})$.
Find $P(X=k)$ for $k \in \{0,1,2,3,4\}$. Since the hand is dealt without replacement, the trials are not independent, so we can already conclude that the distribution is not Binomial.
The PMF is given by \begin{align} P(X=k) &= \frac{\binom{4}{k}\binom{48}{5-k}}{\binom{52}{5}} \end{align}
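This PMF is straightforward to evaluate with scipy.stats.hypergeom (the same function used in the plot below); note that scipy's parameter order is (M, n, N) = (population size, number of successes, sample size):

import numpy as np
from scipy.stats import hypergeom

# 52 cards in the population, 4 aces, 5 cards drawn
rv = hypergeom(52, 4, 5)

k = np.arange(0, 5)
print(np.round(rv.pmf(k), 4))  # approximately [0.6588 0.2995 0.0399 0.0017 0.]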
Suppose we have a total of $b$ black and $w$ white marbles. Pick a random sample of $n$ marbles. What is the distribution of the number of white marbles in the sample?
\begin{align} P(X=k) &= \frac{\binom{w}{k}\binom{b}{n-k}}{\binom{w+b}{n}} \end{align}The distribution in the examples above is called the Hypergeometric distribution. In the hypergeometric distribution, we sample without replacement, and so the trials cannot be independent.
Note that when the population is exceedingly large compared to the sample, it becomes highly unlikely that we would choose the same item more than once. In that case it matters little whether we sample with or without replacement, so the hypergeometric behaves approximately like the binomial.
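A small numerical sketch (the 30% white fraction, the sample size $n=10$, and the population sizes are arbitrary choices) shows the hypergeometric PMF approaching the corresponding Binomial PMF as the population grows:

import numpy as np
from scipy.stats import binom, hypergeom

n = 10                # sample size, held fixed
p = 0.3               # fraction of white marbles, held fixed
k = np.arange(n + 1)

# grow the population while keeping 30% of the marbles white
for total in [50, 500, 5000]:
    w = int(p * total)
    gap = np.abs(hypergeom(total, w, n).pmf(k) - binom.pmf(k, n, p)).max()
    print(total, gap)  # the gap shrinks as the population grows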
Is the hypergeometric PMF non-negative?
Yes, clearly: binomial coefficients are non-negative.
Does the PMF sum to 1?
\begin{align} \sum_{k=0}^w P(X=k) &= \sum_{k=0}^w \frac{\binom{w}{k}\binom{b}{n-k}}{\binom{w+b}{n}} \\ &= \frac{1}{\binom{w+b}{n}} \sum_{k=0}^w \binom{w}{k}\binom{b}{n-k} & &\text{since }\binom{w+b}{n} \text{ is constant} \\ &= \frac{1}{\binom{w+b}{n}} \binom{w+b}{n} & &\text{by Vandermonde's Identity } \\ &= 1 ~~~~ \blacksquare \end{align}
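Vandermonde's Identity can also be spot-checked numerically (a tiny sketch with arbitrary small values; Python's math.comb returns 0 when the lower index exceeds the upper, matching the convention used in the sum):

from math import comb

w, b, n = 7, 5, 6  # arbitrary small values

# left side: sum over k of C(w, k) * C(b, n-k)
lhs = sum(comb(w, k) * comb(b, n - k) for k in range(n + 1))
rhs = comb(w + b, n)
print(lhs, rhs, lhs == rhs)  # 924 924 True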
In [1]:
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import (MultipleLocator, FormatStrFormatter,
AutoMinorLocator)
from scipy.stats import hypergeom
%matplotlib inline
plt.xkcd()
_, ax = plt.subplots(figsize=(12,8))
# some Hypergeometric parameters
# population and # white marbles are fixed
population_sizes = [200, 200, 200]
w_marbles = [30, 30, 30]
# let sample size n vary
sample_sizes = [30, 60, 90]
params = list(zip(population_sizes, w_marbles, sample_sizes))
# colorblind-safe, qualitative color scheme
colors = ['#1b9e77', '#d95f02', '#7570b3']
for i, (N, w, n) in enumerate(params):
    rv = hypergeom(N, w, n)
    x = np.arange(0, w+1)
    pmf_w = rv.pmf(x)
    ax.plot(x, pmf_w, 'o', ms=8, color=colors[i], label='n={}'.format(n))
    ax.vlines(x, 0, pmf_w, lw=2, color=colors[i], alpha=0.3)
# legend styling
legend = ax.legend()
for label in legend.get_texts():
    label.set_fontsize('large')
for label in legend.get_lines():
    label.set_linewidth(1.5)
# y-axis
ax.set_ylim([0.0, 0.25])
ax.set_ylabel(r'$P(X=k)$')
# x-axis
ax.set_xlim([0, 25])
ax.set_xlabel('# of white marbles k out of {} in sample size n'.format(w))
# x-axis tick formatting
majorLocator = MultipleLocator(5)
majorFormatter = FormatStrFormatter('%d')
minorLocator = MultipleLocator(1)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)
ax.grid(color='grey', linestyle='-', linewidth=0.3)
plt.suptitle(r'Hypergeometric PMF: $P(X=k) = \frac{\binom{w}{k}\binom{b}{n-k}}{\binom{w+b}{n}}$')
plt.title(r'({} white + {} black marbles)'.format(w, N-w))
plt.show()
A sequence of random variables is independent and identically distributed (i.i.d.) if all of the random variables have the same probability distribution and are mutually independent.
Think of the Binomial distribution, where we consider a string of coin flips. The probability $p$ of heads is the same across all coin flips, and seeing heads (or tails) on a previous toss does not affect any of the coin flips that come afterwards.
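A minimal simulation sketch (the seed and parameters are arbitrary): drawing strings of i.i.d. $\operatorname{Bern}(p)$ flips and counting heads per string reproduces the Binomial story.

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
p, n, reps = 0.5, 10, 100000

# each row is one string of n i.i.d. Bern(p) coin flips
flips = rng.random((reps, n)) < p

# counting heads per row gives draws from Bin(n, p)
heads = flips.sum(axis=1)
print(heads.mean())  # close to n * p = 5.0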
View Lecture 8: Random Variables and Their Distributions | Statistics 110 on YouTube.