Lecture 9: Expectation, Indicator Random Variables, Linearity

Stat 110, Prof. Joe Blitzstein, Harvard University


More on Cumulative Distribution Functions

Recall the CDF, $F(x) = P(X \le x)$, viewed as a function of real $x$; it makes sense for any r.v., discrete or continuous. A PMF, by contrast, has to be

  • non-negative
  • add up to 1

In the discrete case, the CDF is a step function that jumps at each possible value of $X$, and the height of the jump at $x$ is exactly the PMF value $P(X=x)$; so the PMF and the CDF carry the same information.

Therefore, you can compute any probability given a CDF.

Ex. Find $P(1 \lt X \le 3)$ using the CDF $F$.

\begin{align} P(X \le 1) + P(1 \lt X \le 3) &= P(X \le 3) \\ \Rightarrow P(1 \lt X \le 3) &= F(3) - F(1) \end{align}

Note that in the continuous case it makes no difference whether the endpoints are included, but in the discrete case you need to be careful about $\lt$ versus $\le$.
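
As a quick numeric sanity check (a minimal sketch, assuming a hypothetical $X \sim \operatorname{Bin}(5, 0.5)$ purely for illustration), the CDF difference $F(3) - F(1)$ matches summing the PMF over $\{2, 3\}$:

In [ ]:
from scipy.stats import binom

# hypothetical example distribution: X ~ Bin(5, 0.5)
n, p = 5, 0.5

# P(1 < X <= 3) via the CDF: F(3) - F(1)
via_cdf = binom.cdf(3, n, p) - binom.cdf(1, n, p)

# ... and via the PMF, summing P(X=2) + P(X=3)
via_pmf = binom.pmf(2, n, p) + binom.pmf(3, n, p)

print(via_cdf, via_pmf)   # both are 0.625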

Properties of CDF

A function $F$ is a CDF iff the following three conditions are satisfied.

  1. increasing
  2. right-continuous (function is continuous as you approach a point from the right)
  3. $F(x) \rightarrow 0 \text{ as } x \rightarrow - \infty$, and $F(x) \rightarrow 1 \text{ as } x \rightarrow \infty$.

Independence of Random Variables

$X, Y$ are independent r.v. if

\begin{align} \underbrace{P(X \le x, Y \le y)}_{\text{joint CDF}} &= P(X \le x) P(Y \le y) & &\text{ for all } x, y \text{ (this is the general definition)} \\ \\ \underbrace{P(X=x, Y=y)}_{\text{joint PMF}} &= P(X=x) P(Y=y) & &\text{ for all } x, y \text{ (an equivalent condition in the discrete case)} \end{align}
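
A minimal simulation sketch of the discrete criterion (assuming two hypothetical independent fair dice): the empirical joint PMF at a point should be close to the product of the empirical marginals.

In [ ]:
import numpy as np

rng = np.random.default_rng(110)

# two hypothetical independent fair dice
x = rng.integers(1, 7, size=10**6)
y = rng.integers(1, 7, size=10**6)

# empirical joint PMF at (6, 6) vs. the product of the empirical marginals
joint = np.mean((x == 6) & (y == 6))
product = np.mean(x == 6) * np.mean(y == 6)

print(joint, product)   # both approximately 1/36 = 0.0278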

Averages of Random Variables (mean, Expected Value)

A mean is... well, the average of a sequence of values.

\begin{align} 1, 2, 3, 4, 5, 6 \rightarrow \frac{1+2+3+4+5+6}{6} = 3.5 \end{align}

In the case where there is repetition in the sequence

\begin{align} 1,1,1,1,1,3,3,5 \rightarrow & \frac{1+1+1+1+1+3+3+5}{8} \\ \\ & \dots \text{ or } \dots \\ \\ & \frac{5}{8} ~~ 1 + \frac{2}{8} ~~ 3 + \frac{1}{8} ~~ 5 & &\quad \text{ ... weighted average} \end{align}

where the weights are the frequency (fraction) of the unique elements in the sequence, and these weights add up to 1.
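
A one-line check of the weighted-average view (a sketch using the same sequence as above):

In [ ]:
import numpy as np

seq = [1, 1, 1, 1, 1, 3, 3, 5]

# plain average vs. unique values weighted by their frequencies
plain = np.mean(seq)
weighted = (5/8)*1 + (2/8)*3 + (1/8)*5

print(plain, weighted)   # both are 2.0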

Expected value of a discrete r.v. $X$

\begin{align} \mathbb{E}(X) = \sum_{x} \underbrace{x}_{\text{value}} ~~ \underbrace{P(X=x)}_{\text{PMF}} ~& &\quad \text{ ... summed over x with } P(X=x) \gt 0 \end{align}

Expected value of $X \sim \operatorname{Bern}(p)$

\begin{align} \text{Let } X &\sim \operatorname{Bern}(p) \\ \mathbb{E}(X) &= \sum_{k=0}^{1} k P(X=k) \\ &= 1 ~~ P(X=1) + 0 ~~ P(X=0) \\ &= p \end{align}

Expected value of an Indicator Variable

\begin{align} X &= \begin{cases} 1, &\text{ if A occurs} \\ 0, &\text{ otherwise } \end{cases} \\ \\ \therefore \mathbb{E}(X) &= P(A) \end{align}

Notice how this lets us relate (bridge) the expected value $\mathbb{E}(X)$ with a probability $P(A)$; this connection between expectations of indicator r.v.s and probabilities is called the fundamental bridge.
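
A minimal simulation sketch of this bridge (with a hypothetical event $A$ = "a fair die shows 6", chosen purely for illustration): the sample mean of the indicator approximates $P(A)$.

In [ ]:
import numpy as np

rng = np.random.default_rng(110)

# hypothetical event A: a fair die shows a 6
rolls = rng.integers(1, 7, size=10**6)
indicator = (rolls == 6).astype(int)   # X = 1 if A occurs, 0 otherwise

print(indicator.mean(), 1/6)   # E(X) is approximately P(A) = 1/6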

Average of $X \sim \operatorname{Bin}(n,p)$

There is a hard way to do this, and an easy way.

First the hard way:

\begin{align} \mathbb{E}(X) &= \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} \\ &= \sum_{k=1}^{n} n \binom{n-1}{k-1} p^k (1-p)^{n-k} & &\text{using } k \binom{n}{k} = n \binom{n-1}{k-1} \text{, from Lecture 2, Story proofs, ex. 2, choosing a team and president} \\ &= np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} \\ &= np \sum_{j=0}^{n-1} \binom{n-1}{j} p^j(1-p)^{n-1-j} & &\text{letting } j=k-1 \text{, which sets us up to use the Binomial Theorem} \\ &= np \end{align}
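
A quick numeric check of the sum above (a sketch with hypothetical values $n=10$, $p=0.3$):

In [ ]:
import numpy as np
from scipy.special import comb

n, p = 10, 0.3   # hypothetical parameter values
k = np.arange(n + 1)

# the "hard way" sum, term by term
hard_way = np.sum(k * comb(n, k) * p**k * (1 - p)**(n - k))

print(hard_way, n * p)   # both are 3.0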

Now, what about the easy way?


Linearity of Expected Values

Linearity is this:

\begin{align} \mathbb{E}(X+Y) &= \mathbb{E}(X) + \mathbb{E}(Y) & &\quad \text{even if X and Y are dependent}\\ \\ \mathbb{E}(cX) &= c \mathbb{E}(X)\\ \end{align}
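
A simulation sketch of the surprising part, that linearity holds even under dependence (here $Y$ is a hypothetical deterministic function of $X$, which is as dependent as it gets):

In [ ]:
import numpy as np

rng = np.random.default_rng(110)

# X is a die roll; Y = X^2 is completely dependent on X
x = rng.integers(1, 7, size=10**6)
y = x**2

# E(X + Y) = E(X) + E(Y) despite the dependence
print(np.mean(x + y), np.mean(x) + np.mean(y))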

Expected value of Binomial r.v using Linearity

Let $X \sim \operatorname{Bin}(n,p)$. The easy way to calculate the expected value of a binomial r.v. follows.

Let $X = X_1 + X_2 + \dots + X_n$, where the $X_j \sim \operatorname{Bern}(p)$ are i.i.d. indicators of success on each trial.

\begin{align} \mathbb{E}(X) &= \mathbb{E}(X_1 + X_2 + \dots + X_n) \\ \mathbb{E}(X) &= \mathbb{E}(X_1) + \mathbb{E}(X_2) + \dots + \mathbb{E}(X_n) & &\quad \text{by Linearity}\\ \mathbb{E}(X) &= n \mathbb{E}(X_1) & &\quad \text{by symmetry}\\ \mathbb{E}(X) &= np \end{align}
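
A short simulation sketch of this decomposition (hypothetical values $n=10$, $p=0.3$): build each Binomial draw as a sum of $n$ i.i.d. Bernoulli indicators and check that the sample mean is close to $np$.

In [ ]:
import numpy as np

rng = np.random.default_rng(110)

n, p = 10, 0.3   # hypothetical parameter values

# each row is one Bin(n, p) draw, built as a sum of n i.i.d. Bern(p) indicators
bernoullis = rng.random((10**5, n)) < p
x = bernoullis.sum(axis=1)

print(x.mean(), n * p)   # sample mean is approximately np = 3.0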

Expected value of Hypergeometric r.v.

Ex. 5-card hand, with $X = \#\text{ aces}$. Let $X_j$ be the indicator that the $j^{th}$ card is an ace.

\begin{align} \mathbb{E}(X) &= \mathbb{E}(X_1 + X_2 + X_3 + X_4 + X_5) \\ &= \mathbb{E}(X_1) + \mathbb{E}(X_2) + \mathbb{E}(X_3) + \mathbb{E}(X_4) + \mathbb{E}(X_5) & &\quad \text{by Linearity} \\ &= 5 ~~ \mathbb{E}(X_1) & &\quad \text{by symmetry} \\ &= 5 ~~ P(1^{st} \text{ card is ace}) & &\quad \text{by the Fundamental Bridge}\\ &= \boxed{\frac{5}{13}} \end{align}

Note that the indicator r.v.s here are dependent: seeing an ace slightly decreases the chance that the next card is an ace, and if you already have four aces, the fifth card cannot possibly be an ace. But linearity does not require independence, so we can nevertheless quickly and easily compute $\mathbb{E}(X_1 + X_2 + X_3 + X_4 + X_5)$.
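
A simulation sketch of the aces example (shuffling a 52-card deck, encoding aces as 1 and everything else as 0): the average number of aces among the top 5 cards should be close to $\frac{5}{13} \approx 0.3846$.

In [ ]:
import numpy as np

rng = np.random.default_rng(110)

# a deck of 52 cards: 4 aces (1) and 48 non-aces (0)
deck = np.array([1]*4 + [0]*48)

num_aces = []
for _ in range(10**5):
    rng.shuffle(deck)
    num_aces.append(deck[:5].sum())   # aces among the 5-card hand

print(np.mean(num_aces), 5/13)   # approximately 0.3846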


Geometric Distribution

Description

The Geometric distribution arises from a sequence of independent $\operatorname{Bern}(p)$ trials: we count the number of failures before the first success.

Notation

$X \sim \operatorname{Geom}(p)$.

Parameters

$0 < p < 1 \text{, } p \in \mathbb{R}$


In [1]:
import matplotlib
import numpy as np
import matplotlib.pyplot as plt

from matplotlib.ticker import (MultipleLocator, FormatStrFormatter,
                               AutoMinorLocator)
from scipy.stats import geom

%matplotlib inline

plt.xkcd()
_, ax = plt.subplots(figsize=(12,8))

# some Geometric parameters
p_values = [0.2, 0.5, 0.75]

# colorblind-safe, qualitative color scheme
colors = ['#1b9e77', '#d95f02', '#7570b3']

for i,p in enumerate(p_values):
    # scipy's geom counts trials up to and including the success; loc=-1 shifts it
    # to count failures before the first success, matching P(X=k) = p*q^k, k >= 0
    x = np.arange(geom.ppf(0.01, p, loc=-1), geom.ppf(0.99, p, loc=-1) + 1)
    pmf = geom.pmf(x, p, loc=-1)
    ax.plot(x, pmf, 'o', color=colors[i], ms=8, label='p={}'.format(p))
    ax.vlines(x, 0, pmf, lw=2, color=colors[i], alpha=0.3)

# legend styling
legend = ax.legend()
for label in legend.get_texts():
    label.set_fontsize('large')
for label in legend.get_lines():
    label.set_linewidth(1.5)

# y-axis
ax.set_ylim([0.0, 0.9])
ax.set_ylabel(r'$P(X=k)$')

# x-axis
ax.set_xlim([0, 20])
ax.set_xlabel('# of failures k before first success')

# x-axis tick formatting
majorLocator = MultipleLocator(5)
majorFormatter = FormatStrFormatter('%d')
minorLocator = MultipleLocator(1)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)

ax.grid(color='grey', linestyle='-', linewidth=0.3)

plt.suptitle(r'Geometric PMF: $P(X=k) = pq^k$')

plt.show()


Probability mass function

Consider the event $A$ where there are 5 failures before the first success. We could notate this event $A$ as $\text{FFFFFS}$, where $F$ denotes failure and $S$ denotes the first success. Note that this string must end with a success. So, $P(A) = q^5p$.

And from just this, we can derive the PMF for a geometric r.v.

\begin{align} P(X=k) &= pq^k \text{, } k \in \{0, 1, 2, \dots \} \\ \\ \sum_{k=0}^{\infty} p q^k &= p \sum_{k=0}^{\infty} q^k \\ &= p ~~ \frac{1}{1-q} & &\quad \text{by the geometric series, since } |q| \lt 1 \\ &= \frac{p}{p} \\ &= 1 & &\quad \therefore \text{ this is a valid PMF} \end{align}
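
A numeric sanity check (a sketch with a hypothetical $p = 0.3$): partial sums of $pq^k$ approach 1, and the terms agree with scipy's geom once it is shifted (loc=-1) to count failures rather than trials.

In [ ]:
import numpy as np
from scipy.stats import geom

p = 0.3   # hypothetical parameter value
q = 1 - p
k = np.arange(200)

pmf = p * q**k
print(pmf.sum())   # approximately 1

# agrees with scipy's geom shifted to the failures-before-first-success convention
print(np.allclose(pmf[:10], geom.pmf(k[:10], p, loc=-1)))   # True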

Expected value

So, the hard way to calculate the expected value $\mathbb{E}(X)$ of a $\operatorname{Geom}(p)$ is

\begin{align} \mathbb{E}(X) &= \sum_{k=0}^{\infty} k p q^k \\ &= p \sum_{k=0}^{\infty} k q^k \\ \\ \\ \text{ now ... } \sum_{k=0}^{\infty} q^k &= \frac{1}{1-q} & &\quad \text{by the geometric series, since } |q| \lt 1 \\ \sum_{k=0}^{\infty} k q^{k-1} &= \frac{1}{(1-q)^2} & &\quad \text{by differentiating both sides with respect to } q \\ \sum_{k=0}^{\infty} k q^{k} &= \frac{q}{(1-q)^2} \\ &= \frac{q}{p^2} \\ \\ \\ \text{ and returning, we have ... } \mathbb{E}(X) &= p ~~ \frac{q}{p^2} \\ &= \frac{q}{p} & &\quad \blacksquare \end{align}

And here is the story proof, without using the geometric series and derivatives:

Again, we are considering a series of independent Bernoulli trials with probability of success $p$, and we are counting the number of failures before getting the first success.

Similar to the first-step analysis used for the Gambler's Ruin, we condition on the first trial, where we either:

  • get a heads (success) on the very first try, meaning 0 failures
  • or we get 1 failure, but we start the process all over again

Remember that in the case of a coin flip, the coin has no memory.

Let $c=\mathbb{E}(X)$.

\begin{align} c &= 0 ~~ p + (1 + c) ~~ q \\ &= q + qc \\ \\ c - cq &= q \\ c (1 - q) &= q \\ c &= \frac{q}{1-q} \\ &= \frac{q}{p} & &\quad \blacksquare \end{align}
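
Finally, a simulation sketch of $\mathbb{E}(X) = q/p$ (hypothetical $p = 0.3$). Note that numpy's geometric sampler counts trials up to and including the first success, so we subtract 1 to count failures.

In [ ]:
import numpy as np

rng = np.random.default_rng(110)

p = 0.3   # hypothetical parameter value

# numpy's geometric counts trials including the success; subtract 1 for failures
failures = rng.geometric(p, size=10**6) - 1

print(failures.mean(), (1 - p) / p)   # both approximately 2.33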