A CDF, $F(x) = P(X \le x)$, is defined as a function of all real $x$, not just the possible values of $X$.
In the discrete case, it is easy to see how the probability mass function (PMF) relates to the CDF: the CDF is a step function, and its jump at each possible value $x$ is exactly $P(X = x)$.
Therefore, you can compute any probability given a CDF.
Ex. Find $P(1 \lt X \le 3)$ using the CDF $F$.
\begin{align} & &P(X \le 1) + P(1 \lt X \le 3) &= P(X \le 3) \\ & &\Rightarrow P(1 \lt X \le 3) &= F(3) - F(1) \end{align}Note that in the continuous case the distinction doesn't matter, since any single point has probability 0; but in the discrete case you need to be careful about $\lt$ versus $\le$.
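As a quick numerical check (a minimal sketch; the choice of a $\operatorname{Bin}(5, 0.5)$ r.v. here is mine, just for illustration), $F(3) - F(1)$ should match summing the PMF over $\{2, 3\}$:

from scipy.stats import binom

# a Bin(5, 0.5) r.v., chosen only to illustrate P(1 < X <= 3) = F(3) - F(1)
n, p = 5, 0.5
via_cdf = binom.cdf(3, n, p) - binom.cdf(1, n, p)
via_pmf = binom.pmf(2, n, p) + binom.pmf(3, n, p)
print(via_cdf, via_pmf)  # both print 0.625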
A function $F$ is a CDF iff the following three conditions are satisfied:

1. increasing: if $x_1 \le x_2$, then $F(x_1) \le F(x_2)$
2. right-continuous: $F(a) = \lim_{x \to a^+} F(x)$
3. $F(x) \to 0$ as $x \to -\infty$, and $F(x) \to 1$ as $x \to +\infty$
$X, Y$ are independent r.v.s if
\begin{align} \underbrace{P(X \le x, Y \le y)}_{\text{joint CDF}} &= P(X \le x) P(Y \le y) & &\text{ for all } x, y \text{ (this works in general)} \\ \\ \underbrace{P(X=x, Y=y)}_{\text{joint PMF}} &= P(X=x) P(Y=y) & &\text{ for all } x, y \text{ in the discrete case} \end{align}A mean is... well, the average of a sequence of values.
\begin{align} 1, 2, 3, 4, 5, 6 \rightarrow \frac{1+2+3+4+5+6}{6} = 3.5 \end{align}In the case where there is repetition in the sequence
\begin{align} 1,1,1,1,1,3,3,5 \rightarrow & \frac{1+1+1+1+1+3+3+5}{8} \\ \\ & \dots \text{ or } \dots \\ \\ & \frac{5}{8} ~~ 1 + \frac{2}{8} ~~ 3 + \frac{1}{8} ~~ 5 & &\quad \text{ ... weighted average} \end{align}where the weights are the frequency (fraction) of the unique elements in the sequence, and these weights add up to 1.
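A quick numpy check that the two forms agree (a minimal sketch, using the sequence above):

import numpy as np

seq = [1, 1, 1, 1, 1, 3, 3, 5]
plain = np.mean(seq)                                        # ordinary average
weighted = np.average([1, 3, 5], weights=[5/8, 2/8, 1/8])   # weighted average of the unique values
print(plain, weighted)  # both print 2.0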
For a discrete r.v. $X$, the expected value is exactly this kind of weighted average, with the probabilities as the weights: $\mathbb{E}(X) = \sum_x x \, P(X=x)$. Notice how this lets us relate (bridge) the expected value $\mathbb{E}(X)$ with a probability $P(A)$.
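To make that bridge explicit, let $I_A$ be the indicator r.v. of an event $A$ (equal to 1 if $A$ occurs and 0 otherwise). Its expected value is exactly the probability of the event; this is the Fundamental Bridge used below.
\begin{align} \mathbb{E}(I_A) &= 1 \cdot P(A) + 0 \cdot P(A^c) \\ &= P(A) \end{align}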
There is a hard way to compute the expected value of a binomial r.v. $X \sim \operatorname{Bin}(n,p)$, and an easy way.
First the hard way:
\begin{align} \mathbb{E}(X) &= \sum_{k=0}^{n} k \binom{n}{k} p^k (1-p)^{n-k} \\ &= \sum_{k=1}^{n} n \binom{n-1}{k-1} p^k (1-p)^{n-k} & &\text{using } k \binom{n}{k} = n \binom{n-1}{k-1} \text{ (Lecture 2, Story proofs, ex. 2, choosing a team and president)} \\ &= np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1-p)^{n-k} \\ &= np \sum_{j=0}^{n-1} \binom{n-1}{j} p^j(1-p)^{n-1-j} & &\text{letting } j=k-1 \text{, which sets us up to use the Binomial Theorem} \\ &= np \left( p + (1-p) \right)^{n-1} \\ &= np \end{align}Now, what about the easy way?
Linearity is this:
\begin{align} \mathbb{E}(X+Y) &= \mathbb{E}(X) + \mathbb{E}(Y) & &\quad \text{even if X and Y are dependent}\\ \\ \mathbb{E}(cX) &= c \mathbb{E}(X)\\ \end{align}Let $X \sim \operatorname{Bin}(n,p)$. The easy way to calculate the expected value of a binomial r.v. follows.
Let $X = X_1 + X_2 + \dots + X_n$, where the $X_j \sim \operatorname{Bern}(p)$ are i.i.d.
\begin{align} \mathbb{E}(X) &= \mathbb{E}(X_1 + X_2 + \dots + X_n) \\ \mathbb{E}(X) &= \mathbb{E}(X_1) + \mathbb{E}(X_2) + \dots + \mathbb{E}(X_n) & &\quad \text{by Linearity}\\ \mathbb{E}(X) &= n \mathbb{E}(X_1) & &\quad \text{by symmetry}\\ \mathbb{E}(X) &= np \end{align}Ex. 5-card hand $X=(\# aces)$. Let $X_j$ be the indicator that the $j^{th}$ card is an ace.
\begin{align} \mathbb{E}(X) &= \mathbb{E}(X_1 + X_2 + X_3 + X_4 + X_5) \\ &= \mathbb{E}(X_1) + \mathbb{E}(X_2) + \mathbb{E}(X_3) + \mathbb{E}(X_4) + \mathbb{E}(X_5) & &\quad \text{by Linearity} \\ &= 5 ~~ \mathbb{E}(X_1) & &\quad \text{by symmetry} \\ &= 5 ~~ P(1^{st} \text{ card is ace}) & &\quad \text{by the Fundamental Bridge}\\ &= \boxed{\frac{5}{13}} \end{align}Note that the indicators here are dependent: drawing an ace makes it slightly less likely that later cards are aces, and if the first four cards are all aces, the fifth card cannot possibly be an ace. But linearity holds even for dependent r.v.s, so we can nevertheless quickly and easily compute $\mathbb{E}(X_1 + X_2 + X_3 + X_4 + X_5)$.
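Both results are easy to check by simulation (a minimal sketch; the parameters, the 100,000-repetition count, and the use of numpy's default_rng are my choices, not from the lecture): the sample mean of a $\operatorname{Bin}(10, 0.3)$ r.v. should be close to $np = 3$, and the average number of aces in a 5-card hand should be close to $5/13 \approx 0.385$.

import numpy as np

rng = np.random.default_rng(0)
reps = 100_000

# sample mean of Bin(10, 0.3) should be close to np = 3
print(rng.binomial(10, 0.3, size=reps).mean())

# average number of aces in a 5-card hand should be close to 5/13 ~ 0.385
deck = np.array([1] * 4 + [0] * 48)   # 1 marks an ace
aces = [rng.permutation(deck)[:5].sum() for _ in range(reps)]
print(np.mean(aces))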
Next up is the Geometric distribution. Say $X \sim \operatorname{Geom}(p)$: we run independent $\operatorname{Bern}(p)$ trials and let $X$ be the number of failures before the first success, with $q = 1 - p$. The plot below shows the Geometric PMF for a few values of $p$.
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import (MultipleLocator, FormatStrFormatter,
                               AutoMinorLocator)
from scipy.stats import geom

%matplotlib inline

plt.xkcd()
_, ax = plt.subplots(figsize=(12, 8))

# some Geometric parameters
p_values = [0.2, 0.5, 0.75]

# colorblind-safe, qualitative color scheme
colors = ['#1b9e77', '#d95f02', '#7570b3']

for i, p in enumerate(p_values):
    # scipy's geom counts trials up to and including the first success,
    # so shift with loc=-1 to count failures before the first success (support 0, 1, 2, ...)
    x = np.arange(geom.ppf(0.01, p, loc=-1), geom.ppf(0.99, p, loc=-1))
    pmf = geom.pmf(x, p, loc=-1)
    ax.plot(x, pmf, 'o', color=colors[i], ms=8, label='p={}'.format(p))
    ax.vlines(x, 0, pmf, lw=2, color=colors[i], alpha=0.3)

# legend styling
legend = ax.legend()
for label in legend.get_texts():
    label.set_fontsize('large')
for label in legend.get_lines():
    label.set_linewidth(1.5)

# y-axis
ax.set_ylim([0.0, 0.9])
ax.set_ylabel(r'$P(X=k)$')

# x-axis
ax.set_xlim([0, 20])
ax.set_xlabel('# of failures k before first success')

# x-axis tick formatting
majorLocator = MultipleLocator(5)
majorFormatter = FormatStrFormatter('%d')
minorLocator = MultipleLocator(1)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)

ax.grid(color='grey', linestyle='-', linewidth=0.3)

plt.suptitle(r'Geometric PMF: $P(X=k) = pq^k$')
plt.show()
Consider the event $A$ where there are 5 failures before the first success. We could notate this event $A$ as $\text{FFFFFS}$, where $F$ denotes failure and $S$ denotes the first success. Note that this string must end with a success. So, $P(A) = q^5p$.
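We can sanity-check this with a short simulation (a minimal sketch; $p = 0.3$ and the repetition count are arbitrary choices of mine): run $\operatorname{Bern}(p)$ trials until the first success, count the failures, and compare the empirical probability of exactly 5 failures against $q^5 p$.

import numpy as np

rng = np.random.default_rng(0)
p, q = 0.3, 0.7
reps = 100_000

def failures_before_success():
    # run Bern(p) trials until the first success, counting the failures
    count = 0
    while rng.random() >= p:   # rng.random() < p is a success
        count += 1
    return count

samples = np.array([failures_before_success() for _ in range(reps)])
print((samples == 5).mean())   # empirical P(exactly 5 failures before the first success)
print(q**5 * p)                # exact: q^5 p ~ 0.0504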
And from just this, we can derive the PMF for a geometric r.v.
\begin{align} P(X=k) &= pq^k \text{, } k \in \{0, 1, 2, \dots \} \\ \\ \sum_{k=0}^{\infty} p q^k &= p \sum_{k=0}^{\infty} q^k \\ &= p ~~ \frac{1}{1-q} & &\quad \text{by the geometric series, since } |q| < 1 \\ &= \frac{p}{p} \\ &= 1 & &\quad \therefore \text{ this is a valid PMF} \end{align}So, the hard way to calculate the expected value $\mathbb{E}(X)$ of a $\operatorname{Geom}(p)$ r.v. is
\begin{align} \mathbb{E}(X) &= \sum_{k=0}^{\infty} k p q^k \\ &= p \sum_{k=0}^{\infty} k q^k \\ \\ \\ \text{ now ... } \sum_{k=0}^{\infty} q^k &= \frac{1}{1-q} & &\quad \text{by the geometric series, since } |q| < 1 \\ \sum_{k=0}^{\infty} k q^{k-1} &= \frac{1}{(1-q)^2} & &\quad \text{by differentiating both sides with respect to } q \\ \sum_{k=0}^{\infty} k q^{k} &= \frac{q}{(1-q)^2} & &\quad \text{by multiplying both sides by } q \\ &= \frac{q}{p^2} \\ \\ \\ \text{ and returning, we have ... } \mathbb{E}(X) &= p ~~ \frac{q}{p^2} \\ &= \frac{q}{p} & &\quad \blacksquare \end{align}And here is the story proof, without using the geometric series and derivatives:
Again, we are considering a series of independent Bernoulli trials with probability of success $p$, and we are counting the number of failures before getting the first success.
Similar to the first-step analysis used for the Gambler's Ruin, we condition on the first trial, where we either: get a success right away (with probability $p$), so there are 0 failures; or get a failure (with probability $q$), so we have used up one failure and are effectively starting over.
Remember that in the case of a coin flip, the coin has no memory.
Let $c=\mathbb{E}(X)$.
\begin{align} c &= 0 ~~ p + (1 + c) ~~ q \\ &= q + qc \\ \\ c - cq &= q \\ c (1 - q) &= q \\ c &= \frac{q}{1-q} \\ &= \frac{q}{p} & &\quad \blacksquare \end{align}
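Finally, a quick simulation check of $\mathbb{E}(X) = q/p$ (a minimal sketch; the sample size is an arbitrary choice of mine). Note that numpy's geometric sampler counts the trials up to and including the first success, so we subtract 1 to count failures.

import numpy as np

rng = np.random.default_rng(0)
n_samples = 100_000

for p in [0.2, 0.5, 0.75]:
    failures = rng.geometric(p, size=n_samples) - 1   # convert "trials until first success" to "failures"
    print(p, failures.mean(), (1 - p) / p)            # sample mean vs. q/p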