Lecture 4: Conditional Probability

Stat 110, Prof. Joe Blitzstein, Harvard University


Definitions

We continue with some basic definitions of independence and disjointness:

Definition: independence & disjointness

Events A and B are independent if $P(A \cap B) = P(A)P(B)$. Knowing that event A occurs tells us nothing about event B.

In contrast, events $A$ and $B$ are disjoint if $A$ occurring means that $B$ cannot occur, i.e., $A \cap B = \emptyset$.

What about the case of three events $A$, $B$, and $C$?

Events A, B and C are independent if

\begin{align} P(A \cap B) &= P(A)P(B), ~~ P(A \cap C) = P(A)P(C), ~~ P(B \cap C) = P(B)P(C) \\ P(A \cap B \cap C) &= P(A)P(B)P(C) \end{align}

So we need both pairwise independence and the three-way product condition: pairwise independence alone is not enough, as the sketch below shows.
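A minimal sketch (a two-coin example of my own, not from the lecture) that checks this by enumerating a small sample space:

```python
from itertools import product
from fractions import Fraction

# Flip two fair coins: A = first flip is heads, B = second flip is heads,
# C = the two flips match. All 4 outcomes are equally likely.
outcomes = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event, counting equally likely outcomes."""
    return Fraction(sum(event(o) for o in outcomes), len(outcomes))

A = lambda o: o[0] == "H"
B = lambda o: o[1] == "H"
C = lambda o: o[0] == o[1]

# Pairwise independence holds for all three pairs ...
print(prob(lambda o: A(o) and B(o)) == prob(A) * prob(B))  # True
print(prob(lambda o: A(o) and C(o)) == prob(A) * prob(C))  # True
print(prob(lambda o: B(o) and C(o)) == prob(B) * prob(C))  # True

# ... but the three-way condition fails: P(A n B n C) = 1/4, while P(A)P(B)P(C) = 1/8.
print(prob(lambda o: A(o) and B(o) and C(o)) == prob(A) * prob(B) * prob(C))  # False
```

Here $A$ and $B$ together determine $C$, so the three events cannot be mutually independent even though every pair is.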

Newton-Pepys Problem (1693)

Yet another famous problem in probability that arose from a gambling question.

We have fair dice. Which of the following events is most likely?

  • $A$ ... at least one 6 with 6 dice
  • $B$ ... at least two 6's with 12 dice
  • $C$ ... at least three 6's with 18 dice

Let's solve for the probability of each event using independence.

\begin{align} P(A) &= 1 - P(A^c) ~~~~ &\text{since the complement of at least one 6 is no 6's at all} \\ &= 1 - \left(\frac{5}{6}\right)^6 &\text{the 6 dice are independent, so we just multiply them all} \\ &\approx 0.665 \\ \\ P(B) &= 1 - P(\text{no 6's}) - P(\text{one 6}) \\ &= 1 - \left(\frac{5}{6}\right)^{12} - 12 \left(\frac{1}{6}\right)\left(\frac{5}{6}\right)^{11} &\text{... does this look familiar?}\\ &\approx 0.619 \\ \\ P(C) &= 1 - P(\text{no 6's}) - P(\text{one 6}) - P(\text{two 6's}) \\ &= 1 - \sum_{k=0}^{2} \binom{18}{k} \left(\frac{1}{6}\right)^k \left(\frac{5}{6}\right)^{18-k} &\text{... it's Binomial probability!} \\ &\approx 0.597 \end{align}
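These three numbers are easy to check with a few lines of code; a minimal sketch using the Binomial formula above (the helper name `prob_at_least` is mine):

```python
from math import comb

def prob_at_least(k, n, p=1/6):
    """P(at least k sixes in n fair dice rolls) = 1 - P(fewer than k sixes)."""
    return 1 - sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k))

print(f"P(A) = {prob_at_least(1, 6):.3f}")    # 0.665
print(f"P(B) = {prob_at_least(2, 12):.3f}")   # 0.619
print(f"P(C) = {prob_at_least(3, 18):.3f}")   # 0.597
```

So $A$ is the most likely of the three events.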

Conditional Probability

Conditioning is the soul of probability.

How do you update your beliefs when presented with new information? That's the question here.

Consider two events $A$ and $B$. We define the conditional probability $P(A|B)$, read as "the probability of $A$ given $B$".

Suppose we have just observed that $B$ occurred. If $A$ and $B$ are independent, then this observation tells us nothing new about $A$. But if $A$ and $B$ are not independent, then the fact that $B$ happened is important information, and we need to update our uncertainty about $A$ accordingly.

Definition: conditional probability

\begin{align} \text{conditional probability: } P(A|B) &= \frac{P(A \cap B)}{P(B)} & \text{if } P(B) > 0 \end{align}

Prof. Blitzstein gives examples of Pebble World and Frequentist World to help explain conditional probability, but I find that Legos make things simple.
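To see the definition in action with concrete numbers, here is a minimal counting sketch (a two-dice example of my own, not Pebble World or Legos):

```python
from itertools import product
from fractions import Fraction

# Roll two fair dice: B = "the first die shows a 4", A = "the total is 7".
sample_space = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

P_B = Fraction(sum(1 for o in sample_space if o[0] == 4), len(sample_space))         # 6/36
P_AB = Fraction(sum(1 for o in sample_space if o[0] == 4 and sum(o) == 7),
                len(sample_space))                                                    # 1/36

# P(A|B) = P(A n B) / P(B): once the first die shows a 4, only (4, 3) gives a total of 7.
print(P_AB / P_B)   # 1/6
```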

Theorem 1

The probability of the intersection of events $A$ and $B$ is given by

\begin{align} P(A \cap B) = P(B) P(A|B) = P(A) P(B|A) \end{align}

Note that if $A$ and $B$ are independent, then conditioning on $B$ changes nothing (and vice versa), so $P(A|B) = P(A)$ and $P(A \cap B) = P(A) P(B)$.

Theorem 2

\begin{align} P(A_1, A_2, \ldots, A_n) = P(A_1)P(A_2|A_1)P(A_3|A_1,A_2) \cdots P(A_n|A_1,A_2,\ldots,A_{n-1}) \end{align}
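A quick illustration of this chain rule (a card example of my own, not from the lecture): the probability of drawing three aces in a row from a standard 52-card deck, without replacement.

```python
from fractions import Fraction

# A_i = "the i-th card drawn is an ace". By Theorem 2,
# P(A1, A2, A3) = P(A1) P(A2|A1) P(A3|A1, A2).
p = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)
print(p, float(p))   # 1/5525 ~ 0.000181
```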

Theorem 3: Bayes' Theorem

\begin{align} P(A|B) = \frac{P(B|A)P(A)}{P(B)} ~~~~ \text{this follows from Theorem 1} \end{align}
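A quick numerical sketch of Bayes' Theorem (the disease-testing numbers below are my own, chosen for illustration): suppose a disease has 1% prevalence, and a test comes back positive 95% of the time for a sick person and 5% of the time for a healthy person.

```python
from fractions import Fraction

P_A = Fraction(1, 100)            # P(A): prior probability of having the disease
P_B_given_A = Fraction(95, 100)   # P(B|A): positive test given disease
P_B_given_Ac = Fraction(5, 100)   # P(B|A^c): positive test given no disease

# P(B) split over the two cases (disease / no disease), then Bayes' Theorem for P(A|B).
P_B = P_B_given_A * P_A + P_B_given_Ac * (1 - P_A)
P_A_given_B = P_B_given_A * P_A / P_B
print(P_A_given_B, float(P_A_given_B))   # 19/118 ~ 0.161
```

Even with a positive test, the posterior probability is only about 16%, because the disease is rare to begin with.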

Appendix A: Bayes' Rule Expressed in Terms of Odds

The odds of an event with probability $p$ are $\frac{p}{1-p}$.

An event with probability $\frac{3}{4}$ can be described as having odds 3 to 1 in favor, or 1 to 3 against.

Let $H$ be the hypothesis, or the event we are interested in.

Let $D$ be the evidence (event) we gather in order to study $H$.

The prior probability $P(H)$ is the probability that $H$ is true before we observe any new evidence $D$.

The posterior probability $P(H|D)$ is, of course, the probability that $H$ is true after we have observed the evidence $D$.

The likelihood ratio is defined as $\frac{P(D|H)}{P(D|H^c)}$.

Applying Bayes' Rule, we can see how the posterior odds, the prior odds, and the likelihood ratio are related:

\begin{align} P(H|D) &= \frac{P(D|H)P(H)}{P(D)} \\ \\ P(H^c|D) &= \frac{P(D|H^c)P(H^c)}{P(D)} \\ \\ \Rightarrow \underbrace{\frac{P(H|D)}{P(H^c|D)}}_{\text{posterior odds of H}} &= \underbrace{\frac{P(H)}{P(H^c)}}_{\text{prior odds of H}} \times \underbrace{\frac{P(D|H)}{P(D|H^c)}}_{\text{likelihood ratio}} \end{align}
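Using the same hypothetical disease-testing numbers as above, the odds form gives the identical answer with even less work:

```python
from fractions import Fraction

prior_odds = Fraction(1, 100) / Fraction(99, 100)         # P(H)/P(H^c) = 1/99
likelihood_ratio = Fraction(95, 100) / Fraction(5, 100)   # P(D|H)/P(D|H^c) = 19
posterior_odds = prior_odds * likelihood_ratio            # 19/99

# Convert odds back to a probability (see Appendix B).
print(posterior_odds, float(posterior_odds / (1 + posterior_odds)))   # 19/99, ~ 0.161
```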

Appendix B: Translating Odds into Probability

To go from odds back to probability:

\begin{align} p = \frac{p/q}{1 + p/q} & &\text{ for } q = 1-p \end{align}
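A tiny helper (my own) makes the conversion concrete, using the 3-to-1 example from Appendix A:

```python
from fractions import Fraction

def odds_to_prob(odds):
    """Convert odds in favor, p/q with q = 1 - p, back to the probability p."""
    return odds / (1 + odds)

print(odds_to_prob(Fraction(3, 1)))   # 3/4, i.e. odds of 3 to 1 in favor
print(odds_to_prob(Fraction(1, 3)))   # 1/4, i.e. odds of 1 to 3 in favor
```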