Lecture 6: Monty Hall, Simpson's Paradox

Stat 110, Prof. Joe Blitzstein, Harvard University

The Monty Hall Problem

In case you did not grow up watching way too much daytime television in America during the 70's and early 80's, here is Monty Hall on YouTube talking about the background of this math problem involving his popular game show, Let's Make A Deal.

  • There are three doors.
  • A car is behind one of the doors.
  • The other two doors have goats behind them.
  • You choose a door, but before you see what's behind your choice, Monty opens one of the other doors to reveal a goat.
  • Monty offers you the chance to switch doors.

Should you switch?

Defining the problem

Let $S$ be the event of winning when you switch.

Let $D_j$ be the event of the car being behind door $j$.

Solving with a probability tree

With a probability tree, it is easy to represent the case where you condition on Monty opening door 2. Given that you initially choose door 1, you can quickly see that if you stick with door 1, you have a $\frac{1}{3}$ chance of winning.

You have a $\frac{2}{3}$ chance of winning if you switch.

Solving with the Law of Total Probability

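This is even easier to solve using the Law of Total Probability. Assume, without loss of generality, that you initially choose door 1. Switching then loses if the car is behind door 1 and wins if it is behind door 2 or door 3.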

\begin{align} P(S) &= P(S|D_1)P(D_1) + P(S|D_2)P(D_2) + P(S|D_3)P(D_3) \\ &= 0 \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} + 1 \cdot \frac{1}{3} \\ &= \frac{2}{3} \end{align}
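The $\frac{2}{3}$ result can also be checked empirically. The following is a minimal simulation sketch, not part of the original lecture; the function names `play` and `win_rate` are hypothetical, and it assumes Monty picks uniformly at random when he has two goat doors to choose from.

```python
import random

def play(switch, rng):
    """Play one 3-door game; return True if the contestant wins the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty opens a goat door that is not the contestant's pick
    # (assumption: chosen uniformly when two such doors exist).
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the winning probability of a strategy by simulation."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stick :", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

The estimates should land close to $\frac{1}{3}$ for sticking and $\frac{2}{3}$ for switching, up to simulation noise.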

A more general solution

Let $n = 7$ be the number of doors in the game.

Let $m=3$ be the number of doors with goats that Monty opens after you select your initial door choice.

Let $S$ be the event where you win by sticking with your original door choice of door 1.

Let $C_j$ be the event that the car is actually behind door $j$.

Conditioning only on which door has the car, and noting that $P(S|C_1) = 1$ while $P(S|C_j) = 0$ for $j \ne 1$, we have \begin{align} & &P(S) &= P(S|C_1)P(C_1) + \dots + P(S|C_n)P(C_n) & &\text{Law of Total Probability} \\ & & &= P(C_1) \\ & & &= \frac{1}{7} \end{align}

Let $M_{i,j,k}$ be the event that Monty opens doors $i,j,k$. Conditioning on Monty opening up doors $i,j,k$, we have

\begin{align} & &P(S) &= \sum_{i,j,k} P(S|M_{i,j,k})P(M_{i,j,k}) & &\text{summed over all i, j, k with } 2 \le i \lt j \lt k \le 7 \\ \\ & &\Rightarrow P(S|M_{i,j,k}) &= P(S) & &\text{by symmetry} \\ & & &=\frac{1}{7} \end{align}

Note that we can now generalize this to the case where:

  • there are $n \ge 3$ doors
  • after you choose a door, Monty opens $m$ of the remaining $n-1$ doors to reveal goats (with $1 \le m \le n-2$)

The probability of winning with the strategy of sticking to your initial choice is $\frac{1}{n}$, whether unconditional or conditioning on the doors Monty opens.

After Monty opens $m$ doors, each of the other $n-m-1$ unopened doors has conditional probability $\left(\frac{n-1}{n-m-1}\right) \left(\frac{1}{n}\right)$ of hiding the car, since the remaining probability $\frac{n-1}{n}$ is split evenly among them.

Since $\frac{1}{n} \lt \left(\frac{n-1}{n-m-1}\right) \left(\frac{1}{n}\right)$, you will always have a greater chance of winning if you switch.
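As a quick check of the general formulas, here is a small sketch using exact rational arithmetic. The helper names `stick_probability` and `switch_probability` are hypothetical and simply encode the two expressions above; they are not a simulation of the game.

```python
from fractions import Fraction

def stick_probability(n):
    """P(win) when sticking with the initial door among n doors."""
    return Fraction(1, n)

def switch_probability(n, m):
    """P(win) when switching to one specific unopened door
    after Monty opens m goat doors."""
    if not (n >= 3 and 1 <= m <= n - 2):
        raise ValueError("need n >= 3 and 1 <= m <= n - 2")
    return Fraction(n - 1, n - m - 1) * Fraction(1, n)

n, m = 7, 3
stick = stick_probability(n)
switch = switch_probability(n, m)

# The initial door plus the n-m-1 other unopened doors account for
# all of the probability.
assert stick + (n - m - 1) * switch == 1

print(stick, switch)             # 1/7 2/7
print(switch_probability(3, 1))  # 2/3
```

With $n = 7$ and $m = 3$, switching doubles the chance of winning, and with $n = 3$ and $m = 1$ the formula recovers the classic $\frac{2}{3}$.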

Simpson's Paradox

Is it possible for a certain set of events to be more (or less) probable than another without conditioning, and then be less (or more) probable with conditioning?

Assume that we have the above rates of success/failure for Drs. Hibbert and Nick for two types of surgery: heart surgery and band-aid removal.

Defining the problem

Let $A$ be the event of a successful operation.

Let $B$ be the event of treatment by Dr. Nick.

Let $C$ be the event of heart surgery.

\begin{align} P(A|B,C) &< P(A|B^c,C) & &\text{Dr. Nick is not as skilled as Dr. Hibbert in heart surgery} \\ P(A|B,C^c) &< P(A|B^c,C^c) & &\text{neither is he all that good at band-aid removal} \\ \end{align}

And yet $P(A|B) > P(A|B^c)$?

Explaining with the Law of Total Probability

To explain this paradox, let's try to use the Law of Total Probability.

\begin{align} P(A|B) &= P(A|B,C)P(C|B) + P(A|B,C^c)P(C^c|B) \\ P(A|B^c) &= P(A|B^c,C)P(C|B^c) + P(A|B^c,C^c)P(C^c|B^c) \\ \\ \text{where } P(A|B,C) &< P(A|B^c,C) \\ \text{and } P(A|B,C^c) &< P(A|B^c,C^c) \end{align}

Look at $P(C|B)$ and $P(C|B^c)$. These weights are what make this paradox possible: if the two doctors perform very different mixes of surgeries, the weighted averages can flip the direction of the inequality.

Event $C$ is a confounder.
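To make the role of the weights concrete, the sketch below uses hypothetical success/failure counts. These numbers are an illustrative assumption chosen to produce the paradox and are not taken from these notes; the names `counts`, `rate`, `weight`, and `overall` are likewise invented for the example.

```python
from fractions import Fraction

# Hypothetical (successes, failures) counts, assumed for illustration only.
counts = {
    "Nick":    {"heart": (2, 8),   "bandaid": (81, 9)},
    "Hibbert": {"heart": (70, 20), "bandaid": (10, 0)},
}

def rate(doctor, surgery):
    """Success rate conditional on doctor and surgery type, e.g. P(A|B,C)."""
    s, f = counts[doctor][surgery]
    return Fraction(s, s + f)

def weight(doctor, surgery):
    """Share of a doctor's operations of the given type, e.g. P(C|B)."""
    total = sum(s + f for s, f in counts[doctor].values())
    s, f = counts[doctor][surgery]
    return Fraction(s + f, total)

def overall(doctor):
    """Law of Total Probability: weighted average of conditional rates."""
    return sum(rate(doctor, t) * weight(doctor, t) for t in counts[doctor])

for t in ("heart", "bandaid"):
    print(t, "Nick:", rate("Nick", t), "Hibbert:", rate("Hibbert", t))
print("overall Nick:", overall("Nick"), "Hibbert:", overall("Hibbert"))
```

Under these assumed counts, Dr. Nick has the lower success rate for each type of surgery yet the higher overall rate, because most of his operations are the easy kind.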

Another example

_Is it possible to have events $A_1, A_2, B, C$ such that_

\begin{align} P(A_1|B) &\gt P(A_1|C) \text{ and } P(A_2|B) \gt P(A_2|C) & &\text{ ... yet...} \\ P(A_1 \cup A_2|B) &\lt P(A_1 \cup A_2|C) \end{align}

Yes, and this is just another case of Simpson's Paradox.

Note that

\begin{align} P(A_1 \cup A_2|B) &= P(A_1|B) + P(A_2|B) - P(A_1 \cap A_2|B) \end{align}

So this is not possible if $A_1$ and $A_2$ are disjoint, since then $P(A_1 \cup A_2|B) = P(A_1|B) + P(A_2|B)$, and likewise for $C$, so the union would inherit the inequality of its two terms.

It is crucial, therefore, to consider the intersection $P(A_1 \cap A_2|B)$, so let's look at the following example where $P(A_1 \cap A_2|B) \gg P(A_1 \cap A_2|C)$ in order to offset the other inequalities.

Consider two basketball players each shooting a pair of free throws.

Let $A_j$ be the event that the player makes the $j^{th}$ free throw.

Player $B$ always either makes both shots, with probability $P(A_1 \cap A_2|B) = 0.8$, or misses both.

\begin{align} P(A_1|B) = P(A_2|B) = P(A_1 \cap A_2|B) = P(A_1 \cup A_2|B) = 0.8 \end{align}

Player $C$ makes free throw shots with probability $P(A_j|C) = 0.7$, independently, so we have

\begin{align} P(A_1|C) &= P(A_2|C) = 0.7 \\ P(A_1 \cap A_2|C) &= P(A_1|C) P(A_2|C) = 0.49 \\ P(A_1 \cup A_2|C) &= P(A_1|C) + P(A_2|C) - P(A_1 \cap A_2|C) \\ &= 2 \times 0.7 - 0.49 \\ &= 0.91 \end{align}

And so we have our case where

\begin{align} P(A_1|B) = 0.8 &\gt P(A_1|C) = 0.7 \\ P(A_2|B) = 0.8 &\gt P(A_2|C) = 0.7 \\ \\ \text{ ... and yet... } \\ \\ P(A_1 \cup A_2|B) = 0.8 &\lt P(A_1 \cup A_2|C) = 0.91 ~~~~ \blacksquare \end{align}
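The free-throw numbers can be verified by enumerating each player's joint distribution over the two shots. This is a minimal sketch; the names `player_b`, `player_c`, and `prob` are hypothetical, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

# Outcomes are pairs (first shot made, second shot made), with 1 = made.
p = Fraction(8, 10)
player_b = {(1, 1): p, (0, 0): 1 - p}  # makes both or misses both

q = Fraction(7, 10)
player_c = {
    (a, b): (q if a else 1 - q) * (q if b else 1 - q)  # independent shots
    for a in (0, 1)
    for b in (0, 1)
}

def prob(dist, event):
    """Total probability of the outcomes for which event(outcome) is true."""
    return sum(pr for outcome, pr in dist.items() if event(outcome))

def first(o):
    return o[0] == 1   # A_1

def second(o):
    return o[1] == 1   # A_2

def either(o):
    return 1 in o      # A_1 union A_2

for name, dist in (("B", player_b), ("C", player_c)):
    print(name, prob(dist, first), prob(dist, second), prob(dist, either))
```

Player $B$ is ahead on each individual shot, but Player $C$ is ahead on making at least one, because $B$'s makes overlap completely while $C$'s do not.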