Lecture 3: Birthday Problem, Properties of Probability

Stat 110, Prof. Joe Blitzstein, Harvard University


The Birthday Problem

Given $k$ people, what is the probability of at least 2 people having the same birthday?

First, we need to define the problem:

  1. there are 365 days in a year (no leap years)
  2. each birth is equally likely to fall on any day, and birthdays are independent of one another
  3. treat people as distinguishable, so that each of the $365^k$ possible assignments of birthdays is an equally likely outcome

With $k \le 1$ there cannot be a match, so we will not consider those cases.

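Now consider the case where you have more people than there are days in a year, that is, $k > 365$. By the pigeonhole principle, at least two people must share a birthday, so

$$ P(\text{at least one match}) = 1 \quad \text{for } k > 365 $$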

Now think about the event of no matches. We can compute this probability using the naïve definition of probability:

$$ P(\text{no match}) = \frac{365 \times 364 \times \cdots \times (365-k+1)}{365^k} $$

Now the event of at least one match is the complement of no matches, so

\begin{align} P(\text{at least one match}) &= 1 - P(\text{no match}) \\ &= 1 - \frac{365 \times 364 \times \cdots \times (365-k+1)}{365^k} \end{align}
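
As a quick sanity check of the formula, here is the arithmetic worked out for a small case, $k = 3$ (the choice of $k$ is only for illustration):

$$ P(\text{at least one match}) = 1 - \frac{365 \times 364 \times 363}{365^3} = 1 - \frac{132132}{133225} \approx 0.0082 $$

So with only three people a match is rare, at a little under 1%.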

In [1]:
import numpy as np

DAYS_IN_YEAR = 365

def bday_prob(k):
    """Probability that at least two of k people share a birthday."""
    def no_match(k):
        # Multiply the ratios (365/365)(364/365)...((365-k+1)/365) term by term,
        # so that neither the numerator nor 365**k grows too large for a float.
        ratios = (DAYS_IN_YEAR - np.arange(k)) / DAYS_IN_YEAR
        return np.prod(ratios)
    return 1.0 - no_match(k)

print("With k=23 people, the probability of a match is {:.6f}, already exceeding 0.5.".format(bday_prob(23)))
print("With k=50 people, the probability of a match is {:.6f}, which is approaching 1.0.".format(bday_prob(50)))


With k=23 people, the probability of a match is 0.507297, already exceeding 0.5.
With k=50 people, the probability of a match is 0.970374, which is approaching 1.0.
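
The exact formula can also be checked against a simulation. The following is a minimal sketch, not part of the original lecture: the function name `simulate_bday_match`, the number of trials and the seed are all assumptions chosen for illustration, and the simulation relies on the same assumptions as above (365 equally likely, independent birthdays).

```python
import random

def simulate_bday_match(k, trials=20_000, days=365, seed=110):
    """Estimate P(at least one shared birthday among k people) by simulation.

    Hypothetical helper for illustration; trials and seed are arbitrary choices.
    """
    rng = random.Random(seed)  # fixed seed so reruns give the same estimate
    matches = 0
    for _ in range(trials):
        # Draw k independent birthdays, each uniform over the days of the year.
        birthdays = [rng.randrange(days) for _ in range(k)]
        # A match exists exactly when some birthday value repeats.
        if len(set(birthdays)) < k:
            matches += 1
    return matches / trials

estimate = simulate_bday_match(23)
print("Simulated probability of a match with k=23: {:.3f}".format(estimate))
```

With enough trials the estimate should land close to the exact value of about 0.507 for $k=23$, though any single run will differ slightly because of sampling noise.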

Properties

Let's derive some properties using nothing but the two axioms stated earlier.

Property 1

The probability of an event $A$ is 1 minus the probability of that event's inverse (or complement).

\begin{align} P(A^{c}) &= 1 - P(A) \\ \\ \because 1 &= P(S) \\ &= P(A \cup A^{c}) \\ &= P(A) + P(A^{c}) & \quad \text{since } A \cap A^{c} = \emptyset ~~ \blacksquare \end{align}

Property 2

If $A$ is contained within $B$, then the probability of $A$ must be less than or equal to the probability of $B$.

\begin{align} \text{If } A &\subseteq B \text{, then } P(A) \leq P(B) \\ \\ \because B &= A \cup ( B \cap A^{c}) & \quad \text{a disjoint union} \\ P(B) &= P(A) + P(B \cap A^{c}) \\ \\ \implies P(B) &\geq P(A) \text{, since } P(B \cap A^{c}) \geq 0 & \quad \blacksquare \end{align}

Property 3, or the Inclusion/Exclusion Principle

The probability of a union of 2 events $A$ and $B$ is the sum of their probabilities minus the probability of their intersection.

\begin{align} P(A \cup B) &= P(A) + P(B) - P(A \cap B) \\ \\ \text{since } P(A \cup B) &= P(A \cup (B \cap A^{c})) \\ &= P(A) + P(B \cap A^{c}) \\ \\ \text{but note that } P(B) &= P(B \cap A) + P(B \cap A^{c}) \\ \text{and since } P(B) - P(A \cap B) &= P(B \cap A^{c}) \\ \\ \implies P(A \cup B) &= P(A) + P(B) - P(A \cap B) ~~~~ \blacksquare \end{align}

This is the simplest case of the principle of inclusion/exclusion.
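
The three properties can be checked on a small finite sample space. The sketch below is an illustration only: the sample space (one roll of a fair die), the events `A`, `B`, `C` and the helper `prob` are assumptions made for this example, and probabilities follow the naïve definition.

```python
from fractions import Fraction

S = frozenset(range(1, 7))    # sample space: one roll of a fair six-sided die

def prob(event):
    """Naive definition: number of favorable outcomes over total outcomes."""
    return Fraction(len(event & S), len(S))

A = frozenset({2, 4, 6})      # the roll is even
B = frozenset({4, 5, 6})      # the roll is at least 4
C = frozenset({6})            # the roll is a six, so C is a subset of A

# Property 1: P(A^c) = 1 - P(A)
assert prob(S - A) == 1 - prob(A)

# Property 2: if C is a subset of A, then P(C) <= P(A)
assert C <= A and prob(C) <= prob(A)

# Property 3: P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

print(prob(A | B))  # 2/3
```

Using `Fraction` keeps the arithmetic exact, so the equalities can be tested directly rather than up to rounding error.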

Considering the 3-event case, we have:

\begin{align} P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C) \end{align}

...where we sum the probabilities of the individual events, then subtract the probability of each pairwise intersection, and finally add back the probability of the 3-event intersection, which was counted three times in the first step and subtracted three times in the second.

For the general case, we have:

$$ P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{j=1}^n P(A_{j}) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cap \cdots \cap A_n) $$

... where we

  • sum up the probabilities of all of the individual events
  • subtract the sum of the probabilities of all intersections of an even number of events
  • add back the sum of the probabilities of all intersections of an odd number of events (3 or more)
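
The alternating sum above can be sketched in code for events on a finite sample space with equally likely outcomes. The function name `union_prob_incl_excl` and the divisibility example are hypothetical, chosen only to illustrate the formula; the result is compared against computing the union directly.

```python
from fractions import Fraction
from itertools import combinations

def union_prob_incl_excl(events, sample_space):
    """P(A_1 u ... u A_n) via inclusion/exclusion, under the naive definition.

    Hypothetical helper: `events` is a list of frozensets of outcomes.
    """
    n_outcomes = len(sample_space)
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        sign = (-1) ** (r - 1)   # + for an odd number of events, - for an even number
        for group in combinations(events, r):
            common = frozenset.intersection(*group)
            total += sign * Fraction(len(common), n_outcomes)
    return total

# Illustrative example: pick an integer from 1..30 uniformly at random;
# the events are "divisible by 2", "divisible by 3", "divisible by 5".
S = frozenset(range(1, 31))
events = [frozenset(x for x in S if x % d == 0) for d in (2, 3, 5)]

direct = Fraction(len(frozenset.union(*events)), len(S))
assert union_prob_incl_excl(events, S) == direct
print(direct)  # 11/15
```

Note that the loop visits all $2^n - 1$ non-empty groups of events, so this brute-force version is only practical for small $n$.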

de Montmort's problem (1713)

This problem also comes from gambling. Say we have a deck of $n$ cards, labeled 1 through $n$. The deck is thoroughly shuffled. The cards are then flipped over, one at a time. You win if, for some $k$, the card labeled $k$ is the $k^{th}$ card flipped.

What is the probability of a win?

Let $A_k$ be the event that card $k$ is the $k^{th}$ card flipped. A win occurs when at least one of the $n$ cards is in the correct position. Therefore, what we are interested in is

$$ P(A_1 \cup A_2 \cup \cdots \cup A_n) $$

Now, consider the following:

\begin{align} P(A_1) &= \frac{1}{n} & \quad \text{since card 1 is equally likely to be in any of the } n \text{ positions} \\ P(A_1 \cap A_2) &= \frac{1}{n} \left(\frac{1}{n-1}\right) = \frac{(n-2)!}{n!} \\ &\vdots \\ P(A_1 \cap A_2 \cap \cdots \cap A_k) &= \frac{(n-k)!}{n!} & \quad \text{and, by symmetry, the same holds for any } k \text{ of the events} \end{align}
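
For a small deck, the formula $\frac{(n-k)!}{n!}$ can be verified by listing every possible shuffle. This is a brute-force sketch under the assumption that all $n!$ orderings are equally likely; the function name `prob_first_k_fixed` and the deck size `n = 6` are illustrative choices.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def prob_first_k_fixed(n, k):
    """P(A_1 and ... and A_k): cards 1..k all land in their own positions.

    Hypothetical helper that enumerates all n! equally likely shuffles.
    """
    hits = 0
    total = 0
    for perm in permutations(range(1, n + 1)):
        total += 1
        # perm[i] is the card flipped at position i + 1
        if all(perm[i] == i + 1 for i in range(k)):
            hits += 1
    return Fraction(hits, total)

n = 6
for k in range(1, n + 1):
    assert prob_first_k_fixed(n, k) == Fraction(factorial(n - k), factorial(n))
print("formula matches enumeration for n =", n)
```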

Which leads us to:

\begin{align} P(A_1 \cup A_2 \cup \cdots \cup A_n) &= \binom{n}{1}\frac{1}{n} - \binom{n}{2}\frac{1}{n(n-1)} + \binom{n}{3}\frac{1}{n(n-1)(n-2)} - \cdots + (-1)^{n-1}\binom{n}{n}\frac{1}{n!} \\\\ &= n \frac{1}{n} - \left(\frac{n(n-1)}{2!}\right)\frac{1}{n(n-1)} + \left(\frac{n(n-1)(n-2)}{3!}\right)\frac{1}{n(n-1)(n-2)} - \cdots + (-1)^{n-1}\frac{1}{n!} \\\\ &= 1 - \frac{1}{2!} + \frac{1}{3!} - \frac{1}{4!} + \cdots + (-1)^{n-1}\frac{1}{n!} \\\\ &= \sum_{k=1}^{n} \frac{(-1)^{k-1}}{k!} \\\\ &\approx 1 - \frac{1}{e} & \quad \text{for large } n \\\\ \\\\ \text{ since } e^{-1} &= \frac{(-1)^0}{0!} + \frac{-1}{1!} + \frac{(-1)^2}{2!} + \frac{(-1)^3}{3!} + \cdots ~~~~ \text{ from the Taylor expansion of } e^{x} \end{align}
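
The result can be checked numerically for small decks. The sketch below compares an exact enumeration of all shuffles with the alternating series and with $1 - 1/e$; the function names `win_prob_exact` and `win_prob_series` are hypothetical, and the enumeration is only feasible for small $n$ because it visits all $n!$ orderings.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def win_prob_exact(n):
    """Exact P(some card k is in position k), enumerating all n! shuffles."""
    wins = sum(
        any(card == pos for pos, card in enumerate(perm, start=1))
        for perm in permutations(range(1, n + 1))
    )
    return Fraction(wins, factorial(n))

def win_prob_series(n):
    """Partial sum 1 - 1/2! + 1/3! - ... + (-1)^(n-1)/n!."""
    return sum(Fraction((-1) ** (k - 1), factorial(k)) for k in range(1, n + 1))

# The enumeration and the inclusion/exclusion series agree exactly.
for n in range(1, 8):
    assert win_prob_exact(n) == win_prob_series(n)

# Compare the n = 7 value with the limiting value 1 - 1/e.
print(float(win_prob_series(7)), 1 - exp(-1))
```

Already at $n = 7$ the two printed values agree to about four decimal places, which reflects how quickly the factorial terms shrink.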

Appendix A: The Birthday Paradox Experiment

Here's a very nice, interactive explanation of the Birthday Paradox.