Lecture 11: Poisson Distribution and Approximation

Stat 110, Prof. Joe Blitzstein, Harvard University


Sympathetic Magic, or Confusing the Map for the Territory

Don't mistake a random variable for its distribution

As a case in point, say we have two random variables $X$ and $Y$.

\begin{align} P(X + Y) \ne P(X=x) + P(Y=y) \end{align}

Claiming equality here is nonsense: $X + Y$ is a random variable, not an event, so $P(X + Y)$ is not even well-defined; moreover, $P(X=x) + P(Y=y)$ can exceed 1, so it cannot be a probability.
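A quick numeric illustration (with hypothetical, made-up Bernoulli r.v.s) shows why naively adding probabilities fails:

```python
from scipy.stats import bernoulli

# Hypothetical example: X and Y are independent Bernoulli(0.9) r.v.s.
p_x = bernoulli.pmf(1, 0.9)   # P(X = 1) = 0.9
p_y = bernoulli.pmf(1, 0.9)   # P(Y = 1) = 0.9

# The sum of the two probabilities exceeds 1, so it cannot be
# the probability of any event.
print(p_x + p_y)   # 1.8
```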

Think of them as houses & blueprints

Metaphorically, the distribution is the blueprint, and the random variable is the house.


Poisson Distribution

Description

Arguably the most widely used discrete distribution for real-world modeling. It applies when there are a large number of trials, each with a small probability of success.

Note that the probability of success does not have to be the same in every trial (unlike $\operatorname{Bin}(n,p)$, where it does).

The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time and/or space, if these events occur with a known average rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

Examples might be counting:

  • e-mails coming into your mailbox in an hour
  • buses following the same route arriving at a bus-stop in a day
  • natural phenomena, like earthquakes or meteor sightings, etc., in a month
  • chips in a chocolate-chip cookie (area/volume)

Notation

$X \sim \operatorname{Pois}(\lambda)$

Parameters

$\lambda$ - the average number of events per interval (rate), where $\lambda \gt 0$


In [1]:
import matplotlib
import numpy as np
import matplotlib.pyplot as plt

from matplotlib.ticker import (MultipleLocator, FormatStrFormatter,
                               AutoMinorLocator)
from scipy.stats import poisson

%matplotlib inline

plt.xkcd()
_, ax = plt.subplots(figsize=(12,8))

# some Poisson rate parameters to plot
lambda_values = [1, 4, 8, 16]

# colorblind-safe, divergent color scheme
colors = ['#018571', '#80cdc1', '#dfc27d', '#a6611a']

for i,l in enumerate(lambda_values):
    x = np.arange(poisson.ppf(0.01, l), poisson.ppf(0.99, l))
    pmf = poisson.pmf(x, l)
    ax.plot(x, pmf, 'o', color=colors[i], ms=8, label=r'$\lambda={}$'.format(l))
    ax.vlines(x, 0, pmf, lw=2, color=colors[i], alpha=0.3)

# legend styling
legend = ax.legend()
for label in legend.get_texts():
    label.set_fontsize('large')
for label in legend.get_lines():
    label.set_linewidth(1.5)

# y-axis
ax.set_ylim([0.0, 0.4])
ax.set_ylabel(r'$P(X=k)$')

# x-axis
ax.set_xlim([0, 26])
ax.set_xlabel(r'total # of occurrences $k$ under $\lambda$')

# x-axis tick formatting
majorLocator = MultipleLocator(5)
majorFormatter = FormatStrFormatter('%d')
minorLocator = MultipleLocator(1)
ax.xaxis.set_major_locator(majorLocator)
ax.xaxis.set_major_formatter(majorFormatter)
ax.xaxis.set_minor_locator(minorLocator)

ax.grid(color='grey', linestyle='-', linewidth=0.3)

plt.suptitle(r'Poisson PMF: $P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}$')

plt.show()


Probability mass function

\begin{align} P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}, ~~ \text{where } k \in \{0,1,2,\dots\} \end{align}

To check that this is a valid PMF: first, it is clearly non-negative everywhere.

Furthermore,

\begin{align} \sum_{k=0}^{\infty} P(X = k) &= \sum_{k=0}^{\infty} e^{-\lambda} \frac{\lambda^k}{k!} \\ &= e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} & &\text{since } e^{-\lambda} \text{ is constant} \\ &= e^{-\lambda} e^{\lambda} & &\text{recall Taylor series for } e^{\lambda} \\ &= 1 ~~~~ \blacksquare \end{align}
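The sum-to-one property above can be verified numerically by truncating the infinite sum (a sketch, using an arbitrary choice of $\lambda = 4$):

```python
import numpy as np
from scipy.stats import poisson

lam = 4.0
k = np.arange(0, 200)              # truncate the infinite sum; the tail is negligible
total = poisson.pmf(k, lam).sum()
print(total)                       # very close to 1.0
```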

Expected value

\begin{align} \mathbb{E}(X) &= \sum_{k=0}^{\infty} k ~~ e^{-\lambda} \frac{\lambda^k}{k!} \\ &= e^{-\lambda} \sum_{k=1}^{\infty} k \frac{\lambda^k}{k!} \\ &= e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^k}{(k-1)!} \\ &= \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} \\ &= \lambda e^{-\lambda} e^{\lambda} \\ &= \lambda ~~~~ \blacksquare \end{align}
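Likewise, the result $\mathbb{E}(X) = \lambda$ can be sanity-checked by computing the weighted sum $\sum_k k \, P(X=k)$ directly (again with the arbitrary choice $\lambda = 4$):

```python
import numpy as np
from scipy.stats import poisson

lam = 4.0
k = np.arange(0, 200)                  # truncated support; tail contribution is negligible
ev = (k * poisson.pmf(k, lam)).sum()   # E(X) = sum over k of k * P(X = k)
print(ev)                              # very close to lambda = 4.0
```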


Poisson Paradigm (Approximate)

Suppose we have:

  1. a large number of events $A_1, A_2, \dots, A_n$, with $P(A_j) = p_j$
  2. number of trials $n$ is very large
  3. $p$ is very small

The events could be independent (knowing that $A_1$ occurred has no bearing whatsoever on $A_2$).

They could even be weakly dependent ($A_1$ occurring has some small effect on the likelihood of $A_2$).

Either way, the number of events $A_j$ that occur is approximately $\operatorname{Pois}(\lambda)$, where by Linearity

\begin{align} \lambda = \sum_{j=1}^n p_j \end{align}
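A small simulation sketch supports this paradigm (the event probabilities below are hypothetical and made up): for many rare, independent events with *unequal* probabilities, the total count has mean and variance both close to $\lambda = \sum_j p_j$, just as a Poisson distribution would.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 1,000 independent events with small, unequal probabilities.
p = rng.uniform(0, 0.004, size=1_000)
lam = p.sum()                          # by Linearity, E(# events) = sum of the p_j

# Simulate the total count of occurring events 5,000 times.
counts = (rng.random((5_000, p.size)) < p).sum(axis=1)

# For a Poisson distribution, mean and variance both equal lambda.
print(lam, counts.mean(), counts.var())
```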

Relating the Binomial distribution to the Poisson

This example will relate the Binomial distribution to the Poisson distribution.

Suppose we have $X \sim \operatorname{Bin}(n,p)$, and let $n \rightarrow \infty$, $p \rightarrow 0$.

We will hold $\lambda = np$ constant, which lets us see what happens to the Binomial distribution as the number of trials approaches $\infty$ while the probability of success $p$ gets very small.

Specifically let's see what happens to the Binomial PMF under these conditions.

\begin{align} P(X = k) &= \binom{n}{k} p^k q^{n-k} ~~~~ \text{where k is fixed} \\ &= \underbrace{\frac{n (n-1) \dots (n-k+1)}{k!} \left(\frac{\lambda}{n}\right)^k}_{\text{n terms cancel out}} ~~ \underbrace{\left(1 - \frac{\lambda}{n}\right)^n}_{\text{continuous compounding and e}} ~~ \underbrace{\left(1 - \frac{\lambda}{n}\right)^{-k}}_{\text{goes to 1}} \\ \\ &\rightarrow \frac{\lambda^k}{k!} e^{-\lambda} ~~ \text{which is the Poisson PMF at k} \end{align}
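The convergence above can be observed numerically: holding $\lambda = np$ fixed and growing $n$, the Binomial PMF at a fixed $k$ approaches the Poisson PMF (a sketch with the arbitrary choices $\lambda = 4$, $k = 3$):

```python
from scipy.stats import binom, poisson

lam, k = 4.0, 3
for n in (10, 100, 10_000, 1_000_000):
    print(n, binom.pmf(k, n, lam / n))   # converges as n grows

print('Pois:', poisson.pmf(k, lam))      # the limiting value
```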

Thought-experiment: counting raindrops

Let's say we want to count the number of raindrops that fall on a certain area on the ground.

We could divide this area with a grid, with each grid cell being very, very tiny.

Therefore:

  • the number of grid cells is very, very large
  • the probability of a rain drop landing in an arbitrary cell is very, very small

What distribution could we use to model this situation?

How about $X \sim \operatorname{Bin}(n,p)$?

Considering the Binomial distribution, we would have to make certain assumptions:

  1. the event of a raindrop hitting a cell is independent of other events
  2. the probability $p$ of a raindrop hitting a cell is the same for every cell

It might be OK to make these assumptions, but one big stumbling block in using the Binomial is that we would end up computing factorials of a very large $n$, which is problematic even for a computer.

How about $X \sim \operatorname{Pois}(\lambda)$?

The Poisson distribution is a better approximate model than the Binomial, since

  1. no factorials involved, so much simpler than the Binomial
  2. we can deal with the case where more than one raindrop lands in a cell
  3. we don't need to make any assumption on $p$ being identical

Example: Graph comparing $X \sim \operatorname{Bin}(n,p)$ vs $X \sim \operatorname{Pois}(\lambda = np)$, where $n = 10^{6}$ and $p = 10^{-5}$


In [2]:
from scipy.stats import binom

plt.xkcd()

fig = plt.figure(figsize=(12,8))

# Binomial parameters
n = 1e6
p = 1e-5

x = np.arange(binom.ppf(0.01, n, p), binom.ppf(0.99, n, p))

# plot Bin(n,np)
binom_pmf = binom.pmf(x, n, p)
ax1 = fig.add_subplot(211)
ax1.plot(x, binom_pmf, 'o', ms=8, color='#f46d43', label=r'$X \sim Bin(n,p)$')
ax1.vlines(x, 0, binom_pmf, lw=2, color='#f46d43', alpha=0.3)
ax1.set_title(r'$P(x=k) = \binom{n}{k} p^k (1-p)^{n-k}$')
ax1.legend(loc='upper right')

ax1.grid(color='grey', linestyle='-', linewidth=0.3)

# y-axis
ax1.set_ylim([0.0, 0.15])

# plot Pois(np)
pois_pmf = poisson.pmf(x, n*p)
ax2 = fig.add_subplot(212)
ax2.plot(x, pois_pmf, 'o', ms=8, color='#4575b4', label=r'$X \sim Pois(\lambda = np)$')
ax2.vlines(x, 0, pois_pmf, lw=2, color='#4575b4', alpha=0.3)
ax2.set_title(r'$P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}$')
ax2.legend(loc='upper right')

ax2.grid(color='grey', linestyle='-', linewidth=0.3)

# y-axis
ax2.set_ylim([0.0, 0.15])

fig.tight_layout()

st = fig.suptitle(r'Approximating $X \sim Bin(n,p)$ with $X \sim Pois(\lambda)$, where $n=10^{6}, p=10^{-5}$', fontsize=18)
st.set_y(1.035)



Birthdays, revisited: triple matches

Suppose we have $n$ people, and we want to know the approximate probability that there are 3 people among the $n$ who share the same birthday.

Doing this the way we did in Lecture 3 is possible, but it would get very messy.

Let's set up the problem first.

  • there are $\binom{n}{3}$ triplets
  • we use indicator r.v.s $I_{ijk} \text{, where } i < j < k$
  • $\Rightarrow \mathbb{E}(\# \text{ triple matches}) = \binom{n}{3} \frac{1}{365^2}$

But we are interested in the approximate probability of triple matches, so now let's use the Poisson.

Why?

  1. we expect the total number of trials $\binom{n}{3}$ to be very large
  2. the probability $P(I_{ijk})$ is very small
  3. events are weakly dependent: e.g., if persons $1$ and $2$ are already known to share a birthday, then $I_{123}$ and $I_{124}$ are not independent

We claim that these circumstances are approximately $\operatorname{Pois}(\lambda)$ with $\lambda = \mathbb{E}(\# \text{ triple matches})$.

Let $X = \# \text{ triple matches}$

\begin{align} P(X \ge 1) &= 1 - P(X=0) \\ &\approx 1 - e^{-\lambda} \frac{\lambda^0}{0!} \\ &= 1 - e^{-\lambda} ~~~~ \text{where you just plug in } \lambda = \binom{n}{3} \frac{1}{365^2} ~~~~ \blacksquare \end{align}
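Plugging in a concrete (hypothetical) group size makes the approximation tangible, e.g. $n = 100$:

```python
from math import comb, exp

n = 100                          # hypothetical group size
lam = comb(n, 3) / 365**2        # expected number of triple matches
print(1 - exp(-lam))             # approximate P(at least one triple match)
```

With 100 people there are $\binom{100}{3} = 161{,}700$ triplets, so a triple birthday match is already more likely than not.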