All of Statistics

Chapter 1: Probability

Some terminology:

  • $\Omega$: the set of possible outcomes of an experiment.
  • $\omega$: a sample outcome, realization, or element of $\Omega$.
  • A: event, a subset of $\Omega$.
  • $A^c$: complement of A (not A)
  • $A \cup B$: union (A or B)
  • $A \cap B$: intersection (A and B)
  • disjoint: $A \cap B = \emptyset$, also known as mutually exclusive
  • partition: a sequence of disjoint sets whose union is $\Omega$
  • $A \subset B$: A is a subset of B.
  • $A \supset B$: A is a superset of B.


  • $\mathbb{P}(A)$: the probability of event A
  • $\mathbb{P}$: the probability distribution (or probability measure), which satisfies three axioms:
    • $\mathbb{P}(A) \geq 0$ for every A.
    • $\mathbb{P}(\Omega) = 1$
    • If $A_1, A_2, \ldots$ are disjoint,
      $\mathbb{P}(\bigcup_{i=1}^{\infty}A_i) = \sum_{i=1}^{\infty}\mathbb{P}(A_i)$


  • n choose k: $\binom{n}{k} = \frac{n!}{k!(n-k)!}$
    Number of distinct ways of choosing k items from n.
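
As a quick sanity check of the formula, the sketch below compares the factorial expression with a brute-force count of subsets. The values n=5 and k=2 are arbitrary choices for illustration.

```python
from itertools import combinations
from math import comb, factorial

n, k = 5, 2  # illustrative values, not from the text

# Closed form from the definition n! / (k! (n-k)!)
by_formula = factorial(n) // (factorial(k) * factorial(n - k))

# Brute-force count of the distinct k-element subsets of {0, ..., n-1}
by_enumeration = sum(1 for _ in combinations(range(n), k))

print(by_formula, by_enumeration, comb(n, k))
```

All three numbers agree, since `math.comb` (Python 3.8+) computes the same binomial coefficient.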

Independent Events

  • independent: $\mathbb{P}(AB) = \mathbb{P}(A)\mathbb{P}(B)$

Conditional Probability

If $\mathbb{P}(B) > 0$ then the conditional probability of A given B is
$\mathbb{P}(A|B) = \frac{\mathbb{P}(AB)}{\mathbb{P}(B)}$

If $\mathbb{P}(B) > 0$, then A and B are independent events iff
$\mathbb{P}(A|B) = \mathbb{P}(A)$

For any pair of events A and B,
$\mathbb{P}(AB) = \mathbb{P}(A|B)\mathbb{P}(B) = \mathbb{P}(B|A)\mathbb{P}(A)$
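
These identities can be checked on a small finite sample space. The sketch below uses two fair dice, an example chosen here for illustration rather than taken from the text. The helper `prob` and the events `A` and `B` are hypothetical names.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair six-sided dice, all equally likely.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under the uniform measure."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

A = lambda w: w[0] % 2 == 0       # first die is even
B = lambda w: w[0] + w[1] == 7    # the two dice sum to 7
AB = lambda w: A(w) and B(w)

p_a_given_b = prob(AB) / prob(B)  # definition of conditional probability
p_b_given_a = prob(AB) / prob(A)

print(prob(A), prob(B), prob(AB), p_a_given_b, p_b_given_a)
```

For this pair of events P(AB) equals P(A)P(B), so they are independent, and accordingly P(A|B) equals P(A).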

Bayes' Theorem

Let $A_1,...,A_k$ be a partition of $\Omega$ such that $\mathbb{P}(A_i) > 0$ for each i. If $\mathbb{P}(B) > 0$, then
$\mathbb{P}(A_i|B) = \frac{\mathbb{P}(B|A_i)\mathbb{P}(A_i)}{\mathbb{P}(B)} = \frac{\mathbb{P}(B|A_i)\mathbb{P}(A_i)}{\sum_j\mathbb{P}(B|A_j)\mathbb{P}(A_j)}$
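
The theorem translates directly into a few lines of code. This is a minimal sketch, assuming the priors and likelihoods are given as lists in the same order. The function name `posterior` is made up for illustration.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(A_i | B) for each i, given P(A_i) and P(B | A_i)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    evidence = sum(joint)  # P(B), by the law of total probability
    if evidence == 0:
        raise ValueError("P(B) must be positive")
    return [j / evidence for j in joint]

# Numbers from problem 10 below: the prize is equally likely behind each door,
# and P(B_2 | A_i) is 1/2, 0, 1 for i = 1, 2, 3.
priors = [Fraction(1, 3)] * 3
likelihoods = [Fraction(1, 2), Fraction(0), Fraction(1)]
result = posterior(priors, likelihoods)
print(result)
```

Using `Fraction` keeps the arithmetic exact, which makes the result easy to compare with a hand calculation.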



5. Suppose we toss a fair coin until we get exactly two heads. Describe the sample space S. What is the probability that exactly k tosses are required?
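
One way to approach this (a sketch, not a solution taken from the text): the sample space is the set of finite sequences of H and T that end in H and contain exactly two H's. Exactly k tosses are required when the first k-1 tosses contain exactly one head and toss k is a head, which suggests $\mathbb{P}(k) = (k-1)\left(\frac{1}{2}\right)^k$ for $k \geq 2$. The code below checks that closed form against brute-force enumeration for small k. Both function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def p_exactly_k_tosses(k):
    """Closed form: (k - 1) positions for the first head, each sequence has probability 2^-k."""
    return Fraction(k - 1, 2 ** k)

def p_by_enumeration(k):
    """Count the length-k sequences whose second head lands exactly on toss k."""
    hits = sum(
        1
        for seq in product("HT", repeat=k)
        if seq[-1] == "H" and seq.count("H") == 2
    )
    return Fraction(hits, 2 ** k)

for k in range(2, 8):
    assert p_exactly_k_tosses(k) == p_by_enumeration(k)
    print(k, p_exactly_k_tosses(k))
```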


10. A prize is placed at random behind one of three doors. You pick a door. (We'll say you always pick door #1.) Monty Hall chooses one of the other two doors, opens it and shows you that it is empty. He then gives you the opportunity to keep your door or switch to the other unopened door. Should you stay or switch? Intuition suggests it doesn't matter. The correct answer is that you should switch. Prove it.

If we write the sample space $\Omega$ as $\{(A_i, B_j); i\in\{1,2,3\}, j\in\{2,3\}\}$ where $A_i$ is the door the prize is behind and $B_j$ is the door that Monty Hall opens (without the prize), then we can enumerate the sample space as:
$\Omega = \{(1,2), (1,3), (2,3), (3,2)\}$

$P(A_i|B_j) = \frac{P(B_j|A_i)P(A_i)}{P(B_j)} = \frac{P(B_j|A_i)P(A_i)}{\sum_k P(B_j|A_k)P(A_k)}$

First let's compute $P(A_1|B_2)$, meaning the probability that the prize is behind door 1 given that Monty Hall opened door 2.

$P(A_1|B_2) = \frac{P(B_2|A_1)P(A_1)}{P(B_2)} = \frac{P(B_2|A_1)P(A_1)}{P(B_2|A_1)P(A_1)+P(B_2|A_2)P(A_2)+P(B_2|A_3)P(A_3)}$

  • $P(B_2|A_1) = \frac{1}{2}$ (Given that the prize is behind door 1, the door we picked, Monty can open either door 2 or door 3; we assume he chooses between them with equal probability.)
  • $P(A_1) = \frac{1}{3}$ (This is our prior belief that the prize was behind door 1.)
  • $P(B_2|A_1)P(A_1) = \frac{1}{2} \cdot \frac{1}{3}$
  • $P(B_2|A_2)P(A_2) = 0 \cdot \frac{1}{3}$ (There is no chance Monty will open door 2 given the prize is behind door 2.)
  • $P(B_2|A_3)P(A_3) = 1 \cdot \frac{1}{3}$ (If the prize is behind door 3, Monty will reveal door 2 every time.)

$P(A_1|B_2) = \frac{\frac{1}{2} \cdot \frac{1}{3}}{\frac{1}{2}\cdot\frac{1}{3} + 0 + \frac{1}{3}} = \frac{\frac{1}{6}}{\frac{1}{2}} = \frac{1}{3}$

Considering that door 2 is the one revealed, the other probability to figure out is $P(A_3|B_2)$. We already know that $P(A_2|B_2) = 0$. And we know that $P(A_1|B_2) + P(A_2|B_2) + P(A_3|B_2) = 1$, so that leads us to

$P(A_3|B_2) = \frac{2}{3}$

We could also figure this out explicitly.

$P(A_3|B_2) = \frac{P(B_2|A_3)P(A_3)}{P(B_2)} = \frac{P(B_2|A_3)P(A_3)}{P(B_2|A_1)P(A_1) + P(B_2|A_2)P(A_2) + P(B_2|A_3)P(A_3)}$

  • $P(B_2|A_3) = 1$
  • $P(A_3) = \frac{1}{3}$
  • $P(B_2|A_1)P(A_1) = \frac{1}{2} \cdot \frac{1}{3}$
  • $P(B_2|A_2)P(A_2) = 0$
  • $P(B_2|A_3)P(A_3) = 1 \cdot \frac{1}{3}$

$P(A_3|B_2) = \frac{1 \cdot \frac{1}{3}}{\frac{1}{2} \cdot \frac{1}{3} + 0 + \frac{1}{3}} = \frac{\frac{1}{3}}{\frac{1}{2}} = \frac{2}{3}$
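
The result can also be checked empirically. The simulation below is a sketch under the same assumptions as the derivation: the player always picks door 1, and Monty chooses uniformly when he has two empty doors to pick from. The function names, trial count, and seed are arbitrary choices.

```python
import random

def monty_hall_trial(rng, switch):
    """Play one game with the player always picking door 1; return True on a win."""
    prize = rng.choice([1, 2, 3])
    pick = 1
    # Monty opens a door that is neither the player's pick nor the prize,
    # choosing uniformly when both doors 2 and 3 are empty.
    opened = rng.choice([d for d in (2, 3) if d != prize])
    if switch:
        pick = next(d for d in (1, 2, 3) if d not in (pick, opened))
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    wins = sum(monty_hall_trial(rng, switch) for _ in range(trials))
    return wins / trials

stay, swap = win_rate(switch=False), win_rate(switch=True)
print(f"stay: {stay:.3f}  switch: {swap:.3f}")
```

The two win rates should land near 1/3 and 2/3 respectively, up to sampling noise.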


12. There are three cards.

  • Green on both sides
  • Red on both sides
  • Green on one side, red on the other.

We choose a card at random and we see one side (also chosen at random). If the side we see is green, what is the probability that the other side is also green?

Let's enumerate each card face as $C_{i,j}$ where $i \in \{1,2,3\}$ is the card index and $j \in \{1,2\}$ is the face index.

  • $C_{1,1}, C_{1,2}$ (green, green)
  • $C_{2,1}, C_{2,2}$ (red, red)
  • $C_{3,1}, C_{3,2}$ (green, red)

If we see a green face, we are looking at one of $C_{1,1}, C_{1,2}, C_{3,1}$ with equal probability of $\frac{1}{3}$. So the other sides of the cards are the card faces $C_{1,2}, C_{1,1}, C_{3,2}$, or {green, green, red}. So the probability of the other side being green is 2/3.
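
The same counting argument can be written as a short enumeration. This is a sketch that assumes, as above, that each of the six (card, visible face) draws is equally likely. The variable names are made up for illustration.

```python
from fractions import Fraction

# Each card is a pair of face colors.
cards = [("green", "green"), ("red", "red"), ("green", "red")]

# All six equally likely draws, as (seen face, hidden face) pairs.
draws = [(card[j], card[1 - j]) for card in cards for j in (0, 1)]

# Condition on seeing green, then look at the hidden face.
hidden_given_green = [hidden for seen, hidden in draws if seen == "green"]
p_other_green = Fraction(hidden_given_green.count("green"), len(hidden_given_green))
print(p_other_green)
```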


15. The probability that a child has blue eyes is 1/4. Assume independence between children. Consider a family with 3 children.

(a) If it is known that at least one child has blue eyes, what is the probability that at least two children have blue eyes? (b) If it is known that the youngest child has blue eyes, what is the probability that at least two children have blue eyes?

We will denote the children as $C_{i,j}$ where $i \in \{1,2,3\}$ is the index of the child, and $j \in \{0,1\}$ denotes the eye color, 1=blue, 0=not blue. The entire space is

$\Omega = \{(C_{1,0},C_{2,0},C_{3,0}), (C_{1,1},C_{2,0},C_{3,0}), (C_{1,0},C_{2,1},C_{3,0}), (C_{1,0},C_{2,0},C_{3,1}), (C_{1,1},C_{2,1},C_{3,0}), (C_{1,1},C_{2,0},C_{3,1}), (C_{1,0},C_{2,1},C_{3,1}), (C_{1,1},C_{2,1},C_{3,1})\}$

We will further define the event $B_k$ where $k \in \{0,1,2,3\}$ to mean that exactly k children have blue eyes. And we will define the event $B_{k+}$ where $k \in \{0,1,2,3\}$ to mean that k or more children have blue eyes. We are interested in

$P(B_{2+} \mid B_{1+}) = \cfrac{P(B_{2+} \cap B_{1+})}{P(B_{1+})} = \cfrac{P(B_{2+})}{P(B_{1+})}$

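  • $P(B_{2+}) = P(B_2) + P(B_3) = 3 \cdot \left(\frac{1}{4}\right)^2 \cdot \frac{3}{4} + \left(\frac{1}{4}\right)^3 = \frac{9}{64} + \frac{1}{64} = \frac{10}{64}$ (The 8 outcomes are not equally likely, because each child has blue eyes with probability 1/4 rather than 1/2.)
  • $P(B_{1+}) = 1 - P(B_0) = 1 - \left(\frac{3}{4}\right)^3 = \frac{37}{64}$

$P(B_{2+}|B_{1+}) = \frac{\frac{10}{64}}{\frac{37}{64}} = \frac{10}{37}$

Narrowing the event space to the 7 outcomes with at least one blue-eyed child and counting the 4 of them with two or more would give 4/7, but that shortcut is only valid when all outcomes are equally likely, that is, when the probability of blue eyes is 1/2.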

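If we now consider the case that the youngest child has blue eyes, we can designate the order of children in the events as youngest to oldest. (It actually doesn't matter which child, as long as we designate a single child as having blue eyes.) There are 4 outcomes that involve $C_{1,1}$:

$\{(C_{1,1},C_{2,0},C_{3,0}), (C_{1,1},C_{2,1},C_{3,0}), (C_{1,1},C_{2,0},C_{3,1}), (C_{1,1},C_{2,1},C_{3,1})\}$

Of those outcomes, 3 have two or more children with blue eyes, but again they are not equally likely, so we cannot simply answer 3/4. Given $C_{1,1}$, at least two children have blue eyes exactly when at least one of the other two children does. By independence, the probability that neither of them has blue eyes is $\left(\frac{3}{4}\right)^2 = \frac{9}{16}$, so

$P(B_{2+}|C_{1,1}) = 1 - \frac{9}{16} = \frac{7}{16}$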
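
Because each child has blue eyes with probability 1/4, the eight outcomes carry different weights. The sketch below enumerates them with their exact probabilities and computes both conditional probabilities. The helper names `weight`, `prob`, and `conditional` are hypothetical, and index 0 is taken to be the youngest child.

```python
from fractions import Fraction
from itertools import product

p_blue = Fraction(1, 4)

def weight(outcome):
    """Probability of one outcome: a tuple of 0/1 flags, 1 = blue eyes."""
    w = Fraction(1)
    for blue in outcome:
        w *= p_blue if blue else 1 - p_blue
    return w

omega = list(product((0, 1), repeat=3))  # index 0 is the youngest child

def prob(event):
    return sum(weight(w) for w in omega if event(w))

def conditional(event, given):
    return prob(lambda w: event(w) and given(w)) / prob(given)

at_least_two = lambda w: sum(w) >= 2
part_a = conditional(at_least_two, lambda w: sum(w) >= 1)
part_b = conditional(at_least_two, lambda w: w[0] == 1)
print(part_a, part_b)
```

Setting `p_blue` to 1/2 makes all outcomes equally likely, in which case simple counting of outcomes gives the same answers as the weighted sums.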


19. Suppose that 30 percent of computer owners use a Macintosh, 50 percent use Windows, and 20 percent use Linux. Suppose that 65 percent of the Mac users have succumbed to a computer virus, 82 percent of the Windows users get the virus, and 50 percent of the Linux users get the virus. We select a person at random and learn that her system was infected with the virus. What is the probability that she is a Windows user?

We will denote M as Mac, W as Windows, and L for Linux. + means the user has the virus, - means no virus.

|   | M | W | L |
|---|---|---|---|
| + | $0.3 \cdot 0.65 = 0.195$ | $0.5 \cdot 0.82 = 0.41$ | $0.2 \cdot 0.5 = 0.1$ |
| - | $0.3 \cdot 0.35 = 0.105$ | $0.5 \cdot 0.18 = 0.09$ | $0.2 \cdot 0.5 = 0.1$ |

$P(W|+) = \cfrac{P(W \cap +)}{P(+)} = \cfrac{0.41}{0.195 + 0.41 + 0.1} = \cfrac{0.41}{0.705} \approx 0.58156$

$P(M|+) = \cfrac{0.195}{0.705} \approx 0.27660$

$P(L|+) = \cfrac{0.1}{0.705} \approx 0.14184$
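
The same arithmetic as a short script, using the percentages from the problem statement. The dictionary names are illustrative only.

```python
prior = {"Mac": 0.30, "Windows": 0.50, "Linux": 0.20}     # P(system)
infected = {"Mac": 0.65, "Windows": 0.82, "Linux": 0.50}  # P(+ | system)

joint = {s: prior[s] * infected[s] for s in prior}  # P(system and +)
p_virus = sum(joint.values())                       # P(+)
post = {s: joint[s] / p_virus for s in joint}       # P(system | +)

for s, p in post.items():
    print(f"P({s} | +) = {p:.5f}")
```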


20. A box contains 5 coins and each has a different probability of showing heads. Let $p_1,...,p_5$ denote the probability of heads on each coin. Suppose that

  • $p_1 = 0$
  • $p_2 = 1/4$
  • $p_3 = 1/2$
  • $p_4 = 3/4$
  • $p_5 = 1$

Let H denote "heads is obtained" and let $C_i$ denote the event that coin $i$ is selected.

(a) Select a coin at random and toss it. Suppose a head is obtained. What is the posterior probability that coin i was selected (i=1,...,5)? In other words, find $P(C_i|H)$ for i=1,...,5.

$P(C_i|H) = \cfrac{P(H|C_i)P(C_i)}{P(H)} = \cfrac{P(H|C_i)P(C_i)}{\sum_{k=1}^{5}P(H|C_k)P(C_k)}$

We know the probabilities of each coin.

  • $P(H|C_1) = 0$
  • $P(H|C_2) = 1/4$
  • $P(H|C_3) = 1/2$
  • $P(H|C_4) = 3/4$
  • $P(H|C_5) = 1$

And we know that $P(C_k) = 1/5$ for every k, since the coin is selected at random.

The sum in the denominator of each calculation is then

$\sum_{k=1}^{5}P(H|C_k)P(C_k) = 0 \cdot \frac{1}{5} + \frac{1}{4} \cdot \frac{1}{5} + \frac{1}{2} \cdot \frac{1}{5} + \frac{3}{4} \cdot \frac{1}{5} + 1 \cdot \frac{1}{5} = \frac{0 + 1 + 2 + 3 + 4}{20} = \frac{1}{2}$

  • $P(C_1|H) = \cfrac{0 \cdot \frac{1}{5}}{\frac{1}{2}} = 0$
  • $P(C_2|H) = \cfrac{\frac{1}{4} \cdot \frac{1}{5}}{\frac{1}{2}} = \frac{1}{10}$
  • $P(C_3|H) = \cfrac{\frac{1}{2} \cdot \frac{1}{5}}{\frac{1}{2}} = \frac{1}{5}$
  • $P(C_4|H) = \cfrac{\frac{3}{4} \cdot \frac{1}{5}}{\frac{1}{2}} = \frac{3}{10}$
  • $P(C_5|H) = \cfrac{1 \cdot \frac{1}{5}}{\frac{1}{2}} = \frac{2}{5}$

(b) Toss the coin again. What is the probability of another head? In other words, find $P(H_2|H_1)$ where $H_j$ = "heads on toss j".

We can use the fact that the coin events $C_k$ partition the space, conditioning every term on $H_1$,

$P(H_2|H_1) = \sum_{k=1}^{5}P(H_2|H_1,C_k)P(C_k|H_1)$

The events $H_1$ and $H_2$ are not independent, since seeing a head changes our belief about which coin we hold. They are, however, conditionally independent given the coin, so $P(H_2|H_1,C_k) = P(H|C_k)$.

  • $P(H_2|H_1,C_1) = 0$
  • $P(H_2|H_1,C_2) = \frac{1}{4}$
  • $P(H_2|H_1,C_3) = \frac{1}{2}$
  • $P(H_2|H_1,C_4) = \frac{3}{4}$
  • $P(H_2|H_1,C_5) = 1$

The weights $P(C_k|H_1)$ are the posteriors computed in part (a).

$P(H_2|H_1) = P(H_2|H_1,C_1)P(C_1|H_1) + P(H_2|H_1,C_2)P(C_2|H_1) + P(H_2|H_1,C_3)P(C_3|H_1) + P(H_2|H_1,C_4)P(C_4|H_1) + P(H_2|H_1,C_5)P(C_5|H_1) = 0 \cdot 0 + \frac{1}{4} \cdot \frac{1}{10} + \frac{1}{2} \cdot \frac{1}{5} + \frac{3}{4} \cdot \frac{3}{10} + 1 \cdot \frac{2}{5} = \frac{0 + 1 + 4 + 9 + 16}{40} = \frac{3}{4}$
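
Both parts can be reproduced with exact fractions. This is a minimal sketch: part (a) applies Bayes' theorem, and part (b) weights each coin's probability of heads by its posterior after the first head. The variable names are chosen here for illustration.

```python
from fractions import Fraction

heads_prob = [Fraction(k, 4) for k in range(5)]  # p_1, ..., p_5
prior = [Fraction(1, 5)] * 5                     # P(C_k)

# (a) Posterior P(C_k | H_1) by Bayes' theorem.
joint = [p * c for p, c in zip(heads_prob, prior)]
p_h1 = sum(joint)
posterior = [j / p_h1 for j in joint]

# (b) Probability of a second head: each coin's p_k weighted by its posterior.
p_h2_given_h1 = sum(p * q for p, q in zip(heads_prob, posterior))

print([str(q) for q in posterior], p_h2_given_h1)
```

As a cross-check, the same value follows from $P(H_1 \cap H_2)/P(H_1)$, where $P(H_1 \cap H_2) = \sum_k p_k^2 P(C_k)$.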


Suppose a coin has probability p of falling heads up. If we flip the coin many times, we would expect the proportion of heads to be near p. We will make this formal later. Take p=0.3 and n=1000 and simulate n coin flips. Plot the proportion of heads as a function of n. Repeat for p=0.03.

In [44]:
import matplotlib.pyplot as plt
import numpy as np

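def plot_random_simulation(p=0.3, n=10000):
    """Plot the running proportion of heads over n simulated coin flips."""
    # A uniform draw on [0, 1) falls below p with probability p, i.e. is a head.
    heads = (np.random.rand(n) < p).cumsum()
    flips = np.arange(1, n + 1, dtype=float)
    ratio = heads / flips
    fig, ax = plt.subplots(figsize=(7, 5))
    ax.plot(flips, ratio)
    ax.set_title('Simulating {n} Coin Flips with p={p}'.format(n=n, p=p))
    ax.set_xlabel('number of flips')
    ax.set_ylabel('proportion of heads')
    ax.set_ylim(bottom=0, top=1.0)
    # Dotted reference line at the true probability p.
    ax.axhline(y=p, linestyle=':', alpha=0.5)

# The two cases asked for in the exercise.
plot_random_simulation(p=0.3, n=1000)
plot_random_simulation(p=0.03, n=1000)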



Suppose we flip a coin n times and let p denote the probability of heads. Let X be the number of heads. We call X a binomial random variable, which is discussed in the next chapter. Intuition suggests that X will be close to $n{\cdot}p$. To see if this is true, we can repeat this experiment many times and average the X values. Carry out a simulation and compare the average of the X's to $n{\cdot}p$. Try this for p=0.3 and n=10, n=100, and n=1000.

In [60]:
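def plot_binomial_simulation(p=0.3, n=1000):
    iterations = 10000
    # Each entry is one realization of X: the number of heads in n flips.
    heads = np.array([(np.random.rand(n) < p).sum() for _ in range(iterations)])
    fig, ax = plt.subplots(figsize=(7, 5))
    # Unit-width bins centered on the integers from min to max.
    bins = np.arange(heads.min(), heads.max() + 2) - 0.5
    ax.hist(heads, bins=bins)
    ax.axvline(x=n * p, color='red')                          # expected n*p
    ax.axvline(x=heads.mean(), color='black', linestyle=':')  # average of the X's

for n in (10, 100, 1000):
    plot_binomial_simulation(p=0.3, n=n)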
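
To compare the average of the X's to $n{\cdot}p$ numerically rather than visually, a vectorized variant can be used. This is a sketch with hypothetical names (`mean_heads`, `repetitions`, `seed`); it uses NumPy's `default_rng` generator and a smaller number of repetitions to keep memory use modest.

```python
import numpy as np

def mean_heads(p, n, repetitions=2_000, seed=0):
    """Average number of heads over many repetitions of n flips."""
    rng = np.random.default_rng(seed)
    # Each row is one experiment of n flips; summing a row gives one draw of X.
    x = (rng.random((repetitions, n)) < p).sum(axis=1)
    return x.mean()

for n in (10, 100, 1000):
    print(f"n={n:5d}  mean X = {mean_heads(0.3, n):8.3f}  n*p = {n * 0.3:8.3f}")
```

The printed averages should sit close to $n{\cdot}p$ in each case, with the gap shrinking relative to n as n grows.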
