By Evgenia "Jenny" Nitishinskaya and Delaney Granizo-Mackenzie

Notebook released under the Creative Commons Attribution 4.0 License.

The probability of an event answers the question, how likely is it that this event will occur? It is a number between 0 and 1, with 0 meaning "this event is impossible," and 1 meaning "this event will certainly occur." The probability of a fair coin landing heads up is 1/2, while the probability of it landing heads or tails is 1 (since every time we flip it, it must be heads or tails). We can write this as $P(H) = 1/2$ and $P(H\text{ or }T) = 1$. If we pick a ball randomly from a bag that has 15 white and 30 red balls, then the probability of drawing a red ball is 2/3.

In general, we have $P(A\text{ or }B) = P(A) + P(B) - P(A\text{ and } B)$. Two events are mutually exclusive if they cannot both occur at the same time, so $P(A\text{ and }B) = 0$. In that case, the probability of one of the two happening is just the sum of their individual probabilities: $P(A\text{ or }B) = P(A) + P(B)$. For instance, a die cannot land both 1 up and 5 up, so $P(1\text{ or }5) = P(1) + P(5) = 1/3$. A set of events is exhaustive if one of them must occur. The sum of their probabilities is at least 1, e.g. $P(H) + P(H \text{ or } T) = 1.5$. The sum of the probabilities of a set of mutually exclusive and exhaustive events is 1. In particular, $E$ and not $E$ are mutally exclusive and exhaustive, in which case this rule says that an event must either happen or not happen: $P(E) + P(\text{not }E) = 1$.

Often we have a random variable - that is, a process which has an uncertain outcome, such as a coin flip or the returns on a stock - which we do not know the true probabilities for. In that case, we can estimate the probability of an event based on historical data. If a coin has come up heads 21 out of 30 times, our best guess for the probability of this particular coin coming up heads is 7/10. This is called the empirical probability of the event.

We can state the probability of an event in terms of odds. The odds of an event $E$ happening are $P(E) \big/ (1 - P(E))$. If the odds of team A beating team B are 2 to 5, then the probability that team A wins is 2/7, while the probability that team B wins is 5/7.

Conditional probability

Conditional probability refers to probability that takes some extra information about the state of the world into account. For instance, we can ask, "what is the probability that I have the flu, given that I am sneezing?" or, "what is the probability that this stock will have a return above the risk-free rate, given that it has positive return?" In both cases, the extra information changes the probability. We write the probability of $A$ given $B$ as $P(A|B)$. When looking at conditional probability, we are only considering situations where $B$ occurs, and we don't care about the situations where it does not. To compute conditional probability, we use the equation

$$ P(A|B) = P(A\text{ and } B) \big/ P(B) $$

This can be written as $P(A|B)P(B) = P(A\text{ and } B)$, that is, the probability that $A$ and $B$ both occur is the probability that $B$ occurs, times the probability that $A$ occurs given that $B$ occurs. So the probability that I am sneezing and I have the flu is the probability that I am sneezing, times the probability that I also have the flu. Notice that the equation above is symmetric, so we can also say $P(A\text{ and } B) = P(B|A)P(A)$. Often we may know $P(B|A)$ but not $P(A|B)$, so a useful restatement of the equation for conditional probability is $$ P(A|B) = \frac{P(B|A)P(A)}{P(B)} $$

If we compute conditional probability empirically, we can use simple counting instead of formulas. Just look at the set of data points for which $B$ is true, and compute the fraction of them that have $A$ true as well. For instance, we can ask, "what is the probability that a stock will be in the top decile of returns this month, given that it was in the top decile last month?" To compute this, we count the number of stocks that were in the top decile both months, and divide it by the number of stocks in the top decile last month.

Two events are independent if they do not influence each other, so that $P(B|A) = P(B)$ and $P(A|B) = P(A)$ (note that if one of these equations is true, the other is also true). For example, the probability that a coin comes up heads is independent from the probability of it raining next week: if the coin comes up heads, it doesn't change our belief about the probability of rain, and vice versa. Two successive coin flips are independent if we know the probability of this coin landing heads (e.g. if we know that it is a fair coin). For independent events, the probability of them both occuring is simply $P(A\text{ and } B) = P(A)P(B)$.

Conditional probabilitites satisfy the total probability rule, which states that if $S_1, S_2, \ldots, S_n$ are mutually exclusive and exhaustive events, then $$ P(A) = P(A|S_1)P(S_1) + P(A|S_2)P(S_2) + \ldots + P(A|S_n)P(S_n) $$

As an example, imagine that a certain test for disease X comes up positive for 99% of subjects with the disease, and for 5% of healthy subjects. Statistically, 1% of the population have the disease. Say $B$ is the test coming up positive, and $A$ is having the disease. Then $P(B|A) = .99$ and $P(A) = .01$. To compute $P(B)$, we can use the total probability rule, and we get that $P(B) = .99 \cdot .06$. Then the probability that you have the disease, given that your test comes up positive, is $P(A|B) = 16.7\%$.

In [ ]:
# We'll need numpy's random number generator
import numpy as np

# Initialize counters for number of people of each type
sick_and_pos = 0
sick_and_neg = 0
healthy_and_pos = 0
healthy_and_neg = 0

In [159]:
# Simulate 2000 patients
for i in range(2000):
    sick = (np.random.rand() < 0.01)
    if sick:
        test = (np.random.rand() < 0.99)
        if test:
            sick_and_pos += 1
            sick_and_neg += 1
        test = (np.random.rand() < 0.05)
        if test:
            healthy_and_pos += 1
            healthy_and_neg += 1

print 'Empirical P(A|B):', float(sick_and_pos)/(sick_and_pos + healthy_and_pos)

Empirical P(A|B): 0.163742690058

Expected value

The expected value $E(X)$ estimates the result of a random variable $X$ that has numerical outcomes. It is a weighted average of the outcomes: $$ E(X) = \sum_{i=1}^n P(X_i) X_i $$ where $X_1, X_2, \ldots, X_n$ are the possible outcomes of $X$. This is the mean value of the random variable $X$.

To understand why this formula makes sense, say that we are playing a game where we flip a coin and I pay you \$2 if it comes up heads, while you pay me \$1 if the coin comes up tails. If the coin is fair, there is a 50% chance you will gain \$2 and a 50% chance you will lose \$1. Intuitively this game seems to be in your favor. If we calculate the expected value precisely, you expect to earn \$0.50 from one round of this game. However, if the coin is biased so that the odds of coming up heads are 1:2, then your advantage disappears and you expect to break even: $P(X_1)X_1 + P(X_2)X_2 = 1/3 \cdot 2 - 2/3 \cdot 1 = 0$. Run the simulations below a few times. The average money per game should be close to what we calculated.

In [160]:
# Initialize counters for # games played and amount earned
fair_games = 0
fair_money = 0.0
biased_games = 0
biased_money = 0.0

In [155]:
# Play 300 fair games
for i in range(300):
    flip = (np.random.rand() < 0.5) # Simulating 50% probability of heads
    fair_games += 1
    if flip:
        fair_money += 2
        fair_money -= 1
print 'Fair games played:', fair_games, 'Money:', fair_money, '$/games:', fair_money/fair_games

Fair games played: 300 Money: 168.0 $/games: 0.56

In [156]:
# Play 300 biased games (with the biased coin)
for i in range(300):
    flip = (np.random.rand() < 0.333333) # Simulating 1/3 probability of heads
    biased_games += 1
    if flip:
        biased_money += 2
        biased_money -= 1
print 'Biased games played:', biased_games, 'Money:', biased_money, '$/games:', biased_money/biased_games

Biased games played: 300 Money: 3.0 $/games: 0.01

Notice that in the first situation we were equally likely to win, while in the second I was more likely to win. But since the amount of money we win matters, we need to compute the expected value to determine whether a game is advantageous. Similarly, when playing the lottery, it is important to take into account both how likely you are to win and how much money you would receive if you did.

Expected value can be estimated empirically from historical data, as we did with the coin simulation. In this case it is clear that the expected value is the same as the mean, because we simply average the historical values.

We can formulate a definition of conditional expected value as follows: $$ E(X|S) = \sum_{i=1}^n P(X_i|S) X_i $$ That is, we assume that $S$ happens, and compute the expected value of $X$ under those circumstances. A version of the total probability rule for conditional probability applies here: $$ E(X) = E(X|S_1)P(S_1) + P(X|S_2)P(S_2) + \ldots + P(X|S_n)P(S_n) $$ where $S_1, S_2, \ldots, S_n$ are mutually exclusive and exhaustive events.


The expected value tells us a single number that we can use to represent our random variable, which is the average value. It's important to also measure risk, or how close to the expected value we're actually likely to get. For this we use use variance, which can quantify the difference between, say, playing the game above with a fair coin, and simply being handed \$0.50. The variance of a random variable is defined as

$$ Var(X) = E\left([X - E(X)]^2 \right) $$

It is also sometimes written as $\sigma^2(X)$, and $\sigma$ (the positive square root of the variance) is called the standard deviation. This is often more useful because it is in the same units as the original data, so it is directly comprable.

Continuing with the coin game example, let's expand the definition of variance: $$ Var(X) = \sum_{i=1}^n P(X_i) [X_i - E(X)]^2 $$ Plugging in the probabilities and values of the outcomes, we get that the variance of your profit in the fair-coin game is $\frac{1}{2} (2 - .5)^2 + \frac{1}{2} (-1 - .5)^2 = 2.25$, and the standard deviation is 1.5. So, although you expect to get a slight profit, you shouldn't be very confident in this expectation if we are only playing the game once (if we play many times, then on average you will get 50 cents per game). Meanwhile the standard deviation of me handing you \$0.50 is 0. In the weighted-coin game, the standard deviation is $\approx 1.4$.