In this notebook we work through an example of the base rate fallacy using Bayes' theorem.
Assume that we have two random variables, $HasDisease$ and $FailsTest$. $HasDisease=y$ indicates that a person has the disease, while $HasDisease=n$ indicates that the person is disease free. In addition, we have a test that attempts to detect the disease. $FailsTest=y$ indicates that the test says a person has the disease, while $FailsTest=n$ indicates that the test says the person does not have the disease.
In this notebook you can play around with the probabilities of interest and see how likely it is that you actually have the disease, given that you fail the test.
Suppose we know the following probabilities:
\begin{align} Pr(FailsTest=y | HasDisease=y) &= FailAndHasDisease \\ Pr(FailsTest=n | HasDisease=y) &= NotFailAndHasDisease \\ Pr(FailsTest=y | HasDisease=n) &= FailAndNotHasDisease \\ Pr(FailsTest=n | HasDisease=n) &= NotFailAndNotHasDisease \end{align}
We also know the prior probability of the disease in the population,
$$ Pr(HasDisease=y). $$
The point of the base rate fallacy is that the four conditional probabilities describing the test are not enough on their own; you also need this prior to compute what you are interested in, namely the probability that you have the disease given that you fail the test,
$$ Pr(HasDisease=y | FailsTest=y). $$
Without $Pr(HasDisease=y)$ you cannot compute $Pr(HasDisease=y | FailsTest=y)$. Ignoring this prior, or base rate, is the fallacy.
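To make the claim concrete, here is a minimal sketch that applies the same test (it always flags the disease and wrongly flags 1% of healthy people, as in the default values below) to two hypothetical populations that differ only in prevalence. The helper name `posterior` and the 1/10 prevalence are illustrative assumptions; the computation is the Bayes' theorem formula derived later in this notebook.

```python
def posterior(p_fail_given_d, p_fail_given_not_d, prior):
    """Pr(HasDisease=y | FailsTest=y) by Bayes' theorem (hypothetical helper)."""
    # Denominator Pr(FailsTest=y) from the law of total probability.
    evidence = p_fail_given_d * prior + p_fail_given_not_d * (1 - prior)
    return p_fail_given_d * prior / evidence

# Identical test, two assumed prevalences.
rare = posterior(1.0, 0.01, 1 / 1000)
common = posterior(1.0, 0.01, 1 / 10)
print(f"prevalence 1/1000: {rare:.3f}")    # about 0.091
print(f"prevalence 1/10:   {common:.3f}")  # about 0.917
```

The test's error rates are unchanged between the two lines; only the base rate moves, and the answer moves from roughly 9% to roughly 92%.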
You can play around with the numbers in the next cell to see how things work out.
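```python
# Conditional probabilities describing the test
FailAndHasDisease = 1.0          # Pr(FailsTest=y | HasDisease=y)
NotFailAndHasDisease = 0.0       # Pr(FailsTest=n | HasDisease=y)
FailAndNotHasDisease = 0.01      # Pr(FailsTest=y | HasDisease=n)
NotFailAndNotHasDisease = 0.99   # Pr(FailsTest=n | HasDisease=n)
# Prior probability (base rate) of the disease in the population
HasDisease = 1./1000             # Pr(HasDisease=y)
```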
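Because each pair of conditional probabilities must sum to 1, only three of the five numbers are actually free. When editing the values, a small check like the following can catch an inconsistent set. The function `check_conditionals` is a hypothetical helper, not part of the original notebook.

```python
import math

def check_conditionals(fail_and_has, not_fail_and_has,
                       fail_and_not_has, not_fail_and_not_has, prior):
    """Raise ValueError if the five inputs are not consistent probabilities."""
    values = (fail_and_has, not_fail_and_has,
              fail_and_not_has, not_fail_and_not_has, prior)
    # Every input is a probability, so it must lie in [0, 1].
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("every probability must lie in [0, 1]")
    # Given HasDisease=y, the test either flags the person or it does not.
    if not math.isclose(fail_and_has + not_fail_and_has, 1.0):
        raise ValueError("Pr(FailsTest=y|HasDisease=y) + Pr(FailsTest=n|HasDisease=y) != 1")
    # The same holds given HasDisease=n.
    if not math.isclose(fail_and_not_has + not_fail_and_not_has, 1.0):
        raise ValueError("Pr(FailsTest=y|HasDisease=n) + Pr(FailsTest=n|HasDisease=n) != 1")

# The default values pass without raising.
check_conditionals(1.0, 0.0, 0.01, 0.99, 1 / 1000)
```

Inside the notebook you would pass the cell's variables, e.g. `check_conditionals(FailAndHasDisease, NotFailAndHasDisease, FailAndNotHasDisease, NotFailAndNotHasDisease, HasDisease)`, after each edit.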
Bayes' theorem says that
$$ Pr(HasDisease=y | FailsTest=y) = \frac{Pr(FailsTest=y | HasDisease=y) Pr(HasDisease=y)}{Pr(FailsTest=y)} $$
The probabilities listed above give us the two terms in the numerator; we get the denominator from the law of total probability:
\begin{align} Pr(FailsTest=y) & = Pr(FailsTest=y | HasDisease=y) Pr(HasDisease=y) + Pr(FailsTest=y | HasDisease=n) Pr(HasDisease=n) \\ & = Pr(FailsTest=y | HasDisease=y) Pr(HasDisease=y) + Pr(FailsTest=y | HasDisease=n) (1- Pr(HasDisease=y)) \end{align}
So, the whole thing is
$$ Pr(HasDisease=y | FailsTest=y) = \frac{Pr(FailsTest=y | HasDisease=y) Pr(HasDisease=y)}{(Pr(FailsTest=y | HasDisease=y) Pr(HasDisease=y) + Pr(FailsTest=y | HasDisease=n) (1- Pr(HasDisease=y)))} $$
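As a worked example, plugging in the default values from the cell above ($Pr(FailsTest=y | HasDisease=y) = 1.0$, $Pr(FailsTest=y | HasDisease=n) = 0.01$, $Pr(HasDisease=y) = 0.001$):

\begin{align} Pr(FailsTest=y) &= 1.0 \times 0.001 + 0.01 \times (1 - 0.001) = 0.001 + 0.00999 = 0.01099 \\ Pr(HasDisease=y | FailsTest=y) &= \frac{1.0 \times 0.001}{0.01099} \approx 0.091 \end{align}

The false positive term $0.00999$ dominates the denominator, so roughly 91% of positive results come from healthy people, simply because there are so many more of them.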
```python
# Posterior Pr(HasDisease=y | FailsTest=y) from Bayes' theorem
FailAndHasDisease*HasDisease/(FailAndHasDisease*HasDisease + FailAndNotHasDisease*(1-HasDisease))
```
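Another way to check the number is to count people instead of multiplying probabilities. This sketch assumes a hypothetical population of 100,000 and the default values above; it is the same calculation expressed as natural frequencies.

```python
# Natural-frequency check with a hypothetical population of 100,000 people.
population = 100_000
prior = 1 / 1000              # Pr(HasDisease=y)
p_fail_given_d = 1.0          # Pr(FailsTest=y | HasDisease=y)
p_fail_given_not_d = 0.01     # Pr(FailsTest=y | HasDisease=n)

sick = population * prior                         # about 100 people
healthy = population - sick                       # about 99,900 people
sick_and_fail = sick * p_fail_given_d             # about 100 true positives
healthy_and_fail = healthy * p_fail_given_not_d   # about 999 false positives

# Fraction of everyone who fails the test that is actually sick.
posterior = sick_and_fail / (sick_and_fail + healthy_and_fail)
print(f"{sick_and_fail:.0f} of {sick_and_fail + healthy_and_fail:.0f} "
      f"positives are sick: {posterior:.3f}")
```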
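With the default values the result is about 0.091: even though the test catches every case of the disease and wrongly flags only 1% of healthy people, failing it means you have only about a 9% chance of actually having the disease. This matches the result we worked out by hand in class. Play around with the probabilities and see what you discover.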
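One experiment to try: hold the prior at 1/1000 and the detection rate at 1.0, and vary only $Pr(FailsTest=y | HasDisease=n)$. This is a sketch with an illustrative, hypothetical set of false positive rates; `posterior` is the same hypothetical helper as before, repeated here so the cell runs on its own.

```python
def posterior(p_fail_given_d, p_fail_given_not_d, prior):
    """Pr(HasDisease=y | FailsTest=y) by Bayes' theorem (hypothetical helper)."""
    evidence = p_fail_given_d * prior + p_fail_given_not_d * (1 - prior)
    return p_fail_given_d * prior / evidence

# Illustrative false positive rates, from poor to very good.
rates = [0.1, 0.01, 0.001, 0.0001]
results = {fpr: posterior(1.0, fpr, 1 / 1000) for fpr in rates}

for fpr in rates:
    print(f"Pr(FailsTest=y | HasDisease=n) = {fpr:<7} -> {results[fpr]:.3f}")
```

The posterior climbs only once the false positive rate becomes comparable to the prevalence: at a rate of 0.001, equal to the base rate, a positive result is roughly a coin flip.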