In [1]:
from __future__ import print_function, division
import thinkbayes2
import thinkplot
import numpy as np
from scipy import stats
%matplotlib inline
Here's an update function that takes the prior probability of being sick, prior = $x$, and the likelihoods of the data, prob_given_sick = $P(fever|sick)$ and prob_given_not = $P(fever|not sick)$:
In [2]:
def update(prior, prob_given_sick, prob_given_not):
    suite = thinkbayes2.Suite()
    suite['sick'] = prior * prob_given_sick
    suite['not sick'] = (1-prior) * prob_given_not
    suite.Normalize()
    return suite['sick']
If we start with $x=0.1$ and update with the assumption that fever is more likely if you're sick, the posterior goes up to $x' = 0.25$.
In [3]:
prior = 0.1
prob_given_sick = 0.9
prob_given_not = 0.3
post = update(prior, prob_given_sick, prob_given_not)
post
Out[3]:
Now suppose we don't know $s =$ prob_given_sick $= P(fever|sick)$ and $t =$ prob_given_not $= P(fever|not sick)$, but we think they are uniformly distributed and independent.
In [4]:
dist_s = thinkbayes2.Beta(1, 1)
dist_t = thinkbayes2.Beta(1, 1)
dist_s.Mean(), dist_t.Mean()
Out[4]:
We can compute the distribution of $x'$ by drawing samples from the distributions of $s$ and $t$ and computing the posterior for each.
In [5]:
n = 1000
ss = dist_s.Sample(n)
ts = dist_t.Sample(n)
Just checking that the samples have the right distributions:
In [6]:
thinkplot.Cdf(thinkbayes2.Cdf(ss))
thinkplot.Cdf(thinkbayes2.Cdf(ts))
Out[6]:
Now computing the posteriors:
In [7]:
posts = [update(prior, s, t) for s, t in zip(ss, ts)]
Here's what the distribution of values for $x'$ looks like:
In [8]:
cdf = thinkbayes2.Cdf(posts)
thinkplot.Cdf(cdf)
Out[8]:
And here's the mean:
In [9]:
cdf.Mean()
Out[9]:
This result implies that if our prior probability for $x$ is 0.1, and then we learn that the patient has a fever, we should be uncertain about $x'$, and this distribution describes that uncertainty. It says that the fever probably has little predictive power, but might have quite a lot.
The mean of this distribution is a little higher than the prior, which suggests that our priors for $s$ and $t$ are not neutral with respect to updating $x$. It's surprising that the effect is not symmetric, because our beliefs about $s$ and $t$ are symmetric. But then again, we just computed an arithmetic mean on a set of probabilities, which is a bogus kind of thing to do. So maybe we deserve what we got.
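As a sanity check on that asymmetry, here is a minimal numpy-only sketch of the same experiment (no thinkbayes2; the seed and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
n = 100_000
prior = 0.1
s = rng.random(n)  # s ~ Uniform(0, 1)
t = rng.random(n)  # t ~ Uniform(0, 1), independent of s
posts = prior * s / (prior * s + (1 - prior) * t)
print(posts.mean())  # comes out noticeably above the prior, 0.1
```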
Just for fun, what would we have to believe about $s$ and $t$ to make them neutral with respect to the posterior mean of $x$?
In [10]:
dist_s = thinkbayes2.Beta(1, 1)
dist_t = thinkbayes2.Beta(1.75, 1)
n = 10000
ss = dist_s.Sample(n)
ts = dist_t.Sample(n)
thinkplot.Cdf(thinkbayes2.Cdf(ss))
thinkplot.Cdf(thinkbayes2.Cdf(ts))
Out[10]:
In [11]:
posts = [update(prior, s, t) for s, t in zip(ss, ts)]
np.array(posts).mean()
Out[11]:
Now here's a version that simulates worlds where $x$ is known and $s$ and $t$ are drawn from uniform distributions. For each $s$-$t$ pair, we generate one patient with a fever and compute the probability that they are sick.
In [12]:
def prob_sick(x, s, t):
    return x * s / (x * s + (1-x) * t)
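This is just Bayes's theorem in closed form. As a quick check, it reproduces the posterior from the update() example above (prior 0.1, likelihoods 0.9 and 0.3); the function is repeated here so the sketch stands alone:

```python
def prob_sick(x, s, t):
    # Bayes's theorem: P(sick|fever) = x*s / (x*s + (1-x)*t)
    return x * s / (x * s + (1 - x) * t)

print(prob_sick(0.1, 0.9, 0.3))  # 0.25, the same posterior as before
```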
In [13]:
dist_s = thinkbayes2.Beta(1, 1)
dist_t = thinkbayes2.Beta(1, 1)
n = 10000
ss = dist_s.Sample(n)
ts = dist_t.Sample(n)
In [14]:
x = 0.1
probs = [prob_sick(x, s, t) for s, t in zip(ss, ts)]
In [15]:
np.array(probs).mean()
Out[15]:
In [16]:
cohort = np.random.random(len(probs)) < probs
In [17]:
cohort.mean()
Out[17]:
April 6, 2016
Suppose $s = 0.5$ and $t$ is either 0.2 or 0.8, with equal probability. If we take the average value of $t$ and compute $p = P(sick|fever)$, we would consider $p$ to be the known quantity 0.1.
In [26]:
s = 0.5
t = 0.5
x = 0.1
p = x * s / (x * s + (1-x) * t)
p
Out[26]:
If we propagate the uncertainty about $t$ through the calculation, we consider p to be either p1 or p2, but we don't know which.
In [36]:
t = 0.8
p1 = x * s / (x * s + (1-x) * t)
p1, 5/77
Out[36]:
In [35]:
t = 0.2
p2 = x * s / (x * s + (1-x) * t)
p2, 5/23
Out[35]:
If we were asked to make a prediction about a single patient, we would average the two possible values of p.
In [29]:
(p1 + p2) / 2
Out[29]:
In [34]:
def logodds(p):
    return np.log(p / (1-p))

logodds(p1) - logodds(0.1), logodds(p2) - logodds(0.1)
Out[34]:
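The log-odds differences above have a simple closed form: updating on fever multiplies the odds by the likelihood ratio $s/t$, so the log odds shift by exactly $\log(s/t)$. A small sketch to verify, with logodds repeated so it stands alone:

```python
import numpy as np

def logodds(p):
    return np.log(p / (1 - p))

x, s = 0.1, 0.5
for t in [0.8, 0.2]:
    p = x * s / (x * s + (1 - x) * t)
    # the shift in log odds equals the log likelihood ratio
    print(logodds(p) - logodds(x), np.log(s / t))
```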
In [37]:
p_mix = (p1 + p2) / 2
logodds(p_mix) - logodds(0.1)
Out[37]:
So let's simulate a series of patients by drawing a random value of $t$, computing p, and then tossing a coin with probability p.
In [30]:
import random

def simulate_patient():
    s = 0.5
    t = random.choice([0.2, 0.8])
    x = 0.1
    p = x * s / (x * s + (1-x) * t)
    return random.random() < p

simulate_patient()
Out[30]:
In this simulation, the fraction of patients with fever who turn out to be sick is close to 0.141.
In [31]:
patients = [simulate_patient() for _ in range(10000)]
sum(patients) / len(patients)
Out[31]:
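That 0.141 is just the unweighted average of the two possible posteriors, 5/77 and 5/23:

```python
p1, p2 = 5 / 77, 5 / 23
print((p1 + p2) / 2)  # close to 0.141
```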
In [153]:
x = 0.1
s = 0.5
t1 = 0.2
t2 = 0.8
In [154]:
import pandas as pd
d1 = dict(feverbad=0.5, fevergood=0.5)
d2 = dict(sick=x, notsick=1-x)
d3 = dict(fever='t', notfever='1-t')
iterables = [d1.keys(), d2.keys(), d3.keys()]
index = pd.MultiIndex.from_product(iterables, names=['first', 'second', 'third'])
df = pd.DataFrame(np.zeros(8), index=index, columns=['prob'])
In [155]:
t_map = dict(fevergood=t2, feverbad=t1)
for label1, p1 in d1.items():
t = t_map[label1]
for label2, p2 in d2.items():
for label3, p3 in d3.items():
if label2 == 'sick':
p = p1 * p2 * 0.5
else:
p = p1 * p2 * eval(p3)
df.prob[label1, label2, label3] = p
df
Out[155]:
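For reference, the same joint distribution can be written out by hand, without eval; this sketch assumes the same setup (x=0.1, s=0.5, t1=0.2, t2=0.8, 50-50 worlds):

```python
x, s, t1, t2 = 0.1, 0.5, 0.2, 0.8

joint = {}
for world, t in [('feverbad', t1), ('fevergood', t2)]:
    p_world = 0.5  # 50-50 chance of each kind of world
    joint[world, 'sick', 'fever'] = p_world * x * s
    joint[world, 'sick', 'notfever'] = p_world * x * (1 - s)
    joint[world, 'notsick', 'fever'] = p_world * (1 - x) * t
    joint[world, 'notsick', 'notfever'] = p_world * (1 - x) * (1 - t)

print(sum(joint.values()))  # the eight cells sum to 1
```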
If there are two kinds of people, some more fever-prone than others, and we don't know which kind of patient we're dealing with, P(sick | fever) is:
In [157]:
df.prob[:, 'sick', 'fever'].sum() / df.prob[:, :, 'fever'].sum()
Out[157]:
If we know fevergood, P(sick|fever) is
In [158]:
p_sick_fevergood = df.prob['fevergood', 'sick', 'fever'].sum() / df.prob['fevergood', :, 'fever'].sum()
p_sick_fevergood
Out[158]:
If we know feverbad, P(sick|fever) is
In [149]:
p_sick_feverbad = df.prob['feverbad', 'sick', 'fever'].sum() / df.prob['feverbad', :, 'fever'].sum()
p_sick_feverbad
Out[149]:
If we think there's a 50-50 chance of feverbad, the simple average of the two is:
In [159]:
(p_sick_fevergood + p_sick_feverbad) / 2
Out[159]:
If fevergood, here's the fraction of all patients with fever:
In [143]:
p_fever_fevergood = df.prob['fevergood', :, 'fever'].sum()
p_fever_fevergood
Out[143]:
If feverbad, here's the fraction of all patients with fever:
In [144]:
p_fever_feverbad = df.prob['feverbad', :, 'fever'].sum()
p_fever_feverbad
Out[144]:
So if we started out thinking there's a 50-50 chance of feverbad, we should now think feverbad is less likely:
In [146]:
p_feverbad_fever = p_fever_feverbad / (p_fever_feverbad + p_fever_fevergood)
p_feverbad_fever
Out[146]:
And if we compute the weighted sum of the two possible worlds:
In [150]:
p_feverbad_fever * p_sick_feverbad + (1-p_feverbad_fever) * p_sick_fevergood
Out[150]:
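As a check on this weighted average, here's a forward-simulation sketch (numpy only, arbitrary seed): generate a population, give each patient a world and a sickness status, condition on fever, and look at the fraction who turn out to be sick.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed
n = 200_000
x, s, t1, t2 = 0.1, 0.5, 0.2, 0.8

sick = rng.random(n) < x
t = rng.choice([t1, t2], size=n)  # each patient's world, 50-50
p_fever = np.where(sick, s, t)    # P(fever) given status and world
fever = rng.random(n) < p_fever

print(sick[fever].mean())  # close to the weighted sum above
```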