Bayesian interpretation of medical tests

This notebooks explores several problems related to interpreting the results of medical tests.

MIT License: http://opensource.org/licenses/MIT



In [1]:

    
from __future__ import print_function, division

from thinkbayes2 import Pmf, Suite

from fractions import Fraction

Medical tests

Suppose we test a patient to see if they have a disease, and the test comes back positive. What is the probability that the patient is actually sick (that is, has the disease)?

To answer this question, we need to know:

The prevalence of the disease in the population the patient is from. Let's assume the patient is identified as a member of a population where the known prevalence is p.
The sensitivity of the test, s, which is the probability of a positive test if the patient is sick.
The false positive rate of the test, t, which is the probability of a positive test if the patient is not sick.

Given these parameters, we can compute the probability that the patient is sick, given a positive test.

Test class

To do that, I'll define a Test class that extends Suite, so it inherits Update and provides Likelihood.

The instance variables of Test are:

p, s, and t: Copies of the parameters.
d: a dictionary that maps from hypotheses to their probabilities. The hypotheses are the strings sick and notsick.
likelihood: a dictionary that encodes the likelihood of the possible data values pos and neg under the hypotheses.



In [2]:

    
class Test(Suite):
    """Represents beliefs about a patient based on a medical test."""
    
    def __init__(self, p, s, t, label='Test'):
        # initialize the prior probabilities
        d = dict(sick=p, notsick=1-p)
        super(Test, self).__init__(d, label)
        
        # store the parameters
        self.p = p
        self.s = s
        self.t = t
        
        # make a nested dictionary to compute likelihoods
        self.likelihood = dict(pos=dict(sick=s, notsick=t),
                               neg=dict(sick=1-s, notsick=1-t))
        
    def Likelihood(self, data, hypo):
        """
        data: 'pos' or 'neg'
        hypo: 'sick' or 'notsick'
        """
        return self.likelihood[data][hypo]

Now we can create a Test object with parameters chosen for demonstration purposes (most medical tests are better than this!):



In [3]:

    
p = Fraction(1, 10)     # prevalence
s = Fraction(9, 10)     # sensitivity
t = Fraction(3, 10)     # false positive rate

test = Test(p, s, t)
test.Print()









    



notsick 9/10
sick 1/10

If you are curious, here's the nested dictionary that computes the likelihoods:



In [4]:

    
test.likelihood









    Out[4]:





{'neg': {'notsick': Fraction(7, 10), 'sick': Fraction(1, 10)},
 'pos': {'notsick': Fraction(3, 10), 'sick': Fraction(9, 10)}}

And here's how we update the Test object with a positive outcome:



In [5]:

    
test.Update('pos')
test.Print()









    



notsick 3/4
sick 1/4

The positive test provides evidence that the patient is sick, increasing the probability from 0.1 to 0.25.

Uncertainty about `t`

So far, this is basic Bayesian inference. Now let's add a wrinkle. Suppose that we don't know the value of t with certainty, but we have reason to believe that t is either 0.2 or 0.4 with equal probability.

Again, we would like to know the probability that a patient who tests positive actually has the disease. As we did with the Red Die problem, we will consider several scenarios:

Scenario A: The patients are drawn at random from the relevant population, and the reason we are uncertain about t is that either (1) there are two versions of the test, with different false positive rates, and we don't know which test was used, or (2) there are two groups of people, the false positive rate is different for different groups, and we don't know which group the patient is in.

Scenario B: As in Scenario A, the patients are drawn at random from the relevant population, but the reason we are uncertain about t is that previous studies of the test have been contradictory. That is, there is only one version of the test, and we have reason to believe that t is the same for all groups, but we are not sure what the correct value of t is.

Scenario C: As in Scenario A, there are two versions of the test or two groups of people. But now the patients are being filtered so we only see the patients who tested positive and we don't know how many patients tested negative. For example, suppose you are a specialist and patients are only referred to you after they test positive.

Scenario D: As in Scenario B, we have reason to think that t is the same for all patients, and as in Scenario C, we only see patients who test positive and don't know how many tested negative.

Scenario A

We can represent this scenario with a hierarchical model, where the levels of the hierarchy are:

At the top level, the possible values of t and their probabilities.
At the bottom level, the probability that the patient is sick or not, conditioned on t.

To represent the hierarchy, I'll define a MetaTest, which is a Suite that contains Test objects with different values of t as hypotheses.



In [6]:

    
class MetaTest(Suite):
    """Represents a set of tests with different values of `t`."""
    
    def Likelihood(self, data, hypo):
        """
        data: 'pos' or 'neg'
        hypo: Test object
        """
        # the return value from `Update` is the total probability of the
        # data for a hypothetical value of `t`
        return hypo.Update(data)

To update a MetaTest, we update each of the hypothetical Test objects. The return value from Update is the normalizing constant, which is the total probability of the data under the hypothesis.

We use the normalizing constants from the bottom level of the hierarchy as the likelihoods at the top level.

Here's how we create the MetaTest for the scenario we described:



In [7]:

    
q = Fraction(1, 2)
t1 = Fraction(2, 10)
t2 = Fraction(4, 10)

test1 = Test(p, s, t1, 'Test(t=0.2)')
test2 = Test(p, s, t2, 'Test(t=0.4)')

metatest = MetaTest({test1:q, test2:1-q})
metatest.Print()









    



Test(t=0.2) 1/2
Test(t=0.4) 1/2

At the top level, there are two tests, with different values of t. Initially, they are equally likely.

When we update the MetaTest, it updates the embedded Test objects and then the MetaTest itself.



In [8]:

    
metatest.Update('pos')









    Out[8]:





Fraction(9, 25)

Here are the results.



In [9]:

    
metatest.Print()









    



Test(t=0.4) 5/8
Test(t=0.2) 3/8

Because a positive test is more likely if t=0.4, the positive test is evidence in favor of the hypothesis that t=0.4.

This MetaTest object represents what we should believe about t after seeing the test, as well as what we should believe about the probability that the patient is sick.

Marginal distributions

To compute the probability that the patient is sick, we have to compute the marginal probabilities of sick and notsick, averaging over the possible values of t. The following function computes this distribution:



In [10]:

    
def MakeMixture(metapmf, label='mix'):
    """Make a mixture distribution.

    Args:
      metapmf: Pmf that maps from Pmfs to probs.
      label: string label for the new Pmf.

    Returns: Pmf object.
    """
    mix = Pmf(label=label)
    for pmf, p1 in metapmf.Items():
        for x, p2 in pmf.Items():
            mix.Incr(x, p1 * p2)
    return mix

Here's the posterior predictive distribution:



In [11]:

    
predictive = MakeMixture(metatest)
predictive.Print()









    



notsick 3/4
sick 1/4

After seeing the test, the probability that the patient is sick is 0.25, which is the same result we got with t=0.3.

Two patients

Now suppose you test two patients and they both test positive. What is the probability that they are both sick?

To answer that, I define a few more functions to work with Metatests:



In [12]:

    
def MakeMetaTest(p, s, pmf_t):
    """Makes a MetaTest object with the given parameters.
        
    p: prevalence
    s: sensitivity
    pmf_t: Pmf of possible values for `t`
    """
    tests = {}
    for t, q in pmf_t.Items():
        label = 'Test(t=%s)' % str(t)
        tests[Test(p, s, t, label)] = q
    return MetaTest(tests)

def Marginal(metatest):
    """Extracts the marginal distribution of t.
    """
    marginal = Pmf()
    for test, prob in metatest.Items():
        marginal[test.t] = prob
    return marginal

def Conditional(metatest, t):
    """Extracts the distribution of sick/notsick conditioned on t."""
    for test, prob in metatest.Items():
        if test.t == t:
            return test

MakeMetaTest makes a MetaTest object starting with a given PMF of t.

Marginal extracts the PMF of t from a MetaTest.

Conditional takes a specified value for t and returns the PMF of sick and notsick conditioned on t.

I'll test these functions using the same parameters from above:



In [13]:

    
pmf_t = Pmf({t1:q, t2:1-q})
metatest = MakeMetaTest(p, s, pmf_t)
metatest.Print()









    



Test(t=1/5) 1/2
Test(t=2/5) 1/2

Here are the results



In [14]:

    
metatest = MakeMetaTest(p, s, pmf_t)
metatest.Update('pos')
metatest.Print()









    



Test(t=2/5) 5/8
Test(t=1/5) 3/8

Same as before. Now we can extract the posterior distribution of t.



In [15]:

    
Marginal(metatest).Print()

Having seen one positive test, we are a little more inclined to believe that t=0.4; that is, that the false positive rate for this patient/test is high.

And we can extract the conditional distributions for the patient:



In [16]:

    
cond1 = Conditional(metatest, t1)
cond1.Print()









    



notsick 2/3
sick 1/3



In [17]:

    
cond2 = Conditional(metatest, t2)
cond2.Print()









    



notsick 4/5
sick 1/5

Finally, we can make the posterior marginal distribution of sick/notsick, which is a weighted mixture of the conditional distributions:



In [18]:

    
MakeMixture(metatest).Print()









    



notsick 3/4
sick 1/4

At this point we have a MetaTest that contains our updated information about the test (the distribution of t) and about the patient that tested positive.

Now, to compute the probability that both patients are sick, we have to know the distribution of t for both patients. And that depends on details of the scenario.

In Scenario A, the reason we are uncertain about t is either (1) there are two versions of the test, with different false positive rates, and we don't know which test was used, or (2) there are two groups of people, the false positive rate is different for different groups, and we don't know which group the patient is in.

So the value of t for each patient is an independent choice from pmf_t; that is, if we learn something about t for one patient, that tells us nothing about t for other patients.

So if we consider two patients who have tested positive, the MetaTest we just computed represents our belief about each of the two patients independently.

To compute the probability that both patients are sick, we can convolve the two distributions.



In [19]:

    
convolution = metatest + metatest
convolution.Print()









    



Pmf({'notsicknotsick': Fraction(8, 15), 'sicksick': Fraction(1, 15), 'sicknotsick': Fraction(2, 15), 'notsicksick': Fraction(4, 15)}) 15/64
Pmf({'notsicknotsick': Fraction(16, 25), 'sicksick': Fraction(1, 25), 'sicknotsick': Fraction(4, 25), 'notsicksick': Fraction(4, 25)}) 25/64
Pmf({'notsicknotsick': Fraction(8, 15), 'sicksick': Fraction(1, 15), 'sicknotsick': Fraction(4, 15), 'notsicksick': Fraction(2, 15)}) 15/64
Pmf({'notsicknotsick': Fraction(4, 9), 'sicksick': Fraction(1, 9), 'sicknotsick': Fraction(2, 9), 'notsicksick': Fraction(2, 9)}) 9/64

Then we can compute the posterior marginal distribution of sick/notsick for the two patients:



In [20]:

    
marginal = MakeMixture(metatest+metatest)
marginal.Print()









    



notsicknotsick 9/16
notsicksick 3/16
sicknotsick 3/16
sicksick 1/16

So in Scenario A the probability that both patients are sick is 1/16.

As an aside, we could have computed the marginal distributions first and then convolved them, which is computationally more efficient:



In [21]:

    
marginal = MakeMixture(metatest) + MakeMixture(metatest)
marginal.Print()









    



notsicknotsick 9/16
notsicksick 3/16
sicknotsick 3/16
sicksick 1/16

We can confirm that this result is correct by simulation. Here's a generator that generates random pairs of patients:



In [22]:

    
from random import random

def flip(p):
    return random() < p

def generate_pair_A(p, s, pmf_t):
    while True:
        sick1, sick2 = flip(p), flip(p)
        
        t = pmf_t.Random()
        test1 = flip(s) if sick1 else flip(t)

        t = pmf_t.Random()
        test2 = flip(s) if sick2 else flip(t)

        yield test1, test2, sick1, sick2

And here's a function that runs the simulation for a given number of iterations:



In [23]:

    
def run_simulation(generator, iters=100000):
    pmf_t = Pmf([0.2, 0.4])
    pair_iterator = generator(0.1, 0.9, pmf_t)

    outcomes = Pmf()
    for i in range(iters):
        test1, test2, sick1, sick2 = next(pair_iterator)
        if test1 and test2:
            outcomes[sick1, sick2] += 1

    outcomes.Normalize()
    return outcomes



In [24]:

    
outcomes = run_simulation(generate_pair_A)
outcomes.Print()









    



(False, False) 0.5635400907715582
(False, True) 0.18267776096822994
(True, False) 0.19130105900151284
(True, True) 0.062481089258698934

As we increase iters, the probablity of (True, True) converges on 1/16, which is what we got from the analysis.

Good so far!

Scenario B

In Scenario B, we have reason to believe the t is the same for all patients, but we are not sure what it is. So each time we see a positive test, we get some information about t for all patients.

The first time we see positive test we do the same update as in Scenario A:



In [25]:

    
metatest1 = MakeMetaTest(p, s, pmf_t)
metatest1.Update('pos')
metatest1.Print()









    



Test(t=2/5) 5/8
Test(t=1/5) 3/8

And the marginal distribution of sick/notsick is the same:



In [26]:

    
marginal = MakeMixture(metatest1)
marginal.Print()









    



notsick 3/4
sick 1/4

Now suppose the second patient arrives. We need a new MetaTest that contains the updated information about the test, but no information about the patient other than the prior probability of being sick, p:



In [27]:

    
metatest2 = MakeMetaTest(p, s, Marginal(metatest1))
metatest2.Print()









    



Test(t=1/5) 3/8
Test(t=2/5) 5/8

Now we can update this MetaTest with the result from the second test:



In [28]:

    
metatest2.Update('pos')
metatest2.Print()









    



Test(t=2/5) 25/34
Test(t=1/5) 9/34

This distribution contains updated information about the test, based on two positive outcomes, and updated information about a patient who has tested positive (once).

After seeing two patients with positive tests, the probability that t=0.4 has increased to 25/34, around 74%.

For either patient, the probability of being sick is given by the marginal distribution from metatest2:



In [29]:

    
predictive = MakeMixture(metatest2)
predictive.Print()









    



notsick 13/17
sick 4/17

After two tests, the probability that the patient is sick is slightly lower than after one (4/17 is about 23.5%, compared to 25%). That's because the second positive test increases our belief that the false positive rate is high (t=0.4), which decreases our belief that either patient is sick.

Now, to compute the probability that both are sick, we can't just convolve the posterior marginal distribution with itself, as we did in Scenario A, because the selection of t is not independent for the two patients. Instead, we have to make a weighted mixture of conditional distributions.

If we know t=t1, we can compute the joint distribution for the two patients:



In [30]:

    
cond_t1 = Conditional(metatest2, t1)
conjunction_t1 = cond_t1 + cond_t1
conjunction_t1.Print()









    



notsicknotsick 4/9
notsicksick 2/9
sicknotsick 2/9
sicksick 1/9

If we know that t=t1, the probability of sicksick is 0.111. And for t=t2:



In [31]:

    
cond_t2 = Conditional(metatest2, t2)
conjunction_t2 = cond_t2 + cond_t2
conjunction_t2.Print()









    



notsicknotsick 16/25
notsicksick 4/25
sicknotsick 4/25
sicksick 1/25

If we know that t=t2, the probability of sicksick is 0.04.

The overall probability of sicksick is the weighted average of these probabilities:



In [32]:

    
posterior_t = Marginal(metatest2)
posterior_t[t1] * conjunction_t1['sicksick'] + posterior_t[t2] * conjunction_t2['sicksick']









    Out[32]:





Fraction(1, 17)

1/17 is about 0.0588, slightly smaller than in Scenario A (1/16, which is about 0.0667).

To compute the probabilities for all four outcomes, I'll make a Metapmf that contains the two conditional distributions.



In [33]:

    
metapmf = Pmf()
for t, prob in Marginal(metatest2).Items():
    cond = Conditional(metatest2, t)
    conjunction = cond + cond
    metapmf[conjunction] = prob
    
metapmf.Print()









    



Pmf({'notsicknotsick': Fraction(16, 25), 'sicksick': Fraction(1, 25), 'sicknotsick': Fraction(4, 25), 'notsicksick': Fraction(4, 25)}) 25/34
Pmf({'notsicknotsick': Fraction(4, 9), 'sicksick': Fraction(1, 9), 'sicknotsick': Fraction(2, 9), 'notsicksick': Fraction(2, 9)}) 9/34

And finally we can use MakeMixture to compute the weighted averages of the posterior probabilities:



In [34]:

    
predictive = MakeMixture(metapmf)
predictive.Print()









    



notsicknotsick 10/17
notsicksick 3/17
sicknotsick 3/17
sicksick 1/17

To confirm that this result is correct, I'll use the simuation again with a different generator:



In [35]:

    
def generate_pair_B(p, s, pmf_t):
    while True:
        sick1, sick2 = flip(p), flip(p)
        
        t = pmf_t.Random()
        test1 = flip(s) if sick1 else flip(t)

        #  Here's the difference
        #  t = pmf_t.Random()
        test2 = flip(s) if sick2 else flip(t)

        yield test1, test2, sick1, sick2

The difference between Scenario A and Scenario B is the line I commented out. In Scenario B, we generate t once and it applies to both patients.



In [36]:

    
outcomes = run_simulation(generate_pair_B)
outcomes.Print()









    



(False, False) 0.5922922348213004
(False, True) 0.17568537390555478
(True, False) 0.17145112674034738
(True, True) 0.06057126453279748

As iters increases, the results from the simulation converge on 1/17.

Summary so far

In summary:

                    P(sick|pos)             P(sicksick|pospos)
Scenario A          1/4 = 25%               1/16 = 6.25%
Scenario B          1/4 = 25%               1/17 ~= 5.88%

If we are only interested in one patient at a time, Scenarios A and B are the same. But for collections of patients, they yield different probabilities.

A real scenario might combine elements of A and B; that is, the false positive rate might be different for different people, and we might have some uncertainty about what it is. In that case, the most accurate probability for two patients might be anywhere between 1/16 and 1/17.

Scenario C

Scenario C is similar to Scenario A: we believe that the false positive rate t might be different for different people, or for different versions of the test. The difference is that in Scenario A we see all patients, sick or not, positive test or not.

In Scenario C, we only see patients after they have tested positive, and we don't know how many tested negative. For example, if you are a specialist and patients are referred to you only if they test positive, Scenario C might be a good model of your situation.

Before I analyze this scenario, I'll start with a simulation. As a reminder, here's a generator that generates pairs of patients in Scenario A:



In [37]:

    
def generate_pair_A(p, s, pmf_t):
    while True:
        sick1, sick2 = flip(p), flip(p)
        
        t = pmf_t.Random()
        test1 = flip(s) if sick1 else flip(t)

        t = pmf_t.Random()
        test2 = flip(s) if sick2 else flip(t)

        yield test1, test2, sick1, sick2

And here's the simulator that uses the generator to estimate the probability that two patients who test positive are both sick.



In [38]:

    
def run_simulation(generator, iters=100000):
    pmf_t = Pmf([0.2, 0.4])
    pair_iterator = generator(0.1, 0.9, pmf_t)

    outcomes = Pmf()
    for i in range(iters):
        test1, test2, sick1, sick2 = next(pair_iterator)
        if test1 and test2:
            outcomes[sick1, sick2] += 1

    outcomes.Normalize()
    return outcomes

As we saw before, this probability converges on $1/16$.



In [39]:

    
outcomes = run_simulation(generate_pair_A)
outcomes.Print()









    



(False, False) 0.5723947125730094
(False, True) 0.18567476175837688
(True, False) 0.1796802951122041
(True, True) 0.062250230556409464

Now here's a generator that generates pairs of patients in Scenario C. The difference is that for each pair we check the outcome of the tests; if they are not both positive, we loop back and try again:



In [40]:

    
def generate_pair_C(p, s, pmf_t):
    while True:
        sick1, sick2 = flip(p), flip(p)
        
        t = pmf_t.Random()
        test1 = flip(s) if sick1 else flip(t)

        t = pmf_t.Random()
        test2 = flip(s) if sick2 else flip(t)

        # here is the difference
        if test1 and test2:
            yield test1, test2, sick1, sick2

When we run it, it seems like the probability is still 1/16:



In [41]:

    
outcomes = run_simulation(generate_pair_C)
outcomes.Print()









    



(False, False) 0.56137
(False, True) 0.18819000000000002
(True, False) 0.18706
(True, True) 0.06338

If you examine the code, you see that the conditional in generate_pair_C makes no difference because it is redundant with the conditional in run_simulation. In Scenarios A and C, we filter out pairs if they are not both positive; it doesn't matter whether the filtering happens in the generator or the simulator.

In fact, Scenarios A and C are identical. In both scenarios, when we see a patient with a positive test, we learn something about the patient (more likely to be sick) and something about the particular test applied to the patient (more likely to generate false positives).

This is similar to what we saw in the Red Die problem. In Scenario C, the reddish die is more likely to produce a red outcome, so a red outcome provides evidence that we rolled the reddish die.

However, that is not the case with Scenario D.

Scenario D

As a reminder, Scenario D is similar to B: we have reason to think that t is either 0.2 or 0.4 for everyone. The difference in Scenario D is that we only see patients if they test positive.

Here's a generator that generates single patients:



In [42]:

    
def generate_patient_D(p, s, pmf_t):
    while True:
        # choose t
        t = pmf_t.Random()
        
        # generate patients until positive test
        while True:
            sick = flip(p)
            test = flip(s) if sick else flip(t)
            if test:
                yield test, sick
                break

And here's a simulator that counts the fraction of positive tests that turn out to be sick:



In [43]:

    
def run_single_simulation(generator, iters=100000):
    pmf_t = Pmf([0.2, 0.4])
    iterator = generator(0.1, 0.9, pmf_t)

    outcomes = Pmf()
    for i in range(iters):
        test, sick = next(iterator)
        if test:
            outcomes[sick] += 1

    outcomes.Normalize()
    return outcomes

When we run the simulation, it doesn't look like it converges to 1/4 as it does in the other three scenarios.



In [44]:

    
outcomes = run_single_simulation(generate_patient_D)
outcomes.Print()









    



False 0.7319500000000001
True 0.26805

So how can we analyze this scenario?

The key is to realize that, as in Scenario D of the Red Dice problem, if we roll until we get red, we don't learn anything about the die we rolled, and in this case, if we generate patients until we get a positive test, we don't learn anything about t. The likelihood of the data (a positive test) is 1, regardless of t.

We can compute the probablity the patient is sick by creating a MetaTest and updating only the lower level (the Test objects) but not the upper level (the distribution of t).



In [45]:

    
metatest = MakeMetaTest(p, s, pmf_t)
for hypo in metatest:
    hypo.Update('pos')

After the update, the marginal distribution of t is unchanged:



In [46]:

    
Marginal(metatest).Print()

But the conditional probabilities have been updated:



In [47]:

    
Conditional(metatest, t1).Print()









    



notsick 2/3
sick 1/3



In [48]:

    
Conditional(metatest, t2).Print()









    



notsick 4/5
sick 1/5

We can use MakeMixture to compute the weighted average of the conditional distributions.



In [49]:

    
MakeMixture(metatest).Print()









    



notsick 11/15
sick 4/15

So in Scenario D, a patient who tests positive has a probability of 4/15 of being sick, which is about 26.7%, and consistent with the simulation.

That's a little higher than in the other three Scenarios, because we have less reason to think that t is high.

Scenario D, two patients

Now let's see what happens with two patients. Here's a generator that generates pairs of patients:



In [50]:

    
def generate_pair_D(p, s, pmf_t):
    while True:
        t = pmf_t.Random()
        while True:
            sick1, sick2 = flip(p), flip(p)
        
            test1 = flip(s) if sick1 else flip(t)
            test2 = flip(s) if sick2 else flip(t)

            if test1 and test2:
                yield test1, test2, sick1, sick2
                break

And here's what we get when we run the simulation:



In [51]:

    
outcomes = run_simulation(generate_pair_D, iters=1000000)
outcomes.Print()









    



(False, False) 0.541945
(False, True) 0.191442
(True, False) 0.191182
(True, True) 0.075431

It looks like the probability that both patients are sick is higher than 1/16.

We can compute the result exactly using the posterior distribution and the same method we used in Scenario B, computing the mixture of two conjunctions:



In [52]:

    
def MixConjunctions(metatest):
    metapmf = Pmf()
    for t, prob in Marginal(metatest).Items():
        cond = Conditional(metatest, t)
        conjunction = cond + cond
        metapmf[conjunction] = prob
    
    return MakeMixture(metapmf)



In [53]:

    
MixConjunctions(metatest).Print()









    



notsicknotsick 122/225
notsicksick 43/225
sicknotsick 43/225
sicksick 17/225

In Scenario D, the probability that both patients are sick is 17/225, or about 0.0755, which is consistent with the simulation and, again, a little higher than in the other scenarios.