Second Edition

Copyright 2020 Allen B. Downey

License: Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0)

```
In [2]:
```# If we're running on Colab, install empiricaldist
# https://pypi.org/project/empiricaldist/
import sys
IN_COLAB = 'google.colab' in sys.modules
if IN_COLAB:
!pip install empiricaldist

```
In [3]:
```# Get utils.py and create directories
import os
if not os.path.exists('utils.py'):
!wget https://github.com/AllenDowney/ThinkBayes2/raw/master/code/soln/utils.py
if not os.path.exists('figs'):
!mkdir figs
if not os.path.exists('tables'):
!mkdir tables

```
In [4]:
```import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from empiricaldist import Pmf
from utils import decorate, savefig, write_table

```
In [5]:
```t1 = [1,3,5]

```
In [6]:
```t2 = [2,4]

The most common outer operation is the outer product, which computes the product of every pair of values, one from each sequence.

For example, here is the outer product of `t1`

and `t2`

:

```
In [7]:
```a = np.multiply.outer(t1, t2)
a

We can understand this result more easily if we put it in a DataFrame:

```
In [8]:
```df = pd.DataFrame(a, index=t1, columns=t2)
write_table(df, 'table09-01', bold_rows=True)
df

The values from `t1`

appear along the rows of the result; the values from `t2`

appear along the columns.

Each element of the result is the product of an element from `t1`

and an element from `t2`

.

*sum* of an element from `t1`

and an element from `t2`

.

```
In [9]:
```a = np.add.outer(t1, t2)
a

```
In [10]:
```df = pd.DataFrame(a, index=t1, columns=t2)
write_table(df, 'table09-02', bold_rows=True)
df

These outer operations work with Python lists and tuples, and NumPy arrays, but not Pandas Series.

However, the following function works with Pandas Series and puts the result into a DataFrame.

```
In [11]:
```def outer_product(s1, s2):
"""Compute the outer product of two Series.
First Series goes down the rows;
second goes across the columns.
s1: Series
s2: Series
return: DataFrame
"""
a = np.multiply.outer(s1.to_numpy(), s2.to_numpy())
return pd.DataFrame(a, index=s1.index, columns=s2.index)

It might not be obvious yet why these operations are useful, but we'll see some examples soon.

With that, we are ready to take on a new Bayesian problem.

Suppose I choose two people from the population of adult males in the U.S.; I'll call them A and B. If we see that A taller than B, how tall is A?

To answer this question:

I'll use background information about the height of men in the U.S. to form a prior distribution of height,

I'll construct a joint distribution of height for A and B (and I'll explain what that is);

Then I'll update the prior with the information that A is taller, and

From the posterior joint distribution I'll extract the posterior distribution of height for A.

In the U.S. the average height of male adults in 178 cm and the standard deviation is 7.7 cm. The distribution is not exactly normal, because nothing in the real world is, but the normal distribution is a pretty good model of the actual distribution, so we can use it as a prior distribution for A and B.

Here's an array of equally-spaced values from roughly 3 standard deviations below the mean to 3 standard deviations above.

```
In [12]:
```mean = 178
std = 7.7
qs = np.arange(mean-24, mean+24, 0.5)

`norm`

that represents a normal distribution with a given mean and standard deviation, and provides `pdf`

, which evaluates the probability distribution function (PDF), which we will use as the prior probabilities.

```
In [13]:
```from scipy.stats import norm
ps = norm(mean, std).pdf(qs)

I'll store the `ps`

and `qs`

in a `Pmf`

that represents the prior distribution.

```
In [14]:
```prior = Pmf(ps, qs)
prior.normalize()

And here's what it looks like.

```
In [15]:
```prior.plot(color='gray')
decorate(xlabel='Height in cm',
ylabel='Probability',
title='Approximate distribution of height for men in U.S.')

`A`

and `B`

before we take into account the data that `A`

is taller.

The next step is to construct a distribution that represents the probability of every pair of heights, which is called a joint distribution.

The elements of the joint distribution are

$P(A_y~\mathrm{and}~B_x)$

which is the probability that `A`

is $y$ cm tall and `B`

is $x$ cm tall, for all values of $y$ and $x$.

At this point all we know about `A`

and `B`

is that they are male residents of the U.S., so their heights are independent; that is, knowing the height of `A`

provides no additional information about the height of `B`

.
In that case, we can compute the joint probabilities like this:

$P(A_y~\mathrm{and}~B_x) = P(A_y)~P(B_x)$

Each joint probability is the product of one element from the distribution for `A`

and one element from the distribution for `B`

.
So we can compute the joint distribution using `outer_product`

:

```
In [16]:
```joint = outer_product(prior, prior)
joint.shape

The result is a `DataFrame`

with possible heights of `A`

along the rows, heights of `B`

along the columns, and the joint probabilities as elements.

If the prior is normalized, the joint prior should also be normalized.

```
In [17]:
```joint.to_numpy().sum()

`DataFrame`

to a `NumPy`

array before calling `sum`

. Otherwise, `DataFrame.sum`

would compute the sums of the columns and return a `Series`

.

```
In [18]:
```joint.sum()

```
In [19]:
```def plot_joint(joint):
"""Plot a joint distribution.
joint: DataFrame representing a joint PMF
"""
plt.pcolormesh(joint.columns, joint.index, joint, cmap='Blues')
plt.colorbar()
decorate(ylabel='A height in cm',
xlabel='B height in cm')

And here's what the result looks like.

```
In [20]:
```plot_joint(joint)
decorate(title='Joint prior distribution of height for A and B')
savefig('fig09-01')

As you might expect, the probability is highest near the mean and drops off away from the mean.

Another way to visualize the joint distribution is a contour plot.

```
In [21]:
```def plot_contour(joint):
"""Plot a joint distribution.
joint: DataFrame representing a joint PMF
"""
plt.contour(joint.columns, joint.index, joint)
plt.ylabel('A height in cm')
plt.xlabel('B height in cm')

```
In [22]:
```plot_contour(joint)
decorate(title='Joint prior distribution of height for A and B')

Each circle represents a level of equal probability.

Now that we have a joint prior distribution, we can update it with the data, which is that `A`

is taller than `B`

.

Each element in the joint distribution represents a hypothesis about the heights of `A`

and `B`

; for example:

The element

`(180, 170)`

represents the hypothesis that`A`

is 180 cm tall and`B`

is 170 cm tall. Under this hypothesis, the probability that`A`

is taller than`B`

is 1.The element

`(170, 180)`

represents the hypothesis that`A`

is 170 cm tall and`B`

is 180 cm tall. Under this hypothesis, the probability that`A`

is taller than`B`

is 0.

To compute the likelihood of every pair of values, we can extract the values from the prior, like this:

```
In [23]:
```Y = joint.index.to_numpy()
X = joint.columns.to_numpy()

`outer`

version of `np.subtract`

, which computes the difference between every element of `Y`

(height of `A`

) and every element of `X`

(height of `B`

).

```
In [24]:
```diff = np.subtract.outer(Y, X)

`np.where`

which puts `1`

where the `diff`

is greater than 0 and 0 elsewhere.

```
In [25]:
```a = np.where(diff>0, 1, 0)

`DataFrame`

with the values of `Y`

in the index and the values of `X`

in the columns.

```
In [26]:
```likelihood = pd.DataFrame(a, index=Y, columns=X)

Here's what it looks like:

```
In [27]:
```plot_joint(likelihood)
decorate(title='Likelihood of A>B')
savefig('fig09-02')

The likelihood of the data is 1 where `Y>X`

and 0 elsewhere.

```
In [28]:
```posterior = joint * likelihood

I'll use the following function to normalize the posterior:

```
In [29]:
```def normalize(joint):
"""Normalize a joint distribution.
joint: DataFrame
"""
prob_data = joint.to_numpy().sum()
joint /= prob_data

```
In [30]:
```normalize(posterior)

Here's what it looks like.

```
In [31]:
```plot_joint(posterior)
decorate(title='Joint posterior distribution of height for A and B')
savefig('fig09-03')

The joint posterior distribution represents what we believe about the heights of `A`

and `B`

, given the prior distributions and the information that `A`

is taller.

From this joint distribution, we can compute the posterior distributions for `A`

and `B`

. To see how, let's start with a simpler problem.

Suppose we want to know the probability that `B`

is 180 cm tall. We can select the column from the joint distribution where `X=180`

.

```
In [32]:
```column = posterior[180]

`X=180`

; if we add them up, we get the total probability that `B`

is 180 cm tall.

```
In [33]:
```column.sum()

`B`

, we can add up all of the columns, like this:

```
In [34]:
```column_sums = posterior.sum(axis=0)
column_sums

The argument `axis=0`

means we want to sum the elements along the rows; that is, we want to add up the columns.

The result is a `Series`

that contains every possible height for `B`

and its probability. In other words, it is the distribution of heights for `B`

.

We can put it in a `Pmf`

like this:

```
In [35]:
```marginal_B = Pmf(column_sums)

When we extract the distribution of a single variable from a joint distribution, the result is called a marginal distribution. The name comes from a common visualization that shows the joint distribution in the middle and the marginal distributions in the margins.

Here's what the marginal distribution for `B`

looks like:

```
In [36]:
```marginal_B.plot(label='Posterior for B', color='C1')
decorate(xlabel='Height in cm',
ylabel='Probability',
title='Posterior distribution for B')

`A`

by adding up the rows and putting the result in a `Pmf`

.

```
In [37]:
```row_sums = posterior.sum(axis=1)
marginal_A = Pmf(row_sums)

Here's what it looks like.

```
In [38]:
```marginal_A.plot(label='Posterior for A')
decorate(xlabel='Height in cm',
ylabel='Probability',
title='Posterior distribution for A')

```
In [39]:
```def marginal(joint, axis):
"""Compute a marginal distribution.
axis=1 returns the marginal distribution of the first variable
axis=0 returns the marginal distribution of the second variable
joint: DataFrame representing a joint distribution
axis: int axis to sum along
returns: Pmf
"""
return Pmf(joint.sum(axis=axis))

```
In [40]:
```marginal_B = marginal(posterior, axis=0)
marginal_A = marginal(posterior, axis=1)

Here's what they look like.

```
In [41]:
```prior.plot(label='Prior', color='gray')
marginal_A.plot(label='Posterior for A')
marginal_B.plot(label='Posterior for B')
decorate(xlabel='Height in cm',
ylabel='Probability',
title='Prior and posterior distributions for A and B')
savefig('fig09-04')

As you might expect, the posterior distribution for `A`

is shifted to the right and the posterior distribution for `B`

is shifted to the left.

We can summarize the results by computing the posterior means:

```
In [42]:
```prior.mean()

```
In [43]:
```marginal_A.mean(), marginal_B.mean()

`A`

is taller than `B`

, we are inclined to believe that `A`

is a little taller than average, and `B`

is a little shorter.

```
In [44]:
```prior.std()

```
In [45]:
```marginal_A.std(), marginal_B.std()

`A`

and `B`

after we compare them.

```
In [46]:
```column_180 = posterior[180]

`Series`

that represents possible heights for `A`

and their relative likelihoods.
These likelihoods are not normalized, but we can normalize them like this:

```
In [47]:
```cond_A = Pmf(column_180)
cond_A.normalize()

Note that when we make a `Pmf`

it copies the data by default, so we can modify `cond_A`

without affecting `column_180`

or `posterior`

.

The result is the conditional distribution of height for `A`

given that `B`

is 180 cm tall.

Here's what it looks like:

```
In [48]:
```prior.plot(label='Prior', color='gray')
marginal_A.plot(label='Posterior for A')
cond_A.plot(label='Posterior for A given B=180', color='C3')
decorate(xlabel='Height in cm',
ylabel='Probability',
title='Prior, posterior and conditional distribution for A')
savefig('fig09-05')

`A`

is taller than `B`

and `B`

is 180 cm.
And the mean of the conditional distribution is a few centimeters higher:

```
In [49]:
```marginal_A.mean(), cond_A.mean()

When we constructed the joint prior distribution, I said that the heights of `A`

and `B`

were independent, which means that knowing one of them provides no information about the other.
In other words, the conditional probability $P(A_y | B_x)$ is the same as the unconditioned probability $P(A_y)$.

That's why we can compute an element of the joint prior, $P(A_y~\mathrm{and}~B_x)$, by rewriting it in terms of conditional probability, $P(B_x)~P(A_y~|~B_x)$, and using the independence of $A$ and $B$ to replace the conditional probability.

Putting it all together, we have

$P(A_y~\mathrm{and}~B_x) = P(B_x)~P(A_y)$

But remember, that's only true if $A$ and $B$ are independent.

In the posterior distribution, they are not.

We know that `A`

is taller than `B`

, so if we know how tall `B`

is, that gives us information about `A`

.

The conditional distribution we just computed demonstrates this dependence.

**Exercise:** Based on the results of the previous example, compute the posterior conditional distribution for `B`

given that `A`

is 190 cm.

Hint: Use `loc`

to select a row from a `DataFrame`

.

```
In [50]:
```row_190 = posterior.loc[190]
cond_B = Pmf(row_190)
cond_B.normalize()

```
In [73]:
```cond_B.plot(label='Posterior for B given A=190', color='C1')
decorate(xlabel='Height in cm',
ylabel='Probability',
title='Conditional distribution for B')

**Exercise:** Suppose we have established that `A`

is taller than `B`

, but we don't know how tall `B`

is.
Now we choose a random woman, `C`

, and find that she is shorter than `A`

by at least 15 cm. Compute posterior distributions for the heights of `A`

and `C`

.

The average height for women in the U.S. is 163 cm; the standard deviation is 7.3 cm.

```
In [52]:
``````
# Solution goes here
```

```
In [53]:
``````
# Solution goes here
```

```
In [54]:
``````
# Solution goes here
```

```
In [55]:
``````
# Solution goes here
```

```
In [56]:
``````
# Solution goes here
```

```
In [57]:
``````
# Solution goes here
```

```
In [58]:
``````
# Solution goes here
```

```
In [59]:
``````
# Solution goes here
```

**Exercise:** The Elo rating system is a way to quantify the skill level of players for games like chess.

It is based on a model of the relationship between the ratings of players and the outcome of a game. Specifically, if $R_A$ is the rating of player `A`

and $R_B$ is the rating of player `B`

, the probability that `A`

beats `B`

is given by the logistic function:

$P(\mathrm{A~beats~B}) = 1 / (1 + 10^{(R_B-R_A)/400})$

The parameters $10$ and $400$ are arbitrary choices that determine the range of the ratings. In chess, the range is from 100 to 2800.

Notice that the probability of winning depends only on the difference in rankings. As an example, if $R_A$ exceeds $R_B$ by 100 points, the probability that `A`

wins is

```
In [60]:
```1 / (1 + 10**(-100/400))

Suppose `A`

has a current rating of 1600, but we are not sure it is accurate. We could describe their true rating with a normal distribution with mean 1600 and standard deviation 100, to indicate our uncertainty.

And suppose `B`

has a current rating of 1800, with the same level of uncertainty.

Then `A`

and `B`

play and `A`

wins. How should we update their ratings?

To answer this question:

Construct prior distributions for

`A`

and`B`

.Use them to construct a joint distribution, assuming that the prior distributions are independent.

Use the logistic function above to compute the likelihood of the outcome under each joint hypothesis.

Use the joint prior and likelihood to compute the joint posterior.

Extract and plot the marginal posteriors for

`A`

and`B`

.Compute the posterior means for

`A`

and`B`

. How much should their ratings change based on this outcome?

```
In [61]:
``````
# Solution goes here
```

```
In [62]:
``````
# Solution goes here
```

```
In [63]:
``````
# Solution goes here
```

```
In [64]:
``````
# Solution goes here
```

```
In [65]:
``````
# Solution goes here
```

```
In [66]:
``````
# Solution goes here
```

```
In [67]:
``````
# Solution goes here
```

```
In [68]:
``````
# Solution goes here
```

```
In [69]:
``````
# Solution goes here
```

```
In [70]:
``````
# Solution goes here
```

```
In [71]:
``````
# Solution goes here
```

```
In [72]:
``````
# Solution goes here
```

In this notebook I started with the "outer" operations, like outer product, which we used to construct a joint distribution.

In general, you cannot construct a joint distribution from two marginal distributions, but in the special case where the distributions are independent, you can.

We extended the Bayesian update process we've seen in previous notebooks and applied it to a joint distribution. Then from the posterior joint distribution we extracted posterior marginal distributions and posterior conditional distributions.

As an exercise, you had a chance to apply the same process to a slightly more difficult problem, updating Elo ratings based on the outcome of a chess game.

```
In [ ]:
```