Authors:
Probability of a girl birth given placenta previa (BDA3 p. 37). Illustrate the effect of a prior. Comparison of posterior distributions with different parameter values for Beta prior distribution.
In [1]:
# import necessary packages
import numpy as np
from scipy.stats import beta
%matplotlib inline
import matplotlib.pyplot as plt
In [2]:
# add utilities directory to path
import os, sys
util_path = os.path.abspath(os.path.join(os.path.pardir, 'utilities_and_data'))
if util_path not in sys.path and os.path.exists(util_path):
sys.path.insert(0, util_path)
# import from utilities
import plot_tools
In [3]:
# edit default plot settings
plt.rc('font', size=12)
In [4]:
# grid
x = np.linspace(0.375, 0.525, 150)
# posterior with data (437,543) and uniform prior Beta(1,1)
au = 438
bu = 544
# calculate densities
pdu = beta.pdf(x, au, bu)
# compare 3 cases
# arrays of different priors:
# Beta(0.485*n, (1-0.485)*n), for n = 2, 20, 200
ap = np.array([0.485 * (2*10**i) for i in range(3)])
bp = np.array([(1-0.485) * (2*10**i) for i in range(3)])
# corresponding posteriors with data (437,543)
ai = 437 + ap
bi = 543 + bp
# calculate prior and posterior densities
pdp = beta.pdf(x, ap[:,np.newaxis], bp[:,np.newaxis])
pdi = beta.pdf(x, ai[:,np.newaxis], bi[:,np.newaxis])
The above two expressions uses numpy broadcasting inside the beta.pdf
function. Arrays ap
and bp
have shape (3,) i.e. they are 1d arrays of length 3. Array x
has shape (150,) and the output pdp
is an array of shape (3, 150).
Instead of using the beta.pdf
function, we could have also calculated other arithmetics. For example
out = x + (ap * bp)[:, np.newaxis]
returns an array of shape (3, 150), where each element out[i, j] = x[j] + ap[i] * bp[i]
.
With broadcasting, unnecessary repetition is avoided, i.e. it is not necessary to create an array of ap
repeated 150 times into the memory. More info can be found on the numpy documentation (search for broadcasting).
In [5]:
# plot 3 subplots
fig, axes = plt.subplots(
nrows=3, ncols=1, sharex=True, sharey=True, figsize=(8, 12))
# show only x-axis
plot_tools.modify_axes.only_x(axes)
# manually adjust spacing
fig.subplots_adjust(hspace=0.4)
# 3 subplots
for i, ax in enumerate(axes):
# plot three precalculated densities
post1, = ax.plot(x, pdu, color=plot_tools.lighten('C1'), linewidth=5)
prior, = ax.plot(x, pdp[i], 'k:')
post2, = ax.plot(x, pdi[i], color='k', dashes=(6, 8))
# add vertical line
known = ax.axvline(0.485, color='C0')
# set the title for this subplot
ax.set_title(
r'$\alpha/(\alpha+\beta) = 0.485,\quad \alpha+\beta = {}$'
.format(2*10**i)
)
# limit x-axis
axes[0].autoscale(axis='x', tight=True)
axes[0].set_ylim((0,30))
# add legend to the last subplot
axes[-1].legend(
(post1, prior, post2, known),
( 'posterior with uniform prior',
'informative prior',
'posterior with informative prior',
r'known $\theta=0.485$ in general population'),
loc='upper center',
bbox_to_anchor=(0.5, -0.2)
);