Title: Bayesian Computation with R... In Python! Slug: bayesian-computation-python Summary: An attempt to implement the examples in Bayesian Computation with R, but using Python. Date: 2018-01-23 16:03 Category: Statistics Tags: Bayesian Authors: Thomas Pinder
This notebook is an implementation of the work by Jim Albert in his book Bayesian Computation with R. So as to fully understand the concepts and methods within the book, I will be implementing the examples in Python.
The datasets used in this post can be obtained from the link here.
The first problem covered in this chapter surrounds the idea that doctors recommend eight hours of sleep per night. Here a sample of 27 college students is taken, with $p$ being the proportion of students getting the advised eight hours of sleep a night. We can think of this problem in a binomial setting, with a success being a student getting eight hours of sleep and anything else being a failure. This makes the likelihood function $L(p)= p^s(1-p)^f, \ 0<p<1$, where $s$ and $f$ are the numbers of successes and failures. Within Bayesian analysis, the posterior density is proportional to the product of the likelihood function and the prior density, written as $g(p|data)\propto g(p) \times L(p)$.
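To make the normalisation step explicit (my own note, not spelled out in the book): with a discrete prior placed on a grid of proportions $p_1, \dots, p_k$, the proportionality above becomes an exact equality once we divide through by the sum over the grid,

$$g(p_i|data) = \frac{g(p_i)L(p_i)}{\sum_{j=1}^{k} g(p_j)L(p_j)}.$$

This sum-and-divide is precisely the final step of the pdisc function implemented further down.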
A simplistic approach to constructing a prior is to write down a list of plausible proportions and assign each value a weight reflecting our belief in it.
In [94]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import collections as matcoll
%matplotlib inline
beliefs = np.arange(0.05, 1, 0.1)  # grid of candidate proportions
priors = np.array([2, 4, 8, 8, 4, 2, 1, 1, 1, 1])  # relative weight assigned to each proportion
prior = priors / np.sum(priors)  # normalise into a probability distribution
def r_histogram(x_data, y_data, x_label, y_label):
    # Mimic R's type="h" plot: a vertical line from zero up to each value,
    # capped with a point.
    lines = []
    for i in range(len(x_data)):
        pair = [(x_data[i], 0), (x_data[i], y_data[i])]
        lines.append(pair)
    linecoll = matcoll.LineCollection(lines)
    fig, ax = plt.subplots()
    ax.add_collection(linecoll)
    plt.scatter(x_data, y_data)
    plt.xticks(x_data)
    plt.ylim(0, np.max(y_data) + 0.02)
    plt.xlim(0, np.max(x_data) + 0.02)
    plt.xlabel(x_label)
    plt.ylabel(y_label)

r_histogram(beliefs, prior, "Proportion", "Prior Probability")
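As an aside that is not in the book, matplotlib's built-in stem plot produces much the same figure in a single call, should you prefer it over the hand-rolled r_histogram:

# A built-in alternative to r_histogram.
plt.stem(beliefs, prior)
plt.xlabel("Proportion")
plt.ylabel("Prior Probability")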
In the book's example, 11 of the 27 students get the required amount of sleep, making the likelihood function $L(p)=p^{11}(1-p)^{16}, \ 0<p<1$. We can now compute the posterior distribution; however, in R this is done by simply calling the pdisc function, so we're going to have to engineer ourselves an alternative.
In [107]:
def pdisc(prop, prior, data):
    """
    Within this function, prop must be a NumPy array of proportions. Similarly,
    prior must be an array of prior probabilities. Finally, data must be an
    array containing the number of successes and failures, in that order.
    """
    s = data[0]
    f = data[1]
    # Work on a shifted copy, nudging endpoint values away from 0 and 1
    # so the logarithms below stay finite.
    p = prop + 0.5 * (prop == 0) - 0.5 * (prop == 1)
    like = s * np.log(p) + f * np.log(1 - p)
    # Zero the log-likelihood at the endpoints, then heavily penalise
    # p = 0 when successes were observed and p = 1 when failures were.
    like = like * (prop > 0) * (prop < 1) \
        - 999 * ((prop == 0) * (s > 0) + (prop == 1) * (f > 0))
    # Subtract the maximum before exponentiating for numerical stability.
    like = np.exp(like - np.max(like))
    product = like * prior
    posterior = product / np.sum(product)
    return posterior
data = np.array([11, 16])
post = pdisc(beliefs, prior, data)
r_histogram(beliefs, post, "Proportion", "Updated Posterior")
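With the posterior computed, we can pull out a couple of quick summaries. This snippet is my own addition rather than the book's; it reports the posterior mean of $p$ and the posterior probability that at least half of students get enough sleep:

# Posterior mean of p under the discrete posterior.
post_mean = np.sum(beliefs * post)
# Posterior probability that p >= 0.5.
p_at_least_half = np.sum(post[beliefs >= 0.5])
print(post_mean, p_at_least_half)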