Title: Bayesian Computation with R... In Python! Slug: bayesian-computation-python Summary: An attempt to implement the examples in Bayesian Computation with R, but using Python. Date: 2018-01-23 16:03 Category: Statistics Tags: Bayesian Authors: Thomas Pinder

Introduction

This notebook is an implementation of the work by Jim Albert in his book Bayesian Computation with R. So as to fully understand the concepts and methods within the book, I will be implementing its examples in Python.

Datasets

The datasets used throughout this notebook can be obtained from the link here

Introduction to Bayesian Thinking

The first problem covered in this chapter surrounds the claim that doctors recommend 8 hours of sleep per night. A sample of 27 college students is taken, with p being the proportion of students getting the advised 8 hours of sleep a night. We can think of this problem in a binomial setting, with a success being a student getting 8 hours of sleep and anything else being a failure. With s successes and f failures, the likelihood function is $L(p)= p^s(1-p)^f, \ 0<p<1$. Within Bayesian analysis, the posterior density is proportional to the product of the likelihood function and the prior density, written as $g(p|data)\propto g(p)\times L(p)$.
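As a quick illustration (my addition, not from the book), the unnormalised likelihood can be evaluated on a grid of candidate proportions. With the s = 11 successes and f = 16 failures used later in the chapter, it peaks at the sample proportion 11/27:

```python
import numpy as np

def likelihood(p, s=11, f=16):
    # unnormalised binomial likelihood: p^s * (1 - p)^f
    return p ** s * (1 - p) ** f

grid = np.linspace(0.01, 0.99, 99)       # candidate proportions
mle = grid[np.argmax(likelihood(grid))]  # maximiser, approximately s / (s + f)
```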

Using a Discrete Prior

A simplistic approach to constructing a prior is to write down a list of plausible proportions and assign each value a weight reflecting our belief in it.


In [94]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import collections as matcoll

%matplotlib inline

beliefs = np.arange(0.05, 1, 0.1)
priors = np.array([2, 4, 8, 8, 4, 2, 1, 1, 1, 1])

prior = priors/np.sum(priors)

def r_histogram(x_data, y_data, x_label, y_label):
    """Mimic R's plot(type="h"): a vertical line from zero up to each probability."""
    # build a vertical segment from (x, 0) to (x, y) for each point
    lines = []
    for i in range(len(x_data)):
        pair = [(x_data[i], 0), (x_data[i], y_data[i])]
        lines.append(pair)

    linecoll = matcoll.LineCollection(lines)
    fig, ax = plt.subplots()
    ax.add_collection(linecoll)

    plt.scatter(x_data, y_data)

    plt.xticks(x_data)
    plt.ylim(0, np.max(y_data) + 0.02)
    plt.xlim(0, np.max(x_data) + 0.02)
    plt.xlabel(x_label)
    plt.ylabel(y_label)

r_histogram(beliefs, prior, "Proportion", "Posterior Probability")


In the book's example, 11 of the 27 students get the required amount of sleep, making the likelihood function $L(p)=p^{11}(1-p)^{16}, \ 0<p<1$. We can now compute the posterior distribution. In R this is done by simply calling the pdisc function, so we're going to have to engineer ourselves an alternative.


In [107]:
def pdisc(prop, prior, data):
    """
    Discrete posterior for a proportion. prop must be a NumPy array of candidate
    proportions and prior the corresponding vector of prior probabilities.
    Finally, data must be an array containing the number of successes and
    failures, in that order.
    """
    s = data[0]
    f = data[1]

    # shift any endpoint values away from 0 and 1 so that log() is defined;
    # the corresponding likelihood terms are handled explicitly below
    p = prop + 0.5 * (prop == 0) - 0.5 * (prop == 1)

    like = s * np.log(p) + f * np.log(1 - p)
    # keep the log-likelihood only on the interior of (0, 1), and heavily
    # penalise endpoints that are impossible given the observed data
    like = like * (prop > 0) * (prop < 1) - 999 * ((prop == 0) * (s > 0) + (prop == 1) * (f > 0))
    like = np.exp(like - np.max(like))

    product = like * prior
    posterior = product / np.sum(product)
    return posterior

data = np.array([11, 16])

post = pdisc(beliefs, prior, data)

r_histogram(beliefs, post, "Proportion", "Updated Posterior")
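As a sanity check (my addition, not in the book), the same posterior can be computed by direct multiplication and normalisation, since the proportions here are safely inside (0, 1). The probabilities should sum to one, and the mode should land on 0.35:

```python
import numpy as np

beliefs = np.arange(0.05, 1, 0.1)
priors = np.array([2, 4, 8, 8, 4, 2, 1, 1, 1, 1])
prior = priors / np.sum(priors)

s, f = 11, 16
like = beliefs ** s * (1 - beliefs) ** f    # likelihood at each candidate value
post = like * prior / np.sum(like * prior)  # Bayes' rule on the grid

mode = beliefs[np.argmax(post)]             # most probable proportion
```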


