Think Bayes

Copyright 2018 Allen B. Downey

MIT License: https://opensource.org/licenses/MIT


In [1]:
# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

import numpy as np

# import classes from thinkbayes2
from thinkbayes2 import Pmf, Cdf, Suite, Joint

import thinkplot

Bayesian regression

This notebook presents a simple example of Bayesian regression using synthetic data.

Data

Suppose there is a linear relationship between x and y with slope 2 and intercept 1, but the measurements of y are noisy; specifically, the noise is Gaussian with mean 0 and standard deviation sigma = 0.3.


In [2]:
slope = 2
inter = 1
sigma = 0.3

In [3]:
xs = np.linspace(0, 1, 6)

In [4]:
ys = inter + slope * xs + np.random.normal(0, sigma, len(xs))

In [5]:
thinkplot.plot(xs, ys)
thinkplot.decorate(xlabel='x',
                   ylabel='y')

Grid algorithm

We can solve the problem first using a grid algorithm, with uniform priors for slope, intercept, and sigma.

As an exercise, fill in this likelihood function, then test it using the code below.

Your results will depend on the random data you generated, but in general you should find that the posterior marginal distributions peak near the actual parameters.


In [6]:
from scipy.stats import norm

class Regress(Suite, Joint):
    
    def Likelihood(self, data, hypo):
        """Computes the likelihood of the data under the hypothesis.
        
        data: x, y
        hypo: slope, inter, sigma
        """
        # placeholder: a constant likelihood leaves the posterior equal to the prior
        return 1
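
One possible way to fill in the likelihood, as a sketch: it assumes the Gaussian noise model described above, so the likelihood of a data point is the density of its residual under a normal distribution with the hypothesized sigma.

class Regress(Suite, Joint):
    
    def Likelihood(self, data, hypo):
        """Computes the likelihood of the data under the hypothesis.
        
        data: x, y
        hypo: slope, inter, sigma
        """
        x, y = data
        slope, inter, sigma = hypo
        
        # density of the observed residual under the hypothesized noise model
        error = y - (inter + slope * x)
        return norm.pdf(error, 0, sigma)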

In [7]:
# Solution goes here

In [8]:
params = np.linspace(-4, 4, 21)

In [9]:
sigmas = np.linspace(0.1, 2, 20)

In [10]:
from itertools import product
hypos = product(params, params, sigmas)

In [11]:
suite = Regress(hypos);

In [12]:
for data in zip(xs, ys):
    suite.Update(data)

In [13]:
thinkplot.Pdf(suite.Marginal(0))
thinkplot.decorate(xlabel='Slope',
                   ylabel='PMF',
                   title='Posterior marginal distribution')

In [14]:
thinkplot.Pdf(suite.Marginal(1))
thinkplot.decorate(xlabel='Intercept',
                   ylabel='PMF',
                   title='Posterior marginal distribution')

In [15]:
thinkplot.Pdf(suite.Marginal(2))
thinkplot.decorate(xlabel='Sigma',
                   ylabel='PMF',
                   title='Posterior marginal distribution')
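
To summarize the marginals numerically, here is a short sketch (assuming Pmf.Mean, Pmf.MakeCdf, and Cdf.CredibleInterval from thinkbayes2):

for i, name in enumerate(['slope', 'inter', 'sigma']):
    marginal = suite.Marginal(i)
    # posterior mean and central 90% credible interval for each parameter
    print(name, marginal.Mean(), marginal.MakeCdf().CredibleInterval(90))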

MCMC

Implement this model using MCMC. As a starting place, you can use this example from Computational Statistics in Python.

You also have the option of using the GLM module, described here.


In [17]:
import pymc3 as pm
pm.GLM
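
If you take the GLM route, a minimal sketch might look like this (assuming pymc3's GLM.from_formula, and packing the data into a pandas DataFrame so the formula can refer to the columns by name):

import pandas as pd

# put the synthetic data in a DataFrame with named columns
df = pd.DataFrame(dict(x=xs, y=ys))

with pm.Model() as glm_model:
    # 'y ~ x' fits an intercept and a slope; GLM supplies default priors
    pm.GLM.from_formula('y ~ x', df)
    trace_glm = pm.sample(1000, tune=1000)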

In [18]:
thinkplot.plot(xs, ys)
thinkplot.decorate(xlabel='x',
                   ylabel='y')

In [19]:
import pymc3 as pm

with pm.Model() as model:
    """Fill this in"""

In [20]:
# Solution goes here

In [21]:
# Solution goes here

In [22]:
# Solution goes here

In [23]:
# Solution goes here

The posterior distributions for these parameters should be similar to what we got with the grid algorithm.
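
One way to check, as a sketch, assuming a trace produced by pm.sample as above:

# plot the sampled posteriors and print summary statistics
pm.traceplot(trace)
pm.summary(trace)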

