Installation:
>> conda install pymc
In [142]:
%matplotlib inline
import numpy as np
import scipy as sp
import pymc as pm
import seaborn as sb
import matplotlib.pyplot as plt
Suppose you have a sample $\{y_t\}_{t=0}^{T}$ and want to characterize it by the following probabilistic model; for $t\geq 0$
$$ y_{t+1} = \rho y_t + \sigma_x \varepsilon_{t+1}, \quad \varepsilon_{t+1}\stackrel{iid}{\sim}\cal{N}(0,1) $$ with initial value $y_0 \sim {\cal N}\left(0, \frac{\sigma_x^2}{1-\rho^2}\right)$, and suppose the following (independent) prior beliefs about the parameters $\theta \equiv (\rho, \sigma_x)$:
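Concretely, the priors initialized in the code cells below are
$$ \rho \sim \mathcal{U}(-1,\, 1), \qquad \sigma_x \sim \text{Inverse-Gamma}(\alpha = 3,\ \beta = 1), $$
assumed independent of each other.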
Aim: given the statistical model and the prior $\pi(\theta)$ we want to ''compute'' the posterior distribution $p\left( \theta \hspace{1mm} | \hspace{1mm} y^T \right)$ associated with the sample $y^T$.
How: if no conjugate form is available, sample from $p\left( \theta \hspace{1mm} | \hspace{1mm} y^T \right)$ and learn about the posterior's properties from that sample
Remark: We go from the prior $\pi$ to the posterior $p$ by using Bayes' rule: \begin{equation} p\left( \theta \hspace{1mm} | \hspace{1mm} y^T \right) = \frac{f( y^T \hspace{1mm}| \hspace{1mm}\theta) \pi(\theta) }{f( y^T)} \end{equation} The first-order autoregression implies that the likelihood function of $y^T$ can be factored as follows:
$$ f(y^T \hspace{1mm}|\hspace{1mm} \theta) = f(y_T| y_{T-1}; \theta)\cdot f(y_{T-1}| y_{T-2}; \theta) \cdots f(y_1 | y_0;\theta )\cdot f(y_0 |\theta) $$ where for all $t\geq 1$ $$ f(y_t | y_{t-1}; \theta) = {\mathcal N}(\rho y_{t-1}, \sigma_x^2) = {\mathcal N}(\mu_t, \sigma_x^2)$$
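To make the factorization concrete, here is a minimal sketch of how the implied log-likelihood could be evaluated with scipy; the helper name ar1_loglike is purely illustrative and is not used by pymc below.

import numpy as np
from scipy.stats import norm

def ar1_loglike(y, rho, sigma_x):
    # log f(y_0 | theta): stationary distribution N(0, sigma_x^2 / (1 - rho^2))
    ll = norm.logpdf(y[0], loc=0.0, scale=sigma_x / np.sqrt(1 - rho**2))
    # sum of log f(y_t | y_{t-1}; theta), conditional mean rho * y_{t-1}, std sigma_x
    ll += norm.logpdf(y[1:], loc=rho * y[:-1], scale=sigma_x).sum()
    return ll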
Generate a sample with $T=20$ for known parameter values: $$\rho = 0.5\quad \sigma_x = 1.0$$
In [143]:
def sample_path(rho, sigma, T, y0=None):
    '''
    Simulates a sample path for y of length T+1 starting from a specified initial value OR, if y0
    is None, initializes the path with a draw from the stationary distribution of y.

    Arguments
    -----------------
    rho (Float) : AR coefficient
    sigma (Float) : standard deviation of the error
    T (Int) : length of the sample path without y0
    y0 (Float) : initial value of y

    Return:
    -----------------
    y_path (Numpy Array) : simulated path
    '''
    if y0 is None:
        # draw y0 from the stationary distribution N(0, sigma^2 / (1 - rho^2))
        stdev_erg = sigma / np.sqrt(1 - rho**2)
        y0 = np.random.normal(0, stdev_erg)

    y_path = np.empty(T + 1)
    y_path[0] = y0
    eps_path = np.random.normal(0, 1, T)

    for t in range(T):
        y_path[t + 1] = rho * y_path[t] + sigma * eps_path[t]

    return y_path
#-------------------------------------------------------
# Pick true values:
rho_true, sigma_x_true, T = 0.5, 1.0, 20
#np.random.seed(1453534)
sample = sample_path(rho_true, sigma_x_true, T)
pymc
A Model instance $\approx$ a collection of random variables linked together according to some rules.
* parent: a variable that influences another variable
* child: a variable that is affected by other variables (the subject of its parent variables)
Why are these relationships useful? A child variable's current value automatically changes whenever its parents' values change.
Useful attributes of a pymc variable:
* value : the current internal value (given the values of the parents)
* parents : its parent variables (gives a dictionary)
* children : its child variables (gives a set)
There are two main classes of random variables in pymc: Stochastic (a variable with its own distribution, whose value is not completely determined by its parents) and Deterministic (a variable whose value is entirely determined by its parents). A Stochastic variable has
* a random() method that draws a new value from its distribution, and
* a logp attribute that evaluates the logprob (mass or density) at the current value; for vector-valued variables it returns the sum of the (joint) logprob.
To initialize a Stochastic variable, provide
* name + params of the distribution (which can themselves be pymc variables),
* value : for a default initial value; if not specified, the variable is initialized by a draw from the given distribution,
* size : for a multivariate array of independent stochastic variables (alternatively: use an array as a distribution parameter).
Initialize stochastic variables
In [144]:
# Priors:
rho = pm.Uniform('rho', lower = -1, upper = 1) # note the capitalized distribution name (rule for pymc distributions)
sigma_x = pm.InverseGamma('sigma_x', alpha = 3, beta = 1)
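As an aside, a quick illustration of the optional value and size arguments described above; these throwaway variables are hypothetical examples and are not part of the AR(1) model.

eps_vec = pm.Normal('eps_vec', mu = 0.0, tau = 1.0, size = 5)        # 5 independent N(0, 1) stochastics
rho_alt = pm.Uniform('rho_alt', lower = -1, upper = 1, value = 0.3)  # fixed initial value instead of a random draw
print(eps_vec.value)   # array of 5 initial draws
print(rho_alt.value)   # 0.3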
In [145]:
# random() method
print('Initialization:')
print("Current value of rho = {: f}".format(rho.value.reshape(1,)[0]))
print("Current logprob of rho = {: f}".format(rho.logp))
rho.random()
print('\nAfter redrawing:')
print("Current value of rho = {: f}".format(rho.value.reshape(1,)[0]))
print("Current logprob of rho = {: f}".format(rho.logp))
Initialize deterministic variables using either the pm.deterministic decorator or pymc.Lambda:
(a) Standard deviation of $y_0$ is a deterministic function of $\rho$ and $\sigma_x$
In [146]:
@pm.deterministic(trace = False)
def y0_stdev(rho = rho, sigma = sigma_x):
    return sigma / np.sqrt(1 - rho**2)

# Alternatively:
#y0_stdev = pm.Lambda('y0_stdev', lambda r = rho, s = sigma_x: s / np.sqrt(1 - r**2) )
(b) Conditional mean of $y_t$, $\mu_y$, is a deterministic function of $\rho$ and $y_{t-1}$
In [147]:
# For elementary operators simply write
mu_y = rho * sample[:-1]
print(type(mu_y))
# You could also write the following to generate a list of Deterministic functions
#MU_y = [rho * sample[j] for j in range(T)]
#print(type(MU_y))
#print(type(MU_y[1]))
#MU_y = pm.Container(MU_y)
#print(type(MU_y))
Let's see the parents of y0_stdev...
In [148]:
y0_stdev.parents
Out[148]:
Notice that this is a dictionary, so for example...
In [149]:
y0_stdev.parents['rho'].value
Out[149]:
In [150]:
rho.random()
y0_stdev.parents['rho'].value # if the parent is a pymc variable, the current value will always be 'updated'
Out[150]:
... and as we alter the parent's value, the child's value changes accordingly
In [151]:
print("Current value of y0_stdev = {: f}".format(y0_stdev.value))
rho.random()
print('\nAfter redrawing rho:')
print("Current value of y0_stdev = {: f}".format(y0_stdev.value))
and similarly for mu_y
In [152]:
print("Current value of mu_y:")
print(mu_y.value[:4])
rho.random()
print('\nAfter redrawing rho:')
print("Current value of mu_y:")
print(mu_y.value[:4])
How do you tell pymc what you 'know' about the data? We define the data as a stochastic variable with fixed values and set the observed flag equal to True.
For the sample $y^T$, depending on the question at hand, we might want to define it either
(A) as a single array-valued stochastic variable, or
(B) as a collection of scalar-valued stochastic variables (one for each observation).
In the current setup, as we fix the value of $y$ (observed), it doesn't really matter (approach A is easier). However, if we had an array-valued stochastic variable with a mutable value, the restriction that we cannot update the values of stochastic variables in place would become onerous in the sampling step (where the step method has to propose the whole array-valued variable at once). Straight from the pymc documentation:
''In this case, it may be preferable to partition the variable into several scalar-valued variables stored in an array or list.''
In [153]:
y0 = pm.Normal('y0', mu = 0.0, tau = 1 / y0_stdev**2, observed = True, value = sample[0])   # tau is the precision, i.e. 1/variance
Y = pm.Normal('Y', mu = mu_y, tau = 1 / sigma_x**2, observed = True, value = sample[1:])
In [154]:
Y.value
Out[154]:
Notice that the value of this variable is fixed (even if the parent's value changes)
In [155]:
Y.parents['tau'].value
Out[155]:
In [156]:
sigma_x.random()
print(Y.parents['tau'].value)
Y.value
Out[156]:
In [157]:
Y_alt = np.empty(T + 1, dtype = object)
Y_alt[0] = y0 # definition of y0 is the same as above
for i in range(1, T + 1):
    Y_alt[i] = pm.Normal('y_{:d}'.format(i), mu = mu_y[i-1], tau = 1 / sigma_x**2)
print(type(Y_alt))
Y_alt
Out[157]:
Currently, this is just a numpy array of pymc variables. We can turn it into a single pymc object by using the pymc.Container type.
In [158]:
Y_alt = pm.Container(Y_alt)
type(Y_alt)
Out[158]:
and the pymc methods are applied element-wise.
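For instance, a small check (assuming the Container exposes attributes element-wise as described):

print(Y_alt.value[:3])   # current values of the first few elements
print(Y_alt[1].logp)     # log-density of a single element at its current value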
In [159]:
ar1_model = pm.Model([rho, sigma_x, y0, Y, y0_stdev, mu_y])
In [160]:
ar1_model.stochastics # notice that this is an unordered set (!)
Out[160]:
In [161]:
ar1_model.deterministics
Out[161]:
This object has very limited awareness of the structure of the probabilistic model that it describes and does not itself possess methods for updating the values in the sampling step.
The joint prior distribution is sitting on an $N$-dimensional space, where $N$ is the number of parameters we are about to make inference on (see the figure below). Looking at the data through the probabilistic model deforms the prior surface into the posterior surface, which we need to explore. In principle, we could naively search this space by picking random points in $\mathbb{R}^N$ and calculating the corresponding posterior value (Monte Carlo methods), but a more efficient way (especially in higher dimensions) is Markov Chain Monte Carlo (MCMC), which is basically an intelligent way of discovering the posterior surface.
MCMC is an iterative procedure: at every iteration it proposes a nearby point in the space and asks how plausible the proposed point is under the posterior relative to the current point; it accepts the proposal with a probability determined by this ratio and rejects it otherwise (staying at the old position). The key feature of MCMC is that it produces proposals by simulating a Markov chain for which the posterior is the unique, invariant limiting distribution. In other words, after a possible 'transition period' (i.e. post convergence), it starts producing draws from the posterior.
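To make the accept/reject step concrete, here is a minimal stand-alone random-walk Metropolis sketch (this is not pymc's internal implementation); log_post is a hypothetical function returning the log of the posterior kernel $\log f(y^T\,|\,\theta) + \log \pi(\theta)$, and it should return -inf outside the parameter support.

import numpy as np

def metropolis(log_post, theta0, n_iter=5000, step=0.1):
    theta = np.asarray(theta0, dtype=float)
    draws = np.empty((n_iter, theta.size))
    lp = log_post(theta)
    for i in range(n_iter):
        proposal = theta + step * np.random.standard_normal(theta.size)  # propose a nearby point
        lp_prop = log_post(proposal)
        # accept with probability min(1, posterior ratio), computed on the log scale
        if np.log(np.random.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        draws[i] = theta
    return draws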
By default, pymc uses (in my opinion) a Metropolis-within-Gibbs type algorithm, which is based on two simple principles:
* Blocking and conditioning: partition the unknowns into blocks and update each block conditional on the current values of the others
* Sampling (choose/construct a pymc.StepMethod): if for a given block the conditional density can be expressed in (semi-)analytic form, use it; if not, use Metropolis-Hastings
Again, a pymc.Model instance is not much more than a collection: for example, the model variables (blocks) are not yet matched with the step methods that determine how to update their values in the sampling step. To do that, we first need to construct an MCMC instance, which is then ready to be sampled from.
MCMC's primary job is to create and coordinate a collection of step methods, each of which is responsible for updating one or more variables (blocks) at each step of the MCMC algorithm. By default, step methods are automatically assigned to variables by PyMC (after we call the sample method).
Built-in pymc.StepMethods include
* Metropolis
* AdaptiveMetropolis
* Slicer
* Gibbs
You can assign step methods manually by calling the method use_step_method(method, *args, **kwargs):
In [162]:
M = pm.MCMC(ar1_model)
Notice that the step_methods are not assigned yet
In [163]:
M.step_method_dict
Out[163]:
You can specify them now, or, if you call the sample method, pymc will assign the step methods automatically according to some rule.
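For example, to assign a step method manually before sampling (an illustration of the call, not something this notebook actually runs):

# Illustration only: update rho and sigma_x jointly in one AdaptiveMetropolis block;
# calling this before M.sample would override the default assignment below
M.use_step_method(pm.AdaptiveMetropolis, [rho, sigma_x])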
In [164]:
# draw a sample of size 50,000, drop the first 1,000 and keep only every 5th draw
M.sample(iter = 50000, burn = 1000, thin = 5)
... and you can check what kind of step methods have been assigned (the default in most cases is the Metropolis step method for non-observed stochastic variables, while observed stochastics keep their fixed values and are not updated)
In [165]:
M.step_method_dict
Out[165]:
The sample can be accessed through the trace method (use the name you gave at initialization, not the Python variable name -- it is convenient when the two coincide)
In [166]:
M.trace('rho')[:20]
Out[166]:
In [167]:
M.trace('sigma_x')[:].shape
Out[167]:
This is then just a numpy array, so you can do all sorts of things with it. For example, plot:
In [168]:
sigma_sample = M.trace('sigma_x')[:]
rho_sample = M.trace('rho')[:]
fig, ax = plt.subplots(1, 2, figsize = (15, 5))
ax[0].plot(sigma_sample)
ax[1].hist(sigma_sample)
Out[168]:
Actually, you don't have to waste your time constructing different subplots: pymc's built-in plotting functionality (based on matplotlib) creates pretty informative plots for you. The figure below shows the trace of the stored draws, their autocorrelation function (computed on the stored, already thinned sample, i.e. as if thin=1), and a histogram of the draws.
In [169]:
from pymc.Matplot import plot as fancy_plot
fancy_plot(M.trace('rho'))
For a non-graphical summary of the posterior use the stats() method
In [170]:
M.stats('rho')
# Try also:
#M.summary()
Out[170]:
In [171]:
N = len(rho_sample)
rho_pr = [rho.random() for i in range(N)]
sigma_pr = [sigma_x.random() for i in range(N)]
Prior = np.vstack([rho_pr, sigma_pr]).T
Posterior = np.vstack([rho_sample, sigma_sample]).T
In [172]:
fig, bx = plt.subplots(1, 2, figsize = (17, 10), sharey = True)
sb.kdeplot(Prior, shade = True, cmap = 'PuBu', ax = bx[0])
bx[0].patch.set_facecolor('white')
bx[0].collections[0].set_alpha(0)
bx[0].axhline(y = sigma_x_true, color = 'DarkRed', lw =2)
bx[0].axvline(x = rho_true, color = 'DarkRed', lw =2)
bx[0].set_xlabel(r'$\rho$', fontsize = 18)
bx[0].set_ylabel(r'$\sigma_x$', fontsize = 18)
bx[0].set_title('Prior', fontsize = 20)
sb.kdeplot(Posterior, shade = True, cmap = 'PuBu', ax = bx[1])
bx[1].patch.set_facecolor('white')
bx[1].collections[0].set_alpha(0)
bx[1].axhline(y = sigma_x_true, color = 'DarkRed', lw =2)
bx[1].axvline(x = rho_true, color = 'DarkRed', lw =2)
bx[1].set_xlabel(r'$\rho$', fontsize = 18)
bx[1].set_ylabel(r'$\sigma_x$', fontsize = 18)
bx[1].set_title('Posterior', fontsize = 20)
plt.xlim(-1, 1)
plt.ylim(0, 1.5)
plt.tight_layout()
plt.savefig('beamer/prior_post.pdf')
In [173]:
rho_grid = np.linspace(-1, 1, 100)
sigmay_grid = np.linspace(0, 1.5, 100)
U = sp.stats.uniform(-1, 2)
IG = sp.stats.invgamma(3)
fig2, cx = plt.subplots(2, 2, figsize = (17, 12), sharey = True)
cx[0, 0].plot(rho_grid, U.pdf(rho_grid), 'r-', lw = 3, alpha = 0.6, label = r'$\rho$ prior')
cx[0, 0].set_title(r"Marginal prior for $\rho$", fontsize = 18)
cx[0, 0].axvline(x = rho_true, color = 'DarkRed', lw = 2, linestyle = '--', label = r'True $\rho$')
cx[0, 0].legend(loc='best', fontsize = 16)
cx[0, 0].set_xlim(-1, 1)
sb.distplot(rho_sample, ax = cx[0,1], kde_kws={"color": "r", "lw": 3, "label": r"$\rho$ posterior"})
cx[0, 1].set_title(r"Marginal posterior for $\rho$", fontsize = 18)
cx[0, 1].axvline(x = rho_true, color = 'DarkRed', lw = 2, linestyle = '--', label = r'True $\rho$')
cx[0, 1].legend(loc='best', fontsize = 16)
cx[0, 1].set_xlim(-1, 1)
cx[1, 0].plot(sigmay_grid, IG.pdf(sigmay_grid), 'r-', lw=3, alpha=0.6, label=r'$\sigma_x$ prior')
cx[1, 0].set_title(r"Marginal prior for $\sigma_x$", fontsize = 18)
cx[1, 0].axvline(x = sigma_x_true, color = 'DarkRed', lw = 2, linestyle = '--', label = r'True $\sigma_x$')
cx[1, 0].legend(loc = 'best', fontsize = 16)
cx[1, 0].set_xlim(0, 3)
sb.distplot(sigma_sample, ax = cx[1,1], kde_kws={"color": "r", "lw": 3, "label": r"$\sigma_x$ posterior"})
cx[1, 1].set_title(r"Marginal posterior for $\sigma_x$", fontsize = 18)
cx[1, 1].axvline(x = sigma_x_true, color = 'DarkRed', lw = 2, linestyle = '--', label = r'True $\sigma_x$')
cx[1, 1].legend(loc = 'best', fontsize = 16)
cx[1, 1].set_xlim(0, 3)
plt.tight_layout()
plt.savefig('beamer/marginal_prior_post.pdf')
pymc official documentation: https://pymc-devs.github.io/pymc/index.html
Rich set of fun examples (very easy read) -- Probabilistic Programming & Bayesian Methods for Hackers http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/
Nice example about potential: http://healthyalgorithms.com/2008/11/05/mcmc-in-python-pymc-to-sample-uniformly-from-a-convex-body/
Non-trivial example comparing the Gibbs and Metropolis algorithms: https://github.com/aflaxman/pymc-examples/blob/master/gibbs_for_uniform_ball.ipynb
Another example: https://users.obs.carnegiescience.edu/cburns/ipynbs/PyMC.html