In [1]:

```
%matplotlib inline
import numpy as np
import theano.tensor as tt
import pymc3 as pm
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_context('notebook')
```

In [2]:

```
with pm.Model() as model:
    # Model definition
    pass
```

We discuss RVs further below, but let's create a simple model to explore the `Model` class.

In [3]:

```
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    obs = pm.Normal('obs', mu=mu, sd=1, observed=np.random.randn(100))
```

In [4]:

```
model.basic_RVs
```

Out[4]:

```
[mu, obs]
```

In [5]:

```
model.free_RVs
```

Out[5]:

```
[mu]
```

In [6]:

```
model.observed_RVs
```

Out[6]:

```
[obs]
```

In [7]:

```
model.logp({'mu': 0})
```

Every probabilistic program consists of observed and unobserved Random Variables (RVs). Observed RVs are defined via likelihood distributions, while unobserved RVs are defined via prior distributions. In PyMC3, probability distributions are available from the main module space:

In [8]:

```
help(pm.Normal)
```

In the PyMC3 module, the structure for probability distributions looks like this:

```
pymc3.distributions
|- continuous
|- discrete
|- timeseries
|- mixture
```

In [9]:

```
dir(pm.distributions.mixture)
```

In [10]:

```
with pm.Model():
    x = pm.Normal('x', mu=0, sd=1)
```

As with the model, we can evaluate its logp:

In [11]:

```
x.logp({'x': 0})
```

Out[11]:

```
array(-0.9189385332046727)
```

Observed RVs are defined just like unobserved RVs, but require data to be passed into the `observed` keyword argument:

In [12]:

```
with pm.Model():
    obs = pm.Normal('x', mu=0, sd=1, observed=np.random.randn(100))
```

`observed` supports lists, `numpy.ndarray`, `theano` and `pandas` data structures.
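
For instance, a plain Python list or a `pandas.Series` can be passed directly as `observed`. A minimal sketch (the variable names here are only for illustration):

```
import pandas as pd

with pm.Model():
    # A plain Python list works as observed data...
    obs_list = pm.Normal('obs_list', mu=0, sd=1, observed=[0.1, -0.3, 0.2])
    # ...and so does a pandas Series.
    obs_series = pm.Normal('obs_series', mu=0, sd=1,
                           observed=pd.Series(np.random.randn(10)))
```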

PyMC3 allows you to freely do algebra with RVs in all kinds of ways:

In [13]:

```
with pm.Model():
    x = pm.Normal('x', mu=0, sd=1)
    y = pm.Gamma('y', alpha=1, beta=1)
    plus_2 = x + 2
    summed = x + y
    squared = x**2
    sined = pm.math.sin(x)
```

While these transformations work seamlessly, their results are not stored automatically. To keep track of a transformed variable, wrap it in `pm.Deterministic`:

In [14]:

```
with pm.Model():
    x = pm.Normal('x', mu=0, sd=1)
    plus_2 = pm.Deterministic('x plus 2', x + 2)
```

Note that `plus_2` can be used in exactly the same way as above; we only tell PyMC3 to keep track of this RV for us.

PyMC3 automatically transforms bounded RVs to be unbounded, which makes sampling easier:

In [15]:

```
with pm.Model() as model:
    x = pm.Uniform('x', lower=0, upper=1)
```

When we look at the RVs of the model, we would expect to find `x` there; however:

In [16]:

```
model.free_RVs
```

Out[16]:

```
[x_interval__]
```

`x_interval__` represents `x` transformed to accept parameter values between -inf and +inf. In the case of an upper and a lower bound, a `LogOdds` transform is applied. Sampling in this transformed space makes it easier for the sampler. PyMC3 also keeps track of the non-transformed, bounded parameters. These are common deterministics (see above):

In [17]:

```
model.deterministics
```

Out[17]:

```
[x]
```

When displaying results, PyMC3 will usually hide transformed parameters. You can pass the `include_transformed=True` parameter to many functions to see the transformed parameters that are used for sampling.
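
For intuition: with `lower=0, upper=1` the interval transform is the log-odds (logit) function, so the sampler works with `log(x / (1 - x))`. A quick numerical sanity check (not part of the original notebook):

```
# Map a value in (0, 1) to the unbounded space and back.
x_val = 0.3
x_unbounded = np.log(x_val / (1 - x_val))  # logit: (0, 1) -> (-inf, inf)
x_back = 1 / (1 + np.exp(-x_unbounded))    # inverse logit (sigmoid)
print(x_unbounded, x_back)                 # approximately -0.847 and 0.3
```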

You can also turn transforms off:

In [18]:

```
with pm.Model() as model:
    x = pm.Uniform('x', lower=0, upper=1, transform=None)

print(model.free_RVs)
```

```
[x]
```

In many models you want multiple RVs. One pattern (mainly inherited from PyMC 2.x) is to create a list of RVs, like this:

In [19]:

```
with pm.Model():
    x = [pm.Normal('x_{}'.format(i), mu=0, sd=1) for i in range(10)]  # bad
```

However, even though this works, it is quite slow and not recommended. Instead, use the `shape` kwarg:

In [20]:

```
with pm.Model() as model:
    x = pm.Normal('x', mu=0, sd=1, shape=10)  # good
```

`x` is now a random vector of length 10. We can index into it or do linear algebra operations on it:

In [21]:

```
with model:
    y = x[0] * x[1]  # full indexing is supported
    x.dot(x.T)       # linear algebra is supported
```

PyMC3 assigns every RV a test value that is used to initialize the model; we can inspect it:

In [22]:

```
with pm.Model():
    x = pm.Normal('x', mu=0, sd=1, shape=5)

x.tag.test_value
```

Out[22]:

```
array([0., 0., 0., 0., 0.])
```
The test value can be changed with the `testval` kwarg:

In [23]:

```
with pm.Model():
    x = pm.Normal('x', mu=0, sd=1, shape=5, testval=np.random.randn(5))

x.tag.test_value
```

This technique is quite useful for identifying problems with model specification or initialization.
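
Since bad initial values show up as non-finite log-probabilities, one way to use the test point for debugging is to evaluate the model's logp at it. A minimal sketch using the standard `model.test_point` attribute (this check is not part of the original notebook):

```
with pm.Model() as m:
    x = pm.Normal('x', mu=0, sd=1, shape=5)

print(m.test_point)          # dict mapping variable names to initial values
print(m.logp(m.test_point))  # should be finite for a well-specified model
```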

Once we have defined our model, we have to perform inference to approximate the posterior distribution. PyMC3 supports two broad classes of inference: sampling and variational inference.

The main entry point to MCMC sampling algorithms is the `pm.sample()` function. By default, this function tries to auto-assign the right sampler(s) and to auto-initialize if you don't pass anything.

In [24]:

```
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    obs = pm.Normal('obs', mu=mu, sd=1, observed=np.random.randn(100))

    trace = pm.sample(1000, tune=500)
```

In [25]:

```
len(trace)
```

Out[25]:

```
1000
```

You can also run multiple chains in parallel using the `njobs` kwarg:

In [26]:

```
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    obs = pm.Normal('obs', mu=mu, sd=1, observed=np.random.randn(100))

    trace = pm.sample(njobs=4)
```

In [27]:

```
trace['mu'].shape
```

Out[27]:

```
(2000,)
```

In [28]:

```
trace.nchains
```

Out[28]:

```
4
```

In [29]:

```
trace.get_values('mu', chains=1).shape  # get values of a single chain
```

Out[29]:

```
(500,)
```

PyMC3 offers a variety of other samplers, found in `pm.step_methods`:

In [30]:

```
list(filter(lambda x: x[0].isupper(), dir(pm.step_methods)))
```

Commonly used step methods besides NUTS are `Metropolis` and `Slice`. **For almost all continuous models, NUTS should be preferred.** There are hard-to-sample models for which `NUTS` will be very slow, causing many users to fall back on `Metropolis` instead. This practice, however, is rarely successful. NUTS is fast on simple models but can be slow if the model is very complex or badly initialized. For a complex model that is hard for NUTS, Metropolis, while faster, will have a very low effective sample size or will not converge properly at all. A better approach is to improve the initialization of NUTS or to reparameterize the model (a common reparameterization is sketched after the next cell). For completeness, other sampling methods can be passed to `sample()`:

In [31]:

```
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    obs = pm.Normal('obs', mu=mu, sd=1, observed=np.random.randn(100))

    step = pm.Metropolis()
    trace = pm.sample(1000, step=step)
```
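
As promised above, here is a minimal sketch of the non-centered parameterization, a common reparameterization of hierarchical normal models (the models and variable names are illustrative, not from the original notebook):

```
with pm.Model() as centered:
    mu = pm.Normal('mu', mu=0, sd=5)
    sd = pm.HalfNormal('sd', sd=5)
    # Centered: theta depends directly on mu and sd, which can create a
    # "funnel" geometry that is hard for NUTS to sample.
    theta = pm.Normal('theta', mu=mu, sd=sd, shape=10)

with pm.Model() as non_centered:
    mu = pm.Normal('mu', mu=0, sd=5)
    sd = pm.HalfNormal('sd', sd=5)
    # Non-centered: sample a standard normal and shift/scale it
    # deterministically, which removes the funnel.
    theta_raw = pm.Normal('theta_raw', mu=0, sd=1, shape=10)
    theta = pm.Deterministic('theta', mu + sd * theta_raw)
```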

You can also assign variables to different step methods.

In [32]:

```
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    sd = pm.HalfNormal('sd', sd=1)
    obs = pm.Normal('obs', mu=mu, sd=sd, observed=np.random.randn(100))

    step1 = pm.Metropolis(vars=[mu])
    step2 = pm.Slice(vars=[sd])
    trace = pm.sample(10000, step=[step1, step2], njobs=4)
```

The most common plot for analyzing sampling results is the trace plot:

In [33]:

```
pm.traceplot(trace);
```

Another common metric to look at is R-hat, also known as the Gelman-Rubin statistic:

In [34]:

```
pm.gelman_rubin(trace)
```
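
As a reminder of the definition (the standard Gelman-Rubin formulation, not spelled out in the original notebook): for m chains of n draws each, with within-chain variance W and between-chain variance B,

```latex
\hat{V} = \frac{n-1}{n} W + \frac{1}{n} B, \qquad
\hat{R} = \sqrt{\frac{\hat{V}}{W}}
```

Values of R-hat close to 1 indicate that the chains have converged to the same distribution.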

These are also part of the `forestplot`:

In [35]:

```
pm.forestplot(trace);
```

You can also get a plot of the marginal posteriors with `plot_posterior`:

In [36]:

```
pm.plot_posterior(trace);
```

When sampling with `NUTS`, we can look at the energy plot to assess problems of convergence:

In [37]:

```
with pm.Model() as model:
    x = pm.Normal('x', mu=0, sd=1, shape=100)
    trace = pm.sample(njobs=4)

pm.energyplot(trace);
```

PyMC3 also supports variational inference. While these methods are much faster than sampling, they are often less accurate and can lead to biased results. The main entry point is `pm.fit()`:

In [38]:

```
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    sd = pm.HalfNormal('sd', sd=1)
    obs = pm.Normal('obs', mu=mu, sd=sd, observed=np.random.randn(100))

    approx = pm.fit()
```

The returned `Approximation` object has various capabilities, like drawing samples from the approximated posterior, which we can analyse like a regular sampling run:

In [39]:

```
approx.sample(500)
```

The `variational` submodule offers a lot of flexibility in which VI to use and follows an object-oriented design. For example, full-rank ADVI estimates a full covariance matrix:

In [40]:

```
mu = pm.floatX([0., 0.])
cov = pm.floatX([[1., .5], [.5, 1.]])
with pm.Model() as model:
    pm.MvNormal('x', mu=mu, cov=cov, shape=2)
    approx = pm.fit(method='fullrank_advi')
```

An equivalent expression using the object-oriented interface is:

In [41]:

```
with pm.Model() as model:
    pm.MvNormal('x', mu=mu, cov=cov, shape=2)
    approx = pm.FullRankADVI().fit()
```

In [42]:

```
plt.figure()
trace = approx.sample(10000)
sns.kdeplot(trace['x'])
```

Stein Variational Gradient Descent (SVGD) uses particles to estimate the posterior:

In [43]:

```
w = pm.floatX([.2, .8])
mu = pm.floatX([-.3, .5])
sd = pm.floatX([.1, .1])
with pm.Model() as model:
    pm.NormalMixture('x', w=w, mu=mu, sd=sd)
    approx = pm.fit(method=pm.SVGD(n_particles=200, jitter=1.))
```

In [44]:

```
plt.figure()
trace = approx.sample(10000)
sns.distplot(trace['x']);
```

For more information on variational inference, see these examples.

Posterior predictive sampling with `sample_ppc()` lets us generate new data from the fitted model, which is useful for prediction and for checking the model against the observed data:

In [45]:

```
data = np.random.randn(100)
with pm.Model() as model:
    mu = pm.Normal('mu', mu=0, sd=1)
    sd = pm.HalfNormal('sd', sd=1)
    obs = pm.Normal('obs', mu=mu, sd=sd, observed=data)

    trace = pm.sample()
```

In [46]:

```
with model:
    post_pred = pm.sample_ppc(trace, samples=500, size=len(data))
```

`sample_ppc()` returns a dict with a key for every observed node:

In [47]:

```
post_pred['obs'].shape
```

Out[47]:

```
(500, 100)
```

In [48]:

```
plt.figure()
ax = sns.distplot(post_pred['obs'].mean(axis=1), label='Posterior predictive means')
ax.axvline(data.mean(), color='r', ls='--', label='True mean')
ax.legend()
```

In many cases you want to predict on unseen / hold-out data. This is especially relevant in probabilistic machine learning and Bayesian deep learning. While we plan to improve the API in this regard, this can currently be achieved with a `theano.shared` variable. These are Theano tensors whose values can be changed later; in all other respects they can be passed into PyMC3 just like any other numpy array or tensor.
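
Because the trick below relies on changing a shared variable's value, here is a minimal standalone sketch of how `theano.shared` behaves (the variable `a` is only for illustration):

```
import theano

# A shared variable wraps a value that can be swapped out later.
a = theano.shared(np.array([1., 2.]))
print(a.get_value())   # [1. 2.]

a.set_value(np.array([3., 4.]))
print(a.get_value())   # [3. 4.]
```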

In [49]:

```
import theano

x = np.random.randn(100)
y = x > 0

x_shared = theano.shared(x)
y_shared = theano.shared(y)

with pm.Model() as model:
    coeff = pm.Normal('x', mu=0, sd=1)
    logistic = pm.math.sigmoid(coeff * x_shared)
    pm.Bernoulli('obs', p=logistic, observed=y_shared)
    trace = pm.sample()
```

Now assume we want to predict on unseen data. To do so, we change the values of `x_shared` and `y_shared`. Theoretically we don't need to set `y_shared`, since we want to predict it, but it has to match the shape of `x_shared`.

In [50]:

```
x_shared.set_value([-1, 0, 1.])
y_shared.set_value([0, 0, 0])  # dummy values

with model:
    post_pred = pm.sample_ppc(trace, samples=500)
```

In [51]:

```
post_pred['obs'].mean(axis=0)
```