In [53]:
import matplotlib.pyplot as pl
from ssa import SSA
from pystan import stan

import numpy as np

import random

%matplotlib inline

# Fixing figure sizes8
from pylab import rcParams
rcParams['figure.figsize'] = 8,8

Inferring from long versus short-time regimes

How well can we infer parameters of choice from data that is at different stages? For example, if we are dealing with a system that has equilibrated versus a system that is still in the transition regime.

The system of choice shall be a simple birth-death process, described by the relations :

$$ \begin{align} \emptyset \stackrel{a}{\to} X,\\ X \stackrel{\mu X}{\to} \emptyset. \end{align} $$

$X$ here is the population number.

Throughout, we shall use $a=10$ and $\mu=1.0$.

For some intuition on what the system is doing, we shall use the corresponding reaction rate equation. The reaction rate equation for this system is :

$$ \begin{align} \dot{x}=a-\mu\cdot x,\\ x(0)=x_0. \end{align} $$

As this is a linear equation, we can solve it exactly, with solution

$$ x(t)= a/\mu+(x_0-a/\mu) e^{-t\mu} $$

Let us set $\gamma = a/\mu$. Then, the solution above can be written as

$$ x(t)= \gamma+(x_0-\gamma) e^{-t\mu}. $$

But now note that, since $\mu>0$, we have that $$\lim_{t\to\infty} x(t)=\gamma.$$

That is, for long times, the relevant quantity that drives the system is $\gamma$, and by that time, we can no longer recover the exact values of $a$ and $\mu$.


In [47]:
N = 1000
alpha = 1.0
mu = 10.0

x, t = SSA(100,N,a=alpha,mu=mu)
x = x.astype(int) # path data supposed to be integers.

path_data = {
    'N' : N,
    't' : t,
    'x' : x
}

print 'Simulated up to time T =', round(t[N-1])


Simulated up to time T = 438.0

In [48]:
# Setup STAN :
model_description = """
data{
      int<lower=0> N; ## number of time steps
      vector[N] t; ## time value at each time step
      int<lower=0> x[N]; ## population value at each time step
    }

    transformed data{
      int<lower=0> x0; ## starting population value
      x0 <- x[1];
    }

    parameters{
      real<lower=0> alpha; ## birth rate parameter 
      real<lower=0> mu; ## death rate parameter
    }
    
    transformed parameters{
      real<lower=0> gamma;
      gamma <- alpha/mu;
    }


 model {
      x ~ poisson(gamma+(x0-gamma)*exp(-mu*t));
    }

"""

In [49]:
fit = stan(model_code=model_description,
           data=path_data, chains=1,iter=1000)

print fit


Inference for Stan model: anon_model_08776c60aeff1edc9f21a230218a96da.
1 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=500.

        mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha   6.69    0.02   0.34   5.99   6.46   6.68   6.94   7.36  500.0    1.0
mu     11.19    0.01   0.25  10.76  11.01  11.17  11.34  11.74  500.0    1.0
gamma    0.6  1.2e-3   0.03   0.54   0.58    0.6   0.62   0.65  500.0    1.0
lp__   1.5e4    0.05   1.11  1.5e4  1.5e4  1.5e4  1.5e4  1.5e4  500.0   1.01

Samples were drawn using NUTS(diag_e) at Fri Apr  8 18:17:12 2016.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Above we can see the inference from STAN. Remember that $a=1$ and $\mu=10$, which gives $\gamma=a/\mu=0.1$. Looking at the values above, we can see that STAN didn't quite hit the mark with $a$, whereas it did better at $\mu$ and $\gamma$.

We shall now try again, but instead of using a single path, we shall use multiple paths.


In [71]:
N = 300
Npath = 10

alpha = 1.0
mu = 10.0

x = np.zeros([Npath,N])
t = np.zeros([Npath,N])

for i in xrange(Npath):
    x[i,], t[i,] = SSA(100,N,a=alpha,mu=mu)
    
x = x.astype(int) # path data supposed to be integers.

x = x.flatten()
t = t.flatten()

Ndata = N*Npath

path_data = {
    'N' : Ndata,
    't' : t,
    'x' : x
}

Now that we have data that are mostly from the transition regime, we run our inference again.


In [73]:
fit_trans = stan(model_code=model_description,
           data=path_data, chains=1,iter=2000)

print fit_trans


Inference for Stan model: anon_model_08776c60aeff1edc9f21a230218a96da.
1 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=1000.

        mean se_mean     sd   2.5%    25%    50%    75%  97.5%  n_eff   Rhat
alpha   5.98  5.3e-3   0.17   5.66   5.86   5.98   6.09   6.31 1000.0    1.0
mu     10.18  2.0e-3   0.06  10.06  10.14  10.18  10.22   10.3 1000.0    1.0
gamma   0.59  5.0e-4   0.02   0.56   0.58   0.59    0.6   0.62 1000.0    1.0
lp__   1.6e5    0.03   0.96  1.6e5  1.6e5  1.6e5  1.6e5  1.6e5 1000.0   1.01

Samples were drawn using NUTS(diag_e) at Fri Apr  8 18:34:12 2016.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

In [ ]: