Example of using IIS

Define a model to estimate


In [ ]:
from scipy.stats import norm, uniform
from iis import IIS, Model

def mymodel(params):
    """User-defined model with two parameters

    Parameters
    ----------
    params : numpy.ndarray 1-D

    Returns
    -------
    state : float
        return value (could also be an array)
    """
    return params[0] + params[1]*2

likelihood = norm(loc=1, scale=1)  # normal, univariate distribution mean 1, s.d. 1
prior = [norm(loc=0, scale=10), uniform(loc=-10, scale=20)] 

model = Model(mymodel, likelihood, prior=prior)  # define the model
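
To make the roles of the three ingredients concrete, the sketch below draws a few parameter vectors from the prior, runs each through mymodel and scores the resulting state with the likelihood. It relies only on numpy and scipy.stats, not on the iis package; the names rng, n and logweights are illustrative, and it does not claim to reproduce what Model does internally.

```python
import numpy as np
from scipy.stats import norm, uniform

def mymodel(params):
    # same toy model as above: state = p0 + 2 * p1
    return params[0] + params[1] * 2

likelihood = norm(loc=1, scale=1)
prior = [norm(loc=0, scale=10), uniform(loc=-10, scale=20)]

rng = np.random.default_rng(0)  # illustrative seed
n = 5                           # illustrative sample size

# one row per sample, one column per parameter
params = np.column_stack([p.rvs(size=n, random_state=rng) for p in prior])

# run the model once per sample
state = np.array([mymodel(p) for p in params])

# log-density of each simulated state under the likelihood
logweights = likelihood.logpdf(state)
```

Samples whose state lands close to 1 (the likelihood mean) get the largest log-weights.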

Estimate its parameters


In [ ]:
solver = IIS(model)
ensemble = solver.estimate(size=500, maxiter=10)
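
For intuition about what an iterative importance-sampling loop does with size and maxiter, here is a generic weight / resample / jitter sketch in plain numpy. It is an assumption-laden illustration, not the iis implementation: epsilon (weight tempering) and jitter (noise scale) are hypothetical knobs, the likelihood is hard-coded, and the prior enters only through the initial draw.

```python
import numpy as np

def model(params):
    """Vectorised version of mymodel: one state value per row of params."""
    return params[:, 0] + 2 * params[:, 1]

def log_likelihood(state, loc=1.0, scale=1.0):
    """Log of a normal density, up to an additive constant."""
    return -0.5 * ((state - loc) / scale) ** 2

def resample_step(params, rng, epsilon=0.05, jitter=0.3):
    """One weight / resample / jitter step (epsilon and jitter are hypothetical)."""
    logw = epsilon * log_likelihood(model(params))  # tempered log-weights
    w = np.exp(logw - logw.max())                   # subtract max for stability
    w /= w.sum()
    idx = rng.choice(len(params), size=len(params), p=w)  # resample with replacement
    resampled = params[idx]
    # Gaussian noise scaled by the ensemble spread, to avoid duplicated members
    noise = rng.normal(size=resampled.shape) * jitter * resampled.std(axis=0)
    return resampled + noise

rng = np.random.default_rng(0)
size, maxiter = 500, 10

# initial ensemble drawn from the prior: N(0, 10) and U(-10, 10)
params = np.column_stack([rng.normal(0, 10, size), rng.uniform(-10, 10, size)])
initial_spread = model(params).std()

for _ in range(maxiter):
    params = resample_step(params, rng)

final_spread = model(params).std()
```

After the loop the spread of the simulated state is narrower than under the prior alone, which is the qualitative behaviour to look for in the history of the real solver.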

Investigate results

The IIS class has two attributes of interest:

  • ensemble : current ensemble
  • history : list of previous ensembles

It also has a to_panel method to visualize the data as a pandas Panel.

The Ensemble class has the following attributes of interest:

  • state : 2-D ndarray (samples x state variables)
  • params : 2-D ndarray (samples x parameters)
  • model : the model defined above, with target distribution and forward integration functions

For convenience, these fields can be extracted as a pandas DataFrame or Panel that combines params and state. See the in-line help for the methods Ensemble.to_dataframe and IIS.to_panel. This feature requires pandas to be installed.

Two plotting methods are also provided: Ensemble.scatter_matrix and IIS.plot_history. The first is simply a wrapper around the corresponding pandas function, but it is used so frequently that it is provided as a method.
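
As a rough picture of the combined table, the sketch below stacks a params array and a state array side by side into a DataFrame. The arrays are synthetic and the column names p0, p1 and s0 are hypothetical; the actual layout and labels produced by Ensemble.to_dataframe may differ.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# synthetic stand-ins for the ensemble arrays
params = rng.normal(size=(500, 2))                      # samples x parameters
state = (params[:, 0] + 2 * params[:, 1])[:, np.newaxis]  # samples x state variables

# one row per sample, parameters first, then state variables
df = pd.DataFrame(np.hstack([params, state]), columns=["p0", "p1", "s0"])

# median and 5-95 % range of every column
summary = df.quantile([0.5, 0.05, 0.95])
```

With the data in this form, any pandas tool (describe, corr, plotting helpers) applies to parameters and state alike.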


In [ ]:
# Use pandas to check out the quantiles of the final ensemble
ensemble.to_dataframe().quantile([0.5, 0.05, 0.95])

In [ ]:
# or the iteration history 
solver.to_panel(quantiles=[0.5, 0.05, 0.95])
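
On pandas versions without Panel (it was removed in pandas 0.25), a similar per-iteration summary can be assembled by hand. The sketch below is one possible approach, assuming the history can be reduced to a list of (samples x variables) arrays; the history list here is synthetic and is not built from the solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical stand-in for the history: one (samples x variables) array
# per iteration, with a spread that shrinks as iterations proceed
history = [rng.normal(scale=10.0 / (k + 1), size=(500, 3)) for k in range(4)]

quantiles = [0.5, 0.05, 0.95]

# resulting axes: (iteration, quantile, variable)
summary = np.stack([np.quantile(ens, quantiles, axis=0) for ens in history])

# width of the 5-95 % range for every iteration and variable
spread = summary[:, 2, :] - summary[:, 1, :]
```

A spread that stops changing from one iteration to the next is a simple numerical hint that the ensemble has settled.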

Check convergence


In [ ]:
# Plotting methods
%matplotlib inline
solver.plot_history(overlay_dists=True)

Scatter matrix to investigate final distributions and correlations


In [ ]:
ensemble.scatter_matrix() # result

Advanced visualisation using pandas (classes)

Pandas also ships with a few methods to investigate clusters in data. The categories keyword has been added to Ensemble.to_dataframe to automatically add a column with the appropriate categories.


In [ ]:
# pandas.tools.plotting was moved to pandas.plotting in pandas 0.20
from pandas.plotting import parallel_coordinates, radviz, andrews_curves
import matplotlib.pyplot as plt

# create clusters of data 
categories = []
for i in range(ensemble.size):  # range works on both Python 2 and 3
    if ensemble.params[i, 0] > 0:
        cat = 'p0 > 0'
    elif ensemble.params[i,0] > -5:
        cat = 'p0 < 0 and |p0| < 5'
    else:
        cat = 'rest'
    categories.append(cat)

# Create a DataFrame with a category name
class_column = '_CatName'
df = ensemble.to_dataframe(categories=categories, class_column=class_column)

plt.figure()
parallel_coordinates(df, class_column)
plt.title("parallel_coordinates")

plt.figure()
radviz(df, class_column)
plt.title("radviz")
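
The category loop above can also be written without an explicit Python loop. The sketch below uses numpy.select on a small hand-made array standing in for ensemble.params[:, 0]; conditions are tested in order, so it reproduces the if / elif / else logic, including the fact that p0 == 0 falls into the second category.

```python
import numpy as np

# hand-made stand-in for ensemble.params[:, 0]
p0 = np.array([3.0, -2.0, -7.5, 0.0, -5.0])

# the first matching condition wins, like if / elif
conditions = [p0 > 0, p0 > -5]
labels = ['p0 > 0', 'p0 < 0 and |p0| < 5']

# everything that matches no condition gets the default label
categories = np.select(conditions, labels, default='rest')
```

The resulting array can be passed wherever the list built by the loop was used, for instance after converting it with categories.tolist().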