cythonRmath Demo

This document demonstrates how to call functions from the Rmath library from cython code.

Note: Having R installed on your system is not enough. You have to build the Rmath library for standalone use. See the README for a discussion on this.

Remember: Make sure that libRmath.{so|dylib} is either

  1. Somewhere standard, e.g., /usr/local/lib
  2. In the current directory

Below I have specified where to look for the library and header with the -L path/to/libRmath.{so|dylib} and -I /path/to/Rmath.h in the %%cython calls below. You may also need to set your {DY}LD_LIBRARY_PATH environment variable to include the path to libRmath.{so|dylib} so that the runtime can find the library (you cannot set this variable in python, it must be set in the shell that you run the interpreter from).

See examples/setup.py for a basic setup script showing what to specify for real code that uses cythonRmath.


In [1]:
%load_ext cythonmagic

In [2]:
%pylab inline


Populating the interactive namespace from numpy and matplotlib

Gibbs sampling bivariate distributions

This is an implementation of a Gibbs sampler for a bivariate distribution that has become a somewhat "standard" example to compare peformance of various languages. It has been implemented in Matlab, python, cython, Julia, R, C/C++, and probably other languages.

The goal is to draw samples from the distribution

\begin{equation*} f(x, y) \propto x^2 \exp(-xy^2 - y^2 + 2y - 4x). \end{equation*}

Gibbs sampling iteratively draws a sample from the distribution of $x|y$, and then from the distribution of $y|x$. These are given by:

\begin{align} x | y &\sim \text{Gamma}(3, y^2+4) \\ y | x &\sim \text{N} \left ( \frac{1}{1+x}, \frac{1}{2(1+x)} \right ) \end{align}

The cython code below defines a function, gibbs, that simulates from the distribution defined by $f$.


In [3]:
%%cython --compile-args=-DMATHLIB_STANDALONE=1 -I /Users/nfoti/src/Rmath/include -L /Users/nfoti/src/Rmath/src -lRmath -lm --force
# Change the . -I and -L paths to reflect where Rmath is on your system
"""
Gibbs sampler for the function:

f(x,y) = x^2 \exp(-xy^2 - y^2 + 2y - 4x).

The conditional distributions are given by:

x|y \sim Gamma(3, y^2 + 4)
y|x \sim Normal(\frac{1}{1+x}, \frac{1}{2(1+x)}.

Original version by Flavio Coelho.
Tweaked by Chris Fonnesbeck.
Based on CythonGSL version by Thomas V. Wiecki.
Ported to use cythonRmath by Nick Foti.
"""
cimport cython
from cython_rmath cimport *

import numpy as np
cimport numpy as np

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def gibbs(int N=20000, int thin=500):
    cdef:
        double x = 0.
        double y = 0.
        Py_ssize_t i, j
        np.ndarray[np.float64_t, ndim=2] samples = np.empty((N,2),
                                                    dtype=np.float64)
        
    for i in xrange(N):
        for j in xrange(thin):
            x = rgamma(3., 1. / (y * y + 4))
            y = rnorm(1. / (1. + x), 1. / sqrt(2.*x + 2.))
        samples[i, 0] = x
        samples[i, 1] = y

                
    return samples

In [4]:
posterior = gibbs()

In [5]:
plt.hexbin(posterior[:,0], posterior[:,1])


Out[5]:
<matplotlib.collections.PolyCollection at 0x110f7e910>

Simulating a gamma process

A gamma process is a pure-jump L\'{e}vy process where the jumps are approximately distributed according to $\text{Gamma}(\epsilon, 1)$, where $\epsilon$ is very small. As such, most of the jump sizes will be small, but a few will be large. We can simulate a sample path from a gamma process using the inverse L\'{e}vy measure (ILM) algorithm.

The ILM algorithm generalizes inversion sampling to Poisson processes and for a gamma process proceeds as follows:

  1. Choose a truncation (number of jumps), $K$.
  2. Simulate $K$ arrival times from a unit-rate Poisson process denoted $\{u_i\}_{i=1}^K$.
  3. For each $u_i$ compute $F^{-1}(u_i)$, where $F$ is a particular function.
  4. Simulate the locations of each jump iid from a base distribution $H$.

The function $F^{-1}$ is called the inverse L\'{e}vy measure, hence the name, and for a gamma process is related to the inverse exponential integral, denoted $E_1^{-1}$. Ickstadt and Wolpert show that for a gamma process, $F^{-1}$ can be approximated as

e1inv(y, d=1e-9) = qchisq(1-(d/2.)*y, d) / 2.

where qchisq(p, d) is the quantile function of the $\chi$-square distribution with d degress of freedom. Scipy provides an implementation of qchisq as scipy.stat.chi2.ppf. Calling this function from cython will be inefficient because of the Python overhead. The Scipy function uses functions from the cephes library to implement chi2.ppf. I implemented my own qchisq function by building cephes and calling the low-level functions directly. However, I ran into numerous numerical issues. Arguments to my cephes-based qchisq resulted in 0.0, where the qchisq in R I would get non-zeros results. This was one reason for making the cythonRmath wrapper.

The cython code below defines a function to simulate from a gamma process on $[0,1]$.


In [6]:
%%cython --compile-args=-DMATHLIB_STANDALONE=1 -I /Users/nfoti/src/Rmath/include -L /Users/nfoti/src/Rmath/src -lRmath -lm --force

cimport cython
from cython_rmath cimport *

import numpy as np
cimport numpy as np

@cython.cdivision(True)
cdef inline double e1inv(double y, double d):
    # The 1, and 0 are the "lower tail" and "log" flags.
    return qchisq(1-(d/2.)*y, d, 1, 0) / 2. 

@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def gap_ilm(int K=100):
    cdef:
        double u
        double u_sum = 0.
        Py_ssize_t k
        np.ndarray[np.float64_t, ndim=1] jumps = np.empty((K,),
                                                    dtype=np.float64)
        np.ndarray[np.float64_t, ndim=1] locs = np.empty((K,),
                                                    dtype=np.float64)
     
    for k in xrange(K):
        # Draw arrival time from unit-rate Poisson process
        u = rexp(1.) + u_sum
        u_sum += u
        
        jumps[k] = e1inv(u, 1e-9)
        locs[k] = runif(0., 1.)
    
    return (jumps, locs)

In [11]:
jumps, locs = gap_ilm(25)

In [12]:
stem(locs, jumps)


Out[12]:
<Container object of 3 artists>