This document demonstrates how to call functions from the Rmath library from cython code.
Note: Having R installed on your system is not enough. You have to build the Rmath library for standalone use. See the README for a discussion on this.
Remember: Make sure that libRmath.{so|dylib}
is either
/usr/local/lib
Below I have specified where to look for the library and header with the
-L path/to/libRmath.{so|dylib}
and -I /path/to/Rmath.h
in the %%cython
calls below. You may also need to set your {DY}LD_LIBRARY_PATH
environment variable to include the path to libRmath.{so|dylib}
so that the runtime can find the library (you cannot set this variable in python, it must be set in the shell that you run the interpreter from).
See examples/setup.py
for a basic setup script showing what to specify for real code that uses cythonRmath
.
In [1]:
%load_ext cythonmagic
In [2]:
%pylab inline
This is an implementation of a Gibbs sampler for a bivariate distribution that has become a somewhat "standard" example to compare peformance of various languages. It has been implemented in Matlab, python, cython, Julia, R, C/C++, and probably other languages.
The goal is to draw samples from the distribution
\begin{equation*} f(x, y) \propto x^2 \exp(-xy^2 - y^2 + 2y - 4x). \end{equation*}Gibbs sampling iteratively draws a sample from the distribution of $x|y$, and then from the distribution of $y|x$. These are given by:
\begin{align} x | y &\sim \text{Gamma}(3, y^2+4) \\ y | x &\sim \text{N} \left ( \frac{1}{1+x}, \frac{1}{2(1+x)} \right ) \end{align}The cython code below defines a function, gibbs
, that simulates from the distribution defined by $f$.
In [3]:
%%cython --compile-args=-DMATHLIB_STANDALONE=1 -I /Users/nfoti/src/Rmath/include -L /Users/nfoti/src/Rmath/src -lRmath -lm --force
# Change the . -I and -L paths to reflect where Rmath is on your system
"""
Gibbs sampler for the function:
f(x,y) = x^2 \exp(-xy^2 - y^2 + 2y - 4x).
The conditional distributions are given by:
x|y \sim Gamma(3, y^2 + 4)
y|x \sim Normal(\frac{1}{1+x}, \frac{1}{2(1+x)}.
Original version by Flavio Coelho.
Tweaked by Chris Fonnesbeck.
Based on CythonGSL version by Thomas V. Wiecki.
Ported to use cythonRmath by Nick Foti.
"""
cimport cython
from cython_rmath cimport *
import numpy as np
cimport numpy as np
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def gibbs(int N=20000, int thin=500):
cdef:
double x = 0.
double y = 0.
Py_ssize_t i, j
np.ndarray[np.float64_t, ndim=2] samples = np.empty((N,2),
dtype=np.float64)
for i in xrange(N):
for j in xrange(thin):
x = rgamma(3., 1. / (y * y + 4))
y = rnorm(1. / (1. + x), 1. / sqrt(2.*x + 2.))
samples[i, 0] = x
samples[i, 1] = y
return samples
In [4]:
posterior = gibbs()
In [5]:
plt.hexbin(posterior[:,0], posterior[:,1])
Out[5]:
A gamma process is a pure-jump L\'{e}vy process where the jumps are approximately distributed according to $\text{Gamma}(\epsilon, 1)$, where $\epsilon$ is very small. As such, most of the jump sizes will be small, but a few will be large. We can simulate a sample path from a gamma process using the inverse L\'{e}vy measure (ILM) algorithm.
The ILM algorithm generalizes inversion sampling to Poisson processes and for a gamma process proceeds as follows:
The function $F^{-1}$ is called the inverse L\'{e}vy measure, hence the name, and for a gamma process is related to the inverse exponential integral, denoted $E_1^{-1}$. Ickstadt and Wolpert show that for a gamma process, $F^{-1}$ can be approximated as
e1inv(y, d=1e-9) = qchisq(1-(d/2.)*y, d) / 2.
where qchisq(p, d)
is the quantile function of the $\chi$-square distribution with d
degress of freedom. Scipy provides an implementation of qchisq
as scipy.stat.chi2.ppf
. Calling this function from cython will be inefficient because of the Python overhead. The Scipy function uses functions from the cephes
library to implement chi2.ppf
. I implemented my own qchisq
function by building cephes
and calling the low-level functions directly. However, I ran into numerous numerical issues. Arguments to my cephes
-based qchisq
resulted in 0.0
, where the qchisq
in R I would get non-zeros results. This was one reason for making the cythonRmath
wrapper.
The cython code below defines a function to simulate from a gamma process on $[0,1]$.
In [6]:
%%cython --compile-args=-DMATHLIB_STANDALONE=1 -I /Users/nfoti/src/Rmath/include -L /Users/nfoti/src/Rmath/src -lRmath -lm --force
cimport cython
from cython_rmath cimport *
import numpy as np
cimport numpy as np
@cython.cdivision(True)
cdef inline double e1inv(double y, double d):
# The 1, and 0 are the "lower tail" and "log" flags.
return qchisq(1-(d/2.)*y, d, 1, 0) / 2.
@cython.boundscheck(False)
@cython.wraparound(False)
@cython.cdivision(True)
def gap_ilm(int K=100):
cdef:
double u
double u_sum = 0.
Py_ssize_t k
np.ndarray[np.float64_t, ndim=1] jumps = np.empty((K,),
dtype=np.float64)
np.ndarray[np.float64_t, ndim=1] locs = np.empty((K,),
dtype=np.float64)
for k in xrange(K):
# Draw arrival time from unit-rate Poisson process
u = rexp(1.) + u_sum
u_sum += u
jumps[k] = e1inv(u, 1e-9)
locs[k] = runif(0., 1.)
return (jumps, locs)
In [11]:
jumps, locs = gap_ilm(25)
In [12]:
stem(locs, jumps)
Out[12]: