Methods for Sampling from the Unit Simplex

1. Uniform Distributions on the Unit Simplex

A $\textit{n}$-unit simplex (https://en.wikipedia.org/wiki/Simplex#The_standard_simplex) is defined as the subset of $\mathbb{R}^{n+1}$ given by $\Delta^{n} = \{(x_1,x_2,...,x_{n+1}) \in \mathbb{R}^{n+1} | \sum_{k=1}^{n+1}{x_k}=1; x_k \geq 0, k = 1,2,...,n+1\}$. Methods for sampling directly from the uniform distribution on this $\textit{n}$-unit simplex is not that obvious. The easiest method I have found for this task is from Luc Devroye's book "$\textbf{Non-Uniform Random Variate Generation} $" $\textbf{Chapter V}$ (http://www.nrbook.com/devroye/Devroye_files/chapter_five.pdf). This method utilizes the spacing between order statistics of random samples from the uniform distribution between 0 and 1.

For simplicity and easy visilization, here we sample from the $\textbf{3-unit simplex}$.



In [11]:

    
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt



In [12]:

    
## generate order statistics from uniform distribution between 0 and 1
U = [[np.random.uniform(), np.random.uniform()] for i in range(50000)]
for u in U:
    u.sort()



In [18]:

    
## calculate the spacing and plot the sampling results
x_1 = [u[0] for u in U]
x_2 = [u[1]-u[0] for u in U]
plt.plot(x_1,x_2,'.',markersize = 1)
plt.xlabel(r'x_1')
plt.ylabel(r'x_2')









    














    











    Out[18]:





<matplotlib.text.Text at 0x10ed57f28>

2. Exponential Distributions on the Unit Simplex

The exponential distribution on the $\textit{n}$-unit simplex $\Delta^n$ is defined as:

$$p((x_1,x_2,...,x_{n+1})) = \frac{\text{exp}(a_1 x_1 + a_2 x_2 + ... + a_{n+1}x_{n+1})}{Z}$$

$\text{where }(x_1,x_2,...,x_{n+1}) \in \Delta^n;$ $Z = \sum_{i=1}^{n+1}{\frac{\text{exp}(a_i)}{\prod_{j\neq i}{(a_i - a_j)}}},$ and $a_1,a_2,...,a_{n+1}$ are constants

This distribution arise from one of my research projects. In the project, sampling from the distribution is required. I originally thought that the distribution seems pretty standard and there must be some standard method for sampling directly from it, like the method for sampling from the uniform distribution on the unit simplex. However, after some searching, I realize it is not the case. I still do not find existing methods for sampling directly from it.

To work around it, I turn to Markov Chain Monte Carlo methods.

2.1 Gibbs Sampler

After a simple manipulation, we can see that sampling from the above distribution is equivalent to sampling from the following distribution:

$$p((x_1,x_2,...,x_n)) = \text{exp}(a_{n+1}) \cdot \frac{\text{exp}((a_1 - a_{n+1}) x_1 + (a_2 - a_{n+1}) x_2 + ... + (a_{n} - a_{n+1})x_{n})}{Z}$$

where $0 \leq \sum_{i=1}^{n}{x_i} \leq 1$ and $x_i \geq 0$ for $i = 1,2,...,n.$

One method for sampling from this distribution is the Gibbs sampler, which samples each individual $x_i$ successively from its conditional distribution given all other $x_j, j \neq i$.



In [21]:

    
def restricted_exp_sample(c = 1, a = 0, b = 1):
    """
    Sample from the exponential distribution $p(x) \propto \text{exp}(c*x)$ restricted in the interval [a,b]
    """
    u = np.random.uniform()
    return 1.0/c * np.log(np.exp(c*a) + u * (exp(c*b) - exp(c*a)))

def Gibbs_Sampler(a_list):



In [ ]:

2.2 Hit-and-Run Sampler



In [ ]: