Sampler


In [1]:
import sys
sys.path.append('../..')
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import pandas as pd

Intro

Welcome! In this section you'll learn about Sampler-class. Instances of Sampler can be used for flexible sampling of multivariate distributions.

To begin with, Sampler gives rise to several building-blocks classes such as

  • NumpySampler, or NS
  • ScipySampler - SS

What's more, Sampler incorporates a set of operations on Sampler-instances, among which are

  • "|" for building a mixture of two samplers: s = s1 | s2
  • "&" for setting a mixture-weight of a sampler: s = 0.6 & s1 | 0.4 & s2
  • " truncate" for truncating the support of underlying sampler's distribution: s.truncate(high=[1.0, 1.5])
  • ..all arithmetic operations: s = s1 + s2 or s = s1 + 0.5

These operations can be used for combining building-blocks samplers into complex multivariate-samplers, just like that:


In [2]:
from batchflow import NumpySampler as NS

# truncated normal and uniform
ns1 = NS('n', dim=2).truncate(2.0, 0.8, lambda m: np.sum(np.abs(m), axis=1)) + 4
ns2 = 2 * NS('u', dim=2).truncate(1, expr=lambda m: np.sum(m, axis=1)) - (1, 1)
ns3 = NS('n', dim=2).truncate(1.5, expr=lambda m: np.sum(np.square(m), axis=1)) + (4, 0)
ns4 = ((NS('n', dim=2).truncate(2.5, expr=lambda m: np.sum(np.square(m), axis=1)) * 4)
       .apply(lambda m: m.astype(np.int)) / 4 + (0, 3))

# a mixture of all four
ns = 0.4 & ns1 | 0.2 & ns2 | 0.39 & ns3 | 0.01 & ns4

In [3]:
# take a look at the heatmap of our sampler:
h = np.histogramdd(ns.sample(int(1e6)), bins=100, normed=True)
plt.imshow(h[0])


Out[3]:
<matplotlib.image.AxesImage at 0x293ccaf6cf8>

Building Samplers

1. Numpy, Scipy, TensorFlow - Samplers

To build a NumpySampler(NS) you need to specify a name of distribution from numpy.random (or its alias) and the number of independent dimensions:


In [4]:
from batchflow import NumpySampler as NS
ns = NS('n', dim=2)

take a look at a sample generated by our sampler:


In [5]:
smp = ns.sample(size=200)

In [6]:
plt.scatter(*np.transpose(smp))


Out[6]:
<matplotlib.collections.PathCollection at 0x293ccbc2710>

The same goes for ScipySampler based on scipy.stats-distributions, or SS ("mvn" stands for multivariate-normal):


In [7]:
from batchflow import ScipySampler as SS
ss = SS('mvn', mean=[0, 0], cov=[[2, 1], [1, 2]])  # note also that you can pass the same params as in
smp = ss.sample(2000)                              # scipy.sample.multivariate_normal, such as `mean` and `cov` 
plt.scatter(*np.transpose(smp))


Out[7]:
<matplotlib.collections.PathCollection at 0x293ccfd8b38>

2. HistoSampler as an estimate of a distribution generating a cloud of points

HistoSampler, or HS can be used for building samplers, with underlying distributions given by a histogram. You can either pass a np.histogram-output into the initialization of HS


In [8]:
from batchflow import HistoSampler as HS
histo = np.histogramdd(ss.sample(1000000))
hs = HS(histo)
plt.scatter(*np.transpose(hs.sample(150)))


Out[8]:
<matplotlib.collections.PathCollection at 0x293cd06eb70>

...or you can specify empty bins and estimate its weights using a method HS.update and a cloud of points:


In [9]:
hs = HS(edges=2 * [np.linspace(-4, 4)])
hs.update(ss.sample(1000000))
plt.imshow(hs.bins, interpolation='bilinear')


Out[9]:
<matplotlib.image.AxesImage at 0x293cd10e358>

3. Algebra of Samplers; operations on Samplers

Sampler-instances support artithmetic operations (+, *, -,...). Arithmetics works on either

  • (Sampler, Sampler) - pair
  • (Sampler, array-like) - pair

In [10]:
# blur using "+"
u = NS('u', dim=2)
noise = NS('n', dim=2)
blurred = u + noise * 0.2 # decrease the magnitude of the noise
both = blurred | u + (2, 2)

In [11]:
plt.imshow(np.histogramdd(both.sample(1000000), bins=100)[0])


Out[11]:
<matplotlib.image.AxesImage at 0x293cd1c6860>

You may also want to truncate a sampler's distribution so that sampling points belong to a specific region. The common use-case is to sample normal points inside a box.

..or, inside a ring:


In [12]:
n = NS('n', dim=2).truncate(3, 0.3, expr=lambda m: np.sum(m**2, axis=1))
plt.imshow(np.histogramdd(n.sample(1000000), bins=100)[0])


Out[12]:
<matplotlib.image.AxesImage at 0x293cd253ac8>

Not infrequently you need to obtain "normal" sample in integers. For this you can use Sampler.apply method:


In [13]:
n = (4 * NS('n', dim=2)).apply(lambda m: m.astype(np.int)).truncate([6, 6], [-6, -6])
plt.imshow(np.histogramdd(n.sample(1000000), bins=100)[0])


Out[13]:
<matplotlib.image.AxesImage at 0x293cd2e1748>

Note that Sampler.apply-method allows you to add an arbitrary transformation to a sampler. For instance, Box-Muller transform:


In [14]:
bm = lambda vec2: np.sqrt(-2 * np.log(vec2[:, 0:1])) * np.concatenate([np.cos(2 * np.pi * vec2[:, 1:2]),
                                                                       np.sin(2 * np.pi * vec2[:, 1:2])], axis=1)
n = NS('u', dim=2).apply(bm)

In [15]:
plt.imshow(np.histogramdd(u.sample(1000000), bins=100)[0])


Out[15]:
<matplotlib.image.AxesImage at 0x293cd364588>

Another useful thing is coordinate stacking ("&" stands for multiplication of distribution functions):


In [16]:
n, u = NS('n'), SS('u')  # initialize one-dimensional notrmal and uniform samplers
s = n & u  # stack them together
s.sample(3)


Out[16]:
array([[ 1.01195792,  0.29821604],
       [ 1.60838829,  0.47283626],
       [ 0.31652709,  0.48517023]])

4. Alltogether


In [17]:
ns1 = NS('n', dim=2).truncate(2.0, 0.8, lambda m: np.sum(np.abs(m), axis=1)) + 4
ns2 = 2 * NS('u', dim=2).truncate(1, expr=lambda m: np.sum(m, axis=1)) - (1, 1)
ns3 = NS('n', dim=2).truncate(1.5, expr=lambda m: np.sum(np.square(m), axis=1)) + (4, 0)
ns4 = ((NS('n', dim=2).truncate(2.5, expr=lambda m: np.sum(np.square(m), axis=1)) * 4)
       .apply(lambda m: m.astype(np.int)) / 4 + (0, 3))
ns = 0.4 & ns1 | 0.2 & ns2 | 0.39 & ns3 | 0.01 & ns4

In [18]:
plt.imshow(np.histogramdd(ns.sample(int(1e6)), bins=100, normed=True)[0])


Out[18]:
<matplotlib.image.AxesImage at 0x293cd3fc860>

In [ ]: