In [1]:
import sys
sys.path.append('../..')
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
import pandas as pd
Welcome! In this section you'll learn about the Sampler class. Instances of Sampler can be used for flexible sampling from multivariate distributions.
To begin with, Sampler gives rise to several building-block classes, such as
- NumpySampler, or NS
- ScipySampler, or SS
What's more, Sampler supports a set of operations on Sampler instances, among which are
- "|" for building a mixture of two samplers: s = s1 | s2
- "&" for setting the mixture weight of a sampler: s = 0.6 & s1 | 0.4 & s2
- "truncate" for truncating the support of the underlying sampler's distribution: s.truncate(high=[1.0, 1.5])
- arithmetic operations: s = s1 + s2 or s = s1 + 0.5
These operations can be used for combining building-block samplers into complex multivariate samplers, like this:
In [2]:
from batchflow import NumpySampler as NS
# truncated normal and uniform
ns1 = NS('n', dim=2).truncate(2.0, 0.8, lambda m: np.sum(np.abs(m), axis=1)) + 4
ns2 = 2 * NS('u', dim=2).truncate(1, expr=lambda m: np.sum(m, axis=1)) - (1, 1)
ns3 = NS('n', dim=2).truncate(1.5, expr=lambda m: np.sum(np.square(m), axis=1)) + (4, 0)
# `np.int` was removed in NumPy 1.24; the builtin `int` does the same cast
ns4 = ((NS('n', dim=2).truncate(2.5, expr=lambda m: np.sum(np.square(m), axis=1)) * 4)
       .apply(lambda m: m.astype(int)) / 4 + (0, 3))
# a mixture of all four
ns = 0.4 & ns1 | 0.2 & ns2 | 0.39 & ns3 | 0.01 & ns4
In [3]:
# take a look at the heatmap of our sampler:
h = np.histogramdd(ns.sample(int(1e6)), bins=100, density=True)  # `normed` was replaced by `density`
plt.imshow(h[0])
Out[3]:
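To make the mixture operators less of a black box, here is a minimal NumPy-only sketch of what sampling from a weighted mixture amounts to: decide how many points each component contributes according to its weight, then draw from each component. The helper `sample_mixture` and its arguments are hypothetical names used for illustration; this is not batchflow's implementation.

```python
import numpy as np

def sample_mixture(components, weights, size, rng):
    """Draw `size` points from a mixture of the given component samplers."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()            # normalize the mixture weights
    counts = rng.multinomial(size, weights)      # points contributed by each component
    parts = [draw(n) for draw, n in zip(components, counts)]
    points = np.concatenate(parts, axis=0)
    rng.shuffle(points)                          # shuffle rows so components are interleaved
    return points

rng = np.random.default_rng(0)
components = [
    lambda n: rng.normal(size=(n, 2)) + 4,       # normal cloud shifted to (4, 4)
    lambda n: rng.uniform(size=(n, 2)),          # uniform on the unit square
]
mix = sample_mixture(components, weights=[0.6, 0.4], size=10_000, rng=rng)
print(mix.shape)
```

Roughly 60% of the rows in `mix` come from the shifted normal cloud and the rest from the unit square, which is the role the weights play in an expression like 0.6 & s1 | 0.4 & s2.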
To build a NumpySampler(NS) you need to specify a name of distribution from numpy.random (or its alias) and the number of independent dimensions:
In [4]:
from batchflow import NumpySampler as NS
ns = NS('n', dim=2)
Take a look at a sample generated by our sampler:
In [5]:
smp = ns.sample(size=200)
In [6]:
plt.scatter(*np.transpose(smp))
Out[6]:
The same goes for ScipySampler based on scipy.stats-distributions, or SS ("mvn" stands for multivariate-normal):
In [7]:
from batchflow import ScipySampler as SS
# you can pass the same params as in scipy.stats.multivariate_normal, such as `mean` and `cov`
ss = SS('mvn', mean=[0, 0], cov=[[2, 1], [1, 2]])
smp = ss.sample(2000)
plt.scatter(*np.transpose(smp))
Out[7]:
HistoSampler, or HS, can be used for building samplers whose underlying distribution is given by a histogram. You can either pass the output of np.histogramdd to the HS initializer:
In [8]:
from batchflow import HistoSampler as HS
histo = np.histogramdd(ss.sample(1000000))
hs = HS(histo)
plt.scatter(*np.transpose(hs.sample(150)))
Out[8]:
...or you can specify empty bins and estimate their weights from a cloud of points using the HS.update method:
In [9]:
hs = HS(edges=2 * [np.linspace(-4, 4)])
hs.update(ss.sample(1000000))
plt.imshow(hs.bins, interpolation='bilinear')
Out[9]:
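As a rough illustration of how sampling from a histogram can work, the NumPy-only sketch below picks bins with probability proportional to their counts and then draws a uniform point inside each chosen bin. The function `sample_from_histogram` is a hypothetical helper written for this example; HistoSampler may do things differently internally.

```python
import numpy as np

def sample_from_histogram(counts, edges, size, rng):
    """Sample points from a d-dimensional histogram as returned by np.histogramdd."""
    probs = counts.ravel() / counts.sum()
    flat = rng.choice(counts.size, size=size, p=probs)    # choose bins by their weight
    idx = np.unravel_index(flat, counts.shape)            # per-axis bin indices
    lows = np.stack([e[i] for e, i in zip(edges, idx)], axis=1)
    highs = np.stack([e[i + 1] for e, i in zip(edges, idx)], axis=1)
    return rng.uniform(lows, highs)                       # uniform inside each chosen bin

rng = np.random.default_rng(0)
cloud = rng.multivariate_normal([0, 0], [[2, 1], [1, 2]], size=100_000)
counts, edges = np.histogramdd(cloud, bins=20)
resampled = sample_from_histogram(counts, edges, size=50_000, rng=rng)
print(np.corrcoef(resampled.T)[0, 1])
```

The resampled cloud keeps the correlation of the original one up to the resolution of the bins, so finer bins give a closer approximation at the cost of needing more points to fill them.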
Sampler instances support arithmetic operations (+, *, -, ...). Arithmetic works on either
- a (Sampler, Sampler) pair
- a (Sampler, array-like) pair
In [10]:
# blur using "+"
u = NS('u', dim=2)
noise = NS('n', dim=2)
blurred = u + noise * 0.2 # decrease the magnitude of the noise
both = blurred | u + (2, 2)
In [11]:
plt.imshow(np.histogramdd(both.sample(1000000), bins=100)[0])
Out[11]:
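In terms of plain arrays, arithmetic on samplers can be read as arithmetic on the sampled points. The sketch below reproduces the blur and the shift from the cell above with NumPy only; it assumes that operands are sampled independently and combined elementwise, which is the natural reading of the operators rather than a statement about batchflow's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 100_000

uniform_pts = rng.uniform(size=(size, 2))        # counterpart of NS('u', dim=2)
noise_pts = rng.normal(size=(size, 2))           # counterpart of NS('n', dim=2)

blurred_pts = uniform_pts + 0.2 * noise_pts      # (Sampler, Sampler) pair: add independent draws
shifted_pts = uniform_pts + (2, 2)               # (Sampler, array-like) pair: broadcast a constant

# variances of independent terms add up: 1/12 for the uniform plus 0.2**2 for the noise
print(blurred_pts.var(axis=0))
```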
You may also want to truncate a sampler's distribution so that the sampled points belong to a specific region. A common use case is to sample normal points inside a box...
...or inside a ring:
In [12]:
n = NS('n', dim=2).truncate(3, 0.3, expr=lambda m: np.sum(m**2, axis=1))
plt.imshow(np.histogramdd(n.sample(1000000), bins=100)[0])
Out[12]:
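One common way to obtain such a truncated distribution is rejection sampling: draw a batch, keep only the points whose `expr` value falls between the bounds, and repeat until enough points are collected. The sketch below does this with NumPy only; `sample_truncated` is a hypothetical helper, and treating both bounds as inclusive is an assumption made for the example.

```python
import numpy as np

def sample_truncated(draw, size, expr, low, high):
    """Draw batches until `size` points satisfy low <= expr(points) <= high."""
    kept, n_kept = [], 0
    while n_kept < size:
        batch = draw(size)
        values = expr(batch)
        accepted = batch[(values >= low) & (values <= high)]   # reject points outside the region
        kept.append(accepted)
        n_kept += len(accepted)
    return np.concatenate(kept)[:size]

rng = np.random.default_rng(0)
ring = sample_truncated(
    draw=lambda n: rng.normal(size=(n, 2)),
    size=5_000,
    expr=lambda m: np.sum(m**2, axis=1),     # squared distance from the origin
    low=0.3,
    high=3,
)
print(ring.shape)
```

The cost of this approach grows as the accepted region gets less probable, since more batches are thrown away before `size` points are collected.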
Quite often you need a "normal" sample with integer coordinates. For this you can use the Sampler.apply method:
In [13]:
n = (4 * NS('n', dim=2)).apply(lambda m: m.astype(int)).truncate([6, 6], [-6, -6])  # `np.int` was removed from NumPy
plt.imshow(np.histogramdd(n.sample(1000000), bins=100)[0])
Out[13]:
Note that the Sampler.apply method allows you to add an arbitrary transformation to a sampler. For instance, the Box-Muller transform:
In [14]:
bm = lambda vec2: np.sqrt(-2 * np.log(vec2[:, 0:1])) * np.concatenate([np.cos(2 * np.pi * vec2[:, 1:2]),
np.sin(2 * np.pi * vec2[:, 1:2])], axis=1)
n = NS('u', dim=2).apply(bm)
In [15]:
plt.imshow(np.histogramdd(n.sample(1000000), bins=100)[0])  # plot the transformed sampler `n`, not the uniform `u`
Out[15]:
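A quick way to convince yourself that the Box-Muller transform turns uniform pairs into standard normal pairs is to check the moments of the result. The sketch below restates the transform as a named function in plain NumPy (the name `box_muller` is ours) and applies it to uniform draws directly, without going through a sampler.

```python
import numpy as np

def box_muller(u):
    """Map rows (u1, u2) of uniforms on (0, 1] to pairs of independent standard normals."""
    radius = np.sqrt(-2 * np.log(u[:, 0]))
    angle = 2 * np.pi * u[:, 1]
    return np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=1)

rng = np.random.default_rng(0)
uniforms = 1.0 - rng.random(size=(200_000, 2))   # values in (0, 1], so log(0) never occurs
z = box_muller(uniforms)

# both columns should have mean close to 0 and standard deviation close to 1
print(z.mean(axis=0), z.std(axis=0))
```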
Another useful thing is coordinate stacking ("&" stands for multiplication of distribution functions):
In [16]:
n, u = NS('n'), SS('u') # initialize one-dimensional normal and uniform samplers
s = n & u # stack them together
s.sample(3)
Out[16]:
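In array terms, stacking two one-dimensional samplers can be pictured as drawing each coordinate independently and placing the results side by side as columns. The NumPy-only sketch below shows that picture; the assumption that a stacked sample has shape (size, 2) is ours and is not taken from batchflow's documentation.

```python
import numpy as np

rng = np.random.default_rng(0)
size = 3

normal_col = rng.normal(size=(size, 1))      # first coordinate: standard normal
uniform_col = rng.uniform(size=(size, 1))    # second coordinate: uniform on [0, 1)

# independent coordinates side by side: the joint density is the product of the marginals
stacked = np.concatenate([normal_col, uniform_col], axis=1)
print(stacked.shape)
```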
In [17]:
ns1 = NS('n', dim=2).truncate(2.0, 0.8, lambda m: np.sum(np.abs(m), axis=1)) + 4
ns2 = 2 * NS('u', dim=2).truncate(1, expr=lambda m: np.sum(m, axis=1)) - (1, 1)
ns3 = NS('n', dim=2).truncate(1.5, expr=lambda m: np.sum(np.square(m), axis=1)) + (4, 0)
ns4 = ((NS('n', dim=2).truncate(2.5, expr=lambda m: np.sum(np.square(m), axis=1)) * 4)
       .apply(lambda m: m.astype(int)) / 4 + (0, 3))
ns = 0.4 & ns1 | 0.2 & ns2 | 0.39 & ns3 | 0.01 & ns4
In [18]:
plt.imshow(np.histogramdd(ns.sample(int(1e6)), bins=100, density=True)[0])
Out[18]: