author: Jacob Schreiber

contact: jmschreiber91@gmail.com

It is frequently the case that the data you have is not explained by a single underlying distribution. Typically this is because there are multiple phenomena occurring in the data set, each with their own underlying distribution. If we want to recover those underlying distributions, we need a model with multiple components. An example could be sensor readings where the majority of the time the sensor shows no signal, but sometimes it detects some phenomenon. Modeling both situations with a single distribution would be silly, because the readings come from two distinct phenomena.

A solution to the problem of data arising from more than one underlying distribution is to use a mixture of distributions instead of a single distribution, commonly called a mixture model. This type of compositional model builds a more complex probability distribution from a set of simpler ones. A common type, the Gaussian mixture model, is composed of Gaussian distributions, but mathematically there is no need for the components to all be Gaussian. In fact, there is no need for the components to be simple probability distributions at all.
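To make this concrete, here is a minimal numpy sketch, independent of pomegranate, of a two-component Gaussian mixture density: it is just a weighted sum of the component densities, where the weights are the mixing proportions. All parameter values here are arbitrary illustrations.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # Density of a univariate Gaussian evaluated at x.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def mixture_pdf(x, weights, mus, sigmas):
    # A mixture density is a convex combination of its component densities.
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))

# Two components with mixing weights 0.3 and 0.7 (illustrative values).
x = np.linspace(-10, 20, 30001)
p = mixture_pdf(x, [0.3, 0.7], [0.0, 7.0], [1.0, 2.0])

# Because the weights sum to one, the mixture still integrates to ~1.
area = np.sum(p) * (x[1] - x[0])
print(area)
```

Because the mixing weights form a convex combination, the result is itself a valid probability distribution, which is what lets us compose arbitrarily complex densities from simple parts.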

In this tutorial we'll explore how to do mixture modeling in pomegranate, compare against scikit-learn's implementation of Gaussian mixture models, and explore more complex types of mixture modeling that one can do with probabilistic modeling.

In [2]:
```
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set_style('whitegrid')
import numpy
from pomegranate import *

numpy.random.seed(0)
numpy.set_printoptions(suppress=True)

%load_ext watermark
%watermark -m -n -p numpy,scipy,pomegranate
```
In [12]:
```
X = numpy.concatenate([numpy.random.normal((7, 2), 1, size=(100, 2)),
                       numpy.random.normal((2, 3), 1, size=(150, 2)),
                       numpy.random.normal((7, 7), 1, size=(100, 2))])

plt.figure(figsize=(8, 6))
plt.scatter(X[:,0], X[:,1])
plt.show()
```
Like other models in pomegranate, mixture models support the `from_samples` method. We can use it to initialize a mixture model from data, additionally passing in the type(s) of distribution(s) to use and the number of components.

In [15]:
```
model = GeneralMixtureModel.from_samples(MultivariateGaussianDistribution, 3, X)
```

In [53]:
```
x = numpy.arange(-1, 10.1, .1)
y = numpy.arange(-1, 10.1, .1)

xx, yy = numpy.meshgrid(x, y)
x_ = numpy.array(list(zip(xx.flatten(), yy.flatten())))

p1 = MultivariateGaussianDistribution.from_samples(X).probability(x_).reshape(len(x), len(y))
p2 = model.probability(x_).reshape(len(x), len(y))

plt.figure(figsize=(14, 6))
plt.subplot(121)
plt.contourf(xx, yy, p1, cmap='Blues', alpha=0.8)
plt.scatter(X[:,0], X[:,1])
plt.subplot(122)
plt.contourf(xx, yy, p2, cmap='Blues', alpha=0.8)
plt.scatter(X[:,0], X[:,1])
plt.show()
```

In [111]:
```
mu = numpy.random.normal(7, 1, size=250)
std = numpy.random.lognormal(-2.0, 0.4, size=250)
std[::2] += 0.3

dur = numpy.random.exponential(250, size=250)
dur[::2] -= 140
dur = numpy.abs(dur)

data = numpy.concatenate([numpy.random.normal(mu_, std_, int(t)) for mu_, std_, t in zip(mu, std, dur)])

plt.figure(figsize=(14, 4))
plt.title("Randomly Generated Signal", fontsize=16)
plt.plot(data)
plt.xlabel("Time", fontsize=14)
plt.ylabel("Signal", fontsize=14)
plt.xlim(0, 10000)
plt.show()
```

We can show the distribution of segment properties to highlight the differences.

In [112]:
```
plt.figure(figsize=(14, 4))
plt.subplot(131)
plt.title("Segment Means", fontsize=14)
plt.hist(mu[::2], bins=20, alpha=0.7)
plt.hist(mu[1::2], bins=20, alpha=0.7)
plt.subplot(132)
plt.title("Segment STDs", fontsize=14)
plt.hist(std[::2], bins=20, alpha=0.7)
plt.hist(std[1::2], bins=20, alpha=0.7)
plt.subplot(133)
plt.title("Segment Durations", fontsize=14)
plt.hist(dur[::2], bins=numpy.arange(0, 1000, 25), alpha=0.7)
plt.hist(dur[1::2], bins=numpy.arange(0, 1000, 25), alpha=0.7)
plt.show()
```

In [113]:
```
X = numpy.array([mu, std, dur]).T.copy()
model = GeneralMixtureModel.from_samples([NormalDistribution, LogNormalDistribution, ExponentialDistribution], 2, X)
model
```

The API for general mixture models is similar to that of the rest of pomegranate. The model can be initialized either by passing in pre-initialized distributions or by calling `from_samples` on a data set and specifying the types of distributions you'd like.

To initialize the mixture directly, simply pass in some distributions.

In [152]:
```
model = GeneralMixtureModel([NormalDistribution(4, 1), NormalDistribution(7, 1)])
```

In [153]:
```
model2 = GeneralMixtureModel([NormalDistribution(5, 1), ExponentialDistribution(0.3)])
```
They will produce very different probability distributions, but both work as a mixture.

In [154]:
```
x = numpy.arange(0, 10.01, 0.05)

plt.figure(figsize=(14, 3))
plt.subplot(121)
plt.title("~Norm(4, 1) + ~Norm(7, 1)", fontsize=14)
plt.ylabel("Probability Density", fontsize=14)
plt.fill_between(x, 0, model.probability(x))
plt.ylim(0, 0.25)
plt.subplot(122)
plt.title("~Norm(5, 1) + ~Exp(0.3)", fontsize=14)
plt.ylabel("Probability Density", fontsize=14)
plt.fill_between(x, 0, model2.probability(x))
plt.ylim(0, 0.25)
plt.show()
```

In [159]:
```
X = numpy.random.normal(3, 1, size=(200, 1))
X[::2] += 3

model = GeneralMixtureModel.from_samples(NormalDistribution, 2, X)
```
We can take a quick look at the distribution it learned:

In [160]:
```
plt.figure(figsize=(6, 3))
plt.title("Learned Model", fontsize=14)
plt.ylabel("Probability Density", fontsize=14)
plt.fill_between(x, 0, model.probability(x))
plt.ylim(0, 0.25)
plt.show()
```

In [147]:
```
X = numpy.concatenate([numpy.random.normal(5, 1, size=(200, 1)), numpy.random.exponential(1, size=(50, 1))])
model = GeneralMixtureModel.from_samples([NormalDistribution, ExponentialDistribution], 2, X)
```

In [148]:
```
x = numpy.arange(0, 12.01, 0.01)

plt.figure(figsize=(6, 3))
plt.title("Learned Model", fontsize=14)
plt.ylabel("Probability Density", fontsize=14)
plt.fill_between(x, 0, model.probability(x))
plt.ylim(0, 0.5)
plt.show()
```

It looks like the mixture is capturing the distribution fairly well.

Next, if we want a mixture model that describes each feature using a different distribution, we can pass in a list of distributions, one for each feature, and each will be used to model the corresponding feature. In the example below, the first feature is modeled using a normal distribution, the second using a log normal distribution, and the last using an exponential distribution.

In [163]:
```
X = numpy.array([mu, std, dur]).T.copy()
model = GeneralMixtureModel.from_samples([NormalDistribution, LogNormalDistribution, ExponentialDistribution], 2, X)
```
This is the same command we used in the second example. It creates two components, each of which models the three features with those three distributions respectively. The difference from the earlier examples is that the data here is multivariate, so each univariate distribution is assumed to model a different feature.
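To spell out what "each distribution models a different feature" means, here is a rough numpy sketch (not pomegranate's internals): under this setup a component scores a sample as the product of its per-feature densities, i.e. features are treated as independent within a component. All parameters below are made up for illustration rather than fit from the data above.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # Univariate Gaussian density.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def lognormal_pdf(x, mu, sigma):
    # Log-normal density: Gaussian density of log(x), divided by x.
    return normal_pdf(np.log(x), mu, sigma) / x

def exponential_pdf(x, rate):
    # Exponential density with the given rate parameter.
    return rate * np.exp(-rate * x)

# One component scoring a single (mean, std, duration)-style sample: the
# likelihood is the product of the per-feature densities.
sample = (7.2, 0.15, 150.0)
likelihood = (normal_pdf(sample[0], 7.0, 1.0)
              * lognormal_pdf(sample[1], -2.0, 0.4)
              * exponential_pdf(sample[2], 1.0 / 250.0))
print(likelihood)
```

A mixture then combines two such components with its mixing weights, exactly as in the univariate case.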

Now that we have a mixture model, we can predict which component each point most likely falls under. This is similar to predicting which cluster a point belongs to in k-means clustering. However, one of the benefits of using probabilistic models is that we can get a soft assignment of points based on probability, rather than a hard assignment of each point to a single cluster.

In [179]:
```
X = numpy.random.normal(5, 1, size=(500, 1))
X[::2] += 3

x = numpy.arange(0, 10, .01)
model = GeneralMixtureModel.from_samples(NormalDistribution, 2, X)

plt.figure(figsize=(8, 4))
plt.hist(X, bins=50, density=True)
plt.plot(x, model.distributions[0].probability(x), label="Distribution 1")
plt.plot(x, model.distributions[1].probability(x), label="Distribution 2")
plt.plot(x, model.probability(x), label="Mixture")
plt.legend(fontsize=14, loc=2)
plt.show()
```

The prediction task is identifying, for each point, whether it falls under the orange distribution or the green distribution. This is done using Bayes' rule, where the probability of each sample under each component is divided by the probability of the sample under all components of the mixture. The posterior probability of a sample belonging to component $i$ out of $k$ components is

\begin{equation}
P(M_{i}|D) = \frac{P(D|M_{i})P(M_{i})}{\sum\limits_{j=1}^{k} P(D|M_{j})P(M_{j})}
\end{equation}

where $D$ is the data, $M_{i}$ is the $i$-th component of the model (such as the green or orange distribution), $P(D|M_{i})$ is the likelihood calculated by the `probability` function, $P(M_{i})$ is the prior probability of each component, stored under the `weights` attribute of the model, and $P(M_{i}|D)$ is the posterior probability, which sums to 1 over all components.
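To see the formula in action outside of pomegranate, here is a small numpy sketch that applies Bayes' rule to two hypothetical Gaussian components with equal prior weights; the component parameters are made up for illustration, not values fit by the model above.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    # Univariate Gaussian density, playing the role of P(D|M_i).
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Two hypothetical components and their prior weights P(M_i).
weights = np.array([0.5, 0.5])
mus = np.array([5.0, 8.0])
sigmas = np.array([1.0, 1.0])

x = np.arange(4, 9, 0.5)                               # ten one-dimensional points
likelihoods = normal_pdf(x[:, None], mus, sigmas)      # P(D|M_i), shape (10, 2)
joint = likelihoods * weights                          # P(D|M_i) P(M_i)
posteriors = joint / joint.sum(axis=1, keepdims=True)  # Bayes' rule

# Each row sums to one, and points nearer a component's mean get most
# of their posterior mass from that component.
print(posteriors.sum(axis=1))
```

This is the same normalization `predict_proba` performs, just written out by hand for two Gaussians.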

We can calculate these posterior probabilities using the `predict_proba` method.

In [183]:
```
X = numpy.arange(4, 9, 0.5).reshape(10, 1)
model.predict_proba(X)
```