To train a Bernoulli Mixture Model with EM, the M-step update formulae are:
$$\mathbf{\mu_m} = \mathbf{\bar{x}_m} \qquad \pi_m = \frac{N_m}{N}$$
where $\mathbf{\bar{x}_m} = \frac{1}{N_m} \sum_{n = 1}^N z_{n, m} \mathbf{x_n}$ and $N_m = \sum_{n = 1}^N z_{n, m}$, with $z_{n, m}$ the responsibility of component $m$ for sample $n$ computed in the E-step.
(1) See bmm.py for the complete implementation of the BMM.
The source code of this project is available at https://github.com/toogy/mnist-em-bmm-gmm.
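As a rough illustration of these update formulae, here is a minimal NumPy sketch of one EM iteration for a Bernoulli mixture. It is not the bmm.py implementation; the function and variable names are assumptions made for this sketch.

import numpy as np

# Illustrative sketch of one EM iteration for a Bernoulli mixture
# (hypothetical helper, not the bmm.py implementation).
# X: (N, D) binary data, mu: (K, D) Bernoulli means, pi: (K,) mixing weights
def em_step(X, mu, pi, eps=1e-10):
    mu = np.clip(mu, eps, 1 - eps)
    # E-step: log p(x_n | mu_m) for every sample n and component m
    log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T    # (N, K)
    log_r = np.log(pi) + log_p
    log_r -= log_r.max(axis=1, keepdims=True)                # numerical stability
    z = np.exp(log_r)
    z /= z.sum(axis=1, keepdims=True)                        # responsibilities z_{n,m}
    # M-step: mu_m = x_bar_m and pi_m = N_m / N
    N_m = z.sum(axis=0)
    mu_new = (z.T @ X) / N_m[:, None]
    pi_new = N_m / X.shape[0]
    return mu_new, pi_new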
In [1]:
# settings
data_path = '/home/data/ml/mnist'
k = 10
In [2]:
# we load pre-calculated k-means
import kmeans as kmeans_
kmeans = kmeans_.load_kmeans('kmeans-20.dat')
In [3]:
%matplotlib inline
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
import scipy
import bmm
import visualize
In [4]:
# loading the data
from mnist import load_mnist
train_data, train_labels = load_mnist(dataset='training', path=data_path)
# pre-processing the data (reshape + making it binary)
train_data = np.reshape(train_data, (60000, 784))
train_data_binary = np.where(train_data > 0.5, 1, 0)
In [5]:
# creating our model
model = bmm.bmm(k, n_iter=20, verbose=True)
In [6]:
model.fit(train_data_binary)
(2) Plot of the means $\mathbf{\mu}$ of the learnt mixture
In [7]:
visualize.plot_means(model.means)
(3) Even though there are only 10 different digits, there is no guarantee of getting one component per class with only 10 components: several components may converge to the same digit (as the plot shows), which leaves other digits without a component of their own.
This can be avoided by initializing each component's $\mu_k$ to the mean of the corresponding digit, computed from the labelled dataset. But that turns the problem into supervised learning, which is not what we want.
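For illustration, computing such per-class initial means from the labelled data might look like the sketch below (a hypothetical helper, not the actual data_classes_mean heuristic in bmm.py).

import numpy as np

# Hypothetical sketch: one initial mean per digit class,
# computed from the labelled training set.
def class_mean_init(data, labels, n_classes=10):
    # data: (N, 784) binary images, labels: (N,) digit labels
    return np.stack([data[labels == c].mean(axis=0)
                     for c in range(n_classes)])

# e.g. init_means = class_mean_init(train_data_binary, train_labels)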
Here is the result with this kind of initialization:
In [8]:
model = bmm.bmm(10, verbose=True)
model.fit(train_data_binary, means_init_heuristic='data_classes_mean', labels=train_labels)
In [9]:
visualize.plot_means(model.means)
(4) For each label, we select the subset of the data corresponding to that label and train a BMM on it to represent the class. Together, the resulting 10 BMMs form a digit classifier.
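As a sketch of the underlying Bayes rule (classifier.py is the actual implementation; the parameter layout and helper names below are assumptions), each test image is scored under every class-conditional mixture and assigned to the class with the highest posterior score.

import numpy as np
from scipy.special import logsumexp

# Sketch: score samples under each class-conditional Bernoulli mixture
# and pick the best class. class_models maps a label to (means, weights);
# class_priors maps a label to p(class).
def bmm_log_likelihood(X, means, weights, eps=1e-10):
    mu = np.clip(means, eps, 1 - eps)
    log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T    # (N, K)
    return logsumexp(log_p + np.log(weights), axis=1)        # per-sample log p(x | class)

def predict_classes(X, class_models, class_priors):
    labels = sorted(class_models)
    scores = np.column_stack(
        [bmm_log_likelihood(X, *class_models[c]) + np.log(class_priors[c])
         for c in labels])
    return np.array(labels)[scores.argmax(axis=1)]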
In [10]:
import classifier
# number of components for each BMM
k = 7
bayesian_classifier = classifier.classifier(k, means_init_heuristic='kmeans',
means=kmeans, model_type='bmm')
bayesian_classifier.fit(train_data_binary, train_labels)
In [11]:
visualize.plot_means(bayesian_classifier.models[3].means)
In [12]:
visualize.plot_means(bayesian_classifier.models[8].means)
In [13]:
test_data, test_labels = load_mnist(dataset='testing', path=data_path)
test_data = np.reshape(test_data, (test_data.shape[0], 784))
test_data_binary = np.where(test_data > 0.5, 1, 0)
label_set = set(train_labels)
predicted_labels = bayesian_classifier.predict(test_data_binary, label_set)
print('accuracy: {}'.format(np.mean(predicted_labels == test_labels)))
BMMs are suited to binary images because they model 0s and 1s. The original MNIST pixels lie in the range $[0, 255]$, so binarizing the images discards intensity information that could have made the model more accurate. GMMs can work directly with real-valued data and perform better than BMMs for classifying digits.
The Gaussian mixture distribution can be written as a linear superposition of Gaussians in the form
$$p(\mathbf{x}) = \sum_{k=1}^K \pi_k \mathcal{N}(\mathbf{x}|\mathbf{\mu}_k, \mathbf{\Sigma}_k)$$
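For reference, the density above can be evaluated directly with scipy (a standalone sketch, independent of gmm.py; the parameter names are assumptions).

import numpy as np
from scipy.stats import multivariate_normal

# Sketch: evaluate p(x) for given mixture parameters.
# pis: (K,) mixing weights, mus: (K, D) means, covs: (K, D, D) covariances
def gmm_density(x, pis, mus, covs):
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=cov)
               for pi, mu, cov in zip(pis, mus, covs))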
In [14]:
import sklearn.decomposition
d = 40
reducer = sklearn.decomposition.PCA(n_components=d)
reducer.fit(train_data)
train_data_reduced = reducer.transform(train_data)
test_data_reduced = reducer.transform(test_data)
kmeans_reduced = reducer.transform(kmeans)
In [15]:
import gmm
k = 20
model = gmm.gmm(k, verbose=True)
model.fit(train_data_reduced, means_init_heuristic='kmeans', means=kmeans_reduced)
In [16]:
means_projected = reducer.inverse_transform(model.means)
visualize.plot_means(means_projected)
In [17]:
bayesian_classifier = classifier.classifier(k, model_type='gmm',
means_init_heuristic='kmeans',
means=kmeans_reduced,
covariance_type='diag')
bayesian_classifier.fit(train_data_reduced, train_labels)
In [18]:
means_projected = reducer.inverse_transform(bayesian_classifier.models[4].means)
visualize.plot_means(means_projected)
In [19]:
predicted_labels = bayesian_classifier.predict(test_data_reduced, label_set)
print('accuracy: {}'.format(np.mean(predicted_labels == test_labels)))
The results are slightly better than with the BMM. Using a full covariance matrix for each component, instead of a diagonal one, lets the model capture correlations between the reduced dimensions:
In [20]:
bayesian_classifier = classifier.classifier(k, model_type='gmm',
means_init_heuristic='kmeans',
means=kmeans_reduced,
covariance_type='full')
bayesian_classifier.fit(train_data_reduced, train_labels)
In [21]:
means_projected = reducer.inverse_transform(bayesian_classifier.models[4].means)
visualize.plot_means(means_projected)
In [22]:
predicted_labels = bayesian_classifier.predict(test_data_reduced, label_set)
print('accuracy: {}'.format(np.mean(predicted_labels == test_labels)))