Deep Semi-NMF demo on the CMU PIE Pose dataset


In [1]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

In [2]:
from __future__ import print_function

import matplotlib.pyplot as plt
import numpy as np
import sklearn.metrics  # needed for normalized_mutual_info_score below

from sklearn.cluster import KMeans
from dsnmf import DSNMF, appr_seminmf
from scipy.io import loadmat

In [3]:
mat = loadmat('PIE_pose27.mat', struct_as_record=False, squeeze_me=True)

data, gnd = mat['fea'].astype('float32'), mat['gnd']

# Normalise each sample (row) to have unit l2-norm.
data /= np.linalg.norm(data, 2, 1)[:, None]

To evaluate the different feature representations we will use simple k-means clustering, assuming only that we know the true number of classes in the dataset.


In [4]:
n_classes = np.unique(gnd).shape[0]
kmeans = KMeans(n_classes, precompute_distances=False)

Using the cluster indicators for each data sample, we then use the normalised mutual information (NMI) score to evaluate the similarity between the predicted labels and the ground-truth labels.


In [5]:
def evaluate_nmi(X):
    # Cluster the features with k-means and score the predicted labels
    # against the ground-truth labels using NMI.
    pred = kmeans.fit_predict(X)
    score = sklearn.metrics.normalized_mutual_info_score(gnd, pred)

    return score
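
As a quick aside (this snippet is our own addition, not part of the original demo): NMI is invariant to permutations of the cluster indices, which is exactly what we need when comparing arbitrary k-means labels against the ground-truth classes.


In [ ]:
# Sanity check: a perfect clustering scores 1.0 even when the cluster
# indices are permuted relative to the ground-truth labels.
from sklearn.metrics import normalized_mutual_info_score

print(normalized_mutual_info_score([0, 0, 1, 1], [1, 1, 0, 0]))  # -> 1.0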

First we will perform k-means clustering on the raw feature space.

It will take some time, depending on your setup.


In [6]:
print("K-means on the raw pixels has an NMI of {:.2f}%".format(100 * evaluate_nmi(data)))


K-means on the raw pixels has an NMI of 39.62%

In [7]:
from sklearn.decomposition import PCA

fea = PCA(100).fit_transform(data)
score = evaluate_nmi(fea)

print("K-means clustering using the top 100 eigenvectors has an NMI of {:.2f}%".format(100 * score))


K-means clustering using the top 100 eigenvectors has an NMI of 6.10%

Now let's use a single-layer DSNMF model, i.e. a Semi-NMF.

A Semi-NMF factorisation decomposes the original data matrix

$$\mathbf X \approx \mathbf Z \mathbf H$$

subject to the elements of $\mathbf H$ being non-negative. The objective function of Semi-NMF is closely related to that of k-means clustering: if $\mathbf H$ were restricted to contain only zeros and ones (i.e. a binary matrix), the factorisation would be exactly equivalent to k-means. Semi-NMF instead only forces the elements to be non-negative, and can thus be seen as a soft clustering method, where the features matrix $\mathbf H$ describes the compatibility of each data point with each cluster centroid, i.e. a basis in $\mathbf Z$.
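
To make this concrete, here is a minimal NumPy sketch of the classic Semi-NMF multiplicative updates (Ding et al., 2010). This is only illustrative: the routine actually used below is `appr_seminmf` from `dsnmf`, and the helper name `seminmf_sketch` and its parameters (`n_iter`, `seed`) are our own.


In [ ]:
# Illustrative Semi-NMF: Z is unconstrained, H is kept non-negative via the
# standard multiplicative update. Not the dsnmf implementation.
def seminmf_sketch(X, n_components, n_iter=200, seed=0):
    rng = np.random.RandomState(seed)
    H = np.abs(rng.randn(n_components, X.shape[1]))           # H >= 0

    pos = lambda A: (np.abs(A) + A) / 2.0                      # positive part of A
    neg = lambda A: (np.abs(A) - A) / 2.0                      # negative part of A

    for _ in range(n_iter):
        Z = X.dot(H.T).dot(np.linalg.pinv(H.dot(H.T)))         # least-squares Z given H
        ZtX, ZtZ = Z.T.dot(X), Z.T.dot(Z)
        H *= np.sqrt((pos(ZtX) + neg(ZtZ).dot(H)) /
                     (neg(ZtX) + pos(ZtZ).dot(H) + 1e-9))      # multiplicative step keeps H >= 0
    return Z, H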


In [8]:
Z, H = appr_seminmf(data.T, 100) # seminmf expects a num_features x num_samples matrix

In [9]:
print("K-means clustering using the Semi-NMF features has an NMI of {:.2f}%".format(100 * evaluate_nmi(H.T)))


K-means clustering using the Semi-NMF features has an NMI of 82.18%

Not bad! That's a huge improvement over using k-means on the raw pixels!

Let's try doing the same with a Deep Semi-NMF model with more than one layer.

Initialise a Deep Semi-NMF model with 2 layers

In Semi-NMF the goal is to construct a low-dimensional representation $\mathbf H^+$ of our original data $\mathbf X^\pm$, with the bases matrix $\mathbf Z^\pm$ serving as the mapping between our original data and its lower-dimensional representation.

The data we wish to analyse is often rather complex, with a collection of distinct, often unknown, attributes. In this example we deal with datasets of human faces, where the variability in the data stems not only from differences in the appearance of the subjects, but also from other attributes, such as the pose of the head relative to the camera or the facial expression of the subject. The multi-attribute nature of our data calls for a hierarchical framework that is better at representing it than a shallow Semi-NMF:

$$ \mathbf X^{\pm} \approx {\mathbf Z}_1^{\pm}{\mathbf Z}_2^{\pm}\cdots{\mathbf Z}_m^{\pm}{\mathbf H}^+_m $$

In this example we have a 2-layer network ($m=2$), with $\mathbf Z_1 \in \mathbb{R}^{1024\times 400}$, $\mathbf Z_2 \in \mathbb{R}^{400 \times 100}$, and $\mathbf H_2 \in \mathbb{R}^{100 \times 2856}$, since the data matrix $\mathbf X$ consists of $2856$ images of $1024$ pixels each.
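
A common way to initialise such a model is to pre-train it greedily, factorising each layer's representation with a shallow Semi-NMF before jointly fine-tuning all factors. The sketch below only illustrates that pre-training idea, reusing `appr_seminmf` with the signature seen above; `pretrain_layers` is our own helper name, and whether the `DSNMF` class below uses exactly this initialisation internally is an assumption here.


In [ ]:
# Greedy layer-wise pre-training for a Deep Semi-NMF (illustrative sketch):
# each representation is factorised again by a shallow Semi-NMF,
# X ~ Z_1 H_1, then H_1 ~ Z_2 H_2, and so on.
def pretrain_layers(X, layer_sizes):
    Zs, H = [], X                       # X is num_features x num_samples
    for k in layer_sizes:               # e.g. (400, 100)
        Z, H = appr_seminmf(H, k)       # factorise the current representation
        Zs.append(Z)
    return Zs, H                        # [Z_1, ..., Z_m] and H_m

# e.g. Zs, Hm = pretrain_layers(data.T, (400, 100))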


In [10]:
dsnmf = DSNMF(data, layers=(400, 100))



Train the model


In [11]:
for epoch in range(1000):
    residual = float(dsnmf.train_fun())  # one training step; returns the current reconstruction residual
    
    print("Epoch {}. Residual [{:.2f}]".format(epoch, residual), end="\r")



Evaluate it in terms of clustering performance using the normalised mutual information score.


In [12]:
fea = dsnmf.get_features().T  # the last layer's features, i.e. H_2
pred = kmeans.fit_predict(fea)
score = sklearn.metrics.normalized_mutual_info_score(gnd, pred)

print("NMI: {:.2f}%".format(100 * score))


NMI: 98.25%