# Homework 1 (Face Recognition)

``````

In [1]:

%pylab inline

import numpy as np
from matplotlib import pyplot as plt

``````
``````

Populating the interactive namespace from numpy and matplotlib

``````

# Face recognition

The goal of this seminar is to build two simple (and very similar) face recognition pipelines using the `scikit-learn` package. Overall, we'd like to explore different representations and see which one works better.

## Prepare dataset

``````

In [2]:

import scipy.io

# The .mat filename below is a placeholder; use the file provided with the assignment.
data = scipy.io.loadmat('faces_data.mat')

image_h, image_w = 32, 32

# Faces are stored as 32x32 images stacked along the last axis; flatten each
# one into a row of image_h * image_w pixels. Labels in the .mat file are
# 1-based, so shift them to start at 0.
X_train = data['train_faces'].reshape((image_w, image_h, -1)).transpose((2, 1, 0)).reshape((-1, image_h * image_w))
y_train = data['train_labels'] - 1
X_test = data['test_faces'].reshape((image_w, image_h, -1)).transpose((2, 1, 0)).reshape((-1, image_h * image_w))
y_test = data['test_labels'] - 1

n_features = X_train.shape[1]
n_train = len(y_train)
n_test = len(y_test)
n_classes = len(np.unique(y_train))

print('  Image size        : {}x{}'.format(image_h, image_w))
print('  Train images      : {}'.format(n_train))
print('  Test images       : {}'.format(n_test))
print('  Number of classes : {}'.format(n_classes))

``````
``````

Image size        : 32x32
Train images      : 280
Test images       : 120
Number of classes : 40

``````

Now we are going to plot some samples from the dataset using the provided helper function.

``````

In [ ]:

def plot_gallery(images, titles, h, w, n_row=3, n_col=6):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.5 * n_col, 1.7 * n_row))
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray, interpolation='nearest')
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())

``````
``````

In [ ]:

titles = [str(y[0]) for y in y_train]

plot_gallery(X_train, titles, image_h, image_w)

``````

## Nearest Neighbour baseline

The simplest way to do face recognition is to treat the raw pixels as features and perform a nearest neighbour search in Euclidean space. Let's use the `KNeighborsClassifier` class.

``````

In [ ]:

from sklearn.neighbors import KNeighborsClassifier

# Use KNeighborsClassifier to calculate test score for the Nearest Neighbour classifier.

print('Test score: {}'.format(test_score))

``````
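
A minimal sketch of one possible solution: a 1-nearest-neighbour classifier fit on the raw pixels (the choice of `n_neighbors=1` is ours, not prescribed by the assignment).

``````

from sklearn.neighbors import KNeighborsClassifier

# 1-NN in Euclidean space on raw pixel features.
knn = KNeighborsClassifier(n_neighbors=1, metric='euclidean')
knn.fit(X_train, y_train.ravel())

# Mean accuracy on the held-out test set.
test_score = knn.score(X_test, y_test.ravel())

``````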

## Eigenfaces

All the dirty work will be done by the scikit-learn package. First we need to learn a dictionary of codewords. For that, we preprocess the training set by normalizing each face (zero mean and unit variance).

``````

In [ ]:

# Populate variable 'X_train_processed' with samples each of which has zero mean and unit variance.

``````
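
For reference, a minimal NumPy sketch of per-face standardization (each row ends up with zero mean and unit variance):

``````

# Standardize each face (row) independently: subtract its mean, divide by its std.
mean = X_train.mean(axis=1, keepdims=True)
std = X_train.std(axis=1, keepdims=True)
X_train_processed = (X_train - mean) / std

``````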

Now we are going to apply PCA to obtain a dictionary of codewords. The `RandomizedPCA` class is what we need.

``````

In [ ]:

from sklearn.decomposition import RandomizedPCA

n_components = 64

# Populate 'pca' with a trained instance of RandomizedPCA.

``````
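
Note that `RandomizedPCA` was merged into `PCA` in scikit-learn 0.18 and removed in 0.20; on recent versions, an equivalent sketch is:

``````

from sklearn.decomposition import PCA

# Modern equivalent of RandomizedPCA(n_components=64): PCA with the
# randomized SVD solver. random_state is our choice, for reproducibility.
pca = PCA(n_components=n_components, svd_solver='randomized', random_state=0)
pca.fit(X_train_processed)

``````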

Now we plot a few of the principal components (the "eigenfaces").

``````

In [ ]:

# Visualize principal components.

``````
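
One way to do it, reusing the `plot_gallery` helper from above (a sketch, not the only valid approach):

``````

# Each row of pca.components_ is a 1024-dimensional principal component,
# which can be displayed as a 32x32 "eigenface".
eigenface_titles = ['eigenface {}'.format(i) for i in range(n_components)]
plot_gallery(pca.components_, eigenface_titles, image_h, image_w)

``````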

This time there is no restriction on the number of non-zero coefficients in the decomposition, so the codes are no longer sparse:

``````

In [ ]:

# Transform training data and plot decomposition coefficients.

``````
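
A sketch of what that cell could look like: project the training faces onto the learned components and inspect one code vector.

``````

X_train_pca = pca.transform(X_train_processed)

# Stem plot of the code of the first training face: most of the 64
# coefficients are far from zero, i.e. the representation is dense.
plt.figure(figsize=(10, 4))
plt.stem(X_train_pca[0])
plt.xlabel('component index')
plt.ylabel('coefficient')

``````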

Train an SVM and apply it to the encoded test data.

``````

In [ ]:

# Populate 'test_score' with test accuracy of an SVM classifier.

print('Test score: {}'.format(test_score))

``````
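
A sketch under the assumption of a linear kernel (the kernel and `C` are our choices here, not prescribed by the assignment). Note that the test set must go through the same pipeline as the training set: per-face standardization, then projection onto the learned components.

``````

from sklearn.svm import SVC

# Apply the same preprocessing + projection to the test set.
X_test_processed = (X_test - X_test.mean(axis=1, keepdims=True)) / X_test.std(axis=1, keepdims=True)
X_test_pca = pca.transform(X_test_processed)

svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train_pca, y_train.ravel())
test_score = svm.score(X_test_pca, y_test.ravel())

``````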

How many components are sufficient to reach the same accuracy level?

``````

In [ ]:

n_components = [1, 2, 4, 8, 16, 32, 64]
accuracy = []

# Try different numbers of components and populate 'accuracy' list.

plt.figure(figsize=(10, 6))
plt.plot(n_components, accuracy)

print('Max accuracy: {}'.format(max(accuracy)))

``````
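
One possible way to fill in the loop, reusing the pipeline (and the `X_train_processed` / `X_test_processed` variables) from the previous sketches:

``````

from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Refit PCA and the SVM for each dictionary size.
for n in n_components:
    pca = PCA(n_components=n, svd_solver='randomized', random_state=0)
    X_tr = pca.fit_transform(X_train_processed)
    X_te = pca.transform(X_test_processed)
    svm = SVC(kernel='linear', C=1.0).fit(X_tr, y_train.ravel())
    accuracy.append(svm.score(X_te, y_test.ravel()))

``````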