Machine Learning Breakout: Facial Recognition

This exercise will walk you through the process of using machine learning for facial recognition.


In [ ]:
from __future__ import print_function, division

%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# use seaborn for better matplotlib styles
import seaborn; seaborn.set(style='white')

1. Fetch & explore the data

The data we'll use is a number of snapshots of the faces of world leaders. We'll fetch the data as follows:


In [ ]:
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
  • Explore this data, which is laid out very similarly to the digits data we saw earlier. How many samples are there? How many features? How many classes, or targets?
  • Use subplots and plt.imshow to plot several of the images. How many pixels are in each image?
  • Use sklearn.model_selection.train_test_split to split the data into a training set and a test set.
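The steps above might be sketched as follows (variable names are illustrative; note that fetch_lfw_people downloads the dataset on first use):

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# Like the digits data, faces has a (n_samples, n_features) data array,
# a target array of class labels, and the raw 2D images.
print(faces.data.shape)     # (n_samples, n_features)
print(faces.images.shape)   # (n_samples, height, width) -- height * width = n_features
print(faces.target_names)   # one name per class

# Plot a few of the images with their labels
fig, axes = plt.subplots(2, 4, figsize=(8, 5))
for ax, image, target in zip(axes.flat, faces.images, faces.target):
    ax.imshow(image, cmap='gray')
    ax.set_title(faces.target_names[target], fontsize=8)
    ax.axis('off')

# Hold out a test set for the classification task in part 3
Xtrain, Xtest, ytrain, ytest = train_test_split(
    faces.data, faces.target, random_state=0)
```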

2. Projecting the Data

Let's use some dimensionality reduction routines to try to understand the data. Just a warning: you'll probably find that, unlike in the case of the handwritten digits, the projections will be a bit too jumbled to gain much insight. Still, it's always a useful step in understanding your data!

  • Project the data to two-dimensions with Principal Component Analysis, and scatter-plot the results
  • Project the data to two dimensions with Isomap and scatter-plot the results
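One way these two projections might look in code (a sketch, coloring each point by its target label):

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap
import matplotlib.pyplot as plt

faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# Linear projection: principal component analysis
pca_proj = PCA(n_components=2).fit_transform(faces.data)

# Nonlinear projection: Isomap (default n_neighbors=5)
iso_proj = Isomap(n_components=2).fit_transform(faces.data)

# Scatter-plot both projections side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
for ax, proj, title in [(ax1, pca_proj, 'PCA'), (ax2, iso_proj, 'Isomap')]:
    ax.scatter(proj[:, 0], proj[:, 1], c=faces.target, s=10)
    ax.set_title(title)
```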

3. Classification of unknown images

Here we'll perform a classification task on our data. Given a training set, we want to build a classifier that will accurately predict the labels of the test set.

  • Start by splitting your data into a train and test set (you can use sklearn.model_selection.train_test_split)
  • We'll use a support vector classifier (sklearn.svm.SVC) to classify the data. Import this and instantiate the estimator.
  • Perform an initial fit on the data, predict the test labels, and use sklearn.metrics.accuracy_score to see how well you're doing.
  • The estimator can be tuned to improve the fit. We'll do this by adjusting the C parameter of SVC. Look at the SVC docstring and try some choices for the kernel, for C, and for gamma. What's the best accuracy you can find?
  • For this best estimator, print the sklearn.metrics.classification_report and sklearn.metrics.confusion_matrix, and plot some of the images with the true and predicted label. How well does it do?
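A sketch of the whole workflow is below. The C and gamma values in the loop are illustrative starting points, not known-best settings, and strictly speaking a validation set or cross-validation should be used for tuning; scoring on the test set here simply mirrors the informal exploration the exercise asks for:

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt

faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
Xtrain, Xtest, ytrain, ytest = train_test_split(
    faces.data, faces.target, random_state=0)

# Initial fit with default settings
clf = SVC().fit(Xtrain, ytrain)
print('default accuracy:', accuracy_score(ytest, clf.predict(Xtest)))

# Try a few kernel parameters by hand and keep the best
best_score, best_clf = 0.0, None
for C in [1, 5, 10, 50]:
    for gamma in ['scale', 0.0005, 0.001]:
        clf = SVC(kernel='rbf', C=C, gamma=gamma).fit(Xtrain, ytrain)
        score = accuracy_score(ytest, clf.predict(Xtest))
        if score > best_score:
            best_score, best_clf = score, clf

ypred = best_clf.predict(Xtest)
print('best accuracy:', best_score)
print(classification_report(ytest, ypred, target_names=faces.target_names))
print(confusion_matrix(ytest, ypred))

# Plot some test images with their true and predicted labels
h, w = faces.images.shape[1:]
fig, axes = plt.subplots(2, 4, figsize=(8, 5))
for i, ax in enumerate(axes.flat):
    ax.imshow(Xtest[i].reshape(h, w), cmap='gray')
    ax.set_title('true: %s\npred: %s' % (faces.target_names[ytest[i]],
                                         faces.target_names[ypred[i]]),
                 fontsize=7)
    ax.axis('off')
```

If the accuracy plateaus, a common next step (used in scikit-learn's eigenfaces example) is to reduce the data with PCA before the SVC, which speeds up fitting and often improves the score.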