In [ ]:
from __future__ import print_function, division
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# use seaborn for better matplotlib styles
import seaborn; seaborn.set(style='white')
In [ ]:
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
Use `plt.imshow` to plot several of the images. How many pixels are in each image?

Use `sklearn.model_selection.train_test_split` to split the data into a training set and a test set.

Let's use some dimensionality reduction routines to try to understand the data. Just a warning: you'll probably find that, unlike in the case of the handwritten digits, the projections will be a bit too jumbled to gain much insight. Still, it's always a useful step in understanding your data!
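The steps above might be sketched as follows (the 2D PCA projection at the end is just one possible dimensionality reduction; note that `fetch_lfw_people` downloads the dataset on first use):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA

faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# each row of faces.data is one flattened image
print(faces.images.shape)                      # (n_samples, height, width)
print(faces.data.shape[1], "pixels per image")

# plot a few of the faces with their names
fig, axes = plt.subplots(2, 4, figsize=(8, 4))
for ax, img, label in zip(axes.flat, faces.images, faces.target):
    ax.imshow(img, cmap='gray')
    ax.set_title(faces.target_names[label], fontsize=8)
    ax.axis('off')

# split into training and test sets (default test_size=0.25)
Xtrain, Xtest, ytrain, ytest = train_test_split(
    faces.data, faces.target, random_state=0)

# project to 2D with PCA just to eyeball the structure
proj = PCA(n_components=2).fit_transform(faces.data)
plt.figure()
plt.scatter(proj[:, 0], proj[:, 1], c=faces.target, s=10)
```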
Here we'll perform a classification task on our data: given a training set, we want to build a classifier that will accurately predict the labels of the test set.
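A minimal baseline for this task might look like the following (the linear kernel is just an illustrative first choice, not necessarily the best one):

```python
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
Xtrain, Xtest, ytrain, ytest = train_test_split(
    faces.data, faces.target, random_state=0)

# linear kernel as a simple baseline; tune kernel/C/gamma below
clf = SVC(kernel='linear')
clf.fit(Xtrain, ytrain)
print(accuracy_score(ytest, clf.predict(Xtest)))
```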
- Split the data into a training set and a test set (`sklearn.model_selection.train_test_split`).
- Use a support vector classifier (`sklearn.svm.SVC`) to classify the data. Import this and instantiate the estimator.
- Use `sklearn.metrics.accuracy_score` to see how well you're doing.
- Tune the `C` parameter of `SVC`: look at the `SVC` doc string and try some choices for the kernel, for `C`, and for `gamma`. What's the best accuracy you can find?
- Examine the results with `sklearn.metrics.classification_report` and `sklearn.metrics.confusion_matrix`, and plot some of the images with the true and predicted labels. How well does it do?
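One way to try several kernel/`C`/`gamma` choices systematically is a small grid search; a sketch, where the grid values are illustrative starting points rather than tuned recommendations:

```python
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC
from sklearn.metrics import classification_report, confusion_matrix

faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
Xtrain, Xtest, ytrain, ytest = train_test_split(
    faces.data, faces.target, random_state=0)

# small illustrative grid; widen it if you have the patience
param_grid = {'kernel': ['linear', 'rbf'],
              'C': [1, 10, 100],
              'gamma': ['scale', 0.001, 0.0001]}
grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(Xtrain, ytrain)
print(grid.best_params_, grid.best_score_)

ypred = grid.best_estimator_.predict(Xtest)
print(classification_report(ytest, ypred, target_names=faces.target_names))
print(confusion_matrix(ytest, ypred))

# show a few test faces labeled "predicted (true)"
shape = faces.images.shape[1:]
fig, axes = plt.subplots(2, 4, figsize=(8, 5))
for ax, img, true, pred in zip(axes.flat, Xtest, ytest, ypred):
    ax.imshow(img.reshape(shape), cmap='gray')
    ax.set_title('{} ({})'.format(faces.target_names[pred].split()[-1],
                                  faces.target_names[true].split()[-1]),
                 fontsize=8)
    ax.axis('off')
```

Note that the grid search cross-validates on the training set only, so the test-set scores reported at the end remain an honest estimate of generalization.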