In [1]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# use seaborn plotting defaults
import seaborn as sns; sns.set()
We'll use this code to load the data:
In [2]:
from sklearn.datasets import fetch_lfw_people
faces = fetch_lfw_people(min_faces_per_person=70, resize=0.4)
X, y = faces.data, faces.target
The mean of the data (found in the mean_ attribute) and each component of the data (found in the rows of the components_ attribute) can be reshaped and interpreted as an image.
plt.imshowcomponents_ matrixYou'll have to play around with the colormap and grid settings to make this look OK
For several faces, plot the true image plus the reconstruction (computed using inverse_transform) for several different values of n_components. (you might even use IPython's interactive functions to make this exploration easier).
Does the 90% variance choice seem to correspond to a good visual representation of each picture?
Note: As you experiment with this, you may want to use RandomizedPCA rather than PCA for this task. RandomizedPCA is an approximate method with the same interface as PCA, but operates much more quickly.