Title: Dimensionality Reduction With PCA
Slug: dimensionality_reduction_with_pca
Summary: How to reduce the dimensions of the feature matrix for machine learning in Python. Date: 2017-09-13 12:00
Category: Machine Learning
Tags: Feature Engineering Authors: Chris Albon

Preliminaries



In [1]:

    
# Load libraries
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets

Load Data



In [2]:

    
# Load the data
digits = datasets.load_digits()

Standardize Feature Values



In [3]:

    
# Standardize the feature matrix
X = StandardScaler().fit_transform(digits.data)

Conduct Principal Component Analysis



In [4]:

    
# Create a PCA that will retain 99% of the variance
pca = PCA(n_components=0.99, whiten=True)

# Conduct PCA
X_pca = pca.fit_transform(X)

View Results



In [5]:

    
# Show results
print('Original number of features:', X.shape[1])
print('Reduced number of features:', X_pca.shape[1])









    



Original number of features: 64
Reduced number of features: 54