Title: Dimensionality Reduction With PCA
Slug: dimensionality_reduction_with_pca
Summary: How to reduce the number of features in a feature matrix with principal component analysis (PCA) in Python.
Date: 2017-09-13 12:00
Category: Machine Learning
Tags: Feature Engineering
Authors: Chris Albon
In [1]:
# Load libraries
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn import datasets
In [2]:
# Load the handwritten digits dataset (8x8 images flattened to 64 features)
digits = datasets.load_digits()
In [3]:
# Standardize the feature matrix (zero mean and unit variance per feature)
X = StandardScaler().fit_transform(digits.data)
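PCA is sensitive to feature scale, so it can be worth checking what standardization actually did before moving on. The following is a minimal sketch, not part of the original recipe; the names `col_means`, `col_stds`, and `constant_cols` are introduced here for illustration only. It assumes that any feature that is constant in the raw data stays at zero after scaling, so those columns are checked separately.

```python
import numpy as np
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Rebuild the standardized feature matrix so this check is self-contained
digits = datasets.load_digits()
X = StandardScaler().fit_transform(digits.data)

# Per-feature mean and population standard deviation (ddof=0, as StandardScaler uses)
col_means = X.mean(axis=0)
col_stds = X.std(axis=0)

# Features that never vary in the raw data have zero spread after scaling
constant_cols = np.isclose(col_stds, 0)

print('Largest absolute column mean:', np.abs(col_means).max())
print('Number of constant columns:', int(constant_cols.sum()))
print('Non-constant columns have unit std:', np.allclose(col_stds[~constant_cols], 1))
```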
In [4]:
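# Create a PCA that keeps the fewest components needed to retain 99% of the variance;
# whiten=True rescales each retained component to unit variance
pca = PCA(n_components=0.99, whiten=True)
# Fit the PCA and project the standardized features onto the retained components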
X_pca = pca.fit_transform(X)
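To see how a fractional `n_components` translates into a component count, one option is to fit a full PCA and look at the cumulative explained variance ratio. This is a sketch under stated assumptions rather than a description of the library internals: `pca_full`, `pca_99`, `cumulative`, and `n_needed` are illustrative names, and `svd_solver="full"` is set explicitly on both models so they are computed the same way.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rebuild the standardized feature matrix so this example is self-contained
X = StandardScaler().fit_transform(datasets.load_digits().data)

# Fit a PCA that keeps every component, to inspect the variance each one explains
pca_full = PCA(svd_solver="full").fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)

# Smallest number of leading components whose cumulative ratio exceeds 0.99
n_needed = int(np.searchsorted(cumulative, 0.99, side="right") + 1)

# Compare with the count chosen when the 0.99 threshold is passed directly
pca_99 = PCA(n_components=0.99, svd_solver="full").fit(X)

print('Components needed by cumulative sum:', n_needed)
print('Components kept by PCA(n_components=0.99):', pca_99.n_components_)
```

Plotting `cumulative` against the component index is a common way to pick a threshold other than 0.99.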
In [5]:
# Show results
print('Original number of features:', X.shape[1])
print('Reduced number of features:', X_pca.shape[1])
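Beyond the feature counts, two properties of the fitted model can be checked directly: how much variance the retained components keep, and whether whitening produced unit-variance components. The sketch below is illustrative and not from the original post; `variance_retained` and `component_stds` are names introduced here.

```python
import numpy as np
from sklearn import datasets
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Rebuild the inputs and the fitted PCA so this check is self-contained
X = StandardScaler().fit_transform(datasets.load_digits().data)
pca = PCA(n_components=0.99, whiten=True)
X_pca = pca.fit_transform(X)

# Share of the total variance kept by the retained components
variance_retained = pca.explained_variance_ratio_.sum()

# With whiten=True, each retained component is rescaled to unit sample variance
component_stds = X_pca.std(axis=0, ddof=1)

print('Variance retained:', round(variance_retained, 4))
print('All components have unit variance:', np.allclose(component_stds, 1))
```

Whitening discards the relative scale of the components, so it is worth skipping when downstream steps rely on that scale.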