Title: Dimensionality Reduction On Sparse Feature Matrix
Slug: dimensionality_reduction_on_sparse_feature_matrix
Summary: How to conduct dimensionality reduction when the feature matrix is sparse using Python.
Date: 2017-09-13 12:00
Category: Machine Learning
Tags: Feature Engineering
Authors: Chris Albon
In [1]:
# Load libraries
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import TruncatedSVD
from scipy.sparse import csr_matrix
from sklearn import datasets
import numpy as np
In [2]:
# Load the data
digits = datasets.load_digits()
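# Standardize the feature matrix (centering makes most entries nonzero,
# so the matrix below is sparse in storage format only)
X = StandardScaler().fit_transform(digits.data)
# Convert to a SciPy CSR sparse matrix
X_sparse = csr_matrix(X)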
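A caveat on the standardization step: subtracting the mean turns most zeros into nonzero values, so a CSR matrix built from centered data stores nearly every entry. As a minimal sketch, assuming the goal is to keep the data sparse, `StandardScaler(with_mean=False)` can scale a sparse matrix directly without centering it. The names `X_raw_sparse`, `X_scaled_sparse`, and `X_centered_sparse` are illustrative and not part of the original recipe.

```python
# Load libraries
from scipy.sparse import csr_matrix, issparse
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

# Load the data and store the raw pixel values as a sparse matrix
digits = datasets.load_digits()
X_raw_sparse = csr_matrix(digits.data)

# Scale each feature to unit variance without centering (zeros stay zero)
X_scaled_sparse = StandardScaler(with_mean=False).fit_transform(X_raw_sparse)

# For comparison: center and scale the dense data, then convert to sparse
X_centered_sparse = csr_matrix(StandardScaler().fit_transform(digits.data))

# Compare the number of stored nonzero values
print('Stored values, raw:', X_raw_sparse.nnz)
print('Stored values, scaled without centering:', X_scaled_sparse.nnz)
print('Stored values, centered and scaled:', X_centered_sparse.nnz)
```

Skipping the centering step changes what the decomposition captures, so whether this trade-off is acceptable depends on the data.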
In [3]:
# Create a truncated SVD (TSVD) that keeps 10 components
tsvd = TruncatedSVD(n_components=10)
In [4]:
# Conduct TSVD on sparse matrix
X_sparse_tsvd = tsvd.fit_transform(X_sparse)
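To make the transform step less opaque, the sketch below checks that `transform` amounts to projecting the data onto the fitted components, that is, multiplying by the transpose of `components_`. The `random_state=0` argument and the name `X_projected` are assumptions added for illustration.

```python
# Load libraries
import numpy as np
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Rebuild the standardized sparse feature matrix
X_sparse = csr_matrix(StandardScaler().fit_transform(datasets.load_digits().data))

# Fit a TSVD with 10 components (random_state fixed only for repeatability)
tsvd = TruncatedSVD(n_components=10, random_state=0)
X_sparse_tsvd = tsvd.fit_transform(X_sparse)

# Project the data onto the components by hand
X_projected = X_sparse @ tsvd.components_.T

# Check that the manual projection matches transform()
print('Matches transform():', np.allclose(tsvd.transform(X_sparse), X_projected))
```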
In [5]:
# Show results
print('Original number of features:', X_sparse.shape[1])
print('Reduced number of features:', X_sparse_tsvd.shape[1])
In [6]:
# Sum of first three components' explained variance ratios
tsvd.explained_variance_ratio_[0:3].sum()
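The explained variance ratios can also guide the choice of `n_components`. The following is a rough sketch, assuming a target of 95% explained variance (an arbitrary threshold chosen for illustration): fit a TSVD with one fewer component than there are features, accumulate the ratios, and find the smallest number of components that reaches the target. The names `tsvd_full`, `target`, and `n_needed` are hypothetical.

```python
# Load libraries
import numpy as np
from scipy.sparse import csr_matrix
from sklearn import datasets
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Rebuild the standardized sparse feature matrix
X_sparse = csr_matrix(StandardScaler().fit_transform(datasets.load_digits().data))

# Fit a TSVD with one fewer component than the number of features
tsvd_full = TruncatedSVD(n_components=X_sparse.shape[1] - 1, random_state=0)
tsvd_full.fit(X_sparse)

# Running total of explained variance, one entry per component
cumulative = np.cumsum(tsvd_full.explained_variance_ratio_)

# Smallest number of components whose running total reaches the target
target = 0.95  # assumed threshold, adjust as needed
reached = cumulative >= target
n_needed = int(np.argmax(reached)) + 1 if reached.any() else len(cumulative)

# Show results
print('Components needed for', target, 'explained variance:', n_needed)
```

If the target is never reached, the sketch falls back to the total number of fitted components.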