Title: Variance Thresholding For Feature Selection
Slug: variance_thresholding_for_feature_selection
Summary: How to select features for machine learning by removing low-variance features with variance thresholding in Python.
Date: 2017-09-14 12:00
Category: Machine Learning
Tags: Feature Selection
Authors: Chris Albon

Preliminaries


from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

Load Data


# Load iris data
iris = datasets.load_iris()

# Create features and target
X = iris.data
y = iris.target

Conduct Variance Thresholding


# Create a VarianceThreshold object that keeps only features whose variance exceeds 0.5
thresholder = VarianceThreshold(threshold=0.5)

# Conduct variance thresholding
X_high_variance = thresholder.fit_transform(X)
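To check which of the four iris features are removed, a fitted VarianceThreshold exposes the per-feature variances it computed (`variances_`) and a boolean mask of the retained columns (`get_support()`). The sketch below refits the same thresholder and prints both next to `iris.feature_names`; the loop and print format are illustrative choices, not part of the original post.

```python
from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

# Load the iris features
iris = datasets.load_iris()
X = iris.data

# Fit the same thresholder used above
thresholder = VarianceThreshold(threshold=0.5)
thresholder.fit(X)

# variances_ holds each column's variance from fit (ddof=0 for dense input);
# get_support() is True for columns whose variance exceeds the threshold
for name, var, kept in zip(iris.feature_names,
                           thresholder.variances_,
                           thresholder.get_support()):
    print(f"{name:20s} variance={var:.3f} kept={kept}")
```

Only sepal width falls below 0.5, so the mask is `[True, False, True, True]` and the transformed array keeps sepal length, petal length, and petal width.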

View High Variance Features


# View the first five rows of the features whose variance is above the threshold
X_high_variance[0:5]


Output:
array([[ 5.1,  1.4,  0.2],
       [ 4.9,  1.4,  0.2],
       [ 4.7,  1.3,  0.2],
       [ 4.6,  1.5,  0.2],
       [ 5. ,  1.4,  0.2]])
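The same result can be reproduced by hand, which makes the rule explicit: compute each column's variance and keep the columns whose variance is strictly greater than the threshold. This is a minimal NumPy sketch under that assumption; the names `variances`, `mask`, and `X_manual` are hypothetical and used only for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

# Load the iris features
iris = datasets.load_iris()
X = iris.data

# Population variance (ddof=0) of each column
variances = X.var(axis=0)

# Keep columns whose variance is strictly greater than the threshold
threshold = 0.5
mask = variances > threshold
X_manual = X[:, mask]

# Compare with scikit-learn's result
X_sklearn = VarianceThreshold(threshold=threshold).fit_transform(X)
print(np.array_equal(X_manual, X_sklearn))  # True
```

Because variance is measured in the squared units of each feature, the threshold is scale dependent: a feature recorded in millimetres instead of centimetres would have 100 times the variance.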