Title: Chi-Squared For Feature Selection
Slug: chi-squared_for_feature_selection
Summary: How to remove irrelevant features using chi-squared for machine learning in Python.
Date: 2017-09-14 12:00
Category: Machine Learning
Tags: Feature Selection
Authors: Chris Albon
In [7]:
# Load libraries
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import chi2
In [8]:
# Load iris data
iris = load_iris()
# Create features and target
X = iris.data
y = iris.target
# Convert to categorical data by converting data to integers
X = X.astype(int)
In [9]:
# Select two features with highest chi-squared statistics
chi2_selector = SelectKBest(chi2, k=2)
X_kbest = chi2_selector.fit_transform(X, y)
In [10]:
# Show results
print('Original number of features:', X.shape[1])
print('Reduced number of features:', X_kbest.shape[1])