Title: Select Important Features In Random Forest
Slug: select_important_features_in_random_forest
Summary: How to select important features in random forest in scikit-learn.
Date: 2017-09-21 12:00
Category: Machine Learning
Tags: Trees And Forests
Authors: Chris Albon
In [1]:
# Load libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
from sklearn.feature_selection import SelectFromModel
In [2]:
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target
In [3]:
# Create random forest classifier
clf = RandomForestClassifier(random_state=0, n_jobs=-1)
In [4]:
# Create object that selects features with importance greater than or equal to a threshold
selector = SelectFromModel(clf, threshold=0.3)
# Create new feature matrix using selector
X_important = selector.fit_transform(X, y)
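To see which columns survive the threshold, one option is to inspect the forest that the selector fits internally. The sketch below is a minimal, self-contained illustration rather than part of the original notebook. It rebuilds the same selector and uses `estimator_` and `get_support()`, both standard attributes of a fitted `SelectFromModel`. The variable names `importances` and `mask` are illustrative assumptions.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Load the same data as above
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Rebuild and fit the selector (same threshold as above)
selector = SelectFromModel(RandomForestClassifier(random_state=0), threshold=0.3)
selector.fit(X, y)

# Importances from the forest fitted inside the selector (they sum to 1)
importances = selector.estimator_.feature_importances_

# Boolean mask: True where importance >= threshold
mask = selector.get_support()

# Print each feature with its importance and whether it was kept
for name, importance, kept in zip(iris.feature_names, importances, mask):
    print(f"{name}: {importance:.3f} ({'kept' if kept else 'dropped'})")
```

Because the importances sum to 1, a fixed threshold of 0.3 can keep at most three features. Passing `threshold="mean"` or `threshold="median"` instead makes the cutoff relative to the fitted importances.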
In [7]:
# View first five observations of the selected features
X_important[0:5]
Out[7]:
In [6]:
# Train random forest using most important features
model = clf.fit(X_important, y)
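If the reduced model will be evaluated, selecting features on the full dataset first lets information from the held-out rows leak into the selection step. One possible way around this, sketched here as an assumption rather than as the original author's method, is to wrap the selector and the final forest in a `Pipeline` so that selection is refit inside each cross-validation fold. The step names `"select"` and `"forest"` and the choice of `cv=5` are illustrative.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

# Load the same data as above
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Chain feature selection and the final classifier (step names are illustrative)
pipeline = Pipeline([
    ("select", SelectFromModel(RandomForestClassifier(random_state=0), threshold=0.3)),
    ("forest", RandomForestClassifier(random_state=0)),
])

# Selection is refit on the training part of each fold (cv=5 is an arbitrary choice)
scores = cross_val_score(pipeline, X, y, cv=5)

# Mean accuracy across folds
print(scores.mean())
```

The same pipeline object can then be fit on all the data with `pipeline.fit(X, y)` and used for prediction on raw, unselected feature matrices.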