Title: Feature Importance
Slug: feature_importance
Summary: How to identify important features in a random forest classifier in scikit-learn.
Date: 2017-09-21 12:00
Category: Machine Learning
Tags: Trees And Forests
Authors: Chris Albon

Preliminaries


In [1]:
# Load libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt

Load Iris Flower Dataset


In [2]:
# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

Train A Random Forest Model


In [3]:
# Create random forest classifier object
clf = RandomForestClassifier(random_state=0, n_jobs=-1)

# Train model
model = clf.fit(X, y)
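As a quick sanity check (a sketch, assuming the model was fit on the full dataset as above), the trained forest should score near-perfectly on its own training data. This only confirms the fit worked; it is not an estimate of generalization:

```python
# Load libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

# Load data
iris = datasets.load_iris()
X = iris.data
y = iris.target

# Train model
model = RandomForestClassifier(random_state=0, n_jobs=-1).fit(X, y)

# Accuracy on the training data (optimistic by construction)
print(model.score(X, y))
```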

View Feature Importance


In [4]:
# Calculate feature importances
importances = model.feature_importances_
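Before plotting, it can help to pair each raw importance score with its feature name. scikit-learn's impurity-based importances are normalized to sum to 1, so each value can be read as a share of the total. A minimal sketch:

```python
# Load libraries
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets

# Load data and train model
iris = datasets.load_iris()
model = RandomForestClassifier(random_state=0, n_jobs=-1).fit(iris.data, iris.target)

# Pair each feature name with its importance score
importances = model.feature_importances_
for name, score in zip(iris.feature_names, importances):
    print('{0}: {1:.3f}'.format(name, score))
```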

Visualize Feature Importance


In [5]:
# Sort feature importances in descending order
indices = np.argsort(importances)[::-1]

# Rearrange feature names so they match the sorted feature importances
names = [iris.feature_names[i] for i in indices]

# Create plot
plt.figure()

# Create plot title
plt.title("Feature Importance")

# Add bars
plt.bar(range(X.shape[1]), importances[indices])

# Add feature names as x-axis labels
plt.xticks(range(X.shape[1]), names, rotation=90)

# Show plot
plt.show()