Title: One-Hot Encode Features With Multiple Labels
Slug: one-hot_encode_features_with_multiple_labels
Summary: How to one-hot encode nominal categorical features with multiple labels per observation for machine learning in Python.
Date: 2016-09-06 12:00
Category: Machine Learning
Tags: Preprocessing Structured Data
Authors: Chris Albon
In [4]:
# Load libraries
from sklearn.preprocessing import MultiLabelBinarizer
import numpy as np
In [5]:
# Create a list of observations, each with multiple labels
y = [('Texas', 'Florida'),
     ('California', 'Alabama'),
     ('Texas', 'Florida'),
     ('Delaware', 'Florida'),
     ('Texas', 'Alabama')]
In [6]:
# Create MultiLabelBinarizer object
one_hot = MultiLabelBinarizer()
# One-hot encode data
one_hot.fit_transform(y)
Out[6]:
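The result of `fit_transform` is a binary indicator matrix: one row per observation and one column per distinct label, with a 1 wherever the observation carries that label. The sketch below illustrates this on a smaller, hypothetical three-row sample (the names `labels`, `encoder`, and `encoded` are illustrative and not part of the cells above).

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample: each observation carries two state labels
labels = [('Texas', 'Florida'),
          ('California', 'Alabama'),
          ('Texas', 'Alabama')]

# Fit the binarizer and encode in one step
encoder = MultiLabelBinarizer()
encoded = encoder.fit_transform(labels)

# Columns follow the sorted class order:
# Alabama, California, Florida, Texas
for row_labels, row in zip(labels, encoded):
    print(row_labels, row.tolist())
# ('Texas', 'Florida') [0, 0, 1, 1]
# ('California', 'Alabama') [1, 1, 0, 0]
# ('Texas', 'Alabama') [1, 0, 0, 1]
```

Unlike a standard one-hot encoding, a row here can contain more than one 1, because each observation may belong to several classes at once.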
In [7]:
# View classes
one_hot.classes_
Out[7]:
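The `classes_` attribute holds the distinct labels in sorted order, which is also the column order of the encoded matrix. As a minimal sketch on a hypothetical two-row sample (the variable names are illustrative), the encoding can be mapped back to labels with `inverse_transform`:

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical sample with two labels per observation
labels = [('Texas', 'Florida'),
          ('California', 'Alabama')]

encoder = MultiLabelBinarizer()
encoded = encoder.fit_transform(labels)

# Sorted class names, matching the column order of `encoded`
class_names = list(encoder.classes_)
print(class_names)
# ['Alabama', 'California', 'Florida', 'Texas']

# Recover the label tuples from the indicator matrix
decoded = encoder.inverse_transform(encoded)
print(decoded)
# [('Florida', 'Texas'), ('Alabama', 'California')]
```

Note that the recovered tuples list labels in class order, not in the order they appeared in the input.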