Title: One-Hot Encode Nominal Categorical Features
Slug: one-hot_encode_nominal_categorical_features
Summary: How to one-hot encode nominal categorical features for machine learning in Python.
Date: 2016-09-06 12:00
Category: Machine Learning
Tags: Preprocessing Structured Data
Authors: Chris Albon

Preliminaries


In [2]:
# Load libraries
from sklearn.preprocessing import LabelBinarizer
import numpy as np
import pandas as pd

Create Data With One Class Label


In [3]:
# Create NumPy array
x = np.array([['Texas'], 
              ['California'], 
              ['Texas'], 
              ['Delaware'], 
              ['Texas']])

One-hot Encode Data (Method 1)


In [4]:
# Create LabelBinarizer object
one_hot = LabelBinarizer()

# One-hot encode data
one_hot.fit_transform(x)


Out[4]:
array([[0, 0, 1],
       [1, 0, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 0, 1]])
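
LabelBinarizer is documented by scikit-learn as a tool for target labels, so for input features a comparable result can be had with OneHotEncoder. The following is a minimal sketch of that swapped-in alternative, not the method used above; the names `encoder`, `encoded`, and `categories` are illustrative, and `.toarray()` is used on the assumption that the encoder returns a sparse matrix by default.

```python
# Load libraries
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Same single-feature data as above (one column, five rows)
x = np.array([['Texas'],
              ['California'],
              ['Texas'],
              ['Delaware'],
              ['Texas']])

# Create the encoder (illustrative alternative to LabelBinarizer)
encoder = OneHotEncoder()

# Fit and encode; the default result is sparse, so convert to a dense array
encoded = encoder.fit_transform(x).toarray()

# Categories found for the first (and only) feature, in column order
categories = encoder.categories_[0]
```

Unlike LabelBinarizer, this encoder accepts several feature columns at once and returns floating-point values.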

View Column Headers


In [5]:
# View classes
one_hot.classes_


Out[5]:
array(['California', 'Delaware', 'Texas'],
      dtype='<U10')
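
Because the order of `classes_` matches the column order of the encoded array, the two can be combined into a labeled table, and the encoding can be reversed. A minimal sketch, assuming a one-dimensional view of the feature is passed to the binarizer; the names `encoded`, `labeled`, and `recovered` are illustrative.

```python
# Load libraries
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelBinarizer

# Same data as above, flattened to one dimension
x = np.array([['Texas'], ['California'], ['Texas'], ['Delaware'], ['Texas']])
values = x[:, 0]

# Fit the binarizer and encode
one_hot = LabelBinarizer()
encoded = one_hot.fit_transform(values)

# Attach the class names as column headers
labeled = pd.DataFrame(encoded, columns=one_hot.classes_)

# Reverse the encoding to recover the original class names
recovered = one_hot.inverse_transform(encoded)
```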

One-hot Encode Data (Method 2)


In [6]:
# Create dummy variables from the feature
pd.get_dummies(x[:,0])


Out[6]:
   California  Delaware  Texas
0           0         0      1
1           1         0      0
2           0         0      1
3           0         1      0
4           0         0      1
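
Depending on the pandas version, `get_dummies` may return boolean columns rather than the 0/1 integers shown above. A minimal sketch that requests integers explicitly and adds a column prefix; the prefix `state` and the name `dummies` are illustrative choices, not part of the original example.

```python
# Load libraries
import numpy as np
import pandas as pd

# Same data as above
x = np.array([['Texas'], ['California'], ['Texas'], ['Delaware'], ['Texas']])

# Request integer 0/1 columns and prefix each column name with "state"
dummies = pd.get_dummies(x[:, 0], prefix='state', dtype=int)
```

The resulting columns are named `state_California`, `state_Delaware`, and `state_Texas`.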