Title: Convert Pandas Categorical Data For Scikit-Learn
Slug: convert_pandas_categorical_column_into_integers_for_scikit-learn
Summary: Convert Pandas Categorical Column Into Integers For Scikit-Learn
Date: 2016-11-30 12:00
Category: Machine Learning
Tags: Preprocessing Structured Data
Authors: Chris Albon

Preliminaries


In [1]:
# Import required packages
from sklearn import preprocessing
import pandas as pd

Create DataFrame


In [2]:
raw_data = {'patient': [1, 1, 1, 2, 2],
            'obs': [1, 2, 3, 1, 2],
            'treatment': [0, 1, 0, 1, 0],
            'score': ['strong', 'weak', 'normal', 'weak', 'strong']}
df = pd.DataFrame(raw_data, columns=['patient', 'obs', 'treatment', 'score'])

Fit The Label Encoder


In [3]:
# Create a label (category) encoder object
le = preprocessing.LabelEncoder()

In [4]:
# Fit the encoder to the pandas column
le.fit(df['score'])


Out[4]:
LabelEncoder()

View The Labels


In [5]:
# View the labels the encoder learned (optional); they are stored in sorted order
list(le.classes_)


Out[5]:
['normal', 'strong', 'weak']
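
The integer assigned to each category is its position in `le.classes_`, which holds the unique labels in sorted order. A minimal sketch of that mapping, rebuilding the same `score` values so it runs on its own (`label_to_int` is an illustrative name, not part of scikit-learn):

```python
import pandas as pd
from sklearn import preprocessing

# Same score values as the DataFrame above
scores = pd.Series(['strong', 'weak', 'normal', 'weak', 'strong'])

le = preprocessing.LabelEncoder()
le.fit(scores)

# Position in classes_ is the integer code for that label
label_to_int = {str(label): int(code) for code, label in enumerate(le.classes_)}
print(label_to_int)  # {'normal': 0, 'strong': 1, 'weak': 2}
```

Because the order is alphabetical, the codes do not reflect a ranking such as weak < normal < strong.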

Transform Categories Into Integers


In [6]:
# Apply the fitted encoder to the pandas column
le.transform(df['score'])


Out[6]:
array([1, 2, 0, 2, 1])
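
To keep the encoded values next to the original data, the result can be stored as a new column. The sketch below assumes a column name of `score_encoded` (hypothetical, chosen for illustration) and uses `fit_transform`, which fits and transforms in one call:

```python
import pandas as pd
from sklearn import preprocessing

raw_data = {'patient': [1, 1, 1, 2, 2],
            'obs': [1, 2, 3, 1, 2],
            'treatment': [0, 1, 0, 1, 0],
            'score': ['strong', 'weak', 'normal', 'weak', 'strong']}
df = pd.DataFrame(raw_data, columns=['patient', 'obs', 'treatment', 'score'])

le = preprocessing.LabelEncoder()

# Fit and transform in one step, storing the codes as a new column
df['score_encoded'] = le.fit_transform(df['score'])
print(df[['score', 'score_encoded']])
```

The scikit-learn documentation describes `LabelEncoder` as an encoder for target values; for feature columns it points to `OrdinalEncoder` or `OneHotEncoder` instead.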

Transform Integers Into Categories


In [7]:
# Convert some integers into their category names
list(le.inverse_transform([2, 2, 1]))


Out[7]:
['weak', 'weak', 'strong']
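
A quick way to check the encoder is to decode its own output and compare it with the original column. The sketch below also shows that a label not seen during fitting is rejected; `'unknown'` is a made-up label used only for illustration:

```python
import pandas as pd
from sklearn import preprocessing

scores = pd.Series(['strong', 'weak', 'normal', 'weak', 'strong'])

le = preprocessing.LabelEncoder()
codes = le.fit_transform(scores)

# Decoding the codes should give back the original labels
round_trip = [str(label) for label in le.inverse_transform(codes)]
print(round_trip == scores.tolist())

# A label that was absent during fitting raises a ValueError
try:
    le.transform(['unknown'])
    unseen_rejected = False
except ValueError:
    unseen_rejected = True
print(unseen_rejected)
```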