Title: Convert Pandas Categorical Data For Scikit-Learn
Slug: convert_pandas_categorical_column_into_integers_for_scikit-learn
Summary: Convert Pandas Categorical Column Into Integers For Scikit-Learn
Date: 2016-11-30 12:00
Category: Machine Learning
Tags: Preprocessing Structured Data
Authors: Chris Albon
In [1]:
# Import required packages
from sklearn import preprocessing
import pandas as pd
In [2]:
# Create an example dataframe with a categorical 'score' column
raw_data = {'patient': [1, 1, 1, 2, 2],
            'obs': [1, 2, 3, 1, 2],
            'treatment': [0, 1, 0, 1, 0],
            'score': ['strong', 'weak', 'normal', 'weak', 'strong']}
df = pd.DataFrame(raw_data, columns=['patient', 'obs', 'treatment', 'score'])
In [3]:
# Create a label (category) encoder object
le = preprocessing.LabelEncoder()
In [4]:
# Fit the encoder to the pandas column
le.fit(df['score'])
Out[4]:
LabelEncoder()
In [5]:
# View the labels the encoder learned (optional)
list(le.classes_)
Out[5]:
['normal', 'strong', 'weak']
In [6]:
# Apply the fitted encoder to the pandas column
le.transform(df['score'])
Out[6]:
array([1, 2, 0, 2, 1])
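The transformed values come back as a NumPy array, so to keep the integer codes alongside the original data they can be assigned to a new column. This is a minimal sketch, not part of the original notebook: the column name `score_encoded` is a hypothetical choice, and `fit_transform` is used as a shortcut that combines the fit and transform steps shown above.

```python
from sklearn import preprocessing
import pandas as pd

# Rebuild a small version of the example data so the block runs on its own
df = pd.DataFrame({'patient': [1, 1, 1, 2, 2],
                   'score': ['strong', 'weak', 'normal', 'weak', 'strong']})

le = preprocessing.LabelEncoder()

# 'score_encoded' is a hypothetical column name chosen for illustration
df['score_encoded'] = le.fit_transform(df['score'])

# Plain Python list of the integer codes, one per row
encoded = df['score_encoded'].tolist()
```

The original `score` column is left untouched, which makes it easy to check the codes against the category names.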
In [7]:
# Convert some integers into their category names
list(le.inverse_transform([2, 2, 1]))
Out[7]:
['weak', 'weak', 'strong']
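Because `LabelEncoder` stores its classes in sorted order, each category's integer code is simply its position in `classes_`. The sketch below, which assumes the same five score values as the example above, builds an explicit name-to-code lookup and checks that decoding the codes returns the original labels. The variable names `mapping` and `round_trip` are illustrative only.

```python
from sklearn import preprocessing

scores = ['strong', 'weak', 'normal', 'weak', 'strong']

le = preprocessing.LabelEncoder()
codes = le.fit_transform(scores)

# Each category's code is its index in the sorted classes_ array
mapping = {str(label): int(code) for code, label in enumerate(le.classes_)}

# Decoding the integer codes should give back the original labels
round_trip = [str(label) for label in le.inverse_transform(codes)]
```

Note that the codes follow alphabetical order, not any meaningful ranking of the categories: here `normal` is 0, `strong` is 1, and `weak` is 2.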