Title: Convert A String Categorical Variable With Patsy
Slug: pandas_convert_string_categorical_to_numeric_with_patsy
Summary: Convert A String Categorical Variable With Patsy
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon
In [1]:
import pandas as pd
import patsy
In [2]:
raw_data = {'patient': [1, 1, 1, 0, 0],
            'obs': [1, 2, 3, 1, 2],
            'treatment': [0, 1, 0, 1, 0],
            'score': ['strong', 'weak', 'normal', 'weak', 'strong']}
df = pd.DataFrame(raw_data, columns=['patient', 'obs', 'treatment', 'score'])
df
Out[2]:
In [3]:
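# Build a design matrix from 'score': patsy treats the string column as categorical
# and dummy-codes it (an intercept plus one indicator column per non-reference level),
# returning the result as a dataframe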
patsy.dmatrix('score', df, return_type='dataframe')
Out[3]:
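To see which level patsy treats as the reference, it can help to look at the column names of the design matrix. The following is a minimal sketch, assuming the same data as above (rebuilt here so the snippet runs on its own); the names `design` and `columns` are only illustrative.

```python
import pandas as pd
import patsy

# Same example data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({'patient': [1, 1, 1, 0, 0],
                   'obs': [1, 2, 3, 1, 2],
                   'treatment': [0, 1, 0, 1, 0],
                   'score': ['strong', 'weak', 'normal', 'weak', 'strong']})

# Default formula: intercept plus indicators for the non-reference levels
design = patsy.dmatrix('score', df, return_type='dataframe')

# Column names show which levels received an indicator column
columns = list(design.columns)
print(columns)
```

The level that has no column of its own ('normal' here, the first level in sorted order) is the reference, so rows with that score are zero in both indicator columns.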
In [4]:
# Same conversion, but '- 1' removes the intercept, so every level of 'score'
# gets its own indicator column
patsy.dmatrix('score - 1', df, return_type='dataframe')
Out[4]:
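Without the intercept, the result should be a plain one-hot encoding of the score column. A short sketch to check that, again assuming the same data as above; `one_hot` and `row_sums` are illustrative names, not part of patsy.

```python
import pandas as pd
import patsy

# Same example data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({'patient': [1, 1, 1, 0, 0],
                   'obs': [1, 2, 3, 1, 2],
                   'treatment': [0, 1, 0, 1, 0],
                   'score': ['strong', 'weak', 'normal', 'weak', 'strong']})

# '- 1' drops the intercept, leaving one indicator column per level
one_hot = patsy.dmatrix('score - 1', df, return_type='dataframe')

# In a one-hot encoding, exactly one column is 1 in every row
row_sums = one_hot.sum(axis=1)
print(list(one_hot.columns))
print(row_sums.tolist())
```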
In [5]:
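# Build a design matrix with 'patient', 'treatment', and their interaction
# ('patient:treatment'), without an intercept
patsy.dmatrix('patient + treatment + patient:treatment - 1', df, return_type='dataframe')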
Out[5]:
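For two numeric columns, the interaction term `patient:treatment` should simply be their elementwise product. A minimal sketch to confirm that on the example data, assuming the same dataframe as above; `interaction` and `product` are illustrative names.

```python
import pandas as pd
import patsy

# Same example data as above, rebuilt so this snippet is self-contained
df = pd.DataFrame({'patient': [1, 1, 1, 0, 0],
                   'obs': [1, 2, 3, 1, 2],
                   'treatment': [0, 1, 0, 1, 0],
                   'score': ['strong', 'weak', 'normal', 'weak', 'strong']})

design = patsy.dmatrix('patient + treatment + patient:treatment - 1',
                       df, return_type='dataframe')

# Compare the interaction column with the manual product of the two columns
interaction = design['patient:treatment']
product = df['patient'] * df['treatment']
print((interaction == product).all())
```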