Title: Breaking Up A String Into Columns Using Regex In Pandas
Slug: pandas_regex_to_create_columns
Summary: Breaking Up A String Into Columns Using Regex In Pandas
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon
Based on this tutorial in nbviewer.
In [1]:
import re
import pandas as pd
In [2]:
# Create a dataframe with a single column of strings
data = {'raw': ['Arizona 1 2014-12-23 3242.0',
'Iowa 1 2010-02-23 3453.7',
'Oregon 0 2014-06-20 2123.0',
'Maryland 0 2014-03-14 1123.6',
'Florida 1 2013-01-15 2134.0',
'Georgia 0 2012-07-14 2345.6']}
df = pd.DataFrame(data, columns = ['raw'])
df
Out[2]:
In [3]:
# Which rows of df['raw'] contain a substring shaped like 'xxxx-xx-xx'?
# Note: '.' matches any character, not only digits
df['raw'].str.contains(r'....-..-..', regex=True)
Out[3]:
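Because each dot matches any character, the check in cell 3 would also accept text that only looks like a date. Below is a minimal sketch of a stricter test, assuming dates are always written as four digits, two digits, two digits. The `samples` series is made up for illustration and is not part of the original data.

```python
import pandas as pd

# Hypothetical sample strings, not part of the original data
samples = pd.Series(['Arizona 1 2014-12-23 3242.0',
                     'no date: abcd-ef-gh'])

# '.' matches any character, so both rows count as a match
loose = samples.str.contains(r'....-..-..', regex=True)

# \d{4}-\d{2}-\d{2} requires digits in every date position
strict = samples.str.contains(r'\d{4}-\d{2}-\d{2}', regex=True)
```

Here `loose` flags both rows while `strict` flags only the first.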
In [4]:
# In the column 'raw', extract the first single digit in each string
df['female'] = df['raw'].str.extract(r'(\d)', expand=True)
df['female']
Out[4]:
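The `expand` argument controls what `str.extract` returns. A small sketch of the difference, using a shortened copy of the data (the names `s`, `as_frame`, and `as_series` are illustrative only):

```python
import pandas as pd

s = pd.Series(['Arizona 1 2014-12-23 3242.0',
               'Iowa 1 2010-02-23 3453.7'])

# expand=True returns a DataFrame with one column per capture group
as_frame = s.str.extract(r'(\d)', expand=True)

# expand=False with a single capture group returns a Series
as_series = s.str.extract(r'(\d)', expand=False)
```

Either form can be assigned to a single column when the pattern has one group. With one group, `expand=False` is arguably the more direct choice because the result is already a Series.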
In [5]:
# In the column 'raw', extract the xxxx-xx-xx date from each string
df['date'] = df['raw'].str.extract(r'(....-..-..)', expand=True)
df['date']
Out[5]:
In [6]:
# In the column 'raw', extract ####.# (four digits, a decimal point, one digit) from each string
df['score'] = df['raw'].str.extract(r'(\d\d\d\d\.\d)', expand=True)
df['score']
Out[6]:
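Values returned by `str.extract` are strings, so numeric or date operations need a conversion step first. A hedged sketch of one way to do that, assuming every row contains a well-formed score and date (the variable names are illustrative):

```python
import pandas as pd

raw = pd.Series(['Arizona 1 2014-12-23 3242.0',
                 'Iowa 1 2010-02-23 3453.7'])

# Extract the score as text, then convert it to a float column
score = pd.to_numeric(raw.str.extract(r'(\d{4}\.\d)', expand=False))

# Extract the date as text, then parse it into datetime64 values
date = pd.to_datetime(
    raw.str.extract(r'(\d{4}-\d{2}-\d{2})', expand=False),
    format='%Y-%m-%d',
)
```

After conversion, `score` supports arithmetic and `date` exposes accessors such as `date.dt.year`.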
In [7]:
# In the column 'raw', extract the first capitalized word (the state name) from each string
df['state'] = df['raw'].str.extract(r'([A-Z]\w*)', expand=True)
df['state']
Out[7]:
In [8]:
df
Out[8]:
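The four separate extractions can also be combined into one pass with named capture groups, which `str.extract` turns into column names. This is a sketch of an alternative, not the approach used above. It assumes every row follows the same "state flag date score" layout with single spaces between fields.

```python
import pandas as pd

raw = pd.Series(['Arizona 1 2014-12-23 3242.0',
                 'Georgia 0 2012-07-14 2345.6'])

# Each (?P<name>...) group becomes a column called `name`
pattern = (r'(?P<state>[A-Z]\w*) '
           r'(?P<female>\d) '
           r'(?P<date>\d{4}-\d{2}-\d{2}) '
           r'(?P<score>\d+\.\d)')

parts = raw.str.extract(pattern, expand=True)
```

Rows that do not match the whole pattern come back as missing values in every column, so this version is stricter than extracting each field independently.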