Title: Create A Pipeline In Pandas
Slug: pandas_create_pipeline
Summary: Create a pipeline in pandas.
Date: 2017-01-16 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon
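The pandas `DataFrame.pipe` method lets you string together Python functions to build a data processing pipeline. Each `.pipe(func, ...)` call passes the current DataFrame to `func`, and the value `func` returns becomes the input to the next step.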
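As a rough mental model, `df.pipe(func, *args, **kwargs)` calls `func(df, *args, **kwargs)` and returns the result. That means a chain of `.pipe` calls reads top to bottom instead of inside-out like nested calls. The minimal sketch below uses two hypothetical helpers, `add_one` and `scale`, that are not part of the original post:

```python
import pandas as pd

# Hypothetical helper used only for this illustration
def add_one(frame):
    # Return a new DataFrame with 1 added to every value
    return frame + 1

# Hypothetical helper used only for this illustration
def scale(frame, factor):
    # Return a new DataFrame with every value multiplied by factor
    return frame * factor

df = pd.DataFrame({'x': [1, 2, 3]})

# Nested calls read inside-out
nested = scale(add_one(df), factor=10)

# The same computation written as a pipeline reads in execution order
piped = df.pipe(add_one).pipe(scale, factor=10)

print(piped.equals(nested))  # True
```

Extra positional and keyword arguments given to `.pipe` (here `factor=10`) are forwarded to the function after the DataFrame.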
```python
import pandas as pd
```
```python
# Create an empty DataFrame
df = pd.DataFrame()

# Add columns
df['name'] = ['John', 'Steve', 'Sarah']
df['gender'] = ['Male', 'Male', 'Female']
df['age'] = [31, 32, 19]

# View the DataFrame
df
```
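```python
# Create a function that groups the data by a column
# and returns the mean age for each group
def mean_age_by_group(dataframe, col):
    # Select the numeric 'age' column before averaging; calling .mean()
    # on the non-numeric 'name' column raises a TypeError in pandas >= 2.0
    return dataframe.groupby(col)[['age']].mean()
```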
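```python
# Create a function that capitalizes all the column headers
def uppercase_column_name(dataframe):
    # Upper-case the column names (this modifies `dataframe` in place)
    dataframe.columns = dataframe.columns.str.upper()
    # Return the DataFrame so the next step in the pipeline can use it
    return dataframe
```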
```python
# Create a pipeline that first applies mean_age_by_group
(df.pipe(mean_age_by_group, col='gender')
   # then applies uppercase_column_name to the result
   .pipe(uppercase_column_name)
)
```
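Every function used so far takes the DataFrame as its first argument. If a function expects the DataFrame under some other parameter, `pipe` also accepts a `(callable, data_keyword)` tuple naming the keyword that should receive it. A minimal sketch, assuming a hypothetical `mean_age_by(col, data)` variant of the grouping function above:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Steve', 'Sarah'],
    'gender': ['Male', 'Male', 'Female'],
    'age': [31, 32, 19],
})

# Hypothetical function whose DataFrame argument is not the first parameter
def mean_age_by(col, data):
    # Average the 'age' column within each group of `col`
    return data.groupby(col)[['age']].mean()

# The tuple tells pipe to pass df as the `data` keyword argument,
# so this runs mean_age_by('gender', data=df)
result = df.pipe((mean_age_by, 'data'), 'gender')
print(result)
```

This avoids writing a wrapper or `lambda` just to reorder arguments when calling third-party functions in a pipeline.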