Title: Create A Pipeline In Pandas
Slug: pandas_create_pipeline
Summary: Create a pipeline in pandas.
Date: 2017-01-16 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon

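The pandas pipe method lets you chain together Python functions that each take a dataframe and return a dataframe, building a data processing pipeline. Calling df.pipe(func, ...) is the same as calling func(df, ...), so each step reads left to right in the order it runs.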

Preliminaries


In [1]:
import pandas as pd

Create Dataframe


In [2]:
# Create empty dataframe
df = pd.DataFrame()

# Add three columns
df['name'] = ['John', 'Steve', 'Sarah']
df['gender'] = ['Male', 'Male', 'Female']
df['age'] = [31, 32, 19]

# View dataframe
df


Out[2]:
name gender age
0 John Male 31
1 Steve Male 32
2 Sarah Female 19

Create Functions To Process Data


In [3]:
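# Create a function that groups the data by a column
# and returns the mean of the numeric columns (here, age) per group
def mean_age_by_group(dataframe, col):
    # numeric_only=True skips text columns such as name;
    # pandas 2.0+ raises an error on them instead of dropping them
    return dataframe.groupby(col).mean(numeric_only=True)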

In [4]:
# Create a function that uppercases all the column headers
def uppercase_column_name(dataframe):
    # Uppercase the column names (this modifies the dataframe passed in)
    dataframe.columns = dataframe.columns.str.upper()
    # And return the dataframe
    return dataframe
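
Because uppercase_column_name assigns to dataframe.columns, it changes whatever dataframe it is given. That is harmless when the step receives a freshly computed result, but it can surprise you if the step is applied directly to the original data. A minimal sketch of a non-mutating variant follows; the name uppercase_column_name_copy and the small example dataframe are hypothetical and not part of the original post.

```python
import pandas as pd

# Hypothetical non-mutating variant of uppercase_column_name
def uppercase_column_name_copy(dataframe):
    # rename returns a new dataframe and leaves the input unchanged
    return dataframe.rename(columns=str.upper)

# Small illustrative dataframe (assumption, for demonstration only)
df = pd.DataFrame({'name': ['John', 'Steve', 'Sarah'], 'age': [31, 32, 19]})

result = df.pipe(uppercase_column_name_copy)

print(list(result.columns))  # ['NAME', 'AGE']
print(list(df.columns))      # ['name', 'age']
```

Either style works inside a pipeline; returning a new dataframe simply makes each step safer to reuse on its own.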

Create A Pipeline Of Those Functions


In [5]:
# Create a pipeline that applies the mean_age_by_group function
(df.pipe(mean_age_by_group, col='gender')
   # then applies the uppercase column name function
   .pipe(uppercase_column_name)
)


Out[5]:
AGE
gender
Female 19.0
Male 31.5
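
To see what the pipeline is doing, it can help to compare it with the equivalent nested function calls. The sketch below rebuilds the example in one self-contained block and checks that both forms give the same result. It assumes a pandas version that accepts numeric_only=True in the grouped mean, which keeps the text column name out of the average.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Steve', 'Sarah'],
    'gender': ['Male', 'Male', 'Female'],
    'age': [31, 32, 19],
})

def mean_age_by_group(dataframe, col):
    # numeric_only=True is an assumption added here so that the
    # text column (name) is skipped when averaging
    return dataframe.groupby(col).mean(numeric_only=True)

def uppercase_column_name(dataframe):
    dataframe.columns = dataframe.columns.str.upper()
    return dataframe

# Pipeline form: steps read left to right
piped = df.pipe(mean_age_by_group, col='gender').pipe(uppercase_column_name)

# Nested form: same steps, read inside out
nested = uppercase_column_name(mean_age_by_group(df, col='gender'))

print(piped.equals(nested))  # True
```

The two forms are interchangeable; the pipe version mainly pays off as the number of steps grows and nested calls become hard to read.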