Title: Create A Pipeline In Pandas
Slug: pandas_create_pipeline
Summary: Create a pipeline in pandas.
Date: 2017-01-16 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon

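The pandas DataFrame.pipe method lets you chain Python functions together to build a data processing pipeline. Each function receives a DataFrame as its first argument and returns the object that is passed to the next step.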
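As a minimal sketch of what pipe does, the example below compares calling a function directly with passing it to pipe. The helper add_amount and the frame numbers are hypothetical names used only for illustration; they are not part of pandas or of the example that follows.

```python
import pandas as pd

# Hypothetical helper, for illustration only: adds `amount` to every value
def add_amount(dataframe, amount):
    return dataframe + amount

numbers = pd.DataFrame({'x': [1, 2, 3]})

# Calling the function directly...
direct = add_amount(numbers, amount=10)

# ...gives the same result as passing it to pipe, which supplies the
# DataFrame as the first argument and forwards the remaining arguments
piped = numbers.pipe(add_amount, amount=10)

print(piped.equals(direct))
```

The benefit of pipe shows up when several steps are chained: the calls read top to bottom in the order they run, instead of being nested inside one another.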

Preliminaries



import pandas as pd

Create Dataframe



# Create empty dataframe
df = pd.DataFrame()

# Create the columns
df['name'] = ['John', 'Steve', 'Sarah']
df['gender'] = ['Male', 'Male', 'Female']
df['age'] = [31, 32, 19]

# View dataframe
df

        name  gender  age
    0   John    Male   31
    1  Steve    Male   32
    2  Sarah  Female   19

Create Functions To Process Data



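# Create a function that groups the data by a column
# and returns the mean of the numeric columns (here, age) per group
def mean_age_by_group(dataframe, col):
    # numeric_only=True skips text columns such as name, which
    # raise a TypeError when averaged in pandas 2.0 and later
    return dataframe.groupby(col).mean(numeric_only=True)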



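# Create a function that uppercases all the column headers
def uppercase_column_name(dataframe):
    # Uppercase the column names (this modifies the frame that was passed in)
    dataframe.columns = dataframe.columns.str.upper()
    # Return the dataframe so the next step in the pipeline can use it
    return dataframe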
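
A pipeline step that assigns to .columns changes the frame it receives, which can matter if that frame is used again elsewhere. The sketch below contrasts that behavior with a variant built on rename, which returns a new frame. The function names uppercase_in_place and uppercase_copy and the two small frames are hypothetical and exist only for this comparison.

```python
import pandas as pd

def uppercase_in_place(dataframe):
    # Assigning to .columns changes the frame that was passed in
    dataframe.columns = dataframe.columns.str.upper()
    return dataframe

def uppercase_copy(dataframe):
    # rename returns a new frame and leaves the input untouched
    return dataframe.rename(columns=str.upper)

first = pd.DataFrame({'age': [31, 32, 19]})
second = pd.DataFrame({'age': [31, 32, 19]})

first.pipe(uppercase_in_place)
second_result = second.pipe(uppercase_copy)

print(list(first.columns))          # ['AGE']: the original frame was modified
print(list(second.columns))         # ['age']: the original frame is unchanged
print(list(second_result.columns))  # ['AGE']
```

Either style works inside a pipeline. The copying variant is the safer choice when the input frame is still needed in its original form.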

Create A Pipeline Of Those Functions



# Create a pipeline that applies the mean_age_by_group function
(df.pipe(mean_age_by_group, col='gender')
   # then applies the uppercase column name function
   .pipe(uppercase_column_name)
)
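
To check what the pipeline returns, the following self-contained sketch rebuilds the example data and compares the piped result with the equivalent nested function calls. It assumes a pandas version in which mean accepts numeric_only=True; the variable names piped and nested are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Steve', 'Sarah'],
    'gender': ['Male', 'Male', 'Female'],
    'age': [31, 32, 19],
})

def mean_age_by_group(dataframe, col):
    # numeric_only=True skips the text column 'name'
    return dataframe.groupby(col).mean(numeric_only=True)

def uppercase_column_name(dataframe):
    dataframe.columns = dataframe.columns.str.upper()
    return dataframe

# The pipeline, written as a chain of pipe calls
piped = df.pipe(mean_age_by_group, col='gender').pipe(uppercase_column_name)

# The same computation, written as nested function calls
nested = uppercase_column_name(mean_age_by_group(df, col='gender'))

print(piped)
print(piped.equals(nested))
```

The result has one row per gender and a single AGE column, with a mean age of 19.0 for Female and 31.5 for Male. The index keeps the lowercase name gender, because uppercase_column_name only touches the column labels.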