Title: Applying Operations Over pandas Dataframes
Slug: pandas_apply_operations_to_dataframes
Summary: Applying Operations Over pandas Dataframes
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon

Import Modules


In [1]:
import pandas as pd
import numpy as np

Create a dataframe


In [2]:
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'], 
        'year': [2012, 2012, 2013, 2014, 2014], 
        'reports': [4, 24, 31, 2, 3],
        'coverage': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, index = ['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])
df


Out[2]:
            coverage   name  reports  year
Cochice           25  Jason        4  2012
Pima              94  Molly       24  2012
Santa Cruz        57   Tina       31  2013
Maricopa          62   Jake        2  2014
Yuma              70    Amy        3  2014

Create a capitalization lambda function


In [3]:
capitalizer = lambda x: x.upper()

Apply the capitalizer function over the column 'name'

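On a dataframe, apply() runs a function along an axis (once per column or once per row). On a single column such as df['name'], which is a series, it calls the function on each value.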


In [4]:
df['name'].apply(capitalizer)


Out[4]:
Cochice       JASON
Pima          MOLLY
Santa Cruz     TINA
Maricopa       JAKE
Yuma            AMY
Name: name, dtype: object
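
The call above works on a series, so the axis behavior of apply() is not visible there. A minimal sketch of apply() on a dataframe, assuming a small numeric frame (the name scores is hypothetical) rebuilt from the example's reports and coverage columns:

```python
import pandas as pd

# Hypothetical frame holding only the numeric columns of the example
scores = pd.DataFrame(
    {'reports': [4, 24, 31, 2, 3], 'coverage': [25, 94, 57, 62, 70]},
    index=['Cochice', 'Pima', 'Santa Cruz', 'Maricopa', 'Yuma'])

# axis=0 (the default): the function receives each column as a series
column_max = scores.apply(lambda col: col.max())

# axis=1: the function receives each row as a series
row_sum = scores.apply(lambda row: row.sum(), axis=1)
```

Here column_max has one entry per column and row_sum has one entry per county.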

Map the capitalizer lambda function over each element in the series 'name'

map() applies an operation over each element of a series


In [5]:
df['name'].map(capitalizer)


Out[5]:
Cochice       JASON
Pima          MOLLY
Santa Cruz     TINA
Maricopa       JAKE
Yuma            AMY
Name: name, dtype: object
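
map() also accepts a dictionary in place of a function, and for plain string operations the .str accessor gives the same result as the capitalizer lambda. A rough sketch, assuming a shortened copy of the name column and a made-up lookup table called initials:

```python
import pandas as pd

names = pd.Series(['Jason', 'Molly', 'Tina'],
                  index=['Cochice', 'Pima', 'Santa Cruz'])

# Hypothetical lookup table; 'Tina' is left out on purpose
initials = {'Jason': 'J', 'Molly': 'M'}

# Values without a matching key become NaN
mapped = names.map(initials)

# Vectorized string method, equivalent to mapping lambda x: x.upper()
upper = names.str.upper()
```

The dictionary form is handy for recoding values, but any value missing from the dictionary comes back as NaN rather than unchanged.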

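Apply a square root function to every single cell in the whole dataframe

applymap() applies a function to every single element in the entire dataframe. In pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map(), which does the same thing.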


In [6]:
# Drop the string column, because np.sqrt cannot be applied to strings
df = df.drop('name', axis=1)

# Return the square root of every cell in the dataframe
df.applymap(np.sqrt)


Out[6]:
            coverage   reports       year
Cochice     5.000000  2.000000  44.855323
Pima        9.695360  4.898979  44.855323
Santa Cruz  7.549834  5.567764  44.866469
Maricopa    7.874008  1.414214  44.877611
Yuma        8.366600  1.732051  44.877611
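
Because DataFrame.map() was added in pandas 2.1 as the replacement for applymap(), code that has to run on both older and newer versions may need to choose between them. One possible sketch, assuming a two-row numeric frame (the names numeric and elementwise are hypothetical):

```python
import numpy as np
import pandas as pd

numeric = pd.DataFrame({'coverage': [25, 94], 'reports': [4, 24]},
                       index=['Cochice', 'Pima'])

# Use DataFrame.map where it exists (pandas 2.1+), else fall back to applymap
elementwise = numeric.map if hasattr(numeric, 'map') else numeric.applymap
roots = elementwise(np.sqrt)

# NumPy ufuncs such as np.sqrt also accept the whole dataframe directly
vectorized = np.sqrt(numeric)
```

For NumPy functions like np.sqrt, the direct call usually makes the element-wise method unnecessary, and it tends to be faster on large frames.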

Applying A Function Over A Dataframe

Create a function that multiplies all non-strings by 100


In [7]:
# create a function called times100
def times100(x):
    # that, if x is a string,
    if isinstance(x, str):
        # just returns it untouched
        return x
    # but, if it is any other non-missing value (including 0),
    elif x is not None:
        # returns it multiplied by 100
        return 100 * x
    # and leaves everything else (None) untouched
    else:
        return x

Apply the times100 function to every cell in the dataframe


In [8]:
df.applymap(times100)


Out[8]:
            coverage  reports    year
Cochice         2500      400  201200
Pima            9400     2400  201200
Santa Cruz      5700     3100  201300
Maricopa        6200      200  201400
Yuma            7000      300  201400
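
The string check in times100 only matters when the dataframe still contains text columns. An alternative that avoids the per-cell function is to pick out the numeric columns and multiply them in one step. A minimal sketch, assuming a small mixed-type frame (the names mixed and scaled are hypothetical):

```python
import pandas as pd

mixed = pd.DataFrame(
    {'name': ['Jason', 'Molly'], 'reports': [4, 24], 'coverage': [25, 94]},
    index=['Cochice', 'Pima'])

# Find the numeric columns; string columns such as 'name' are excluded
numeric_cols = mixed.select_dtypes(include='number').columns

# Work on a copy and multiply only the numeric columns by 100
scaled = mixed.copy()
scaled[numeric_cols] = mixed[numeric_cols] * 100
```

This is a different technique from the element-wise function above, not a replacement for it: it handles whole columns by dtype, so it will not help when a single column mixes strings and numbers.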