Title: Make New Columns Using Functions
Slug: pandas_make_new_columns_using_functions
Summary: Create new pandas DataFrame columns by applying functions to existing columns
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon


In [1]:
# Import modules
import pandas as pd

In [2]:
# Example dataframe
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'], 
        'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'], 
        'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
        'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
df


Out[2]:
      regiment  company      name  preTestScore  postTestScore
0   Nighthawks      1st    Miller             4             25
1   Nighthawks      1st  Jacobson            24             94
2   Nighthawks      2nd       Ali            31             57
3   Nighthawks      2nd    Milner             2             62
4     Dragoons      1st     Cooze             3             70
5     Dragoons      1st     Jacon             4             25
6     Dragoons      2nd    Ryaner            24             94
7     Dragoons      2nd      Sone            31             57
8       Scouts      1st     Sloan             2             62
9       Scouts      1st     Piger             3             70
10      Scouts      2nd     Riani             2             62
11      Scouts      2nd       Ali             3             70

Create one column as a function of two columns


In [3]:
# Create a function that takes two inputs, pre and post
def pre_post_difference(pre, post):
    # returns the difference between post and pre
    return post - pre

In [4]:
# Create a new column from the output of the function
df['score_change'] = pre_post_difference(df['preTestScore'], df['postTestScore'])

# View the dataframe
df


Out[4]:
      regiment  company      name  preTestScore  postTestScore  score_change
0   Nighthawks      1st    Miller             4             25            21
1   Nighthawks      1st  Jacobson            24             94            70
2   Nighthawks      2nd       Ali            31             57            26
3   Nighthawks      2nd    Milner             2             62            60
4     Dragoons      1st     Cooze             3             70            67
5     Dragoons      1st     Jacon             4             25            21
6     Dragoons      2nd    Ryaner            24             94            70
7     Dragoons      2nd      Sone            31             57            26
8       Scouts      1st     Sloan             2             62            60
9       Scouts      1st     Piger             3             70            67
10      Scouts      2nd     Riani             2             62            60
11      Scouts      2nd       Ali             3             70            67
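
The call above works because subtracting one pandas Series from another is element-wise, so passing whole columns to pre_post_difference returns a whole column. The sketch below, which assumes a small made-up frame named scores with only the two score columns, compares that column-wise call with a row-wise apply, which is slower but also works for functions that only accept single values.

```python
import pandas as pd

# Hypothetical three-row frame, used only for illustration
scores = pd.DataFrame({'preTestScore': [4, 24, 31],
                       'postTestScore': [25, 94, 57]})

def pre_post_difference(pre, post):
    # returns the difference between post and pre
    return post - pre

# Column-wise: the function receives two Series and returns a Series
vectorized = pre_post_difference(scores['preTestScore'], scores['postTestScore'])

# Row-wise: the function receives two scalars per row
row_wise = scores.apply(
    lambda row: pre_post_difference(row['preTestScore'], row['postTestScore']),
    axis=1)

print(vectorized.tolist())
print(row_wise.tolist())
```

Both approaches should give the same values here. The column-wise form is usually preferable when the function body consists only of operations that pandas already applies element-wise.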

Create two columns as a function of one column


In [5]:
# Create a function that takes one input, x
def score_multiplier_2x_and_3x(x):
    # returns two things, x multiplied by 2 and x multiplied by 3
    return x*2, x*3

In [6]:
# Create two new columns from the two outputs of the function
df['post_score_x2'], df['post_score_x3'] = zip(*df['postTestScore'].map(score_multiplier_2x_and_3x))
df


Out[6]:
      regiment  company      name  preTestScore  postTestScore  score_change  post_score_x2  post_score_x3
0   Nighthawks      1st    Miller             4             25            21             50             75
1   Nighthawks      1st  Jacobson            24             94            70            188            282
2   Nighthawks      2nd       Ali            31             57            26            114            171
3   Nighthawks      2nd    Milner             2             62            60            124            186
4     Dragoons      1st     Cooze             3             70            67            140            210
5     Dragoons      1st     Jacon             4             25            21             50             75
6     Dragoons      2nd    Ryaner            24             94            70            188            282
7     Dragoons      2nd      Sone            31             57            26            114            171
8       Scouts      1st     Sloan             2             62            60            124            186
9       Scouts      1st     Piger             3             70            67            140            210
10      Scouts      2nd     Riani             2             62            60            124            186
11      Scouts      2nd       Ali             3             70            67            140            210
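
In the cell above, map calls the function once per value and produces a Series of (x2, x3) tuples; zip(*...) then regroups those pairs into one sequence of doubled values and one of tripled values. As a rough sketch of two alternatives, the example below uses a made-up three-value Series named post and a hypothetical helper double_and_triple that stands in for the function defined earlier.

```python
import pandas as pd

# Hypothetical three-value column, used only for illustration
post = pd.Series([25, 94, 57], name='postTestScore')

def double_and_triple(x):
    # returns two things, x multiplied by 2 and x multiplied by 3
    return x * 2, x * 3

# Alternative 1: pass the whole Series; multiplication is element-wise,
# so the function returns a tuple of two Series
x2, x3 = double_and_triple(post)

# Alternative 2: map element by element, then expand the resulting
# tuples into a two-column DataFrame aligned on the original index
expanded = pd.DataFrame(post.map(double_and_triple).tolist(),
                        index=post.index,
                        columns=['post_score_x2', 'post_score_x3'])

print(x2.tolist(), x3.tolist())
print(expanded)
```

The first alternative only works when every operation inside the function accepts a Series. The second keeps the per-value call and so should also suit functions that need scalar inputs.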