Title: Apply Operations To Groups In Pandas
Slug: pandas_apply_operations_to_groups
Summary: Apply Operations To Groups In Pandas
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon
In [1]:
# import modules
import pandas as pd
In [2]:
# Create dataframe
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
            'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd'],
            'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
            'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
            'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns=['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
df
In [3]:
# Create a groupby variable that groups the preTestScore values by regiment
groupby_regiment = df['preTestScore'].groupby(df['regiment'])
groupby_regiment
"This grouped variable is now a GroupBy object. It has not actually computed anything yet except for some intermediate data about the group key df['key1']. The idea is that this object has all of the information needed to then apply some operation to each of the groups." - Python for Data Analysis (the book's group key df['key1'] corresponds to df['regiment'] here)
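You can inspect a GroupBy object before running any aggregation on it. The sketch below uses a small hypothetical subset of the data; `ngroups` reports how many distinct keys were found, and `groups` maps each key to the index labels of its rows.

```python
import pandas as pd

# Hypothetical four-row subset of the tutorial's data
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Scouts'],
    'preTestScore': [4, 24, 3, 2],
})

grouped = df['preTestScore'].groupby(df['regiment'])

# No aggregation has run yet; the GroupBy only records how rows map to groups
print(grouped.ngroups)  # number of distinct regiments
print(grouped.groups)   # dict-like: regiment -> index labels of its rows
```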
In [4]:
# View each group as a (regiment, preTestScore Series) pair
list(df['preTestScore'].groupby(df['regiment']))
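Iterating a GroupBy yields `(key, subset)` pairs, sorted by key by default. When you only need one group, `get_group()` selects it by key. A minimal sketch on a hypothetical subset of the data:

```python
import pandas as pd

# Hypothetical four-row subset of the tutorial's data
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Scouts'],
    'preTestScore': [4, 24, 3, 2],
})

grouped = df['preTestScore'].groupby(df['regiment'])

# list() materializes the (group key, sub-Series) pairs
pairs = list(grouped)
keys = [key for key, _ in pairs]

# get_group() pulls out a single group's rows by its key
nighthawks = grouped.get_group('Nighthawks')
```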
In [5]:
# Summary statistics of preTestScore for each regiment
df['preTestScore'].groupby(df['regiment']).describe()
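`describe()` returns the full set of summary statistics for every group. If you only need a few of them, passing a list of function names to `agg()` is one way to pick them. A sketch, assuming a small hypothetical subset of the data:

```python
import pandas as pd

# Hypothetical subset of the tutorial's data
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons'],
    'preTestScore': [4, 24, 3, 5],
})

grouped = df['preTestScore'].groupby(df['regiment'])

# describe(): one row per group with count, mean, std, min, quartiles, and max
summary = grouped.describe()

# agg() with a list computes only the statistics you ask for
selected = grouped.agg(['count', 'mean', 'max'])
```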
In [6]:
# Mean preTestScore for each regiment
groupby_regiment.mean()
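The group mean is just the group sum divided by the number of non-null values in that group. This sketch, on a hypothetical subset of the data, checks that identity:

```python
import pandas as pd

# Hypothetical subset of the tutorial's data
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Scouts'],
    'preTestScore': [4, 24, 3, 4, 2],
})

grouped = df['preTestScore'].groupby(df['regiment'])

# Built-in group mean
means = grouped.mean()

# The same thing computed by hand: sum of each group / non-null count of each group
manual = grouped.sum() / grouped.count()
```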
In [7]:
# Mean preTestScore for each regiment and company combination
df['preTestScore'].groupby([df['regiment'], df['company']]).mean()
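Grouping by two keys gives a result indexed by a two-level MultiIndex, so a single (regiment, company) cell can be selected with a tuple. A minimal sketch on a hypothetical subset of the data:

```python
import pandas as pd

# Hypothetical subset of the tutorial's data
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons'],
    'company': ['1st', '1st', '2nd', '1st'],
    'preTestScore': [4, 24, 31, 3],
})

# Two grouping keys -> Series indexed by (regiment, company)
means = df['preTestScore'].groupby([df['regiment'], df['company']]).mean()

# Select one combination with a tuple of both index levels
first_company = means.loc[('Nighthawks', '1st')]
```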
In [8]:
# The same means, with company pivoted into columns
df['preTestScore'].groupby([df['regiment'], df['company']]).mean().unstack()
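`unstack()` moves the inner index level (company) into the columns. `pivot_table()` can build the same regiment-by-company table in one call; the sketch below, on a hypothetical subset of the data, compares the two. The comparison ignores dtypes, since `pivot_table()` may store whole-number means differently.

```python
import pandas as pd

# Hypothetical subset of the tutorial's data
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons'],
    'company': ['1st', '1st', '2nd', '1st', '2nd'],
    'preTestScore': [4, 24, 31, 3, 24],
})

# groupby + unstack: inner level (company) becomes the columns
wide = df['preTestScore'].groupby([df['regiment'], df['company']]).mean().unstack()

# pivot_table: regiment rows, company columns, mean of preTestScore in each cell
pivoted = df.pivot_table(values='preTestScore', index='regiment',
                         columns='company', aggfunc='mean')
```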
In [9]:
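# Mean of every numeric column for each regiment and company
# (numeric_only=True skips the text column 'name'; recent pandas raises an error without it)
df.groupby(['regiment', 'company']).mean(numeric_only=True)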
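When you want different statistics for different columns, named aggregation in `agg()` lets you choose both the input column and the function, and name the output column. Only the listed columns are aggregated, so text columns such as `name` never get in the way. A sketch on a hypothetical subset of the data; the output names `mean_pre` and `max_post` are illustrative.

```python
import pandas as pd

# Hypothetical subset of the tutorial's data
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons'],
    'name': ['Miller', 'Jacobson', 'Cooze', 'Jacon'],
    'preTestScore': [4, 24, 3, 4],
    'postTestScore': [25, 94, 70, 25],
})

# Named aggregation: output_column=(input_column, aggregation)
summary = df.groupby('regiment').agg(
    mean_pre=('preTestScore', 'mean'),
    max_post=('postTestScore', 'max'),
)
```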
In [10]:
# Number of rows in each regiment and company combination
df.groupby(['regiment', 'company']).size()
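`size()` counts rows, while `count()` counts non-null values. The two only differ when there is missing data, so this sketch adds a hypothetical missing score to show the difference:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Nighthawks', 'Dragoons'],
    'preTestScore': [4, np.nan, 3],  # hypothetical missing score
})

grouped = df.groupby('regiment')

# size(): every row in each group, missing values included
sizes = grouped.size()

# count(): only the non-null values in the selected column
counts = grouped['preTestScore'].count()
```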
In [11]:
# Group the dataframe by regiment, and for each regiment,
for name, group in df.groupby('regiment'):
    # print the name of the regiment
    print(name)
    # print the data of that regiment
    print(group)
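Because iterating a GroupBy yields `(key, sub-DataFrame)` pairs, those pairs can also be collected into a dictionary that maps each regiment to its rows. A minimal sketch on a hypothetical subset of the data:

```python
import pandas as pd

# Hypothetical subset of the tutorial's data
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Dragoons', 'Nighthawks', 'Scouts'],
    'name': ['Miller', 'Cooze', 'Ali', 'Sloan'],
})

# Each (key, sub-DataFrame) pair becomes one dict entry
pieces = dict(list(df.groupby('regiment')))

# Look up one regiment's rows directly
nighthawks = pieces['Nighthawks']
```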
In [12]:
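# Group the columns by dtype name (groupby(..., axis=1) is deprecated in recent pandas)
[(dtype, df[cols.index]) for dtype, cols in df.dtypes.groupby(df.dtypes.astype(str))]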
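If the goal is simply to split the columns into numeric and non-numeric sets, `select_dtypes()` is an alternative that selects columns by dtype without grouping them. A sketch on a hypothetical subset of the data:

```python
import pandas as pd

# Hypothetical subset of the tutorial's data
df = pd.DataFrame({
    'regiment': ['Nighthawks', 'Dragoons'],
    'preTestScore': [4, 3],
    'postTestScore': [25, 70],
})

# Numeric columns only
numeric = df.select_dtypes(include='number')

# Everything that is not numeric (here, the text column)
non_numeric = df.select_dtypes(exclude='number')
```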
In [13]:
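# Mean of each numeric column per regiment, with 'mean_' prepended to the column names
df.groupby('regiment').mean(numeric_only=True).add_prefix('mean_')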
In [14]:
# Return the min, max, count, and mean of a group as a dict
def get_stats(group):
    return {'min': group.min(), 'max': group.max(), 'count': group.count(), 'mean': group.mean()}
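`apply()` calls `get_stats` once per group, passing that group's values as a Series. Calling the function on a plain Series shows what each call returns; the input values here are a hypothetical sample of scores.

```python
import pandas as pd

# Same helper as in the tutorial: summary statistics of one group as a dict
def get_stats(group):
    return {'min': group.min(), 'max': group.max(), 'count': group.count(), 'mean': group.mean()}

# What a single apply() call would see and return for one group
stats = get_stats(pd.Series([25, 94, 57]))
```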
In [15]:
# Bin postTestScore into four labeled categories
bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']
df['categories'] = pd.cut(df['postTestScore'], bins, labels=group_names)
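By default `pd.cut` builds right-closed bins, so with these edges the intervals are (0, 25], (25, 50], (50, 75], and (75, 100]. A score on an edge falls into the lower bin, and a score of exactly 0 falls outside every bin. A sketch with a few hypothetical edge-case scores:

```python
import pandas as pd

bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']

# Hypothetical scores chosen to sit on or near the bin edges
scores = pd.Series([25, 26, 75, 100, 0])
labels = pd.cut(scores, bins, labels=group_names)
```

Passing `include_lowest=True` to `pd.cut` would make the first interval include its left edge, so 0 would land in 'Low'.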
In [16]:
# Apply get_stats to the postTestScore values in each category, then spread the stats into columns
df['postTestScore'].groupby(df['categories']).apply(get_stats).unstack()
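The same four statistics can also come from `agg()` with a list of function names, which returns a DataFrame directly with no custom function or `unstack()` needed. This sketch uses a hypothetical subset of the scores and passes `observed=False` explicitly so that the empty 'Okay' category still appears as a row.

```python
import pandas as pd

# Hypothetical subset of the tutorial's postTestScore values
df = pd.DataFrame({'postTestScore': [25, 94, 57, 62, 70]})
df['categories'] = pd.cut(df['postTestScore'], [0, 25, 50, 75, 100],
                          labels=['Low', 'Okay', 'Good', 'Great'])

# One row per category (including empty ones), one column per statistic
stats = (df.groupby('categories', observed=False)['postTestScore']
           .agg(['min', 'max', 'count', 'mean']))
```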