Title: Group Data By Time
Slug: pandas_group_data_by_time
Summary: Group Data By Time
Date: 2016-03-11 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon

On March 13, 2016, version 0.18.0 of Pandas was released, with significant changes in how the resampling function operates. This tutorial follows v0.18.0 and will not work for previous versions of pandas.

First let's load the modules we care about

Preliminaries



In [1]:

    
# Import required packages
import pandas as pd
import datetime
import numpy as np

Next, let's create some sample data that we can group by time as an sample. In this example I am creating a dataframe with two columns with 365 rows. One column is a date, the second column is a numeric value.

Create Data



In [2]:

    
# Create a datetime variable for today
base = datetime.datetime.today()
# Create a list variable that creates 365 days of rows of datetime values
date_list = [base - datetime.timedelta(days=x) for x in range(0, 365)]



In [3]:

    
# Create a list variable of 365 numeric values
score_list = list(np.random.randint(low=1, high=1000, size=365))



In [4]:

    
# Create an empty dataframe
df = pd.DataFrame()

# Create a column from the datetime variable
df['datetime'] = date_list
# Convert that column into a datetime datatype
df['datetime'] = pd.to_datetime(df['datetime'])
# Set the datetime column as the index
df.index = df['datetime'] 
# Create a column from the numeric score variable
df['score'] = score_list



In [5]:

    
# Let's take a took at the data
df.head()









    Out[5]:






  
    
      
      datetime
      score
    
    
      datetime
      
      
    
  
  
    
      2016-06-02 09:57:54.793972
      2016-06-02 09:57:54.793972
      900
    
    
      2016-06-01 09:57:54.793972
      2016-06-01 09:57:54.793972
      121
    
    
      2016-05-31 09:57:54.793972
      2016-05-31 09:57:54.793972
      547
    
    
      2016-05-30 09:57:54.793972
      2016-05-30 09:57:54.793972
      504
    
    
      2016-05-29 09:57:54.793972
      2016-05-29 09:57:54.793972
      304

Group Data By Date

In pandas, the most common way to group by time is to use the .resample() function. In v0.18.0 this function is two-stage. This means that 'df.resample('M')' creates an object to which we can apply other functions ('mean', 'count', 'sum', etc.)



In [6]:

    
# Group the data by month, and take the mean for each group (i.e. each month)
df.resample('M').mean()



In [7]:

    
# Group the data by month, and take the sum for each group (i.e. each month)
df.resample('M').sum()

Grouping Options

There are many options for grouping. You can learn more about them in Pandas's timeseries docs, however, I have also listed them below for your convience.

Value	Description
B	business day frequency
C	custom business day frequency (experimental)
D	calendar day frequency
W	weekly frequency
M	month end frequency
BM	business month end frequency
CBM	custom business month end frequency
MS	month start frequency
BMS	business month start frequency
CBMS	custom business month start frequency
Q	quarter end frequency
BQ	business quarter endfrequency
QS	quarter start frequency
BQS	business quarter start frequency
A	year end frequency
BA	business year end frequency
AS	year start frequency
BAS	business year start frequency
BH	business hour frequency
H	hourly frequency
T	minutely frequency
S	secondly frequency
L	milliseonds
U	microseconds
N	nanosecondsa

	score
datetime
2015-06-30	513.629630
2015-07-31	561.516129
2015-08-31	448.032258
2015-09-30	548.000000
2015-10-31	480.419355
2015-11-30	487.033333
2015-12-31	499.935484
2016-01-31	429.193548
2016-02-29	520.413793
2016-03-31	349.806452
2016-04-30	395.500000
2016-05-31	503.451613
2016-06-30	510.500000

	score
datetime
2015-06-30	13868
2015-07-31	17407
2015-08-31	13889
2015-09-30	16440
2015-10-31	14893
2015-11-30	14611
2015-12-31	15498
2016-01-31	13305
2016-02-29	15092
2016-03-31	10844
2016-04-30	11865
2016-05-31	15607
2016-06-30	1021

	datetime	score
datetime
2016-06-02 09:57:54.793972	2016-06-02 09:57:54.793972	900
2016-06-01 09:57:54.793972	2016-06-01 09:57:54.793972	121
2016-05-31 09:57:54.793972	2016-05-31 09:57:54.793972	547
2016-05-30 09:57:54.793972	2016-05-30 09:57:54.793972	504
2016-05-29 09:57:54.793972	2016-05-29 09:57:54.793972	304