Title: Group A Time Series With pandas
Slug: pandas_group_by_time Summary: Group A Time Series With pandas Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon

Import required modules


In [24]:
import pandas as pd
import numpy as np

Create a dataframe


In [25]:
df = pd.DataFrame()

df['german_army'] = np.random.randint(low=20000, high=30000, size=100)
df['allied_army'] = np.random.randint(low=20000, high=40000, size=100)
df.index = pd.date_range('1/1/2014', periods=100, freq='H')

df.head()


Out[25]:
german_army allied_army
2014-01-01 00:00:00 28755 33938
2014-01-01 01:00:00 25176 28631
2014-01-01 02:00:00 23261 39685
2014-01-01 03:00:00 28686 27756
2014-01-01 04:00:00 24588 25681

Truncate the dataframe


In [26]:
df.truncate(before='1/2/2014', after='1/3/2014')


Out[26]:
german_army allied_army
2014-01-02 00:00:00 26401 20189
2014-01-02 01:00:00 29958 23934
2014-01-02 02:00:00 24492 39075
2014-01-02 03:00:00 25707 39262
2014-01-02 04:00:00 27129 35961
2014-01-02 05:00:00 27903 25418
2014-01-02 06:00:00 20409 25163
2014-01-02 07:00:00 25736 34794
2014-01-02 08:00:00 24057 27209
2014-01-02 09:00:00 26875 33402
2014-01-02 10:00:00 23963 38575
2014-01-02 11:00:00 27506 31859
2014-01-02 12:00:00 23564 25750
2014-01-02 13:00:00 27958 24365
2014-01-02 14:00:00 24915 38866
2014-01-02 15:00:00 23538 33820
2014-01-02 16:00:00 23361 30080
2014-01-02 17:00:00 27284 22922
2014-01-02 18:00:00 24176 32155
2014-01-02 19:00:00 23924 27763
2014-01-02 20:00:00 23111 32343
2014-01-02 21:00:00 20348 28907
2014-01-02 22:00:00 27136 38634
2014-01-02 23:00:00 28649 29950
2014-01-03 00:00:00 21292 26395

Set the dataframe's index


In [27]:
df.index = df.index + pd.DateOffset(months=4, days=5)

View the dataframe


In [28]:
df.head()


Out[28]:
german_army allied_army
2014-05-06 00:00:00 28755 33938
2014-05-06 01:00:00 25176 28631
2014-05-06 02:00:00 23261 39685
2014-05-06 03:00:00 28686 27756
2014-05-06 04:00:00 24588 25681

Lead a variable 1 hour


In [29]:
df.shift(1).head()


Out[29]:
german_army allied_army
2014-05-06 00:00:00 NaN NaN
2014-05-06 01:00:00 28755.0 33938.0
2014-05-06 02:00:00 25176.0 28631.0
2014-05-06 03:00:00 23261.0 39685.0
2014-05-06 04:00:00 28686.0 27756.0

Lag a variable 1 hour


In [30]:
df.shift(-1).tail()


Out[30]:
german_army allied_army
2014-05-09 23:00:00 26903.0 39144.0
2014-05-10 00:00:00 27576.0 39759.0
2014-05-10 01:00:00 25232.0 35246.0
2014-05-10 02:00:00 23391.0 21044.0
2014-05-10 03:00:00 NaN NaN

Aggregate into days by summing up the value of each hourly observation


In [31]:
df.resample('D').sum()


Out[31]:
german_army allied_army
2014-05-06 605161 755962
2014-05-07 608100 740396
2014-05-08 589744 700297
2014-05-09 607092 719283
2014-05-10 103102 135193

Aggregate into days by averaging up the value of each hourly observation


In [32]:
df.resample('D').mean()


Out[32]:
german_army allied_army
2014-05-06 25215.041667 31498.416667
2014-05-07 25337.500000 30849.833333
2014-05-08 24572.666667 29179.041667
2014-05-09 25295.500000 29970.125000
2014-05-10 25775.500000 33798.250000

Aggregate into days by taking the min value up the value of each hourly observation


In [33]:
df.resample('D').median()


Out[33]:
german_army allied_army
2014-05-06 24882.0 31310.0
2014-05-07 25311.0 30969.5
2014-05-08 24422.5 28318.0
2014-05-09 24941.5 32082.5
2014-05-10 26067.5 37195.0

Aggregate into days by taking the median value of each day's worth of hourly observation


In [34]:
df.resample('D').median()


Out[34]:
german_army allied_army
2014-05-06 24882.0 31310.0
2014-05-07 25311.0 30969.5
2014-05-08 24422.5 28318.0
2014-05-09 24941.5 32082.5
2014-05-10 26067.5 37195.0

Aggregate into days by taking the first value of each day's worth of hourly observation


In [35]:
df.resample('D').first()


Out[35]:
german_army allied_army
2014-05-06 28755 33938
2014-05-07 26401 20189
2014-05-08 21292 26395
2014-05-09 25764 22613
2014-05-10 26903 39144

Aggregate into days by taking the last value of each day's worth of hourly observation


In [36]:
df.resample('D').last()


Out[36]:
german_army allied_army
2014-05-06 28214 32110
2014-05-07 28649 29950
2014-05-08 28379 32600
2014-05-09 26752 22379
2014-05-10 23391 21044

Aggregate into days by taking the first, last, highest, and lowest value of each day's worth of hourly observation


In [37]:
df.resample('D').ohlc()


Out[37]:
german_army allied_army
open high low close open high low close
2014-05-06 28755 29206 20037 28214 33938 39955 23417 32110
2014-05-07 26401 29958 20348 28649 20189 39262 20189 29950
2014-05-08 21292 29786 20296 28379 26395 38197 20404 32600
2014-05-09 25764 29952 20738 26752 22613 39695 20189 22379
2014-05-10 26903 27576 23391 23391 39144 39759 21044 21044