Title: Pandas Time Series Basics
Slug: pandas_time_series_basics
Summary: Pandas Time Series Basics
Date: 2016-05-01 12:00
Category: Python
Tags: Data Wrangling
Authors: Chris Albon

Import modules


In [15]:
from datetime import datetime
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as pyplot

Create a dataframe


In [16]:
data = {'date': ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994', '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592', '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109', '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877'], 
        'battle_deaths': [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]}
df = pd.DataFrame(data, columns = ['date', 'battle_deaths'])
print(df)


                         date  battle_deaths
0  2014-05-01 18:47:05.069722             34
1  2014-05-01 18:47:05.119994             25
2  2014-05-02 18:47:05.178768             26
3  2014-05-02 18:47:05.230071             15
4  2014-05-02 18:47:05.230071             15
5  2014-05-02 18:47:05.280592             14
6  2014-05-03 18:47:05.332662             26
7  2014-05-03 18:47:05.385109             25
8  2014-05-04 18:47:05.436523             62
9  2014-05-04 18:47:05.486877             41

Convert df['date'] from string to datetime


In [17]:
df['date'] = pd.to_datetime(df['date'])
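
The conversion above lets pandas infer the timestamp layout. As a minimal sketch, assuming the strings always follow the layout shown in this data, the format can be passed explicitly and unparseable values turned into `NaT`. The names `raw` and `parsed` and the bad value `'not a date'` are made up for illustration.

```python
import pandas as pd

# Two illustrative strings: one in the layout used above, one deliberately invalid
raw = pd.Series(['2014-05-01 18:47:05.069722', 'not a date'])

# An explicit format avoids guessing; errors='coerce' yields NaT instead of raising
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S.%f', errors='coerce')

print(parsed.dtype)
print(parsed.isna().tolist())
```

With `errors='coerce'`, rows that fail to parse can then be inspected or dropped with `parsed.isna()`.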

Set df['date'] as the index and delete the column


In [18]:
df.index = df['date']
del df['date']
df


Out[18]:
                            battle_deaths
date
2014-05-01 18:47:05.069722             34
2014-05-01 18:47:05.119994             25
2014-05-02 18:47:05.178768             26
2014-05-02 18:47:05.230071             15
2014-05-02 18:47:05.230071             15
2014-05-02 18:47:05.280592             14
2014-05-03 18:47:05.332662             26
2014-05-03 18:47:05.385109             25
2014-05-04 18:47:05.436523             62
2014-05-04 18:47:05.486877             41
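
The same result can probably be reached in one step with `set_index`, which moves the column into the index and removes it from the columns. This is a sketch of an alternative, not the approach used above; `small` and `indexed` are illustrative names and the frame is shortened to three rows.

```python
import pandas as pd

# A shortened, illustrative version of the dataframe built earlier
small = pd.DataFrame({'date': ['2014-05-01 18:47:05.069722',
                               '2014-05-02 18:47:05.178768',
                               '2014-05-03 18:47:05.332662'],
                      'battle_deaths': [34, 26, 26]})
small['date'] = pd.to_datetime(small['date'])

# set_index returns a new dataframe indexed by 'date' and drops the column
indexed = small.set_index('date')
print(indexed)
```

Because `set_index` returns a new object by default, the original `small` keeps its `date` column.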

View all observations that occurred in 2014


In [19]:
df.loc['2014']


Out[19]:
                            battle_deaths
date
2014-05-01 18:47:05.069722             34
2014-05-01 18:47:05.119994             25
2014-05-02 18:47:05.178768             26
2014-05-02 18:47:05.230071             15
2014-05-02 18:47:05.230071             15
2014-05-02 18:47:05.280592             14
2014-05-03 18:47:05.332662             26
2014-05-03 18:47:05.385109             25
2014-05-04 18:47:05.436523             62
2014-05-04 18:47:05.486877             41

View all observations that occurred in May 2014


In [20]:
df.loc['2014-05']


Out[20]:
                            battle_deaths
date
2014-05-01 18:47:05.069722             34
2014-05-01 18:47:05.119994             25
2014-05-02 18:47:05.178768             26
2014-05-02 18:47:05.230071             15
2014-05-02 18:47:05.230071             15
2014-05-02 18:47:05.280592             14
2014-05-03 18:47:05.332662             26
2014-05-03 18:47:05.385109             25
2014-05-04 18:47:05.436523             62
2014-05-04 18:47:05.486877             41
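
Year and month strings select every row here because all the data falls in May 2014. A day-level string makes the partial-string selection easier to see. The sketch below rebuilds the same data under the illustrative names `stamps`, `deaths` and `ts` so that it runs on its own.

```python
import pandas as pd

stamps = ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
          '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
          '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
          '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
          '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877']
deaths = [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]
ts = pd.DataFrame({'battle_deaths': deaths}, index=pd.to_datetime(stamps))
ts.index.name = 'date'

# A day-resolution string matches every timestamp that falls on that day
may_2nd = ts.loc['2014-05-02']
print(may_2nd)
```

The string is coarser than the index (days versus microseconds), so pandas treats it as a range covering the whole day rather than as an exact timestamp.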

Observations on or after May 3rd, 2014


In [21]:
df[datetime(2014, 5, 3):]


Out[21]:
                            battle_deaths
date
2014-05-03 18:47:05.332662             26
2014-05-03 18:47:05.385109             25
2014-05-04 18:47:05.436523             62
2014-05-04 18:47:05.486877             41

Observations between May 3rd and May 4th


In [22]:
df['5/3/2014':'5/4/2014']


Out[22]:
                            battle_deaths
date
2014-05-03 18:47:05.332662             26
2014-05-03 18:47:05.385109             25
2014-05-04 18:47:05.436523             62
2014-05-04 18:47:05.486877             41
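
The string slice above includes all of May 4th, while an endpoint given as an exact `Timestamp` is taken literally as midnight. The sketch below contrasts the two on the same data; `ts`, `by_string` and `by_timestamp` are illustrative names.

```python
import pandas as pd

stamps = ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
          '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
          '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
          '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
          '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877']
deaths = [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]
ts = pd.DataFrame({'battle_deaths': deaths}, index=pd.to_datetime(stamps))

# Day-resolution strings: the end label covers the whole of May 4th
by_string = ts.loc['2014-05-03':'2014-05-04']

# Exact Timestamps: the slice stops at 2014-05-04 00:00:00
by_timestamp = ts.loc[pd.Timestamp('2014-05-03'):pd.Timestamp('2014-05-04')]

print(len(by_string), len(by_timestamp))
```

Slicing by labels that are not themselves in the index relies on the index being sorted, as it is here.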

Truncate (drop) observations after May 2nd, 2014


In [23]:
df.truncate(after='5/3/2014')


Out[23]:
                            battle_deaths
date
2014-05-01 18:47:05.069722             34
2014-05-01 18:47:05.119994             25
2014-05-02 18:47:05.178768             26
2014-05-02 18:47:05.230071             15
2014-05-02 18:47:05.230071             15
2014-05-02 18:47:05.280592             14
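
`truncate` also accepts a `before` argument, so both ends can be trimmed in one call. In this sketch the bounds are read as exact timestamps (midnight), which is why `after='2014-05-03'` keeps nothing from May 3rd itself. The names `ts` and `window` are illustrative.

```python
import pandas as pd

stamps = ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
          '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
          '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
          '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
          '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877']
deaths = [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]
ts = pd.DataFrame({'battle_deaths': deaths}, index=pd.to_datetime(stamps))

# Keep rows from 2014-05-02 00:00:00 through 2014-05-03 00:00:00
window = ts.truncate(before='2014-05-02', after='2014-05-03')
print(window)
```

`truncate` expects a sorted index and returns a new dataframe, leaving `ts` unchanged.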

Observations of May 2014


In [24]:
df.loc['5-2014']


Out[24]:
                            battle_deaths
date
2014-05-01 18:47:05.069722             34
2014-05-01 18:47:05.119994             25
2014-05-02 18:47:05.178768             26
2014-05-02 18:47:05.230071             15
2014-05-02 18:47:05.230071             15
2014-05-02 18:47:05.280592             14
2014-05-03 18:47:05.332662             26
2014-05-03 18:47:05.385109             25
2014-05-04 18:47:05.436523             62
2014-05-04 18:47:05.486877             41

Count the number of observations per timestamp


In [25]:
df.groupby(level=0).count()


Out[25]:
                            battle_deaths
date
2014-05-01 18:47:05.069722              1
2014-05-01 18:47:05.119994              1
2014-05-02 18:47:05.178768              1
2014-05-02 18:47:05.230071              2
2014-05-02 18:47:05.280592              1
2014-05-03 18:47:05.332662              1
2014-05-03 18:47:05.385109              1
2014-05-04 18:47:05.436523              1
2014-05-04 18:47:05.486877              1
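
Grouping on the raw index counts rows per exact timestamp, so only the duplicated timestamp shows a count of 2. One possible way to count per calendar day instead is to group on the index with the time of day stripped off, sketched here with the illustrative names `ts` and `per_day`.

```python
import pandas as pd

stamps = ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
          '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
          '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
          '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
          '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877']
deaths = [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]
ts = pd.DataFrame({'battle_deaths': deaths}, index=pd.to_datetime(stamps))

# normalize() sets every timestamp to midnight, so rows group by calendar day
per_day = ts.groupby(ts.index.normalize()).size()
print(per_day)
```

`size()` counts rows regardless of missing values, whereas `count()` counts only non-null entries in each column.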

Mean value of battle_deaths per day


In [26]:
df.resample('D').mean()


Out[26]:
            battle_deaths
date
2014-05-01           29.5
2014-05-02           17.5
2014-05-03           25.5
2014-05-04           51.5

Total value of battle_deaths per day


In [27]:
df.resample('D').sum()


Out[27]:
            battle_deaths
date
2014-05-01             59
2014-05-02             70
2014-05-03             51
2014-05-04            103
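
The daily mean and total can also be computed in a single pass by giving `agg` a list of function names. This is a sketch on the same data, with `ts` and `daily` as illustrative names.

```python
import pandas as pd

stamps = ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
          '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
          '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
          '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
          '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877']
deaths = [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]
ts = pd.DataFrame({'battle_deaths': deaths}, index=pd.to_datetime(stamps))

# One column per aggregation, one row per calendar day
daily = ts['battle_deaths'].resample('D').agg(['mean', 'sum', 'count'])
print(daily)
```

Resampling a single column keeps the result flat, with `mean`, `sum` and `count` as plain column names.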

Plot of the total battle deaths per day


In [28]:
df.resample('D').sum().plot()


Out[28]:
<matplotlib.axes._subplots.AxesSubplot at 0x1148a9860>
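
The object returned by `.plot()` is a matplotlib `Axes`, which can be used to label the chart. The sketch below draws the daily totals as bars. The labels, the bar style, and the `Agg` backend (chosen so the code runs outside a notebook) are assumptions made for illustration, as are the names `ts`, `daily_total` and `ax`.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; not needed with %matplotlib inline
import pandas as pd

stamps = ['2014-05-01 18:47:05.069722', '2014-05-01 18:47:05.119994',
          '2014-05-02 18:47:05.178768', '2014-05-02 18:47:05.230071',
          '2014-05-02 18:47:05.230071', '2014-05-02 18:47:05.280592',
          '2014-05-03 18:47:05.332662', '2014-05-03 18:47:05.385109',
          '2014-05-04 18:47:05.436523', '2014-05-04 18:47:05.486877']
deaths = [34, 25, 26, 15, 15, 14, 26, 25, 62, 41]
ts = pd.DataFrame({'battle_deaths': deaths}, index=pd.to_datetime(stamps))

# Total deaths per day, drawn as one bar per day
daily_total = ts['battle_deaths'].resample('D').sum()
ax = daily_total.plot(kind='bar')
ax.set_xlabel('Day')
ax.set_ylabel('Total battle deaths')
ax.set_title('Battle deaths per day')
```

In a script, `ax.figure.savefig('battle_deaths.png')` would write the chart to a file; the filename is hypothetical.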