In [2]:
# Loading the libraries
import pandas as pd
import numpy as np
import time
import datetime

Table of contents

Python: Date Time

1. Converting strings to datetime objects

The best way to understand how to handle dates with Python is to use an example. Here we will use a very small dataset with only 10 observations. The first column shows the date and the second shows the temperature in celsius degrees for that day.


In [32]:
# Loading the dataset 
data = pd.read_csv('parse_date.csv', sep=',')

# Show the head - this is actually the entire data set. 
data.head(10)


Out[32]:
date temperature
0 01/01/2013 4
1 02/01/2013 0
2 03/01/2013 5
3 04/01/2013 7
4 05/01/2013 3
5 06/01/2013 1
6 07/01/2013 6
7 08/01/2013 10
8 09/01/2013 4
9 10/01/2013 -1

In [36]:
# Selecting the cell on the first row and the first column
print data.iloc[0,0]
# Printing the type of this cell
type(data.iloc[0,0])


01/01/2013
Out[36]:
str

We see here that when loaded into our dataframe, the values in the 'date' column are considered as strings and not as real dates. The first step to take is therefore to convert these strings into datetime objects that we can easily manipulate. Fortunately, Python has the very useful strptime(data, format) method that does exactly what we want. Let's see how this works. Note that the strptime() method is part of the datetime class of the datetime module, hence the datetime.datetime.strptime().


In [9]:
# Let's take only one date value in our table and convert it into a datatime object
date = datetime.datetime.strptime(data.iloc[0,0],'%d/%m/%Y').date()

# Print the new datetime object
print date 
# Check the type of the converted value
print type(date)


2013-01-01
<type 'datetime.date'>

Success! Our string is now a datetime object that we can format it as we wish. Now let's convert all the date strings into datetime objects.


In [12]:
# Create a function that will transform all the strings into dates.
def parse_date(raw_date):
    return datetime.datetime.strptime(raw_date, '%d/%m/%Y').date()

In [13]:
# Let's apply our format to all the dates in our DataFrame
data['date'] = data['date'].map(parse_date)

Word of caution: Whenever you define your function, make sure that the date format you specify in the strptime() method is the same as the one in the date string. In our case, the string ('01/01/2013') has the following format: it starts with the day of the month %d ('01'), followed by a '/', followed by the month %m (01), another '/' and ends up with the full year %Y (2013).

To learn more about the different formats, check this link.


In [14]:
# Let's have a look at the whold dataset
data.head()


Out[14]:
date temperature
0 2013-01-01 4
1 2013-01-02 0
2 2013-01-03 5
3 2013-01-04 7
4 2013-01-05 3

In [15]:
type(data.iloc[0,0])


Out[15]:
datetime.date

2. Converting a datetime object to a string

Now that our dates are properly transformed our strings into date objects we can manipulate these dates as much as we want. One thing that is often required to change the format of the date. Python has another method called strftime(format) (yes with an f) that does the opposite of the strptime() method by converting a datetime object into a string. Let's see how we can convert our dates with the strftime() method.

* Long Date


In [16]:
# We want to have the complete date - See the new column "Long date" for the result
data['Long Date'] = data['date'].map(lambda x: x.strftime('%A, %B %d, %Y'))
data.head()


Out[16]:
date temperature Long Date
0 2013-01-01 4 Tuesday, January 01, 2013
1 2013-01-02 0 Wednesday, January 02, 2013
2 2013-01-03 5 Thursday, January 03, 2013
3 2013-01-04 7 Friday, January 04, 2013
4 2013-01-05 3 Saturday, January 05, 2013

* Short Date


In [46]:
# Now we want to just see a short date - see the "Short date" column
data['Short Date'] = data['date'].map(lambda x: x.strftime('%b %d, %Y'))
data.head()


Out[46]:
date temperature Long Date Short Date
0 2013-01-01 4 Tuesday, January 01, 2013 Jan 01, 2013
1 2013-01-02 0 Wednesday, January 02, 2013 Jan 02, 2013
2 2013-01-03 5 Thursday, January 03, 2013 Jan 03, 2013
3 2013-01-04 7 Friday, January 04, 2013 Jan 04, 2013
4 2013-01-05 3 Saturday, January 05, 2013 Jan 05, 2013

* Weekday

Sometimes you may want to analyze the difference between days of the week. Maybe for instance you want to know if there is more traffic on Mondays than on Sundays. One cool trick is to convert the dates into their corresponding weekdays.


In [47]:
# Add the weekday column to our dataframe
data['weekday'] = data['date'].map(lambda x: x.strftime('%A'))
data.head()


Out[47]:
date temperature Long Date Short Date weekday
0 2013-01-01 4 Tuesday, January 01, 2013 Jan 01, 2013 Tuesday
1 2013-01-02 0 Wednesday, January 02, 2013 Jan 02, 2013 Wednesday
2 2013-01-03 5 Thursday, January 03, 2013 Jan 03, 2013 Thursday
3 2013-01-04 7 Friday, January 04, 2013 Jan 04, 2013 Friday
4 2013-01-05 3 Saturday, January 05, 2013 Jan 05, 2013 Saturday

* Month

Similar to the weekday, you can now analyze the time series by looking at the difference between months.


In [48]:
# Add the month column to our dataframe
data['month'] = data['date'].map(lambda x: x.strftime('%B'))
data.head()


Out[48]:
date temperature Long Date Short Date weekday month
0 2013-01-01 4 Tuesday, January 01, 2013 Jan 01, 2013 Tuesday January
1 2013-01-02 0 Wednesday, January 02, 2013 Jan 02, 2013 Wednesday January
2 2013-01-03 5 Thursday, January 03, 2013 Jan 03, 2013 Thursday January
3 2013-01-04 7 Friday, January 04, 2013 Jan 04, 2013 Friday January
4 2013-01-05 3 Saturday, January 05, 2013 Jan 05, 2013 Saturday January

* Other format examples

The configurations are endless, here are some other examples of format that you may want to use.

  • %B %d, %Y => February 10, 2014
  • %b %d, %Y => Feb 10, 2014
  • %m/%d/%y => 2/10/14
  • %d-%m-%Y => 10-02-2014
  • %Y-%m-%d => 2014-02-10
  • %f => 1380213810

Pandas:

Pandas is also extremely powerful with time series and you can use it to easily convert, generate and index dates. One difference to have in mind when using Pandas is the one between a timestamp and a time span. A timestamp refers to a specific point in time while a time span is a period. Let's explore this difference in more details.

1. Timestamps


In [33]:
# Create a timestamp using Timestamp
timestamps = pd.Timestamp('2014-01-01')

print timestamps


2014-01-01 00:00:00

2. Time spans

With time spans you can create any type of period: Hour, Day, Week, Month, Quarter...


In [24]:
# An hour
hour = pd.Period('2014-01', freq='H')
print hour


2014-01-01 00:00

In [25]:
# A week
week = pd.Period('2014-01', freq='W')
print week


2013-12-30/2014-01-05

In [27]:
# A month
month = pd.Period('2014-01', freq='M')
print month


2014-01

In [29]:
# A quarter
quarter = pd.Period('2014-01', freq='Q')
print quarter


2014Q1

3. Converting dates

With Pandas you can also convert strings into dates using the to_datetime() method. Let's try with our dataset.


In [59]:
# Converting dates to timestamps
timestamp = pd.to_datetime(data['date'])
print timestamp


0   2013-01-01 00:00:00
1   2013-01-02 00:00:00
2   2013-01-03 00:00:00
3   2013-01-04 00:00:00
4   2013-01-05 00:00:00
5   2013-01-06 00:00:00
6   2013-01-07 00:00:00
7   2013-01-08 00:00:00
8   2013-01-09 00:00:00
9   2013-01-10 00:00:00
Name: date, dtype: datetime64[ns]

4. Indexing dates from converted data

You might want to you use the dates as index for your dataframe but if you look at the cell above, converting the dates with to_datetime() will not do this automatically. To use the dates as index, you will need one extra step. You can either use the DateTimeIndex() - for timestamps - or PeriodIndex() method - for time spans.


In [64]:
# Array of timestamps used as index
date_index = pd.DatetimeIndex(timestamp, freq='D')

print date_index

# Array of periods used as index
period_index = pd.PeriodIndex(timestamp, freq= 'D') 

print period_index


<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00, ..., 2013-01-10 00:00:00]
Length: 10, Freq: D, Timezone: None
PeriodIndex([u'2013-01-01', u'2013-01-02', u'2013-01-03', u'2013-01-04', u'2013-01-05', u'2013-01-06', u'2013-01-07', u'2013-01-08', u'2013-01-09', u'2013-01-10'], freq='D')

Please note that the offset aliases are different between Python and Pandas. To see the complete list of aliases please check the following link

5. Generating data

Until now we have worked with dates extracted from a csv file but what if we want to generate a new time series? Pandas has of course some handful tools for this too. Use the method date_range(). With this method you can either specify the start date and the number of periods you need or specify both the start and the end dates. Both methods are presented in the cell below.


In [36]:
# You specify the number of periods - here 24 months
rng = pd.date_range('1/1/2014', periods=24, freq='M')

# You can also specify the begining and the end of your data series
start = '1/1/2013'
end = '1/1/2014'

rng = pd.date_range(start, end, freq='M')

6. Indexing dates from generated data

If you want to generate a time series of random values and use your dates as index, follow these 2 steps:


In [3]:
rng = pd.date_range('1/1/2014', periods=24, freq='M')

ts = pd.Series(np.random.randn(len(rng)), index=rng)

ts.head()


Out[3]:
2014-01-31    0.245097
2014-02-28   -0.402680
2014-03-31   -0.824059
2014-04-30   -0.746537
2014-05-31    0.823781
Freq: M, dtype: float64

7. Selecting data with dates

Now that you have generated your time series, you want to select only the first 3 months or the last 3. For this you can use first(period) or last(period) methods. Between the brackets, you specify the number of periods that you want and the type of periods (hour, week, month,...).


In [8]:
#We want to select the 5 first months
ts.first('5M')


Out[8]:
2014-01-31    0.245097
2014-02-28   -0.402680
2014-03-31   -0.824059
2014-04-30   -0.746537
2014-05-31    0.823781
2014-06-30    1.681618
Freq: M, dtype: float64

In [9]:
# Select the last 3 months
ts.last('3M')


Out[9]:
2015-10-31    0.257090
2015-11-30   -0.360173
2015-12-31   -0.549250
Freq: M, dtype: float64

You can also select specific period of time from your time series. For this I would recommend to use a mask.


In [6]:
# Selecting the data from january only
mask = (ts.index.month ==1)

january_data = ts[mask]

january_data


Out[6]:
2014-01-31    0.245097
2015-01-31    1.203963
dtype: float64

In [7]:
# Selecting the data from the first quarter
mask = (ts.index.month >= 1) & (ts.index.month <=3)

first_quarter_data = ts[mask]

first_quarter_data


Out[7]:
2014-01-31    0.245097
2014-02-28   -0.402680
2014-03-31   -0.824059
2015-01-31    1.203963
2015-02-28   -0.220109
2015-03-31   -0.490754
dtype: float64

In [8]:
from IPython.core.display import HTML
def css_styling():
    styles = open("styles/custom.css", "r").read()
    return HTML(styles)
css_styling()


Out[8]: