Pandas III - Datetime Series

Pandas has proven very successful as a tool for working with time series data, especially in the financial data analysis space.

In working with time series data, we will frequently seek to:

  • generate sequences of fixed-frequency dates and time spans
  • conform or convert time series to a particular frequency
  • compute “relative” dates based on various non-standard time increments (e.g. 5 business days before the last business day of the year), or “roll” dates forward or backward

http://pandas.pydata.org/pandas-docs/stable/timeseries.html


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

pd.set_option('max_columns', 50)

We'll be using the MovieLens dataset in many examples going forward. The dataset contains 100,000 ratings made by 943 users on 1,682 movies.


In [2]:
# pass in column names for each CSV
u_cols = ['user_id', 'age', 'sex', 'occupation', 'zip_code']
df_users = pd.read_csv('data/MovieLens-100k/u.user', sep='|', names=u_cols)

r_cols = ['user_id', 'movie_id', 'rating', 'unix_timestamp']
df_ratings = pd.read_csv('data/MovieLens-100k/u.data', sep='\t', names=r_cols)

m_cols = ['movie_id', 'title', 'release_date', 'video_release_date', 'imdb_url']
df_movies = pd.read_csv('data/MovieLens-100k/u.item', sep='|', names=m_cols, usecols=range(5))# only load the first five columns

Section in progress

Summary

  1. Part1
    a)
    b)
    c)
  2. Part2
    a)
    b)

1. Part 1

a)