What follows is a fairly thorough introduction to the library.
I split it into three parts because a single post felt too long and daunting.
If you'd like to follow along, you can find the necessary CSV files here and download the MovieLens dataset here.
My goal for this tutorial is to teach the basics of pandas by comparing and contrasting its syntax with SQL.
If you're interested in learning more about the library, pandas author Wes McKinney has written Python for Data Analysis, which covers it in much greater detail.
In [2]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
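# Show up to 50 columns when displaying a DataFrame; the full option name
# avoids an ambiguous-match error in newer pandas releases.
pd.set_option('display.max_columns', 50)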
%matplotlib inline
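To give a feel for the SQL comparison this tutorial relies on, here is a minimal sketch of one query written both ways. The `users` table and its values are hypothetical, invented purely for illustration; they are not part of the tutorial's CSV files or the MovieLens dataset.

```python
import pandas as pd

# Hypothetical data, made up for this sketch only.
users = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid", "Dee"],
    "age": [34, 27, 45, 31],
})

# SQL equivalent:
#   SELECT name, age FROM users WHERE age > 30 ORDER BY age DESC;
result = (
    users.loc[users["age"] > 30, ["name", "age"]]  # WHERE + SELECT
         .sort_values("age", ascending=False)      # ORDER BY age DESC
         .reset_index(drop=True)                   # renumber rows from 0
)
print(result)
```

The boolean mask plays the role of the `WHERE` clause, the column list stands in for `SELECT`, and `sort_values` corresponds to `ORDER BY`.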