Image is Creative Commons Attribution licensed. The original image can be found on Flickr.
In [13]:
import pandas
%matplotlib inline
DataFrames can easily be created from flat text files such as CSVs.
In [14]:
sce = pandas.read_csv(
'fake_student_courseenrollment.csv',
header=None,
names=['id', 'user_id', 'course_id', 'created', 'is_active', 'mode'],
parse_dates=['created']
)
Cool, what does this DataFrame look like?
In [15]:
sce.info()
In [16]:
sce.head()
Out[16]:
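If you don't have the CSV file handy, here is a minimal sketch of the same loading step using an in-memory file. The three rows below are made up for illustration (they are not taken from `fake_student_courseenrollment.csv`); only the column names and the `read_csv` arguments match the cell above.

```python
import io
import pandas

# Hypothetical stand-in for the CSV: no header row, six comma-separated columns.
sample_csv = io.StringIO(
    "1,101,ValleyOfPeace/KungFu101/First,2013-03-01 09:00:00,1,honor\n"
    "2,102,ValleyOfPeace/AdvancedKungFu/Second,2013-03-02 10:30:00,1,audit\n"
    "3,101,ValleyOfPeace/AdvancedKungFu/Second,2013-03-03 14:15:00,0,honor\n"
)

sample = pandas.read_csv(
    sample_csv,
    header=None,  # the data has no header line, so supply the names ourselves
    names=['id', 'user_id', 'course_id', 'created', 'is_active', 'mode'],
    parse_dates=['created'],  # convert this column to datetimes while loading
)

sample.info()         # column names, non-null counts and dtypes
print(sample.head())  # first rows of the frame
```

Because of `parse_dates`, the `created` column comes back as a datetime column rather than plain strings, which is what makes the date comparisons and resampling later on possible.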
In [17]:
sce[ sce['course_id'] == 'ValleyOfPeace/AdvancedKungFu/Second' ]
Out[17]:
In [18]:
sce[ sce['course_id'] == 'ValleyOfPeace/KungFu101/First' ]
Out[18]:
In [19]:
sce[ sce['created'] > '20130302' ]
Out[19]:
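The three cells above all rely on boolean masks: the comparison inside the brackets produces a Series of True/False values, and indexing with it keeps only the True rows. A small sketch with made-up data (the frame `df` and its values are assumptions for illustration) shows the mask on its own and how two masks can be combined:

```python
import pandas

# Hypothetical data, just enough rows to see the filtering.
df = pandas.DataFrame({
    'course_id': ['A/101/First', 'A/201/Second', 'A/101/First'],
    'created': pandas.to_datetime(['2013-03-01', '2013-03-02', '2013-03-04']),
})

# A comparison yields a boolean Series, one value per row.
mask = df['course_id'] == 'A/101/First'
first_course = df[mask]

# Date comparison; an explicit Timestamp is used here instead of a string.
cutoff = pandas.Timestamp('2013-03-02')
recent = df[df['created'] > cutoff]

# Combine masks with & (and) or | (or); the parentheses are required.
both = df[mask & (df['created'] > cutoff)]
print(both)
```

Note that `>` is strict and the cutoff is midnight, so a row stamped exactly `2013-03-02 00:00:00` is excluded.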
In [20]:
users = pandas.read_csv('fake_user_names.csv', header=None, names=['user_id', 'full_name'])
users.info()
In [25]:
pandas.merge(sce, users, on='user_id', how='left')
Out[25]:
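To see what `how='left'` does, here is a small sketch with made-up frames (`enrollments`, `names` and their contents are hypothetical). Every row of the left frame is kept; where the right frame has no matching `user_id`, the joined columns are filled with NaN.

```python
import pandas

# Hypothetical enrollments: user 3 has no entry in the names table.
enrollments = pandas.DataFrame({
    'user_id': [1, 2, 3],
    'course_id': ['c1', 'c1', 'c2'],
})
names = pandas.DataFrame({
    'user_id': [1, 2],
    'full_name': ['User One', 'User Two'],
})

# Left join: keep all enrollments, attach a name where one exists.
merged = pandas.merge(enrollments, names, on='user_id', how='left')
print(merged)
```

With `how='inner'` instead, the row for user 3 would be dropped rather than kept with a missing name.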
Let's take a quick look at the kind of thing you can do once a DataFrame is indexed by time.
In [22]:
sce_idx = sce.set_index('created')
sce_idx.info()
I wonder how many records are created each day?
In [23]:
# Group the rows into calendar days and count the records in each one
records_per_day = sce_idx['id'].resample('D').count()
records_per_day.head()
Out[23]:
In [24]:
records_per_day.plot()
Out[24]:
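Here is a minimal sketch of the same daily count on made-up data (the `events` frame and its timestamps are assumptions for illustration). It shows one detail worth knowing: days with no records still appear in the result, with a count of 0.

```python
import pandas

# Hypothetical events: two on March 1, none on March 2, two on March 3.
events = pandas.DataFrame({
    'id': [1, 2, 3, 4],
    'created': pandas.to_datetime([
        '2013-03-01 09:00', '2013-03-01 17:30',
        '2013-03-03 08:00', '2013-03-03 12:00',
    ]),
}).set_index('created')  # resample needs a datetime index

# 'D' groups rows into calendar days; count() tallies non-null ids per day.
per_day = events['id'].resample('D').count()
print(per_day)
```

Calling `per_day.plot()` on the result would draw the same kind of line chart as the cell above, provided matplotlib is installed.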