Image is Creative Commons Attribution licensed. The original image can be found on Flickr.
In [34]:
import pandas
%matplotlib inline
DataFrames can easily be created from flat text files such as CSVs:
In [35]:
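sce = pandas.read_csv(
    'fake_student_courseenrollment.csv',
    header=None,  # the file has no header row...
    names=['id', 'user_id', 'course_id', 'created', 'is_active', 'mode'],  # ...so name the columns here
    parse_dates=['created']  # parse 'created' into datetime64 values
)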
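If the CSV file isn't at hand, read_csv accepts any file-like object in place of a path, so the same call can be tried on an in-memory string. The rows below are made up for illustration (they are not the contents of fake_student_courseenrollment.csv), and the name demo_sce is hypothetical, chosen so it doesn't overwrite sce.

```python
import io

import pandas

# Hypothetical sample rows in the same headerless layout as the real file.
csv_text = """\
1,10,FooX/BarX/Baz,2013-03-01 10:15:00,1,honor
2,11,FooX/BarX/Baz,2013-03-02 08:00:00,1,audit
3,10,FooX/OtherX/Baz,2013-03-03 12:30:00,0,honor
"""

demo_sce = pandas.read_csv(
    io.StringIO(csv_text),  # a file-like object works wherever a path does
    header=None,
    names=['id', 'user_id', 'course_id', 'created', 'is_active', 'mode'],
    parse_dates=['created'],
)
print(demo_sce.dtypes)  # 'created' should show as a datetime64 column
```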
Cool, what does this DataFrame look like?
In [36]:
sce.info()
In [37]:
sce.head()
Out[37]:
In [38]:
sce[sce['course_id'] == 'FooX/OtherX/Baz']
Out[38]:
In [39]:
sce[sce['course_id'] == 'FooX/BarX/Baz']
Out[39]:
In [40]:
sce[sce['created'] > '20130302']
Out[40]:
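Each of the filters above works by building a boolean Series (a "mask") and indexing the DataFrame with it. Masks can be combined element-wise, which is handy for questions like "enrollments in one course after a given date". A minimal sketch on a small hypothetical DataFrame (demo_sce and its rows are made up for illustration):

```python
import pandas

# Hypothetical enrollment rows using the same column names as sce.
demo_sce = pandas.DataFrame({
    'id': [1, 2, 3],
    'course_id': ['FooX/BarX/Baz', 'FooX/BarX/Baz', 'FooX/OtherX/Baz'],
    'created': pandas.to_datetime(['2013-03-01 10:15', '2013-03-02 08:00', '2013-03-03 12:30']),
})

# Each comparison yields a boolean Series aligned with the rows.
in_course = demo_sce['course_id'] == 'FooX/BarX/Baz'
after_cutoff = demo_sce['created'] > '20130302'  # the string is parsed as a timestamp

# Combine masks with & (and), | (or), ~ (not). Written inline, each comparison
# needs parentheses because & and | bind more tightly than == and >.
both = demo_sce[in_course & after_cutoff]
either = demo_sce[(demo_sce['course_id'] == 'FooX/BarX/Baz') | (demo_sce['created'] > '20130302')]
print(both)
```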
In [41]:
users = pandas.read_csv('fake_user_names.csv', header=None, names=['user_id', 'full_name'])
users.info()
In [42]:
pandas.merge(sce, users, on='user_id')
Out[42]:
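By default pandas.merge performs an inner join, so enrollments whose user_id has no match in the users table are dropped from the result. Passing how='left' keeps every enrollment and fills the missing name with NaN. A small sketch with hypothetical tables (demo_sce, demo_users, and their rows are invented for illustration):

```python
import pandas

# Hypothetical enrollments; user 12 has no entry in demo_users.
demo_sce = pandas.DataFrame({
    'id': [1, 2, 3],
    'user_id': [10, 11, 12],
    'course_id': ['FooX/BarX/Baz', 'FooX/BarX/Baz', 'FooX/OtherX/Baz'],
})
demo_users = pandas.DataFrame({
    'user_id': [10, 11],
    'full_name': ['Alice Example', 'Bob Example'],
})

# Default how='inner': only user_ids present in both tables survive.
inner = pandas.merge(demo_sce, demo_users, on='user_id')
# how='left': every enrollment is kept; unmatched names become NaN.
left = pandas.merge(demo_sce, demo_users, on='user_id', how='left')
print(left)
```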
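Setting the 'created' column as the index turns the DataFrame into a time series, which unlocks date-based operations. Let's take a look at a quick example of the kind of thing you can do.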
In [43]:
sce_idx = sce.set_index('created')
sce_idx.info()
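Because 'created' holds datetimes, set_index produces a DatetimeIndex, and rows can then be selected with plain date strings. A minimal sketch on hypothetical rows (demo and demo_idx are illustrative names, not part of the notebook):

```python
import pandas

# Hypothetical rows; only 'id' and 'created' are needed to show the idea.
demo = pandas.DataFrame({
    'id': [1, 2, 3],
    'created': pandas.to_datetime(['2013-03-01 09:00', '2013-03-01 17:30', '2013-03-03 11:00']),
})
demo_idx = demo.set_index('created')
print(type(demo_idx.index).__name__)  # DatetimeIndex

# A partial date string selects every row whose timestamp falls on that day.
march_first = demo_idx.loc['2013-03-01']
# Date strings also work as slice bounds; the end bound covers the whole day.
through_march_2 = demo_idx.loc[:'2013-03-02']
```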
I wonder how many records are created each day.
In [44]:
# bucket rows by calendar day and count the non-null ids in each bucket
records_per_day = sce_idx['id'].resample('D').count()
records_per_day.head()
Out[44]:
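Resampling groups rows into fixed time bins (here 'D', calendar days) and applies an aggregation to each bin. In current pandas the aggregation is a method called on the resampler, as in resample('D').count(). Days with no records still get a bin, with a count of 0, which is what makes the daily plot continuous. A sketch on hypothetical timestamps (demo_idx and its rows are made up for illustration):

```python
import pandas

# Hypothetical creation times: two on March 1, none on March 2, two on March 3.
demo = pandas.DataFrame({
    'id': [1, 2, 3, 4],
    'created': pandas.to_datetime([
        '2013-03-01 09:00', '2013-03-01 17:30',
        '2013-03-03 11:00', '2013-03-03 12:00',
    ]),
})
demo_idx = demo.set_index('created')

# One bin per calendar day; empty days are included with a count of 0.
records_per_day = demo_idx['id'].resample('D').count()
print(records_per_day)
```

Swapping .count() for another aggregation (for example .size() or .sum()) changes what is computed per bin without changing the binning.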
In [45]:
records_per_day.plot()
Out[45]: