In [1]:
%matplotlib inline
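import os
import numpy as np
import pandas as pd
from ggplot import *
# Work from the project directory so the relative data path resolves
os.chdir('/home/potterzot/code/learning/exploratorydataanalysis')
# The data file is tab-separated, hence sep='\t'
fbook = pd.read_csv('data/pseudo_facebook.tsv', sep='\t')
fbook.columns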
Out[1]:
In [66]:
# Histogram of users' day of birth; range(1, 32, 2) puts ticks on 1, 3, ..., 31
p = ggplot(fbook, aes('dob_day')) + \
    geom_histogram(fill='steelblue', color='black') + \
    scale_x_continuous(breaks=range(1, 32, 2), limit=range(1, 32), align='middle') + \
    scale_y_continuous(breaks=range(0, 8000, 4000), limit=range(0, 5000), labels='comma') + \
    xlab('Day of Birth') + \
    ylab('Count of Users') + \
    ggtitle('Frequency of Birth Day')
print(p)
We'd like to facet our graphs so we can see the distribution of birth days within each month. In ggplot this is relatively easy with the facet_wrap and facet_grid layers. Here is the faceted graph:
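Conceptually, a faceted histogram splits the rows by the facet variable and then counts the plotted variable separately within each group, giving one panel per group. The plain-Python sketch below illustrates that grouping step only, assuming rows are dicts; the `facet_counts` helper and the sample records are hypothetical and not taken from the real data.

```python
from collections import Counter, defaultdict

def facet_counts(records, facet_key, value_key):
    """Group records by facet_key, then count value_key within each group.

    This mirrors what a faceted histogram draws: one set of bar heights
    per panel, one panel per distinct facet value.
    """
    panels = defaultdict(Counter)
    for rec in records:
        panels[rec[facet_key]][rec[value_key]] += 1
    # Sort panels by facet value and bars by plotted value for readability
    return {facet: dict(sorted(counts.items()))
            for facet, counts in sorted(panels.items())}

# Made-up (dob_month, dob_day) pairs for illustration only
records = [
    {"dob_month": 1, "dob_day": 1},
    {"dob_month": 1, "dob_day": 1},
    {"dob_month": 1, "dob_day": 15},
    {"dob_month": 2, "dob_day": 3},
]

panels = facet_counts(records, "dob_month", "dob_day")
print(panels)  # {1: {1: 2, 15: 1}, 2: {3: 1}}
```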
In [68]:
# Same histogram as p above, with one panel per birth month
p + facet_wrap('dob_month')
Out[68]:
This is less than ideal, since we'd like better control over the x- and y-axes of each panel, but it's not too bad.
Now we want to look at friend counts by gender.
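Alongside a histogram, a small numeric summary per group can make differences between genders easier to read. The sketch below is a plain-Python illustration, assuming rows are dicts with `gender` and `friend_count` keys; the `summarize_by` helper and the sample values are hypothetical.

```python
from statistics import median

def summarize_by(records, group_key, value_key):
    """Return {group: (n, median, max)} of value_key for each group_key value."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[group_key], []).append(rec[value_key])
    return {
        g: (len(vals), median(vals), max(vals))
        for g, vals in sorted(groups.items())
    }

# Made-up rows for illustration only
records = [
    {"gender": "female", "friend_count": 120},
    {"gender": "female", "friend_count": 300},
    {"gender": "female", "friend_count": 45},
    {"gender": "male", "friend_count": 80},
    {"gender": "male", "friend_count": 10},
]

print(summarize_by(records, "gender", "friend_count"))
# {'female': (3, 120, 300), 'male': (2, 45.0, 80)}
```

The median is used rather than the mean because friend counts tend to be heavily skewed by a few very large values.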
In [76]:
# Histogram of friend counts, one panel per gender
ggplot(fbook, aes('friend_count')) + \
    geom_histogram() + \
    scale_x_continuous(breaks=range(0, 1000, 500)) + \
    facet_wrap('gender')
Out[76]:
At the moment we can set the x-axis breaks but not the axis minimum and maximum, and for the y-axis we can only change the label formatting. This is another big problem with ggplot, and with Python graphics in general.
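One workaround when axis limits can't be set is to apply them to the data instead: drop values outside the window of interest, then bin what remains. A minimal plain-Python sketch, assuming integer friend counts; the `binned_counts` helper and the sample values are hypothetical. Note that users outside the window are excluded from the counts entirely rather than just hidden from view.

```python
from collections import Counter

def binned_counts(values, binwidth, lo, hi):
    """Count values in [lo, hi) using fixed-width bins.

    Values outside [lo, hi) are dropped first, which plays the role of
    x-axis limits on the histogram. Keys are the left edge of each bin.
    """
    counts = Counter()
    for v in values:
        if lo <= v < hi:
            left = lo + ((v - lo) // binwidth) * binwidth
            counts[left] += 1
    # Include empty bins so the result covers the whole window
    return {edge: counts.get(edge, 0) for edge in range(lo, hi, binwidth)}

friend_counts = [0, 12, 49, 50, 180, 999, 1000, 4923]  # made-up values
print(binned_counts(friend_counts, 250, 0, 1000))
# {0: 5, 250: 0, 500: 0, 750: 1}
```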
In [78]:
import vincent
In [81]:
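# Bin friend counts into buckets of width 100 (an arbitrary choice), keyed by
# each bucket's lower edge, so each bar counts users instead of showing one user
friend_hist = (fbook['friend_count'] // 100 * 100).value_counts().sort_index()
bar = vincent.Bar(friend_hist)
bar.axis_titles(x='Friend Count', y='Number of Users')
# Write the Vega chart specification to a JSON file
bar.to_json('vega.json')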