Lesson 3: Graphing with ggplot and Other Fun Things

To start, we import several modules and make sure we are in the right working directory. We also load the Facebook pseudo-data.


In [1]:
%matplotlib
%pylab inline
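import pandas as pd
import numpy as np
from ggplot import *
import os

# Work from the project directory so the relative data path below resolves.
os.chdir('/home/potterzot/code/learning/exploratorydataanalysis')
# The data file is tab-separated.
fbook = pd.read_csv('data/pseudo_facebook.tsv', sep='\t')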
fbook.columns


Using matplotlib backend: Qt4Agg
Populating the interactive namespace from numpy and matplotlib
Out[1]:
Index(['userid', 'age', 'dob_day', 'dob_year', 'dob_month', 'gender', 'tenure', 'friend_count', 'friendships_initiated', 'likes', 'likes_received', 'mobile_likes', 'mobile_likes_received', 'www_likes', 'www_likes_received'], dtype='object')
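
For readers without the course file at hand, the loading step can be sketched on a small in-memory stand-in. This is only an illustration: the three rows are made up, only a few of the real columns are included, and the names sample_tsv and sample are hypothetical. The empty gender field in the last row is there to show how a missing value would appear after loading.

```python
import io

import pandas as pd

# Made-up rows for illustration only; the real file has more rows and columns.
sample_tsv = (
    "userid\tage\tdob_day\tdob_month\tgender\tfriend_count\n"
    "1\t14\t19\t11\tmale\t0\n"
    "2\t21\t2\t3\tfemale\t120\n"
    "3\t35\t25\t12\t\t47\n"
)

# sep='\t' tells pandas the fields are tab-separated, as in pseudo_facebook.tsv.
sample = pd.read_csv(io.StringIO(sample_tsv), sep="\t")

print(list(sample.columns))    # column names, as fbook.columns shows above
print(sample.dtypes)           # inferred type of each column
print(sample.isnull().sum())   # number of missing values per column
```

Checking the column types and the missing-value counts right after loading is a cheap way to catch a wrong separator or an unexpected blank field before any plotting.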

In [66]:
p = ggplot(fbook, aes('dob_day')) + \
  geom_histogram(fill='steelblue', color='black') + \
  scale_x_continuous(breaks=range(1,31,2), limit=range(1,31), align='middle') + \
  scale_y_continuous(breaks=range(0,8000,4000), limit=range(0,5000), labels='comma') + \
  xlab('Day of Birth') + \
  ylab('Count of Users') + \
  ggtitle('Frequency of Birth Day')
print(p)


<ggplot: (-9223363302240664039)>

We'd like to facet our graphs so that we can look at day of birth separately for each birth month. In ggplot this is relatively easy using facet_wrap and facet_grid. Here is the faceted graph:


In [68]:
ggplot(fbook, aes('dob_day')) + \
  geom_histogram(fill='steelblue', color='black') + \
  scale_x_continuous(breaks=range(1,31,2), limit=range(1,31), align='middle') + \
  scale_y_continuous(breaks=range(0,8000,4000), limit=range(0,5000), labels='comma') + \
  xlab('Day of Birth') + \
  ylab('Count of Users') + \
  ggtitle('Frequency of Birth Day') + \
  facet_wrap('dob_month')


/usr/lib/python3.4/site-packages/ggplot/ggplot.py:198: RuntimeWarning: Facetting is currently not supported with geom_bar. See
                    https://github.com/yhat/ggplot/issues/196 for more information
  warnings.warn(msg, RuntimeWarning)
Out[68]:
<ggplot: (-9223363302241720362)>

This is less than ideal, because we'd like finer control over the x-axis and the y-axis, but it's not too terrible.
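
One way to sanity-check a faceted histogram is to compute the numbers behind it directly. The sketch below uses pandas only, on made-up birthdays, and assumes one bin per day; toy, panel_counts and january are hypothetical names. Each column of the resulting table corresponds to one month, that is, to one facet panel.

```python
import pandas as pd

# Made-up birthdays for illustration only.
toy = pd.DataFrame({
    "dob_month": [1, 1, 1, 2, 2, 12],
    "dob_day": [1, 1, 15, 1, 28, 25],
})

# Rows are days, columns are months. With one bin per day, each column
# holds the bar heights that one facet panel would display.
panel_counts = pd.crosstab(toy["dob_day"], toy["dob_month"])
print(panel_counts)

# Non-zero bar heights for the January panel only.
january = panel_counts[1]
print(january[january > 0])
```

On the real data the same call would be pd.crosstab(fbook['dob_day'], fbook['dob_month']), which gives a 12-column table that can be compared against the panels by eye.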

Now we want to look at friend counts by gender.


In [76]:
ggplot(fbook, aes('friend_count')) + \
    geom_histogram() + \
    scale_x_continuous(breaks=range(0,1000,500)) + \
    facet_wrap('gender')


Out[76]:
<ggplot: (8734612378779)>

At the moment we can adjust the scale but can't set the minimum and maximum of the x-axis, and can only adjust the formatting of the y-axis. This is another big problem with ggplot, and with Python graphics in general.
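
A possible workaround for the missing axis limits, sketched here as an assumption rather than anything the plotting library provides, is to restrict the data itself before plotting. The example uses made-up values; x_min, x_max, toy and counts_by_gender are hypothetical names. It filters the rows to the desired range and then counts them per gender using shared bin edges, which is what a faceted histogram with fixed limits would show.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only.
toy = pd.DataFrame({
    "gender": ["male", "female", "female", "male", "female", "male"],
    "friend_count": [0, 120, 980, 4500, 310, 640],
})

x_min, x_max = 0, 1000  # hypothetical axis limits

# Keep only the rows that fall inside the desired x range.
in_range = toy[(toy["friend_count"] >= x_min) & (toy["friend_count"] <= x_max)]

# Shared bin edges (0, 250, 500, 750, 1000) so the groups are comparable.
edges = np.linspace(x_min, x_max, 5)

counts_by_gender = {}
for gender, group in in_range.groupby("gender"):
    counts, _ = np.histogram(group["friend_count"], bins=edges)
    counts_by_gender[gender] = counts.tolist()

print(counts_by_gender)
```

The filtered frame (in_range here) could then be handed to whichever plotting call is in use, so the plot never sees values outside the chosen range.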

Vincent/Vega

Another option for producing Python graphics is the vincent library, which uses Vega and the D3 library to produce web graphics, so let's give that a try:


In [78]:
import vincent

In [81]:
bar = vincent.Bar(fbook['friend_count'])
bar.axis_titles(x='Friend Count', y='Number of Users')
bar.to_json('vega.json')
