Byte 2 Google Data Labs Notebook

This is an example notebook that illustrates use of Pandas and Seaborn to load and explore data from a Fusion Table.

We start by loading required libraries:



In [1]:

    
# Import libraries.
import csv
import httplib2
from apiclient.discovery import build
import urllib
import json
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns









    



---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-646d63de88ed> in <module>()
      2 import csv
      3 import httplib2
----> 4 from apiclient.discovery import build
      5 import urllib
      6 import json

/Users/nbanovic/Documents/Teaching/data/data/Assignments/jmankoff-explore/apiclient/__init__.py in <module>()
     16   )
     17 
---> 18 from googleapiclient import channel
     19 from googleapiclient import discovery
     20 from googleapiclient import errors

/Users/nbanovic/Documents/Teaching/data/data/Assignments/jmankoff-explore/googleapiclient/channel.py in <module>()
     61 import uuid
     62 
---> 63 from googleapiclient import errors
     64 from oauth2client import util
     65 import six

/Users/nbanovic/Documents/Teaching/data/data/Assignments/jmankoff-explore/googleapiclient/errors.py in <module>()
     24 import json
     25 
---> 26 from oauth2client import util
     27 
     28 

ImportError: cannot import name 'util'

Load Data from a Fusion Table

Here we download the data and store it in a local JSON file for easier access.



In [3]:

    
# This API key is provided by google as described in the tutorial
API_KEY = 'AIzaSyCuKKwcT5mUYZP_Q-heqCPjvdWyacRuX00'

# This is the table id for the fusion table
TABLE_ID = '1ymz3EtGdi4qKGMl5AxEFXtTlgk3tKi8iCpjTzvM'

# open the data stored in a file called "data.json"
try:
    fp = open("data.json")
    response = json.load(fp)

# but if that file does not exist, download the data from fusiontables
except IOError:
    service = build('fusiontables', 'v1', developerKey=API_KEY)
    query = "SELECT * FROM " + TABLE_ID + " WHERE  AnimalType = 'DOG'"
    response = service.query().sql(sql=query).execute()
    fp = open("data.json", "w")
    json.dump(response, fp)



In [4]:

    
# Check how many rows we have.
print len(response['rows'])

Load JSON Data into a Dataframe

Now that we have the JSON file locally, we will load it into a Pandas dataframe for easy access.



In [5]:

    
data_df = pd.DataFrame(response[u'rows'], columns = response[u'columns'])



In [6]:

    
# Show a few rows.
data_df.head()









    Out[6]:






  
    
      
      Outcome
      AnimalID
      AnimalType
      IntakeDate
      IntakeYear
      IntakeMonth
      Name
      Breed
      Age
      Sex
      ...
      OutcomeSubtype
      ZipFound
      Latitude
      Longitude
      ZipPlaced
      OutcomeDate
      OutcomeYear
      OutcomeMonth
      icon type
      IconName
    
  
  
    
      0
      
      A346562
      DOG
      2011-04-16T00:00:00Z
      2011
      April
      DIXIE
      
      Older than 1 year
      MALE
      ...
      
      40218
      38.1938
      -85.6573
      NaN
      2011-04-16T00:00:00Z
      2011
      April
      NaN
      
    
    
      1
      
      A364651
      DOG
      2011-03-11T00:00:00Z
      2011
      March
      KAYLEE
      YORKSHIRE TERR - POODLE TOY
      Older than 1 year
      FEMALE
      ...
      
      40202
      38.2529
      -85.7514
      40218
      2011-04-01T00:00:00Z
      2011
      April
      NaN
      
    
    
      2
      
      A369673
      DOG
      2011-03-31T00:00:00Z
      2011
      March
      AUSTIN
      
      Older than 1 year
      MALE
      ...
      
      40228
      38.1376
      -85.6274
      NaN
      2011-04-12T00:00:00Z
      2011
      April
      NaN
      
    
    
      3
      
      A415767
      DOG
      2011-04-12T00:00:00Z
      2011
      April
      EBONY
      
      Older than 1 year
      FEMALE
      ...
      
      40215
      38.1919
      -85.7841
      40215
      2011-04-12T00:00:00Z
      2011
      April
      NaN
      
    
    
      4
      
      A420365
      DOG
      2011-06-19T00:00:00Z
      2011
      June
      MARLEY
      SIBERIAN HUSKY - MIX
      Older than 1 year
      FEMALE
      ...
      
      40216
      38.1889
      -85.8314
      40216
      2011-06-21T00:00:00Z
      2011
      June
      NaN
      
    
  

5 rows × 24 columns

Select and Clean Up Data

Pandas makes it easy to access different columns and rows in the dataframe. Here we create a new data frame containing only Latitude and Longitude.



In [22]:

    
lat_lon_df = data_df[['Latitude', 'Longitude']]
lat_lon_df.head()

Now we can quickly convert both columns to numeric and remove values that could not be converted (NaN). It is all done on one line!



In [19]:

    
lat_lon_df = lat_lon_df.apply(lambda x: pd.to_numeric(x, errors='ignore')).dropna()

Visualize Data

We can now use Seaborn to quickly plot the data from the data frame. In this example we quickly compare the two variables in a pairplot.



In [21]:

    
sns.pairplot(lat_lon_df)









    Out[21]:





<seaborn.axisgrid.PairGrid at 0x7ff6c409d3d0>



In [ ]:

	AnimalID	AnimalType	IntakeDate	IntakeYear	IntakeMonth	Name	Breed	Age	Sex	...	ZipFound	Latitude	Longitude	ZipPlaced	OutcomeDate	OutcomeYear	OutcomeMonth	icon type
0	A346562	DOG	2011-04-16T00:00:00Z	2011	April	DIXIE		Older than 1 year	MALE	...	40218	38.1938	-85.6573	NaN	2011-04-16T00:00:00Z	2011	April	NaN
1	A364651	DOG	2011-03-11T00:00:00Z	2011	March	KAYLEE	YORKSHIRE TERR - POODLE TOY	Older than 1 year	FEMALE	...	40202	38.2529	-85.7514	40218	2011-04-01T00:00:00Z	2011	April	NaN
2	A369673	DOG	2011-03-31T00:00:00Z	2011	March	AUSTIN		Older than 1 year	MALE	...	40228	38.1376	-85.6274	NaN	2011-04-12T00:00:00Z	2011	April	NaN
3	A415767	DOG	2011-04-12T00:00:00Z	2011	April	EBONY		Older than 1 year	FEMALE	...	40215	38.1919	-85.7841	40215	2011-04-12T00:00:00Z	2011	April	NaN
4	A420365	DOG	2011-06-19T00:00:00Z	2011	June	MARLEY	SIBERIAN HUSKY - MIX	Older than 1 year	FEMALE	...	40216	38.1889	-85.8314	40216	2011-06-21T00:00:00Z	2011	June	NaN