Introductory Geocoding and Mapping

Juan Shishido

School of Information

GSR, D-Lab

Imports


In [1]:
import json
import requests
import pandas as pd
from pprint import pprint

Using the APIs

A function with options for both DSTK and Photon.

This function is for demonstration only. In most situations you probably won't want to combine the DSTK and Photon APIs in a single function, though that comes down to preference. Keep in mind that the two services return differently structured responses, and that Photon provides more information than DSTK (in some cases, multiple results for one query).


In [2]:
def single_address(address, api='dstk'):
    '''
    Individual address lookup with
    either DSTK or Photon
    
    Default is DSTK's /street2coordinates
    For DSTK's Google-style: 'google'
    For Photon: 'photon'
    
    Address must be a string
    '''
    
    # API check
    assert api in ('dstk', 'google', 'photon')
    
    # Type check (isinstance also accepts subclasses of str)
    assert isinstance(address, str)
    
    # /street2coordinates
    dstk_dstk = 'http://www.datasciencetoolkit.org/street2coordinates/'
    
    # Google-style
    dstk_google = 'http://www.datasciencetoolkit.org/maps/api/geocode/json?sensor=false&address='
    
    # Photon
    photon = 'http://photon.komoot.de/api/?q='
    
    # API
    if api == 'dstk':
        url_prefix = dstk_dstk
    elif api == 'google':
        url_prefix = dstk_google
    elif api == 'photon':
        url_prefix = photon
    
    # URL
    url = url_prefix + address.replace(' ', '+')
    
    # Response: fail loudly on HTTP errors, then decode the JSON body
    response = requests.get(url)
    response.raise_for_status()
    return response.json()

Data Science Toolkit

Street Address to Coordinates


In [3]:
google_hq = single_address('1600 Amphitheatre Pkwy, Mountain View, CA')
pprint(google_hq)


{u'1600 Amphitheatre Pkwy, Mountain View, CA': {u'confidence': 0.902,
                                                u'country_code': u'US',
                                                u'country_code3': u'USA',
                                                u'country_name': u'United States',
                                                u'fips_county': u'06085',
                                                u'latitude': 37.423471,
                                                u'locality': u'Mountain View',
                                                u'longitude': -122.086546,
                                                u'region': u'CA',
                                                u'street_address': u'1600 Amphitheatre Pkwy',
                                                u'street_name': u'Amphitheatre Pkwy',
                                                u'street_number': u'1600'}}

Google-style


In [4]:
google = single_address('1600 Amphitheatre Pkwy, Mountain View, CA', 'google')
pprint(google)


{u'results': [{u'address_components': [{u'long_name': u'1600',
                                        u'short_name': u'1600',
                                        u'types': [u'street_number']},
                                       {u'long_name': u'Amphitheatre Pkwy',
                                        u'short_name': u'Amphitheatre Pkwy',
                                        u'types': [u'route']},
                                       {u'long_name': u'Mountain View',
                                        u'short_name': u'Mountain View',
                                        u'types': [u'locality',
                                                   u'political']},
                                       {u'long_name': u'CA',
                                        u'short_name': u'CA',
                                        u'types': [u'administrative_area_level_1',
                                                   u'political']},
                                       {u'long_name': u'United States',
                                        u'short_name': u'US',
                                        u'types': [u'country',
                                                   u'political']}],
               u'formatted_address': u'1600 Amphitheatre Pkwy, Mountain View, CA',
               u'geometry': {u'location': {u'lat': 37.423471,
                                           u'lng': -122.086546},
                             u'location_type': u'ROOFTOP',
                             u'viewport': {u'northeast': {u'lat': 37.424471,
                                                          u'lng': -122.085546},
                                           u'southwest': {u'lat': 37.422471,
                                                          u'lng': -122.087546}}},
               u'types': [u'street_address']}],
 u'status': u'OK'}

DSTK provides a Google-style option to make it easier for people already using Google's geocoding API. Simply replace maps.googleapis.com with www.datasciencetoolkit.org.
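
Most of the Google-style response is nesting around a single coordinate pair. The sketch below shows one way to pull the latitude and longitude out of the first result. The helper name `google_style_latlng` is hypothetical, and the sample dictionary is a trimmed copy of the output above, so the code runs without calling the API.

```python
def google_style_latlng(response):
    """Return (lat, lng) from the first result, or None if there is none."""
    # The response carries a status string alongside the list of results.
    if response.get('status') != 'OK' or not response.get('results'):
        return None
    location = response['results'][0]['geometry']['location']
    return location['lat'], location['lng']


# Trimmed copy of the Google-style output shown above.
sample = {
    'results': [{
        'formatted_address': '1600 Amphitheatre Pkwy, Mountain View, CA',
        'geometry': {'location': {'lat': 37.423471, 'lng': -122.086546}},
    }],
    'status': 'OK',
}

print(google_style_latlng(sample))
```

Returning None rather than raising is a design choice for this sketch; it makes it easy to skip addresses that did not geocode when looping over many of them.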

Photon


In [5]:
photon_result = single_address('1600 Amphitheatre Pkwy, Mountain View, CA', 'photon')
pprint(photon_result)


{u'features': [{u'geometry': {u'coordinates': [-122.0850862, 37.4228139],
                              u'type': u'Point'},
                u'properties': {u'city': u'Mountain View',
                                u'country': u'United States of America',
                                u'housenumber': u'1600',
                                u'name': u'Google Headquaters',
                                u'osm_id': 2192620021,
                                u'osm_key': u'office',
                                u'osm_type': u'N',
                                u'osm_value': u'commercial',
                                u'postcode': u'94043',
                                u'state': u'California',
                                u'street': u'Amphitheatre Parkway'},
                u'type': u'Feature'}],
 u'type': u'FeatureCollection'}

Photon provides much more data than DSTK. It can also return multiple entries. In those cases, you'll need to parse the JSON to get what you need.
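
As a rough illustration of that parsing, the sketch below walks the `features` list of a Photon response and keeps a few fields from each entry. The helper name `photon_candidates` and the choice of fields are assumptions made for this example; the sample is a trimmed copy of the Photon output above. Note that GeoJSON stores coordinates as longitude first, then latitude.

```python
def photon_candidates(response):
    """Return one small dict per feature in a Photon (GeoJSON) response."""
    rows = []
    for feature in response.get('features', []):
        # GeoJSON coordinate order is [longitude, latitude].
        lon, lat = feature['geometry']['coordinates']
        props = feature.get('properties', {})
        rows.append({
            'name': props.get('name'),  # not every feature has a name
            'city': props.get('city'),
            'lat': lat,
            'lon': lon,
        })
    return rows


# Trimmed copy of the Photon output shown above.
sample = {
    'type': 'FeatureCollection',
    'features': [{
        'type': 'Feature',
        'geometry': {'type': 'Point',
                     'coordinates': [-122.0850862, 37.4228139]},
        'properties': {'city': 'Mountain View',
                       'name': 'Google Headquaters',
                       'street': 'Amphitheatre Parkway'},
    }],
}

for row in photon_candidates(sample):
    print(row)
```

With several features in the response, you could then filter the returned list, for example by city, before picking a coordinate pair.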

Batch Geocoding

Command-line Interface

The standard IPython kernel allows running code in other languages using the %%magic syntax.

You can use cURL to access the DSTK API and even save the output to a file. First, invoke the bash magic command. It is only active in the cell in which it's called.

Single address

The code below submits a POST request to the DSTK server. cURL prints the JSON response, followed by its progress meter (the table of transfer statistics).


In [6]:
%%bash

curl -d "1600 Amphitheatre Pkwy, Mountain View, CA" \
     http://www.datasciencetoolkit.org/street2coordinates


{
  "1600 Amphitheatre Pkwy, Mountain View, CA": {
    "country_code3": "USA",
    "latitude": 37.423471,
    "country_name": "United States",
    "longitude": -122.086546,
    "street_address": "1600 Amphitheatre Pkwy",
    "region": "CA",
    "confidence": 0.902,
    "street_number": "1600",
    "locality": "Mountain View",
    "street_name": "Amphitheatre Pkwy",
    "fips_county": "06085",
    "country_code": "US"
  }
}
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   467  100   426  100    41   2229    214 --:--:-- --:--:-- --:--:--  2230

Note: The backslash in the command is simply to allow us to type the command across multiple lines.
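
To make it clearer what `curl -d` does, here is a minimal sketch of the same request built with Python 3's standard library. It only constructs the request and does not send it, so it runs offline. The helper name `build_street2coordinates_request` is hypothetical.

```python
from urllib.request import Request

DSTK_URL = 'http://www.datasciencetoolkit.org/street2coordinates'


def build_street2coordinates_request(address):
    """Build (but do not send) a POST request with the address as the body."""
    # Like curl -d, the address travels in the request body, not the URL.
    return Request(DSTK_URL, data=address.encode('utf-8'), method='POST')


req = build_street2coordinates_request(
    '1600 Amphitheatre Pkwy, Mountain View, CA')
print(req.get_method(), req.full_url)
print(req.data)
```

Passing the request to `urllib.request.urlopen` would send it. With the `requests` library imported earlier, `requests.post(url, data=address)` plays the same role.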

Addresses from a file

This command has three main components (listed from back-to-front):

http://www.datasciencetoolkit.org/street2coordinates

-d @data/bartaddresses.txt

-o data/bartcoordinates.json

The first of these tells cURL the location of the API.

The next one relates to the data to be sent to the API. This uses the -d flag. Use the @ symbol to indicate that the addresses should be read from a file.

The -o flag and the argument that follows it tell cURL to save the output to data/bartcoordinates.json instead of printing it.


In [7]:
%%bash

curl -o data/bartcoordinates.json -d @data/bartaddresses.txt \
     http://www.datasciencetoolkit.org/street2coordinates


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 20151  100 18211  100  1940  18659   1987 --:--:-- --:--:-- --:--:-- 18658

Look in ./data to find your geocoded addresses.
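
The saved file has the same shape as the single-address output: one JSON object keyed by address. The sketch below flattens that shape into a list of rows using only the standard library. It assumes that an address the service could not geocode maps to null; the second address in the sample is a made-up placeholder used to illustrate that case, and the helper name is hypothetical.

```python
import json


def flatten_street2coordinates(payload):
    """Split a {address: fields} mapping into rows and unmatched addresses."""
    rows, misses = [], []
    for address, fields in payload.items():
        if fields is None:  # assumption: null marks a failed lookup
            misses.append(address)
            continue
        row = {'address': address}
        row.update(fields)
        rows.append(row)
    return rows, misses


# First entry is trimmed from the output above; the second is hypothetical.
sample = json.loads('''
{
  "1600 Amphitheatre Pkwy, Mountain View, CA": {
    "latitude": 37.423471,
    "longitude": -122.086546,
    "confidence": 0.902
  },
  "1 Placeholder St, Nowhere, CA": null
}
''')

rows, misses = flatten_street2coordinates(sample)
print(rows)
print(misses)
```

To run this on the real output, replace `sample` with the result of `json.load` on data/bartcoordinates.json.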

Processing the Data

Load


In [8]:
json_data = pd.read_json('data/bartcoordinates.json').T

In [9]:
stations = pd.read_csv('data/bartstations.csv')

Transform


In [10]:
json_data = json_data.reset_index()
json_data = json_data.rename(columns = {'index':'address'})

In [11]:
json_data['address'] = json_data['address'].str.lower()

In [12]:
stations['address'] = stations['address'].str.lower()

Merge

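The Transform step turned json_data's index into an address column and lowercased it in both tables, so address can serve as the key. Merge json_data and stations on address; an inner merge keeps only the addresses present in both.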


In [13]:
bart = json_data.merge(stations, on = 'address', how='inner')
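
An inner merge silently drops any address that appears in only one of the two tables, for example because of a spelling difference. The sketch below is a plain-Python stand-in for checking that before merging; it is not part of the original workflow. The helper name and the sample addresses are hypothetical.

```python
def unmatched_addresses(geocoded, stations):
    """Return addresses found only on the left and only on the right."""
    # Lowercase both sides, mirroring the Transform step.
    left = {address.lower() for address in geocoded}
    right = {address.lower() for address in stations}
    return sorted(left - right), sorted(right - left)


# Hypothetical addresses, for illustration only.
geocoded = ['100 Example St, Oakland, CA', '200 Sample Ave, Berkeley, CA']
stations = ['100 EXAMPLE ST, Oakland, CA', '200 Sample Avenue, Berkeley, CA']

only_geocoded, only_stations = unmatched_addresses(geocoded, stations)
print(only_geocoded)
print(only_stations)
```

With the real data, you could pass `json_data['address']` and `stations['address']` to the function. pandas can report the same information itself: an outer merge with `indicator=True` adds a `_merge` column showing which table each row came from.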

Maps

GeoJSON


In [14]:
geo_data = {
    'type': 'FeatureCollection', 
    'features': []
}

for i in bart.index:
    feature = {
        'type': 'Feature',
        'geometry': {
            'type': 'Point',
            # GeoJSON coordinate order is [longitude, latitude]
            'coordinates': [float(bart['longitude'][i]), float(bart['latitude'][i])]
        },
        'properties': {
            'station_name': bart['station_name'][i]
        }
    }

    # Add the feature into the GeoJSON wrapper
    geo_data['features'].append(feature)

# Text mode: json.dump writes str, which fails on a binary file in Python 3
with open('map/geojson/bart_coords.geojson', 'w') as f:
    json.dump(geo_data, f, indent=2)

In [15]:
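with open('map/geojson/bart_coords.geojson', 'r') as infile:
    lines = infile.readlines()

# Prefix the GeoJSON with a variable assignment so it can be loaded as a script.
# The with blocks close both files, so no explicit close() calls are needed.
with open('map/geojson/bart_coords.js', 'w') as outfile:
    outfile.write('var bart = ')
    outfile.writelines(lines)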
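
The .js file is just the GeoJSON text with a JavaScript variable assignment in front of it. As a sketch of an alternative, the same string can be built in one step with `json.dumps`, without reading the .geojson file back in. The helper name `to_js_variable` is hypothetical, and the sample feature uses placeholder values rather than a real station.

```python
import json


def to_js_variable(geojson, var_name='bart'):
    """Return JavaScript source that assigns the GeoJSON to a variable."""
    return 'var {} = {}'.format(var_name, json.dumps(geojson, indent=2))


# Placeholder feature, for illustration only.
sample = {
    'type': 'FeatureCollection',
    'features': [{
        'type': 'Feature',
        'geometry': {'type': 'Point', 'coordinates': [-122.0, 37.0]},
        'properties': {'station_name': 'Example Station'},
    }],
}

js_source = to_js_variable(sample)
print(js_source)
```

Writing `to_js_variable(geo_data)` to map/geojson/bart_coords.js in text mode should produce a file equivalent to the one created above.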
