Geocoding with Python

Agenda:

  • Geocoding addresses to latitude/longitude
  • Exploring locations with the Google Places API
  • Reverse geocoding latitude/longitude to an address
  • Reverse geocoding latitude/longitude to block FIPS code

You'll need a Google API key to use the Google Maps Geocoding API and the Google Places API Web Service:

  1. Go to https://console.developers.google.com/project and sign in
  2. Create a new project and call it cp255, then click create
  3. On the screen with all the APIs listed, click "Google Places API Web Service" under Google Maps APIs, then click the Enable API button
  4. Go to Credentials and click create credentials, choose API Key
  5. Copy your API key when it is displayed, then create keys.py with the line google_maps_api_key='YOUR-KEY-HERE'

In [1]:
import pandas as pd, requests, time
from geopy.geocoders import GoogleV3

# import my api key saved in a local file i called keys.py
from keys import google_maps_api_key

In [2]:
# set the pause duration between api requests
pause = 0.1

Part 1: Geocoding addresses to lat-long

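We will use the Google Maps Geocoding API. When this notebook was written you didn't need an API key for this. Google has since made a key mandatory for Geocoding API requests, so if the response comes back denied, add your key to the request URL as a key parameter.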


In [3]:
locations = pd.DataFrame()
locations['address'] = ['350 5th Ave, New York, NY 10118',
                        '100 Larkin St, San Francisco, CA 94102',
                        'Wurster Hall, Berkeley, CA']
locations


Out[3]:
address
0 350 5th Ave, New York, NY 10118
1 100 Larkin St, San Francisco, CA 94102
2 Wurster Hall, Berkeley, CA

In [4]:
# function that accepts an address string, sends it to the Google API, and returns the lat-long API result
def geocode(address):
    time.sleep(pause) #pause for some duration before each request, to not hammer their server
    url = 'https://maps.googleapis.com/maps/api/geocode/json' #api endpoint
    params = {'address': address} #let requests url-encode the address for us
    response = requests.get(url, params=params) #send the request to the server and get the response
    data = response.json() #convert the response json string into a dict
    
    if len(data.get('results', [])) > 0: #if google was able to geolocate our address, extract lat-long from result
        location = data['results'][0]['geometry']['location']
        return '{},{}'.format(location['lat'], location['lng']) #return lat-long as a string in the format google likes
    return None #no match found for this address
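
Addresses can contain characters such as & or # that carry special meaning in a URL, so they need percent-encoding before they go into a query string. Here is a minimal sketch of what that encoding looks like, using only the standard library. The helper name build_geocode_url is hypothetical and not part of this notebook, and the optional key argument is an assumption about how you might pass your API key:

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(address, api_key=None):
    # collect the query parameters; the key is optional in this sketch
    params = {'address': address}
    if api_key is not None:
        params['key'] = api_key
    # urlencode percent-encodes reserved characters and turns spaces into '+'
    return '{}?{}'.format(GEOCODE_ENDPOINT, urlencode(params))

print(build_geocode_url('Wurster Hall, Berkeley, CA'))
```

Passing a params dict to requests.get performs the same encoding, so a helper like this is mostly useful for inspecting or logging the exact URL being requested.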

In [5]:
# for each value in the address column, geocode it, save results as new df column
locations['latlng'] = locations['address'].map(geocode)
locations


Out[5]:
address latlng
0 350 5th Ave, New York, NY 10118 40.7487097,-73.9856556
1 100 Larkin St, San Francisco, CA 94102 37.7791156,-122.4157662
2 Wurster Hall, Berkeley, CA 37.8707352,-122.2548935

In [6]:
# parse the result into separate lat and lon columns for easy mapping
locations['latitude'] = locations['latlng'].map(lambda x: x.split(',')[0])
locations['longitude'] = locations['latlng'].map(lambda x: x.split(',')[1])
locations


Out[6]:
address latlng latitude longitude
0 350 5th Ave, New York, NY 10118 40.7487097,-73.9856556 40.7487097 -73.9856556
1 100 Larkin St, San Francisco, CA 94102 37.7791156,-122.4157662 37.7791156 -122.4157662
2 Wurster Hall, Berkeley, CA 37.8707352,-122.2548935 37.8707352 -122.2548935

Part 2: Google Places API

We will use Google's Places API to look up places in the vicinity of some location. You need an API key for this.


In [7]:
# google places api url, with placeholders
url = 'https://maps.googleapis.com/maps/api/place/search/json?keyword={}&location={}&radius={}&key={}&sensor=false'

# what keyword to search for
keyword = 'restaurant'

# define the radius (in meters) for the search
radius = 1000

# define the location coordinates (of wurster hall)
location = locations.loc[2, 'latlng']
location


Out[7]:
'37.8707352,-122.2548935'

In [8]:
# add our variables into the url, submit the request to the api, and load the response
request = url.format(keyword, location, radius, google_maps_api_key)
response = requests.get(request)
data = response.json()
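
Google's web service responses carry a top-level status field alongside results, so a failed request (a rejected key, for example) can be told apart from a search that simply found nothing. Here is a minimal sketch of checking it before building a dataframe. The helper name extract_results is hypothetical, and the sample payloads are made up for illustration rather than taken from a real response:

```python
def extract_results(data):
    # 'OK' means at least one result; 'ZERO_RESULTS' is a valid but empty answer
    status = data.get('status')
    if status == 'OK':
        return data.get('results', [])
    if status == 'ZERO_RESULTS':
        return []
    # anything else (e.g. REQUEST_DENIED, OVER_QUERY_LIMIT) is treated as an error
    raise RuntimeError('request failed: {} {}'.format(status, data.get('error_message', '')))

# made-up payloads standing in for response.json()
ok = {'status': 'OK', 'results': [{'name': 'Example Cafe'}]}
empty = {'status': 'ZERO_RESULTS', 'results': []}
print(len(extract_results(ok)), len(extract_results(empty)))
```

Raising on unexpected statuses surfaces a bad key or an exhausted quota immediately, instead of as a confusing KeyError further down.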

In [9]:
places = pd.DataFrame(data['results'])
places = places[['name', 'geometry', 'rating', 'vicinity']]
places.head()


Out[9]:
name geometry rating vicinity
0 Freehouse {'location': {'lat': 37.869081, 'lng': -122.25... 4.0 2700 Bancroft Way, Berkeley
1 Finfiné Ethiopian Restaurant {'location': {'lat': 37.8639074, 'lng': -122.2... 4.4 2556 Telegraph Avenue, Berkeley
2 Saturn Cafe {'location': {'lat': 37.8697714, 'lng': -122.2... 4.2 2175 Allston Way, Berkeley
3 Subway {'location': {'lat': 37.8625827, 'lng': -122.2... 3.3 2618 Telegraph Avenue, Berkeley
4 Subway {'location': {'lat': 37.8752456, 'lng': -122.2... NaN 2509 Hearst Avenue, Berkeley

In [10]:
# parse out lat-long and return it as a series -> this creates a dataframe of all the results when you .apply()
def parse_coords(geometry):
    if isinstance(geometry, dict):
        lng = geometry['location']['lng']
        lat = geometry['location']['lat']
        return pd.Series({'latitude':lat, 'longitude':lng})
    
# test our function
places['geometry'].head().apply(parse_coords)


Out[10]:
latitude longitude
0 37.869081 -122.254349
1 37.863907 -122.258973
2 37.869771 -122.266004
3 37.862583 -122.258960
4 37.875246 -122.259667

In [11]:
# now run our function on the whole dataframe and SAVE THE OUTPUT to 2 new dataframe columns
places[['latitude', 'longitude']] = places['geometry'].apply(parse_coords)
places_clean = places.drop('geometry', axis=1)
places_clean.sort_values(by='rating', ascending=False).head()


Out[11]:
name rating vicinity latitude longitude
17 Kiraku 4.6 2566 Telegraph Avenue, Berkeley 37.863684 -122.258986
14 KoJa Kitchen 4.6 2395 Telegraph Avenue, Berkeley 37.867121 -122.258619
13 Musashi 4.6 2126 Dwight Way, Berkeley 37.863992 -122.266524
15 Remy's 4.4 2506 Haste Street, Berkeley 37.865924 -122.258037
1 Finfiné Ethiopian Restaurant 4.4 2556 Telegraph Avenue, Berkeley 37.863907 -122.258973

Part 3: Reverse geocoding (address lookup)

We'll use Google's reverse geocoding API.

You can do this manually, just like in the previous two sections, but it's a little more complicated to parse Google's address components results. If we just want addresses, we can use geopy to simply call Google's API automatically for us.


In [12]:
# load usa point data and keep only the first 5 rows
usa = pd.read_csv('data/usa-latlong.csv')
usa = usa.head()
usa


Out[12]:
latitude longitude
0 34.537094 -82.630303
1 35.025700 -78.970500
2 39.151817 -77.163810
3 38.636738 -121.319550
4 47.616955 -122.348921

In [13]:
# create a column to put lat-long into the format google likes - this just makes it easier to call their API
usa['latlng'] = usa.apply(lambda row: '{},{}'.format(row['latitude'], row['longitude']), axis=1)
usa


Out[13]:
latitude longitude latlng
0 34.537094 -82.630303 34.537094,-82.630303
1 35.025700 -78.970500 35.0257,-78.9705
2 39.151817 -77.163810 39.151817,-77.16381
3 38.636738 -121.319550 38.636738,-121.31955
4 47.616955 -122.348921 47.616955,-122.34892099999999
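
The last row above shows a float's full binary representation leaking into the string (-122.34892099999999). Here is a minimal sketch of formatting with a fixed number of decimal places instead. The helper name format_latlng and the choice of six decimals are assumptions, not something this notebook prescribes:

```python
def format_latlng(latitude, longitude, decimals=6):
    # build a template like '{:.6f},{:.6f}' so both values are rounded the same way
    template = '{{:.{0}f}},{{:.{0}f}}'.format(decimals)
    return template.format(latitude, longitude)

print(format_latlng(47.616955, -122.348921))
```

With the dataframe above, this could be applied as usa.apply(lambda row: format_latlng(row['latitude'], row['longitude']), axis=1).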

In [14]:
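# tell geopy to reverse geocode some lat-long string using Google's API and return the address
geolocator = GoogleV3(api_key=google_maps_api_key) #create the geocoder once and reuse it for every row

def reverse_geopy(latlng):
    time.sleep(pause)
    location = geolocator.reverse(latlng, exactly_one=True)
    return location.address if location is not None else None #None if nothing was found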

usa['address'] = usa['latlng'].map(reverse_geopy)
usa


Out[14]:
latitude longitude latlng address
0 34.537094 -82.630303 34.537094,-82.630303 111 Simpson Rd, Anderson, SC 29621, USA
1 35.025700 -78.970500 35.0257,-78.9705 5340 Sumac Cir, Fayetteville, NC 28304, USA
2 39.151817 -77.163810 39.151817,-77.16381 8200 Spiceberry Cir, Gaithersburg, MD 20877, USA
3 38.636738 -121.319550 38.636738,-121.31955 7836-7840 Fair Oaks Blvd, Carmichael, CA 95608...
4 47.616955 -122.348921 47.616955,-122.34892099999999 225 Cedar St, Seattle, WA 98121, USA

What if you just want the city or state?

You could try to parse the address strings, but you're relying on them always having a consistent format. This might not be the case if you have international location data. In this case, you should call the API manually and extract the individual address components you are interested in.


In [15]:
# pass the Google API latlng data to reverse geocode it
def reverse_geocode(latlng):
    time.sleep(pause)
    url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng={}'
    request = url.format(latlng)
    response = requests.get(request)
    data = response.json()
    if len(data['results']) > 0:
        return data['results'][0] #if we got results, return the first result
    
geocode_results = usa['latlng'].map(reverse_geocode)

Now look inside each reverse geocode result to see if address_components exists. If it does, look inside each component to see if we can find the city or the state. Google calls the city name by the abstract term 'locality' and the state name by the abstract term 'administrative_area_level_1'. This lets them use the same terminology anywhere in the world.


In [16]:
def get_city(geocode_result):
    # skip rows where the reverse geocode returned nothing
    if isinstance(geocode_result, dict) and 'address_components' in geocode_result:
        for address_component in geocode_result['address_components']:
            if 'locality' in address_component['types']:
                return address_component['long_name']

def get_state(geocode_result):
    # skip rows where the reverse geocode returned nothing
    if isinstance(geocode_result, dict) and 'address_components' in geocode_result:
        for address_component in geocode_result['address_components']:
            if 'administrative_area_level_1' in address_component['types']:
                return address_component['long_name']
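
Both functions above follow the same pattern, so as a sketch they could be folded into one helper that takes the component type as an argument. The name get_component is hypothetical, and the sample result below is a hand-written, trimmed stand-in for a real API result:

```python
def get_component(geocode_result, component_type, name_key='long_name'):
    # tolerate missing results (None/NaN) left behind by failed lookups
    if not isinstance(geocode_result, dict):
        return None
    for component in geocode_result.get('address_components', []):
        if component_type in component.get('types', []):
            return component.get(name_key)
    return None

# hand-written sample in the shape of one reverse geocode result
sample = {'address_components': [
    {'long_name': 'Seattle', 'short_name': 'Seattle',
     'types': ['locality', 'political']},
    {'long_name': 'Washington', 'short_name': 'WA',
     'types': ['administrative_area_level_1', 'political']},
]}
print(get_component(sample, 'locality'))
print(get_component(sample, 'administrative_area_level_1', name_key='short_name'))
```

The same helper would then cover other component types, such as 'country' or 'postal_code', without writing a new function for each.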

In [17]:
# now map our functions to extract city and state names
usa['city'] = geocode_results.map(get_city)                
usa['state'] = geocode_results.map(get_state)
usa


Out[17]:
latitude longitude latlng address city state
0 34.537094 -82.630303 34.537094,-82.630303 111 Simpson Rd, Anderson, SC 29621, USA Anderson South Carolina
1 35.025700 -78.970500 35.0257,-78.9705 5340 Sumac Cir, Fayetteville, NC 28304, USA Fayetteville North Carolina
2 39.151817 -77.163810 39.151817,-77.16381 8200 Spiceberry Cir, Gaithersburg, MD 20877, USA Gaithersburg Maryland
3 38.636738 -121.319550 38.636738,-121.31955 7836-7840 Fair Oaks Blvd, Carmichael, CA 95608... Carmichael California
4 47.616955 -122.348921 47.616955,-122.34892099999999 225 Cedar St, Seattle, WA 98121, USA Seattle Washington

Part 4: Reverse geocoding to FIPS

We'll use the FCC's (very slow) Census Block Conversions API to turn lat/long into a block FIPS code. A block FIPS code contains, from left to right: the location's 2-digit state code, 3-digit county code, 6-digit census tract code, and 4-digit census block code (the first digit of which is the census block group code). With these codes you can join your data to census data at the tract level (or the county, block group, or block level) without doing a spatial join.
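
The digit layout described above can be sketched as plain string slicing. The helper name split_block_fips is hypothetical, and the example code is one of the block FIPS values retrieved in this notebook:

```python
def split_block_fips(fips_code):
    # keep the code as a string so leading zeros (e.g. a state code of '06') survive
    fips_code = str(fips_code)
    if len(fips_code) != 15 or not fips_code.isdigit():
        raise ValueError('expected a 15-digit block FIPS code')
    return {
        'state': fips_code[0:2],        # 2-digit state code
        'county': fips_code[2:5],       # 3-digit county code
        'tract': fips_code[5:11],       # 6-digit census tract code
        'block': fips_code[11:15],      # 4-digit census block code
        'block_group': fips_code[11],   # first digit of the block code
        'tract_geoid': fips_code[:11],  # state + county + tract, usable as a join key
    }

print(split_block_fips('060670078012000'))
```

Storing the codes as strings rather than integers matters here, because an integer column would silently drop the leading zero and shift every slice by one digit.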


In [18]:
# pass the FCC API lat/long and get FIPS data back - return block fips and county name
def get_fips(row):
    time.sleep(pause)
    url = 'http://data.fcc.gov/api/block/find?format=json&latitude={}&longitude={}'
    request = url.format(row['latitude'], row['longitude'])
    response = requests.get(request)
    data = response.json()
    
    # return multiple values as a series - this will create a dataframe with multiple columns
    return pd.Series({'fips_code':data['Block']['FIPS'], 'county':data['County']['name']})

In [19]:
# get block fips code and county name from FCC as new dataframe, then concatenate to join them
fips = usa.apply(get_fips, axis=1)
usa = pd.concat([usa, fips], axis=1)
usa


Out[19]:
latitude longitude latlng address city state county fips_code
0 34.537094 -82.630303 34.537094,-82.630303 111 Simpson Rd, Anderson, SC 29621, USA Anderson South Carolina Anderson 450070112021073
1 35.025700 -78.970500 35.0257,-78.9705 5340 Sumac Cir, Fayetteville, NC 28304, USA Fayetteville North Carolina Cumberland 370510019034012
2 39.151817 -77.163810 39.151817,-77.16381 8200 Spiceberry Cir, Gaithersburg, MD 20877, USA Gaithersburg Maryland Montgomery 240317007104010
3 38.636738 -121.319550 38.636738,-121.31955 7836-7840 Fair Oaks Blvd, Carmichael, CA 95608... Carmichael California Sacramento 060670078012000
4 47.616955 -122.348921 47.616955,-122.34892099999999 225 Cedar St, Seattle, WA 98121, USA Seattle Washington King 530330080011008
