Juan Shishido, Andrew Chong, Patty Frontiera
Adapted from Juan Shishido's tutorial at: http://people.ischool.berkeley.edu/~juanshishido/geocoding-workshop/intro-geocoding.html
School of Information
GSR, D-Lab
In [ ]:
import json
import requests
import pandas as pd
import geopy
from pprint import pprint
In [ ]:
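def get_geocoded_object_photon(address):
    # Build the request URL, replacing spaces so the address is URL-safe
    url = 'http://photon.komoot.de/api/?q=' + address.replace(' ', '+')
    # Send the request and parse the JSON body of the response
    response = requests.get(url)
    return json.loads(response.text)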
get_geocoded_object_photon("1600 Amphitheatre Pkwy, Mountain View, CA")
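The object returned above is GeoJSON, so the coordinates sit inside the first entry of its `features` list, ordered longitude first. The following is a minimal sketch of pulling them out; the helper name `extract_lat_lon` and the `sample` dictionary are illustrative assumptions (the sample is hand-written to mimic the shape of a reply, not real service output).

```python
def extract_lat_lon(photon_response):
    """Return (lat, lon) for the first feature, or None if nothing matched."""
    features = photon_response.get("features", [])
    if not features:
        return None
    # GeoJSON stores point coordinates as [longitude, latitude]
    lon, lat = features[0]["geometry"]["coordinates"][:2]
    return lat, lon


# Hand-written sample shaped like a geocoder reply (values are illustrative)
sample = {
    "type": "FeatureCollection",
    "features": [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-122.0841, 37.4220]},
            "properties": {"name": "example"},
        }
    ],
}

print(extract_lat_lon(sample))
```

In the notebook, the same helper could be applied to the result of `get_geocoded_object_photon(...)`.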
The geopy module makes it easy to try a number of different geocoding APIs using the same syntax. It is updated frequently, so it is also a good way to keep abreast of the popular geocoding services.
You can read Geopy's Documentation here: https://geopy.readthedocs.org/en/1.10.0/#.
We will use Geopy to geocode addresses with the Google Geocoding Service.
The Google Geocoder is extremely popular. It is very accurate for two reasons: it has a robust and sophisticated address parser, and it has a great reference database of streets, parcels, and properties. However, it limits you to 2,500 free address geocodes per day. Moreover, you need to read the Google Geocoder terms of use to make sure that you do not violate them.
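Because of the daily limit, it can help to avoid sending the same address twice. The sketch below shows one possible approach, a small in-memory cache wrapped around any geocoding function. The names `CachedGeocoder` and `fake_geocode` are hypothetical, and `fake_geocode` is only a stand-in so the example runs without a network call.

```python
class CachedGeocoder:
    """Wrap a geocode function and remember results by normalized address."""

    def __init__(self, geocode_func):
        self.geocode_func = geocode_func
        self.cache = {}
        self.calls = 0  # number of requests actually passed to geocode_func

    def geocode(self, address):
        # Normalize so trivially different spellings share one cache entry
        key = address.strip().upper()
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.geocode_func(address)
        return self.cache[key]


def fake_geocode(address):
    # Stand-in for a real service call; returns a made-up coordinate pair
    return (0.0, 0.0)


cached = CachedGeocoder(fake_geocode)
for addr in ["1245 Broadway, Oakland, CA",
             "1245 broadway, oakland, ca",
             "1245 Broadway, Oakland, CA"]:
    cached.geocode(addr)

print(cached.calls)
```

With geopy, the bound method of a geocoder object (for example `google_geocoder.geocode`) could be passed in place of `fake_geocode`.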
In [ ]:
# Create a geocoding object
google_geocoder = geopy.geocoders.GoogleV3()
# Geocode one address with this geocoder
response = google_geocoder.geocode("1600 Amphitheatre Pkwy, Mountain View, CA")
type(response)
In [ ]:
# response.keys()
response[0]
response[1]
response[1][0]
response[1][1]
In [ ]:
def get_coords_geopy(address, geocoder_instance):
    # Geocode the address, then pull latitude and longitude out of the result
    response = geocoder_instance.geocode(address)
    lat_lon = response[1][0], response[1][1]
    return lat_lon
# Now try it
coords = get_coords_geopy("1600 Amphitheatre Pkwy, Mountain View, CA", google_geocoder)
coords
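geopy's `geocode` returns `None` when an address cannot be matched, in which case indexing the response raises an error. A defensive variant might look like the sketch below. The names `get_coords_safe`, `StubGeocoder`, and `StubLocation` are hypothetical; the stub only imitates the `latitude` and `longitude` attributes of a geopy result so the example runs offline.

```python
from collections import namedtuple

# Minimal stand-in for a geopy result, for illustration only
StubLocation = namedtuple("StubLocation", ["address", "latitude", "longitude"])


class StubGeocoder:
    """Pretend geocoder that only knows one address."""

    def geocode(self, address):
        if "Mountain View" in address:
            return StubLocation(address, 37.422, -122.084)
        return None  # mimic an unmatched address


def get_coords_safe(address, geocoder_instance):
    """Return (lat, lon), or (None, None) when the address is not matched."""
    location = geocoder_instance.geocode(address)
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


print(get_coords_safe("1600 Amphitheatre Pkwy, Mountain View, CA", StubGeocoder()))
print(get_coords_safe("no such place", StubGeocoder()))
```

Returning a pair of `None` values keeps the result the same shape in both cases, so it can still be unpacked into two columns.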
In [ ]:
bart = pd.read_csv("data/bartstations.csv")
type(bart)
bart.keys()
Pandas in 10 minutes: http://pandas.pydata.org/pandas-docs/stable/10min.html
In [ ]:
bart['weekday_visitors'] = 1000*bart.index
bart['weekend_visitors'] = 2000*bart.index
bart['weekly_visitors'] = bart['weekday_visitors'] + bart['weekend_visitors']
bart
In [ ]:
one_address = "1245 Broadway, Oakland, CA 94612"
one_address_caps = one_address.upper()
bart['address_in_CAPS'] = bart['address'].apply(lambda x: x.upper())
bart
In [ ]:
google_geocoder = geopy.geocoders.GoogleV3()
bart['latitude'], bart['longitude'] = zip(*bart['address'].apply(lambda x: get_coords_geopy(x, google_geocoder)))
bart[['station_name', 'latitude', 'longitude']]
In [ ]:
geo_data = {
    'type': 'FeatureCollection',
    'features': []
}
for i in bart.index:
    feature = {
        'type': 'Feature',
        'geometry': {
            "type": "Point",
            "coordinates": [float(bart['longitude'][i]), float(bart['latitude'][i])]
        },
        'properties': {
            'station_name': bart['station_name'][i]
        }
    }
    # Add the feature into the GeoJSON wrapper
    geo_data['features'].append(feature)
with open('map/geojson/bart_coords.geojson', 'w') as f:
    json.dump(geo_data, f, indent=2)
In [ ]:
with open('map/geojson/bart_coords.geojson', 'r') as infile:
    lines = infile.readlines()
# Prefix the GeoJSON with a variable assignment so it loads as JavaScript
with open('map/geojson/bart_coords.js', 'w') as outfile:
    outfile.write('var bart = ')
    outfile.writelines(lines)
# The with blocks close both files automatically
The Census Geocoding API provides a great, free service for geocoding US street addresses. You can read about the Census Geocoder and test it using the online interface at http://geocoding.geo.census.gov/geocoder. Note, the online geocoding interface allows one to geocode up to 1000 addresses at a time without programming via a file upload utility. Below we show you how to programmatically access the Census Geocoding API.
The Census Geocoding API requires addresses to be in a strict format, with one address per line: unique id, street address, city, state, zip.
    1,409 Main St,Oakland,CA,94605
    2,310 Main St,,CA,94605
Note: The id column can contain any unique value, for example a place name rather than a numeric id.
    house 1,409 Main St,Oakland,CA,94605
    house 2,310 Main St,,CA,94605
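As a rough illustration of producing text in this layout, the sketch below writes a list of address records with Python's `csv` module. The function name `to_census_rows` and the dictionary keys (`id`, `street`, `city`, `state`, `zip`) are assumptions made for this example; a missing field is written as an empty column.

```python
import csv
import io


def to_census_rows(records):
    """Turn address dicts into [id, street, city, state, zip] rows."""
    rows = []
    for rec in records:
        rows.append([
            rec["id"],
            rec["street"],
            rec.get("city", ""),   # empty string keeps the column in place
            rec.get("state", ""),
            rec.get("zip", ""),
        ])
    return rows


records = [
    {"id": 1, "street": "409 Main St", "city": "Oakland", "state": "CA", "zip": "94605"},
    {"id": 2, "street": "310 Main St", "state": "CA", "zip": "94605"},
]

# Write to an in-memory buffer; a real file opened with newline='' works the same way
buffer = io.StringIO()
csv.writer(buffer, lineterminator="\n").writerows(to_census_rows(records))
census_text = buffer.getvalue()
print(census_text)
```

Using the `csv` writer rather than joining strings by hand means a field that itself contains a comma is quoted automatically.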
In [ ]:
# Identify the file with the addresses in the format required by the geocoder.
# Take a look at the contents of this file to see how these are structured.
cgfile = 'data/bart_addresses_census_format.csv'
# Configure API parameters
# You can read about them here: http://geocoding.geo.census.gov/geocoder/Geocoding_Services_API.html
url = 'http://geocoding.geo.census.gov/geocoder/geographies/addressbatch'
payload = {'benchmark':'Public_AR_Current','vintage':'ACS2013_Current'}
files = {'addressFile': (cgfile, open(cgfile, 'rb'), 'text/csv')}
# Submit the file of addresses to geocode
r = requests.post(url, files=files, data = payload)
# Review the output
print (r.text)
In [ ]:
# Save geocoded data to file
with open('data/bart_geocoded_addresses_from_census.csv', 'w') as outfile:
    outfile.write(r.text)
In [ ]:
# Read geocoded data into pandas data frame
colnames = ['station_name','inaddr','ismatch','matchtype','maddr','lon_lat','tlid','sideofst','fipstate','fipcounty','fiptract','junk']
bart_geocoded = pd.read_csv('data/bart_geocoded_addresses_from_census.csv',sep=",", header=None)
bart_geocoded.columns = colnames
bart_geocoded
del bart_geocoded['junk'] #delete junk column at end of file
bart_geocoded
In [ ]:
# Subset data frame to select only matched addresses
bart_geocoded_match = bart_geocoded.loc[bart_geocoded['ismatch']== 'Match']
In [ ]:
# Reformat and create a new data frame that we can map
# Split the "lon,lat" strings once and reuse the two resulting columns
lon_lat = bart_geocoded_match['lon_lat'].str.split(',', expand=True)
data = {'station': bart_geocoded_match['station_name'],
        'longitude': lon_lat[0],
        'latitude': lon_lat[1],
        'address': bart_geocoded_match['maddr']
        }
# Use a separate variable; assigning to bart.df would set an attribute on the existing bart data frame
bart_df = pd.DataFrame(data, columns = ['station', 'longitude', 'latitude','address'])
bart_df
In [ ]:
# Your code here to create the geojson data
# Now save the geojson data to a file
with open('map/geojson/bart_coords2.geojson', 'w') as f:
    json.dump(geo_data, f, indent=2)
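One possible way to fill in the GeoJSON step is sketched below. It is not the only approach; the function name `rows_to_geojson`, the variable `geo_data2`, and the sample row are assumptions for illustration. It expects a list of dictionaries with `station`, `longitude`, `latitude`, and `address` keys, and converts the coordinate strings to floats.

```python
import json


def rows_to_geojson(rows):
    """Build a GeoJSON FeatureCollection of points from a list of dicts."""
    features = []
    for row in rows:
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                # GeoJSON order is [longitude, latitude]
                "coordinates": [float(row["longitude"]), float(row["latitude"])],
            },
            "properties": {
                "station": row["station"],
                "address": row["address"],
            },
        })
    return {"type": "FeatureCollection", "features": features}


# Illustrative row; coordinates arrive as strings after splitting "lon,lat"
rows = [{
    "station": "Example Station",
    "longitude": "-122.27",
    "latitude": "37.80",
    "address": "1245 BROADWAY, OAKLAND, CA, 94612",
}]

geo_data2 = rows_to_geojson(rows)
print(json.dumps(geo_data2, indent=2))
```

A pandas data frame can be turned into such a list of dictionaries with `to_dict('records')`.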
In [ ]:
# Read in the geojson data and write it out to a javascript file so we can map it
with open('map/geojson/bart_coords2.geojson', 'r') as infile:
    lines = infile.readlines()
with open('map/geojson/bart_coords2.js', 'w') as outfile:
    outfile.write('var bart = ')
    outfile.writelines(lines)
# The with blocks close both files automatically