Agenda: geocoding addresses to lat-long with the Google Maps Geocoding API, searching for nearby places with the Google Places API, reverse geocoding lat-long to addresses and address components, and converting lat-long to census FIPS codes with the FCC API.
You'll need a Google API key to use the Google Maps Geocoding API and the Google Places API Web Service:
google_maps_api_key='YOUR-KEY-HERE'
In [1]:
import pandas as pd, requests, time
from geopy.geocoders import GoogleV3
# import my api key saved in a local file i called keys.py
from keys import google_maps_api_key
In [2]:
# set the pause duration between api requests
pause = 0.1
In [3]:
locations = pd.DataFrame()
locations['address'] = ['350 5th Ave, New York, NY 10118',
                        '100 Larkin St, San Francisco, CA 94102',
                        'Wurster Hall, Berkeley, CA']
locations
Out[3]:
In [4]:
# function that accepts an address string, sends it to the Google API, and returns the lat-long API result
def geocode(address):
    time.sleep(pause) # pause for some duration before each request, so we don't hammer the server
    url = 'https://maps.googleapis.com/maps/api/geocode/json?address={}&key={}' # api url with placeholders
    request = url.format(address, google_maps_api_key) # fill in the placeholders with our variables
    response = requests.get(request) # send the request to the server and get the response
    data = response.json() # convert the response json string into a dict
    if len(data['results']) > 0: # if google was able to geolocate our address, extract lat-long from result
        latitude = data['results'][0]['geometry']['location']['lat']
        longitude = data['results'][0]['geometry']['location']['lng']
        return '{},{}'.format(latitude, longitude) # return lat-long as a string in the format google likes
In [5]:
# for each value in the address column, geocode it, save results as new df column
locations['latlng'] = locations['address'].map(geocode)
locations
Out[5]:
In [6]:
# parse the result into separate lat and lon columns for easy mapping
locations['latitude'] = locations['latlng'].map(lambda x: x.split(',')[0] if isinstance(x, str) else None)
locations['longitude'] = locations['latlng'].map(lambda x: x.split(',')[1] if isinstance(x, str) else None)
locations
Out[6]:
In [7]:
# google places api nearby search url, with placeholders
url = 'https://maps.googleapis.com/maps/api/place/nearbysearch/json?keyword={}&location={}&radius={}&key={}'
# what keyword to search for
keyword = 'restaurant'
# define the radius (in meters) for the search
radius = 1000
# define the location coordinates (of wurster hall)
location = locations.loc[2, 'latlng']
location
Out[7]:
In [8]:
# add our variables into the url, submit the request to the api, and load the response
request = url.format(keyword, location, radius, google_maps_api_key)
response = requests.get(request)
data = response.json()
In [9]:
places = pd.DataFrame(data['results'])
places = places[['name', 'geometry', 'rating', 'vicinity']]
places.head()
Out[9]:
In [10]:
# parse out lat-long and return it as a series -> this creates a dataframe of all the results when you .apply()
def parse_coords(geometry):
    if isinstance(geometry, dict):
        lng = geometry['location']['lng']
        lat = geometry['location']['lat']
        return pd.Series({'latitude':lat, 'longitude':lng})
# test our function
places['geometry'].head().apply(parse_coords)
Out[10]:
In [11]:
# now run our function on the whole dataframe and SAVE THE OUTPUT to 2 new dataframe columns
places[['latitude', 'longitude']] = places['geometry'].apply(parse_coords)
places_clean = places.drop('geometry', axis=1)
places_clean.sort_values(by='rating', ascending=False).head()
Out[11]:
Next, we'll use Google's reverse geocoding API to turn lat-long into addresses.
You can do this manually, just like in the previous two sections, but parsing Google's address component results takes a bit more work. If we just want addresses, we can have geopy call Google's API for us.
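As a quick sketch of the idea (assuming a valid API key; the coordinates here are just an example), geopy's GoogleV3 geocoder returns a Location object that exposes the address and coordinates as attributes:

from geopy.geocoders import GoogleV3
geolocator = GoogleV3(api_key=google_maps_api_key)
location = geolocator.reverse('37.8707,-122.2548', exactly_one=True) # example lat-long string
location.address   # the formatted address string
location.latitude  # the matched point's latitude
location.raw       # the full response dict, if you need the details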
In [12]:
# load usa point data and keep only the first 5 rows
usa = pd.read_csv('data/usa-latlong.csv')
usa = usa.head()
usa
Out[12]:
In [13]:
# create a column to put lat-long into the format google likes - this just makes it easier to call their API
usa['latlng'] = usa.apply(lambda row: '{},{}'.format(row['latitude'], row['longitude']), axis=1)
usa
Out[13]:
In [14]:
# tell geopy to reverse geocode some lat-long string using Google's API and return the address
def reverse_geopy(latlng):
    time.sleep(pause)
    geolocator = GoogleV3(api_key=google_maps_api_key) # google requires a key with each request
    location = geolocator.reverse(latlng, exactly_one=True)
    return location.address
usa['address'] = usa['latlng'].map(reverse_geopy)
usa
Out[14]:
You could try to parse these address strings, but you'd be relying on them always having a consistent format, which might not hold if you have international location data. Instead, you can call the API manually and extract just the individual address components you're interested in.
In [15]:
# pass a lat-long string to the Google API and reverse geocode it
def reverse_geocode(latlng):
    time.sleep(pause)
    url = 'https://maps.googleapis.com/maps/api/geocode/json?latlng={}&key={}'
    request = url.format(latlng, google_maps_api_key)
    response = requests.get(request)
    data = response.json()
    if len(data['results']) > 0:
        return data['results'][0] # if we got results, return the first one
geocode_results = usa['latlng'].map(reverse_geocode)
Now look inside each reverse geocode result to see if address_components exists. If it does, look inside each component to see if we can find the city or the state. Google labels the city with the abstract type 'locality' and the state with 'administrative_area_level_1', which lets them use the same terminology anywhere in the world.
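For reference, each entry in address_components is a small dict along these lines (an illustrative example, not actual API output):

{'long_name': 'San Francisco',
 'short_name': 'SF',
 'types': ['locality', 'political']}

The functions below walk this list and return the long_name of the first component whose types include the label we want.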
In [16]:
def get_city(geocode_result):
    if geocode_result and 'address_components' in geocode_result: # guard against failed lookups (None)
        for address_component in geocode_result['address_components']:
            if 'locality' in address_component['types']:
                return address_component['long_name']

def get_state(geocode_result):
    if geocode_result and 'address_components' in geocode_result:
        for address_component in geocode_result['address_components']:
            if 'administrative_area_level_1' in address_component['types']:
                return address_component['long_name']
In [17]:
# now map our functions to extract city and state names
usa['city'] = geocode_results.map(get_city)
usa['state'] = geocode_results.map(get_state)
usa
Out[17]:
We'll use the FCC's (very slow) Census Block Conversions API to turn lat-long into a census block FIPS code. FIPS codes contain, from left to right: the location's 2-digit state code, 3-digit county code, 6-digit census tract code, and 4-digit census block code (the first digit of which is the census block group code), as in the slicing example below. Once you have these, you can join your data to tract (etc.) level census data without doing a spatial join.
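Because the full block FIPS is a fixed-width 15-character string, you can slice each component out by position. A minimal sketch (the FIPS value here is hypothetical):

fips = '060750201001001'   # hypothetical 15-digit block FIPS code
state = fips[0:2]          # 2-digit state code
county = fips[2:5]         # 3-digit county code
tract = fips[5:11]         # 6-digit census tract code
block = fips[11:15]        # 4-digit block code (block[0] is the block group)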
In [18]:
# pass the FCC API a lat-long and get FIPS data back - return block fips code and county name
def get_fips(row):
    time.sleep(pause)
    # the census block conversions api now lives at geo.fcc.gov
    url = 'https://geo.fcc.gov/api/census/block/find?format=json&latitude={}&longitude={}'
    request = url.format(row['latitude'], row['longitude'])
    response = requests.get(request)
    data = response.json()
    # return multiple values as a series - this will create a dataframe with multiple columns
    return pd.Series({'fips_code':data['Block']['FIPS'], 'county':data['County']['name']})
In [19]:
# get block fips code and county name from FCC as new dataframe, then concatenate to join them
fips = usa.apply(get_fips, axis=1)
usa = pd.concat([usa, fips], axis=1)
usa
Out[19]: