In [1]:
# Import packages

import json, string

import folium
import requests
import geocoder
import pandas as pd

from folium import Map, Marker, GeoJson, LayerControl
from ediblepickle import checkpoint

%matplotlib inline

In [2]:
# Read API keys from file

with open("secrets/.wmata") as fin:
    wmata_key = fin.read().strip()
with open("secrets/.walkscore") as fin:
    walkscore_key = fin.read().strip()

Folium: A Leaflet Wrapper for Python

What can Folium and by extension, Leaflet can do to help you explore the structure of cities?

  1. Live bus tracking
  2. Neighborhood choropleths
  3. Walk Score® along bus routes

Leaflet is currently one of the most popular JavaScript libraries for mapping.

  • Interactive through web browser
  • Flexibility of JavaScript
  • Mobile-friendly
  • Great API
  • Open source

Python is a great language for the web. Using the same language, you can

Python's continuing utility is due to its ecosystem, not its syntax or speed. For example, according to Wikipedia, TensorFlow was the fastest growing deep learning framework in fall 2016. The second fastest was a high-level Python library called Keras which can seamlessly plug into TensorFlow's architecture.

There's a similar relationship between Leaflet.js and the Python package Folium.

Folium

Folium uses the Leaflet API to allow users to write Python code to generate and manipulate interactive JavaScript maps. This also allows for drawing those maps in Jupyter notebooks.

Interactivity

You can drag, zoom, click, and hover. More generally you can provide input and get output, even if that output requires backend calculations. JavaScript was designed to ferry information between the frontend and the backend seamlessly.

  • More seamless than redrawing maps
  • Answer questions in real time
  • Simultaneous exploration and explanation

All this makes for a good tool.

Documentation

Bus tracker

The WMATA API is free to use. Learn more here.

  • Investigate real time positions compared to scheduled locations to say something about actual vs. ideal transit
  • Look at how connections between regions vary throughout the day

Interactivity:

  • Input: location of interest, search radius
  • Output: locations and information of buses in the area

Geocoder

An alternative to calling eg. the OpenStreetMap API, geocoder provides a wrapper around a number of popular geocoding services.


In [3]:
location = 'Union Station'
loc = geocoder.osm(location)

In [4]:
loc.json


Out[4]:
{'accuracy': 0.662122692724334,
 'address': 'Union Station, Union Station Drive Northeast, NoMa, Near Northeast, Washington, District of Columbia, 20549, USA',
 'bbox': {'northeast': [38.8983537, -77.0049061],
  'southwest': [38.8968333, -77.0074118]},
 'city': 'Washington',
 'confidence': 9,
 'country': 'USA',
 'country_code': 'us',
 'icon': 'https://nominatim.openstreetmap.org/images/mapicons/transport_train_station2.p.20.png',
 'importance': 0.662122692724334,
 'lat': 38.89764285,
 'lng': -77.0061463947582,
 'neighborhood': 'NoMa',
 'ok': True,
 'osm_id': '67100178',
 'osm_type': 'way',
 'place_id': '91391856',
 'place_rank': '30',
 'postal': '20549',
 'quality': 'station',
 'raw': {'place_id': '91391856',
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'way',
  'osm_id': '67100178',
  'boundingbox': ['38.8968333', '38.8983537', '-77.0074118', '-77.0049061'],
  'lat': '38.89764285',
  'lon': '-77.0061463947582',
  'display_name': 'Union Station, Union Station Drive Northeast, NoMa, Near Northeast, Washington, District of Columbia, 20549, USA',
  'place_rank': '30',
  'category': 'railway',
  'type': 'station',
  'importance': 0.662122692724334,
  'icon': 'https://nominatim.openstreetmap.org/images/mapicons/transport_train_station2.p.20.png',
  'address': {'station': 'Union Station',
   'road': 'Union Station Drive Northeast',
   'neighbourhood': 'NoMa',
   'suburb': 'Near Northeast',
   'city': 'Washington',
   'state': 'District of Columbia',
   'postcode': '20549',
   'country': 'USA',
   'country_code': 'us'}},
 'region': 'District of Columbia',
 'state': 'District of Columbia',
 'status': 'OK',
 'street': 'Union Station Drive Northeast',
 'suburb': 'Near Northeast',
 'type': 'station'}

In [5]:
# Maps are hierarchical objects
latlng = [loc.lat, loc.lng]

bus_map = Map(location=latlng,
              zoom_start=15)
bus_map.add_child(Marker(location=latlng, popup=loc.address, icon = folium.Icon(color = 'blue')))
# bus_map.add_child(GeoJson(loc.geojson))
bus_map


Out[5]:

In [6]:
# Saving maps

# bus_map.save('bus_map.html')

In [7]:
# Set general WMATA parameters

session = requests.Session()
session.mount('https://api.wmata.com', requests.adapters.HTTPAdapter(max_retries = 2))

headers = {'api_key': wmata_key}

radius = '1000'

Checkpointing

Caching and checkpointing is crucial for dealing with APIs sustainably and respectfully. You should never hit an endpoint twice for the same data.

Edible Pickle is a checkpointing tool that allows you to save the expensive results of a function so that it need not be run again if that result is already present.

In the following cell, setting refresh = True will make the function get current data instead of relying on the cache.


In [8]:
# Call API for bus locations

bus_endpoint = 'https://api.wmata.com/Bus.svc/json/jBusPositions'

@checkpoint(key = string.Template('{0}x{1}_radius{2}.buslist'), work_dir = 'cache/', refresh = False)
def get_buses(lat, lon, radius):
    """
    All values passed as strings and radius in meters
    """

    params = {
              # 'RouteID': 'B12',
              'Lat': lat,
              'Lon': lon,
              'Radius': radius
             }

    response = session.get(bus_endpoint, params = params, headers = headers)
    if not response.status_code == 200:
        raise ValueError("Response status not 200")
    else:
        return response.json()['BusPositions']

In [9]:
bus_list = get_buses(loc.lat, loc.lng, radius)
# buses_in_the_area = len(bus_list)

In [10]:
# example response element

bus_list[0]


Out[10]:
{'VehicleID': '2933',
 'Lat': 38.896061,
 'Lon': -77.007278,
 'Deviation': -3.0,
 'DateTime': '2018-10-06T14:11:11',
 'TripID': '877419070',
 'RouteID': 'D6',
 'DirectionNum': 0,
 'DirectionText': 'EAST',
 'TripHeadsign': 'STADIUM-ARMORY',
 'TripStartTime': '2018-10-06T13:20:00',
 'TripEndTime': '2018-10-06T14:32:00'}

In [11]:
for bus in bus_list:
    folium.features.RegularPolygonMarker(location = [bus['Lat'], bus['Lon']],
                                         popup = 'Route %s to %s' % (bus['RouteID'], bus['TripHeadsign']),
                                         number_of_sides = 3,
                                         radius = 10,
                                         weight = 1,
                                         fill_opacity = 0.8).add_to(bus_map)
bus_map


Out[11]:

Exercises

  • Use the WMATA bus routes endpoint to identify a bus' next stop, and use math to rotate its triangle to the right direction
  • Overlay trains in the area on the map

Neighborhood choropleths

  • Visualize metrics of interest that have different values for each region
  • Overlay metrics to perform an "and" query

Interactivity:

  • Input: neighborhood shape information, a value for each neighborhood to plot
  • Output: neighborhood regions highlighted on map, colored by value

In [12]:
nh_map = Map(location = latlng,
             zoom_start = 13,
             tiles = 'Stamen Toner')

GeoJSON

A file format that combines geographical data with associated JSON attributes. You can find or create these datasets in a variety of ways. This particular dataset comes from this GitHub repository.


In [13]:
with open('geojson/neighborhood-composition.geojson') as fin:
    gjdata = json.load(fin)
nhoods = gjdata['features']
nhoods[0]


Out[13]:
{'type': 'Feature',
 'properties': {'OBJECTID': 1,
  'AREA_': 2775872.25,
  'PERIMETER': 8433.114,
  'DC_CEN_TRA': 2,
  'DC_CEN_T_1': 1,
  'POP90': 4485,
  'HUNITS90': 1667,
  'TRACT': 16,
  'POVRATE': 2.41,
  'POVGROUP': 0,
  'NHOOD_COMP': 'S',
  'SHAPE_Length': 8689.281078107126,
  'SHAPE_Area': 2709315.5366511554},
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-77.03745337664726, 38.99320776056069],
    [-77.03658614499436, 38.99247259661867],
    [-77.03643001297164, 38.99231607382144],
    [-77.03601180967503, 38.99186822549753],
    [-77.0349859465934, 38.99102913456264],
    [-77.03412183812416, 38.990416147387194],
    [-77.03343609259034, 38.98985529644307],
    [-77.0328786111678, 38.98945097705591],
    [-77.03235456116573, 38.98901619799956],
    [-77.03110580582891, 38.988020562346556],
    [-77.02966757546743, 38.98692057424375],
    [-77.0284802142459, 38.986003184930354],
    [-77.0281847628857, 38.985720551782485],
    [-77.02678564697948, 38.98468141384226],
    [-77.02655153719147, 38.98451184254842],
    [-77.02654544760266, 38.98384248850553],
    [-77.0265111456808, 38.982757429554475],
    [-77.02649679506852, 38.98200074245761],
    [-77.02648324223682, 38.981433371820444],
    [-77.02643488048763, 38.97940230794649],
    [-77.02641393359247, 38.97834412551212],
    [-77.026421389052, 38.978033060771644],
    [-77.02783658662884, 38.97804455126507],
    [-77.0296668935345, 38.97804874951685],
    [-77.0328694919084, 38.97809208758694],
    [-77.03338355694684, 38.97762801828364],
    [-77.0347368220949, 38.97636563438237],
    [-77.03527786634794, 38.9758065387571],
    [-77.03632688692504, 38.97495479558093],
    [-77.03632055608156, 38.97313035090405],
    [-77.0363280131229, 38.97220658095834],
    [-77.03754686112063, 38.97210879889243],
    [-77.03723444310336, 38.97325843084184],
    [-77.0394448509513, 38.97282798625889],
    [-77.03912878155018, 38.97385601317927],
    [-77.03945724163759, 38.97403086074082],
    [-77.03940165556399, 38.974265742158565],
    [-77.03997045195433, 38.97511365624699],
    [-77.03970886442639, 38.975626953334285],
    [-77.03987085569673, 38.97636627576869],
    [-77.04216162636152, 38.97760500462075],
    [-77.04274129569094, 38.97791358723361],
    [-77.04353269140222, 38.97819164953665],
    [-77.04385603296039, 38.97845683901735],
    [-77.04397904709626, 38.97914831827377],
    [-77.04406831160242, 38.979326602483205],
    [-77.04419662234963, 38.97958316234636],
    [-77.04453686431249, 38.98011798664016],
    [-77.0438350291736, 38.98054013390485],
    [-77.04318312438473, 38.98061865788188],
    [-77.04198297536863, 38.98005307219469],
    [-77.04016185114382, 38.98132197934236],
    [-77.03980122643688, 38.9821419947868],
    [-77.04012915557624, 38.98279929352518],
    [-77.04080525777114, 38.983134336451606],
    [-77.0409484859708, 38.983208324932185],
    [-77.0418045192918, 38.98360172621602],
    [-77.04155232808833, 38.983811599433096],
    [-77.04194803320962, 38.983902792483505],
    [-77.04277439858414, 38.98409103514218],
    [-77.04278341526684, 38.984095256697465],
    [-77.04433244411406, 38.98393388314525],
    [-77.04579342313589, 38.98416391255852],
    [-77.0477803883073, 38.985482632007816],
    [-77.04814150363909, 38.98548248392899],
    [-77.04871601389934, 38.98550786540717],
    [-77.04888029372168, 38.98571276975437],
    [-77.04925792045731, 38.98585352680511],
    [-77.04965214012891, 38.9862504961783],
    [-77.05007898313231, 38.98635280062867],
    [-77.05051647811025, 38.986142027512535],
    [-77.05104746762281, 38.986390819991456],
    [-77.05127711550999, 38.98617292798969],
    [-77.05170388600492, 38.986172740111975],
    [-77.05227855351363, 38.98640308589269],
    [-77.05242071031925, 38.986642937974366],
    [-77.05233731970786, 38.986912627119445],
    [-77.05249342356012, 38.98699954045045],
    [-77.0522560979285, 38.9871816486133],
    [-77.04717469567625, 38.991145302484604],
    [-77.04622107765816, 38.9918784464043],
    [-77.04190522886462, 38.99525181377237],
    [-77.04098789436863, 38.99591663726905],
    [-77.03745337664726, 38.99320776056069]]]}}

In [14]:
# Create Pandas DataFrame

nhdata = pd.DataFrame([nhood['properties'] for nhood in nhoods], columns = sorted(nhoods[0]['properties'].keys()))

In [15]:
nhdata.head()


Out[15]:
AREA_ DC_CEN_TRA DC_CEN_T_1 HUNITS90 NHOOD_COMP OBJECTID PERIMETER POP90 POVGROUP POVRATE SHAPE_Area SHAPE_Length TRACT
0 2775872.250 2 1 1667 S 1 8433.114 4485 0 2.41 2.709316e+06 8689.281078 16.0
1 4781802.500 3 2 2309 S 2 10249.021 5570 0 2.21 4.753419e+06 10567.377368 15.0
2 607200.688 4 3 1287 E 3 3938.597 2987 0 9.57 6.007651e+05 3867.393986 17.1
3 924548.875 5 5 1086 S 4 4830.491 2604 0 4.33 9.311315e+05 4857.307985 17.2
4 487822.625 6 4 10 E 5 2771.135 765 0 0.00 4.878226e+05 2771.134778 18.1

In [16]:
# Using Pandas to create derived variables

nhdata['Density'] = nhdata['POP90'] / nhdata['AREA_']
nhdata.describe()


Out[16]:
AREA_ DC_CEN_TRA DC_CEN_T_1 HUNITS90 OBJECTID PERIMETER POP90 POVGROUP POVRATE SHAPE_Area SHAPE_Length TRACT Density
count 1.920000e+02 192.000000 192.000000 192.000000 192.000000 192.000000 192.000000 192.000000 192.000000 1.920000e+02 192.000000 192.000000 192.000000
mean 9.256678e+05 97.500000 96.500000 1450.463542 96.500000 4134.226021 3160.937500 0.973958 17.318021 9.242352e+05 4136.915084 58.466667 0.005863
std 1.245642e+06 55.569776 55.569776 781.478079 55.569776 2505.293802 1444.572962 0.840312 13.166314 1.242515e+06 2517.643312 30.633035 0.004097
min 1.625355e+05 2.000000 1.000000 0.000000 1.000000 1675.831000 0.000000 0.000000 0.000000 1.625355e+05 1675.830873 1.000000 0.000000
25% 3.529960e+05 49.750000 48.750000 1035.500000 48.750000 2711.860000 2253.500000 0.000000 7.417500 3.481418e+05 2711.859936 29.750000 0.003059
50% 5.473621e+05 97.500000 96.500000 1350.500000 96.500000 3455.979500 3079.000000 1.000000 13.575000 5.510787e+05 3456.624765 64.600000 0.005039
75% 1.016516e+06 145.250000 144.250000 1888.000000 144.250000 4756.924500 4182.250000 2.000000 24.025000 1.015430e+06 4765.208261 84.425000 0.008043
max 1.142948e+07 193.000000 192.000000 4555.000000 192.000000 25651.359000 7767.000000 2.000000 79.630000 1.144449e+07 25611.404561 99.700000 0.020654

Colormaps

Check out ColorBrewer for advice about coloring for cartography.


In [17]:
# Set up colormaps to represent the range of values

from branca.colormap import linear

popcolors = linear.GnBu_06.scale(
    nhdata['POP90'].min(),
    nhdata['POP90'].max() / 1.5)

povcolors = linear.PuRd_06.scale(
    nhdata['POVRATE'].min(),
    nhdata['POVRATE'].max() / 2)

print(popcolors(1000))

povcolors


#ceecc6
Out[17]:
0.039.815

In [18]:
# Adds a caption to the map that shows the color scale

popcolors.caption = 'Population Scale'
popcolors.add_to(nh_map)


Out[18]:
05178.0

In [19]:
GeoJson(gjdata,
        name = 'population',
        style_function = lambda feature: {
            'color': 'black',
            'weight': 1,
            'dashArray': '5, 5',
            'fillColor': popcolors(feature['properties']['POP90'])
            }
        ).add_to(nh_map)


Out[19]:
<folium.features.GeoJson at 0x7f8c8edde0f0>

In [20]:
GeoJson(gjdata,
        name = 'poverty rate',
        style_function = lambda feature: {
            'color': 'black',
            'weight': 1,
            'dashArray': '5, 5',
            'fillColor': povcolors(feature['properties']['POVRATE'])
            }
        ).add_to(nh_map)


Out[20]:
<folium.features.GeoJson at 0x7f8c8ed57fd0>

In [21]:
LayerControl().add_to(nh_map)


Out[21]:
<folium.map.LayerControl at 0x7f8c8ed53eb8>

In [22]:
# Colormaps can be changed on the fly

nh_map


Out[22]:

More succinct choropleths

This example notebook goes through some other techniques for creating choropleths with additional functionality all within one choropleth method.

Exercises

  • Come up with a standard way of setting colormap thresholds for different variable distributions
  • Get a shapefile from the DC OpenData API and convert it to GeoJSON using ogr2ogr or a similar tool

Walk Score

A measure of how dependent an address is on having a car. For example, areas that require cars are more expensive than they seem to live in. An API is available.

84

  • Investigate how transit routes lie on top of underlying people - what's between the bus stops?
  • Is the purpose of a bus route to increase "walkability" for populations?

Interactivity:

  • Input: bus route
  • Output: line along the shape of the route, colored by score

In [23]:
# Get Metrobus route data from WMATA

route_endpoint = 'https://api.wmata.com/Bus.svc/json/jRouteDetails'
date = '2018-09-01'  # Changing this constant will likely require hitting the API

def get_route_shape(route, date):  # eg. 'L2', 'YYYY-MM-DD'
    params = {'RouteID': route}
    if date:
        params['Date'] = date
    response = session.get(route_endpoint, params = params, headers = headers)
    if response.status_code != 200:
        raise ValueError("Error: Response status not 200")
    else:
        return response.json()['Direction0']['Shape']

# for stop in route_shape:
#     Marker(location=[stop['Lat'], stop['Lon']], popup=str(stop['SeqNum'])).add_to(m)

In [24]:
# Get Walkscore data - 500 ft grid resolution

"""
http://api.walkscore.com/score?format=json&
address=1119%8th%20Avenue%20Seattle%20WA%2098101&lat=47.6085&
lon=-122.3295&transit=1&bike=1&wsapikey=<YOUR-WSAPIKEY>
"""

walkscore_endpoint = 'http://api.walkscore.com/score'

def get_walkscore(pin):
    params = {'format': 'json',
              'wsapikey': walkscore_key,
              'lat': pin[0],
              'lon': pin[1],
              'transit': '1',
              'bike': '1',
              'address': geocoder.osm(pin, method='reverse').address}
    response = requests.get(walkscore_endpoint, params = params)
    if response.status_code != 200:
        return None
    else:
        return response.json()

In [25]:
@checkpoint(key = string.Template('{0}_scores_{1}.panda'), work_dir = 'cache/', refresh = False)
def get_route_scores(route, date):
    shape = get_route_shape(route, date)
    pins = [(pin['Lat'], pin['Lon']) for pin in shape]
    
    walk_scores = []
    transit_scores = []
    bike_scores = []
    for pin in pins:
        score_json = get_walkscore(pin)
        if not score_json:
            walk_scores.append(-1)
            transit_scores.append(-1)
            bike_scores.append(-1)
            continue

        walk_scores.append(score_json.get('walkscore', -1))
        transit_scores.append(score_json.get('transit', {}).get('score', -1))
        bike_scores.append(score_json.get('bike', {}).get('score', -1))
    df = pd.DataFrame({'pin': pins,
                       'walk_score': walk_scores,
                       'transit_score': transit_scores,
                       'bike_score': bike_scores})
    df = df[['pin', 'walk_score', 'bike_score', 'transit_score']]
    return df

In [26]:
# Example response from Walk Score API

test = get_walkscore(latlng)

In [27]:
test


Out[27]:
{'status': 1,
 'walkscore': 93,
 'description': "Walker's Paradise",
 'updated': '2018-05-13 04:20:29.764740',
 'logo_url': 'https://cdn.walk.sc/images/api-logo.png',
 'more_info_icon': 'https://cdn.walk.sc/images/api-more-info.gif',
 'more_info_link': 'https://www.redfin.com/how-walk-score-works',
 'ws_link': 'https://www.walkscore.com/score/50-Massachusetts-Avenue-Northeast-NoMa-Washington-District-of-Columbia-20002-USA/lat=38.89764285/lng=-77.0061463947582/?utm_source=github.com&utm_medium=ws_api&utm_campaign=ws_api',
 'help_link': 'https://www.redfin.com/how-walk-score-works',
 'snapped_lat': 38.898,
 'snapped_lon': -77.0055,
 'transit': {'score': None, 'description': None, 'summary': None},
 'bike': {'score': 94, 'description': "Biker's Paradise"}}

In [28]:
# This is where the magic happens
# Cached data for: L2, V5, E4, W4, 38B, 70
# Maps here: https://www.wmata.com/schedules/maps/

route = 'L2'

df = get_route_scores(route, date)

In [29]:
print(df.shape)
df.head()


(415, 4)
Out[29]:
pin walk_score bike_score transit_score
0 (38.901537, -77.038512) 97 90 None
1 (38.901525, -77.038512) 97 90 None
2 (38.901503, -77.038512) 97 90 None
3 (38.901347, -77.038512) 97 90 None
4 (38.901347, -77.038702) 97 90 None

In [30]:
score_map = Map(location = loc.latlng, zoom_start = 12, tiles = 'Stamen Terrain')

In [31]:
color_line = folium.features.ColorLine(
    df['pin'],
    colors = df['walk_score'],
    colormap = ['y', 'orange', 'r'],
    weight = 6,
    name = u'Route %s Walk Score\u00ae' % route)
color_line.add_to(score_map)


Out[31]:
<folium.features.ColorLine at 0x7f8c8eccf7f0>

In [32]:
# This adds the population layer back in

GeoJson(gjdata,
        name = 'population',
        style_function = lambda feature: {
            'color': 'black',
            'weight': 1,
            'dashArray': '5, 5',
            'fillColor': popcolors(feature['properties']['POP90'])
            }
        ).add_to(score_map)


Out[32]:
<folium.features.GeoJson at 0x7f8c8eccf048>

In [33]:
LayerControl().add_to(score_map)


Out[33]:
<folium.map.LayerControl at 0x7f8c8ecc56d8>

In [34]:
score_map


Out[34]:

In [35]:
# score_map.save("score_map.html")

Exercises

  • Use a for loop to add all colorlines to the map
  • Overlay Walk Score fluctuations with census block or block group populations
  • Cluster bus routes by "purpose" based on how much time they spend in highly walkable areas

Call to Action

"There's a lot of energy behind open source."

Go learn, go do, go share!

© Ariel M'ndange-Pfupfu 2018


In [ ]: