UrbanAccess Demo

Author: UrbanSim

This notebook provides a brief overview of the main functionality of UrbanAccess with examples using AC Transit and BART GTFS data and OpenStreetMap (OSM) pedestrian network data to create an integrated transit and pedestrian network for Oakland, CA for use in Pandana network accessibility queries.

UrbanAccess on UDST: https://github.com/UDST/urbanaccess

UrbanAccess documentation: https://udst.github.io/urbanaccess/index.html

UrbanAccess citation:

Samuel D. Blanchard and Paul Waddell, 2017, "UrbanAccess: Generalized Methodology for Measuring Regional Accessibility with an Integrated Pedestrian and Transit Network" Transportation Research Record: Journal of the Transportation Research Board, 2653: 35–44.

Notes:

  • GTFS feeds are constantly updated. The feeds in this notebook may change over time which may result in slight differences in results.
  • Output cells in this notebook have been cleared to reduce file size.

Installation:

For UrbanAccess installation instructions see: https://udst.github.io/urbanaccess/installation.html

This notebook contains optional Pandana examples that require Pandana to be installed. For installation instructions, see: http://udst.github.io/pandana/installation.html

Outline:

  • The settings object
  • The feeds object and searching for GTFS feeds
  • Downloading GTFS data
  • Loading GTFS data into an UrbanAccess transit data object
  • Creating a transit network
  • Downloading OSM data
  • Creating a pedestrian network
  • Creating an integrated transit and pedestrian network
  • Saving a network to disk
  • Loading a network from disk
  • Visualizing the network
  • Adding average headways to network travel time
  • Using an UrbanAccess network with Pandana

In [ ]:
import pandas as pd
import pandana as pdna
import time

import urbanaccess as ua
from urbanaccess.config import settings
from urbanaccess.gtfsfeeds import feeds
from urbanaccess import gtfsfeeds
from urbanaccess.gtfs.gtfsfeeds_dataframe import gtfsfeeds_dfs
from urbanaccess.network import ua_network, load_network

%matplotlib inline

In [ ]:
# Pandana currently uses deprecated parameters in matplotlib; this hides the warnings until it's fixed
import warnings
import matplotlib.cbook
warnings.filterwarnings("ignore", category=matplotlib.cbook.mplDeprecation)

The settings object

The settings object is a global urbanaccess_config object that can be used to set default options in UrbanAccess. In general, these options do not need to be changed.


In [ ]:
settings.to_dict()

For example, you can stop printing log output in the notebook and print only to the console by setting:


In [ ]:
settings.log_console = True

Turn printing in the notebook back on for now:


In [ ]:
settings.log_console = False

The feeds object

The GTFS feeds object is a global urbanaccess_gtfsfeeds object that allows you to save and manage information needed to download multiple GTFS feeds. This object is a dictionary of the names of GTFS feeds or agencies and the URLs to use to download the corresponding feeds.


In [ ]:
feeds.to_dict()

Searching for GTFS feeds

You can use the search function to find feeds on the GTFS Data Exchange. (Note: the GTFS Data Exchange has not been maintained since Summer 2016, so feeds found there may be out of date.)

Let's search for feeds for transit agencies in the GTFS Data Exchange that we know serve Oakland, CA: 1) Bay Area Rapid Transit District (BART) which runs the metro rail service and 2) AC Transit which runs bus services.

Let's start by finding the feed for the Bay Area Rapid Transit District (BART) by using the search term Bay Area Rapid Transit:


In [ ]:
gtfsfeeds.search(search_text='Bay Area Rapid Transit',
                 search_field=None,
                 match='contains')

Now that we have seen what can be found on the GTFS Data Exchange, let's run the search again, this time adding the feed it finds to the feed download list:


In [ ]:
gtfsfeeds.search(search_text='Bay Area Rapid Transit',
                 search_field=None,
                 match='contains',
                 add_feed=True)

If you know of a GTFS feed located elsewhere or one that is more up to date, you can add additional feeds located at custom URLs by adding a dictionary with the key as the name of the service/agency and the value as the URL.

Let's do this for AC Transit which also operates in Oakland, CA.

Their feed is available here: http://www.actransit.org/planning-focus/data-resource-center/. Let's use the latest version as of June 18, 2017:


In [ ]:
feeds.add_feed(add_dict={'ac transit': 'http://www.actransit.org/wp-content/uploads/GTFSJune182017B.zip'})

Note the two GTFS feeds now in your feeds object, ready to download:


In [ ]:
feeds.to_dict()

Downloading GTFS data

Use the download function to download all the feeds in your feeds object at once. If no parameters are specified the existing feeds object will be used to acquire the data.

By default, your data will be downloaded into a folder named data in the directory of this notebook.


In [ ]:
gtfsfeeds.download()

Load GTFS data into an UrbanAccess transit data object

Now that we have downloaded our data, let's load our individual GTFS feeds (currently a series of text files stored on disk) into a combined set of Pandas DataFrames.

  • You can specify one feed or multiple feeds that are inside a root folder using the gtfsfeed_path parameter. If you want to aggregate multiple transit networks together, all the GTFS feeds you want to aggregate must be inside of a single root folder.
  • Turn on validation and set a bounding box with the remove_stops_outsidebbox parameter turned on to ensure all your GTFS feed data are within a specified area.

Let's specify a bounding box of coordinates for the City of Oakland to subset the GTFS data to. You can generate a bounding box by going to http://boundingbox.klokantech.com/ and selecting the CSV format.
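Conceptually, subsetting to a bounding box is a simple coordinate test. The minimal sketch below assumes the tuple is ordered west, south, east, north (minimum longitude, minimum latitude, maximum longitude, maximum latitude). This matches the CSV output of the bounding box tool and the Oakland coordinates used in this notebook. The helper names and sample stops are hypothetical, and this is not UrbanAccess's internal implementation.

```python
# Hypothetical sketch of a bounding-box filter on GTFS-like stop records.
# Assumes bbox = (west, south, east, north) in decimal degrees.


def in_bbox(lon, lat, bbox):
    """Return True if the point (lon, lat) lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def filter_stops(stops, bbox):
    """Keep only stops whose coordinates fall inside bbox.

    `stops` is a list of dicts using the GTFS stops.txt column names
    'stop_id', 'stop_lon' and 'stop_lat'.
    """
    return [s for s in stops if in_bbox(s['stop_lon'], s['stop_lat'], bbox)]


oakland_bbox = (-122.355881, 37.632226, -122.114775, 37.884725)
stops = [
    {'stop_id': 'a', 'stop_lon': -122.27, 'stop_lat': 37.80},  # made-up stop in downtown Oakland
    {'stop_id': 'b', 'stop_lon': -122.42, 'stop_lat': 37.77},  # made-up stop in San Francisco
]
print([s['stop_id'] for s in filter_stops(stops, oakland_bbox)])  # ['a']
```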


In [ ]:
validation = True
verbose = True
# bbox for the City of Oakland, ordered (west, south, east, north)
bbox = (-122.355881, 37.632226, -122.114775, 37.884725)
remove_stops_outsidebbox = True
append_definitions = True

loaded_feeds = ua.gtfs.load.gtfsfeed_to_df(gtfsfeed_path=None,
                                           validation=validation,
                                           verbose=verbose,
                                           bbox=bbox,
                                           remove_stops_outsidebbox=remove_stops_outsidebbox,
                                           append_definitions=append_definitions)

The transit data object

The output is a global urbanaccess_gtfs_df object that can be accessed with the specified variable loaded_feeds. This object holds all the individual GTFS feed files aggregated together with each GTFS feed file type in separate Pandas DataFrames to represent all the loaded transit feeds in a metropolitan area.


In [ ]:
loaded_feeds.stops.head()

Note the two transit services we have aggregated into one regional table:


In [ ]:
loaded_feeds.stops.unique_agency_id.unique()
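When several feeds are aggregated, raw ids such as route or stop ids can collide between agencies. The ids seen later in this notebook (for example `51A-141_ac_transit` and `bay_area_rapid_transit`) suggest that a normalized agency name is appended as a suffix. The sketch below illustrates that idea. The normalization rule (lowercase, whitespace replaced by underscores) and the function names are assumptions, not UrbanAccess's documented behavior.

```python
# Hypothetical sketch: make feed-local ids unique across aggregated feeds
# by appending a normalized agency name. The normalization rule is assumed.


def normalize_agency(name):
    """Lowercase an agency name and replace runs of whitespace with '_'."""
    return '_'.join(name.lower().split())


def make_unique_id(raw_id, agency_name):
    """Append the normalized agency name to a feed-local id."""
    return '{}_{}'.format(raw_id, normalize_agency(agency_name))


print(make_unique_id('51A-141', 'ac transit'))     # 51A-141_ac_transit
print(normalize_agency('Bay Area Rapid Transit'))  # bay_area_rapid_transit
```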

Quickly view the transit stop locations


In [ ]:
loaded_feeds.stops.plot(kind='scatter', x='stop_lon', y='stop_lat', s=0.1)

In [ ]:
loaded_feeds.routes.head()

In [ ]:
loaded_feeds.stop_times.head()

In [ ]:
loaded_feeds.trips.head()

In [ ]:
loaded_feeds.calendar.head()

Create a transit network

Now that we have loaded and standardized our GTFS data, let's create a travel time weighted graph from the GTFS feeds we have loaded.

Create a network for weekday Monday service between 7 am and 10 am (['07:00:00', '10:00:00']) to represent travel times during the AM Peak period.
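A time window like this amounts to comparing GTFS HH:MM:SS strings. GTFS allows hours of 24 or more for trips that run past midnight, so the minimal sketch below converts times to seconds rather than parsing them as clock times. It treats both endpoints as inclusive, which is an assumption about how the window is applied. The function names and sample departures are hypothetical.

```python
# Hypothetical sketch of filtering GTFS departure times to a time window.
# Hours may exceed 23 in GTFS, so times are converted to seconds after midnight.


def to_seconds(hms):
    """Convert a GTFS 'HH:MM:SS' string to seconds after midnight."""
    hours, minutes, seconds = (int(part) for part in hms.split(':'))
    return hours * 3600 + minutes * 60 + seconds


def in_timerange(departure_time, timerange):
    """Return True if departure_time falls within [start, end], inclusive."""
    start, end = (to_seconds(t) for t in timerange)
    return start <= to_seconds(departure_time) <= end


am_peak = ['07:00:00', '10:00:00']
departures = ['06:55:00', '07:00:00', '08:42:30', '10:00:01', '25:10:00']
print([t for t in departures if in_timerange(t, am_peak)])  # ['07:00:00', '08:42:30']
```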

Assumptions: We are using the service IDs in the calendar file to subset the day of the week. If your feed uses the calendar_dates file instead of the calendar file, you can use the calendar_dates_lookup parameter. This is not required for AC Transit and BART.
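In GTFS calendar.txt, each weekday column holds 1 if a service runs on that day and 0 if it does not. Subsetting by day therefore means keeping the service IDs whose column for that day is 1. The minimal sketch below uses made-up rows. For brevity it ignores start_date/end_date and any calendar_dates exceptions. The helper names are hypothetical.

```python
# Hypothetical sketch: select active service_ids for one weekday from
# calendar.txt-style rows (1 = service runs that day, 0 = it does not).
DAYS = ['monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday']


def make_row(service_id, flags):
    """Build one calendar row from a service_id and seven 0/1 day flags."""
    return {'service_id': service_id, **dict(zip(DAYS, flags))}


calendar = [
    make_row('WKDY', [1, 1, 1, 1, 1, 0, 0]),
    make_row('SAT', [0, 0, 0, 0, 0, 1, 0]),
    make_row('SUN', [0, 0, 0, 0, 0, 0, 1]),
]


def active_service_ids(calendar_rows, day):
    """Return the service_ids whose `day` column is set to 1."""
    return [row['service_id'] for row in calendar_rows if row.get(day) == 1]


print(active_service_ids(calendar, 'monday'))  # ['WKDY']
```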


In [ ]:
ua.gtfs.network.create_transit_net(gtfsfeeds_dfs=loaded_feeds,
                                   day='monday',
                                   timerange=['07:00:00', '10:00:00'],
                                   calendar_dates_lookup=None)

The UrbanAccess network object

The output is a global urbanaccess_network object. This object holds the resulting graph, made up of nodes and edges, for the processed GTFS network data of services operating on the day and within the time range you specified. The graph is stored in transit_nodes and transit_edges.

Let's set the global network object to a variable called urbanaccess_net that we can then inspect:


In [ ]:
urbanaccess_net = ua.network.ua_network

In [ ]:
urbanaccess_net.transit_edges.head()

In [ ]:
urbanaccess_net.transit_nodes.head()

In [ ]:
urbanaccess_net.transit_nodes.plot(kind='scatter', x='x', y='y', s=0.1)

Download OSM data

Now let's download OpenStreetMap (OSM) pedestrian street network data to produce a graph network of nodes and edges for Oakland, CA. We will use the same bounding box as before.


In [ ]:
nodes, edges = ua.osm.load.ua_network_from_bbox(bbox=bbox,
                                                remove_lcn=True)

Create a pedestrian network

Now that we have our pedestrian network data let's create a travel time weighted graph from the pedestrian network we have loaded and add it to our existing UrbanAccess network object. We will assume a pedestrian travels on average at 3 mph.

The resulting weighted network will be added to your UrbanAccess network object inside osm_nodes and osm_edges
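Weighting an edge by walking time is a unit conversion from length and speed. The minimal sketch below assumes edge lengths in meters and uses the exact definition 1 mile = 1,609.344 m. The function name is hypothetical, and the distance column UrbanAccess actually uses is not shown here.

```python
# Hypothetical sketch: convert an edge length (meters, assumed) to a
# walking travel time in minutes at a given speed in miles per hour.
METERS_PER_MILE = 1609.344  # exact international mile


def walk_minutes(distance_m, speed_mph=3.0):
    """Travel time in minutes to cover distance_m at speed_mph."""
    meters_per_minute = speed_mph * METERS_PER_MILE / 60.0  # about 80.5 m/min at 3 mph
    return distance_m / meters_per_minute


print(round(walk_minutes(1609.344), 2))  # 20.0 (one mile at 3 mph)
print(round(walk_minutes(400), 2))       # 4.97
```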


In [ ]:
ua.osm.network.create_osm_net(osm_edges=edges,
                              osm_nodes=nodes,
                              travel_speed_mph=3)

Let's inspect the results which we can access inside of the existing urbanaccess_net variable:


In [ ]:
urbanaccess_net.osm_nodes.head()

In [ ]:
urbanaccess_net.osm_edges.head()

In [ ]:
urbanaccess_net.osm_nodes.plot(kind='scatter', x='x', y='y', s=0.1)

Create an integrated transit and pedestrian network

Now let's integrate the two networks together. The resulting graph will be added to your existing UrbanAccess network object. After running this step, your network will be ready to be used with Pandana.

The resulting integrated network will be added to your UrbanAccess network object inside net_nodes and net_edges


In [ ]:
ua.network.integrate_network(urbanaccess_network=urbanaccess_net,
                             headways=False)

Let's inspect the results which we can access inside of the existing urbanaccess_net variable:


In [ ]:
urbanaccess_net.net_nodes.head()

In [ ]:
urbanaccess_net.net_edges.head()

In [ ]:
urbanaccess_net.net_edges[urbanaccess_net.net_edges['net_type'] == 'transit'].head()

Save the network to disk

You can save the final processed integrated network (net_nodes and net_edges) to disk in an HDF5 file. By default, the file will be saved to a folder named data in the directory of this notebook.


In [ ]:
ua.network.save_network(urbanaccess_network=urbanaccess_net,
                        filename='final_net.h5',
                        overwrite_key=True)

Load saved network from disk

You can load an existing processed integrated network HDF5 file from disk into an UrbanAccess network object.


In [ ]:
urbanaccess_net = ua.network.load_network(filename='final_net.h5')

Visualize the network

You can visualize the network you just created using basic UrbanAccess plot functions

Integrated network


In [ ]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges,
                 bbox=bbox,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=1.1, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

Integrated network by travel time

Use the col_colors function to color edges by travel time. In this case, the darker the red, the higher the travel time.

Note that AC Transit's major bus arterial routes (in darker red) and transfer locations, as well as the BART rail network, are visible along with the underlying pedestrian network. BART stations stand out as junctions with multiple bus connections. This is most visible in downtown Oakland at the 19th Street, 12th Street, and Lake Merritt stations, and at the Fruitvale and Coliseum stations. Downtown Oakland is located near the white cutout in the northeast-middle section of the network, which represents Lake Merritt.


In [ ]:
edgecolor = ua.plot.col_colors(df=urbanaccess_net.net_edges, col='weight', cmap='gist_heat_r', num_bins=5)
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges,
                 bbox=bbox,
                 fig_height=30, margin=0.02,
                 edge_color=edgecolor, edge_linewidth=1, edge_alpha=0.7,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

Let's zoom in closer to downtown Oakland using a new smaller extent bbox. Note the bus routes on the major arterials and the BART routes from station to station.


In [ ]:
edgecolor = ua.plot.col_colors(df=urbanaccess_net.net_edges, col='weight', cmap='gist_heat_r', num_bins=5)
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges,
                 bbox=(-122.282295, 37.795, -122.258434, 37.816022),
                 fig_height=30, margin=0.02,
                 edge_color=edgecolor, edge_linewidth=1, edge_alpha=0.7,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

Transit network

You can also slice the network by network type


In [ ]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges[urbanaccess_net.net_edges['net_type']=='transit'],
                 bbox=None,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

Pedestrian network


In [ ]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges[urbanaccess_net.net_edges['net_type']=='walk'],
                 bbox=None,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

Transit network: AC Transit Route 51A

You can slice the network using any attribute in edges. In this case let's examine one route for AC Transit route 51A.

Looking at the routes in the network, we see that route 51A has the route id: 51A-141_ac_transit


In [ ]:
urbanaccess_net.net_edges['unique_route_id'].unique()

In [ ]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges[urbanaccess_net.net_edges['unique_route_id']=='51A-141_ac_transit'],
                 bbox=bbox,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

Transit network: BART network

We can also slice the data by agency. In this case let's view all BART routes.

Looking at the agencies in the network, we see that BART has the agency id: bay_area_rapid_transit


In [ ]:
urbanaccess_net.net_edges['unique_agency_id'].unique()

In [ ]:
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges[urbanaccess_net.net_edges['unique_agency_id']=='bay_area_rapid_transit'],
                 bbox=bbox,
                 fig_height=30, margin=0.02,
                 edge_color='#999999', edge_linewidth=1, edge_alpha=1,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

Add average headways to network travel time

Calculate route stop level headways

The network we have generated so far contains only pure travel times. UrbanAccess can also calculate route stop level average headways and add them to the network as a proxy for passenger wait times at stops and stations. The route stop level average headways are added to the pedestrian-to-transit connector edges.

Let's calculate headways for the same AM Peak time period. Statistics on route stop level headways will be added to your GTFS transit data object inside headways.
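Conceptually, a route stop level average headway is the mean gap between consecutive departures of one route at one stop within the time window. The text above explains that this headway is then added to the pedestrian-to-transit connector edges. The minimal sketch below shows both steps. It assumes departure times have already been converted to minutes after midnight and uses made-up values. The function names are hypothetical, and UrbanAccess's own handling of edge cases (such as a stop served only once in the window) is not reproduced.

```python
# Hypothetical sketch: average headway for one route at one stop, and adding
# it to a connector edge's walking time as a proxy for waiting time.
from statistics import mean


def average_headway(departures_min):
    """Mean gap in minutes between consecutive departures; None if fewer than 2."""
    times = sorted(departures_min)
    if len(times) < 2:
        return None  # a headway needs at least two departures
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return mean(gaps)


def connector_weight(walk_min, headway_min):
    """Add a headway statistic (if any) to a connector edge's walk time."""
    return walk_min + (headway_min or 0.0)


route_stop_departures = [420, 435, 450, 470]  # 7:00, 7:15, 7:30, 7:50
headway = average_headway(route_stop_departures)
print(round(headway, 2))                         # 16.67
print(round(connector_weight(1.5, headway), 2))  # 18.17
```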


In [ ]:
ua.gtfs.headways.headways(gtfsfeeds_df=loaded_feeds,
                          headway_timerange=['07:00:00','10:00:00'])

In [ ]:
loaded_feeds.headways.head()

Add the route stop level average headways to your integrated network

Now that headways have been calculated and added to your GTFS transit feed object, you can use them to generate a new integrated network that incorporates the headways within the pedestrian to transit connector edge travel times.


In [ ]:
ua.network.integrate_network(urbanaccess_network=urbanaccess_net,
                             headways=True,
                             urbanaccess_gtfsfeeds_df=loaded_feeds,
                             headway_statistic='mean')

Integrated network by travel time with average headways


In [ ]:
edgecolor = ua.plot.col_colors(df=urbanaccess_net.net_edges, col='weight', cmap='gist_heat_r', num_bins=5)
ua.plot.plot_net(nodes=urbanaccess_net.net_nodes,
                 edges=urbanaccess_net.net_edges,
                 bbox=bbox,
                 fig_height=30, margin=0.02,
                 edge_color=edgecolor, edge_linewidth=1, edge_alpha=0.7,
                 node_color='black', node_size=0, node_alpha=1, node_edgecolor='none', node_zorder=3, nodes_only=False)

Using an UrbanAccess network with Pandana

Pandana (Pandas Network Analysis) is a tool to compute network accessibility metrics.

Now that we have an integrated transit and pedestrian network formatted for use with Pandana, we can use Pandana right away to compute accessibility metrics.

There are a couple of things to remember about UrbanAccess and Pandana:

  • By default, UrbanAccess generates a one-way network. One-way means there is an explicit edge for each direction in the edge table. It is therefore important to set Pandana's twoway parameter (True by default) to False to indicate that the network is a one-way network.
  • As of Pandana v0.3.0, node ids and the from and to columns in your network must be integers, not strings. UrbanAccess automatically generates both string and integer ids, so use the from_int and to_int columns in edges and the id_int index in nodes.
  • By default, UrbanAccess generates edge weights that represent travel time in minutes.
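The first two points in the list above can be illustrated on a tiny made-up edge list. In an explicit one-way network, a walkway usable in both directions needs two edges, one per direction. Transit edges, by contrast, are already directional. Pandana also expects integer node ids, so string ids need an integer mapping. This minimal sketch shows both steps. The function names are hypothetical and not part of the UrbanAccess API.

```python
# Hypothetical sketch: expand two-way walk edges into explicit one-way edges,
# then map string node ids to integers as Pandana requires.


def to_one_way(two_way_edges):
    """Expand (from, to, weight) edges into explicit edges in both directions."""
    one_way = []
    for u, v, w in two_way_edges:
        one_way.append((u, v, w))
        one_way.append((v, u, w))
    return one_way


def to_int_ids(edges):
    """Map string node ids to consecutive integers; return edges and the lookup."""
    lookup = {}
    int_edges = []
    for u, v, w in edges:
        for node in (u, v):
            if node not in lookup:
                lookup[node] = len(lookup)
        int_edges.append((lookup[u], lookup[v], w))
    return int_edges, lookup


walk_edges = [('n_a', 'n_b', 2.5)]  # weight is travel time in minutes
int_edges, lookup = to_int_ids(to_one_way(walk_edges))
print(int_edges)  # [(0, 1, 2.5), (1, 0, 2.5)]
print(lookup)     # {'n_a': 0, 'n_b': 1}
```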

For more on Pandana, see:

Pandana repo: https://github.com/UDST/pandana

Pandana documentation: http://udst.github.io/pandana/

Load Census block data

Let's load 2010 Census block data for the nine-county Bay Area. Note: These data have been processed from original Census and LEHD data.

The data are located in the demo folder of the repo, alongside this notebook.


In [ ]:
blocks = pd.read_hdf('bay_area_demo_data.h5','blocks')
# remove blocks that contain all water
blocks = blocks[blocks['square_meters_land'] != 0]
print('Total number of blocks: {:,}'.format(len(blocks)))
blocks.head()

Let's subset the Census data to the Oakland bounding box:


In [ ]:
# bbox is ordered (west, south, east, north)
west, south, east, north = bbox
# keep only blocks whose x, y coordinates fall strictly inside the bbox
within_bbox = ((blocks['x'] > west) & (blocks['x'] < east) &
               (blocks['y'] > south) & (blocks['y'] < north))
blocks_subset = blocks.loc[within_bbox].copy()
print('Total number of subset blocks: {:,}'.format(len(blocks_subset)))

In [ ]:
blocks_subset.plot(kind='scatter', x='x', y='y', s=0.1)

Initialize the Pandana network

Let's initialize our Pandana network object using the transit and pedestrian network we created. Note the use of the from_int and to_int columns, as well as twoway=False, which indicates that this is an explicit one-way network.


In [ ]:
s_time = time.time()
transit_ped_net = pdna.Network(urbanaccess_net.net_nodes["x"],
                               urbanaccess_net.net_nodes["y"],
                               urbanaccess_net.net_edges["from_int"],
                               urbanaccess_net.net_edges["to_int"],
                               urbanaccess_net.net_edges[["weight"]], 
                               twoway=False)
print('Took {:,.2f} seconds'.format(time.time() - s_time))

Now let's attach our blocks to the network by finding the nearest network node for each block:


In [ ]:
blocks_subset['node_id'] = transit_ped_net.get_node_ids(blocks_subset['x'], blocks_subset['y'])
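Conceptually, get_node_ids snaps each block to its closest network node. The minimal sketch below does this by brute force with straight-line distance, treating longitude and latitude as planar coordinates. That is an approximation that works for a small area. A real implementation would typically use a spatial index such as a k-d tree rather than an O(n) scan per point. The function name and sample coordinates are hypothetical.

```python
# Hypothetical sketch: snap points to their nearest network node by brute force.
import math


def nearest_node(x, y, nodes):
    """Return the id of the node closest to (x, y) by straight-line distance.

    `nodes` maps node_id -> (x, y). Each query scans every node (O(n)).
    """
    return min(nodes, key=lambda nid: math.hypot(nodes[nid][0] - x, nodes[nid][1] - y))


sample_nodes = {0: (-122.272, 37.804), 1: (-122.268, 37.808), 2: (-122.224, 37.775)}
sample_points = [(-122.271, 37.805), (-122.225, 37.776)]
print([nearest_node(x, y, sample_nodes) for x, y in sample_points])  # [0, 2]
```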

Calculate cumulative accessibility

Now let's compute an accessibility metric, in this case a cumulative accessibility metric. See Pandana for other metrics that can be calculated.

Let's set the block variable we want to use for our accessibility metric on the Pandana network. In this case, let's use jobs:


In [ ]:
transit_ped_net.set(blocks_subset.node_id, variable=blocks_subset.jobs, name='jobs')

Now let's run a cumulative accessibility query using our network and the jobs variable for three different travel time thresholds: 15, 30, and 45 minutes.

Note: Depending on network size, the radius threshold, computer processing power, and whether or not you are using multiple cores, the computation may take some time.


In [ ]:
s_time = time.time()
jobs_45 = transit_ped_net.aggregate(45, type='sum', decay='linear', name='jobs')
jobs_30 = transit_ped_net.aggregate(30, type='sum', decay='linear', name='jobs')
jobs_15 = transit_ped_net.aggregate(15, type='sum', decay='linear', name='jobs')
print('Took {:,.2f} seconds'.format(time.time() - s_time))
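To make the aggregate calls above more concrete, here is a minimal sketch of a cumulative accessibility query on a tiny made-up directed graph. It runs Dijkstra's shortest paths from one node up to the travel time threshold, then sums the jobs at reachable nodes. Each node's jobs are weighted by 1 - t / threshold for linear decay, or by 1 for flat. This decay formula is an assumption about what decay='linear' means, not a statement of Pandana's internals. Pandana computes these sums for every node in its own compiled engine.

```python
# Hypothetical sketch: cumulative accessibility with optional linear decay.
import heapq


def travel_times(graph, source, max_time):
    """Dijkstra: shortest travel time from source to every node within max_time.

    `graph` maps node -> list of (neighbor, minutes) one-way edges.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        t, node = heapq.heappop(heap)
        if t > best.get(node, float('inf')):
            continue  # stale heap entry
        for nbr, w in graph.get(node, []):
            nt = t + w
            if nt <= max_time and nt < best.get(nbr, float('inf')):
                best[nbr] = nt
                heapq.heappush(heap, (nt, nbr))
    return best


def cumulative_access(graph, source, jobs, threshold, decay='linear'):
    """Sum jobs reachable within threshold minutes, optionally linearly decayed."""
    total = 0.0
    for node, t in travel_times(graph, source, threshold).items():
        weight = 1.0 - t / threshold if decay == 'linear' else 1.0
        total += jobs.get(node, 0) * weight
    return total


graph = {'A': [('B', 5.0)], 'B': [('C', 10.0)], 'C': []}
node_jobs = {'A': 100, 'B': 200, 'C': 400}
print(cumulative_access(graph, 'A', node_jobs, 15, decay='flat'))  # 700.0
print(cumulative_access(graph, 'A', node_jobs, 15))  # C sits exactly at the threshold, so it contributes 0
```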

Quickly inspect the accessibility query results. As expected, a travel time threshold of 15 minutes results in fewer jobs accessible at each network node.


In [ ]:
print(jobs_45.head())
print(jobs_30.head())
print(jobs_15.head())

Jobs accessible within 15 minutes

Note how the area of high job accessibility (shown in dark red) expands as the time threshold increases. Downtown Oakland has the highest accessibility, both because transit routes converge there and because most of the area's jobs are located there. Other high accessibility areas are visible directly adjacent to the West Oakland, Fruitvale, and Coliseum BART stations and along AC Transit bus routes on the main arterial road corridors.


In [ ]:
s_time = time.time()
transit_ped_net.plot(jobs_15, 
                    plot_type='scatter',
                    fig_kwargs={'figsize':[20,20]},
                    bmap_kwargs={'epsg':'26943','resolution':'h'},
                    plot_kwargs={'cmap':'gist_heat_r','s':4,'edgecolor':'none'})
print('Took {:,.2f} seconds'.format(time.time() - s_time))

Jobs accessible within 30 minutes


In [ ]:
s_time = time.time()
transit_ped_net.plot(jobs_30, 
                    plot_type='scatter',
                    fig_kwargs={'figsize':[20,20]},
                    bmap_kwargs={'epsg':'26943','resolution':'h'},
                    plot_kwargs={'cmap':'gist_heat_r','s':4,'edgecolor':'none'})
print('Took {:,.2f} seconds'.format(time.time() - s_time))

Jobs accessible within 45 minutes


In [ ]:
s_time = time.time()
transit_ped_net.plot(jobs_45, 
                    plot_type='scatter',
                    fig_kwargs={'figsize':[20,20]},
                    bmap_kwargs={'epsg':'26943','resolution':'h'},
                    plot_kwargs={'cmap':'gist_heat_r','s':4,'edgecolor':'none'})
print('Took {:,.2f} seconds'.format(time.time() - s_time))
