In [1]:
%matplotlib inline
from IPython.display import HTML
import pandas as pd
import matplotlib.pyplot as plt
import sys
sys.path.insert(0, '../src/processing/')
import tools
If you've ever ridden the bus, you've probably had the following experience. You're standing at the bus stop waiting for the bus to come. You've been waiting over ten minutes. You're running late to an important meeting, a date, or maybe a house warming party you didn't really want to go to. Some people at the bus stop are commenting out loud about how this bus is always late. Others are checking their transit apps or Google maps and complaining how the apps always lie: "it said the bus was coming in five minutes, five minutes ago!" One guy keeps stepping into the street every 30 seconds and staring down oncoming traffic to check for the bus. Suddenly, you see the bus appear in the distance. As the bus gets closer you realize, there's not just one bus, but two buses! (If it's a strange day, maybe there is a caravan of three buses arriving together or in close succession). If you're lucky, you witnessed this from the bus stop across the street. This phenomenon is appropriately called bus bunching.
Bus bunching is when two or more buses on the same route arrive at a bus stop together or in close succession. Bus bunching occurs when the headway, or the temporal distance, between two buses is sufficiently small. What counts as "sufficiently small" depends on the context. There's no standard threshold headway used to define bus bunching. On a high frequency route, two buses may be considered bunched if the headway between them is 2 minutes or less. On a lower frequency route, however, two buses may be considered bunched at headways over 5 minutes.
Bus bunching has a number of possible causes:
Determining when and where bus bunching occurs along a route is critical for transit planners working to improve service.
In this notebook, I visualize bus bunching for eastbound trips on CTA Route 73 Armitage. To determine if buses are bunched, I analyze a dataset of observed eastbound headways at each stop along the route for the month of March 2019. I (arbitrarily) count a pair of buses as bunched if their headway is less than 2 minutes.
My primary goal is to learn where bus bunching tends to occur along Route 73 Armitage. I'm also interested in learning if bus bunching occurs more often at certain times of the day. My plan is to create a series of heatmap-like visualizations indexed by time that highlight the bus stops at which the most bunching incidents were observed during the data collection period. The idea is for the heatmap to be visually similar to Google Traffic that communicates traffic conditions as a colored overlay on city streets.
To create something like a heatmap, I need to load the geospatial data for the bus route (the route pattern), subdivide it into smaller segments, and color each according to the number of bunching incidents observed on that segment.
First, I load the pattern data for Route 73 Armitage. Each row represents either a bus stop (typ == 'S'
) or an intermediate waypoint (typ == 'W'
). I load the patterns with the waypoints, since they will allow me to create a more accurate image of the route. For simplicity, I will only work with one pattern: pid 2170 (eastbound trips between Grand & Latrobe and Clark & North).
In [2]:
patterns = tools.load_patterns(73, waypoints=True)
patterns = patterns[patterns.pid == 2170]
patterns.head()
Out[2]:
We can get an idea of the shape of the route pattern by creating a LineString object from the latitude and longitude of each of its stops and waypoints using the Shapely library.
In [3]:
from shapely.geometry import LineString
LineString(zip(patterns.lon, patterns.lat))
Out[3]:
A logical way to divide the pattern is into segments with end points at adjacent bus stops labeled with the name of the stop visited earlier in the sequence. Any waypoints falling between the two bus stops should be included in the segment, so as to preserve the route's shape. It is possible that some segments won't be straight line, and that is okay. Finally, I can create a LineString from the coordinates of the segment's stops and waypoints, so that they can be mapped.
To accomplish the above for each pair of adjacent stops A and B, I will:
Once the above steps are accomplished, I can load the geometry into a GeoDataFrame
so that it can be plotted.
Conveniently, waypoints have null-valued stpnm and stpid. I make sure the DataFrame
is sorted in sequence order, and then apply the forward fill method .ffill()
to populate the null values in those fields with the name and id of the preceding stop. I drop any remaining rows with null values, i.e. any points along the route before the first stop.
In [4]:
patterns.sort_values('seq')
patterns.ffill(inplace=True)
patterns.dropna(inplace=True)
patterns.head()
Out[4]:
Next, I populate the column lon_lat with one-item lists containing a tuple of each row's latitude and longitude. Grouping the DataFrame
by stpid and applying sum
to the lon_lat column concatenates the one-item coordinate lists together.
In [5]:
patterns['lon_lat'] = [[xy] for xy in zip(patterns.lon, patterns.lat)]
grouped = patterns.groupby(['stpid']).lon_lat.agg('sum').to_frame().reset_index()
grouped.head()
Out[5]:
I join this result back to the patterns DataFrame
.
In [6]:
merged = patterns.merge(grouped, on='stpid')
merged.head()
Out[6]:
I drop the waypoints from the DataFrame
, since they are no longer needed. I also rename some of the columns.
In [7]:
merged = merged[merged.typ != "W"]
merged.rename({'lon_lat_y': 'coordinates', 'lon_lat_x': 'lon_lat'}, inplace=True, axis='columns')
merged.head()
Out[7]:
The coordinate lists include all of the points up to but not including the final end point. To include the final endpoint, I concatenate the coordinates of each bus stop (lon_lat
) to the coordinate list (coordinates
) in the row above it with the help of the .shift()
method and the +
operator. Passing -1 to .shift()
shifts each row backward by 1, so that the 0th row is removed, the 1st row becomes the 0th row, and so on, and the last row becomes null. When the +
operator is applied to two Series of lists, it concatenates the lists in each Series. Notice that a non-null value + NaN = NaN
.
In [8]:
merged.coordinates = (merged.coordinates + merged.lon_lat.shift(-1))
merged.tail()
Out[8]:
In [9]:
import geopandas as gpd
merged.dropna(inplace=True)
geometry = [LineString(xys) for xys in merged.coordinates]
gdf = gpd.GeoDataFrame(merged, geometry=geometry)
gdf.drop(['lon', 'lat', 'lon_lat', 'coordinates'], inplace=True, axis=1)
gdf.head()
Out[9]:
In [10]:
gdf.plot().set_axis_off()
In [11]:
travels_waits = tools.load_travels_waits(73, "Eastbound", "201903")
travels_waits.head()
Out[11]:
In [12]:
patterns[patterns.stpid.isin(["15417", "4040"])]
Out[12]:
I count the number of wait times under 2 minutes at each stop for eastbound buses traveling the full length of the route to Clark & North (as opposed to Armitage & Pulaski, which is only a small portionof the route). I then perform an attribute join of the geospatial data with the counts.
In [13]:
bunching = travels_waits[travels_waits["wait|15417"] < 2].groupby('origin').count().tatripid.to_frame().reset_index().rename({'tatripid': 'counts'}, axis='columns')
gdf.stpid = gdf.stpid.astype(int)
bunching_gdf = gdf.merge(bunching, left_on="stpid", right_on="origin")
bunching_gdf.head()
Out[13]:
In [14]:
bunching_gdf.plot(column="counts", legend=True, cmap='YlGnBu', linewidth=5).set_axis_off()
The number of bunching incidents increases as the buses travel further east. This result is intuitive: there are more opportunities for the eastbound buses to get off schedule the further they travel along the route. Once a group of buses become bunched, it may be difficult for them to become unbunched, unless one of the drivers take deliberate action to do so, e.g. stopping and waiting for several minutes.
The heatmap gives a general idea of where bunching tends to occur, but it would be helpful if it had some more context. Luckily, the contextily package makes it easy to add background maps to plots created with geopandas. Read a short tutorial here.
In [15]:
import contextily as ctx
def add_basemap(ax, zoom, url='http://tile.stamen.com/toner/tileZ/tileX/tileY.png'):
xmin, xmax, ymin, ymax = ax.axis()
basemap, extent = ctx.bounds2img(xmin, ymin, xmax, ymax, zoom=zoom, url=url)
ax.imshow(basemap, extent=extent, interpolation='bilinear')
ax.axis((xmin, xmax, ymin, ymax))
return basemap
f, ax = plt.subplots(1, figsize=(15, 8))
ax.axis([-9770000, -9750000, 5145000, 5155000])
bunching_gdf.crs = {'init': 'epsg:4326'}
bunching_gdf = bunching_gdf.to_crs(epsg=3857)
bunching_gdf.plot(column="counts", ax=ax, legend=True, cmap='Reds', linewidth=5)
ax.set_axis_off()
add_basemap(ax, zoom=12)
plt.show()
To see how bunching varies throughout the day, I count the number of bunching incidents over different time periods and create a map for each. The CTA defines four weekday service intervals that I use:
In [16]:
cta_time_periods = [6, 9, 15, 19, 22]
cta_time_period_labels = ["AM Peak", "Midday", "PM Peak", "Evening"]
travels_waits["bin"] = pd.cut(travels_waits.decimal_time, cta_time_periods, labels=cta_time_period_labels, right=False)
binned_bunching = travels_waits[travels_waits["wait|15417"] < 2].groupby(['origin', 'bin']).count().tatripid.to_frame().reset_index().rename({'tatripid': 'counts'}, axis='columns')
binned_bunching.counts = binned_bunching.counts.fillna(0)
binned_gdf = gdf.merge(binned_bunching, left_on="stpid", right_on="origin")
binned_gdf.crs = {'init': 'epsg:4326'}
binned_gdf = binned_gdf.to_crs(epsg=3857)
In [17]:
fig, ax = plt.subplots(2, 2, figsize=(16, 8))
fig.suptitle("Eastbound Armitage Bus Bunching")
bin_names = [["AM Peak", "Midday"], ["PM Peak", "Evening"]]
vmin = 0
vmax = binned_gdf.counts.max()
for i, r in enumerate(bin_names):
for j, c in enumerate(r):
_ax = ax[i][j]
binned_gdf[binned_gdf.bin == c].plot(column="counts", ax=_ax, cmap='Reds', linewidth=5, vmin=vmin, vmax=vmax)
_ax.axis([-9770000, -9750000, 5145000, 5155000])
_ax.title.set_text(c)
_ax.set_axis_off()
add_basemap(_ax, zoom=12)
cax = fig.add_axes([0.9, 0.1, 0.03, 0.8])
sm = plt.cm.ScalarMappable(cmap='Reds', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm._A = []
fig.colorbar(sm, cax=cax)
plt.show()
Most bunching incidents for eastbound trips were observed during morning rush hour. Again, this result is intuitive: during morning rush hour, there is more traffic congestion in eastbound lanes as commuter travel downtown, so it is more likely for the buses to get off schedule.
These observations are for the entire month of March 2019. Notice that during this period around 30 bunching incidents occured at bus stops on the eastern edge of the route during morning rush hour. In other words, an average of one busing incident was observed each day at some stops.