In [1]:
from IPython.display import HTML
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

import geopandas as gpd
from shapely.geometry import Point

import sys
sys.path.insert(0, '../src/processing/')
import tools
import os.path
import definitions

Introduction

With the large volume of CTA bus location data I have collected so far in 2019, I want to develop a process for analyzing the data on a neighborhood and city level. My aim is to measure and compare the quality of bus service within each Chicago community area. In this notebook, I begin to develop a process for neighborhood-level analysis by working with data for the bus routes that service Logan Square, specifically restricting my analysis to the geopositional data and bus stops within the boundaries of the neighborhood. To start, I estimate the average wait times for each bus route servicing Logan Square, as well as the proportion of the trips on each route that experience bus bunching in the neighborhood.

Logan Square is served by the following bus routes:

  • 49 Western
  • X49 Western Express
  • 52 Kedzie/California
  • 53 Pulaski
  • 56 Milwaukee
  • 73 Armitage
  • 74 Fullerton
  • 76 Diversey
  • 82 Kimball-Homan

I use data from 73 Armitage to develop the initial data processing workflow and then combine the code into reusable functions.

Load geospatial data for Chicago's community area boundaries into a GeoDataFrame

The geospatial boundaries of Chicago's 77 community areas can be found at the Chicago Data Portal.


In [2]:
commareas = gpd.read_file("../data/raw/geofences/Boundaries - Community Areas (current).geojson")
commareas.plot()
commareas.head()


Out[2]:
   community        area  shape_area     perimeter  area_num_1  area_numbe  comarea_id  comarea  shape_len      geometry
0  DOUGLAS          0     46004621.1581  0          35          35          0           0        31027.0545098  (POLYGON ((-87.60914087617894 41.8446925026539...
1  OAKLAND          0     16913961.0408  0          36          36          0           0        19565.5061533  (POLYGON ((-87.59215283879394 41.8169293462668...
2  FULLER PARK      0     19916704.8692  0          37          37          0           0        25339.0897503  (POLYGON ((-87.62879823733725 41.8018930336891...
3  GRAND BOULEVARD  0     48492503.1554  0          38          38          0           0        28196.8371573  (POLYGON ((-87.6067081256125 41.81681377057218...
4  KENWOOD          0     29071741.9283  0          39          39          0           0        23325.1679062  (POLYGON ((-87.59215283879394 41.8169293462668...

For the rest of the notebook, I only need the boundaries of Logan Square.


In [3]:
losq = commareas[commareas.community == "LOGAN SQUARE"]
losq.plot()



Load the bus patterns into a GeoDataFrame

I load pattern data for Route 73 Armitage, which gives the location of each of its bus stops. The plan is to identify only those bus stops that are in Logan Square by performing a spatial join between this data set and losq. Since the raw pattern data is stored as regular JSON files rather than in a geospatial file format, I cannot load it directly into a GeoDataFrame. Instead, I first load it into a regular DataFrame, then create Point geometries from the latitude and longitude columns, and finally combine the DataFrame and the geometries into a GeoDataFrame.


In [4]:
patterns = tools.load_patterns(73, False)
patterns.head()


Out[4]:
         lat         lon   pdist  seq  stpid                        stpnm  typ       ln   pid      rtdir
0  41.913547  -87.633030   287.0    4   1451              Clark & Lasalle    S  36853.0  2169  Westbound
1  41.915565  -87.634193  1024.0    6   1905              Clark & Lincoln    S  36853.0  2169  Westbound
2  41.916687  -87.634905  1470.0    8   1906            Clark & Wisconsin    S  36853.0  2169  Westbound
3  41.918220  -87.636368  2359.0   14  13180             Armitage & Clark    S  36853.0  2169  Westbound
4  41.918212  -87.639118  3097.0   17   4084  Armitage & Lincoln/Sedgwick    S  36853.0  2169  Westbound

In [5]:
geometry = [Point(xy) for xy in zip(patterns.lon, patterns.lat)]
patterns_gdf = gpd.GeoDataFrame(patterns, geometry=geometry, crs={'init': 'epsg:4326'})
patterns_gdf.plot()
patterns_gdf.head()


Out[5]:
         lat         lon   pdist  seq  stpid                        stpnm  typ       ln   pid      rtdir  geometry
0  41.913547  -87.633030   287.0    4   1451              Clark & Lasalle    S  36853.0  2169  Westbound  POINT (-87.633029999999 41.913547)
1  41.915565  -87.634193  1024.0    6   1905              Clark & Lincoln    S  36853.0  2169  Westbound  POINT (-87.634193000002 41.915564999999)
2  41.916687  -87.634905  1470.0    8   1906            Clark & Wisconsin    S  36853.0  2169  Westbound  POINT (-87.634904999998 41.916687)
3  41.918220  -87.636368  2359.0   14  13180             Armitage & Clark    S  36853.0  2169  Westbound  POINT (-87.63636800000199 41.918219999999)
4  41.918212  -87.639118  3097.0   17   4084  Armitage & Lincoln/Sedgwick    S  36853.0  2169  Westbound  POINT (-87.63911799999801 41.918212000001)

Perform a spatial join on the two geometry objects

Intersect the bus stop geometry (Points) with the neighborhood geometry (Polygon) to select only those bus stops in Logan Square.
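For a point and a polygon, the `intersects` predicate that `gpd.sjoin` evaluates (through shapely) amounts to asking whether the stop lies inside or on the boundary of the neighborhood. As an illustration only, here is a standard ray-casting point-in-polygon test in plain Python. It is not what geopandas runs internally, it does not treat points exactly on an edge specially, and the rectangle and stop IDs are hypothetical stand-ins, not Logan Square's real boundary or CTA stop data. Note the (longitude, latitude) ordering, which matches how the notebook builds `Point(lon, lat)`.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: return True if point (x, y) lies inside the polygon ring.

    `ring` is a list of (x, y) vertices in order. Points lying exactly on an
    edge are not handled specially, unlike shapely's `intersects`.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings of a ray cast from the point toward +x.
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangle in (lon, lat) order; NOT the real Logan Square boundary.
box = [(-87.72, 41.91), (-87.68, 41.91), (-87.68, 41.94), (-87.72, 41.94)]

# Hypothetical stops as (stop_id, lon, lat), mirroring Point(lon, lat) in the notebook.
stops = [("A", -87.70, 41.92), ("B", -87.63, 41.92)]

inside_ids = [sid for sid, lon, lat in stops if point_in_polygon(lon, lat, box)]
```

An odd number of crossings means the point is inside; real community-area boundaries have many vertices (and may be multipolygons), which is why the notebook leaves this to geopandas.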


In [6]:
losq_stops = gpd.sjoin(patterns_gdf, losq, how="inner", op="intersects")
losq_stops.plot()



Load the travel and wait times into a DataFrame

I load the travel and wait times data for Route 73 Armitage (and arbitrarily choose to work only with the data for westbound trips in January for now).


In [7]:
travels_waits = tools.load_travels_waits(73, "Westbound", "201901")
travels_waits.head()


Out[7]:
1451 1905 1906 13180 4084 4085 4086 4087 4088 4089 ... 4137 4138 3883 14179 start_date pid tatripid decimal_time wait|14179 origin
0 NaN 0.52 0.85 1.60 2.53 2.85 3.30 3.77 4.28 4.87 ... 33.48 34.22 34.98 35.15 2019-01-01 2169 2 7.05 49.65 1451
1 NaN 0.42 0.77 1.32 1.80 2.33 3.02 3.43 3.90 4.53 ... 27.63 28.23 NaN NaN 2019-01-01 2169 8 7.87 14.37 1451
2 NaN 0.72 1.00 2.17 3.57 3.97 4.52 5.05 6.32 7.92 ... 33.40 34.17 34.92 35.07 2019-01-01 2169 9 8.12 20.97 1451
3 NaN 0.63 1.02 1.75 2.27 2.70 3.30 3.83 4.40 5.10 ... 32.50 32.90 33.47 33.75 2019-01-01 2169 10 8.47 23.68 1451
4 NaN 0.20 0.38 0.82 1.37 1.73 2.17 2.55 3.10 4.12 ... 28.75 29.62 37.27 37.45 2019-01-01 2169 14 8.85 25.22 1451

5 rows × 65 columns

Because bus service varies throughout the day, it makes sense to analyze bus service during different time intervals. The CTA defines four weekday service intervals in its Service Standards and Policies document:

  • AM Peak: 6AM-9AM
  • Midday: 9AM-3PM
  • PM Peak: 3PM-6PM
  • Evening: 7PM-10PM
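A quick sketch of how `pd.cut` with `right=False` turns decimal times into these labels, using the same edges and labels as the next cell (the departure times below are made up). Each interval is half-open, `[left, right)`, so with edges `[6, 9, 15, 19, 22]` a 9.0 departure counts as Midday, PM Peak runs from 15 up to (but not including) 19, and times before 6 or at/after 22 get NaN and are left out of the later groupby aggregations.

```python
import pandas as pd

# Bin edges and labels as used in the notebook (hours as decimal times).
edges = [6, 9, 15, 19, 22]
labels = ["AM Peak", "Midday", "PM Peak", "Evening"]

# A few made-up departure times, chosen to sit on and around the bin edges.
times = pd.Series([5.5, 6.0, 8.99, 9.0, 15.0, 18.5, 19.0, 21.99, 22.0])

# right=False makes each interval [left, right): 9.0 is Midday, not AM Peak,
# and anything before 6.0 or at/after 22.0 falls outside every bin (NaN).
binned = pd.cut(times, edges, labels=labels, right=False)
as_labels = [None if pd.isna(v) else v for v in binned]
```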

The bin edges in the next cell are contiguous, so they assign 6PM-7PM to PM Peak (effectively 3PM-7PM). I also drop every row whose origin bus stop lies outside Logan Square.


In [8]:
cta_time_periods = [6, 9, 15, 19, 22]
cta_time_period_labels = ["AM Peak", "Midday", "PM Peak", "Evening"]

travels_waits["bins"] = pd.cut(travels_waits.decimal_time, cta_time_periods, labels=cta_time_period_labels, right=False)
travels_waits.drop(travels_waits[~travels_waits.origin.isin(losq_stops.stpid)].index, inplace=True)
travels_waits.head()


Out[8]:
1451 1905 1906 13180 4084 4085 4086 4087 4088 4089 ... 4138 3883 14179 start_date pid tatripid decimal_time wait|14179 origin bins
27526 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 24.07 24.83 25.00 2019-01-01 2169 2 7.22 19.72 4095 AM Peak
27527 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 19.62 20.47 20.72 2019-01-01 2169 7 7.53 28.32 4095 AM Peak
27528 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 19.70 NaN NaN 2019-01-01 2169 8 8.02 18.57 4095 AM Peak
27529 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 21.43 22.18 22.33 2019-01-01 2169 9 8.32 17.25 4095 AM Peak
27530 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... 23.88 24.45 24.73 2019-01-01 2169 10 8.62 22.60 4095 AM Peak

5 rows × 66 columns

Calculate the average wait time for an Armitage bus in Logan Square (by direction of travel and time of day)

As a bus completes its route, the wait time between it and the next bus varies from stop to stop. First, I find the average wait time at the Logan Square bus stops for each Armitage bus trip. Then I average those per-trip averages to find the average wait time for the Armitage bus in Logan Square. As explained in the documentation, each trip is uniquely identified by its "start_date", "pid", and "tatripid". I also group the data by time of day.

Note: For each departure from an origin stop, the travels_waits DataFrame gives the wait time (in minutes) for the next bus arrival heading to a particular terminal stop. Since some routes have more than one terminus, there may be more than one calculated wait time: each column "wait|STPID_M" (for M = 1, ..., N) gives the wait time for the next bus heading to the terminal whose stpid is STPID_M.


In [9]:
terminals = travels_waits.columns[travels_waits.columns.str.contains(r"wait\|")]
trip_means = travels_waits.groupby(["start_date", "pid", "tatripid", "bins"])[terminals].mean().reset_index()
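The cell above averages twice: first over each trip's Logan Square stops, then over trips. A toy example (made-up numbers, column names mirroring the notebook) shows why the order matters: a trip that serves several neighborhood stops contributes one value rather than one per stop, so the result differs from pooling every stop-level wait.

```python
import pandas as pd

# Toy stand-in for travels_waits: one row per (trip, origin stop) departure.
# Column names mirror the notebook; the values are made up for illustration.
tw = pd.DataFrame({
    "start_date": ["2019-01-07"] * 4,
    "pid": [2169] * 4,
    "tatripid": [1, 1, 1, 2],
    "bins": ["AM Peak"] * 4,
    "wait|14179": [10.0, 12.0, 14.0, 20.0],
})
# Wait columns are the ones whose names start with "wait|".
terminals = list(tw.columns[tw.columns.str.startswith("wait|")])
keys = ["start_date", "pid", "tatripid", "bins"]

# Step 1: average each trip's waits over the neighborhood stops it served.
trip_means = tw.groupby(keys)[terminals].mean().reset_index()

# Step 2: average those per-trip averages within each time-of-day bin.
mean_of_means = trip_means.groupby("bins")[terminals].mean()

# For comparison: pooling every stop-level wait weights trip 1 three times.
pooled = tw.groupby("bins")[terminals].mean()
```

Here trip 1 averages 12 minutes and trip 2 averages 20, so the mean of means is 16 minutes, while the pooled stop-level mean is 14.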

The travels_waits data files are already separated by direction of travel. Explicitly naming and grouping by the direction of travel is just for convenience when reading the printed output.


In [10]:
trip_means["rtdir"] = "Westbound"
print(trip_means.groupby(["rtdir", "bins"])[terminals].mean())


                   wait|14179
rtdir     bins               
Westbound AM Peak   13.886208
          Midday    16.265976
          PM Peak   14.409311
          Evening   21.294015

As expected, service during the morning and evening rush periods has the shortest wait times, while late-evening service has the longest. The Armitage bus is classified as a "Support Route" by the CTA, so it has longer wait times and a narrower window of service than "Key Routes" like Route 74 Fullerton.

Calculate the proportion of bunched trips

To find the proportion of trips that were "bunched", I first count the bunching incidents in each time interval, then count the total number of trips in each time interval, and divide the first count by the second.

I arbitrarily classify a trip as bunched if its average wait time to the next bus (heading toward or through the same terminal stop) across the Logan Square stops is less than 2 minutes. In the future, I could define bus bunching in a more nuanced way, such as setting the bunching threshold relative to the route's scheduled wait times, but this definition works for a first-round exploration and analysis.


In [11]:
# bunching incidents
masked = trip_means.copy()
masked[terminals] = (masked[terminals] < 2)
masked.groupby(["rtdir", "bins"])[terminals].sum()


Out[11]:
                   wait|14179
rtdir     bins
Westbound AM Peak         4.0
          Midday          3.0
          PM Peak         6.0
          Evening         0.0

In [12]:
# total number of trips
trip_means.groupby(["rtdir", "bins"])[terminals].count()


Out[12]:
                   wait|14179
rtdir     bins
Westbound AM Peak         402
          Midday          738
          PM Peak         568
          Evening         223

In [13]:
# proportion of trips that were bunched
masked.groupby(["rtdir", "bins"])[terminals].sum() / trip_means.groupby(["rtdir", "bins"])[terminals].count()


Out[13]:
                   wait|14179
rtdir     bins
Westbound AM Peak    0.009950
          Midday     0.004065
          PM Peak    0.010563
          Evening    0.000000

Route 73 experiences very little bus bunching under my definition. This shouldn't be surprising: the headways throughout the day on Route 73 are all over 13 minutes on average.
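The ratio above relies on two pandas behaviors when a trip has no wait value (NaN) for a terminal: `NaN < 2` evaluates to False, so such trips never count as bunched, and `count()` skips NaN, so they also drop out of the denominator. A small sketch with made-up per-trip means; simply averaging the boolean mask would instead keep the NaN trips in the denominator and understate the ratio. The `BUNCH_THRESHOLD` name is introduced here for illustration.

```python
import numpy as np
import pandas as pd

BUNCH_THRESHOLD = 2.0  # minutes; the notebook's first-round cutoff

# Toy per-trip mean waits (made-up numbers); NaN marks a trip with no wait value.
trip_means = pd.DataFrame({
    "bins": ["AM Peak"] * 5 + ["Midday"] * 3,
    "wait|14179": [1.5, 13.0, 1.9, 20.0, np.nan, 15.0, 1.0, 9.0],
})
terminals = ["wait|14179"]
groups = trip_means["bins"]

# NaN < 2 evaluates to False, so trips without a wait never count as bunched...
bunched = trip_means[terminals].lt(BUNCH_THRESHOLD)
incidents = bunched.groupby(groups).sum()

# ...and count() skips NaN, so those trips also drop out of the denominator.
totals = trip_means[terminals].groupby(groups).count()
ratio = incidents / totals

# Averaging the boolean mask directly would keep NaN trips in the denominator.
naive = bunched.astype(float).groupby(groups).mean()
```

In the AM Peak bin, two of the four trips with a wait value are bunched (ratio 0.5), whereas the naive mean divides by all five rows and reports 0.4.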

Combine the above code to process all Logan Square bus routes


In [14]:
# Travels/waits data files are separated by route and month
dates = [
    "201901",
    "201902",
    "201903",
    "201904",
]

# Bus routes going through Logan Square
rts = ["49","X49","52","53","56","73","74","76","82"]

cta_holidays = pd.DatetimeIndex(definitions.HOLIDAYS)
cta_time_periods = [6,9,15,19,22]
cta_time_period_labels = ["AM Peak", "Midday", "PM Peak", "Evening"]

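def load_community_area(path, ca):
    """Return the row(s) of the community-area file whose name matches `ca`.

    `ca` is upper-cased to match the capitalized names in the file.
    """
    commareas = gpd.read_file(path)
    return commareas[commareas.community == ca.upper()]

def patterns_to_gdf(rt, waypoints):
    """Load the stop patterns for route `rt` and attach a Point geometry to each row."""
    patterns = tools.load_patterns(rt, waypoints)
    # shapely Points take (x, y), i.e. (longitude, latitude)
    geometry = [Point(xy) for xy in zip(patterns.lon, patterns.lat)]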
    return gpd.GeoDataFrame(patterns, geometry=geometry, crs={'init': 'epsg:4326'})
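The loop in the next cell keeps only non-holiday weekdays before averaging. A minimal sketch of that filter on made-up dates; the one-entry holiday list is a hypothetical stand-in for `definitions.HOLIDAYS`, whose contents aren't shown here. January 1, 2019 fell on a Tuesday, so it is removed only because it is a holiday, while January 5 and 6 are removed as the weekend.

```python
import pandas as pd

# Hypothetical holiday list; the notebook reads its own from definitions.HOLIDAYS.
holidays = pd.DatetimeIndex(["2019-01-01"])

# Made-up departures on a holiday, a Saturday, a Sunday, and a Monday.
tws = pd.DataFrame({
    "start_date": pd.to_datetime(["2019-01-01", "2019-01-05", "2019-01-06", "2019-01-07"]),
    "wait": [12.0, 20.0, 25.0, 11.0],
})

# dayofweek: Monday=0 ... Sunday=6, so >= 5 selects Saturdays and Sundays.
is_weekend = tws.start_date.dt.dayofweek >= 5
is_holiday = tws.start_date.isin(holidays)

# Boolean-mask selection gives the same rows as drop(...index) on a unique index.
weekdays = tws[~(is_weekend | is_holiday)]
```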

In [15]:
losq_mean_waits = []
losq_bunching = []

losq = load_community_area("../data/raw/geofences/Boundaries - Community Areas (current).geojson", "LOGAN SQUARE")
for rt in rts:
    patterns_gdf = patterns_to_gdf(rt, False)
    losq_stops = gpd.sjoin(patterns_gdf, losq, how="inner", op="intersects")

    colname_map = {"wait|{}".format(stpid): stpnm for stpid, stpnm in zip(patterns_gdf.stpid, patterns_gdf.stpnm)}
    for rtdir in losq_stops.rtdir.unique():      
        tws = pd.concat((tools.load_travels_waits(rt, rtdir.lower(), d) for d in dates), ignore_index=True)

        # I further restrict the analysis to non-holiday weekdays,
        # since frequency of service is different on holidays and weekends
        tws.drop(tws[(tws.start_date.dt.dayofweek >= 5) | tws.start_date.isin(cta_holidays)].index, inplace=True)
        tws.drop(tws[~tws.origin.isin(losq_stops.stpid)].index, inplace=True)
        tws.dropna(axis='columns', how='all', inplace=True)

        terminals = [colname_map[col] for col in tws.columns[tws.columns.str.contains(r"wait\|")]]
        tws.rename(columns=colname_map, inplace=True)

        tws["bins"] = pd.cut(tws.decimal_time, cta_time_periods, labels=cta_time_period_labels, right=False)

        rt_waits = tws.groupby(["start_date", "pid", "tatripid", "bins"])[terminals].mean().reset_index()
        rt_waits["rtdir"] = rtdir

        bunched = rt_waits.copy()
        bunched[terminals] = (bunched[terminals] < 2)
        bunching_incidents = bunched.groupby(["rtdir", "bins"])[terminals].sum()
        tot_trips = rt_waits.groupby(["rtdir", "bins"])[terminals].count()

        rt_mean_waits = rt_waits.groupby(["rtdir", "bins"])[terminals].mean()
        rt_bunching = bunching_incidents / tot_trips
        losq_mean_waits.append((rt, rtdir, rt_mean_waits))
        losq_bunching.append((rt, rtdir, rt_bunching))
        
        print("{} {}".format(rt, rtdir))
        print(rt_mean_waits)
        print("bus bunching statistics")
        print(rt_bunching)


49 Northbound
                    Western & Berwyn Terminal
rtdir      bins                              
Northbound AM Peak                  12.695774
           Midday                    8.930839
           PM Peak                  10.082905
           Evening                  11.040389
bus bunching statistics
                    Western & Berwyn Terminal
rtdir      bins                              
Northbound AM Peak                   0.030276
           Midday                    0.097345
           PM Peak                   0.116848
           Evening                   0.133238
49 Southbound
                    79th & Western Terminal
rtdir      bins                            
Southbound AM Peak                13.490671
           Midday                  8.496942
           PM Peak                12.678930
           Evening                11.727100
bus bunching statistics
                    79th & Western Terminal
rtdir      bins                            
Southbound AM Peak                 0.003854
           Midday                  0.047357
           PM Peak                 0.018231
           Evening                 0.020101
X49 Southbound
                    79th & Western Terminal
rtdir      bins                            
Southbound AM Peak                 7.507194
           Midday                 67.735297
           PM Peak                 8.166527
           Evening                      NaN
bus bunching statistics
                    79th & Western Terminal
rtdir      bins                            
Southbound AM Peak                 0.088563
           Midday                  0.046341
           PM Peak                 0.061996
           Evening                      NaN
X49 Northbound
                    Western & Berwyn Terminal
rtdir      bins                              
Northbound AM Peak                   7.440262
           Midday                   69.266782
           PM Peak                   9.612303
           Evening                        NaN
bus bunching statistics
                    Western & Berwyn Terminal
rtdir      bins                              
Northbound AM Peak                   0.148706
           Midday                    0.030952
           PM Peak                   0.101132
           Evening                        NaN
52 Northbound
                    Rockwell & Addison
rtdir      bins                       
Northbound AM Peak           10.552524
           Midday            13.119601
           PM Peak           12.216943
           Evening           17.606454
bus bunching statistics
                    Rockwell & Addison
rtdir      bins                       
Northbound AM Peak            0.056138
           Midday             0.036562
           PM Peak            0.062542
           Evening            0.010063
52 Southbound
                    Kedzie & Van Buren   Kedzie + 63rd Place Terminal
rtdir      bins                                                      
Southbound AM Peak            10.655686                     12.767064
           Midday             13.213829                     13.736956
           PM Peak            11.939582                     11.939582
           Evening            15.694450                     18.756760
bus bunching statistics
                    Kedzie & Van Buren   Kedzie + 63rd Place Terminal
rtdir      bins                                                      
Southbound AM Peak             0.018433                      0.003687
           Midday              0.003779                      0.001967
           PM Peak             0.011407                      0.011407
           Evening             0.000000                      0.000000
53 Southbound
                    Pulaski + Erie-C.T.A Driveway  31st Street + Komensky  \
rtdir      bins                                                             
Southbound AM Peak                      10.575737               11.875798   
           Midday                       10.859429               11.591608   
           PM Peak                       8.401866                9.988490   
           Evening                      13.343510               16.699532   

                    Pulaski & Harrison (Blue Line)  
rtdir      bins                                     
Southbound AM Peak                        1052.400  
           Midday                          724.250  
           PM Peak                         443.110  
           Evening                         328.805  
bus bunching statistics
                    Pulaski + Erie-C.T.A Driveway  31st Street + Komensky  \
rtdir      bins                                                             
Southbound AM Peak                       0.038951                0.032086   
           Midday                        0.036101                0.025740   
           PM Peak                       0.087056                0.060606   
           Evening                       0.029438                0.014574   

                    Pulaski & Harrison (Blue Line)  
rtdir      bins                                     
Southbound AM Peak                             0.0  
           Midday                              0.0  
           PM Peak                             0.0  
           Evening                             0.0  
53 Northbound
                    Irving Park + Keystone  Pulaski & Peterson Terminal
rtdir      bins                                                        
Northbound AM Peak             1027.750000                    13.860107
           Midday               776.987222                    10.679169
           PM Peak              479.183333                     9.693377
           Evening              308.427500                    16.882923
bus bunching statistics
                    Irving Park + Keystone  Pulaski & Peterson Terminal
rtdir      bins                                                        
Northbound AM Peak                     0.0                     0.008214
           Midday                      0.0                     0.068218
           PM Peak                     0.0                     0.076356
           Evening                     0.0                     0.027086
56 Northbound
                    Jefferson Park Transit Center
rtdir      bins                                  
Northbound AM Peak                      14.442170
           Midday                       16.035613
           PM Peak                      11.465768
           Evening                      16.510917
bus bunching statistics
                    Jefferson Park Transit Center
rtdir      bins                                  
Northbound AM Peak                       0.026263
           Midday                        0.009075
           PM Peak                       0.040024
           Evening                       0.015945
56 Southbound
                    Washington & Canal  \
rtdir      bins                          
Southbound AM Peak            7.742533   
           Midday            15.724159   
           PM Peak           12.509747   
           Evening           21.465109   

                    Washington + Wabash (Nearside Garland)  Milwaukee & Kedzie  
rtdir      bins                                                                 
Southbound AM Peak                                7.742533            7.380870  
           Midday                                15.745689           15.910790  
           PM Peak                               13.568509           12.525521  
           Evening                               21.465109           21.801350  
bus bunching statistics
                    Washington & Canal  \
rtdir      bins                          
Southbound AM Peak            0.083519   
           Midday             0.002226   
           PM Peak            0.034593   
           Evening            0.000000   

                    Washington + Wabash (Nearside Garland)  Milwaukee & Kedzie  
rtdir      bins                                                                 
Southbound AM Peak                                0.083519            0.130341  
           Midday                                 0.002228            0.002924  
           PM Peak                                0.013908            0.048070  
           Evening                                0.000000            0.001727  
73 Westbound
                   Grand & Latrobe Terminal
rtdir     bins                             
Westbound AM Peak                 12.020787
          Midday                  14.697785
          PM Peak                 12.542995
          Evening                 20.430271
bus bunching statistics
                   Grand & Latrobe Terminal
rtdir     bins                             
Westbound AM Peak                  0.016681
          Midday                   0.007568
          PM Peak                  0.010152
          Evening                  0.000000
73 Eastbound
                   Clark & North
rtdir     bins                  
Eastbound AM Peak      10.138362
          Midday       15.785476
          PM Peak      13.928326
          Evening      23.377171
bus bunching statistics
                   Clark & North
rtdir     bins                  
Eastbound AM Peak       0.014936
          Midday        0.001088
          PM Peak       0.003472
          Evening       0.000000
74 Eastbound
                   Fullerton & Halsted
rtdir     bins                        
Eastbound AM Peak            14.221409
          Midday             10.685474
          PM Peak             9.525862
          Evening            16.833366
bus bunching statistics
                   Fullerton & Halsted
rtdir     bins                        
Eastbound AM Peak             0.049333
          Midday              0.021542
          PM Peak             0.044010
          Evening             0.003352
c:\python27\lib\site-packages\pandas\core\reshape\concat.py:228: DtypeWarning: Columns (64) have mixed types. Specify dtype option on import or set low_memory=False.
  copy=copy, sort=sort)
74 Westbound
                   Fullerton & Narragansett  Grand & Nordica Terminal
rtdir     bins                                                       
Westbound AM Peak                  7.369679                 11.566110
          Midday                  10.795643                 11.862409
          PM Peak                  8.400303                 11.987659
          Evening                 14.749329                 14.784249
bus bunching statistics
                   Fullerton & Narragansett  Grand & Nordica Terminal
rtdir     bins                                                       
Westbound AM Peak                  0.084203                  0.010477
          Midday                   0.021333                  0.007554
          PM Peak                  0.036849                  0.010896
          Evening                  0.008729                  0.007782
76 Westbound
                   Diversey & Neva Terminal  Logan Square Blue Line Station
rtdir     bins                                                             
Westbound AM Peak                 14.960640                        9.394377
          Midday                  14.690593                       12.909493
          PM Peak                  8.879924                        8.830972
          Evening                 15.685628                       16.139698
bus bunching statistics
                   Diversey & Neva Terminal  Logan Square Blue Line Station
rtdir     bins                                                             
Westbound AM Peak                  0.017578                        0.102493
          Midday                   0.011874                        0.033969
          PM Peak                  0.061628                        0.080837
          Evening                  0.016649                        0.019340
76 Eastbound
                   Logan Square Blue Line Station  \
rtdir     bins                                      
Eastbound AM Peak                        8.238937   
          Midday                        13.955731   
          PM Peak                       11.815253   
          Evening                       20.509034   

                   Cannon & Nature Museum/Fullerton  
rtdir     bins                                       
Eastbound AM Peak                          8.345199  
          Midday                          13.223468  
          PM Peak                         11.049056  
          Evening                         20.336149  
bus bunching statistics
                   Logan Square Blue Line Station  \
rtdir     bins                                      
Eastbound AM Peak                        0.058517   
          Midday                         0.018953   
          PM Peak                        0.074303   
          Evening                        0.005533   

                   Cannon & Nature Museum/Fullerton  
rtdir     bins                                       
Eastbound AM Peak                          0.051843  
          Midday                           0.014834  
          PM Peak                          0.059207  
          Evening                          0.001342  
82 Northbound
                    Kimball + Belmont  Kimball + Peterson  \
rtdir      bins                                             
Northbound AM Peak           6.838023            6.838023   
           Midday            9.197232            9.197232   
           PM Peak           9.108279            9.108279   
           Evening          19.551838           19.551838   

                    Devon + Kedzie Terminal  Lincolnwood Town Center  
rtdir      bins                                                       
Northbound AM Peak                23.156613                 9.095463  
           Midday                162.205964                 9.916637  
           PM Peak                      NaN                 9.108279  
           Evening                22.151088                18.443176  
bus bunching statistics
                    Kimball + Belmont  Kimball + Peterson  \
rtdir      bins                                             
Northbound AM Peak           0.117732            0.117732   
           Midday            0.049884            0.049884   
           PM Peak           0.070697            0.070697   
           Evening           0.006812            0.006812   

                    Devon + Kedzie Terminal  Lincolnwood Town Center  
rtdir      bins                                                       
Northbound AM Peak                 0.058550                 0.072202  
           Midday                  0.004525                 0.031228  
           PM Peak                      NaN                 0.070697  
           Evening                 0.000000                 0.009747  
82 Southbound
                    Homan & Congress  Central Park & 31st Street  \
rtdir      bins                                                    
Southbound AM Peak          8.947503                   12.133837   
           Midday          10.281963                   11.586422   
           PM Peak          6.203770                   12.285979   
           Evening         11.982293                   15.972443   

                    Komensky & 31st Street  Central Park & Cermak  
rtdir      bins                                                    
Southbound AM Peak                     NaN               8.947503  
           Midday                      NaN              10.371914  
           PM Peak                     NaN               7.588948  
           Evening               15.914119              12.591775  
bus bunching statistics
                    Homan & Congress  Central Park & 31st Street  \
rtdir      bins                                                    
Southbound AM Peak          0.092975                    0.024096   
           Midday           0.073906                    0.045084   
           PM Peak          0.214525                    0.050393   
           Evening          0.079119                    0.013761   

                    Komensky & 31st Street  Central Park & Cermak  
rtdir      bins                                                    
Southbound AM Peak                     NaN               0.092975  
           Midday                      NaN               0.066565  
           PM Peak                     NaN               0.144544  
           Evening                0.026984               0.057292  

Scanning the output, I see a few terminals with very high wait times. Route 53 Pulaski buses heading toward Pulaski & Harrison (Blue Line) or Irving Park & Keystone only make trips at night, hence the long wait times for those buses during the day. Similarly, Route 82 Kimball-Homan buses only make trips to Devon & Kedzie at the start and end of their service day. I treat these three terminals as outliers and exclude them from the comparisons below.


In [16]:
outliers = ["Pulaski & Harrison (Blue Line)", "Irving Park + Keystone", "Devon + Kedzie Terminal"]

all_waits = pd.concat(
    (pd.melt(waits.reset_index(), id_vars=["rtdir", "bins"],
             var_name="terminal", value_name="wait_time").assign(rt=rt)
     for rt, rtdir, waits in losq_mean_waits),
    ignore_index=True
)

all_waits = all_waits[~all_waits.terminal.isin(outliers)]
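The reshaping above turns each route's wide table (one column per terminal) into long rows of (rtdir, bins, terminal, wait_time, rt), so all routes can be stacked and ranked together. A toy sketch of the same `pd.melt` call; the terminal names, wait values, and route label "99" are hypothetical.

```python
import pandas as pd

# Toy wide table shaped like one entry of losq_mean_waits: (rtdir, bins) index,
# one column per terminal. Values are made up.
wide = pd.DataFrame(
    {"Terminal A": [10.0, 12.0], "Terminal B": [11.0, 300.0]},
    index=pd.MultiIndex.from_tuples(
        [("Northbound", "AM Peak"), ("Northbound", "Midday")], names=["rtdir", "bins"]
    ),
)

# One row per (rtdir, bins, terminal); the route label is attached as a column.
long = pd.melt(
    wide.reset_index(),
    id_vars=["rtdir", "bins"],
    var_name="terminal",
    value_name="wait_time",
).assign(rt="99")  # hypothetical route label

# Dropping an outlier terminal is then a simple row filter.
filtered = long[~long.terminal.isin(["Terminal B"])]
```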

In [17]:
all_waits.loc[all_waits.wait_time.nlargest(3).index]


Out[17]:
         rtdir     bins                   terminal  wait_time   rt
13  Northbound   Midday  Western & Berwyn Terminal  69.266782  X49
9   Southbound   Midday    79th & Western Terminal  67.735297  X49
71   Eastbound  Evening              Clark & North  23.377171   73

In [18]:
print("Longest wait times:")
print(all_waits.loc[all_waits.wait_time.nlargest(6).index])


Longest wait times:
         rtdir     bins                                terminal  wait_time  \
13  Northbound   Midday               Western & Berwyn Terminal  69.266782   
9   Southbound   Midday                 79th & Western Terminal  67.735297   
71   Eastbound  Evening                           Clark & North  23.377171   
63  Southbound  Evening                      Milwaukee & Kedzie  21.801350   
55  Southbound  Evening                      Washington & Canal  21.465109   
59  Southbound  Evening  Washington + Wabash (Nearside Garland)  21.465109   

     rt  
13  X49  
9   X49  
71   73  
63   56  
55   56  
59   56  

In [19]:
print("Shortest wait times:")
print(all_waits.loc[all_waits.wait_time.nsmallest(3).index])


Shortest wait times:
          rtdir     bins            terminal  wait_time  rt
118  Southbound  PM Peak    Homan & Congress   6.203770  82
100  Northbound  AM Peak   Kimball + Belmont   6.838023  82
104  Northbound  AM Peak  Kimball + Peterson   6.838023  82

In [20]:
all_bunches = pd.concat(
    (pd.melt(ratios.reset_index(), id_vars=["rtdir", "bins"],
             var_name="terminal", value_name="bunch_ratio").assign(rt=rt)
     for rt, rtdir, ratios in losq_bunching),
    ignore_index=True
)

all_bunches = all_bunches[~all_bunches.terminal.isin(outliers)]

print("Overall worst bus bunching:")
print(all_bunches.loc[all_bunches.bunch_ratio.nlargest(3).index])


Overall worst bus bunching:
          rtdir     bins                   terminal  bunch_ratio   rt
118  Southbound  PM Peak           Homan & Congress     0.214525   82
12   Northbound  AM Peak  Western & Berwyn Terminal     0.148706  X49
130  Southbound  PM Peak      Central Park & Cermak     0.144544   82

In [21]:
print("Most bunching during morning rush hour:")
print(all_bunches.loc[all_bunches[all_bunches.bins == "AM Peak"].bunch_ratio.nlargest(3).index])
print("\nLeast bunching during morning rush hour:")
print(all_bunches.loc[all_bunches[all_bunches.bins == "AM Peak"].bunch_ratio.nsmallest(3).index])


Most bunching during morning rush hour:
          rtdir     bins                   terminal  bunch_ratio   rt
12   Northbound  AM Peak  Western & Berwyn Terminal     0.148706  X49
60   Southbound  AM Peak         Milwaukee & Kedzie     0.130341   56
100  Northbound  AM Peak          Kimball + Belmont     0.117732   82

Least bunching during morning rush hour:
         rtdir     bins                      terminal  bunch_ratio  rt
24  Southbound  AM Peak  Kedzie + 63rd Place Terminal     0.003687  52
4   Southbound  AM Peak       79th & Western Terminal     0.003854  49
44  Northbound  AM Peak   Pulaski & Peterson Terminal     0.008214  53

In [22]:
print("Most bunching during evening rush hour:")
print(all_bunches.loc[all_bunches[all_bunches.bins == "PM Peak"].bunch_ratio.nlargest(3).index])
print("\nLeast bunching during evening rush hour:")
print(all_bunches.loc[all_bunches[all_bunches.bins == "PM Peak"].bunch_ratio.nsmallest(3).index])


Most bunching during evening rush hour:
          rtdir     bins                   terminal  bunch_ratio  rt
118  Southbound  PM Peak           Homan & Congress     0.214525  82
130  Southbound  PM Peak      Central Park & Cermak     0.144544  82
2    Northbound  PM Peak  Western & Berwyn Terminal     0.116848  49

Least bunching during evening rush hour:
        rtdir     bins                  terminal  bunch_ratio  rt
70  Eastbound  PM Peak             Clark & North     0.003472  73
66  Westbound  PM Peak  Grand & Latrobe Terminal     0.010152  73
82  Westbound  PM Peak  Grand & Nordica Terminal     0.010896  74