Using Python to Analyze Public Transport Data

by Alex Raichev

My background

  • I like math, algorithms, and Python
  • Work @MRCagney, a transport and planning consultancy
  • Get to use Python there to analyze and visualize civic data

Intro

  • Three main kinds of public transport data:
    • Scheduling: when and where the vehicles should be
    • Realtime: when and where the vehicles actually are
    • Ticketing: who is riding the vehicels
  • Focus here on scheduling data

General Transit Feed Specification (GTFS)

  • Most popular open standard for encoding scheduling data
  • Developed in 2005 by Portland's TriMet transit agency and Google
  • Some uses
    • Routing, ala Google Maps
    • Measuring performance
    • Answering whimsical questions

Example

Challenge

  • Use the Auckland GTFS feed included here (from Nov 2014; has fixed shapes and route IDs) and Python to answer the following questions:
    • What is the shortest/longest distance route in Auckland?
    • What is the slowest/fastest route in Auckland?

Approach

  • Pythonistas, how to handle all those CSVs?
  • csv module?
  • Database?
  • Suggestions?
  • Pandas!

In [15]:
import pandas as pd

# Custom module
import utils

%load_ext autoreload
%autoreload 2

# My version with fixed shapes and route short names
gtfs_dir = '../data/auckland_20141101_gtfs_fixed/'

# List directory
!ls $gtfs_dir


The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload
agency.txt         calendar_dates.txt shapes.txt         stops.txt
calendar.txt       routes.txt         stop_times.txt     trips.txt

In [26]:
# Pandas play...

# Read in trips
trips = pd.read_csv(gtfs_dir + 'trips.txt')
trips

# Just show route, trip, and shape: slice columns
trips[['route_id', 'trip_id', 'shape_id']]

# Find all trips of a specific route: slice rows
trips[trips['route_id'] == 'r000001']

# Group all trips by route: group
g = trips.groupby('route_id')
g.get_group('r000002')

# Compute number of trips per route: split, apply, combine
def my_agg(group):
    d = {}
    d['num_trips'] = group.shape[0]
    return pd.Series(d)

gg = g.apply(my_agg).reset_index().sort('num_trips', ascending=False)

# Get route short name: join
routes = pd.read_csv(gtfs_dir + 'routes.txt')
pd.merge(routes[['route_id', 'route_short_name']], gg).sort('num_trips', 
  ascending=False)

# While we're here, what route has the most trips?


Out[26]:
route_id route_short_name num_trips
276 r000276 OUT 429
274 r000274 NEX 402
256 r000256 CTY 373
261 r000261 INN 346
69 r000069 258 326
253 r000253 AIR 297
71 r000071 267 268
74 r000074 277 250
67 r000067 249 218
101 r000101 380 212
432 r000432 WEST 196
73 r000073 274 191
130 r000130 500 188
6 r000006 020 181
140 r000140 550 181
62 r000062 233 162
83 r000083 312 155
144 r000144 552 152
159 r000159 625 145
187 r000187 813 141
127 r000127 497 139
1 r000001 007 137
125 r000125 487 136
237 r000237 955 132
262 r000262 MNK 132
61 r000061 224 127
3 r000003 009 122
8 r000008 030 120
430 r000430 STH 120
411 r000411 SE 118
... ... ... ...
382 r000382 S267 1
88 r000088 330 1
356 r000356 S079 1
366 r000366 S090 1
353 r000353 S075 1
347 r000347 S069 1
346 r000346 S067 1
370 r000370 S097 1
345 r000345 S066 1
372 r000372 S115 1
373 r000373 S115B 1
376 r000376 S156 1
377 r000377 S156B 1
381 r000381 S248 1
383 r000383 S302 1
401 r000401 S323 1
384 r000384 S303 1
385 r000385 S304 1
386 r000386 S305 1
387 r000387 S309 1
388 r000388 S310 1
389 r000389 S311 1
390 r000390 S312 1
391 r000391 S313 1
394 r000394 S316 1
396 r000396 S318 1
397 r000397 S319 1
399 r000399 S321 1
400 r000400 S322 1
367 r000367 S091 1

434 rows × 3 columns


In [28]:
# Let's make a class to hold all this data.
# See utils.py.
feed = utils.Feed(gtfs_dir)
feed.routes.head()


Out[28]:
route_short_name route_id agency_id route_long_name route_type route_color route_text_color
0 005 r000000 AT Pt Chevalier To Britomart Via Herne Bay And Co... 3 00F0FF 000000
1 007 r000001 AT Pt Chevalier To St Heliers 3 00F0FF 000000
2 008 r000002 AT New Lynn To Otahuhu 3 DCC8FF C80000
3 009 r000003 AT Onehunga To New Lynn 3 DCC8FF C80000
4 010 r000004 AT Onehunga To Wynyard Quarter Via Unitec 3 00F0FF 000000

In [35]:
# Make a Pandas data frame with all the columns of ``self.trips`` plus the columns
#
# - num_stops: number of stops on trip
# - start_time: first departure time of the trip
# - end_time: last departure time of the trip
# - duration: duration of the trip in hours
# - distance: distance of the trip in kilometers
# - speed: speed of the trips in kilometers per hour

f = feed.stop_times.copy().sort(['trip_id', 'stop_sequence'])
f['departure_time'] = f['departure_time'].map(utils.timestr_to_seconds)
f

def my_agg2(group):
    d = {}
    d['num_stops'] = group.shape[0]
    d['start_time'] = group['departure_time'].iat[0]
    d['end_time'] = group['departure_time'].iat[-1]
    d['duration'] = (d['end_time'] - d['start_time'])/3600
    d['distance'] = group['shape_dist_traveled'].iat[-1]
    d['speed'] = d['distance']/d['duration']
    return pd.Series(d)
    
ff = f.groupby('trip_id').apply(my_agg2).reset_index()
trips_stats = pd.merge(ff, feed.trips)
trips_stats


Out[35]:
trip_id distance duration end_time num_stops speed start_time route_id service_id trip_headsign direction_id shape_id
0 0051ML61550600114378 8.72115 0.666667 24000 24 13.081725 21600 r000000 0051ML61550600114378 BRITOMART 0 s000269
1 0051ML61550630123378 8.72115 0.666667 25800 24 13.081725 23400 r000000 0051ML61550630123378 BRITOMART 0 s000269
2 0051ML61550700132378 8.72115 0.666667 27600 24 13.081725 25200 r000000 0051ML61550700132378 BRITOMART 0 s000269
3 0051ML61550715136878 8.72115 0.666667 28500 24 13.081725 26100 r000000 0051ML61550715136878 BRITOMART 0 s000269
4 0051ML61550730141378 8.72115 0.666667 29400 24 13.081725 27000 r000000 0051ML61550730141378 BRITOMART 0 s000269
5 0051ML61550745145878 8.72115 0.666667 30300 24 13.081725 27900 r000000 0051ML61550745145878 BRITOMART 0 s000269
6 0051ML61550800150378 8.72115 0.666667 31200 24 13.081725 28800 r000000 0051ML61550800150378 BRITOMART 0 s000269
7 0051ML61550815154878 8.72115 0.666667 32100 24 13.081725 29700 r000000 0051ML61550815154878 BRITOMART 0 s000269
8 0051ML61550830159378 8.72115 0.666667 33000 24 13.081725 30600 r000000 0051ML61550830159378 BRITOMART 0 s000269
9 0052ML61561605348873 8.57533 0.666667 60300 26 12.862995 57900 r000000 0052ML61561605348873 PT CHEV 1 s000017
10 0052ML61561625355893 8.57533 0.666667 61500 26 12.862995 59100 r000000 0052ML61561625355893 PT CHEV 1 s000017
11 0052ML61561645362913 8.57533 0.666667 62700 26 12.862995 60300 r000000 0052ML61561645362913 PT CHEV 1 s000017
12 0052ML61561705369933 8.57533 0.666667 63900 26 12.862995 61500 r000000 0052ML61561705369933 PT CHEV 1 s000017
13 0052ML61561725376953 8.57533 0.666667 65100 26 12.862995 62700 r000000 0052ML61561725376953 PT CHEV 1 s000017
14 0052ML61561745383973 8.57533 0.666667 66300 26 12.862995 63900 r000000 0052ML61561745383973 PT CHEV 1 s000017
15 0052ML61561805390993 8.57533 0.666667 67500 26 12.862995 65100 r000000 0052ML61561805390993 PT CHEV 1 s000017
16 0052ML61561825398013 8.57533 0.666667 68700 26 12.862995 66300 r000000 0052ML61561825398013 PT CHEV 1 s000017
17 0052ML61561845405033 8.57533 0.666667 69900 26 12.862995 67500 r000000 0052ML61561845405033 PT CHEV 1 s000017
18 0070ML577519153849114 26.18067 0.916667 72600 80 28.560731 69300 r000001 0070ML577519153849114 ST HELIERS 1 s000266
19 0070ML577520154043514 26.18067 0.916667 76200 80 28.560731 72900 r000001 0070ML577520154043514 ST HELIERS 1 s000266
20 0070ML577521154231898 26.18067 0.833333 79500 80 31.416804 76500 r000001 0070ML577521154231898 ST HELIERS 1 s000266
21 0071ML57760625896844 22.02507 0.833333 26100 65 26.430084 23100 r000001 0071ML57760625896844 PT CHEV 0 s000222
22 0071ML577607101007566 22.02507 0.966667 29280 65 22.784555 25800 r000001 0071ML577607101007566 PT CHEV 0 s000222
23 0071ML577607301062533 22.02507 1.166667 31200 65 18.878631 27000 r000001 0071ML577607301062533 PT CHEV 0 s000222
24 0071ML577608101153869 22.02507 1.083333 33300 65 20.330834 29400 r000001 0071ML577608101153869 PT CHEV 0 s000222
25 0071ML577615102037849 22.02507 1.050000 58380 65 20.976257 54600 r000001 0071ML577615102037849 PT CHEV 0 s000222
26 0071ML577615402102199 22.02507 1.050000 60180 65 20.976257 56400 r000001 0071ML577615402102199 PT CHEV 0 s000222
27 0071ML577616152176891 22.02507 1.050000 62280 65 20.976257 58500 r000001 0071ML577616152176891 PT CHEV 0 s000222
28 0071ML577616452241624 22.02507 1.050000 64080 65 20.976257 60300 r000001 0071ML577616452241624 PT CHEV 0 s000222
29 0071ML577617152303623 22.02507 1.000000 65700 65 22.025070 62100 r000001 0071ML577617152303623 PT CHEV 0 s000222
... ... ... ... ... ... ... ... ... ... ... ... ...
14572 9988NS402414256939682 43.36482 1.450000 57120 122 29.906772 51900 r000251 9988NS402414256939682 OREWA 1 s000811
14573 9988NS402416057654709 43.36482 1.383333 62880 122 31.348063 57900 r000251 9988NS402416057654709 OREWA 1 s000811
14574 9988NS402416257805138 43.36482 1.416667 64200 122 30.610461 59100 r000251 9988NS402416257805138 OREWA 1 s000811
14575 9988NS402416257811378 43.36482 1.416667 64200 122 30.610461 59100 r000251 9988NS402416257811378 OREWA 1 s000811
14576 9988NS402417058088884 43.36482 1.333333 66300 122 32.523615 61500 r000251 9988NS402417058088884 OREWA 1 s000811
14577 9991NS43050700764419 21.39810 0.700000 27720 58 30.568714 25200 r000252 9991NS43050700764419 OREWA 0 s000562
14578 9992NS37590705851757 22.09762 0.666667 27900 61 33.146430 25500 r000252 9992NS37590705851757 ARMY BAY 1 s000804
14579 9992NS375916351941892 22.09762 0.800000 62580 61 27.622025 59700 r000252 9992NS375916351941892 ARMY BAY 1 s000804
14580 9992NS375917352049982 22.09762 0.716667 65880 61 30.833888 63300 r000252 9992NS375917352049982 ARMY BAY 1 s000804
14581 9993NS155418202208589 25.59740 0.800000 68880 62 31.996750 66000 r000252 9993NS155418202208589 OREWA 0 s000566
14582 9995NS41340710862023 22.15162 0.666667 28200 61 33.227430 25800 r000252 9995NS41340710862023 OREWA 0 s000565
14583 9995NS413420102329056 22.15162 0.583333 74700 61 37.974206 72600 r000252 9995NS413420102329056 OREWA 0 s000565
14584 9995NS413421102442516 22.15162 0.583333 78300 61 37.974206 76200 r000252 9995NS413421102442516 OREWA 0 s000565
14585 9996NS548407354203329 75.73279 2.250000 35400 127 33.659018 27300 r000220 9996NS548407354203329 MIDTOWN 1 s000806
14586 9997NS404209354515773 43.57203 1.416667 39600 119 30.756727 34500 r000252 9997NS404209354515773 SILVERDALE 1 s000805
14587 9997NS404211355393408 43.57203 1.500000 47100 119 29.048020 41700 r000252 9997NS404211355393408 SILVERDALE 1 s000805
14588 9997NS404213356241752 43.57203 1.416667 54000 119 30.756727 48900 r000252 9997NS404213356241752 SILVERDALE 1 s000805
14589 9997NS404215357094830 43.57203 1.416667 61200 119 30.756727 56100 r000252 9997NS404215357094830 SILVERDALE 1 s000805
14590 9998NS402507203577738 43.36482 1.583333 32100 119 27.388307 26400 r000252 9998NS402507203577738 OREWA 1 s000807
14591 9998NS402507203587172 43.36482 1.583333 32100 119 27.388307 26400 r000252 9998NS402507203587172 OREWA 1 s000807
14592 9998NS402509204434538 43.36482 1.583333 39300 119 27.388307 33600 r000252 9998NS402509204434538 OREWA 1 s000807
14593 9998NS402509204440087 43.36482 1.583333 39300 119 27.388307 33600 r000252 9998NS402509204440087 OREWA 1 s000807
14594 9998NS402511255301733 43.36482 1.500000 46500 119 28.909880 41100 r000252 9998NS402511255301733 OREWA 1 s000807
14595 9998NS402511255307509 43.36482 1.500000 46500 119 28.909880 41100 r000252 9998NS402511255307509 OREWA 1 s000807
14596 9998NS402513256158533 43.36482 1.500000 53700 119 28.909880 48300 r000252 9998NS402513256158533 OREWA 1 s000807
14597 9998NS402513256166158 43.36482 1.500000 53700 119 28.909880 48300 r000252 9998NS402513256166158 OREWA 1 s000807
14598 9998NS402515257010001 43.36482 1.500000 60900 119 28.909880 55500 r000252 9998NS402515257010001 OREWA 1 s000807
14599 9998NS402515257022107 43.36482 1.500000 60900 119 28.909880 55500 r000252 9998NS402515257022107 OREWA 1 s000807
14600 9998NS402517257845402 43.36482 1.416667 67800 119 30.610461 62700 r000252 9998NS402517257845402 OREWA 1 s000807
14601 9998NS402517257863929 43.36482 1.500000 68100 119 28.909880 62700 r000252 9998NS402517257863929 OREWA 1 s000807

14602 rows × 12 columns


In [37]:
# Aggregate trip stats to route level,
# but only include trips active on the study date.
# See utils.py.

study_date = '20141103'  # Monday
print('num trips on {!s} is'.format(study_date), feed.get_trips(study_date).shape[0])
routes_stats = feed.get_routes_stats(trips_stats, study_date)
routes_stats.T


num trips on 20141103 is 8930
Out[37]:
0 1 2 3 4 5 6 7 8 9 ... 412 413 414 415 416 417 418 419 420 421
route_short_name 005 007 008 009 010 011 020 020X 030 048 ... SPBS4 SPBS5 SPBS6 SPBS7 SRU810 SRU812 STH STP1 WEST WSTH
route_id r000000 r000001 r000002 r000003 r000004 r000005 r000006 r000007 r000008 r000009 ... r000424 r000425 r000426 r000427 r000428 r000429 r000430 r000431 r000432 r000433
agency_id AT AT AT AT AT AT AT AT AT AT ... AT AT AT AT AT AT AT AT AT AT
route_long_name Pt Chevalier To Britomart Via Herne Bay And Co... Pt Chevalier To St Heliers New Lynn To Otahuhu Onehunga To New Lynn Onehunga To Wynyard Quarter Via Unitec Onehunga To St Lukes Via Landscape Rd Westmere To Britomart Via Richmond Rd Westmere To Britomart Express Pt Chevalier To Britomart Via Williamson Ave Te Atatu Peninsula To Britomart ... Weymouth & Wattledowns To Manurewa High Clendon To Alfriston College Greenmeadows Intermediate To Clendon Alfriston School To The Everglades Westgate To Rutherford High School Swanson To Rutherford College Papakura To Britomart Via Newmarket Tirimoana Primary To Sabulite Rd Britomart To Henderson West Harbour To Auckland
route_type 3 3 3 3 3 3 3 3 3 3 ... 3 3 3 3 3 3 2 3 2 4
route_color 00F0FF 00F0FF DCC8FF DCC8FF 00F0FF 00F0FF 00F0FF 00F0FF 00F0FF 64FF00 ... FFFFFF FFFFFF FFFFFF FFFFFF FFFFFF FFFFFF FFFFFF FFFFFF FFFFFF FFFFFF
route_text_color 000000 000000 C80000 C80000 000000 000000 000000 000000 000000 000000 ... 000000 000000 000000 000000 000000 000000 000000 000000 000000 000000
start_time 21600 21300 19800 21600 23400 32400 21600 24600 22500 21000 ... 28800 27900 54300 53400 25500 25800 18900 53700 18900 25200
end_time 69900 79800 78900 76200 68280 53640 86400 66600 84420 34500 ... 55200 57600 57000 57000 58200 58200 82980 54900 81420 69900
num_trips 18 79 55 53 36 12 97 10 68 8 ... 2 2 1 1 2 2 79 1 100 27
max_trip_distance 8.72115 28.77413 23.48133 25.19758 17.54451 8.96219 8.54975 7.30827 10.35137 19.38778 ... 19.36152 19.05493 10.79873 14.70057 24.36412 14.64206 49.29895 3.79851 31.82829 12.46688
max_trip_speed 13.08173 31.84876 25.616 30.2371 22.39725 24.25822 17.3171 10.96241 17.17091 29.08167 ... 29.04228 25.40657 14.39831 14.70057 24.36412 17.33828 42.25624 11.39553 32.36775 21.37179
service_distance 155.6683 2070.473 1285.99 1280.394 628.8095 107.1412 820.5146 72.07555 633.9946 155.1022 ... 31.53212 28.59949 10.79873 14.70057 38.19671 29.09063 3027.271 3.79851 2972.528 336.6058
service_duration 12 84.35 62.61667 55.48333 32.35 4.6 54.63333 6.666667 41.01667 6.833333 ... 1.166667 1.583333 0.75 1 1.916667 1.75 77.85 0.3333333 93.3 15.75
service_speed 12.97236 24.54622 20.5375 23.0771 19.4377 23.29157 15.01857 10.81133 15.457 22.69789 ... 27.02753 18.06284 14.39831 14.70057 19.92872 16.62322 38.88594 11.39553 31.8599 21.37179

15 rows × 422 columns


In [43]:
display_cols = [
  'route_short_name', 
  'route_long_name', 
  'route_type', 
  'start_time',
  'end_time',
  'num_trips',
  'max_trip_speed',
  ]

# The shortest/longest route is...
routes_stats.sort('max_trip_distance')[display_cols]

# The slowest/fastest route is...
f = routes_stats.sort('max_trip_speed')[display_cols]
f[~f['route_short_name'].str.startswith('S')]


Out[43]:
route_short_name route_long_name route_type start_time end_time num_trips max_trip_speed
255 CTY Wynyard Quarter 3 23160 87420 137 9.226129
7 020X Westmere To Britomart Express 3 24600 66600 10 10.962405
0 005 Pt Chevalier To Britomart Via Herne Bay And Co... 3 21600 69900 18 13.081725
253 BAYS Bayswater To Auckland 4 24000 76800 42 13.822320
55 220 St Lukes To Midtown 3 27300 71280 24 14.491157
256 DEV Devonport To Auckland 4 20700 85500 76 14.516480
57 221X Rosebank Rd To Midtown Express 3 24900 66600 7 14.693073
260 INN Inner Link Anticlockwise 3 23220 85500 196 15.569320
63 240 Sandringham To Midtown Limited Stop Express 3 26400 31620 3 15.791822
64 240X Blockhouse Bay To Midtown Express 3 25800 32220 3 15.958789
239 960 Northcote Point To Highbury 3 30900 52800 11 16.363920
60 223X New Lynn To Midtown Express 3 25500 66600 10 16.410427
66 243X New Lynn To Midtown Express 3 27300 64680 4 16.895355
68 249X Blockhouse Bay To Midtown Express 3 25200 67320 8 17.024536
8 030 Pt Chevalier To Britomart Via Williamson Ave 3 22500 84420 68 17.170913
160 643 Britomart To Upland Rd Via Parnell & Remuera Rd 3 28200 29400 1 17.193630
6 020 Westmere To Britomart Via Richmond Rd 3 21600 86400 97 17.317097
70 258X Civic Centre To Blockhouse Bay Express 3 27600 64740 4 17.631135
168 715X Glen Innes Express To Britomart 3 27300 31500 2 17.642235
75 283 Hospitals To Britomart 3 24600 69720 36 17.936356
50 186 New Lynn Loop Via Seabrook Ave And Astley Ave 3 23400 76800 30 18.344700
54 209 Titirangi To Downtown Via Green Bay 3 25200 66300 9 18.817800
187 815 Devonport To Westwell Rd Via Ngataringa Rd 3 25500 64800 4 18.911120
185 804 Bayswater Wharf To Takapuna 3 25800 69720 12 18.953676
157 606 Benson Road To Civic Centre 3 30000 58440 20 19.391471
88 330 Symonds St To Onehunga 3 54600 56400 1 19.544260
30 104 New Lynn Local 3 28800 60480 9 19.779268
33 11X Henderson To Britomart Express Via Glendene 3 24300 66600 6 19.785166
65 243 New Lynn To Midtown Via Sandringham Road 3 21000 69300 24 19.898973
62 233 New Lynn To Midtown Via Sandringham Road And S... 3 23820 89580 67 20.414360
... ... ... ... ... ... ... ...
134 512 Mt Wellington To Britomart Via Ruawai Road An... 3 32400 86700 19 36.057408
24 093 Whenuapai To Westgate 3 30600 60900 9 36.228270
221 899X Army Bay To Takapuna Express 3 23880 28620 1 36.418899
214 893X Orewa To Midtown Express 3 26100 66360 4 37.053010
237 956 Greenhithe To Midtown 3 24000 69000 11 37.073335
250 998 899 Orewa To Army Bay To Silverdale 3 23700 71340 11 37.314703
98 358 Pakuranga Plaza To Onehunga 3 24600 63300 5 37.420656
263 NEX Albany To Britomart 3 19800 88200 271 37.667580
261 MNK Manukau To Britomart Via Glen Innes 2 20760 81900 58 38.060985
118 472X Red Hill And Papakura To Britomart Express 3 21300 30900 3 38.275400
262 MTIA Waiheke Island To Auckland Direct 4 21000 90300 39 38.399777
266 PINE Pine Harbour To Auckland 4 22800 69300 30 38.583291
176 767 Glendowie South To Britomart Via Tamaki Drive 3 23100 75000 31 38.952560
212 892X Orewa To Auckland Express 3 24900 64200 3 39.904403
114 470X Papakura To Britomart Express 3 23400 67500 7 40.208640
220 899 Army Bay To Takapuna 3 21600 74700 11 40.769585
121 476 Pukekohe To Tuakau To Pukekohe 3 25500 67500 2 41.041775
213 893 Hibiscus Coast To Midtown Via Albany & Takapuna 3 25200 73080 3 41.276259
218 897X Gulf Harbour To Midtown Express 3 21780 69900 14 42.329848
29 102 New Lynn To Patiki Rd 3 23400 63600 9 42.810168
143 552 Pakuranga Plaza To Bucklands Beach 3 20400 89520 72 43.032864
219 898 Army Bay To Takapuna 3 20100 79140 14 43.375871
154 589 Botany Town Centre To Beachlands 3 22500 67800 12 43.889820
217 896 Hibiscus Coast To Midtown 3 72600 89100 5 43.897749
178 769 Glendowie North To Brtomart Via Tamaki Drive 3 21300 85380 50 44.945262
215 895 Waiwera To Midtown 3 25500 75600 21 46.716285
120 475 Papakura Shops To Pukekohe 3 21600 68100 36 50.632728
216 895X Waiwera To Midtown Express 3 21600 70620 11 51.346004
12 060 Helensville To Downtown 3 21000 74400 20 53.718279
123 479 Waiuku To Papakura 3 23100 66300 2 54.927730

269 rows × 7 columns

Some open-source GTFS tools


In [ ]: