Larger context of transit APIs: 45 Transit APIs: Yahoo Traffic, SMSMyBus and BART | ProgrammableWeb. How about NYC?
BART - API Documentation, Station
get info about stations
In [1]:
%run talktools
Let's look at the documentation for the BART API:
Definitely good to familiarize yourself with the BART website
Also a good idea to use the developer's group: https://groups.google.com/forum/#!forum/bart-developers
fun to look at Ridership Reports | bart.gov too.
Two options for use of key
use the public key: MW9S-E7SL-26DU-VV8V
sign up for your own key (read the Developer License Agreement | bart.gov first)
Benefits of signing up:
If you sign up for your own key you'll still be able to access the API if it turns into a tragedy of the commons. Plus you'll get a backstage pass to check out pre-release functionality that might just break everything you're working on -- or maybe give you a leg up in the marketplace.
In [2]:
# setting up the API Key
import requests
import urllib
from lxml.etree import fromstring
try:
    # to allow import of a registered key if it exists
    from settings import BART_API_KEY
except ImportError:
    # fall back to the public key
    BART_API_KEY = "MW9S-E7SL-26DU-VV8V"
In [3]:
# [BART - API Documentation, Station](http://api.bart.gov/docs/stn/stns.aspx)
def stations(key=BART_API_KEY):
    url = "http://api.bart.gov/api/stn.aspx?" + \
        urllib.urlencode({'key':key,
                          'cmd':'stns'})
    r = requests.get(url)
    return r
# grab the content of the XML document returned by the API
r = stations().content
print r
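Every call in this notebook follows the same URL pattern, so it may help to see the query string being built without touching the network. This is only a sketch: `bart_url` is a hypothetical helper (not part of the BART API or of any library), and the import fallback is there so the snippet also runs on Python 3.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

BASE = "http://api.bart.gov/api/"

def bart_url(endpoint, cmd, key, extra=()):
    """Build a BART API URL; `extra` is a sequence of (name, value) pairs.

    Hypothetical helper for illustration only.
    """
    # a list of pairs keeps the parameter order stable across Python versions
    params = [("cmd", cmd), ("key", key)] + list(extra)
    return BASE + endpoint + ".aspx?" + urlencode(params)

stations_url = bart_url("stn", "stns", "MW9S-E7SL-26DU-VV8V")
etd_url = bart_url("etd", "etd", "MW9S-E7SL-26DU-VV8V", [("orig", "PLZA")])
```

Pasting `stations_url` into a browser is a quick way to look at the raw XML before writing any parsing code.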
In [4]:
# how many stations?
# parse the XML document to look for number of nodes with stations/station
from lxml import etree
stations = etree.fromstring(r).xpath("stations/station")
len(stations)
Out[4]:
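To try the parsing step offline, here is a minimal sketch that runs the same `stations/station` lookup on a hand-written XML snippet. The snippet is an assumption: it only mimics the element names used in this notebook (`name`, `abbr`, `gtfs_latitude`, `gtfs_longitude`), and the station names and coordinates are made up. It uses the standard library's `xml.etree.ElementTree` instead of lxml so it runs without extra installs.

```python
import xml.etree.ElementTree as ET
from collections import OrderedDict

# hand-written, abridged stand-in for the API response (values are made up)
SAMPLE_STATIONS = """<root>
  <stations>
    <station>
      <name>Station A</name>
      <abbr>AAA</abbr>
      <gtfs_latitude>37.8</gtfs_latitude>
      <gtfs_longitude>-122.2</gtfs_longitude>
    </station>
    <station>
      <name>Station B</name>
      <abbr>BBB</abbr>
      <gtfs_latitude>37.9</gtfs_latitude>
      <gtfs_longitude>-122.3</gtfs_longitude>
    </station>
  </stations>
</root>"""

root = ET.fromstring(SAMPLE_STATIONS)
# path is relative to the root element, as in the xpath call above
station_nodes = root.findall("stations/station")

# one ordered dict per station: child tag -> child text
rows = [OrderedDict((child.tag, child.text) for child in node)
        for node in station_nodes]

# coordinates arrive as strings and need converting before plotting
coords = [(r["name"], float(r["gtfs_latitude"]), float(r["gtfs_longitude"]))
          for r in rows]
```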
In [5]:
# let's make a DataFrame (table) out of the stations data
from pandas import DataFrame
from collections import OrderedDict
def station_to_ordereddict(station):
    return OrderedDict([(child.tag, child.text) for child in station.iterchildren()])
stations_df = DataFrame([station_to_ordereddict(station) for station in stations])
stations_df
Out[5]:
In [6]:
for s in stations_df.T.to_dict().values():
    print s['name'], float(s['gtfs_latitude']), float(s['gtfs_longitude'])
In [7]:
# plot the maps using folium
# possible alternative: leaflet.js widget that Brian Granger is working on
# https://github.com/ellisonbg/leaftletwidget
# http://nbviewer.ipython.org/gist/bburky/7763555/folium-ipython.ipynb
from IPython.display import HTML
import folium
def inline_map(map):
    """
    Embeds the HTML source of the map directly into the IPython notebook.
    This method will not work if the map depends on any files (json data). Also this uses
    the HTML5 srcdoc attribute, which may not be supported in all browsers.
    """
    map._build_map()
    # double quotes must be escaped so the HTML can sit inside the srcdoc attribute
    return HTML('<iframe srcdoc="{srcdoc}" style="width: 100%; height: 510px; border: none"></iframe>'.format(srcdoc=map.HTML.replace('"', '&quot;')))
def embed_map(map, path="map.html"):
    """
    Embeds a linked iframe to the map into the IPython notebook.
    Note: this method will not capture the source of the map into the notebook.
    This method should work for all maps (as long as they use relative urls).
    """
    map.create_map(path=path)
    return HTML('<iframe src="files/{path}" style="width: 100%; height: 510px; border: none"></iframe>'.format(path=path))
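The srcdoc trick depends on escaping the embedded HTML so that its own double quotes do not end the attribute early. A minimal sketch of that step on a plain string, with no folium involved (`srcdoc_iframe` is a hypothetical name, and escaping `&` as well is my addition, not something the helpers above do):

```python
def srcdoc_iframe(html, height=510):
    """Wrap an HTML string in an iframe via the srcdoc attribute."""
    # escape & first so the &quot; entities added next are not double-escaped
    escaped = html.replace("&", "&amp;").replace('"', "&quot;")
    return ('<iframe srcdoc="{0}" style="width: 100%; height: {1}px; '
            'border: none"></iframe>').format(escaped, height)

snippet = srcdoc_iframe('<p class="note">Tom & Jerry</p>')
```

After escaping, the only literal double quotes left in `snippet` are the ones delimiting the `srcdoc` and `style` attributes.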
In [8]:
bart_map = folium.Map(location=[37.8717, -122.2728], zoom_start=9)
# for airport in islice(airports,None):
# lat = float(airport['Origin_lat'])
# lon = float(airport['Origin_long'])
# label = str(airport['Origin_airport']) # don't know why str necessary here
# airport_map.simple_marker([lat,lon],popup=label)
for s in stations_df.T.to_dict().values():
    bart_map.simple_marker([float(s['gtfs_latitude']), float(s['gtfs_longitude'])],
                           popup=s['name'])
inline_map(bart_map)
Out[8]:
In [9]:
def train_count(key=BART_API_KEY):
    url = "http://api.bart.gov/api/bsa.aspx?" + \
        urllib.urlencode({'cmd':'count',
                          'key':key})
    r = requests.get(url)
    doc = fromstring(r.content)
    call_dt = doc.find('date').text + " " + doc.find('time').text
    count = doc.find('traincount').text
    return (call_dt, count)
d = train_count()
d
Out[9]:
One of the most useful parts of the BART API is real-time departure information:
In [10]:
from dateutil.parser import parse
from dateutil.tz import tzlocal
from lxml.etree import fromstring
def etd(orig, key=BART_API_KEY):
    url = "http://api.bart.gov/api/etd.aspx?" + \
        urllib.urlencode({'cmd':'etd',
                          'orig':orig,
                          'key':key})
    r = requests.get(url)
    doc = fromstring(r.content)
    estimations_list = []
    # parse the datetime for the API call
    s = doc.find('date').text + " " + doc.find('time').text
    call_dt = parse(s)
    # turn the results into a rectangular format
    # parse the station
    for station in doc.findall('station'):
        etds = station.findall('etd')
        for etd in etds:
            estimates = etd.findall('estimate')
            for estimate in estimates:
                estimate_tuple = [(child.tag, child.text) for child in estimate.iterchildren()]
                estimate_tuple += [('call_dt', call_dt),
                                   ('station', station.find('abbr').text),
                                   ('destination', etd.find('abbreviation').text)]
                estimations_list.append(dict(estimate_tuple))
    return estimations_list
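The nested loops above turn a station / etd / estimate tree into one flat row per estimate. Here is a sketch of the same flattening on a hand-written XML snippet, so it can be run offline. The snippet is an assumption that only mirrors the element names used above, its values are made up, and `flatten_etd` is a hypothetical name. It keeps the call time as a plain string and uses the standard library parser rather than lxml and dateutil.

```python
import xml.etree.ElementTree as ET

# hand-written stand-in for an etd response (times and minutes are made up)
SAMPLE_ETD = """<root>
  <date>05/26/2014</date>
  <time>10:15:00 AM PDT</time>
  <station>
    <name>El Cerrito Plaza</name>
    <abbr>PLZA</abbr>
    <etd>
      <destination>Fremont</destination>
      <abbreviation>FRMT</abbreviation>
      <estimate><minutes>Leaving</minutes><direction>South</direction></estimate>
      <estimate><minutes>15</minutes><direction>South</direction></estimate>
    </etd>
    <etd>
      <destination>Richmond</destination>
      <abbreviation>RICH</abbreviation>
      <estimate><minutes>7</minutes><direction>North</direction></estimate>
    </etd>
  </station>
</root>"""

def flatten_etd(xml_text):
    """Return one dict per <estimate>, tagged with station and destination."""
    doc = ET.fromstring(xml_text)
    called = doc.find("date").text + " " + doc.find("time").text
    rows = []
    for station in doc.findall("station"):
        for etd_node in station.findall("etd"):
            for estimate in etd_node.findall("estimate"):
                # start from the estimate's own fields, then add context
                row = dict((child.tag, child.text) for child in estimate)
                row["call_dt"] = called
                row["station"] = station.find("abbr").text
                row["destination"] = etd_node.find("abbreviation").text
                rows.append(row)
    return rows

etd_rows = flatten_etd(SAMPLE_ETD)
```

Each row carries the context it needs (station, destination, call time), which is what makes the list a good fit for a DataFrame.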
In [11]:
# we can get ETA for all stations
from pandas import DataFrame
import pandas as pd
import numpy as np
df = DataFrame(etd('all'))
df
Out[11]:
In [12]:
# look up information for El Cerrito Plaza
df[df.station == 'PLZA']
Out[12]:
Now that we have some feel for the BART API, we can decide between calling the API directly, as above, and using an existing wrapper library.
In the case of the BART API, there is: https://pypi.python.org/pypi/bart_api/0.1 (github) by Reuben Castelino
The easiest way to install the library is with pip from PyPI. You should be able to do:
pip install bart_api
but as of 2014.05.25, the version doesn't work for Python 2.x. The github version is more up-to-date. You can do installation from github (https://github.com/projectdelphai/bart_api):
pip install git+git://github.com/projectdelphai/bart_api.git
I've made some changes so it might be helpful to use my fork until my changes get folded back into the main project:
pip install git+git://github.com/rdhyee/bart_api.git
In [13]:
# example: use the bart_api to calculate the number of active trains
import bart_api
reload(bart_api)
bart = bart_api.BartApi()
bart.number_of_trains()
Out[13]:
In [14]:
# stations
station_list = bart.get_stations()
station_list
Out[14]:
In [15]:
#routes = bart.routes(sched, date)
# http://api.bart.gov/docs/route/routes.aspx
# http://api.bart.gov/api/route.aspx?cmd=routes&key=MW9S-E7SL-26DU-VV8V
routes = bart.routes(34, '05/27/2014')
routes
Out[15]:
In [16]:
# let's display the routes with the BART color coding
# sorted by BART numbering
# nice feature of the IPython notebook to be able to use HTML
from IPython.display import HTML
import jinja2
ROUTE_TEMPLATE = """
<div class="wrap">
<table>
{% for item in items %}<tr>
<td style="background-color:{{item.color}}">{{item.name}}</td>
<td>{{item.number}}</td>
</tr>
{% endfor %}
</table>
</div>
"""
template = jinja2.Template(ROUTE_TEMPLATE)
HTML(template.render(items=sorted(routes, key=lambda r: int(r.get('number')))))
Out[16]:
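The sort key above converts the route number with `int` because the numbers come back as strings, and strings sort character by character. A small sketch with made-up route records (not real BART routes) shows the difference:

```python
# made-up records shaped like the route dicts used above
routes_sample = [
    {"name": "Route X", "number": "11"},
    {"name": "Route Y", "number": "2"},
    {"name": "Route Z", "number": "1"},
]

# string comparison puts "11" before "2"
by_string = [r["number"] for r in sorted(routes_sample, key=lambda r: r["number"])]

# numeric comparison gives the expected order
by_int = [r["number"] for r in sorted(routes_sample, key=lambda r: int(r["number"]))]
```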
In [17]:
# bart.get_route_schedule seems to be outdated
# https://github.com/projectdelphai/bart_api/blob/5101e0deec452ddca2f76d0d6d97d6725080ae31/bart_api/__init__.py#L147
bart.get_route_schedule('4')
Out[17]:
In [18]:
# http://api.bart.gov/docs/sched/routesched.aspx
# The optional "date" and "sched" parameters should not be used together.
# If they are, the date will be ignored, and the sched parameter will be used.
# to get route 4 (Richmond->Fremont)
# http://api.bart.gov/api/sched.aspx?cmd=routesched&route=4&key=MW9S-E7SL-26DU-VV8V
def filter_none(d):
    return dict([(k,v) for (k,v) in d.items() if v is not None])

def route_schedule(route, sched=None, date=None, l=0, key=BART_API_KEY):
    url = "http://api.bart.gov/api/sched.aspx?" + \
        urllib.urlencode(filter_none({'cmd':'routesched',
                                      'route':route,
                                      'sched':sched,
                                      'date':date,
                                      'l':l,
                                      'key':key}))
    r = requests.get(url)
    doc = fromstring(r.content)
    return doc
doc = route_schedule(4)
doc
Out[18]:
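It can be worth checking what `filter_none` actually sends. The sketch below builds only the query string, with `"KEY"` as a placeholder key and the parameters sorted so the output is stable. Note that `l=0` survives because the test is `is not None` rather than truthiness, while the unset `sched` and `date` are dropped.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

def filter_none(d):
    """Drop keys whose value is None; falsy values such as 0 are kept."""
    return dict((k, v) for (k, v) in d.items() if v is not None)

params = {"cmd": "routesched", "route": 4, "sched": None,
          "date": None, "l": 0, "key": "KEY"}  # "KEY" is a placeholder

kept = filter_none(params)
# sorting is only for a reproducible string; the API does not need it
query = urlencode(sorted(kept.items()))
```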
In [19]:
print doc.find("date").text, "\n"
for train in doc.findall('route/train'):
    print train.attrib['index']
    for stop in train.iterchildren():
        print stop.attrib['station'], stop.attrib['origTime']
    print
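Unlike the station data, the route schedule keeps its values in attributes rather than child text. Here is an offline sketch of the same walk over `route/train` on a hand-written snippet. The snippet is an assumption based only on the attribute names used above (`index`, `station`, `origTime`), and the times are made up.

```python
import xml.etree.ElementTree as ET

# hand-written stand-in for a routesched response (times are made up)
SAMPLE_SCHED = """<root>
  <date>05/27/2014</date>
  <route>
    <train index="1">
      <stop station="RICH" origTime="4:02 AM"/>
      <stop station="DELN" origTime="4:06 AM"/>
    </train>
    <train index="2">
      <stop station="RICH" origTime="4:17 AM"/>
    </train>
  </route>
</root>"""

sched_doc = ET.fromstring(SAMPLE_SCHED)

# map each train index to its ordered list of (station, time) stops
timetable = {}
for train in sched_doc.findall("route/train"):
    timetable[train.attrib["index"]] = [
        (stop.attrib["station"], stop.attrib["origTime"]) for stop in train
    ]
```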
Where I'd like to go...but will ultimately leave as an exercise to you all...comparing the scheduled arrival times with the real time BART info to see how individual trains (and the system as a whole) are doing.
What we do here:
Compare http://www.bart.gov/schedules/eta?stn=PLZA with http://www.bart.gov/schedules/bystationresults?station=PLZA
In [20]:
%%html
<iframe src="http://www.bart.gov/schedules/eta?stn=PLZA" width=800 height=600></iframe>
In [21]:
# Get schedule for PLZA
# using El Cerrito Plaza station as an example (closest station to me)
# http://www.bart.gov/stations/plza
# http://api.bart.gov/api/sched.aspx?cmd=stnsched&orig=plza&key=MW9S-E7SL-26DU-VV8V
schedule_plza_df = DataFrame(bart.get_station_schedule('PLZA'))
schedule_plza_df.head()
Out[21]:
In [22]:
# Get real time info for PLZA, accounting for 'Leaving', 'Arriving' in Minutes fields
# http://api.bart.gov/api/etd.aspx?cmd=etd&orig=PLZA&key=MW9S-E7SL-26DU-VV8V
# compare to http://www.bart.gov/schedules/eta?stn=PLZA
# need to handle the case of 'Leaving' and 'Arriving' given for minutes in the real time estimates
def etd_minutes_to_int(m):
    if m in ['Leaving', 'Arriving']:
        return 0
    else:
        try:
            return int(m)
        except (TypeError, ValueError):
            # anything that is not a number becomes NaN
            return np.nan
etd_plza_df = DataFrame(etd('PLZA'))
etd_plza_df['minutes'] = etd_plza_df.minutes.apply(etd_minutes_to_int)
etd_plza_df
Out[22]:
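A quick way to sanity-check the minutes conversion is to run it on a handful of values. The sketch below is a dependency-free variant (`minutes_to_int` is a hypothetical name, and it uses `float("nan")` in place of `np.nan`); the sample inputs are illustrative, not captured from the API.

```python
import math

def minutes_to_int(m):
    """Map 'Leaving'/'Arriving' to 0, numeric strings to int, the rest to NaN."""
    if m in ("Leaving", "Arriving"):
        return 0
    try:
        return int(m)
    except (TypeError, ValueError):
        return float("nan")

samples = ["Leaving", "Arriving", "7", None, "n/a"]
converted = [minutes_to_int(m) for m in samples]
```

Returning NaN instead of raising keeps a single odd value from breaking the whole DataFrame column.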
In [23]:
# compute the absolute time associated with real time estimate
from datetime import timedelta
etd_plza_df['estimate'] = etd_plza_df.apply(lambda s: s['call_dt'] + \
                                            timedelta(minutes=s['minutes']), axis=1)
etd_plza_df[['destination','direction','station','estimate','minutes']]
Out[23]:
In [24]:
# convert scheduled time into an absolute time
from pytz import timezone
pacific_tz = timezone("US/Pacific")
def orig_time_dt(ot):
    """Convert a scheduled time string into a US/Pacific datetime dated today
    (or tomorrow, for times before 4am)."""
    dt = parse(ot)
    # with rare exceptions, we should not find any times earlier than 4am;
    # if we do, the time belongs to the next day
    # exception: Bay Bridge repair weekend http://www.bart.gov/news/articles/2013/news20130815
    if dt.hour < 4:
        dt = dt + timedelta(days=1)
    # put dt into local timezone
    dt = pacific_tz.localize(dt)
    return dt
schedule_plza_df['orig_time'].apply(orig_time_dt)
Out[24]:
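The 4am rule is easier to test when the service date is passed in explicitly instead of taken from today's date. A minimal sketch using only the standard library: `scheduled_dt` and `cutoff_hour` are hypothetical names, the time strings are assumed to look like "12:20 AM", and the result is a naive datetime with no timezone attached.

```python
from datetime import datetime, timedelta

def scheduled_dt(orig_time, service_date, cutoff_hour=4):
    """Place a clock time like '12:20 AM' on a service day.

    Times before `cutoff_hour` are treated as belonging to the next
    calendar day, since late-night trains run past midnight.
    """
    clock = datetime.strptime(orig_time, "%I:%M %p").time()
    dt = datetime.combine(service_date, clock)
    if dt.hour < cutoff_hour:
        dt += timedelta(days=1)
    return dt

day = datetime(2014, 5, 27).date()
late = scheduled_dt("11:50 PM", day)
after_midnight = scheduled_dt("12:20 AM", day)
```

With the rollover, the 12:20 AM train sorts after the 11:50 PM one, which is what a comparison against real-time estimates needs.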
In [25]:
# as a first pass, compare only the next trains for each route
next_plza_trains = etd_plza_df.groupby('destination').apply(lambda s: min(s['estimate']))
next_plza_trains
Out[25]:
In [26]:
# do the comparison
comparisons = []
for (destination, estimate) in next_plza_trains.iterkv():
    scheduled_for_dest = schedule_plza_df[schedule_plza_df.train_head_station==destination]
    time_diff = scheduled_for_dest['orig_time'].apply(orig_time_dt) - estimate
    comparisons.append({'destination':destination,
                        'estimate':estimate,
                        'schedule':schedule_plza_df.iloc[np.argmin(abs(time_diff))]['orig_time'],
                        'diff': np.min(abs(time_diff))})
comparisons_df = DataFrame(comparisons)
comparisons_df
Out[26]:
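Stripped of pandas, the comparison boils down to "for each real-time estimate, find the scheduled time with the smallest absolute difference". A sketch of that matching step on made-up times (`nearest_scheduled` is a hypothetical name):

```python
from datetime import datetime, timedelta

def nearest_scheduled(estimate, scheduled):
    """Return (closest scheduled time, absolute gap) for one estimate."""
    best = min(scheduled, key=lambda t: abs(t - estimate))
    return best, abs(best - estimate)

# made-up schedule with 15-minute headways and one real-time estimate
scheduled_times = [datetime(2014, 5, 27, 10, 0),
                   datetime(2014, 5, 27, 10, 15),
                   datetime(2014, 5, 27, 10, 30)]
estimate_time = datetime(2014, 5, 27, 10, 18)

best, gap = nearest_scheduled(estimate_time, scheduled_times)
```

One caveat with nearest-time matching: once a train is late by more than half the headway, it gets paired with the following scheduled departure, so large delays can look small.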
In [27]:
# put it all together
# first pass at an algorithm
# for each ETD, find the minimum difference with a scheduled arrival
import requests
import urllib
from datetime import timedelta
from lxml.etree import fromstring
from dateutil.parser import parse
from dateutil.tz import tzlocal
from pandas import DataFrame
import numpy as np
import bart_api
try:
    # to allow import of a registered key if it exists
    from settings import BART_API_KEY
except ImportError:
    # fall back to the public key
    BART_API_KEY = "MW9S-E7SL-26DU-VV8V"
from pytz import timezone
pacific_tz = timezone("US/Pacific")
def etd(orig, key=BART_API_KEY):
    url = "http://api.bart.gov/api/etd.aspx?" + \
        urllib.urlencode({'cmd':'etd',
                          'orig':orig,
                          'key':key})
    r = requests.get(url)
    doc = fromstring(r.content)
    estimations_list = []
    # parse the datetime for the API call
    s = doc.find('date').text + " " + doc.find('time').text
    call_dt = parse(s)
    # turn the results into a rectangular format
    # parse the station
    for station in doc.findall('station'):
        etds = station.findall('etd')
        for etd in etds:
            estimates = etd.findall('estimate')
            for estimate in estimates:
                estimate_tuple = [(child.tag, child.text) for child in estimate.iterchildren()]
                estimate_tuple += [('call_dt', call_dt),
                                   ('station', station.find('abbr').text),
                                   ('destination', etd.find('abbreviation').text)]
                estimations_list.append(dict(estimate_tuple))
    return estimations_list

def etd_minutes_to_int(m):
    if m in ['Leaving', 'Arriving']:
        return 0
    else:
        try:
            return int(m)
        except (TypeError, ValueError):
            # anything that is not a number becomes NaN
            return np.nan

def orig_time_dt(ot):
    """Convert a scheduled time string into a US/Pacific datetime dated today
    (or tomorrow, for times before 4am)."""
    dt = parse(ot)
    # with rare exceptions, we should not find any times earlier than 4am;
    # if we do, the time belongs to the next day
    # exception: Bay Bridge repair weekend http://www.bart.gov/news/articles/2013/news20130815
    if dt.hour < 4:
        dt = dt + timedelta(days=1)
    # put dt into local timezone
    dt = pacific_tz.localize(dt)
    return dt
bart = bart_api.BartApi(BART_API_KEY)
# using El Cerrito Plaza station as an example (closest station to me)
# http://www.bart.gov/stations/plza
# station schedule
# http://api.bart.gov/api/sched.aspx?cmd=stnsched&orig=plza&key=MW9S-E7SL-26DU-VV8V
schedule_plza_df = DataFrame(bart.get_station_schedule('PLZA'))
# http://api.bart.gov/api/etd.aspx?cmd=etd&orig=PLZA&key=MW9S-E7SL-26DU-VV8V
# compare to http://www.bart.gov/schedules/eta?stn=PLZA
etd_plza_df = DataFrame(etd('PLZA'))
# make sure minutes are integer
etd_plza_df['minutes'] = etd_plza_df.minutes.apply(etd_minutes_to_int)
# compute the absolute time of the real time estimates
etd_plza_df['estimate'] = etd_plza_df.apply(lambda s: s['call_dt'] + \
                                            timedelta(minutes=s['minutes']), axis=1)
# for each route, compute the next train coming into PLZA
next_plza_trains = etd_plza_df.groupby('destination').apply(lambda s: min(s['estimate']))
comparisons = []
for (destination, estimate) in next_plza_trains.iterkv():
    scheduled_for_dest = schedule_plza_df[schedule_plza_df.train_head_station==destination]
    time_diff = scheduled_for_dest['orig_time'].apply(orig_time_dt) - estimate
    comparisons.append({'destination':destination,
                        'estimate':estimate,
                        'schedule':schedule_plza_df.iloc[np.argmin(abs(time_diff))]['orig_time'],
                        'diff': np.min(abs(time_diff))})
    #print destination, estimate, schedule_plza_df.iloc[np.argmin(abs(time_diff))]['orig_time'], np.min(abs(time_diff))
comparisons_df = DataFrame(comparisons)
comparisons_df
Out[27]:
As of 2014.05.26, http://api.bart.gov/api/sched.aspx?cmd=scheds&key=MW9S-E7SL-26DU-VV8V shows two schedules in effect: 33 and 34 -- when does schedule 33 get used? Is number 33 archival? How about older schedules -- are they still available from the API?
come back to Assessing accuracy of real-time arrival estimate systems