The API we're pulling data from is what supports the TTC's Next Train Arrivals page. With a bit of exploration through your browser's developer console, you can see that the page gets refreshed with data from a request to http://www.ttc.ca/Subway/loadNtas.action
In [1]:
import requests #to handle http requests to the API
from psycopg2 import connect
In [2]:
stationid = 3
#We'll find out the full range of possible stations further down.
lineid = 1
#[1,2,4]
In [3]:
# The url for the request
base_url = "http://www.ttc.ca/Subway/loadNtas.action"
In [4]:
# Our query parameters for this API request
payload = {#"subwayLine":lineid,
"stationId":stationid,
"searchCriteria":''} #The value in the search box
#it has to be included otherwise the query fails
#"_":request_epoch} #Great job naming variables...
# subwayLine and _ are redundant variables.
# We thought we could query historical data using the "_" parameter
# But it seems no
r = requests.get(base_url, params = payload)
So now we've just received our first request from the API and the response is stored in the requests
object r
. From previous examination of the API we know that the response to an API request is in JSON format. So the below code will pretty print out the response so we can have a look at the variables.
In [5]:
r.json()
Out[5]:
In [6]:
data = r.json()
In [7]:
data['ntasData'][0]['createDate']
Out[7]:
In [8]:
#Testing whether have to be explicit about line numbers for stations with multiple lines
payload = {#"subwayLine":lineid,
"stationId":10, #St. George, Line 1
"searchCriteria":''}
r = requests.get(base_url, params = payload)
r.json()
Out[8]:
In [9]:
#Testing whether have to be explicit about line numbers for stations with multiple lines
payload = {#"subwayLine":lineid,
"stationId":48, #St. George, Line 2
"searchCriteria":''}
r = requests.get(base_url, params = payload)
r.json()
Out[9]:
In [10]:
data = r.json()
data['ntasData'][0]['createDate'].replace('T',' ')
Out[10]:
By opening up the inspector tools in the browser, we can see the full list of station ids by hovering over the Select a subway station
dropdown list. Stations increase in number from West to East.
For Line 1 they are numbered 1-32 (from Downsview to Finch, in order)
For Line 2 they are numbered 33-63 (from Kipling to Kennedy)
For Line 4 they are numbered 64-68 (from Sheppard to Don Mills)
Thus we can construct a dictionary that will represent every possible API call:
In [11]:
lines = {1: range(1, 33), #max value must be 1 greater
2: range(33, 64), # west to east (Kipling onwards)
3: range(64, 68)} # also west to east (Sheppard onwards)
In [12]:
def get_API_response(*args):
baseurl = "http://www.ttc.ca/Subway/loadNtas.action"
if len(args) > 1:
line_id = args[0]
station_id = args[2]
payload = {"subwayLine":line_id,
"stationId":station_id,
"searchCriteria":''}
else:
station_id = args[0]
payload = {"stationId":station_id,
"searchCriteria":''}
r = requests.get(baseurl, params = payload)
return r.json()
def insert_request_info(con, data, line_id, station_id):
request_row = {}
request_row['data_'] = data['data']
request_row['stationid'] = station_id
request_row['lineid'] = line_id
request_row['all_stations'] = data['allStations']
request_row['create_date'] = data['ntasData'][0]['createDate'].replace( 'T', ' ')
cursor = con.cursor()
cursor.execute("INSERT INTO public.requests(data_, stationid, lineid, all_stations, create_date)"
"VALUES(%(data_)s, %(stationid)s, %(lineid)s, %(all_stations)s, %(create_date)s)"
"RETURNING requestid", request_row)
request_id = cursor.fetchone()[0]
con.commit()
return request_id
def insert_ntas_data(con, ntas_data, request_id):
cursor = con.cursor()
sql = """INSERT INTO public.ntas_data(
requestid, id, station_char, subwayline, system_message_type,
timint, traindirection, trainid, train_message)
VALUES (%(requestid)s, %(id)s, %(station_char)s, %(subwayline)s, %(system_message_type)s,
%(timint)s, %(traindirection)s, %(trainid)s, %(train_message)s);
"""
for record in ntas_data:
record_row ={}
record_row['requestid'] = request_id
record_row['id'] = record['id']
record_row['station_char'] = record['stationId']
record_row['subwayline'] = record['subwayLine']
record_row['system_message_type'] = record['systemMessageType']
record_row['timint'] = record['timeInt']
record_row['traindirection'] = record['trainDirection']
record_row['trainid'] = record['trainId']
record_row['train_message'] = record['trainMessage']
cursor.execute(sql, record_row)
con.commit()
cursor.close()
def query_stations(con, lines):
data = {}
for line_id, stations in lines.items():
for station_id in stations:
data = get_API_response(station_id)
request_id = insert_request_info(con, data, line_id, station_id)
insert_ntas_data(con, data['ntasData'], request_id)
return data, request_id
Looking at the response above. I've written up a basic schema of two tables to store the responses to the API. it's in create_tables.sql
. Use this file to setup of PostgreSQL database either using terminal (Linux/OSX) or command line (Windows). Alternately, you can download PgAdmin v3 or v4 (depending on your platform) which will provide you with a GUI to setup and manage the database. In the latter case, the default database name is 'postgres' and use 'postgres' as the password as well when setting up the server.
In [13]:
dbsettings = {'database':'ttc',
'user':'postgres'}
# 'host':'localhost'}
con = connect(database = dbsettings['database'],
user = dbsettings['user'])
#host = dbsettings['host'])
In [14]:
data = query_stations(con, lines) # be patient, this command can take a while to complete
In [15]:
data
Out[15]:
In [16]:
lines = {1: [3]}# station_id = 3 (St. George), line_id = 1 (YUS)
In [17]:
data, request_id = query_stations(con, lines)
In [18]:
data
Out[18]:
In [19]:
cursor = con.cursor()
In [20]:
cursor.execute('''SELECT timint FROM ntas_data WHERE requestid = ''' + str(request_id) + ''' limit 10''')
In [21]:
rows = cursor.fetchall()
In [22]:
print(rows)
In [23]:
import numpy
In [25]:
print(numpy.mean(rows)) # Average (expected) wait time at St. George. Note this is not the true wait time.