Written Questions - Flow From Party to Answering Body

Inspired by @philbgorman / https://cdn.rawgit.com/Mognar/sankey/master/index.html

Documenting the generation of a Sankey diagram using Python to chart the number of responses from Answering Bodies to questions tabled by members of the House of Commons of a particular party for the 2015/16 Parliamentary session.



In [1]:

    
#Quick peek at the API

#http://lda.data.parliament.uk/commonswrittenquestions.json?_view=Written+Questions&_pageSize=10&_page=0
stub='http://lda.data.parliament.uk'.strip('/')

#http://explore.data.parliament.uk/?learnmore=Commons%20Written%20Questions
#commonswrittenquestions	All Commons Written Questions
#commonswrittenquestions/answeringdepartment?q={query}	All Commons Written Questions By Answering Body
#commonswrittenquestions?mnisId={mnisId}	All Commons Written Questions By Member
#commonswrittenquestions?session={session}	All Commons Written Questions By Session
#commonswrittenquestions/tabled?startDate={startDate}&endDate={endDate}	All Commons Written Questions By Table Date
#commonswrittenquestions?uin={uin}	All Commons Written Questions By UIN
#commonswrittenquestions/{id}	Commons Written Questions by ID



In [6]:

    
#We're going to have to call the API somehow
import requests

#To make thinks more efficient if we do this again, cache requests
import requests_cache
requests_cache.install_cache('parlidata_cache', backend='sqlite')
#(It helps if I remember to specify a persistent backend.., doh!)



In [7]:

    
#Get data from URL
def getURL(url):
    print(url)
    r=requests.get(url)
    print(r.status_code)
    return r

#Should build a proper recursive loader
def loader(url):
    items=[]
    done=False
    r=getURL(url)
    while not done:
        items=items+r.json()['result']['items']
        if 'next' in r.json()['result']:
            r=getURL(r.json()['result']['next']+'&_pageSize=500')
        else: done=True
    return items



In [8]:

    
#Get the data for a particular session
url='{}/{}.json?session={}'.format(stub,'commonswrittenquestions','2015/16')
items=loader(url)









    



http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=1&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=2&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=3&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=4&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=5&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=6&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=7&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=8&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=9&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=10&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=11&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=12&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=13&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=14&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=15&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=16&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=17&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=18&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=19&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=20&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=21&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=22&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=23&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=24&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=25&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=26&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=27&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=28&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=29&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=30&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=31&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=32&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=33&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=34&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=35&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=36&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=37&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=38&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=39&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=40&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=41&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=42&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=43&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=44&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=45&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=46&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=47&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=48&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=49&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=50&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=51&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=52&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=53&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=54&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=55&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=56&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=57&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=58&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=59&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=60&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=61&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=62&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=63&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=64&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=65&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=66&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=67&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=68&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=69&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=70&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=71&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=72&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=73&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=74&_pageSize=500
200
http://lda.data.parliament.uk/commonswrittenquestions.json?session=2015/16&_page=75&_pageSize=500
200



In [9]:

    
#Preview of the record for a single item
#As this is all JSON, maybe consider looking at https://www.sqlite.org/json1.html
#Or use mongo etc?
items[0]









    Out[9]:





{'AnswerDate': {'_datatype': 'dateTime', '_value': '2016-05-12'},
 'AnsweringBody': [{'_value': 'Prime Minister'}],
 '_about': 'http://data.parliament.uk/resources/518530',
 'dateTabled': {'_datatype': 'dateTime', '_value': '2016-05-11'},
 'questionText': 'To ask the Prime Minister, if he will clarify his recent remarks concerning Suliman Gani.',
 'tablingMember': {'_about': 'http://data.parliament.uk/members/126',
  'label': {'_value': 'Biography information for Mr Keith Simpson'}},
 'tablingMemberPrinted': [{'_value': 'Mr Keith Simpson'}],
 'title': 'House of Commons Tabled Parliamentary Question 2015/16 37017',
 'uin': '37017'}

The answer data includes a reference to the Member who tabled the question, but not their party. We need to pull that from the member database.



In [10]:

    
#Function to get member data back from the member API.
#Note the form of the test URL...
def memberData(memberURL):
    r=requests.get(memberURL)
    return r.json()['result']['primaryTopic']

memberData('http://lda.data.parliament.uk/members/4393.json')









    Out[10]:





{'_about': 'http://data.parliament.uk/members/4393',
 'additionalName': {'_value': 'Campbell'},
 'constituency': {'_about': 'http://data.parliament.uk/resources/144264',
  'label': {'_value': 'Cumbernauld, Kilsyth and Kirkintilloch East'}},
 'familyName': {'_value': 'McDonald'},
 'fullName': {'_value': 'Stuart C. McDonald'},
 'gender': {'_value': 'Male'},
 'givenName': {'_value': 'Stuart'},
 'homePage': 'http://stuartmcdonaldmp.com/',
 'isPrimaryTopicOf': 'http://lda.data.parliament.uk/members/4393.json',
 'label': {'_value': 'Biography information for Stuart C. McDonald'},
 'party': {'_value': 'Scottish National Party'},
 'twitter': {'_value': 'https://twitter.com/stuart_mcdonald'}}



In [11]:

    
#Generate a set of members who have tabled questions that have been answered
#Note that the identifier Linked Data URL doesn't link... so patch it...
members= {'{}.json'.format(i['tablingMember']['_about'].replace('http://','http://lda.')) for i in items}

#Get member data for each member
membersdata=[memberData(m) for m in members]



In [12]:

    
#Generate a lookup of the party from the member identifer as a dict
def getParty(memberDataURL):
    for m in membersdata:
        if m['_about']==memberDataURL:
            return m['party']['_value']
    return {}

memberparty={m['_about']:getParty(m['_about']) for m in membersdata}



In [13]:

    
#Get the party of the person who tabled an answered question and answering department(s)
def getPartyDept(item):
    rel=[]
    party=memberparty[item['tablingMember']['_about']]
    #Note that there may be more than one answering body, so return a list of (party,answering body) pairs
    for body in item['AnsweringBody']:
        rel.append((party,body['_value']))
    return rel

#For each answer, lookup the party of the member who tabled the question and the answering department(s)
#Use this cunning trick to collapse the nested list of responses to a flat list of (party,answering body) pairs
#http://stackoverflow.com/a/952952/454773
#[item for sublist in [[1,2],[3]] for item in sublist]
rels=[rel for s in [getPartyDept(q) for q in items] for rel in s]



In [14]:

    
rels[:3]









    Out[14]:





[('Conservative', 'Prime Minister'),
 ('Labour', 'Department for Business, Innovation and Skills'),
 ('Labour', 'Department for Business, Innovation and Skills')]



In [15]:

    
#Generate a dataframe of the pairs...
import pandas as pd
df=pd.DataFrame(rels)
df.columns=['party','answeringBody']

#Count the number of responses for each (party, answering body) pair
dfc=df.groupby(['party','answeringBody']).size().reset_index()
dfc.columns=['source','target','value']
dfc.head()









    Out[15]:






  
    
      
      source
      target
      value
    
  
  
    
      0
      Conservative
      Attorney General
      56
    
    
      1
      Conservative
      Cabinet Office
      202
    
    
      2
      Conservative
      Church Commissioners
      12
    
    
      3
      Conservative
      Department for Business, Innovation and Skills
      430
    
    
      4
      Conservative
      Department for Communities and Local Government
      425



In [16]:

    
#Generate the list of dicts the Sankey widget expects
links=dfc.dropna().to_dict(orient='records')
links[:3]









    Out[16]:





[{'source': 'Conservative', 'target': 'Attorney General', 'value': 56},
 {'source': 'Conservative', 'target': 'Cabinet Office', 'value': 202},
 {'source': 'Conservative', 'target': 'Church Commissioners', 'value': 12}]



In [17]:

    
#And generate the Sanley diagram
from ipysankeywidget import SankeyWidget
SankeyWidget(value={'links': links},
             width=1000, height=800,margins=dict(top=0, bottom=0,left=250,right=250))



In [ ]:

	source	target	value
0	Conservative	Attorney General	56
1	Conservative	Cabinet Office	202
2	Conservative	Church Commissioners	12
3	Conservative	Department for Business, Innovation and Skills	430
4	Conservative	Department for Communities and Local Government	425