Author: Khairil Yusof khairil.yusof@sinarproject.org
Date: 2 July 2016
In constrained environments where information valuable for governance is hard to obtain, the Popit database and API provide a public service for storing and sharing information from different organizations, following the Popolo international open government data standards, so that people in these countries and environments can also benefit from open data. It creates an enabling environment for countries to gain the benefits of open data even when their governments provide only limited open data on politicians, the legislature and government.
A central database is needed in countries with incomplete data sources, because data has to be collected first. For countries like Malaysia, information for the database has to be contributed from a variety of sources, in collaboration with different partners. Rather than being a static set from a single source, the data is continuously collected and improved until enough is available for it to be useful.
The Popolo international open government standards are used for the database schema. When data comes from different sources and in different formats, it needs to be mapped to a consistent standard. When data is incomplete, a reference standard shows which data is still missing. A standard also allows tools and applications developed by others to be reused, while giving new developers an API with a consistent set of fields.
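As an illustration of mapping source data onto a standard, here is a minimal sketch in Python. The spreadsheet column names ("Full Name", "Sex", "DOB") and the sample row are hypothetical; the target field names (name, gender, birth_date, email, summary) are a small subset of the Popolo Person properties. It is not the import code used by Popit.

```python
# A subset of Popolo Person properties, used here as the reference set.
POPOLO_PERSON_FIELDS = ["name", "gender", "birth_date", "email", "summary"]

# Hypothetical column names from one partner's spreadsheet, mapped to Popolo names.
SOURCE_TO_POPOLO = {
    "Full Name": "name",
    "Sex": "gender",
    "DOB": "birth_date",
}


def to_popolo_person(source_row):
    """Rename known source columns to Popolo fields and drop empty values."""
    person = {}
    for source_key, popolo_key in SOURCE_TO_POPOLO.items():
        value = source_row.get(source_key)
        if value:
            person[popolo_key] = value
    return person


def missing_fields(person):
    """List the reference fields that are still empty for this record."""
    return [field for field in POPOLO_PERSON_FIELDS if not person.get(field)]


# Hypothetical source row with an empty date of birth.
row = {"Full Name": "Example Person", "Sex": "female", "DOB": ""}
person = to_popolo_person(row)
print(person)
print(missing_fields(person))
```

The second function is the part that reflects the point about incomplete data: once records share a reference set of fields, listing what is still missing becomes a simple comparison.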
To develop applications using joined-up data, it is important to have a consistent unique identifier. A central database provides this along with additional identifiers. A database of politically exposed persons (PEPs) and their memberships in committees, boards and organizations also provides an extensive reusable list, which removes the need to duplicate research work on persons and organizations such as politicians, senior public officials, constituencies and government departments.
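The value of a shared identifier can be sketched with plain Python dictionaries. All IDs, names and the "directorships" dataset below are hypothetical and only illustrate the idea of joining a partner's dataset to central records by ID.

```python
# Hypothetical records from the central database, keyed by a shared person ID.
persons = {
    "p-001": {"name": "Example Person A"},
    "p-002": {"name": "Example Person B"},
}

# Hypothetical dataset kept by a different partner that references the same IDs.
directorships = [
    {"person_id": "p-001", "company": "Example Holdings"},
    {"person_id": "p-003", "company": "Unmatched Ltd"},
]


def join_on_person_id(persons, records):
    """Attach the person's name to each record; collect records with unknown IDs."""
    matched, unmatched = [], []
    for record in records:
        person = persons.get(record["person_id"])
        if person is None:
            unmatched.append(record)
        else:
            matched.append({**record, "name": person["name"]})
    return matched, unmatched


matched, unmatched = join_on_person_id(persons, directorships)
print(matched)
print(unmatched)
```

Because both datasets use the same ID, no name matching is needed, and records that reference an unknown ID are easy to set aside for review.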
A research team led by Professor Terence Gomez at the University of Malaya mapped the share ownership and directors of government-linked companies (GLCs) and government investment companies (GICs), but needed to match a few hundred names against a known database of politicians and government officials. The exercise quickly provided results and brief descriptions of matches. From this exercise we learned from the research partner that, for transparency purposes, data on senior public officials is just as important as data on politicians. The judiciary and prosecutors should also be added to the database. This would provide a good open data resource for transparency research.
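The article does not describe how the names were matched. One possible approach, shown here only as a sketch, is approximate string matching with Python's standard difflib module after normalizing case and whitespace. The names and the 0.85 cutoff are hypothetical.

```python
import difflib


def normalize(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join(name.lower().split())


def match_names(candidates, known_names, cutoff=0.85):
    """Map each candidate name to the known names that closely resemble it."""
    lookup = {normalize(name): name for name in known_names}
    results = {}
    for candidate in candidates:
        close = difflib.get_close_matches(
            normalize(candidate), list(lookup), n=3, cutoff=cutoff
        )
        results[candidate] = [lookup[key] for key in close]
    return results


# Hypothetical names standing in for the central database and the research list.
known = ["Ahmad bin Abdullah", "Tan Mei Ling", "Ravi Kumar"]
research_list = ["AHMAD  BIN ABDULAH", "John Smith"]

matches = match_names(research_list, known)
print(matches)
```

Approximate matches like these still need human review, which is where the brief descriptions attached to each database record help.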
Myanmar held elections in 2015, for which data on elected representatives was collected by different parties. Partners Myanmar Fifth Estate and Open Myanmar Initiative collaborated to make multilingual legislative information available as public data, starting with their representatives and eventually adding linked Popolo data on motions and legislative documents. Through a standard public API, this single source of continuously improved legislative data can be reused in multiple ways by the public. Initially used for a parliamentary monitoring website, it can also be reused for statistics and for generating contact lists. In future the same database and API can be used as a data source for mobile apps, as joined-up data for transparency work, or with other Popit API tools such as Sinar's relationships viewer.
OpenHluttaw also tested a new method of importing and syncing data into the Popit database from a Google Docs spreadsheet that complies with Popolo standard fields. Initial results show that standards help ease collaboration and reuse of data in specific fields such as the legislature. The use of Google Docs with Popolo fields also lowers the barrier to entry for contributions to the central database. In future, the ability to download a CSV of complete or incomplete lists of names or members from the database, with Popit IDs, would help improve the process of contributing to the central database.
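A spreadsheet whose column headers are Popolo field names can be turned into records with very little code. The sketch below assumes a CSV export with hypothetical columns and rows, and assumes that a filled "id" column means the row already exists in the database. It is not the OpenHluttaw sync code.

```python
import csv
import io

# Hypothetical CSV export of a spreadsheet whose headers are Popolo field names.
sheet_csv = """id,name,gender,birth_date
,Example Person A,female,1970-01-01
abc123,Example Person B,male,
"""


def rows_to_persons(csv_text):
    """Split rows into new records (no ID yet) and updates (ID present)."""
    to_create, to_update = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Keep only the cells that were filled in.
        person = {key: value for key, value in row.items() if value}
        if "id" in person:
            to_update.append(person)
        else:
            to_create.append(person)
    return to_create, to_update


to_create, to_update = rows_to_persons(sheet_csv)
print(to_create)
print(to_update)
```

Downloading lists that already carry Popit IDs, as suggested above, would let contributors fill in the gaps and send back rows that can be matched to existing records without guesswork.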
Collaborating with spreadsheets and Popit API
Web page of person: http://openhluttaw.com/person-detail/?personId=7c77665ab3fb4ce781e48b7b4906207d
API data source: http://api.openhluttaw.org/en/persons/7c77665ab3fb4ce781e48b7b4906207d
https://github.com/popolo-project/popolo-spec/issues/56
Django REST Framework backed by a PostgreSQL database was chosen over a Node.js and MongoDB API, for better data integrity through enforced data types and foreign key relationships, while still providing flexibility for other fields by storing JSON values, and support for GIS features through spatial objects.
mySociety made the same decision as Sinar for their YourNextRepresentative website, and developer Mark Longair lists in detail the technical reasons for this choice.
Popit Next Gen source code can be downloaded from GitHub.
Organizations in countries that need support with setting up Popit DB and API in their country can contact team@sinarproject.org for support.
1. CRUD API for Person, Organization, Post and Membership, following the Popolo standard.
2. Implementation of OtherName, ContactDetail, Area, Link and Identifier, following the Popolo standard.
3. Search API for Person, Organization, Post and Membership, including any embedded entity from item 2.
4. Multilingual support for features 1, 2 and 3.
5. Support for JSON output.
6. Support for browsing the API in a web browser.
7. Extensive unit tests for the supported features.
8. Links extended to support citations through an optional field value. There is no API to easily browse citations yet.
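The multilingual JSON API in the list above can be queried with a small helper. The URL pattern (base URL, language code, entity name, ID) is inferred from the OpenHluttaw URLs in this article, and the "result" envelope from the notebook code that follows; the helper names are hypothetical and this is only a sketch.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://api.openhluttaw.org"
# Entity paths that appear in this article; other entities may exist.
ENTITIES = {"persons", "organizations"}


def entity_url(entity, entity_id, lang="en"):
    """Build the URL of one record in the given language ('en' or 'my')."""
    if entity not in ENTITIES:
        raise ValueError("unknown entity: " + entity)
    return "{}/{}/{}/{}".format(BASE_URL, lang, entity, entity_id)


def fetch_entity(entity, entity_id, lang="en"):
    """Fetch one record and unwrap its 'result' envelope (makes a network request)."""
    with urlopen(entity_url(entity, entity_id, lang)) as response:
        return json.loads(response.read().decode("utf-8"))["result"]


print(entity_url("persons", "7c77665ab3fb4ce781e48b7b4906207d"))
print(entity_url("organizations", "897739b2831e41109713ac9d8a96c845", lang="my"))
```

Switching the language code is the only change needed to request the same record in Burmese, which is what the second notebook cell below does for party names.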
In [9]:
import requests
import pandas
import json

# Fetch the Amyotha Hluttaw organization record, which embeds its memberships.
amyotha_req = requests.get('http://api.openhluttaw.org/en/organizations/897739b2831e41109713ac9d8a96c845')
memberships = json.loads(amyotha_req.content)['result']['memberships']

amyotha = []
for member in memberships:
    # Look up the party that the member sits on behalf of.
    r = requests.get('http://api.openhluttaw.org/en/organizations/' + member['on_behalf_of_id'])
    party = json.loads(r.content)['result']['name']
    # Skip members whose party record has no name.
    if party:
        amyotha.append({'constituency': member['post']['label'],
                        'party': party})

amyotha_df = pandas.DataFrame(amyotha)

%matplotlib inline
# Pie chart of the number of seats held by each party.
parties = amyotha_df['party']
pie = parties.value_counts()
pie.plot.pie(figsize=(10,10))
In [10]:
import requests
import pandas
import json

# Memberships come from the English endpoint; party names are fetched in Burmese ('my') below.
amyotha_req = requests.get('http://api.openhluttaw.org/en/organizations/897739b2831e41109713ac9d8a96c845')
memberships = json.loads(amyotha_req.content)['result']['memberships']

amyotha_my = []
for member in memberships:
    r = requests.get('http://api.openhluttaw.org/my/organizations/' + member['on_behalf_of_id'])
    party = json.loads(r.content)['result']['name']
    # Skip members whose party record has no name.
    if party:
        amyotha_my.append({'constituency': member['post']['label'],
                           'party': party,
                           'gender': member['person']['gender'].lower()})

amyotha_my_df = pandas.DataFrame(amyotha_my)
# Count the members of each gender within each party.
amyotha_df_gender = amyotha_my_df.drop('constituency', axis=1)
gender_counts = amyotha_df_gender.groupby('party')['gender'].value_counts()
gender_counts
In [11]:
index_party = []
gender_values = []

# gender_counts is indexed by (party, gender); collect the distinct party names.
for party in gender_counts.index:
    index_party.append(party[0])
index_party = list(set(index_party))

for party in index_party:
    # A party may have no members of one gender, so default missing counts to 0.
    male_count = gender_counts[party].get('male', 0)
    female_count = gender_counts[party].get('female', 0)
    gender_values.append([male_count, female_count])

# Column labels are the Burmese words for male and female.
gender_df = pandas.DataFrame(gender_values, index=index_party, columns=['ကျား','မ'])

import matplotlib
%matplotlib inline
matplotlib.rc('font', family='Padauk')  # Needed for proper rendering of Burmese characters
gender_df.plot.barh(stacked=True, figsize=(12,5))
With a standards-based API, some lower-level tools as well as some applications can be reused by different implementing partners.
The visual explorer tool for the Popit API/DB is a working proof of concept that lets users interactively explore relationships between PEPs and organizations using live data from the Popit API.
Source code: https://github.com/Sinar/popit_visualizer
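The kind of relationship data such an explorer works with can be sketched as a small graph built from membership records. The field names person_id and organization_id follow the Popolo Membership class; the records and helper functions are hypothetical and are not taken from the popit_visualizer source.

```python
from collections import defaultdict

# Hypothetical membership records using Popolo Membership field names.
memberships = [
    {"person_id": "p-001", "organization_id": "org-a"},
    {"person_id": "p-001", "organization_id": "org-b"},
    {"person_id": "p-002", "organization_id": "org-b"},
]


def build_graph(memberships):
    """Build an undirected graph linking persons and organizations."""
    graph = defaultdict(set)
    for membership in memberships:
        person = membership["person_id"]
        organization = membership["organization_id"]
        graph[person].add(organization)
        graph[organization].add(person)
    return graph


def connected_persons(graph, person_id):
    """Return the persons who share at least one organization with person_id."""
    found = set()
    for organization in graph[person_id]:
        found.update(graph[organization])
    found.discard(person_id)
    return found


graph = build_graph(memberships)
print(connected_persons(graph, "p-001"))
```

Each membership becomes an edge between a person and an organization, so two people are related whenever they connect to the same organization node.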