You will need access to an ODK Aggregate Server. Before running this notebook, create the following environment variables:
export ODKA_URL="https://aggregate.dbca.wa.gov.au"
export ODKA_UN="my-odk-aggregate-username"
export ODKA_PW="my-odk-aggregate-password"
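The helper functions below use these variables as default argument values, so they are read once, when the functions are defined, and a missing variable raises a KeyError at that point. A minimal sketch of an up-front check follows; missing_env_vars is a hypothetical helper and not part of this notebook.

```python
import os

# Names of the environment variables the helpers expect (from the text above).
REQUIRED = ("ODKA_URL", "ODKA_UN", "ODKA_PW")


def missing_env_vars(environ=None, required=REQUIRED):
    """Return the names of required variables that are unset or empty."""
    if environ is None:
        environ = os.environ
    return [name for name in required if not environ.get(name)]


missing = missing_env_vars()
if missing:
    print("Set these environment variables first: " + ", ".join(missing))
```

Running this in the first cell gives a readable message instead of a traceback from the function definitions.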
The following section contains a few helpers to simplify ODK API access. With some more refinement, these could become a Python module.
In [1]:
import json
import os
import requests
from requests.auth import HTTPDigestAuth
from xml.etree import ElementTree
# ---------------------------------------------------------------------------#
# ODK Aggregate API helpers
#
def xmlelem_to_dict(t):
    """Convert a potentially nested XML Element to a dict, strip namespace.

    Source: https://stackoverflow.com/a/19557036/2813717
    Credit: https://stackoverflow.com/users/489638/s29

    Note: creates some superfluous dicts and lists. Needs more work to produce
    a more simplified output.
    """
    # list() forces evaluation: on Python 3 a bare map object is always truthy
    # and not JSON serializable, so leaf elements would never yield t.text.
    return {t.tag.split("}")[-1]: list(map(xmlelem_to_dict, t)) or t.text}

def odka_forms(url=os.environ['ODKA_URL'],
               un=os.environ['ODKA_UN'],
               pw=os.environ['ODKA_PW']):
    """Return an OpenRosa xformsList XML response as list of dicts.

    See https://groups.google.com/forum/#!topic/opendatakit-developers/rfjN1nwYRFY

    Arguments

    url The base URL of an ODK Aggregate instance ("/xformsList" is appended),
        default: the value of environment variable "ODKA_URL".
    un  A username that exists on the ODK-A instance.
        Default: the value of environment variable "ODKA_UN".
    pw  The username's password.
        Default: the value of environment variable "ODKA_PW".

    Returns
    A list of dicts, each dict contains one xform:

    [
    {'downloadUrl': 'https://dpaw-data.appspot.com/formXml?formId=build_Site-Visit-Start-0-1_1490753483',
     'formID': 'build_Site-Visit-Start-0-1_1490753483',
     'hash': 'md5:c18c69c713c648bac240cbac9eee2d8a',
     'majorMinorVersion': None,
     'name': 'Site Visit Start 0.1',
     'version': None},
    {...repeat for each form...},
    ]
    """
    api = "{0}/xformsList".format(url)
    au = HTTPDigestAuth(un, pw)
    print("[odka_forms] Retrieving xformsList from {0}...".format(url))
    res = requests.get(api, auth=au)
    ns = "{http://openrosa.org/xforms/xformsList}"
    xforms = ElementTree.fromstring(res.content)
    # One dict per xform element: child tag (namespace stripped) -> child text
    forms = [{x.tag.replace(ns, ""): xform.find(x.tag).text for x in xform}
             for xform in list(xforms)]
    # not quite right:
    # xforms_dict = [xmlelem_to_dict(xform, ns=ns) for xform in list(xforms)]
    print("[odka_forms] Done, retrieved {0} forms.".format(len(forms)))
    return forms

def odka_submission_ids(form_id,
                        limit=10000,
                        url=os.environ['ODKA_URL'],
                        un=os.environ['ODKA_UN'],
                        pw=os.environ['ODKA_PW']):
    """Return a list of submission IDs for a given ODKA formID.

    TODO: should lower numEntries

    Arguments:

    form_id An existing xform formID,
            e.g. 'build_Site-Visit-End-0-1_1490756971'.
    limit   The maximum number of submission IDs to retrieve, default: 10000.
    url     The base URL of an ODK Aggregate instance,
            default: the value of environment variable "ODKA_URL".
    un      A username that exists on the ODK-A instance.
            Default: the value of environment variable "ODKA_UN".
    pw      The username's password.
            Default: the value of environment variable "ODKA_PW".

    Returns
    A list of submission IDs. Each ID can be used as input for odka_submission().

    Example:
    forms = odka_forms()
    ids = odka_submission_ids(forms[6]["formID"])
    ['uuid:c439fb45-3a1f-4127-be49-571af79a2c63',
     'uuid:13f06748-54a4-4aac-9dc2-e547c80a1c37',
     'uuid:fdde19ad-cc80-48d6-a0fd-adebfa3c5e02',
     'uuid:ce83e6cf-df9f-4705-8cfc-5fc413e22c43',
     'uuid:5d4b2bb6-21c0-4ec9-801a-f13b74a78add',
     'uuid:a9772680-b6f9-45c0-8ed4-189f5e722a6c',
     ...
    ]
    """
    pars = {'formId': form_id, 'numEntries': limit}
    api = "{0}/view/submissionList".format(url)
    au = HTTPDigestAuth(un, pw)
    print("[odka_submission_ids] Retrieving submission IDs for formID '{0}'...".format(form_id))
    res = requests.get(api, auth=au, params=pars)
    el = ElementTree.fromstring(res.content)
    ids = [e.text for e in el.find('{http://opendatakit.org/submissions}idList')]
    print("[odka_submission_ids] Done, retrieved {0} submission IDs.".format(len(ids)))
    return ids

def odka_submission(form_id,
                    submission_id,
                    url=os.environ['ODKA_URL'],
                    un=os.environ['ODKA_UN'],
                    pw=os.environ['ODKA_PW'],
                    verbose=False):
    """Download one ODKA submission and return it as a nested dict.

    Arguments:

    form_id       An existing xform formID,
                  e.g. 'build_Site-Visit-Start-0-1_1490753483'.
    submission_id An existing opendatakit submission ID,
                  e.g. 'uuid:a9772680-b6f9-45c0-8ed4-189f5e722a6c'.
    url           The base URL of an ODK Aggregate instance,
                  default: the value of environment variable "ODKA_URL".
    un            A username that exists on the ODK-A instance.
                  Default: the value of environment variable "ODKA_UN".
    pw            The username's password.
                  Default: the value of environment variable "ODKA_PW".
    verbose       Whether to print verbose log messages, default: False.

    Returns
    WIP - currently the submission as a nested dict, as produced by
    xmlelem_to_dict().

    Example
    d = odka_submission('build_Site-Visit-Start-0-1_1490753483',
                        'uuid:a9772680-b6f9-45c0-8ed4-189f5e722a6c')
    list(d)
    """
    api = ("{0}/view/downloadSubmission?formId={1}"
           "[@version=null%20and%20@uiVersion=null]/data[@key={2}]").format(
        url, form_id, submission_id)
    au = HTTPDigestAuth(un, pw)
    print("[odka_submission] Retrieving {0}".format(submission_id))
    if verbose:
        print("[odka_submission] URL {0}".format(api))
    res = requests.get(api, auth=au)
    el = ElementTree.fromstring(res.content)
    return xmlelem_to_dict(el)

def odka_submissions(form_id,
                     url=os.environ['ODKA_URL'],
                     un=os.environ['ODKA_UN'],
                     pw=os.environ['ODKA_PW'],
                     verbose=False):
    """Retrieve a list of all submissions for a given formID.

    Arguments:

    form_id An existing xform formID,
            e.g. 'build_Site-Visit-Start-0-1_1490753483'.
    url     The base URL of an ODK Aggregate instance,
            default: the value of environment variable "ODKA_URL".
    un      A username that exists on the ODK-A instance.
            Default: the value of environment variable "ODKA_UN".
    pw      The username's password.
            Default: the value of environment variable "ODKA_PW".
    verbose Whether to print verbose log messages, default: False.

    Example
    forms = odka_forms()
    data = odka_submissions(forms[6]["formID"])
    """
    print("[odka_submissions] Retrieving submissions for formID {0}...".format(form_id))
    # One HTTP request per submission ID
    d = [odka_submission(form_id, x, url=url, un=un, pw=pw, verbose=verbose)
         for x in odka_submission_ids(form_id, url=url, un=un, pw=pw)]
    print("[odka_submissions] Done, retrieved {0} submissions.".format(len(d)))
    return d

def save_odka(form_id,
              path=".",
              url=os.environ['ODKA_URL'],
              un=os.environ['ODKA_UN'],
              pw=os.environ['ODKA_PW'],
              verbose=False):
    """Save all submissions for a given form_id as JSON to a given path.

    The file is written to <path>/<form_id>.json.

    Arguments:

    form_id An existing form_id
    path    A locally existing path, default: "."
    url     The base URL of an ODK Aggregate instance,
            default: the value of environment variable "ODKA_URL".
    un      A username that exists on the ODK-A instance.
            Default: the value of environment variable "ODKA_UN".
    pw      The username's password.
            Default: the value of environment variable "ODKA_PW".
    verbose Whether to print verbose log messages, default: False.
    """
    with open('{0}/{1}.json'.format(path, form_id), 'w') as outfile:
        json.dump(
            odka_submissions(
                form_id,
                url=url,
                un=un,
                pw=pw,
                verbose=verbose),
            outfile
        )

def save_all_odka(path=".",
                  url=os.environ['ODKA_URL'],
                  un=os.environ['ODKA_UN'],
                  pw=os.environ['ODKA_PW'],
                  verbose=False):
    """Save all submissions for all forms of an odka instance.

    Arguments:

    path    A locally existing path, default: "."
    url     The base URL of an ODK Aggregate instance,
            default: the value of environment variable "ODKA_URL".
    un      A username that exists on the ODK-A instance.
            Default: the value of environment variable "ODKA_UN".
    pw      The username's password.
            Default: the value of environment variable "ODKA_PW".
    verbose Whether to print verbose log messages, default: False.

    Returns:
    Nothing. At the specified location (path), one file is written per form,
    containing all submissions (records) for that form.
    """
    # Pass the credentials through, so that non-default url/un/pw also apply
    # to the form listing.
    for xform in odka_forms(url=url, un=un, pw=pw):
        save_odka(
            xform['formID'],
            path=path,
            url=url,
            un=un,
            pw=pw,
            verbose=verbose)

def make_datapackage_json(xform,
                          path=".",
                          url=os.environ['ODKA_URL'],
                          un=os.environ['ODKA_UN'],
                          pw=os.environ['ODKA_PW'],
                          verbose=False,
                          download_submissions=False,
                          download_config=False):
    """Generate a datapackage.json config for a given xform dict.

    Arguments:

    xform   An xform dict as produced by odka_forms()
    path    The local path to the downloaded submission JSON as produced by save_odka()
    url     The base URL of an ODK Aggregate instance,
            default: the value of environment variable "ODKA_URL".
    un      A username that exists on the ODK-A instance.
            Default: the value of environment variable "ODKA_UN".
    pw      The username's password.
            Default: the value of environment variable "ODKA_PW".
    verbose Whether to print verbose log messages, default: False.
    download_submissions Whether to download submissions
    download_config Whether to write the returned config to a local file

    Returns:
    The datapackage config as a dict
    """
    fid = xform["formID"].lower()
    datapackage_path = os.path.join(path, fid)
    if not os.path.exists(datapackage_path):
        os.makedirs(datapackage_path)

    if download_submissions:
        save_odka(
            xform["formID"],
            path=datapackage_path,
            url=url,
            un=un,
            pw=pw,
            verbose=verbose)

    datapackage_config = {
        "name": fid,
        "title": xform["name"],
        "description": "Hash: {0}\nversion: {1}\nmajorMinorVersion: {2}\ndownload URL: {3}".format(
            xform["hash"], xform["version"], xform["majorMinorVersion"], xform["downloadUrl"]),
        "licenses": [
            {
                "id": "odc-pddl",
                "name": "Public Domain Dedication and License",
                "version": "1.0",
                "url": "http://opendatacommons.org/licenses/pddl/1.0/"
            }
        ],
        "resources": [
            {'encoding': 'utf-8',
             'format': 'json',
             'mediatype': 'application/json',
             'name': fid,
             # save_odka() names the file after the original-case formID
             'path': "{0}/{1}.json".format(datapackage_path, xform["formID"]),
             'profile': 'data-resource'}
        ]
    }

    if download_config:
        with open('{0}/datapackage.json'.format(datapackage_path), 'w') as outfile:
            json.dump(datapackage_config, outfile)

    return datapackage_config

# ---------------------------------------------------------------------------#
# Munging JSON output from odka_*
#
def gimme_data(submission_dict):
    """Return the data part of an ODKA submission."""
    return submission_dict["submission"][0]["data"][0]["data"]

def gimme_all(my_iterable, my_key):
    """Return the values of my_key from all elements of an iterable that contain it.

    E.g.
    r = {
      "submission": [
        {
          "data": [
            {
              "data": [
                {
                  "meta": [
                    { "instanceID": "uuid:d7f96001-a126-410c-b33d-407decf068d1" }
                  ]
                },
                { "observation_start_time": "2017-10-25T09:39:18.532Z" },
                { "reporter": "david_porteous" },
                {
                  "disturbanceobservation": [
                    {
                      "location":
                        "-20.7768750000 116.8622416667 -3.4000000000 4.9000000000"
                    },
                    { "photo_disturbance": "1508924412065.jpg" },
                    { "disturbance_cause": "fox" },
                    { "disturbance_cause_confidence": "expert-opinion" },
                    { "comments": None }
                  ]
                },
                { "observation_end_time": "2017-10-25T09:40:37.327Z" }
              ]
            }
          ]
        },
        {
          "mediaFile": [
            { "filename": "1508924412065.jpg" },
            { "hash": "md5:f4f0b5dea646865c27ca0c8c832c5800" },
            {
              "downloadUrl":
                "https://dpaw-data.appspot.com/view/binaryData?blobKey=..."
            }
          ]
        }
      ]
    }

    gimme_all(gimme_data(r), "reporter")
    ["david_porteous"]
    """
    return [element[my_key] for element in my_iterable if my_key in element]

def gimme(my_iterable, my_key):
    """Return the first match of gimme_all.

    Raises an IndexError if no element contains my_key.

    E.g., with r as in the gimme_all example:

    gimme(gimme_data(r), "reporter")
    "david_porteous"
    """
    return gimme_all(my_iterable, my_key)[0]

def gimme_src_id(r):
    """Return the instanceID from the data part of an odka_submission.

    r is the list returned by gimme_data(), not the full submission dict.
    """
    return gimme(r, "meta")[0]["instanceID"]


def gimme_media(r):
    """Return a list of {filename: downloadUrl} for all mediaFiles.

    r is the full submission dict as returned by odka_submission().
    """
    return [{gimme(x["mediaFile"], "filename"): gimme(x["mediaFile"], "downloadUrl")}
            for x in r["submission"]
            if "mediaFile" in x]
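The helpers above pass res.content straight to ElementTree.fromstring without checking the HTTP status, so a rejected login or a wrong URL would likely surface as an XML parse error rather than as an HTTP error. A minimal sketch of a stricter parsing step follows, assuming an object that behaves like requests.Response; parse_xml_response is a hypothetical helper, not part of this notebook.

```python
from xml.etree import ElementTree


def parse_xml_response(res):
    """Raise on an HTTP error status, then parse the response body as XML.

    `res` is expected to behave like requests.Response, i.e. to provide
    raise_for_status() and a bytes attribute `content`.
    """
    res.raise_for_status()  # requests raises HTTPError for 4xx/5xx responses
    return ElementTree.fromstring(res.content)
```

Inside the helpers, el = parse_xml_response(requests.get(api, auth=au)) could then replace the separate request and parse steps.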
In [2]:
forms = odka_forms()
In [3]:
forms[0]
Out[3]:
In [26]:
submissions = odka_submissions(forms[0]["formID"], verbose=True)
In [28]:
submissions
Out[28]:
Here, submissions is a list of nested dicts, one per submission, like:
[{'submission': [{'data': [{'data': [{'meta': [{'instanceID': 'uuid:a0954e6a-14ff-4099-9bae-0b1bdc466675'}]}, {'observation_start_time': '2017-10-12T08:13:41.990Z'}, {'reporter': 'XXXX'}, {'overview': [{'location': '-17.97119 122.23269499999999 0.0 0.0;-17.971113333333335 122.23275333333335 0.0 0.0;-17.970951666666668 122.23281166666666 0.0 0.0;-17.970821666666666 122.232755 0.0 0.0;-17.970818333333334 122.23259166666666 0.0 0.0;-17.970824999999998 122.23257666666665'}, {'fb_evidence': 'present'}, {'gn_evidence': 'absent'}, {'hb_evidence': 'absent'}, {'lh_evidence': 'absent'}, {'or_evidence': 'absent'}, {'unk_evidence': 'absent'}, {'predation_evidence': 'present'}]}, {'fb': [{'fb_no_old_tracks': '0'}, {'fb_no_fresh_successful_crawls': '1'}, {'fb_no_fresh_false_crawls': '0'}, {'fb_no_fresh_tracks_unsure': '0'}, {'fb_no_fresh_tracks_not_assessed': '0'}, {'fb_no_hatched_nests': '2'}]}, {'gn': [{'gn_no_old_tracks': None}, {'gn_no_fresh_successful_crawls': None}, {'gn_no_fresh_false_crawls': None}, {'gn_no_fresh_tracks_unsure': None}, {'gn_no_fresh_tracks_not_assessed': None}, {'gn_no_hatched_nests': None}]}, {'hb': [{'hb_no_old_tracks': None}, {'hb_no_fresh_successful_crawls': None}, {'hb_no_fresh_false_crawls': None}, {'hb_no_fresh_tracks_unsure': None}, {'hb_no_fresh_tracks_not_assessed': None}, {'hb_no_hatched_nests': None}]}, {'lh': [{'lh_no_old_tracks': None}, {'lh_no_fresh_successful_crawls': None}, {'lh_no_fresh_false_crawls': None}, {'lh_no_fresh_tracks_unsure': None}, {'lh_no_fresh_tracks_not_assessed': None}, {'lh_no_hatched_nests': None}]}, {'or': [{'or_no_old_tracks': None}, {'or_no_fresh_successful_crawls': None}, {'or_no_fresh_false_crawls': None}, {'or_no_fresh_tracks_unsure': None}, {'or_no_fresh_tracks_not_assessed': None}, {'or_no_hatched_nests': None}]}, {'unk': [{'unk_no_old_tracks': None}, {'unk_no_fresh_successful_crawls': None}, {'unk_no_fresh_false_crawls': None}, {'unk_no_fresh_tracks_unsure': None}, {'unk_no_fresh_tracks_not_assessed': None}, 
{'unk_no_hatched_nests': None}]}, {'disturbance': [{'disturbance_cause': 'vehicle'}, {'no_nests_disturbed': '1'}, {'no_tracks_encountered': None}, {'disturbance_comments': None}]}, {'observation_end_time': '2017-10-12T08:20:04.296Z'}]}]}]}]
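The output above wraps every XML element in a single-key dict inside a list, which the xmlelem_to_dict docstring already flags as superfluous. A minimal sketch of one way to collapse it follows; simplify is a hypothetical helper, and it assumes every list contains only dicts, as in the output shown. Keys that occur more than once at the same level are collected into a list, so a repeat group with a single entry cannot be told apart from a plain field.

```python
def simplify(node):
    """Recursively merge lists of single-key dicts into plain dicts."""
    if isinstance(node, dict):
        return {key: simplify(value) for key, value in node.items()}
    if isinstance(node, list):
        collected = {}
        for item in node:
            # Each item is a dict; gather its values per key
            for key, value in simplify(item).items():
                collected.setdefault(key, []).append(value)
        # Unwrap keys that occurred once, keep repeated keys as lists
        return {key: values[0] if len(values) == 1 else values
                for key, values in collected.items()}
    return node  # leaf: a string or None


record = {"submission": [{"data": [{"data": [
    {"meta": [{"instanceID": "uuid:a0954e6a-14ff-4099-9bae-0b1bdc466675"}]},
    {"reporter": "XXXX"},
    {"fb": [{"fb_no_old_tracks": "0"}, {"fb_no_hatched_nests": "2"}]},
]}]}]}

flat = simplify(record)["submission"]["data"]["data"]
print(flat["reporter"], flat["fb"]["fb_no_hatched_nests"])
```

With this shape, fields can be read by key instead of through gimme() lookups, at the cost of the ambiguity noted above.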
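To save all submissions of all forms as JSON files, run save_all_odka(). Note that odka_submission_ids() sends a single request for at most limit submission IDs per form (default: 10000) and does not follow pagination, so a form with more submissions than that is saved incompletely.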
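Following the pagination would mean reading a cursor from each submissionList response and sending it with the next request. The sketch below covers only the parsing half and rests on an assumption not confirmed in this notebook: that the response carries a resumptionCursor element next to idList, in the same namespace used by odka_submission_ids. parse_id_chunk is a hypothetical helper.

```python
from xml.etree import ElementTree

# Namespace as used in odka_submission_ids
NS = "{http://opendatakit.org/submissions}"


def parse_id_chunk(xml_bytes):
    """Return (ids, cursor) from one submissionList response body.

    Assumption: the cursor lives in a <resumptionCursor> element; cursor is
    None if that element is absent.
    """
    root = ElementTree.fromstring(xml_bytes)
    id_list = root.find(NS + "idList")
    ids = [e.text for e in id_list] if id_list is not None else []
    cursor = root.findtext(NS + "resumptionCursor")
    return ids, cursor
```

A paging loop would call this after each request and stop once a chunk returns no IDs; how the cursor is passed back to the server should be checked against the ODK Aggregate API documentation.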