Setup

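You will need access to an ODK Aggregate server. Before starting this notebook, set the following environment variables in the shell that launches it. The helpers read them as default arguments when they are defined, so a missing variable raises a KeyError: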

export ODKA_URL="https://aggregate.dbca.wa.gov.au"
export ODKA_UN="my-odk-aggregate-username"
export ODKA_PW="my-odk-aggregate-password"
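Because a missing variable only surfaces as a KeyError when the helper cell runs, it can help to check the environment first. The following is a minimal sketch; the names REQUIRED and missing_env are hypothetical and not part of the notebook's helpers.

```python
import os

# The three variables this notebook expects (see the export lines above).
REQUIRED = ("ODKA_URL", "ODKA_UN", "ODKA_PW")


def missing_env(names=REQUIRED, env=None):
    """Return the names that are unset or empty in env (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping instead of the real environment:
print(missing_env(env={"ODKA_URL": "https://aggregate.dbca.wa.gov.au"}))
```

An empty list means all three variables are available to the helpers.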

Helpers

The following section contains a few helpers to simplify ODK API access. With some more refinement, these could become a Python module.



In [1]:

    
import json
import os
import requests
from requests.auth import HTTPDigestAuth
from xml.etree import ElementTree

# ---------------------------------------------------------------------------#
# ODK Aggregate API helpers
#
def xmlelem_to_dict(t):
    """Convert a potentially nested XML Element to a dict, strip namespace.

    Source: https://stackoverflow.com/a/19557036/2813717
    Credit: https://stackoverflow.com/users/489638/s29

    Note: creates some superfluous dicts and lists. Needs more work to produce
    a more simplified output.
    """
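    # Build a list rather than a lazy map(): in Python 3 a map object is always
    # truthy (so t.text was never used) and cannot be serialised by json.dump.
    return {t.tag.split("}")[-1]: [xmlelem_to_dict(c) for c in t] or t.text}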


def odka_forms(url=os.environ['ODKA_URL'],
               un=os.environ['ODKA_UN'],
               pw=os.environ['ODKA_PW']):
    """Return an OpenRosa xformsList XML response as list of dicts.

    See https://groups.google.com/forum/#!topic/opendatakit-developers/rfjN1nwYRFY

    Arguments

    url The base URL of an ODK Aggregate instance ("/xformsList" is appended),
        default: the value of environment variable "ODKA_URL".
    un A username that exists on the ODK-A instance.
        Default: the value of environment variable "ODKA_UN".
    pw The username's password.
        Default: the value of environment variable "ODKA_PW".

    Returns
    A list of dicts, each dict contains one xform:

    [
      {'downloadUrl': 'https://dpaw-data.appspot.com/formXml?formId=build_Site-Visit-Start-0-1_1490753483',
       'formID': 'build_Site-Visit-Start-0-1_1490753483',
       'hash': 'md5:c18c69c713c648bac240cbac9eee2d8a',
       'majorMinorVersion': None,
       'name': 'Site Visit Start 0.1',
       'version': None},
      {...repeat for each form...},
    ]
    """
    api = "{0}/xformsList".format(url)
    au = HTTPDigestAuth(un, pw)
    print("[odka_forms] Retrieving xformsList from {0}...".format(url))
    res = requests.get(api, auth=au)
    ns = "{http://openrosa.org/xforms/xformsList}"
    xforms = ElementTree.fromstring(res.content)
    forms = [{x.tag.replace(ns, ""): xform.find(x.tag).text for x in xform}
             for xform in list(xforms)]
    # not quite right:
    # xforms_dict = [xmlelem_to_dict(xform, ns=ns) for xform in list(xforms)]
    print("[odka_forms] Done, retrieved {0} forms.".format(len(forms)))
    return forms


def odka_submission_ids(form_id,
                        limit=10000,
                        url=os.environ['ODKA_URL'],
                        un=os.environ['ODKA_UN'],
                        pw=os.environ['ODKA_PW']):
    """Return a list of submission IDs for a given ODKA formID.

    TODO: should lower numEntries

    Arguments:

    form_id An existing xform formID,
        e.g. 'build_Site-Visit-End-0-1_1490756971'.
    limit The maximum number of submission IDs to retrieve, default: 10000.
    url The base URL of an ODK Aggregate instance,
        default: the value of environment variable "ODKA_URL".
    un A username that exists on the ODK-A instance.
        Default: the value of environment variable "ODKA_UN".
    pw The username's password.
        Default: the value of environment variable "ODKA_PW".


    Returns
    A list of submission IDs. Each ID can be used as input for odka_submission().

    Example:

    forms = odka_forms()
    ids = odka_submission_ids(forms[6]["formID"])

    ['uuid:c439fb45-3a1f-4127-be49-571af79a2c63',
     'uuid:13f06748-54a4-4aac-9dc2-e547c80a1c37',
     'uuid:fdde19ad-cc80-48d6-a0fd-adebfa3c5e02',
     'uuid:ce83e6cf-df9f-4705-8cfc-5fc413e22c43',
     'uuid:5d4b2bb6-21c0-4ec9-801a-f13b74a78add',
     'uuid:a9772680-b6f9-45c0-8ed4-189f5e722a6c',
     ...
    ]
    """
    pars = {'formId': form_id, 'numEntries': limit}
    api = "{0}/view/submissionList".format(url)
    au = HTTPDigestAuth(un, pw)
    print("[odka_submission_ids] Retrieving submission IDs for formID '{0}'...".format(form_id))
    res = requests.get(api, auth=au, params=pars)
    el = ElementTree.fromstring(res.content)
    ids = [e.text for e in el.find('{http://opendatakit.org/submissions}idList')]
    print("[odka_submission_ids] Done, retrieved {0} submission IDs.".format(len(ids)))
    return ids


def odka_submission(form_id,
                    submission_id,
                    url=os.environ['ODKA_URL'],
                    un=os.environ['ODKA_UN'],
                    pw=os.environ['ODKA_PW'],
                    verbose=False):
    """Download one ODKA submission and return it as a nested dict.

    Arguments:

    form_id An existing xform formID,
        e.g. 'build_Site-Visit-Start-0-1_1490753483'.
    submission_id An existing opendatakit submission ID,
        e.g. 'uuid:a9772680-b6f9-45c0-8ed4-189f5e722a6c'.
    url The base URL of an ODK Aggregate instance,
        default: the value of environment variable "ODKA_URL".
    un A username that exists on the ODK-A instance.
        Default: the value of environment variable "ODKA_UN".
    pw The username's password.
        Default: the value of environment variable "ODKA_PW".
    verbose Whether to print verbose log messages, default: False.

    Returns
    WIP - currently the submission as a nested dict built by xmlelem_to_dict().

    Example
    d = odka_submission('build_Site-Visit-Start-0-1_1490753483',
        'uuid:a9772680-b6f9-45c0-8ed4-189f5e722a6c')
    list(d)
    """
    api = ("{0}/view/downloadSubmission?formId={1}"
           "[@version=null%20and%20@uiVersion=null]/data[@key={2}]").format(
        url, form_id, submission_id)
    au = HTTPDigestAuth(un, pw)
    print("[odka_submission] Retrieving {0}".format(submission_id))
    if verbose:
        print("[odka_submission] URL {0}".format(api))
    res = requests.get(api, auth=au)
    el = ElementTree.fromstring(res.content)
    return xmlelem_to_dict(el)


def odka_submissions(form_id,
                     url=os.environ['ODKA_URL'],
                     un=os.environ['ODKA_UN'],
                     pw=os.environ['ODKA_PW'],
                     verbose=False):
    """Retrieve a list of all submissions for a given formID.

    Arguments:

    form_id An existing xform formID,
        e.g. 'build_Site-Visit-Start-0-1_1490753483'.
    url The base URL of an ODK Aggregate instance,
        default: the value of environment variable "ODKA_URL".
    un A username that exists on the ODK-A instance.
        Default: the value of environment variable "ODKA_UN".
    pw The username's password.
        Default: the value of environment variable "ODKA_PW".
    verbose Whether to print verbose log messages, default: False.

    Example
    forms = odka_forms()
    data = odka_submissions(forms[6]["formID"])
    """
    print("[odka_submissions] Retrieving submissions for formID {0}...".format(form_id))
    d = [odka_submission(form_id, x, url=url, un=un, pw=pw, verbose=verbose)
         for x in odka_submission_ids(form_id, url=url, un=un, pw=pw)]
    print("[odka_submissions] Done, retrieved {0} submissions.".format(len(d)))
    return d


def save_odka(form_id,
              path=".",
              url=os.environ['ODKA_URL'],
              un=os.environ['ODKA_UN'],
              pw=os.environ['ODKA_PW'],
              verbose=False):
    """Save all submissions for a given form_id as JSON to a given path.

    Arguments:

    form_id An existing form_id
    path A locally existing path, default: "."
    url The base URL of an ODK Aggregate instance,
        default: the value of environment variable "ODKA_URL".
    un A username that exists on the ODK-A instance.
        Default: the value of environment variable "ODKA_UN".
    pw The username's password.
        Default: the value of environment variable "ODKA_PW".
    verbose Whether to print verbose log messages, default: False.
    """
    with open('{0}/{1}.json'.format(path, form_id), 'w') as outfile:
        json.dump(
            odka_submissions(
                form_id,
                url=url,
                un=un,
                pw=pw,
                verbose=verbose),
            outfile
        )


def save_all_odka(path=".",
                  url=os.environ['ODKA_URL'],
                  un=os.environ['ODKA_UN'],
                  pw=os.environ['ODKA_PW'],
                  verbose=False):
    """Save all submissions for all forms of an odka instance.

    Arguments:

    path A locally existing path, default: "."
    url The base URL of an ODK Aggregate instance,
        default: the value of environment variable "ODKA_URL".
    un A username that exists on the ODK-A instance.
        Default: the value of environment variable "ODKA_UN".
    pw The username's password.
        Default: the value of environment variable "ODKA_PW".
    verbose Whether to print verbose log messages, default: False.

    Returns:
    None. For each form, a file containing all submissions (records) of that
    form is written to the specified location (path).
    """
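    # Pass the credentials on to odka_forms() so non-default arguments are
    # honoured, and use a plain loop since only the side effect is needed.
    for xform in odka_forms(url=url, un=un, pw=pw):
        save_odka(
            xform['formID'],
            path=path,
            url=url,
            un=un,
            pw=pw,
            verbose=verbose)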


def make_datapackage_json(xform,
                          path=".",
                          url=os.environ['ODKA_URL'],
                          un=os.environ['ODKA_UN'],
                          pw=os.environ['ODKA_PW'],
                          verbose=False,
                          download_submissions=False,
                          download_config=False):
    """Generate a datapackage.json config for a given xform dict.

    Arguments:

    xform An xform dict as produced by odka_forms()
    path The local path to the downloaded submission JSON as produced by save_odka()
    url The base URL of an ODK Aggregate instance,
        default: the value of environment variable "ODKA_URL".
    un A username that exists on the ODK-A instance.
        Default: the value of environment variable "ODKA_UN".
    pw The username's password.
        Default: the value of environment variable "ODKA_PW".
    verbose Whether to print verbose log messages, default: False.
    download_submissions Whether to download submissions, default: False.
    download_config Whether to write the returned config to a local file,
        default: False.

    Returns:
    The datapackage config as a dict.
    """
    fid = xform["formID"].lower()
    datapackage_path = os.path.join(path, fid)

    if not os.path.exists(datapackage_path):
        os.makedirs(datapackage_path)

    if download_submissions:
        save_odka(
            xform["formID"],
            path=datapackage_path,
            url=url,
            un=un,
            pw=pw,
            verbose=verbose)

    datapackage_config = {
        "name": fid,
        "title": xform["name"],
        "description": "Hash: {0}\nversion: {1}\nmajorMinorVersion: {2}\ndownload URL: {3}".format(
            xform["hash"], xform["version"], xform["majorMinorVersion"], xform["downloadUrl"]),
        "licenses": [
            {
                "id": "odc-pddl",
                "name": "Public Domain Dedication and License",
                "version": "1.0",
                "url": "http://opendatacommons.org/licenses/pddl/1.0/"
            }
        ],
        "resources": [
                {'encoding': 'utf-8',
                 'format': 'json',
                 'mediatype': 'application/json',
                 'name': fid,
                 # save_odka() names the file after the original-case formID
                 'path': "{0}/{1}.json".format(datapackage_path, xform["formID"]),
                 'profile': 'data-resource'}
        ]
        }

    if download_config:
        with open('{0}/datapackage.json'.format(datapackage_path), 'w') as outfile:
            json.dump(datapackage_config, outfile)

    return datapackage_config


# ---------------------------------------------------------------------------#
# Munging JSON output from odka_*
#
def gimme_data(submission_dict):
    """Return the data part of an ODKA submission."""
    return submission_dict["submission"][0]["data"][0]["data"]


def gimme_all(my_iterable, my_key):
    """Return a list of all elements having at least my_key in a given iterable.

    E.g.
    r = {
        "submission": [
          {
            "data": [
              {
                "data": [
                  {
                    "meta": [
                      { "instanceID": "uuid:d7f96001-a126-410c-b33d-407decf068d1" }
                    ]
                  },
                  { "observation_start_time": "2017-10-25T09:39:18.532Z" },
                  { "reporter": "david_porteous" },
                  {
                    "disturbanceobservation": [
                      {
                        "location":
                          "-20.7768750000 116.8622416667 -3.4000000000 4.9000000000"
                      },
                      { "photo_disturbance": "1508924412065.jpg" },
                      { "disturbance_cause": "fox" },
                      { "disturbance_cause_confidence": "expert-opinion" },
                      { "comments": null }
                    ]
                  },
                  { "observation_end_time": "2017-10-25T09:40:37.327Z" }
                ]
              }
            ]
          },
          {
            "mediaFile": [
              { "filename": "1508924412065.jpg" },
              { "hash": "md5:f4f0b5dea646865c27ca0c8c832c5800" },
              {
                "downloadUrl":
                  "https://dpaw-data.appspot.com/view/binaryData?blobKey=..."
              }
            ]
          }
        ]
      }

    gimme_all(gimme_data(r), "reporter")
    ["david_porteous"]
    """
    return [element[my_key] for element in my_iterable if my_key in element]


def gimme(my_iterable, my_key):
    """Return the first match of gimme_all.

    E.g.
    r = {
        "submission": [
          {
            "data": [
              {
                "data": [
                  {
                    "meta": [
                      { "instanceID": "uuid:d7f96001-a126-410c-b33d-407decf068d1" }
                    ]
                  },
                  { "observation_start_time": "2017-10-25T09:39:18.532Z" },
                  { "reporter": "david_porteous" },
                  {
                    "disturbanceobservation": [
                      {
                        "location":
                          "-20.7768750000 116.8622416667 -3.4000000000 4.9000000000"
                      },
                      { "photo_disturbance": "1508924412065.jpg" },
                      { "disturbance_cause": "fox" },
                      { "disturbance_cause_confidence": "expert-opinion" },
                      { "comments": null }
                    ]
                  },
                  { "observation_end_time": "2017-10-25T09:40:37.327Z" }
                ]
              }
            ]
          },
          {
            "mediaFile": [
              { "filename": "1508924412065.jpg" },
              { "hash": "md5:f4f0b5dea646865c27ca0c8c832c5800" },
              {
                "downloadUrl":
                  "https://dpaw-data.appspot.com/view/binaryData?blobKey=..."
              }
            ]
          }
        ]
      }

    gimme(gimme_data(r), "reporter")  # r: the submission dict above
    "david_porteous"
    """
    return gimme_all(my_iterable, my_key)[0]


def gimme_src_id(r):
    """Return the instanceID from the data part of a submission, see gimme_data."""
    return gimme(r, "meta")[0]["instanceID"]


def gimme_media(r):
    """Return a list of {filename: downloadUrl} for all mediaFiles."""
    return [{gimme(x["mediaFile"], "filename"): gimme(x["mediaFile"], "downloadUrl")}
            for x in r["submission"]
            if "mediaFile" in x]
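To make the shape of the converted submissions concrete, here is a small offline sketch of the XML-to-dict conversion. The function name xml_to_dict and the sample XML are made up for illustration; the logic mirrors xmlelem_to_dict, with the children collected in a list so that it behaves the same under Python 3.

```python
from xml.etree import ElementTree


def xml_to_dict(elem):
    """Convert an XML Element to {tag: [children...]} or {tag: text}."""
    # Each child becomes its own single-key dict; leaf elements keep their text.
    children = [xml_to_dict(child) for child in elem]
    # Strip the "{namespace}" prefix that ElementTree puts in front of the tag.
    return {elem.tag.split("}")[-1]: children or elem.text}


# Hypothetical, heavily shortened submission document.
xml = ('<submission xmlns="http://opendatakit.org/submissions">'
       '<data><reporter>jane</reporter><comments/></data>'
       '</submission>')
result = xml_to_dict(ElementTree.fromstring(xml))
print(result)
# {'submission': [{'data': [{'reporter': 'jane'}, {'comments': None}]}]}
```

Every level is a list of single-key dicts, which is why the gimme helpers above search a list for the first dict containing a given key.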

Examples

List all forms with odka_forms(), then pass a form's formID to odka_submissions() to retrieve all submissions for that form.



In [2]:

    
forms = odka_forms()

[odka_forms] Retrieving xformsList from https://dpaw-data.appspot.com...
[odka_forms] Done, retrieved 7 forms.



In [3]:

    
forms[0]

    Out[3]:

{'downloadUrl': 'https://dpaw-data.appspot.com/formXml?formId=build_Track-Tally-0-5_1502342159',
 'formID': 'build_Track-Tally-0-5_1502342159',
 'hash': 'md5:2607df5d22571e1e55e1b90e90157473',
 'majorMinorVersion': None,
 'name': 'Track Tally 0.5',
 'version': None}



In [26]:

    
submissions = odka_submissions(forms[0]["formID"], verbose=True)

[odka_submissions] Retrieving submissions for formID build_Track-Tally-0-5_1502342159...
[odka_submission_ids] Retrieving submission IDs for formID 'build_Track-Tally-0-5_1502342159'...
[odka_submission_ids] Done, retrieved 1 submission IDs.
[odka_submission] Retrieving uuid:a0954e6a-14ff-4099-9bae-0b1bdc466675
[odka_submission] URL https://dpaw-data.appspot.com/view/downloadSubmission?formId=build_Track-Tally-0-5_1502342159[@version=null%20and%20@uiVersion=null]/data[@key=uuid:a0954e6a-14ff-4099-9bae-0b1bdc466675]
[odka_submissions] Done, retrieved 1 submissions.



In [28]:

    
submissions

    Out[28]:

[{'submission': <map at 0x7f9f41adc898>}]

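If the output shows a map object as above, the nested children were not expanded into lists (map() is lazy in Python 3). Fully expanded, submissions is a list of dicts like: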

[{'submission': [{'data': [{'data': [{'meta': [{'instanceID': 'uuid:a0954e6a-14ff-4099-9bae-0b1bdc466675'}]}, {'observation_start_time': '2017-10-12T08:13:41.990Z'}, {'reporter': 'XXXX'}, {'overview': [{'location': '-17.97119 122.23269499999999 0.0 0.0;-17.971113333333335 122.23275333333335 0.0 0.0;-17.970951666666668 122.23281166666666 0.0 0.0;-17.970821666666666 122.232755 0.0 0.0;-17.970818333333334 122.23259166666666 0.0 0.0;-17.970824999999998 122.23257666666665'}, {'fb_evidence': 'present'}, {'gn_evidence': 'absent'}, {'hb_evidence': 'absent'}, {'lh_evidence': 'absent'}, {'or_evidence': 'absent'}, {'unk_evidence': 'absent'}, {'predation_evidence': 'present'}]}, {'fb': [{'fb_no_old_tracks': '0'}, {'fb_no_fresh_successful_crawls': '1'}, {'fb_no_fresh_false_crawls': '0'}, {'fb_no_fresh_tracks_unsure': '0'}, {'fb_no_fresh_tracks_not_assessed': '0'}, {'fb_no_hatched_nests': '2'}]}, {'gn': [{'gn_no_old_tracks': None}, {'gn_no_fresh_successful_crawls': None}, {'gn_no_fresh_false_crawls': None}, {'gn_no_fresh_tracks_unsure': None}, {'gn_no_fresh_tracks_not_assessed': None}, {'gn_no_hatched_nests': None}]}, {'hb': [{'hb_no_old_tracks': None}, {'hb_no_fresh_successful_crawls': None}, {'hb_no_fresh_false_crawls': None}, {'hb_no_fresh_tracks_unsure': None}, {'hb_no_fresh_tracks_not_assessed': None}, {'hb_no_hatched_nests': None}]}, {'lh': [{'lh_no_old_tracks': None}, {'lh_no_fresh_successful_crawls': None}, {'lh_no_fresh_false_crawls': None}, {'lh_no_fresh_tracks_unsure': None}, {'lh_no_fresh_tracks_not_assessed': None}, {'lh_no_hatched_nests': None}]}, {'or': [{'or_no_old_tracks': None}, {'or_no_fresh_successful_crawls': None}, {'or_no_fresh_false_crawls': None}, {'or_no_fresh_tracks_unsure': None}, {'or_no_fresh_tracks_not_assessed': None}, {'or_no_hatched_nests': None}]}, {'unk': [{'unk_no_old_tracks': None}, {'unk_no_fresh_successful_crawls': None}, {'unk_no_fresh_false_crawls': None}, {'unk_no_fresh_tracks_unsure': None}, {'unk_no_fresh_tracks_not_assessed': None}, 
{'unk_no_hatched_nests': None}]}, {'disturbance': [{'disturbance_cause': 'vehicle'}, {'no_nests_disturbed': '1'}, {'no_tracks_encountered': None}, {'disturbance_comments': None}]}, {'observation_end_time': '2017-10-12T08:20:04.296Z'}]}]}]}]
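The list-of-single-key-dicts nesting shown above is awkward to query directly. One possible way to simplify it is to merge each list into a plain dict, as in this sketch. The helper flatten is hypothetical and assumes that keys are unique within each list; a repeated group with the same key would overwrite earlier entries.

```python
def flatten(items):
    """Merge a list of single-key dicts into one dict, recursing into lists."""
    out = {}
    for item in items:
        for key, value in item.items():
            # Nested groups arrive as lists of dicts; leaf values are str or None.
            out[key] = flatten(value) if isinstance(value, list) else value
    return out


# A shortened version of the "data" part of the submission shown above.
data = [
    {'meta': [{'instanceID': 'uuid:a0954e6a-14ff-4099-9bae-0b1bdc466675'}]},
    {'reporter': 'XXXX'},
    {'fb': [{'fb_no_old_tracks': '0'}, {'fb_no_hatched_nests': '2'}]},
    {'disturbance': [{'disturbance_cause': 'vehicle'}, {'disturbance_comments': None}]},
]
flat = flatten(data)
print(flat['fb']['fb_no_hatched_nests'])
```

For a full submission, the input would be the list returned by gimme_data().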

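To save all submissions of all forms as JSON files, run save_all_odka(). Note that odka_submission_ids() requests at most limit (default: 10000) submission IDs per form in a single call and does not follow pagination, so forms with more submissions would be retrieved only partially.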
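As a starting point for handling pagination, the sketch below parses one submissionList response into its IDs and a continuation cursor. It is an assumption, not something shown in this notebook, that the response carries a resumptionCursor element next to idList; the function name parse_id_chunk and the sample XML are made up, so check a real response from your server before relying on this.

```python
from xml.etree import ElementTree

# Namespace taken from odka_submission_ids() above.
NS = "{http://opendatakit.org/submissions}"


def parse_id_chunk(xml_text):
    """Return (ids, cursor) from one submissionList response.

    cursor is None when no resumptionCursor element is present.
    """
    root = ElementTree.fromstring(xml_text)
    id_list = root.find(NS + "idList")
    ids = [e.text for e in id_list] if id_list is not None else []
    # Assumed element name; verify against an actual server response.
    cursor = root.findtext(NS + "resumptionCursor")
    return ids, cursor


# Hypothetical response body for illustration only.
sample = (
    '<idChunk xmlns="http://opendatakit.org/submissions">'
    '<idList><id>uuid:aaa</id><id>uuid:bbb</id></idList>'
    '<resumptionCursor>opaque-cursor</resumptionCursor>'
    '</idChunk>'
)
ids, cursor = parse_id_chunk(sample)
print(ids, cursor)
```

A paginating version of odka_submission_ids() would presumably send the cursor back with the next request and stop once a page returns no new IDs; how the server expects the cursor to be passed is likewise an assumption to verify.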