Welcome to Gobble

Gobble is the Python client for Open-Spending


In [1]:
from gobble import pull, validate, push


[Gobble] Hello loic! You are logged into Open-Spending

Pushing a datapackage to Open-Spending


In [2]:
help(push)


Help on function push in module gobble.upload:

push(target, publish=False)
    Upload a fiscal datapackage to Open-Spending.
    
    The target is the full path to the fiscal datapackage JSON descriptor,
    but it can also be a dictionary representing the schema itself or a url
    pointing to a descriptor (for more information please refer to the
    documentation for the :class:`datapackage.DataPackage` class.
    
    By default, newly uploaded packages are kept private, but you can change
    that. Also note that if you upload a datapackage twice, the first one will
    be overwritten. For now, the only valid datafile format is CSV.
    
    :param publish: toggle the datapackage to "published" after upload
    :param target: absolute path to package descriptor or url or schema


In [2]:
batch = push('/home/loic/repos/gobble/assets/datapackage/datapackage.json')


[Gobble] mexican-federal-budget is a valid datapackage
[Gobble] Starting uploading process for mexican-federal-budget
[Gobble] data/data.csv is ready for upload to http://fakes3/fake-bucket/5df4a7b06a940c992d1c44525daff47b/mexican-federal-budget/data/data.csv
[Gobble] datapackage.json is ready for upload to http://fakes3/fake-bucket/5df4a7b06a940c992d1c44525daff47b/mexican-federal-budget/datapackage.json
[Gobble] Successful S3 upload: http://fakes3/fake-bucket/5df4a7b06a940c992d1c44525daff47b/mexican-federal-budget/data/data.csv?Content-Length=50556&Content-MD5=%2BuqBmwvQLi0M2W2enNxD%2FA%3D%3D
[Gobble] Successful S3 upload: http://fakes3/fake-bucket/5df4a7b06a940c992d1c44525daff47b/mexican-federal-budget/datapackage.json?Content-Length=16454&Content-MD5=FVkI2t1HIOgQXK5cZuu2oQ%3D%3D
[Gobble] Congratuations, mexican-federal-budget was uploaded successfully!
[Gobble] You can find you fiscal datapackage here: http://dev.openspending.org/5df4a7b06a940c992d1c44525daff47b:mexican-federal-budget

In [19]:
batch.in_progress


Out[19]:
False

In [37]:
batch.name


Out[37]:
'mexican-federal-budget'

In [39]:
batch.filepath


Out[39]:
'/home/loic/repos/gobble/assets/datapackage/datapackage.json'

In [11]:
batch.files


Out[11]:
{'filedata': {'data/data.csv': {'length': 50556,
   'md5': '+uqBmwvQLi0M2W2enNxD/A==',
   'name': 'data',
   'type': 'text/csv'},
  'datapackage.json': {'length': 16454,
   'md5': 'FVkI2t1HIOgQXK5cZuu2oQ==',
   'name': 'mexican-federal-budget',
   'type': 'text/json'}},
 'metadata': {'name': 'mexican-federal-budget',
  'owner': '5df4a7b06a940c992d1c44525daff47b'}}
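
The length and md5 entries above look like a byte count and a base64-encoded MD5 digest, and the same digest appears percent-encoded in the Content-MD5 query parameter of the upload URLs. The sketch below shows one way such values could be computed with the standard library. It is an illustration under that assumption, not Gobble's own code, and describe_bytes is a hypothetical helper.

```python
import base64
import hashlib
from urllib.parse import quote


def describe_bytes(payload):
    """Return the byte count and base64-encoded MD5 digest of some bytes.

    Hypothetical helper, not part of Gobble.
    """
    digest = hashlib.md5(payload).digest()
    return {
        'length': len(payload),
        'md5': base64.b64encode(digest).decode('ascii'),
    }


info = describe_bytes(b'hello world\n')
print(info['length'], info['md5'])

# Base64 text may contain "+", "/" and "=", which have to be
# percent-encoded before they are placed in a URL query string.
encoded_md5 = quote('+uqBmwvQLi0M2W2enNxD/A==', safe='')
print(encoded_md5)
```

Percent-encoding the md5 value reported for data/data.csv reproduces the Content-MD5 parameter seen in the upload log.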

In [13]:
batch.os_url


Out[13]:
'http://dev.openspending.org/5df4a7b06a940c992d1c44525daff47b:mexican-federal-budget'

In [16]:
len(batch)


Out[16]:
1

In [22]:
resource = batch[0]
resource


Out[22]:
<datapackage.resource.TabularResource at 0x7f82a83bae48>

In [1]:
# Print every (column, value) pair of the first row only
for row in resource.iter():
    for column in row.items():
        print(column)
    break  # stop after the first row

This is almost equivalent to:


In [61]:
resource.data[0]


Out[61]:
{'Actividad_Institucional': 'Equipo e infraestructura militares de calidad',
 'Adefas': Decimal('0.0'),
 'Aprobado': Decimal('0.0'),
 'Ciclo': 2015,
 'Devengado': Decimal('6223.02'),
 'EF': 'Nuevo León',
 'Ejercicio': Decimal('6223.02'),
 'FIN': 'Gobierno',
 'Fuente_Financiamiento': 'Recursos fiscales',
 'Funcion': 'Seguridad Nacional',
 'ID_AI': '4',
 'ID_CC': '14071170012',
 'ID_EF': '19',
 'ID_FF': '1',
 'ID_FIN': '1',
 'ID_Funcion': '6',
 'ID_Modalidad': 'K',
 'ID_OG': '29801',
 'ID_PP': '19',
 'ID_Ramo': '7',
 'ID_Subfuncion': '1',
 'ID_TG': '3',
 'ID_UR': '117',
 'Modalidad': 'Proyectos de Inversión',
 'Modificado': Decimal('6223.02'),
 'Objeto_Gasto': 'Refacciones y accesorios menores de maquinaria y otros equipos',
 'PP': 'Proyectos de infraestructura gubernamental de seguridad nacional',
 'Pagado': Decimal('6223.02'),
 'Ramo': 'Defensa Nacional',
 'Subfuncion': 'Defensa',
 'Tipo_Gasto': 'Gasto de obra pública',
 'UR': 'Dirección General de Ingenieros'}

In [44]:
batch.save('/home/loic/test.zip')

Validating a fiscal-datapackage

Pushing a package with an invalid schema fails with a ValidationError.


In [4]:
push({'foo': 'bar'})


[Gobble] Validation error: 'name' is a required property
[Gobble] Validation error: 'title' is a required property
[Gobble] Validation error: 'resources' is a required property
[Gobble] Validation error: 'model' is a required property
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
<ipython-input-4-149250462baf> in <module>()
----> 1 push({'foo': 'bar'})

/home/loic/repos/gobble/gobble/upload.py in push(target, publish)
     51     :param target: absolute path to package descriptor or url or schema
     52     """
---> 53     batch = Batch(target)
     54 
     55     for target in batch.request_s3_urls():

/home/loic/repos/gobble/gobble/upload.py in __init__(self, target, schema, **kwargs)
    166         super(Batch, self).__init__(target, schema=schema, **kwargs)
    167 
--> 168         validate(target)
    169         self._check_file_formats()
    170 

/home/loic/repos/gobble/gobble/upload.py in validate(target, raise_error, schema)
    128         if raise_error:
    129             message = 'Cannot upload %s because it has %s errors'
--> 130             raise ValidationError(message % (name, len(messages)))
    131         else:
    132             return messages

ValidationError: Cannot upload datapackage because it has 4 errors

To get a list of error messages instead of an exception, call the validate function with raise_error set to False.


In [40]:
help(validate)


Help on function validate in module gobble.upload:

validate(target, raise_error=True, schema='fiscal')
    Validate a datapackage schema.
    
    :param target: A valid datapackage target (`datapackage.DataPackage`).
    :param raise_error: raise a `datapackage.Validation` error if invalid
    :param schema: the schema to validate against:
    
    :return By default, return true if the package is valid, else return
            a list of error messages. If the `raise_error` flag is True,
            however, raise a `datapackage.exceptions.ValidatioError`.


In [41]:
errors = validate({'foo': 'bar'}, raise_error=False)
errors


[Gobble] Validation error: 'name' is a required property
[Gobble] Validation error: 'title' is a required property
[Gobble] Validation error: 'resources' is a required property
[Gobble] Validation error: 'model' is a required property
Out[41]:
["'name' is a required property",
 "'title' is a required property",
 "'resources' is a required property",
 "'model' is a required property"]
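
The two behaviours of raise_error can be illustrated with a standard-library sketch. It only checks the four top-level keys named in the messages above, whereas the real function validates against the full fiscal schema. validate_sketch and SketchValidationError are hypothetical names, not part of Gobble.

```python
# Top-level keys taken from the error messages shown above.
REQUIRED_KEYS = ('name', 'title', 'resources', 'model')


class SketchValidationError(Exception):
    """Hypothetical stand-in for the library's validation error."""


def validate_sketch(descriptor, raise_error=True):
    """Return True if valid, else raise or return the error messages."""
    messages = ['%r is a required property' % key
                for key in REQUIRED_KEYS
                if key not in descriptor]
    if not messages:
        return True
    if raise_error:
        message = 'Cannot upload datapackage because it has %s errors'
        raise SketchValidationError(message % len(messages))
    return messages


errors = validate_sketch({'foo': 'bar'}, raise_error=False)
print(errors)
```

Returning the messages rather than raising is convenient when you want to report every problem in a descriptor before attempting an upload.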

Pulling packages from Open-Spending

You can search for specific packages in the database with the pull function.


In [47]:
help(pull)


Help on function pull in module gobble.search:

pull(query, private=True, limit=None)
    Query the ElasticSearch database.
    
    You can search a package by `title`, `author`, `description`, `regionCode`,
    `countryCode` or`cityCode`. You can match all these fields at once with the
    magic `q` key.
    
    If authentication-token was provided, then private packages from the
    authenticated user will also be included. Otherwise, only public packages
    will be returned. You can limit the size of your results with the `size`
    parameter.
    
    :param query: a `dict` of key value pairs
    :param private: show private datapackages
    :param limit: the number of results returned
    
    :type query: "class:`dict`
    :rtype private: :class:`bool'
    :rtype size: :class:`int'
    
    :return: a dictionary with the results
    :rtype: :class: `dict`


In [51]:
mexican_packages = {'countryCode': 'MX'}
results = pull(mexican_packages)

len(results)


Out[51]:
1

In [55]:
results[0]['package']['author']


Out[55]:
'loic <loic.jounot@not.shown>'

What's inside the user folder (~/.gobble)?


In [6]:
ls


authentication.json      GET.user.check.json       permissions.json
GET.search.package.json  gobble.log                POST.datastore.json
GET.user.authorize.json  HEAD.I.do.not.exist.json  token.json

In [58]:
cat gobble.log | tail -n 5


[Gobble] [2016-08-02 22:45:12,588] [snapshot] [_log] [DEBUG] Response cookies: <RequestsCookieJar[<Cookie session=d381892a-5383-4a75-899e-b20c572affeb for next.openspending.org/>]>
[Gobble] [2016-08-02 22:45:12,588] [snapshot] [_log] [DEBUG] Request full URL: http://next.openspending.org/user/authorize?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE0NzA1MDA0NTQsInVzZXJpZCI6IjVkZjRhN2IwNmE5NDBjOTkyZDFjNDQ1MjVkYWZmNDdiIn0.ta9ECVQSaqiVgBImHLIdUVf_KC21X4zKttPkr1gwM9g&service=os.datastore
[Gobble] [2016-08-02 22:45:12,588] [snapshot] [_log] [DEBUG] {"token": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJwZXJtaXNzaW9ucyI6eyJkYXRhcGFja2FnZS11cGxvYWQiOnRydWV9LCJzZXJ2aWNlIjoib3MuZGF0YXN0b3JlIiwidXNlcmlkIjoiNWRmNGE3YjA2YTk0MGM5OTJkMWM0NDUyNWRhZmY0N2IifQ.LI4KhEnkGR4WdCfamY1YhBw9Xkm1yd7Ik_r47pzNt8KX1gQ1_8tDvjzO0KVc9kfttOAFDf3vvAlWDnXzUWDGKWq-Yp8tnPYSsCisgR5mCbJs1VvUiDmmCAv3BHAcA-XREPzzkRf0YNqhZk8TE_mNIXxLNwVVMoKVQkC4svOTM07QrZeqjS8kROx2M2hyCvrvsLdGiJtPk01LFIwmPutZCJjOsF0-Z6u7keu7-h_Hf7juuu6nzaTIu5Jy6B2RoXnyzV2aN90siwYU_Y_Fg5LGbA0DnIRs-JaGJtkpRJWcAAVWRwLSn3ng4ofVua9RMGFfWDSjwUamjPCkMgxz_Pqj_Q", "userid": "5df4a7b06a940c992d1c44525daff47b", "permissions": {"datapackage-upload": true}, "service": "os.datastore"}
[Gobble] [2016-08-02 22:45:12,588] [snapshot] [_log] [DEBUG] ****************************** [200] OK - GET: /user/authorize (end) *******************************
[Gobble] [2016-08-02 22:45:12,588] [snapshot] [_save] [DEBUG] Saved request + response to /home/loic/.gobble/GET.user.authorize.json

The folder holds your logs, your user information and a snapshot of the last request sent to each endpoint:


In [25]:
cat ~/.gobble/GET.user.check.json | jq .


{
  "timestamp": "2016-08-02 22:45:12.517457",
  "url": "http://next.openspending.org/user/check?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE0NzA1MDA0NTQsInVzZXJpZCI6IjVkZjRhN2IwNmE5NDBjOTkyZDFjNDQ1MjVkYWZmNDdiIn0.ta9ECVQSaqiVgBImHLIdUVf_KC21X4zKttPkr1gwM9g",
  "query": {
    "jwt": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJleHAiOjE0NzA1MDA0NTQsInVzZXJpZCI6IjVkZjRhN2IwNmE5NDBjOTkyZDFjNDQ1MjVkYWZmNDdiIn0.ta9ECVQSaqiVgBImHLIdUVf_KC21X4zKttPkr1gwM9g"
  },
  "request_json": null,
  "response_json": {
    "authenticated": true,
    "profile": {
      "id": "google:107630624453481014600",
      "username": "ciol",
      "email": "loic.jounot@gmail.com",
      "name": "loic",
      "avatar_url": "https://lh5.googleusercontent.com/-rxV_5Yr7Mw8/AAAAAAAAAAI/AAAAAAAAA7U/TFWBCQ6OFt8/photo.jpg",
      "idhash": "5df4a7b06a940c992d1c44525daff47b"
    }
  },
  "request_headers": null,
  "response_headers": {
    "Date": "Tue, 02 Aug 2016 20:45:13 GMT",
    "Connection": "keep-alive",
    "Content-Length": "334",
    "Server": "nginx/1.11.3",
    "Set-Cookie": "session=d381892a-5383-4a75-899e-b20c572affeb; Expires=Fri, 02-Sep-2016 20:45:13 GMT; HttpOnly; Path=/",
    "Content-Type": "application/json"
  },
  "cookies": {
    "session": "d381892a-5383-4a75-899e-b20c572affeb"
  }
}
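
Because each snapshot is a plain JSON file, it can also be read back from Python. The sketch below assumes the layout shown above (a response_json object with an authenticated flag). load_snapshot and is_authenticated are hypothetical helpers, not part of Gobble.

```python
import json
from pathlib import Path


def load_snapshot(path):
    """Read one snapshot file and return its content as a dictionary."""
    with Path(path).expanduser().open(encoding='utf-8') as stream:
        return json.load(stream)


def is_authenticated(snapshot):
    """Return True if the stored response reports an authenticated user."""
    response = snapshot.get('response_json') or {}
    return bool(response.get('authenticated'))


# Example usage, assuming the file exists on your machine:
# snapshot = load_snapshot('~/.gobble/GET.user.check.json')
# print(snapshot['timestamp'], is_authenticated(snapshot))
```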

Low level usage

API endpoints

It's possible to communicate with the conductor API at a lower level if you wish. Gobble has all the API endpoints pre-defined as callable objects:

  • authenticate_user
  • oauth_callback
  • authorize_user
  • update_user
  • search_packages
  • request_upload
  • upload_package
  • toggle_package

For example, let's look at the endpoint to get user permissions, represented by authorize_user.


In [15]:
from gobble.api import authorize_user

authorize_user.info


Out[15]:
{'endslash': False,
 'method': 'GET',
 'path': ['user', 'authorize'],
 'url': 'http://dev.openspending.org/user/authorize'}
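
To make the info dictionary more concrete, here is a simplified sketch of how an HTTP method and a list of path segments could be turned into the same four fields. EndPointSketch and BASE_URL are hypothetical and only mirror the output above. Gobble's real EndPoint class may be built differently, and this sketch does not send any request.

```python
from urllib.parse import urljoin

BASE_URL = 'http://dev.openspending.org/'  # host shown in the output above


class EndPointSketch:
    """Hypothetical, simplified stand-in for a pre-defined endpoint."""

    def __init__(self, method, *path, endslash=False):
        self.method = method
        self.path = list(path)
        self.endslash = endslash

    @property
    def url(self):
        # Join the segments and add a trailing slash only when requested.
        route = '/'.join(self.path) + ('/' if self.endslash else '')
        return urljoin(BASE_URL, route)

    @property
    def info(self):
        return {'endslash': self.endslash,
                'method': self.method,
                'path': self.path,
                'url': self.url}


authorize = EndPointSketch('GET', 'user', 'authorize')
print(authorize.info)
```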

To make a request, call the object. It accepts the same keyword arguments as the generic requests.request function (for example headers, json, params and data) and returns a standard requests.Response.


In [16]:
payload = {'jwt':'token'}
authorize_user(params=payload)


Out[16]:
<Response [200]>

Handle responses

Use the handle wrapper function to check a response: it logs and re-raises an HTTPError when the request failed, and otherwise returns the response converted to JSON.


In [21]:
from gobble.api import request_upload, handle

payload = {'bad_payload': 'I am bad'}
response = request_upload(params=payload)

handle(response)


[Gobble] 400 Client Error: BAD REQUEST for url: http://dev.openspending.org/datastore/?bad_payload=I+am+bad
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
<ipython-input-21-ed037533673e> in <module>()
      4 response = request_upload(params=payload)
      5 
----> 6 handle(response)

/home/loic/repos/gobble/gobble/api.py in handle(response)
    110     except HTTPError as error:
    111         log.error(error)
--> 112         raise error
    113 
    114     return to_json(response)

/home/loic/repos/gobble/gobble/api.py in handle(response)
    107     """
    108     try:
--> 109         response.raise_for_status()
    110     except HTTPError as error:
    111         log.error(error)

/home/loic/.virtualenvs/gobble/lib/python3.5/site-packages/requests/models.py in raise_for_status(self)
    842 
    843         if http_error_msg:
--> 844             raise HTTPError(http_error_msg, response=self)
    845 
    846     def close(self):

HTTPError: 400 Client Error: BAD REQUEST for url: http://dev.openspending.org/datastore/?bad_payload=I+am+bad

Snapshots of responses

You can also inspect the transaction in more detail through the snapshot attribute, which stores the last request. The SnapShot class is a subclass of collections.OrderedDict and has a json attribute for easy formatting.


In [22]:
print(request_upload.snapshot.json)


{"timestamp": "2016-08-03 00:01:17.114212", "url": "http://dev.openspending.org/datastore/?bad_payload=I+am+bad", "query": {"bad_payload": "I am bad"}, "request_json": null, "response_json": {}, "request_headers": null, "response_headers": {"Content-Type": "text/html; charset=utf-8", "Connection": "keep-alive", "Set-Cookie": "session=c182f82d-8868-4b3d-a64d-cfc36086865b; Expires=Fri, 02-Sep-2016 22:01:17 GMT; HttpOnly; Path=/", "Server": "nginx/1.11.2", "Date": "Tue, 02 Aug 2016 22:01:17 GMT", "Content-Length": "0"}, "cookies": {"session": "c182f82d-8868-4b3d-a64d-cfc36086865b"}}
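
The idea of an ordered dictionary with a json attribute can be sketched in a few lines. SnapShotSketch is a hypothetical stand-in that only illustrates the pattern described above. It is not Gobble's SnapShot class, which also records headers, cookies and more.

```python
from collections import OrderedDict
from json import dumps


class SnapShotSketch(OrderedDict):
    """Hypothetical stand-in: an ordered mapping with a JSON view."""

    @property
    def json(self):
        # Keys are serialised in insertion order.
        return dumps(self)


snapshot = SnapShotSketch()
snapshot['timestamp'] = '2016-08-03 00:01:17.114212'
snapshot['query'] = {'bad_payload': 'I am bad'}
snapshot['request_json'] = None
print(snapshot.json)
```

Subclassing OrderedDict keeps the fields in a predictable order, which makes successive snapshots easier to compare by eye.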

Create new endpoints

Endpoints that are not pre-defined can be created with the EndPoint class. Let's try to hit one that doesn't exist.


In [24]:
from gobble.api import EndPoint

leap_into_the_unknown = EndPoint('GET', 'I', 'do', 'not', 'exist')
leap_into_the_unknown()


Out[24]:
<Response [200]>

In [5]:
cd ~/.gobble


/home/loic/.gobble