DataPath :: Data Update Example

This notebook demonstrates how to perform simple data manipulations.



In [1]:

    
from deriva.core import ErmrestCatalog, get_credential

This example uses a development server with a throwaway catalog. You will not have sufficient permissions to run it; this notebook is for documentation purposes only.



In [2]:

    
scheme = 'https'
hostname = 'dev.facebase.org'
catalog_number = 1

Use DERIVA-Auth to get a credential, or use None if your catalog allows anonymous access.



In [3]:

    
credential = get_credential(hostname)

Now connect to your catalog and get the path builder interface for it.



In [4]:

    
assert scheme == 'http' or scheme == 'https', "Invalid http scheme used."
assert isinstance(hostname, str), "Hostname not set."
assert isinstance(catalog_number, int), "Invalid catalog number"
catalog = ErmrestCatalog(scheme, hostname, catalog_number, credential)
pb = catalog.getPathBuilder()
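The `assert` checks above are skipped when Python runs with the `-O` flag. As a minimal sketch, the same checks can be written with explicit exceptions. The helper name `validate_connection_settings` is hypothetical and is not part of deriva-py.

```python
def validate_connection_settings(scheme, hostname, catalog_number):
    """Hypothetical helper: return the settings unchanged, or raise ValueError."""
    if scheme not in ('http', 'https'):
        raise ValueError(f"Invalid http scheme used: {scheme!r}")
    if not isinstance(hostname, str) or not hostname:
        raise ValueError("Hostname not set.")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(catalog_number, bool) or not isinstance(catalog_number, int):
        raise ValueError("Invalid catalog number")
    return scheme, hostname, catalog_number


# Same values as the notebook cells above.
settings = validate_connection_settings('https', 'dev.facebase.org', 1)
print(settings)
```

The returned tuple could then be unpacked into the `ErmrestCatalog(...)` call shown above, followed by the credential.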

For this example, we will create or modify entities of the "Dataset" table of a catalog that uses the FaceBase data model.



In [5]:

    
dataset = pb.isa.dataset
dataset









    Out[5]:





Table name: 'dataset'
List of columns:
  Column name: 'id'	Type: serial4	Comment: 'None'
  Column name: 'accession'	Type: text	Comment: 'None'
  Column name: 'title'	Type: text	Comment: 'None'
  Column name: 'project'	Type: int8	Comment: 'None'
  Column name: 'funding'	Type: text	Comment: 'None'
  Column name: 'summary'	Type: text	Comment: 'None'
  Column name: 'description'	Type: markdown	Comment: 'None'
  Column name: 'view_gene_summary'	Type: text	Comment: 'None'
  Column name: 'mouse_genetic'	Type: text	Comment: 'None'
  Column name: 'human_anatomic'	Type: text	Comment: 'None'
  Column name: 'study_design'	Type: markdown	Comment: 'None'
  Column name: 'release_date'	Type: date	Comment: 'None'
  Column name: 'status'	Type: text	Comment: 'None'
  Column name: 'gene_summary'	Type: int4	Comment: 'None'
  Column name: 'thumbnail'	Type: int4	Comment: 'None'
  Column name: 'show_in_jbrowse'	Type: boolean	Comment: 'None'
  Column name: '_keywords'	Type: text	Comment: 'None'
  Column name: 'RID'	Type: ermrest_rid	Comment: 'System-generated unique row ID.'
  Column name: 'RCB'	Type: ermrest_rcb	Comment: 'System-generated row created by user provenance.'
  Column name: 'RMB'	Type: ermrest_rmb	Comment: 'System-generated row modified by user provenance.'
  Column name: 'RCT'	Type: ermrest_rct	Comment: 'System-generated row creation timestamp.'
  Column name: 'RMT'	Type: ermrest_rmt	Comment: 'System-generated row modification timestamp'

Insert example

Here we will insert an entity into the dataset table.



In [6]:

    
new_entity = {
    'title': 'A test dataset by derivapy', 
    'description': 'This was created by the deriva-py API.',
    'project': 311
}
entities = dataset.insert([new_entity], defaults={'id', 'accession'})

The insert operation returns the inserted entities, which now have any system-generated attributes filled in.



In [7]:

    
list(entities)









    Out[7]:





[{'RCB': 'https://auth.globus.org/bb256144-d274-11e5-adb1-13a4cc43acbd',
  'RCT': '2018-05-25T14:16:20.951563-07:00',
  'RID': 572108,
  'RMB': 'https://auth.globus.org/bb256144-d274-11e5-adb1-13a4cc43acbd',
  'RMT': '2018-05-25T14:16:20.951563-07:00',
  '_keywords': None,
  'accession': 'FB00000974',
  'description': 'This was created by the deriva-py API.',
  'funding': None,
  'gene_summary': None,
  'human_anatomic': None,
  'id': 14219,
  'mouse_genetic': None,
  'project': 311,
  'release_date': None,
  'show_in_jbrowse': None,
  'status': None,
  'study_design': None,
  'summary': None,
  'thumbnail': None,
  'title': 'A test dataset by derivapy',
  'view_gene_summary': None}]
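To make the system-generated attributes easier to spot, the following sketch splits a returned entity into the five system columns listed in the table description ('RID', 'RCB', 'RMB', 'RCT', 'RMT') and everything else. It runs on a plain dict trimmed from the output above, so it needs no server. The helper name `split_system_columns` is hypothetical.

```python
# System columns as listed in the table description above.
SYSTEM_COLUMNS = {'RID', 'RCB', 'RMB', 'RCT', 'RMT'}


def split_system_columns(entity):
    """Hypothetical helper: return (system_part, user_part) of an entity dict."""
    system = {k: v for k, v in entity.items() if k in SYSTEM_COLUMNS}
    user = {k: v for k, v in entity.items() if k not in SYSTEM_COLUMNS}
    return system, user


# Trimmed copy of the entity returned by the insert (None-valued columns omitted).
inserted = {
    'RCB': 'https://auth.globus.org/bb256144-d274-11e5-adb1-13a4cc43acbd',
    'RCT': '2018-05-25T14:16:20.951563-07:00',
    'RID': 572108,
    'RMB': 'https://auth.globus.org/bb256144-d274-11e5-adb1-13a4cc43acbd',
    'RMT': '2018-05-25T14:16:20.951563-07:00',
    'accession': 'FB00000974',
    'description': 'This was created by the deriva-py API.',
    'id': 14219,
    'project': 311,
    'title': 'A test dataset by derivapy',
}

system, user = split_system_columns(inserted)
print(sorted(system))
print(sorted(user))
```

Note that 'id' and 'accession' land in the user part even though the server filled them in, because they were requested as defaults in the insert call rather than being ERMrest system columns.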

Update example

Here we will change the description for the entity we inserted and update it in the catalog.



In [8]:

    
entities[0]['description'] = 'A test dataset that was updated by derivapy'



In [9]:

    
updated_entities = dataset.update(entities)

Similar to the insert operation, the update operation also returns the updated entities. Notice that the system-managed 'RMT' (Row Modified Timestamp) attribute has been updated too.



In [10]:

    
list(updated_entities)









    Out[10]:





[{'RCB': 'https://auth.globus.org/bb256144-d274-11e5-adb1-13a4cc43acbd',
  'RCT': '2018-05-25T14:16:20.951563-07:00',
  'RID': 572108,
  'RMB': 'https://auth.globus.org/bb256144-d274-11e5-adb1-13a4cc43acbd',
  'RMT': '2018-05-25T14:16:25.63306-07:00',
  '_keywords': None,
  'accession': 'FB00000974',
  'description': 'A test dataset that was updated by derivapy',
  'funding': None,
  'gene_summary': None,
  'human_anatomic': None,
  'id': 14219,
  'mouse_genetic': None,
  'project': 311,
  'release_date': None,
  'show_in_jbrowse': None,
  'status': None,
  'study_design': None,
  'summary': None,
  'thumbnail': None,
  'title': 'A test dataset by derivapy',
  'view_gene_summary': None}]
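One way to confirm which attributes changed is to compare the entity before and after the update. The sketch below does this on plain dicts trimmed from the two outputs above; the helper name `changed_columns` is hypothetical. In a live session you would need to copy the entity before modifying it, since the notebook edits `entities[0]` in place.

```python
def changed_columns(before, after):
    """Hypothetical helper: map each changed key to its (old, new) values."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


# Trimmed copies of the insert output and the update output above.
before = {
    'RID': 572108,
    'RCT': '2018-05-25T14:16:20.951563-07:00',
    'RMT': '2018-05-25T14:16:20.951563-07:00',
    'description': 'This was created by the deriva-py API.',
    'title': 'A test dataset by derivapy',
}
after = {
    'RID': 572108,
    'RCT': '2018-05-25T14:16:20.951563-07:00',
    'RMT': '2018-05-25T14:16:25.63306-07:00',
    'description': 'A test dataset that was updated by derivapy',
    'title': 'A test dataset by derivapy',
}

changes = changed_columns(before, after)
for column, (old, new) in changes.items():
    print(f"{column}: {old!r} -> {new!r}")
```

Only 'RMT' and 'description' differ; the creation timestamp 'RCT' is unchanged.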

Update with custom correlation and targets specified

You can also specify which columns to use to correlate the input with the existing rows in the table, and which columns are the targets of the update. Per the ERMrest protocol, extra data in the update payload (entities) will be ignored. The inputs must be iterables of strings or of objects that implement the __str__ method.



In [11]:

    
entities[0]['description'] = 'Yet another update using derivapy'
entities[0]['title'] = 'And a title change'
updated_entities = dataset.update(entities, [dataset.id], [dataset.description, 'title'])



In [12]:

    
list(updated_entities)









    Out[12]:





[{'description': 'Yet another update using derivapy',
  'id': 14219,
  'title': 'And a title change'}]
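The correlation and target semantics can be illustrated with a purely local simulation on plain dicts. This is only a sketch of the behavior described above (match rows on the correlation columns, write only the target columns, ignore everything else, return only the correlation and target columns). It is not how ERMrest or deriva-py implement the update, and the function name `simulate_update` is hypothetical.

```python
def simulate_update(table_rows, payload, correlation, targets):
    """Local, in-memory illustration of a correlated update (not the real API)."""
    # Inputs may be strings or objects implementing __str__, as noted above.
    correlation = [str(c) for c in correlation]
    targets = [str(t) for t in targets]
    updated = []
    for item in payload:
        key = tuple(item[c] for c in correlation)
        for row in table_rows:
            if tuple(row[c] for c in correlation) == key:
                for t in targets:
                    row[t] = item[t]  # only target columns are written
                updated.append({c: row[c] for c in correlation + targets})
    return updated


# One existing row, and a payload that also carries an extra 'project' value.
table = [{'id': 14219, 'project': 311,
          'title': 'A test dataset by derivapy',
          'description': 'A test dataset that was updated by derivapy'}]
payload = [{'id': 14219, 'project': 999,
            'title': 'And a title change',
            'description': 'Yet another update using derivapy'}]

result = simulate_update(table, payload, ['id'], ['description', 'title'])
print(result)
print(table[0]['project'])  # the extra 'project' value in the payload was ignored
```

As in the notebook output, the result contains only 'id', 'description', and 'title'.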

Delete example

Unlike insert and update, which are performed within the context of a table, the delete operation is performed within the context of a data path.

We know the RID from above, which is a single-column key for the entities in the dataset (and any other ERMrest) table. We can use this attribute to form a path to the newly inserted and updated entity.

Note: Any filters could be used in this example; we do not have to use a key column only. We use the key only because we want to delete the specific entity that we just created. If we wanted to, we could link additional tables and apply additional filters to delete entities computed from a complex path.



In [13]:

    
path = dataset.filter(dataset.RID == entities[0]['RID'])

On successful delete, no content will be returned.



In [14]:

    
path.delete()
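Because a delete applies to every entity the path selects, it can be worth checking what the path matches before calling `delete()`. The following is a minimal sketch under two assumptions: the path object exposes an `entities()` method whose result can be iterated (an assumption about the datapath API; it is not shown in this notebook), and the `delete()` method used above. The helper `delete_exactly_one` and the `StubPath` class are hypothetical; the stub exists only so the sketch runs without a server.

```python
def delete_exactly_one(path):
    """Hypothetical guard: delete only if the path selects exactly one entity."""
    rows = list(path.entities())  # assumes entities() yields the matching rows
    if len(rows) != 1:
        raise ValueError(
            f"Expected exactly 1 matching entity, found {len(rows)}; nothing deleted.")
    path.delete()
    return rows[0]


class StubPath:
    """Hypothetical stand-in for a data path, used here instead of a live catalog."""

    def __init__(self, rows):
        self.rows = rows
        self.deleted = False

    def entities(self):
        return iter(self.rows)

    def delete(self):
        self.deleted = True


stub = StubPath([{'RID': 572108}])
removed = delete_exactly_one(stub)
print(removed, stub.deleted)
```

With a real catalog, the `path` built from the RID filter above would take the place of the stub.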