DataPath Example 2

This notebook gives a very basic example of how to access data. It assumes that you understand the concepts presented in the example 1 notebook.



In [1]:

    
# Import deriva modules
from deriva.core import ErmrestCatalog, get_credential



In [2]:

    
# Connect with the deriva catalog
protocol = 'https'
hostname = 'www.facebase.org'
catalog_number = 1
credential = get_credential(hostname)
catalog = ErmrestCatalog(protocol, hostname, catalog_number, credential)



In [3]:

    
# Get the path builder interface for this catalog
pb = catalog.getPathBuilder()

DataPaths

The PathBuilder object lets you begin DataPaths from the base Tables. A DataPath begins with a Table (or a TableAlias, to be discussed later) as its "root", from which you can "link" to other tables, "filter" the results, and fetch the path's "entities".

Start a path rooted at a table from the catalog

We will reference a table through the PathBuilder variable pb created above. Using the PathBuilder, we reference the "isa" schema, then its "dataset" table, and start a path from that table.



In [4]:

    
path = pb.schemas['isa'].tables['dataset'].path

We could have used the more compact dot-notation to start the same path.



In [5]:

    
path = pb.isa.dataset.path

Getting the URI of the current path

Every DataPath has a URI that identifies the resources it references in ERMrest. Those resources are available through the "RESTful" web protocols that ERMrest supports.



In [6]:

    
print(path.uri)









    



https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset
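
The URI printed above follows a regular pattern: the server address, the catalog number, the entity interface, and then the table qualified by its schema and bound to an alias. As a rough illustration only, the same string can be reassembled from the connection variables used earlier. The helper build_entity_uri below is hypothetical and is not part of deriva; it also skips the URL-encoding that a real client would need for unusual names. In practice, path.uri is the value to rely on.

```python
def build_entity_uri(protocol, hostname, catalog_number, schema, table):
    """Hypothetical helper: mimic the shape of the URI printed above."""
    # The table is bound to an alias (here, the table name itself),
    # which appears in the URI as alias:=schema:table.
    return (f"{protocol}://{hostname}/ermrest/catalog/{catalog_number}"
            f"/entity/{table}:={schema}:{table}")


uri = build_entity_uri('https', 'www.facebase.org', 1, 'isa', 'dataset')
print(uri)
```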

ResultSets

The data from a DataPath are accessed through a Pythonic container object, the ResultSet. A ResultSet is returned by the DataPath's entities() method, among others.



In [7]:

    
results = path.entities()

Fetch entities from the catalog

Now we can get entities from the server using the ResultSet's fetch() method.



In [8]:

    
results.fetch()









    Out[8]:





<deriva.core.datapath.ResultSet at 0x10db10eb8>

ResultSets behave like Python containers. For example, we can check the number of rows in this ResultSet.



In [9]:

    
len(results)









    Out[9]:





854

Note: If we had not explicitly called the fetch() method, it would have been called implicitly on the first container operation, such as len(...), list(...), iter(...), or item access with [...].
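
The implicit fetch described in the note can be pictured with a small stand-in class. This is only a sketch of the general lazy-loading pattern, not deriva's actual ResultSet implementation: the class LazyResults, its fetch_count attribute, and the made-up rows are all assumptions introduced for illustration. The real client applies the limit on the server, whereas this sketch simply slices a local list.

```python
class LazyResults:
    """Stand-in container (not deriva's ResultSet) that fetches on first use."""

    def __init__(self, fetcher):
        self._fetcher = fetcher   # callable returning a list of dict rows
        self._rows = None         # None means "not fetched yet"
        self.fetch_count = 0      # counts how often the fetcher was called

    def fetch(self, limit=None):
        rows = self._fetcher()
        # Local slicing stands in for a server-side limit.
        self._rows = rows if limit is None else rows[:limit]
        self.fetch_count += 1
        return self               # returning self allows call chaining

    def _ensure_fetched(self):
        if self._rows is None:
            self.fetch()

    def __len__(self):
        self._ensure_fetched()
        return len(self._rows)

    def __getitem__(self, index):
        self._ensure_fetched()
        return self._rows[index]

    def __iter__(self):
        self._ensure_fetched()
        return iter(self._rows)


# Made-up rows standing in for data held by a server
lazy = LazyResults(lambda: [{'accession': 'A'}, {'accession': 'B'}])
count_before = lazy.fetch_count   # nothing has been fetched yet
n = len(lazy)                     # first container operation triggers the fetch
first = lazy[0]                   # later operations reuse the fetched rows
print(count_before, n, first, lazy.fetch_count)
```

An explicit fetch(limit=...) call on the stand-in replaces the cached rows, which mirrors how a later explicit fetch changes what len(...) reports.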

Get an entity

To get one entity from the set, use the usual container operator to get an item.



In [10]:

    
results[9]









    Out[10]:





{'id': 14244,
 'accession': 'FB00001000',
 'title': 'FB0036_21mo male with midline cleft lip, spheno-ethmoidal meningocele, micropthalmia_Candidate Gene: KLF8',
 'project': 309,
 'funding': None,
 'summary': None,
 'description': 'The purpose of this study is to collect, process, and study samples from individuals with known or possible genetic disease, and their family members. The study’s broad goals are to better understand the genetic causes of disease in order to improve the ability to diagnose, treat, and even prevent illness. Our goal is to obtain a genetic diagnosis for health problem(s) the proband has, so the information can be used, when appropriate, to guide medical decisions made by the affected individuals doctor.\n\n **This is restricted-access human data.**  To gain access to this data, you must first go through the [process outlined here](/odocs/data-guidelines/).\n\nThis case was brought to the attention of FaceBase from Dr. Joan Stoler of Boston Children’s Hospital.\n\nPhenotype:\n- midline cleft lip\n- spheno-ethmoidal meningocele\n- micropthalmia\n\n\n \n',
 'mouse_genetic': None,
 'human_anatomic': None,
 'study_design': '1. Interesting cases are seen by the clinicians on our protocol and they are presented at a monthly meeting.\n2. The cases are looked at based on the solvability of the case, if we can obtain the correct family members for sequencing and if the family is willing to participate.\n3. Samples are obtained, usually for WES first, but sometimes WES has not led to an answer, so WGS is done.\n4. The data is sent to Brigham and Women’s Hospital analysis program, Brigham Genomic Medicine, where computational biologists look at the sequences and find the variant(s) that explains the phenotype seen in the proband.\n5. Functional analysis is done to mimic the phenotype in a mouse, which confirms what was seen with the variant.\n6. Findings are shared with the clinicians, so they may share with their patient.\n\n',
 'release_date': '2019-01-29',
 'show_in_jbrowse': None,
 '_keywords': None,
 'RID': '1-415C',
 'RCB': 'https://auth.globus.org/de244c2a-618a-4f51-9497-4910a200e99a',
 'RMB': 'https://www.facebase.org/webauthn_robot/fb_cron',
 'RCT': '2018-10-01T12:13:32.627187-07:00',
 'RMT': '2019-09-17T19:00:18.380718-07:00',
 'released': True,
 'Requires_DOI?': True,
 'DOI': '10.25550/1-415C'}

Get a specific attribute value from an entity

To get one attribute value from an entity, index the entity with the name property of the corresponding Column.



In [11]:

    
dataset = pb.schemas['isa'].tables['dataset']
print(results[9][dataset.accession.name])

FB00001000

Fetch a Limited Number of Results

To cap the number of results fetched from the catalog, call fetch(limit=...) explicitly with the desired upper limit.



In [12]:

    
results.fetch(limit=3)
len(results)









    Out[12]:





3

Iterate over the ResultSet

ResultSets are iterable like a typical container.



In [13]:

    
for entity in results:
    print(entity[dataset.accession.name])









    



FB00000329.02
FB00000982
FB00000957
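
Because each entity behaves like a dictionary, the usual Python idioms for iterables apply. The sketch below uses a plain list of dicts, with values copied from the outputs in this notebook, as a stand-in for a fetched ResultSet; the variable names sample_results, accessions, and by_accession are illustrative assumptions.

```python
# Sample rows copied from this notebook's outputs; a fetched ResultSet
# would supply entities like these.
sample_results = [
    {'id': 10655, 'accession': 'FB00000329.02'},
    {'id': 14226, 'accession': 'FB00000982'},
    {'id': 14201, 'accession': 'FB00000957'},
]

# Collect one attribute from every entity
accessions = [entity['accession'] for entity in sample_results]

# Index the entities by accession for quick lookup
by_accession = {entity['accession']: entity for entity in sample_results}

print(accessions)
print(by_accession['FB00000982']['id'])
```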

Convert to Pandas DataFrame

ResultSets can be transformed into the popular Pandas DataFrame.



In [14]:

    
from pandas import DataFrame
DataFrame(results)









    Out[14]:







  
    
      
	id	accession	title	project	funding	summary	description	mouse_genetic	human_anatomic	study_design	...	show_in_jbrowse	_keywords	RID	RCB	RMB	RCT	RMT	released	Requires_DOI?	DOI
0	10655	FB00000329.02	microCT - Bone Tissue of C57BL6J mouse at P0	156	This study was supported by grants from the Na...	Mouse ID: JI221; This dataset includes 1 micro...	Mouse ID: JI221\n This dataset includes 1 micr...	None	None	None	...	None	Research on Functional Genomics, Image Analysi...	V1J	None	https://www.facebase.org/webauthn_robot/fb_cron	2017-09-22T17:33:18.797126-07:00	2019-09-17T19:11:31.882241-07:00	True	True	10.25550/V1J
1	14226	FB00000982	FB0064_Male with Congenital craniosynostosis_C...	309	None	None	The purpose of this study is to collect, proce...	None	None	1.\tInteresting cases are seen by the clinicia...	...	None	None	1-3SW2	https://auth.globus.org/de244c2a-618a-4f51-949...	https://www.facebase.org/webauthn_robot/fb_cron	2018-06-12T12:09:23.852494-07:00	2019-09-17T19:00:05.25666-07:00	True	True	10.25550/1-3SW2
2	14201	FB00000957	FB0123_10mo girl with brain stem compression, ...	309	None	None	The purpose of this study is to collect, proce...	None	None	1. Interesting cases are seen by the clinician...	...	None	Rapid Identification and Validation of Human C...	2BAP	https://auth.globus.org/de244c2a-618a-4f51-949...	https://www.facebase.org/webauthn_robot/fb_cron	2018-02-27T14:34:27.418699-08:00	2019-09-17T19:00:06.837819-07:00	True	True	10.25550/2BAP
    
  

3 rows × 21 columns

Selecting Attributes

It is also possible to fetch only a subset of attributes from the catalog. The attributes(...) method accepts a variable argument list followed by keyword arguments. Each argument must be a Column object from the table's columns container.

Renaming selected attributes

To rename a selected attribute, use the alias(...) method on the column object. For example, attributes(table.column.alias('new_name')) will return table.column under the name new_name in the entities returned from the server. (It will not change anything in the stored catalog data.)



In [15]:

    
results = path.attributes(dataset.accession, dataset.title, dataset.released.alias('is_released')).fetch(limit=5)

Convert to list

Now we can look at the results of the fetch above. To demonstrate a different access mode, we convert the entities to a standard Python list and dump it to the console.



In [16]:

    
list(results)









    Out[16]:





[{'accession': 'FB00000329.02',
  'title': 'microCT - Bone Tissue of C57BL6J mouse at P0',
  'is_released': True},
 {'accession': 'FB00000982',
  'title': 'FB0064_Male with Congenital craniosynostosis_Candidate Gene: FREM2',
  'is_released': True},
 {'accession': 'FB00000957',
  'title': 'FB0123_10mo girl with brain stem compression, multiple skeletal anomalies, cranioal bone anomalies, clinodactyly, cleft palate, unusual skull configuration (only 3 bones), stenosis of ear canal',
  'is_released': True},
 {'accession': 'FB00000953',
  'title': 'FB0109_Male with Robin sequence, cleft palate, midface hypoplasia, round almond-shaped eyes, negative CMA, positive family Hx_Candidate Gene: HOXB2',
  'is_released': True},
 {'accession': 'FB00000319',
  'title': 'microCT - Soft Tissue of Tgfbr2fl/+ Control mouse at P0',
  'is_released': True}]
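
The effect of selecting and renaming attributes can be reproduced locally on plain dictionaries, which may help clarify what the returned entities look like. The helper project below is hypothetical and is not part of deriva; the real attributes(...) call asks the server to do this work, whereas this sketch operates on rows that are already in memory.

```python
def project(rows, *names, **renamed):
    """Hypothetical helper: keep the columns in `names` as they are and
    expose the `renamed` columns under new keys (new_key='original_key')."""
    projected = []
    for row in rows:
        out = {name: row[name] for name in names}
        out.update({new: row[old] for new, old in renamed.items()})
        projected.append(out)
    return projected


# One row with values copied from this notebook's outputs
rows = [{'id': 10655,
         'accession': 'FB00000329.02',
         'title': 'microCT - Bone Tissue of C57BL6J mouse at P0',
         'released': True}]

selected = project(rows, 'accession', 'title', is_released='released')
print(selected)
print('released' in rows[0])  # the source rows keep their original key
```

As with alias(...), only the returned copies carry the new name; the source data are left unchanged.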


