This example illustrates the methods used to access the bio_metadata_service, including sample and individual data.
The Biometadata protocol defines messages that let one determine the sample used to create some genomic data, such as a read group or call set. An individual can have multiple biosamples, which allows the GA4GH schema to represent samples taken from multiple sites of the same individual.
In [1]:
from ga4gh.client import client
c = client.HttpClient("http://1kgenomes.ga4gh.org")
In [2]:
# Obtain the dataset id (see the `1kg_metadata_service` example)
dataset = c.search_datasets().next()
In [3]:
counter = 0
for individual in c.search_individuals(dataset_id=dataset.id):
    # Stop after the first six individuals
    if counter > 5:
        break
    counter += 1
    print "Individual: {}".format(individual.name)
    print " id: {}".format(individual.id)
    print " dataset_id: {}".format(individual.dataset_id)
    print " description: {}".format(individual.description)
    print " species.term: {}".format(individual.species.term)
    print " species.id: {}".format(individual.species.id)
    print " sex.term: {}".format(individual.sex.term)
    print " sex.id: {}\n".format(individual.sex.id)
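The counter-based loop above stops after six results. As an alternative sketch, `itertools.islice` can cap any iterator without a manual counter. The `first_n` helper and the `fake_search` generator below are hypothetical names introduced only for illustration; with the real client, the iterable would be the result of `c.search_individuals(dataset_id=dataset.id)`.

```python
from itertools import islice


def first_n(iterable, n):
    """Return at most the first n items of any iterable as a list."""
    return list(islice(iterable, n))


def fake_search():
    # Stand-in for a client search call (assumption: it yields results lazily).
    for i in range(100):
        yield "individual-{}".format(i)


sample = first_n(fake_search(), 6)
for name in sample:
    print("Individual: {}".format(name))
```

Because `islice` stops pulling from the iterator once the limit is reached, no results beyond the first six are consumed.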
Although the GA4GH schema constrains the named fields used to represent an individual, the info field can carry other data that might be useful. We will get an individual message and then inspect some values in the info field.
In [4]:
single_individual = c.get_individual(individual_id=individual.id)
print "Individual: {}".format(single_individual.name)
print " info['Family ID']: {}".format(single_individual.info['Family ID'].values[0].string_value)
In this case, the Family ID can be exchanged through the protocol, although the named field is not present in the Individual message itself.
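Since keys in the info field are not guaranteed by the schema, it may help to guard the lookup. The following is a minimal sketch, not part of the GA4GH client: `info_string` is a hypothetical helper, and `Value` and `ListValue` are stand-in types that only mimic the `info[key].values[0].string_value` shape used above. The sample data is made up.

```python
from collections import namedtuple

# Hypothetical stand-ins mimicking the shape info[key].values[0].string_value
Value = namedtuple("Value", ["string_value"])
ListValue = namedtuple("ListValue", ["values"])


def info_string(info, key, default=None):
    """Return the first string value stored under key, or default if absent or empty."""
    if key not in info:
        return default
    values = info[key].values
    if len(values) == 0:
        return default
    return values[0].string_value


# Made-up example data, for illustration only
info = {
    "Family ID": ListValue(values=[Value("FAM01")]),
    "Empty": ListValue(values=[]),
}

print(info_string(info, "Family ID"))
print(info_string(info, "Population", default="unknown"))
```

With a real message, the call would look like `info_string(single_individual.info, 'Family ID')`, assuming the info field supports membership tests and indexing as shown in the cell above.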
In [6]:
counter = 0
for biosample in c.search_biosamples(dataset_id=dataset.id):
    # Stop after the first six biosamples
    if counter > 5:
        break
    counter += 1
    print "BioSample: {}".format(biosample.name)
    print " id: {}".format(biosample.id)
    print " dataset_id: {}\n".format(biosample.dataset_id)
In [8]:
# Fetch a single biosample by id (here, the last one seen in the loop above)
single_biosample = c.get_biosample(biosample.id)
print "\nName: {}".format(single_biosample.name)
print " Id: {},".format(single_biosample.id)
print " Dataset Id: {},".format(single_biosample.dataset_id)
print " Description: {},".format(single_biosample.description)
print " Individual Id: {},".format(single_biosample.individual_id)
print " Disease: {},".format(single_biosample.disease)
print " Sample Created: {},".format(single_biosample.created)
print " Sample Updated: {}".format(single_biosample.updated)
# Print the first value of every key in the info field
for info in single_biosample.info:
    print " {}: \t{}".format(info, single_biosample.info[info].values[0].string_value)
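Each biosample carries an `individual_id`, which is how several samples can be tied back to the same individual. The sketch below groups biosample names by that field. It assumes nothing beyond the `name` and `individual_id` attributes shown above; `BioSample` is a stand-in type, `group_by_individual` is a hypothetical helper, and the records are made up for illustration.

```python
from collections import defaultdict, namedtuple

# Stand-in for the BioSample message; only the fields used here are modeled
BioSample = namedtuple("BioSample", ["id", "name", "individual_id"])


def group_by_individual(biosamples):
    """Map each individual_id to the list of biosample names that reference it."""
    groups = defaultdict(list)
    for sample in biosamples:
        groups[sample.individual_id].append(sample.name)
    return dict(groups)


# Made-up records: two samples from one individual, one from another
records = [
    BioSample(id="b1", name="blood", individual_id="ind-A"),
    BioSample(id="b2", name="saliva", individual_id="ind-A"),
    BioSample(id="b3", name="blood", individual_id="ind-B"),
]

grouped = group_by_individual(records)
for individual_id in sorted(grouped):
    print("{}: {}".format(individual_id, ", ".join(grouped[individual_id])))
```

With the real client, the input would be the iterator returned by `c.search_biosamples(dataset_id=dataset.id)`.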
https://ga4gh-schemas.readthedocs.io/en/latest/schemas/bio_metadata_service.proto.html