GA4GH 1000 Genomes Biometadata Example

This example illustrates the methods used to access the bio_metadata_service, including sample and individual data.

The Biometadata protocol provides messages that let one determine the sample used to create some genomic data, such as a read group or call set. An individual can have multiple biosamples, which lets the GA4GH schema represent samples taken from multiple sites of the same individual.

Initialize client

In this step we create a client object which will be used to communicate with the server. It is initialized using the 1kg URL.


In [1]:
import ga4gh_client.client as client
c = client.HttpClient("http://1kgenomes.ga4gh.org")

Search individuals method

This method returns the individuals represented in a dataset. Note that we use the dataset_id obtained from the 1kg_metadata_service notebook. The loop below stops after six individuals to keep the output short.


In [2]:
counter = 0
for individual in c.search_individuals(dataset_id="WyIxa2dlbm9tZXMiXQ"):
    if counter > 5:
        break
    counter += 1
    print "Individual: {}".format(individual.name)
    print " id: {}".format(individual.id)
    print " dataset_id: {}".format(individual.dataset_id)
    print " description: {}".format(individual.description)
    print " species.term: {}".format(individual.species.term)
    print " species.id: {}".format(individual.species.id)
    print " sex.term: {}".format(individual.sex.term)
    print " sex.id: {}\n".format(individual.sex.id)


Individual: HG00096
 id: WyIxa2dlbm9tZXMiLCJpIiwiSEcwMDA5NiJd
 dataset_id: WyIxa2dlbm9tZXMiXQ
 description: GBRBritish in England and Scotlandmale
 species.term: Homo sapiens
 species.id: NCBITaxon:9606
 sex.term: male genotypic sex
 sex.id: PATO:0020001

Individual: HG00097
 id: WyIxa2dlbm9tZXMiLCJpIiwiSEcwMDA5NyJd
 dataset_id: WyIxa2dlbm9tZXMiXQ
 description: GBRBritish in England and Scotlandfemale
 species.term: Homo sapiens
 species.id: NCBITaxon:9606
 sex.term: female genotypic sex
 sex.id: PATO:0020002

Individual: HG00098
 id: WyIxa2dlbm9tZXMiLCJpIiwiSEcwMDA5OCJd
 dataset_id: WyIxa2dlbm9tZXMiXQ
 description: GBRBritish in England and Scotlandmale
 species.term: Homo sapiens
 species.id: NCBITaxon:9606
 sex.term: male genotypic sex
 sex.id: PATO:0020001

Individual: HG00099
 id: WyIxa2dlbm9tZXMiLCJpIiwiSEcwMDA5OSJd
 dataset_id: WyIxa2dlbm9tZXMiXQ
 description: GBRBritish in England and Scotlandfemale
 species.term: Homo sapiens
 species.id: NCBITaxon:9606
 sex.term: female genotypic sex
 sex.id: PATO:0020002

Individual: HG00100
 id: WyIxa2dlbm9tZXMiLCJpIiwiSEcwMDEwMCJd
 dataset_id: WyIxa2dlbm9tZXMiXQ
 description: GBRBritish in England and Scotlandfemale
 species.term: Homo sapiens
 species.id: NCBITaxon:9606
 sex.term: female genotypic sex
 sex.id: PATO:0020002

Individual: HG00101
 id: WyIxa2dlbm9tZXMiLCJpIiwiSEcwMDEwMSJd
 dataset_id: WyIxa2dlbm9tZXMiXQ
 description: GBRBritish in England and Scotlandmale
 species.term: Homo sapiens
 species.id: NCBITaxon:9606
 sex.term: male genotypic sex
 sex.id: PATO:0020001

Note: Only fields that are potentially required by other methods are shown. Accessing attributes in the info field is addressed below.
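The counter-and-break pattern used in the cell above can also be written with itertools.islice from the standard library, which stops consuming an iterator after a fixed number of items. The following is a minimal sketch: first_n is a hypothetical helper name (not part of ga4gh_client), and fake_search is a stand-in generator so the example runs without a server.

```python
import itertools


def first_n(records, n):
    """Return at most the first n items of any iterable as a list.

    Hypothetical helper, not part of ga4gh_client.
    """
    return list(itertools.islice(records, n))


def fake_search():
    # Stand-in for an iterable of search results such as
    # c.search_individuals(dataset_id=...); yields names only.
    for name in ["HG00096", "HG00097", "HG00098", "HG00099"]:
        yield name


# Only three items are pulled from the generator.
first_three = first_n(fake_search(), 3)
```

With a live client, the same helper could presumably be called as first_n(c.search_individuals(dataset_id="WyIxa2dlbm9tZXMiXQ"), 6), since the search result is consumed with an ordinary for loop above.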

Get Individual by id method

This method obtains a single individual by its unique identifier. This id was chosen arbitrarily from the returned results.

Although the GA4GH schema constrains the named fields used to represent an individual, the info field can carry other data that might be useful. We will get an individual message and then inspect some values in the info field.


In [3]:
single_individual = c.get_individual(individual_id="WyIxa2dlbm9tZXMiLCJpIiwiSEcwMDU5NCJd")
print "Individual: {}".format(single_individual.name)
print " info['Family ID']: {}".format(single_individual.info['Family ID'].values[0].string_value)


Individual: HG00594
 info['Family ID']: SH057

In this case, the Family ID can be exchanged through the protocol, although the named field is not present in the Individual message itself.
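Because info keys vary between records, it can help to wrap the lookup in a small function that falls back to a default when a key is absent or has no values. The sketch below assumes only the access pattern shown above (info[key].values[0].string_value). The info_string helper and the namedtuple stand-ins are hypothetical and exist so the example runs offline; a real info field may behave differently from a plain dict.

```python
import collections

# Stand-ins that mimic the shape used above:
#   record.info[key].values[0].string_value
Value = collections.namedtuple("Value", ["string_value"])
ListValue = collections.namedtuple("ListValue", ["values"])
Record = collections.namedtuple("Record", ["name", "info"])


def info_string(record, key, default=None):
    """Return the first string value stored under key in record.info.

    Falls back to default when the key is absent or its value list is empty.
    Hypothetical helper, not part of ga4gh_client.
    """
    info = record.info
    if key not in info:
        return default
    values = info[key].values
    if len(values) == 0:
        return default
    return values[0].string_value


# Stand-in record using the name and Family ID printed above.
example = Record(
    name="HG00594",
    info={"Family ID": ListValue(values=[Value("SH057")])},
)
family_id = info_string(example, "Family ID")
missing = info_string(example, "Not A Key", default="n/a")
```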

Search BioSamples method

We can list all of the biosamples available in a dataset, in much the same way as we did for individuals.


In [4]:
counter = 0
for biosample in c.search_bio_samples(dataset_id="WyIxa2dlbm9tZXMiXQ"):
    if counter > 5:
        break
    counter += 1
    print "BioSample: {}".format(biosample.name)
    print " id: {}".format(biosample.id)
    print " dataset_id: {}\n".format(biosample.dataset_id)


BioSample: HG00096
 id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDA5NiJd
 dataset_id: WyIxa2dlbm9tZXMiXQ

BioSample: HG00097
 id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDA5NyJd
 dataset_id: WyIxa2dlbm9tZXMiXQ

BioSample: HG00098
 id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDA5OCJd
 dataset_id: WyIxa2dlbm9tZXMiXQ

BioSample: HG00099
 id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDA5OSJd
 dataset_id: WyIxa2dlbm9tZXMiXQ

BioSample: HG00100
 id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDEwMCJd
 dataset_id: WyIxa2dlbm9tZXMiXQ

BioSample: HG00101
 id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDEwMSJd
 dataset_id: WyIxa2dlbm9tZXMiXQ

Only six biosamples are displayed here for illustration purposes, but the response returns all of the samples hosted in the provided dataset. Only the name, id, and dataset_id of each biosample are printed; the remaining fields are shown in the next section.
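As noted in the introduction, one individual can have several biosamples. Each biosample record carries an individual_id (it is printed in the next section), so the results of a search can be grouped by that field. The following is a rough sketch under that assumption: group_by_individual is a hypothetical helper, and the records are made-up stand-ins rather than server data.

```python
import collections

# Stand-in for a biosample record; only the two fields used here.
BioSample = collections.namedtuple("BioSample", ["name", "individual_id"])


def group_by_individual(bio_samples):
    """Map each individual_id to the names of the biosamples that reference it."""
    groups = collections.defaultdict(list)
    for sample in bio_samples:
        groups[sample.individual_id].append(sample.name)
    return dict(groups)


# Hypothetical records: two samples from one individual, one from another.
samples = [
    BioSample("sample-blood", "individual-A"),
    BioSample("sample-skin", "individual-A"),
    BioSample("sample-blood-2", "individual-B"),
]
groups = group_by_individual(samples)
```

In the output shown in this notebook every individual appears to have a single biosample, so each group would contain one name; the grouping only becomes interesting for datasets with multiple samples per individual.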

Get BioSample by id method

This method returns a single biosample given its identifier. The data returned is what is displayed below, with the exception of the JSON wrapper.


In [5]:
# `biosample` is the last record fetched by the loop in the previous cell.
single_bio_sample = c.get_bio_sample(bio_sample_id=biosample.id)
print "\nName: {}".format(single_bio_sample.name)
print " Id: {},".format(single_bio_sample.id)
print " Dataset Id: {},".format(single_bio_sample.dataset_id)
print " Description: {},".format(single_bio_sample.description)
print " Individual Id: {},".format(single_bio_sample.individual_id)
print " Disease: {},".format(single_bio_sample.disease)
print " Sample Created: {},".format(single_bio_sample.created)
print " Sample Updated: {}".format(single_bio_sample.updated)
# Print the first string value stored under every key of the info field.
for info in single_bio_sample.info:
    print " {}: \t{}".format(info, single_bio_sample.info[info].values[0].string_value)


Name: HG00102
 Id: WyIxa2dlbm9tZXMiLCJiIiwiSEcwMDEwMiJd,
 Dataset Id: WyIxa2dlbm9tZXMiXQ,
 Description: GBRBritish in England and Scotlandfemale,
 Individual Id: WyIxa2dlbm9tZXMiLCJpIiwiSEcwMDEwMiJd,
 Disease: ,
 Sample Created: 2016-10-07T10:59:00.916127,
 Sample Updated: 2016-10-07T10:59:00.916130
 Sample: 	HG00102
 Family ID: 	HG00102
 Relationship: 	
 Other Comments: 	
 Gender: 	female
 Population Description: 	British in England and Scotland
 Avuncular: 	
 Non Paternity: 	
 Unknown Second Order: 	
 Grandparents: 	
 Unexpected Parent/Child : 	
 Siblings: 	
 Half Siblings: 	
 Third Order: 	
 Population: 	GBR
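The Individual Id printed above links the biosample back to the individual it was taken from, so the two get methods used in this notebook can be chained. The sketch below assumes the keyword arguments shown earlier (bio_sample_id and individual_id). individual_for_bio_sample is a hypothetical helper, and FakeClient with its short ids is an offline stand-in, not the real ga4gh_client.

```python
import collections

# Stand-in records with only the fields needed for this example.
Individual = collections.namedtuple("Individual", ["id", "name"])
BioSample = collections.namedtuple("BioSample", ["id", "name", "individual_id"])


class FakeClient(object):
    """Offline stand-in exposing the two get calls used in this notebook."""

    def __init__(self, bio_samples, individuals):
        self._bio_samples = dict((b.id, b) for b in bio_samples)
        self._individuals = dict((i.id, i) for i in individuals)

    def get_bio_sample(self, bio_sample_id):
        return self._bio_samples[bio_sample_id]

    def get_individual(self, individual_id):
        return self._individuals[individual_id]


def individual_for_bio_sample(client, bio_sample_id):
    """Fetch a biosample, then the individual its individual_id points to."""
    bio_sample = client.get_bio_sample(bio_sample_id=bio_sample_id)
    individual = client.get_individual(individual_id=bio_sample.individual_id)
    return bio_sample, individual


# Hypothetical ids; real ids look like the encoded strings printed above.
fake = FakeClient(
    bio_samples=[BioSample("bio-1", "HG00102", "ind-1")],
    individuals=[Individual("ind-1", "HG00102")],
)
sample, person = individual_for_bio_sample(fake, "bio-1")
```

With the client created at the start of this notebook, the call would presumably be individual_for_bio_sample(c, single_bio_sample.id).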