GA4GH 1000 Genomes Sequence Annotation Example

This example illustrates how to access the sequence annotations for a given set of ....

Initialize Client

In this step we create a client object that will be used to communicate with the server. It is initialized with the server's URL.


In [1]:
import ga4gh_client.client as client
c = client.HttpClient("http://1kgenomes.ga4gh.org")

Search featuresets method

This request lists the feature sets that belong to a dataset. The dataset_id used here was obtained in the 1kg_metadata_service notebook.


In [2]:
for feature_sets in c.search_feature_sets(dataset_id="WyIxa2dlbm9tZXMiXQ"):
    print feature_sets


id: "WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyJd"
dataset_id: "WyIxa2dlbm9tZXMiXQ"
reference_set_id: "WyJOQ0JJMzciXQ"
name: "gencode_v24lift37"

id: "WyIxa2dlbm9tZXMiLCJjZ2QiXQ"
dataset_id: "WyIxa2dlbm9tZXMiXQ"
reference_set_id: "WyJOQ0JJMzciXQ"
name: "cgd"

Get featureset by id request

Knowing the id of a feature set, we can also retrieve that set with a get request. In the following request we simply set feature_set_id to feature_sets.id, which after the loop above holds the id of the last feature set returned by the previous search request.


In [3]:
feature_set = c.get_feature_set(feature_set_id=feature_sets.id)
print feature_set


id: "WyIxa2dlbm9tZXMiLCJjZ2QiXQ"
dataset_id: "WyIxa2dlbm9tZXMiXQ"
reference_set_id: "WyJOQ0JJMzciXQ"
name: "cgd"
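
The ids printed above look like URL-safe base64 strings with the trailing padding removed. As a minimal sketch, assuming that encoding (it is inferred from the values shown here, not taken from the service documentation), they can be decoded for inspection. The helper name decode_compound_id is hypothetical, and client code should still treat ids as opaque strings.

```python
import base64
import json


def decode_compound_id(encoded_id):
    """Decode an id, assuming unpadded URL-safe base64 wrapping a JSON list."""
    # Restore the '=' padding that base64 decoding requires.
    padded = encoded_id + "=" * (-len(encoded_id) % 4)
    raw = base64.urlsafe_b64decode(padded.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


# Ids copied from the feature set output above.
dataset_parts = decode_compound_id("WyIxa2dlbm9tZXMiXQ")
feature_set_parts = decode_compound_id("WyIxa2dlbm9tZXMiLCJjZ2QiXQ")
reference_set_parts = decode_compound_id("WyJOQ0JJMzciXQ")
```

Under that assumption the three ids decode to ["1kgenomes"], ["1kgenomes", "cgd"] and ["NCBI37"], which is consistent with the name field "cgd" in the output.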

Search features method

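This request returns the features contained in a feature set. The cell below prints selected fields of the first six features of the feature set retrieved in the previous step.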


In [4]:
counter = 0
for features in c.search_features(feature_set_id=feature_set.id):
    # Stop once six features have been printed.
    if counter > 5:
        break
    counter += 1
    print "Id: {},".format(features.id)
    print " Name: {},".format(features.name)
    print " Gene Symbol: {},".format(features.gene_symbol)
    print " Parent Id: {},".format(features.parent_id)
    # child_ids may be empty; print one line per child when present.
    if features.child_ids:
        for i in features.child_ids:
            print " Child Ids: {}".format(i)
    print " Feature Set Id: {},".format(features.feature_set_id)
    print " Reference Name: {},".format(features.reference_name)
    print " Start: {},\tEnd: {},".format(features.start, features.end)
    print " Strand: {},".format(features.strand)
    print "  Feature Type Id: {},".format(features.feature_type.id)
    print "  Feature Type Term: {},".format(features.feature_type.term)
    print "  Feature Type Source Name: {},".format(features.feature_type.source_name)
    print "  Feature Type Source Version: {}\n".format(features.feature_type.source_version)


Id: WyIxa2dlbm9tZXMiLCJjZ2QiLCJodHRwOi8vb2hzdS5lZHUvY2dkLzFmY2IzNmYxIl0,
 Name: EGFR G719C missense mutation,
 Gene Symbol: EGFR G719C missense mutation,
 Parent Id: ,
 Feature Set Id: ,
 Reference Name: ,
 Start: 0,	End: 0,
 Strand: 0,
  Feature Type Id: http://purl.obolibrary.org/obo/SO_0001059,
  Feature Type Term: sequence_alteration,
  Feature Type Source Name: ,
  Feature Type Source Version: 

Id: WyIxa2dlbm9tZXMiLCJjZ2QiLCJodHRwOi8vb2hzdS5lZHUvY2dkL2U2YjIxZWEwIl0,
 Name: MAP2K2 V35M, L46F, N126D, C125S missense mutation,
 Gene Symbol: MAP2K2 V35M, L46F, N126D, C125S missense mutation,
 Parent Id: ,
 Feature Set Id: ,
 Reference Name: ,
 Start: 0,	End: 0,
 Strand: 0,
  Feature Type Id: http://purl.obolibrary.org/obo/SO_0001059,
  Feature Type Term: sequence_alteration,
  Feature Type Source Name: ,
  Feature Type Source Version: 

Id: WyIxa2dlbm9tZXMiLCJjZ2QiLCJodHRwOi8vb2hzdS5lZHUvY2dkL2UzZTI2MzFhIl0,
 Name: ALK  1151Tins insertion mutation,
 Gene Symbol: ALK  1151Tins insertion mutation,
 Parent Id: ,
 Feature Set Id: ,
 Reference Name: ,
 Start: 0,	End: 0,
 Strand: 0,
  Feature Type Id: http://purl.obolibrary.org/obo/SO_0001059,
  Feature Type Term: sequence_alteration,
  Feature Type Source Name: ,
  Feature Type Source Version: 

Id: WyIxa2dlbm9tZXMiLCJjZ2QiLCJodHRwOi8vd3d3LmRydWdiYW5rLmNhL2RydWdzL0RCMDA1MTUiXQ,
 Name: cisplatin,
 Gene Symbol: cisplatin,
 Parent Id: ,
 Feature Set Id: ,
 Reference Name: ,
 Start: 0,	End: 0,
 Strand: 0,
  Feature Type Id: ,
  Feature Type Term: ,
  Feature Type Source Name: ,
  Feature Type Source Version: 

Id: WyIxa2dlbm9tZXMiLCJjZ2QiLCJodHRwOi8vY2FuY2VyLnNhbmdlci5hYy51ay9jb3NtaWMvbXV0YXRpb24vb3ZlcnZpZXc_aWQ9NjI0MSJd,
 Name: EGFR S768I missense mutation,
 Gene Symbol: COSM6241,
 Parent Id: ,
 Feature Set Id: ,
 Reference Name: chr7,
 Start: 55249005,	End: 55249006,
 Strand: 0,
  Feature Type Id: http://purl.obolibrary.org/obo/SO_0001059,
  Feature Type Term: sequence_alteration,
  Feature Type Source Name: ,
  Feature Type Source Version: 

Id: WyIxa2dlbm9tZXMiLCJjZ2QiLCJodHRwOi8vY2FuY2VyLnNhbmdlci5hYy51ay9jb3NtaWMvbXV0YXRpb24vb3ZlcnZpZXc_aWQ9NjI0MCJd,
 Name: EGFR T790M missense mutation,
 Gene Symbol: COSM6240,
 Parent Id: ,
 Feature Set Id: ,
 Reference Name: chr7,
 Start: 55249071,	End: 55249071,
 Strand: 0,
  Feature Type Id: http://purl.obolibrary.org/obo/SO_0001059,
  Feature Type Term: sequence_alteration,
  Feature Type Source Name: ,
  Feature Type Source Version: 
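
The counter-and-break pattern in cell 4 can also be expressed with itertools.islice from the standard library. The sketch below uses a stand-in generator (fake_search, a hypothetical name) in place of c.search_features so that it runs without a server; it assumes only that the search call returns an iterable, as the for loop in cell 4 already does.

```python
from itertools import islice


def fake_search(total):
    """Stand-in for a search call: yields placeholder feature names lazily."""
    for index in range(total):
        yield "feature-{}".format(index)


def first_n(results, limit):
    """Return at most `limit` items, stopping as soon as the limit is reached."""
    return list(islice(results, limit))


# Only six items are pulled from the generator, even though it could yield 1000.
first_six = first_n(fake_search(1000), 6)
```

With the real client, the equivalent loop header would be `for features in islice(c.search_features(feature_set_id=feature_set.id), 6):`. One difference is worth keeping in mind: after such a loop, features refers to the sixth feature rather than the seventh, which changes what a later features.id points to.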

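Note: Not all of the elements returned in the response are printed in this example. All of them are shown in the get-by-id request below. That request uses features.id; after the loop above, features holds the seventh feature returned by the search, the one on which the loop stopped before printing.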

In [5]:
feature = c.get_feature(feature_id=features.id)
print "Id: {},".format(feature.id)
print " Name: {},".format(feature.name)
print " Gene Symbol: {},".format(feature.gene_symbol)
print " Parent Id: {},".format(feature.parent_id)
# child_ids may be empty; print one line per child when present.
if feature.child_ids:
    for i in feature.child_ids:
        print " Child Ids: {}".format(i)
print " Feature Set Id: {},".format(feature.feature_set_id)
print " Reference Name: {},".format(feature.reference_name)
print " Start: {},\tEnd: {},".format(feature.start, feature.end)
print " Strand: {},".format(feature.strand)
print "  Feature Type Id: {},".format(feature.feature_type.id)
print "  Feature Type Term: {},".format(feature.feature_type.term)
print "  Feature Type Source Name: {},".format(feature.feature_type.source_name)
print "  Feature Type Source Version: {}\n".format(feature.feature_type.source_version)
# Print each attribute key with the first string value stored under it.
for vals in feature.attributes.vals:
    print "{}: {}".format(vals, feature.attributes.vals[vals].values[0].string_value)


Id: WyIxa2dlbm9tZXMiLCJjZ2QiLCJodHRwOi8vb2hzdS5lZHUvY2dkLzUzOGNhY2U0Il0,
 Name: NF1 any mutation,
 Gene Symbol: NF1 any mutation,
 Parent Id: ,
 Feature Set Id: ,
 Reference Name: ,
 Start: 0,	End: 0,
 Strand: 0,
  Feature Type Id: http://purl.obolibrary.org/obo/SO_0001059,
  Feature Type Term: sequence_alteration,
  Feature Type Source Name: ,
  Feature Type Source Version: 

http://www.w3.org/1999/02/22-rdf-syntax-ns#type: http://purl.obolibrary.org/obo/SO_0001059
http://purl.obolibrary.org/obo/RO_0002200: http://ohsu.edu/cgd/7fd7d0e6
http://www.w3.org/2000/01/rdf-schema#label: NF1 any mutation
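
Cell 5 prints only the first value (values[0]) stored under each attribute key. As a sketch, the helper below gathers every string value per key into a plain dictionary. The namedtuple classes and the attributes_to_dict name are hypothetical stand-ins that imitate only the attribute access used in this notebook, so the example runs without the server; values of other types are not handled.

```python
from collections import namedtuple

# Hypothetical stand-ins that mimic only the attribute shape used in cell 5.
Value = namedtuple("Value", ["string_value"])
ValueList = namedtuple("ValueList", ["values"])
Attributes = namedtuple("Attributes", ["vals"])


def attributes_to_dict(attributes):
    """Collect every string value per key, not just the first one."""
    return {
        key: [value.string_value for value in attributes.vals[key].values]
        for key in attributes.vals
    }


# Sample data copied from the attribute output of cell 5.
sample = Attributes(vals={
    "http://www.w3.org/2000/01/rdf-schema#label":
        ValueList([Value("NF1 any mutation")]),
    "http://purl.obolibrary.org/obo/RO_0002200":
        ValueList([Value("http://ohsu.edu/cgd/7fd7d0e6")]),
})
flat = attributes_to_dict(sample)
```

With the real client, the same helper could be tried as attributes_to_dict(feature.attributes), on the assumption that the returned object supports the same access pattern as in cell 5.
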

This last call shows all of the elements returned in the message.
For documentation on the service and more information, go to:

https://ga4gh-schemas.readthedocs.io/en/latest/schemas/allele_annotation_service.proto.html