GA4GH 1000 Genomes Sequence Annotations Example

This example illustrates how to access the sequence annotations for a given set of ....

Initialize Client

In this step we create a client object that will be used to communicate with the server. It is initialized with the server's URL.


In [1]:
from ga4gh.client import client
c = client.HttpClient("http://1kgenomes.ga4gh.org")

In [7]:
# Obtain the dataset; see the `1kg_metadata_service` example for details.
dataset = next(c.search_datasets())

Search Feature Sets

Feature sets are logical containers for genomic features, such as those defined in a GFF3 file or another file that describes features in genomic coordinates. Each feature set is mapped to a single reference set and belongs to a specific dataset.


In [8]:
for feature_set in c.search_feature_sets(dataset_id=dataset.id):
    print(feature_set)
    # Keep a handle on the GENCODE feature set for the following steps.
    if feature_set.name == "gencode_v24lift37":
        gencode = feature_set


id: "WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyJd"
dataset_id: "WyIxa2dlbm9tZXMiXQ"
reference_set_id: "WyJOQ0JJMzciXQ"
name: "gencode_v24lift37"

Get Feature Set by ID

Given the identifier of a specific feature set, we can retrieve that feature set directly.


In [9]:
feature_set = c.get_feature_set(feature_set_id=gencode.id)
print(feature_set)


id: "WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyJd"
dataset_id: "WyIxa2dlbm9tZXMiXQ"
reference_set_id: "WyJOQ0JJMzciXQ"
name: "gencode_v24lift37"
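
The identifiers printed above look opaque, but in this output they appear to be URL-safe base64 encodings, with the padding stripped, of JSON arrays that name the dataset and feature set. The sketch below decodes them using only the standard library. `decode_compound_id` is a hypothetical helper, not part of the GA4GH client, and the encoding is presumably a server implementation detail, so client code should keep treating IDs as opaque strings.

```python
import base64
import json


def decode_compound_id(encoded_id):
    """Decode an ID, assuming unpadded URL-safe base64 over a JSON array."""
    # Restore the '=' padding that appears to have been stripped.
    padding = "=" * (-len(encoded_id) % 4)
    raw = base64.urlsafe_b64decode(encoded_id + padding)
    return json.loads(raw.decode("utf-8"))


# IDs copied from the feature set output above.
feature_set_parts = decode_compound_id("WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyJd")
dataset_parts = decode_compound_id("WyIxa2dlbm9tZXMiXQ")
reference_set_parts = decode_compound_id("WyJOQ0JJMzciXQ")

print(feature_set_parts)
print(dataset_parts)
print(reference_set_parts)
```

Decoding is useful for debugging only; requests should always pass the encoded ID exactly as the server returned it.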

Search Features

With a feature set ID, we can construct a Search Features request, which can find genomic features by position, type, or name. Here we apply no filter, so the request matches every feature in the feature set; the loop prints only the first four.


In [10]:
counter = 0
# `features` holds a single feature per iteration; its final value is
# reused by the get-by-ID call later in this example.
for features in c.search_features(feature_set_id=feature_set.id):
    # Stop once four features have been printed.
    if counter > 3:
        break
    counter += 1
    print("Id: {},".format(features.id))
    print(" Name: {},".format(features.name))
    print(" Gene Symbol: {},".format(features.gene_symbol))
    print(" Parent Id: {},".format(features.parent_id))
    if features.child_ids:
        for i in features.child_ids:
            print(" Child Ids: {}".format(i))
    print(" Feature Set Id: {},".format(features.feature_set_id))
    print(" Reference Name: {},".format(features.reference_name))
    print(" Start: {},\tEnd: {},".format(features.start, features.end))
    print(" Strand: {},".format(features.strand))
    print("  Feature Type Id: {},".format(features.feature_type.id))
    print("  Feature Type Term: {},".format(features.feature_type.term))
    print("  Feature Type Source Name: {},".format(features.feature_type.source_name))
    print("  Feature Type Source Version: {}\n".format(features.feature_type.source_version))


Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTc5MiJd,
 Name: exon:ENST00000621489.1:2,
 Gene Symbol: CH17-408M7.1,
 Parent Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTQwOCJd,
 Feature Set Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyJd,
 Reference Name: GL000192.1,
 Start: 429710,	End: 430271,
 Strand: 1,
  Feature Type Id: SO:0000147,
  Feature Type Term: exon,
  Feature Type Source Name: so-xp,
  Feature Type Source Version: so-xp/releases/2015-11-24/so-xp.owl

Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTAyNCJd,
 Name: ENSG00000277420.1,
 Gene Symbol: CH17-408M7.1,
 Parent Id: ,
 Child Ids: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTQwOCJd
 Feature Set Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyJd,
 Reference Name: GL000192.1,
 Start: 429710,	End: 440529,
 Strand: 1,
  Feature Type Id: SO:0000704,
  Feature Type Term: gene,
  Feature Type Source Name: so-xp,
  Feature Type Source Version: so-xp/releases/2015-11-24/so-xp.owl

Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTQwOCJd,
 Name: ENST00000621489.1,
 Gene Symbol: CH17-408M7.1,
 Parent Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTAyNCJd,
 Child Ids: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTc5MiJd
 Child Ids: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTYwMCJd
 Feature Set Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyJd,
 Reference Name: GL000192.1,
 Start: 429710,	End: 440529,
 Strand: 1,
  Feature Type Id: SO:0000673,
  Feature Type Term: transcript,
  Feature Type Source Name: so-xp,
  Feature Type Source Version: so-xp/releases/2015-11-24/so-xp.owl

Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTYwMCJd,
 Name: exon:ENST00000621489.1:1,
 Gene Symbol: CH17-408M7.1,
 Parent Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTQwOCJd,
 Feature Set Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyJd,
 Reference Name: GL000192.1,
 Start: 438554,	End: 440529,
 Strand: 1,
  Feature Type Id: SO:0000147,
  Feature Type Term: exon,
  Feature Type Source Name: so-xp,
  Feature Type Source Version: so-xp/releases/2015-11-24/so-xp.owl

Note: The example above does not print every element returned in the response. The get-by-ID call later in this example prints all of them.
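
The Parent Id and Child Ids fields in the output above link the four printed features into a gene, transcript, and exon hierarchy. As an illustration, the sketch below rebuilds that tree from (name, parent name) pairs copied by hand from the output, with feature names substituted for the long IDs for readability. The `records` list and the `render` helper are illustrative only and are not part of the GA4GH client.

```python
from collections import defaultdict

# (feature name, parent feature name) pairs taken from the output above;
# the gene has an empty Parent Id, represented here as None.
records = [
    ("exon:ENST00000621489.1:2", "ENST00000621489.1"),
    ("ENSG00000277420.1", None),
    ("ENST00000621489.1", "ENSG00000277420.1"),
    ("exon:ENST00000621489.1:1", "ENST00000621489.1"),
]

# Index children by parent so the tree can be walked top-down.
children = defaultdict(list)
roots = []
for name, parent in records:
    if parent is None:
        roots.append(name)
    else:
        children[parent].append(name)


def render(name, depth=0):
    """Return indented lines for a feature and all of its descendants."""
    lines = ["  " * depth + name]
    for child in sorted(children[name]):
        lines.extend(render(child, depth + 1))
    return lines


tree_lines = [line for root in roots for line in render(root)]
print("\n".join(tree_lines))
```

With real responses the same grouping could be keyed on `parent_id` and `id` instead of names.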

We can perform a similar search, this time restricting to a specific genomic region.


In [11]:
for feature in c.search_features(
        feature_set_id=feature_set.id,
        reference_name="chr17",
        start=42000000,
        end=42001000):
    print("{} {} {}".format(feature.name, feature.start, feature.end))


ENSG00000282199.1 41991801 42000369
ENST00000585329.1 41991801 42000369
exon:ENST00000621298.1:5 42000143 42000691
ENSG00000267166.5 42000143 42004756
ENST00000621298.1 42000143 42004756
exon:ENST00000585329.1:1 42000144 42000369
exon:ENST00000621298.1:4 42000883 42001329
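
Several of the features returned above start before 42000000 or end after 42001000, which suggests that the region search returns every feature overlapping the requested interval rather than only those contained in it. The following is a minimal sketch of such an overlap test, assuming half-open [start, end) intervals. `overlaps` is a hypothetical helper, not a client function, and the tuples are copied from the output above.

```python
def overlaps(feature_start, feature_end, query_start, query_end):
    """Return True if two half-open intervals share at least one position."""
    return feature_start < query_end and feature_end > query_start


QUERY_START, QUERY_END = 42000000, 42001000

# (name, start, end) copied from the search output above.
results = [
    ("ENSG00000282199.1", 41991801, 42000369),
    ("ENST00000585329.1", 41991801, 42000369),
    ("exon:ENST00000621298.1:5", 42000143, 42000691),
    ("ENSG00000267166.5", 42000143, 42004756),
    ("ENST00000621298.1", 42000143, 42004756),
    ("exon:ENST00000585329.1:1", 42000144, 42000369),
    ("exon:ENST00000621298.1:4", 42000883, 42001329),
]

# Every returned feature overlaps the query region...
all_overlap = all(overlaps(s, e, QUERY_START, QUERY_END) for _, s, e in results)

# ...but only some lie entirely inside it.
fully_contained = [
    name for name, s, e in results if QUERY_START <= s and e <= QUERY_END
]

print(all_overlap)
print(fully_contained)
```

If only fully contained features are wanted, a client-side filter like `fully_contained` can be applied to the search results.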

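Get Feature by ID

With the identifier of a specific feature, we can retrieve that feature directly. Here we use the ID of the last feature fetched by the Search Features loop above and print every element of the response, including its attributes.


In [12]: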
# Fetch the last feature seen in the search loop above by its ID.
feature = c.get_feature(feature_id=features.id)
print("Id: {},".format(feature.id))
print(" Name: {},".format(feature.name))
print(" Gene Symbol: {},".format(feature.gene_symbol))
print(" Parent Id: {},".format(feature.parent_id))
if feature.child_ids:
    for i in feature.child_ids:
        print(" Child Ids: {}".format(i))
print(" Feature Set Id: {},".format(feature.feature_set_id))
print(" Reference Name: {},".format(feature.reference_name))
print(" Start: {},\tEnd: {},".format(feature.start, feature.end))
print(" Strand: {},".format(feature.strand))
print("  Feature Type Id: {},".format(feature.feature_type.id))
print("  Feature Type Term: {},".format(feature.feature_type.term))
print("  Feature Type Source Name: {},".format(feature.feature_type.source_name))
print("  Feature Type Source Version: {}\n".format(feature.feature_type.source_version))
# Print each attribute key with the string value of its first entry.
for key in feature.attributes.vals:
    print("{}: {}".format(key, feature.attributes.vals[key].values[0].string_value))


Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzM3MTIxNiJd,
 Name: exon:ENST00000614199.1:2,
 Gene Symbol: RP11-640M9.4,
 Parent Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyIsIjE0MDUwODI3NzI1NjA4MCJd,
 Feature Set Id: WyIxa2dlbm9tZXMiLCJnZW5jb2RlX3YyNGxpZnQzNyJd,
 Reference Name: GL000192.1,
 Start: 493155,	End: 493368,
 Strand: 1,
  Feature Type Id: SO:0000147,
  Feature Type Term: exon,
  Feature Type Source Name: so-xp,
  Feature Type Source Version: so-xp/releases/2015-11-24/so-xp.owl

remap_original_location: chr1:+:146461646-146461859
gene_status: KNOWN
havana_gene: OTTHUMG00000187535.3
transcript_support_level: NA
Parent: ENST00000614199.1
level: 2
transcript_status: KNOWN
tag: basic
gene_id: ENSG00000277655.1
exon_id: ENSE00003723063.1
transcript_type: unprocessed_pseudogene
transcript_name: RP11-640M9.4-001
exon_number: 2
ont: PGO:0000005
havana_transcript: OTTHUMT00000475232.3
transcript_id: ENST00000614199.1
gene_type: unprocessed_pseudogene
remap_status: full_contig
ID: exon:ENST00000614199.1:2
gene_name: RP11-640M9.4
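
The attribute keys printed above (ID, Parent, gene_id, and so on) resemble the key=value pairs found in column 9 of a GFF3 record, the kind of file a feature set may be defined in. As a sketch, assuming that origin, such a column can be parsed into a mapping from each key to a list of values. `parse_gff3_attributes` is a hypothetical helper, and the sample string is assembled from a few values in the output above rather than taken from the actual source file.

```python
try:
    from urllib.parse import unquote  # Python 3
except ImportError:
    from urllib import unquote  # Python 2


def parse_gff3_attributes(column):
    """Parse a GFF3 attributes column into a dict of key -> list of values."""
    attributes = {}
    for pair in column.strip().split(";"):
        if not pair:
            continue  # tolerate a trailing semicolon
        key, _, value = pair.partition("=")
        # GFF3 percent-encodes reserved characters and separates
        # multiple values for one key with commas.
        attributes[unquote(key)] = [unquote(v) for v in value.split(",")]
    return attributes


# Sample built from values shown in the output above (an assumption,
# not the original GFF3 line).
sample = (
    "ID=exon:ENST00000614199.1:2;Parent=ENST00000614199.1;"
    "gene_id=ENSG00000277655.1;exon_number=2"
)
parsed = parse_gff3_attributes(sample)
print(parsed["Parent"])
```

Storing a list per key mirrors the way the response above exposes each attribute through a `values` sequence, of which the notebook prints only the first entry.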

The get_feature call above prints all of the elements returned in the message.

For documentation on the service and more information, see:

https://ga4gh-schemas.readthedocs.io/en/latest/schemas/allele_annotation_service.proto.html