GA4GH 1000 Genomes Variant Annotation Service Example

This example illustrates how to access the different calls implemented within the variant annotation service.

Initialize the client

In this step we create a client object that will be used to communicate with the server. It is initialized with the server's URL.


In [1]:
import ga4gh_client.client as client
c = client.HttpClient("http://1kgenomes.ga4gh.org")

Search variant annotation sets method

The response is a list of the variant annotation sets that belong to the given variant set, together with the info fields of the analysis that produced each one.


In [2]:
# print(...) with a single argument behaves the same on Python 2 and Python 3.
for variant_annotation_set in c.search_variant_annotation_sets(variant_set_id="WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd"):
    print("\nName: {},".format(variant_annotation_set.name))
    print(" Id: {},".format(variant_annotation_set.id))
    print(" Variant Set Id: {},".format(variant_annotation_set.variant_set_id))
    print(" Analysis Id: {},".format(variant_annotation_set.analysis.id))
    print(" Analysis Created: {}\n".format(variant_annotation_set.analysis.created))
    # Each info entry holds a list of values; only the first string value is shown.
    for info in variant_annotation_set.analysis.info:
        print("{}:   {}".format(info, variant_annotation_set.analysis.info[info].values[0].string_value))


Name: functional-annotation,
 Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd,
 Variant Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd,
 Analysis Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImFuYWx5c2lzIl0,
 Analysis Created: 2014-07-30T00:00:00Z

INFO.ALOFT:   The Annotation Of Loss-of-Function Transcripts, provides extensive functional annotations to loss-of-function variants in the human genome, https://github.com/gersteinlab/aloft
INFO.MLEN:   Estimated length of mitochondrial insert
reference:   ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
INFO.GENCODE:   The annotation of coding variants and splice site variants
INFO.SVTYPE:   Type of structural variant
INFO.NS:   Number of samples with data
INFO.DP:   Total read depth
INFO.ERB:   Ensembl Regulatory Build. Format: Allele|Gene|Feature|Feature_type|Consequence
INFO.AFR_AF:   Allele frequency in the AFR populations calculated from AC and AN, in the range (0,1)
FORMAT.GT:   Genotype
fileDate:   20140730
INFO.AMR_AF:   Allele frequency in the AMR populations calculated from AC and AN, in the range (0,1)
INFO.SVLEN:   Difference in length between REF and ALT alleles
INFO.MEINFO:   Mobile element info of the form NAME,START,END<POLARITY; If there is only 5' OR 3' support for this call, will be NULL NULL for START and END
source:   1000GenomesPhase3Pipeline
INFO.SAS_AF:   Allele frequency in the SAS populations calculated from AC and AN, in the range (0,1)
INFO.EUR_AF:   Allele frequency in the EUR populations calculated from AC and AN, in the range (0,1)
INFO.CIPOS:   Confidence interval around POS for imprecise variants
VEP:   v77 cache=/data/blastdb/Ensembl/vep/homo_sapiens/77_GRCh37 db=.
INFO.IMPRECISE:   Imprecise structural variation
INFO.MSTART:   Mitochondrial start coordinate of inserted sequence
INFO.CSQ:   Consequence type as predicted by VEP WITH -PICK_ALLELE parameter. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|SIFT|PolyPhen|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE
INFO.AC:   Total number of alternate alleles in called genotypes
INFO.AA:   Ancestral Allele. Format: AA|REF|ALT|IndelType. AA: Ancestral allele, REF:Reference Allele, ALT:Alternate Allele, IndelType:Type of Indel (REF, ALT and IndelType are only defined for indels)
INFO.AF:   Estimated allele frequency in the range (0,1)
INFO.FUNSEQ:   FunSeq score for noncoding SNV
INFO.TSD:   Precise Target Site Duplication for bases, if unknown, value will be NULL
INFO.END:   End coordinate of this variant
INFO.CS:   Source call set.
INFO.AN:   Total number of alleles in called genotypes
INFO.PHOSPHORYLATION:   Predicted as phosphorylation sites by Phosphosite.org. Format: Uniprot Identifier of phosphorylated protein|Position in Uniprot sequence of phosphorylation site|Number of low throughput experiments this site has been seen in|Number of high throughput experiments this site has been seen in
INFO.MC:   Merged calls.
INFO.HighD:   The Super population with the higher derived allele frequency for the highD site
INFO.MEND:   Mitochondrial end coordinate of inserted sequence
INFO.CIEND:   Confidence interval around END for imprecise variants
INFO.EAS_AF:   Allele frequency in the EAS populations calculated from AC and AN, in the range (0,1)
fileformat:   VCFv4.1
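
The ids printed above look like URL-safe base64 text. As a sketch only, and assuming (this is not stated anywhere in this example) that each id is a base64-encoded JSON array with its trailing `=` padding stripped, the components can be inspected as below. The helper name `decode_compound_id` is hypothetical, and client code should still treat ids as opaque strings.

```python
import base64
import json


def decode_compound_id(obfuscated_id):
    """Decode an id of the kind shown above into its list of components.

    Assumption: the id is URL-safe base64 of a JSON array, with '=' padding removed.
    """
    padding = "=" * (-len(obfuscated_id) % 4)  # restore the stripped padding
    raw = base64.urlsafe_b64decode((obfuscated_id + padding).encode("ascii"))
    return json.loads(raw.decode("utf-8"))


# Ids copied from the output above.
variant_set_id = "WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd"
annotation_set_id = "WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd"

print(decode_compound_id(variant_set_id))
print(decode_compound_id(annotation_set_id))
```

Under that assumption, the components of the variant annotation set id begin with the components of its variant set id, which is one way to see how the two records are related.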

Search variant annotations method

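This request returns the variant annotations in the given variant annotation set that fall within the requested region of the reference. Only the first six results are printed here.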


In [3]:
counter = 6  # print only the first six annotations
for variant_annotation in c.search_variant_annotations(variant_annotation_set_id="WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd", reference_name="1", start=0, end=1000000):
    if counter <= 0:
        break
    counter -= 1
    # Only the first transcript effect, and its first effect, are shown.
    transcript_effect = variant_annotation.transcript_effects[0]
    effect = transcript_effect.effects[0]
    print("Id: {},".format(variant_annotation.id))
    print(" Variant Id: {},".format(variant_annotation.variant_id))
    print(" Variant Annotation Set Id: {}".format(variant_annotation.variant_annotation_set_id))
    print(" Created: {}".format(variant_annotation.created))
    print(" Transcript Effects Id: {},".format(transcript_effect.id))
    print(" Feature Id: {},".format(transcript_effect.feature_id))
    print(" Alternate Bases: {},".format(transcript_effect.alternate_bases))
    print(" Effects Id: {},".format(effect.id))
    print(" Effect Term: {},".format(effect.term))
    print(" Effect Source Name: {},".format(effect.source_name))
    print(" Effect Source Version: {}\n".format(effect.source_version))


Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDE3NiIsIjhjYjI5MGJhNTcyNzlmNjg1MDc4ZGUwZGNmMGNjYzJiIl0,
 Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDE3NiIsImQwMTZjNGUxYWRjYWQ1ZDFiYzg5YzJjYTRhZGJhM2E4Il0,
 Variant Annotation Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd
 Created: 2014-07-30T00:00:00Z
 Transcript Effects Id: 4ef6c08cbb50fec318e9815d897e511f,
 Feature Id: ENST00000456328,
 Alternate Bases: C,
 Effects Id: SO:0001631,
 Effect Term: upstream_gene_variant,
 Effect Source Name: so-xp,
 Effect Source Version: so-xp/releases/2015-11-24/so-xp.owl

Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDIzNCIsImZjZGQzZjU2MTMxNWUwMTM4YWNlMmE4MjA2NjllY2QyIl0,
 Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDIzNCIsIjBjZTM1MDcyNDQ2MTRjMzcwNWY1ZTJhYTJkMTBhZjI1Il0,
 Variant Annotation Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd
 Created: 2014-07-30T00:00:00Z
 Transcript Effects Id: 36da0f368090af154c25b9f56266b920,
 Feature Id: ENST00000456328,
 Alternate Bases: A,
 Effects Id: SO:0001631,
 Effect Term: upstream_gene_variant,
 Effect Source Name: so-xp,
 Effect Source Version: so-xp/releases/2015-11-24/so-xp.owl

Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDM1MSIsImZjZGQzZjU2MTMxNWUwMTM4YWNlMmE4MjA2NjllY2QyIl0,
 Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDM1MSIsIjBjZTM1MDcyNDQ2MTRjMzcwNWY1ZTJhYTJkMTBhZjI1Il0,
 Variant Annotation Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd
 Created: 2014-07-30T00:00:00Z
 Transcript Effects Id: 36da0f368090af154c25b9f56266b920,
 Feature Id: ENST00000456328,
 Alternate Bases: A,
 Effects Id: SO:0001631,
 Effect Term: upstream_gene_variant,
 Effect Source Name: so-xp,
 Effect Source Version: so-xp/releases/2015-11-24/so-xp.owl

Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDUwNCIsImU3MjNlNGE2MDc1Yjg5MTg0MTNjZjVjNDhmZGJiMzRmIl0,
 Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDUwNCIsIjRhY2NkZmI2ZjY5MmY0MWMzMTRkMDkyODU4ODJhNjg5Il0,
 Variant Annotation Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd
 Created: 2014-07-30T00:00:00Z
 Transcript Effects Id: 7b6e7e2966c746e0b914425614e73a19,
 Feature Id: ENST00000456328,
 Alternate Bases: T,
 Effects Id: SO:0001631,
 Effect Term: upstream_gene_variant,
 Effect Source Name: so-xp,
 Effect Source Version: so-xp/releases/2015-11-24/so-xp.owl

Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDUwNSIsImNjNjYzYWU2ZDBkNjA4OTAwOTI3MGNkYmQ2MzYyYzk2Il0,
 Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDUwNSIsImM2MmQwYTNhODAyMmY4MTVkYThmNGNhYmE1Y2ViMzhkIl0,
 Variant Annotation Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd
 Created: 2014-07-30T00:00:00Z
 Transcript Effects Id: 7754189c9696c039f4f65f58b87f4fc1,
 Feature Id: ENST00000456328,
 Alternate Bases: G,
 Effects Id: SO:0001631,
 Effect Term: upstream_gene_variant,
 Effect Source Name: so-xp,
 Effect Source Version: so-xp/releases/2015-11-24/so-xp.owl

Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDUxMCIsImU1ZDE3OTIyOWNlYjM5NTBlYjkzZDk0Y2FlNzAxOTI3Il0,
 Variant Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsIjEiLCIxMDUxMCIsIjQ1YzRjZjA2Y2I1Y2RhYmE4NTEyYTE2MWU3MzJiNWMyIl0,
 Variant Annotation Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd
 Created: 2014-07-30T00:00:00Z
 Transcript Effects Id: 36da0f368090af154c25b9f56266b920,
 Feature Id: ENST00000456328,
 Alternate Bases: A,
 Effects Id: SO:0001631,
 Effect Term: upstream_gene_variant,
 Effect Source Name: so-xp,
 Effect Source Version: so-xp/releases/2015-11-24/so-xp.owl
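
The loop above limits its output with a manual counter and indexes the first transcript effect directly, which would raise an `IndexError` for an annotation that has none. The sketch below shows one possible alternative: `itertools.islice` to cap the number of results, and a summary helper that tolerates empty lists. The namedtuples, the `fake_search` generator and its records are made-up stand-ins for the objects returned by `c.search_variant_annotations`, so that the snippet runs without a server; they are not part of the client library.

```python
from collections import namedtuple
from itertools import islice

# Hypothetical stand-ins for the protocol objects returned by the client;
# only the fields used below are modelled.
Effect = namedtuple("Effect", ["id", "term"])
TranscriptEffect = namedtuple("TranscriptEffect", ["feature_id", "alternate_bases", "effects"])
Annotation = namedtuple("Annotation", ["id", "transcript_effects"])


def summarize(annotation):
    """Return a one-line summary, tolerating annotations with no transcript effects."""
    if not annotation.transcript_effects:
        return "{}: no transcript effects".format(annotation.id)
    first = annotation.transcript_effects[0]
    terms = ", ".join(effect.term for effect in first.effects) or "no effects"
    return "{}: {} {} ({})".format(annotation.id, first.feature_id, first.alternate_bases, terms)


def first_n(results, n):
    """Return an iterator over at most n items of results."""
    return islice(results, n)


def fake_search():
    # Stand-in for c.search_variant_annotations(...); the records are invented.
    yield Annotation("a1", [TranscriptEffect("ENST00000456328", "C",
                                             [Effect("SO:0001631", "upstream_gene_variant")])])
    yield Annotation("a2", [])
    yield Annotation("a3", [TranscriptEffect("ENST00000456328", "A", [])])


summaries = [summarize(annotation) for annotation in first_n(fake_search(), 2)]
for line in summaries:
    print(line)
```

With a real client, `first_n(c.search_variant_annotations(...), 6)` would play the role of the counter in the cell above, assuming the search call returns an ordinary Python iterator as it is used there.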

Get variant annotation set method

This call returns a single variant annotation set, given its id.


In [4]:
variant_annotation_set = c.get_variant_annotation_set(variant_annotation_set_id="WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd")

In [5]:
# print(...) with a single argument behaves the same on Python 2 and Python 3.
print("Name: {}".format(variant_annotation_set.name))
print(" Id: {} ".format(variant_annotation_set.id))
print(" Variant Set Id: {}".format(variant_annotation_set.variant_set_id))
print(" Analysis Id: {},".format(variant_annotation_set.analysis.id))
print(" Analysis Created: {},\n".format(variant_annotation_set.analysis.created))
# Each info entry holds a list of values; only the first string value is shown.
for info in variant_annotation_set.analysis.info:
    print("{}:   {},".format(info, variant_annotation_set.analysis.info[info].values[0].string_value))


Name: functional-annotation
 Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd 
 Variant Set Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiJd
 Analysis Id: WyIxa2dlbm9tZXMiLCJ2cyIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImZ1bmN0aW9uYWwtYW5ub3RhdGlvbiIsImFuYWx5c2lzIl0,
 Analysis Created: 2014-07-30T00:00:00Z,

INFO.ALOFT:   The Annotation Of Loss-of-Function Transcripts, provides extensive functional annotations to loss-of-function variants in the human genome, https://github.com/gersteinlab/aloft,
INFO.MLEN:   Estimated length of mitochondrial insert,
reference:   ftp://ftp.1000genomes.ebi.ac.uk//vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz,
INFO.GENCODE:   The annotation of coding variants and splice site variants,
INFO.SVTYPE:   Type of structural variant,
INFO.NS:   Number of samples with data,
INFO.DP:   Total read depth,
INFO.AMR_AF:   Allele frequency in the AMR populations calculated from AC and AN, in the range (0,1),
INFO.AFR_AF:   Allele frequency in the AFR populations calculated from AC and AN, in the range (0,1),
FORMAT.GT:   Genotype,
fileDate:   20140730,
INFO.ERB:   Ensembl Regulatory Build. Format: Allele|Gene|Feature|Feature_type|Consequence,
INFO.SVLEN:   Difference in length between REF and ALT alleles,
INFO.MEINFO:   Mobile element info of the form NAME,START,END<POLARITY; If there is only 5' OR 3' support for this call, will be NULL NULL for START and END,
source:   1000GenomesPhase3Pipeline,
INFO.PHOSPHORYLATION:   Predicted as phosphorylation sites by Phosphosite.org. Format: Uniprot Identifier of phosphorylated protein|Position in Uniprot sequence of phosphorylation site|Number of low throughput experiments this site has been seen in|Number of high throughput experiments this site has been seen in,
INFO.SAS_AF:   Allele frequency in the SAS populations calculated from AC and AN, in the range (0,1),
INFO.EUR_AF:   Allele frequency in the EUR populations calculated from AC and AN, in the range (0,1),
INFO.CIPOS:   Confidence interval around POS for imprecise variants,
VEP:   v77 cache=/data/blastdb/Ensembl/vep/homo_sapiens/77_GRCh37 db=.,
INFO.IMPRECISE:   Imprecise structural variation,
INFO.MSTART:   Mitochondrial start coordinate of inserted sequence,
INFO.CSQ:   Consequence type as predicted by VEP WITH -PICK_ALLELE parameter. Format: Allele|Gene|Feature|Feature_type|Consequence|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|SIFT|PolyPhen|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE,
INFO.AC:   Total number of alternate alleles in called genotypes,
INFO.AA:   Ancestral Allele. Format: AA|REF|ALT|IndelType. AA: Ancestral allele, REF:Reference Allele, ALT:Alternate Allele, IndelType:Type of Indel (REF, ALT and IndelType are only defined for indels),
INFO.AF:   Estimated allele frequency in the range (0,1),
INFO.FUNSEQ:   FunSeq score for noncoding SNV,
INFO.END:   End coordinate of this variant,
INFO.CS:   Source call set.,
INFO.AN:   Total number of alleles in called genotypes,
INFO.TSD:   Precise Target Site Duplication for bases, if unknown, value will be NULL,
INFO.MC:   Merged calls.,
INFO.HighD:   The Super population with the higher derived allele frequency for the highD site,
INFO.MEND:   Mitochondrial end coordinate of inserted sequence,
INFO.CIEND:   Confidence interval around END for imprecise variants,
INFO.EAS_AF:   Allele frequency in the EAS populations calculated from AC and AN, in the range (0,1),
fileformat:   VCFv4.1,

Note that this result contains the same values as the search request above, because only one annotation set is available.
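
The two listings print their info fields in different orders, so comparing them by eye is error-prone. A minimal sketch of a programmatic check follows. It assumes the info mapping has the nesting used in the cells above (`info[key].values[0].string_value`); `Value`, `ValueList` and the two sample mappings are hypothetical stand-ins, populated with two of the fields shown in the output.

```python
from collections import namedtuple

# Hypothetical stand-ins mirroring the nesting used above:
# analysis.info[key].values[0].string_value
Value = namedtuple("Value", ["string_value"])
ValueList = namedtuple("ValueList", ["values"])


def flatten_info(info):
    """Map each info key to the string value of its first entry."""
    return {key: info[key].values[0].string_value for key in info}


def diff_info(left, right):
    """Return the sorted keys whose values differ or that exist on one side only."""
    a, b = flatten_info(left), flatten_info(right)
    return sorted(key for key in set(a) | set(b) if a.get(key) != b.get(key))


# Same fields in a different order, as in the two listings above.
searched = {"fileformat": ValueList([Value("VCFv4.1")]),
            "fileDate": ValueList([Value("20140730")])}
fetched = {"fileDate": ValueList([Value("20140730")]),
           "fileformat": ValueList([Value("VCFv4.1")])}

print(diff_info(searched, fetched))
```

An empty list means the two results agree on every info field, regardless of the order in which the fields were returned.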

For documentation on the service and more information, see:

https://ga4gh-schemas.readthedocs.io/en/latest/schemas/allele_annotation_service.proto.html