Creating a SequenceVariant from scratch

0. Overview

A SequenceVariant consists of an accession (a string), a sequence type (a string), and a PosEdit, like this:

var = hgvs.sequencevariant.SequenceVariant(ac='NM_01234.5', type='c', posedit=...)

Unsurprisingly, a PosEdit consists of separate position and Edit objects. A position is generally an Interval, which in turn is comprised of SimplePosition or BaseOffsetPosition objects. An edit is a subclass of Edit, which includes classes like NARefAlt for substitutions, deletions, and insertions) and Dup (for duplications).

Importantly, each of the objects we're building has a rule in the parser, which means that you have the tools to serialize and deserialize (parse) each of the components that we're about to construct.

1. Make an Interval to defined a position of the edit


In [1]:
import hgvs.location
import hgvs.posedit

In [2]:
start = hgvs.location.BaseOffsetPosition(base=200,offset=-6,datum=hgvs.location.Datum.CDS_START)
start, str(start)


Out[2]:
(BaseOffsetPosition(base=200, offset=-6, datum=Datum.CDS_START, uncertain=False),
 '200-6')

In [3]:
end = hgvs.location.BaseOffsetPosition(base=22,datum=hgvs.location.Datum.CDS_END)
end, str(end)


Out[3]:
(BaseOffsetPosition(base=22, offset=0, datum=Datum.CDS_END, uncertain=False),
 '*22')

In [4]:
iv = hgvs.location.Interval(start=start,end=end)
iv, str(iv)


Out[4]:
(Interval(start=200-6, end=*22, uncertain=False), '200-6_*22')

2. Make an edit object


In [5]:
import hgvs.edit, hgvs.posedit

In [6]:
edit = hgvs.edit.NARefAlt(ref='A',alt='T')
edit, str(edit)


Out[6]:
(NARefAlt(ref='A', alt='T', uncertain=False), 'A>T')

In [7]:
posedit = hgvs.posedit.PosEdit(pos=iv,edit=edit)
posedit, str(posedit)


Out[7]:
(PosEdit(pos=200-6_*22, edit=A>T, uncertain=False), '200-6_*22A>T')

3. Make the variant


In [8]:
import hgvs.sequencevariant

In [9]:
var = hgvs.sequencevariant.SequenceVariant(ac='NM_01234.5', type='c', posedit=posedit)
var, str(var)


Out[9]:
(SequenceVariant(ac=NM_01234.5, type=c, posedit=200-6_*22A>T),
 'NM_01234.5:c.200-6_*22A>T')

Important: It is possible to bogus variants with the hgvs package. For example, the above interval is incompatible with a SNV. See hgvs.validator.Validator for validation options.

4. Update your variant

The stringification happens on-the-fly. That means that you can update components of the variant and see the effects immediately.


In [10]:
import copy

In [11]:
var2 = copy.deepcopy(var)
var2.posedit.pos.start.base=456
str(var2)


Out[11]:
'NM_01234.5:c.456-6_*22A>T'

In [12]:
var2 = copy.deepcopy(var)
var2.posedit.edit.alt='CT'
str(var2)


Out[12]:
'NM_01234.5:c.200-6_*22delinsCT'

In [13]:
var2 = copy.deepcopy(var)
str(var2)


Out[13]:
'NM_01234.5:c.200-6_*22A>T'

In [ ]: