See https://github.com/biocommons/hgvs/issues/596
UTA#220 added support for custom transcripts, which really means custom alignments associated with a new alt_aln_method
, "splign-manual". The loading/data/splign-manual contains examples.
Although hgvs supports multiple alignment methods already (e.g., one can specify "blat", "splign" in VariantMapper::g_to_c()
), several areas of code assume one default alt_aln_method
.
This work was sponsored by Invitae, which uses the VariantMapper
interface. Therefore, this new feature focuses on minimal changes to extend hgvs to better support using multiple alignment methods with VariantMapper
. These are demonstrated below.
Initialize hgvs and create two variants from CYP2C19 to use as examples (VCV000634882.1)
In [1]:
from hgvs.easy import (__version__, parser, hdp, vm)
from hgvs.exceptions import HGVSDataNotAvailableError
__version__
Out[1]:
In [2]:
# hgvs_g = "NC_000010.11:g.94762693G>A" # GRCh38
hgvs_g = "NC_000010.10:g.96522450G>A" # GRCh37
hgvs_c = "NM_000769.4:c.-13G>A"
var_c = parser.parse(hgvs_c)
var_g = parser.parse(hgvs_g)
In [3]:
hdp.get_tx_mapping_options(var_c.ac)
Out[3]:
In [4]:
# or, for a more complete example with many options:
hdp.get_tx_mapping_options("NM_001807.4")
Out[4]:
In [5]:
alignments = hdp.get_tx_for_gene("CYP2C19")
alignments.sort(key=lambda a: (a["tx_ac"], a["alt_ac"], a["alt_aln_method"]))
alignments
Out[5]:
The hgvs data provider method get_tx_for_region
used a supplied alignment method, which necessarily filtered alignments for a specific alignment method. A new method, get_alignments_for_region
provides similar functionality with an optional alt_aln_method
parameter.
When alt_aln_method
is None (or not provided), all alignments are returned.
When alt_aln_method
is provided, it behaves exactly like get_tx_for_region
.
In [6]:
start = var_g.posedit.pos.start.base
end = var_g.posedit.pos.end.base
alignments = hdp.get_alignments_for_region(var_g.ac, start, end)
alignments.sort(key=lambda a: (a["tx_ac"], a["alt_ac"], a["alt_aln_method"]))
alignments
Out[6]:
This approach follows the easier-to-ask-for-forgiveness-than-permission principle. Code tries "splign", which is expected to satisfy the majority of cases, and then falls back to "splign-manual" for failures. The advantage of this method is that there is only one database fetch for the most common case.
In [7]:
try:
vm.c_to_g(var_c, "NC_000010.10")
except HGVSDataNotAvailableError as e:
print(f"Got {e!r}")
In [8]:
vm.c_to_g(var_c, "NC_000010.10", alt_aln_method="splign-manual")
Out[8]: