Load the libraries: LightXML for parsing the XML, plus DataArrays and DataFrames for tabular data.


In [1]:
using LightXML
using DataArrays
using DataFrames

Read in the data. The XML file was obtained from NCBI by following the link in the Gire et al. paper.


In [2]:
xdoc = parse_file("ebola-sle-2014.gbc.xml");

I start by identifying the root element, which is an INSDSet.


In [3]:
xroot = root(xdoc)
println(name(xroot));


INSDSet

I extract all the sequence records and their accession numbers as arrays, the latter using a comprehension.


In [4]:
sequences = get_elements_by_tagname(xroot, "INSDSeq")
accessions = [content(find_element(s,"INSDSeq_primary-accession")) for s in sequences];
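
The same pattern can reach fields nested deeper in each record, such as the qualifiers on the "source" feature. Below is a minimal sketch, assuming each INSDSeq keeps its qualifiers under INSDSeq_feature-table / INSDFeature / INSDFeature_quals. The helper name `source_qualifier` and the inline miniature document are my own illustration, not part of the original analysis, and the `demo_` names are chosen so nothing above is overwritten.

```julia
using LightXML

# Miniature stand-in for the INSDSet file (illustrative only), keeping just
# the elements the helper below walks through.
xml = """
<INSDSet>
  <INSDSeq>
    <INSDSeq_primary-accession>KM034549</INSDSeq_primary-accession>
    <INSDSeq_feature-table>
      <INSDFeature>
        <INSDFeature_key>source</INSDFeature_key>
        <INSDFeature_quals>
          <INSDQualifier>
            <INSDQualifier_name>collection_date</INSDQualifier_name>
            <INSDQualifier_value>25-May-2014</INSDQualifier_value>
          </INSDQualifier>
        </INSDFeature_quals>
      </INSDFeature>
    </INSDSeq_feature-table>
  </INSDSeq>
</INSDSet>
"""

# Hypothetical helper: return the value of qualifier `qname` on the "source"
# feature of one INSDSeq element, or `nothing` if it is absent.
function source_qualifier(seq, qname)
    table = find_element(seq, "INSDSeq_feature-table")
    table === nothing && return nothing
    for feat in get_elements_by_tagname(table, "INSDFeature")
        key = find_element(feat, "INSDFeature_key")
        (key === nothing || content(key) != "source") && continue
        quals = find_element(feat, "INSDFeature_quals")
        quals === nothing && continue
        for q in get_elements_by_tagname(quals, "INSDQualifier")
            qn = find_element(q, "INSDQualifier_name")
            qv = find_element(q, "INSDQualifier_value")
            if qn !== nothing && qv !== nothing && content(qn) == qname
                return content(qv)
            end
        end
    end
    return nothing
end

demo_doc = parse_string(xml)
demo_seqs = get_elements_by_tagname(root(demo_doc), "INSDSeq")
demo_dates = [source_qualifier(s, "collection_date") for s in demo_seqs]
free(demo_doc)  # release the libxml2 document once the strings are copied out
```

On the real data, the comprehension would run over `sequences` instead of `demo_seqs`. Returning `nothing` for a missing qualifier keeps the result aligned with `accessions` even when a record lacks the field.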

In [5]:
numseq = length(sequences)


Out[5]:
249

This is many more sequences than we have annotations for.
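
One way to see the mismatch is to compare the accession list against the set of annotated accessions. This is only a sketch: the accessions starting with `XX` are made-up placeholders, and `annotated` stands in for whatever annotation table is actually used.

```julia
# Placeholder data: accessions parsed from the XML, and the subset with annotations.
# The "XX..." entries are invented for illustration.
all_accessions = ["KM034549", "XX000001", "XX000002"]
annotated = Set(["KM034549", "XX000002"])

# Accessions that have an annotation, in their original order.
kept = filter(a -> a in annotated, all_accessions)

# Accessions present in the XML but absent from the annotations.
unannotated = setdiff(all_accessions, annotated)

println(length(kept), " annotated, ", length(unannotated), " without annotation")
```

Using a `Set` for the annotated accessions keeps each membership check cheap, which matters little for a few hundred sequences but costs nothing.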

Let's look at the first entry.


In [6]:
sequences[1]


Out[6]:
<INSDSeq>
  <INSDSeq_locus>KM034549</INSDSeq_locus>
  <INSDSeq_length>18835</INSDSeq_length>
  <INSDSeq_strandedness>single</INSDSeq_strandedness>
  <INSDSeq_moltype>cRNA</INSDSeq_moltype>
  <INSDSeq_topology>linear</INSDSeq_topology>
  <INSDSeq_division>VRL</INSDSeq_division>
  <INSDSeq_update-date>15-DEC-2014</INSDSeq_update-date>
  <INSDSeq_create-date>30-JUN-2014</INSDSeq_create-date>
  <INSDSeq_definition>Zaire ebolavirus isolate Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM095B, complete genome</INSDSeq_definition>
  <INSDSeq_primary-accession>KM034549</INSDSeq_primary-accession>
  <INSDSeq_accession-version>KM034549.1</INSDSeq_accession-version>
  <INSDSeq_other-seqids>
    <INSDSeqid>gb|KM034549.1|</INSDSeqid>
    <INSDSeqid>gi|661348595</INSDSeqid>
  </INSDSeq_other-seqids>
  <INSDSeq_project>PRJNA257197</INSDSeq_project>
  <INSDSeq_source>Zaire ebolavirus</INSDSeq_source>
  <INSDSeq_organism>Zaire ebolavirus</INSDSeq_organism>
  <INSDSeq_taxonomy>Viruses; ssRNA viruses; ssRNA negative-strand viruses; Mononegavirales; Filoviridae; Ebolavirus</INSDSeq_taxonomy>
  <INSDSeq_references>
    <INSDReference>
      <INSDReference_reference>1</INSDReference_reference>
      <INSDReference_position>1..18835</INSDReference_position>
      <INSDReference_authors>
        <INSDAuthor>Gire,S.K.</INSDAuthor>
        <INSDAuthor>Goba,A.</INSDAuthor>
        <INSDAuthor>Andersen,K.G.</INSDAuthor>
        <INSDAuthor>Sealfon,R.S.</INSDAuthor>
        <INSDAuthor>Park,D.J.</INSDAuthor>
        <INSDAuthor>Kanneh,L.</INSDAuthor>
        <INSDAuthor>Jalloh,S.</INSDAuthor>
        <INSDAuthor>Momoh,M.</INSDAuthor>
        <INSDAuthor>Fullah,M.</INSDAuthor>
        <INSDAuthor>Dudas,G.</INSDAuthor>
        <INSDAuthor>Wohl,S.</INSDAuthor>
        <INSDAuthor>Moses,L.M.</INSDAuthor>
        <INSDAuthor>Yozwiak,N.L.</INSDAuthor>
        <INSDAuthor>Winnicki,S.</INSDAuthor>
        <INSDAuthor>Matranga,C.B.</INSDAuthor>
        <INSDAuthor>Malboeuf,C.M.</INSDAuthor>
        <INSDAuthor>Qu,J.</INSDAuthor>
        <INSDAuthor>Gladden,A.D.</INSDAuthor>
        <INSDAuthor>Schaffner,S.F.</INSDAuthor>
        <INSDAuthor>Yang,X.</INSDAuthor>
        <INSDAuthor>Jiang,P.P.</INSDAuthor>
        <INSDAuthor>Nekoui,M.</INSDAuthor>
        <INSDAuthor>Colubri,A.</INSDAuthor>
        <INSDAuthor>Coomber,M.R.</INSDAuthor>
        <INSDAuthor>Fonnie,M.</INSDAuthor>
        <INSDAuthor>Moigboi,A.</INSDAuthor>
        <INSDAuthor>Gbakie,M.</INSDAuthor>
        <INSDAuthor>Kamara,F.K.</INSDAuthor>
        <INSDAuthor>Tucker,V.</INSDAuthor>
        <INSDAuthor>Konuwa,E.</INSDAuthor>
        <INSDAuthor>Saffa,S.</INSDAuthor>
        <INSDAuthor>Sellu,J.</INSDAuthor>
        <INSDAuthor>Jalloh,A.A.</INSDAuthor>
        <INSDAuthor>Kovoma,A.</INSDAuthor>
        <INSDAuthor>Koninga,J.</INSDAuthor>
        <INSDAuthor>Mustapha,I.</INSDAuthor>
        <INSDAuthor>Kargbo,K.</INSDAuthor>
        <INSDAuthor>Foday,M.</INSDAuthor>
        <INSDAuthor>Yillah,M.</INSDAuthor>
        <INSDAuthor>Kanneh,F.</INSDAuthor>
        <INSDAuthor>Robert,W.</INSDAuthor>
        <INSDAuthor>Massally,J.L.</INSDAuthor>
        <INSDAuthor>Chapman,S.B.</INSDAuthor>
        <INSDAuthor>Bochicchio,J.</INSDAuthor>
        <INSDAuthor>Murphy,C.</INSDAuthor>
        <INSDAuthor>Nusbaum,C.</INSDAuthor>
        <INSDAuthor>Young,S.</INSDAuthor>
        <INSDAuthor>Birren,B.W.</INSDAuthor>
        <INSDAuthor>Grant,D.S.</INSDAuthor>
        <INSDAuthor>Scheiffelin,J.S.</INSDAuthor>
        <INSDAuthor>Lander,E.S.</INSDAuthor>
        <INSDAuthor>Happi,C.</INSDAuthor>
        <INSDAuthor>Gevao,S.M.</INSDAuthor>
        <INSDAuthor>Gnirke,A.</INSDAuthor>
        <INSDAuthor>Rambaut,A.</INSDAuthor>
        <INSDAuthor>Garry,R.F.</INSDAuthor>
        <INSDAuthor>Khan,S.H.</INSDAuthor>
        <INSDAuthor>Sabeti,P.C.</INSDAuthor>
      </INSDReference_authors>
      <INSDReference_title>Genomic surveillance elucidates Ebola virus origin and transmission during the 2014 outbreak</INSDReference_title>
      <INSDReference_journal>Science 345 (6202), 1369-1372 (2014)</INSDReference_journal>
      <INSDReference_xref>
        <INSDXref>
          <INSDXref_dbname>doi</INSDXref_dbname>
          <INSDXref_id>10.1126/science.1259657</INSDXref_id>
        </INSDXref>
      </INSDReference_xref>
      <INSDReference_pubmed>25214632</INSDReference_pubmed>
    </INSDReference>
    <INSDReference>
      <INSDReference_reference>2</INSDReference_reference>
      <INSDReference_position>1..18835</INSDReference_position>
      <INSDReference_authors>
        <INSDAuthor>Goba,A.</INSDAuthor>
        <INSDAuthor>Khan,H.</INSDAuthor>
        <INSDAuthor>Momoh,M.</INSDAuthor>
        <INSDAuthor>Jalloh,S.</INSDAuthor>
        <INSDAuthor>Fullah,M.</INSDAuthor>
        <INSDAuthor>Kanneh,L.</INSDAuthor>
        <INSDAuthor>Gevao,S.</INSDAuthor>
        <INSDAuthor>Happi,C.</INSDAuthor>
        <INSDAuthor>Gire,S.K.</INSDAuthor>
        <INSDAuthor>Andersen,K.</INSDAuthor>
        <INSDAuthor>Malboeuf,C.</INSDAuthor>
        <INSDAuthor>Matranga,C.</INSDAuthor>
        <INSDAuthor>Sealfon,R.</INSDAuthor>
        <INSDAuthor>Wohl,S.</INSDAuthor>
        <INSDAuthor>Gladden,A.</INSDAuthor>
        <INSDAuthor>Yang,X.</INSDAuthor>
        <INSDAuthor>Winnicki,S.</INSDAuthor>
        <INSDAuthor>Park,D.</INSDAuthor>
        <INSDAuthor>Qu,J.</INSDAuthor>
        <INSDAuthor>Sabeti,P.</INSDAuthor>
        <INSDAuthor>Garry,R.</INSDAuthor>
      </INSDReference_authors>
      <INSDReference_consortium>Viral Hemorrhagic Fever Consortium</INSDReference_consortium>
      <INSDReference_title>Direct Submission</INSDReference_title>
      <INSDReference_journal>Submitted (16-JUN-2014) Infectious Disease Initiative, Broad Institute of MIT and Harvard, 75 Ames St., Cambridge, MA 02142, USA</INSDReference_journal>
    </INSDReference>
  </INSDSeq_references>
  <INSDSeq_comment>##Assembly-Data-START## Assembly Method Novoalign v. 3 Sequencing Technology Illumina ##Assembly-Data-END##</INSDSeq_comment>
  <INSDSeq_feature-table>
    <INSDFeature>
      <INSDFeature_key>source</INSDFeature_key>
      <INSDFeature_location>1..18835</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>1</INSDInterval_from>
          <INSDInterval_to>18835</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>organism</INSDQualifier_name>
          <INSDQualifier_value>Zaire ebolavirus</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>mol_type</INSDQualifier_name>
          <INSDQualifier_value>viral cRNA</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>isolate</INSDQualifier_name>
          <INSDQualifier_value>Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM095B</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>isolation_source</INSDQualifier_name>
          <INSDQualifier_value>serum</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>host</INSDQualifier_name>
          <INSDQualifier_value>Homo sapiens</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>db_xref</INSDQualifier_name>
          <INSDQualifier_value>taxon:186538</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>country</INSDQualifier_name>
          <INSDQualifier_value>Sierra Leone</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>collection_date</INSDQualifier_name>
          <INSDQualifier_value>25-May-2014</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>gene</INSDFeature_key>
      <INSDFeature_location>11..2981</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>11</INSDInterval_from>
          <INSDInterval_to>2981</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>NP</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>mRNA</INSDFeature_key>
      <INSDFeature_location>11..2981</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>11</INSDInterval_from>
          <INSDInterval_to>2981</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>NP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>nucleoprotein</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transcription</INSDQualifier_name>
          <INSDQualifier_value>GAGGAAGATTAATAATTTTCCTCTCATTGAAATTTATATCGGAATTTAAATTGAAATTGTTACTGTAATCATACCTGGTTTGTTTCAGAGCCATATCACCAAGATAGAGAACAACCTAGGTCTCCGGAGGGGGCAAGGGCATCAGTGTGCTCAGTTGAAAATCCCTTGTCAACATCTAGGCCTTATCACATCACAAGTTCCGCCTTAAACTCTGCAGGGTGATCCAACAACCTTAATAGCAACATTATTGTTAAAGGACAGCATTAGTTCACAGTCAAACAAGCAAGATTGAGAATTAACTTTGATTTTGAACCTGAACACCCAGAGGACTGGAGACTCAACAACCCTAAAGCCTGGGGTAAAACATTAGAAATAGTTTAAAGACAAATTGCTCGGAATCACAAAATTCCGAGTATGGATTCTCGTCCTCAGAAAGTCTGGATGACGCCGAGTCTCACTGAATCTGACATGGATTACCACAAGATCTTGACAGCAGGTCTGTCCGTTCAACAGGGGATTGTTCGGCAAAGAGTCATCCCAGTGTATCAAGTAAACAATCTTGAGGAAATTTGCCAACTTATCATACAGGCCTTTGAAGCTGGTGTTGATTTTCAAGAGAGTGCGGACAGTTTCCTTCTCATGCTTTGTCTTCATCATGCGTACCAAGGAGATTACAAACTTTTCTTGGAAAGTGGCGCAGTCAAGTATTTGGAAGGGCACGGGTTCCGTTTTGAAGTCAAGAAGCGTGATGGAGTGAAGCGCCTTGAGGAATTGCTGCCAGCAGTATCTAGTGGGAGAAACATTAAGAGAACACTTGCTGCCATGCCGGAAGAGGAGACGACTGAAGCTAATGCCGGTCAGTTCCTCTCCTTTGCAAGTCTATTCCTTCCGAAATTGGTAGTAGGAGAAAAGGCTTGCCTTGAGAAGGTTCAAAGGCAAATTCAAGTACATGCAGAGCAAGGACTGATACAATATCCAACAGCTTGGCAATCAGTAGGACACATGATGGTGATTTTCCGTTTGATGCGAACAAATTTTTTGATCAAATTTCTTCTAATACACCAAGGGATGCACATGGTTGCCGGACATGATGCCAACGATGCTGTGATTTCAAATTCAGTGGCTCAAGCTCGTTTTTCAGGTCTATTGATTGTCAAAACAGTACTTGATCATATCCTACAAAAGACAGAACGAGGAGTTCGTCTCCATCCTCTTGCAAGGACCGCCAAGGTAAAAAATGAGGTGAACTCCTTCAAGGCTGCACTCAGCTCCCTGGCCAAGCATGGAGAGTATGCTCCTTTCGCCCGACTTTTGAACCTTTCTGGAGTAAATAATCTTGAGCATGGTCTTTTCCCTCAACTGTCGGCAATTGCACTCGGAGTCGCCACAGCCCACGGGAGCACCCTCGCAGGAGTAAATGTTGGAGAACAGTATCAACAGCTCAGAGAGGCAGCCACTGAGGCTGAGAAGCAACTCCAACAATATGCGGAGTCTCGTGAACTTGACCATCTTGGACTTGATGATCAGGAAAAGAAAATTCTTATGAACTTCCATCAGAAAAAGAACGAAATCAGCTTCCAGCAAACAAACGCGATGGTAACTCTAAGAAAAGAGCGCCTGGCCAAGCTGACAGAAGCTATCACTGCTGCATCACTGCCCAAAACAAGTGGACATTACGATGATGATGACGACATTCCCTTTCCAGGACCCATCAATGATGACGACAATCCTGGCCATCAAGATGATGATCCGACTGACTCACAGGATACGACCATTCCCGATGTGGTAGTTGACCCCGATGATGGAGGCTACGGCGAATACCAAAGTTACTCGGAAAACGGCATGAGTGCACCAGATGACTTGGTCCTATTCGATCTAGACGAGGACGACGAGGACACCAAGCCAGTGCCTAACAGATCGACCAAGGGTGGACAACAGAAAAACAGTCAAAAGGGCCAGCATACAGAGG
GCAGACAGACACAATCCACGCCAACTCAAAACGTCACAGGCCCTCGCAGAACAATCCACCATGCCAGTGCTCCACTCACGGACAATGACAGAAGAAACGAACCCTCCGGCTCAACCAGCCCTCGCATGCTGACCCCAATCAACGAAGAGGCAGACCCACTGGACGATGCCGACGACGAGACGTCTAGCCTTCCGCCCTTAGAGTCAGATGATGAAGAACAGGACAGGGACGGAACTTCTAACCGCACACCCACTGTCGCCCCACCGGCTCCCGTATACAGAGATCACTCCGAAAAGAAAGAACTCCCGCAAGATGAACAACAAGATCAGGACCACATTCAAGAGGCCAGGAACCAAGACAGTGACAACACCCAGCCAGAACATTCTTTTGAGGAGATGTATCGCCACATTCTAAGATCACAGGGGCCATTTGATGCCGTTTTGTATTATCATATGATGAAGGATGAGCCTGTAGTTTTCAGTACCAGTGATGGTAAAGAGTACACGTATCCGGACTCCCTTGAAGAGGAATATCCACCATGGCTCACTGAAAAAGAGGCCATGAATGATGAGAATAGATTTGTTACACTGGATGGTCAACAATTTTATTGGCCAGTAATGAATCACAGGAATAAATTCATGGCAATCCTGCAACATCATCAGTGAATGAGCATGTAATAATGGGATGATTTAATCGACAAATAGCTAACATTAAATAGTCAAGGAACGCAAACAGGAAGAATTTTTGATGTCTAAGGTGTGAATTATTATCACAATAAAAGTGATTCTTAGTTTTGAATTTAAAGCTAGCTTATTATTACTAGCCGTTTTTCAAAGTTCAATTTGAGTCTTAATGCAAATAAGCGTTAAGCCACAGTTATAGCCATAATGGTAACTCAATATCTTAGCCAGCGATTTATCTAAATTAAATTACATTATGCTTTTATAACTTACCTACTAGCCTGCCCAACATTTACACGATCGTTTTATAATTAAGAAAAAA</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>11..22</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>11</INSDInterval_from>
          <INSDInterval_to>22</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>other</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>L</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative transcription start signal</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>CDS</INSDFeature_key>
      <INSDFeature_location>425..2644</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>425</INSDInterval_from>
          <INSDInterval_to>2644</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>NP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>involved in encapsidation of genomic RNA</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>codon_start</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transl_table</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>nucleoprotein</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>protein_id</INSDQualifier_name>
          <INSDQualifier_value>AIE11797.1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>db_xref</INSDQualifier_name>
          <INSDQualifier_value>GI:661348596</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>translation</INSDQualifier_name>
          <INSDQualifier_value>MDSRPQKVWMTPSLTESDMDYHKILTAGLSVQQGIVRQRVIPVYQVNNLEEICQLIIQAFEAGVDFQESADSFLLMLCLHHAYQGDYKLFLESGAVKYLEGHGFRFEVKKRDGVKRLEELLPAVSSGRNIKRTLAAMPEEETTEANAGQFLSFASLFLPKLVVGEKACLEKVQRQIQVHAEQGLIQYPTAWQSVGHMMVIFRLMRTNFLIKFLLIHQGMHMVAGHDANDAVISNSVAQARFSGLLIVKTVLDHILQKTERGVRLHPLARTAKVKNEVNSFKAALSSLAKHGEYAPFARLLNLSGVNNLEHGLFPQLSAIALGVATAHGSTLAGVNVGEQYQQLREAATEAEKQLQQYAESRELDHLGLDDQEKKILMNFHQKKNEISFQQTNAMVTLRKERLAKLTEAITAASLPKTSGHYDDDDDIPFPGPINDDDNPGHQDDDPTDSQDTTIPDVVVDPDDGGYGEYQSYSENGMSAPDDLVLFDLDEDDEDTKPVPNRSTKGGQQKNSQKGQHTEGRQTQSTPTQNVTGPRRTIHHASAPLTDNDRRNEPSGSTSPRMLTPINEEADPLDDADDETSSLPPLESDDEEQDRDGTSNRTPTVAPPAPVYRDHSEKKELPQDEQQDQDHIQEARNQDSDNTQPEHSFEEMYRHILRSQGPFDAVLYYHMMKDEPVVFSTSDGKEYTYPDSLEEEYPPWLTEKEAMNDENRFVTLDGQQFYWPVMNHRNKFMAILQHHQ</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>2970..2981</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>2970</INSDInterval_from>
          <INSDInterval_to>2981</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>polyA_signal_sequence</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>NP</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>gene</INSDFeature_key>
      <INSDFeature_location>2987..4362</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>2987</INSDInterval_from>
          <INSDInterval_to>4362</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP35</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>mRNA</INSDFeature_key>
      <INSDFeature_location>2987..4362</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>2987</INSDInterval_from>
          <INSDInterval_to>4362</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP35</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>polymerase complex protein</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transcription</INSDQualifier_name>
          <INSDQualifier_value>GATGAAGATTAAAACCTTCATCATCCTTACGTCAATTGAATTCTCTAGCACTAGAAGCTTATTGTCTTCAATGTAAAAGAAAAGCTGGCCTAACAAGATGACAACTAGAACAAAGGGCAGGGGCCATACTGTGGCCACGACTCAAAACGACAGAATGCCAGGCCCTGAGCTTTCGGGCTGGATCTCTGAGCAGCTAATGACCGGAAGGATTCCTGTAAACGACATCTTCTGTGATATTGAGAACAATCCAGGATTATGCTACGCATCCCAAATGCAACAAACGAAGCCAAACCCGAAGATGCGCAACAGTCAAACCCAAACGGACCCAATTTGCAATCATAGTTTTGAGGAGGTAGTACAAACATTGGCTTCATTGGCTACTGTTGTGCAACAACAAACCATCGCATCAGAATCATTAGAACAACGCATTACGAGTCTTGAGAATGGTCTAAAGCCAGTTTATGATATGGCAAAAACAATCTCCTCATTGAACAGGGTTTGTGCTGAGATGGTTGCAAAATATGATCTTCTGGTGATGACAACCGGTCGGGCAACAGCAACCGCTGCGGCAACTGAGGCTTATTGGGCTGAACATGGTCAACCACCACCTGGACCATCACTTTATGAAGAAAGTGCGATTCGGGGTAAGATTGAATCTAGAGATGAGACTGTCCCTCAAAGTGTTAGGGAGGCATTCAACAATCTAGACAGTACCACTTCACTAACTGAGGAAAATTTTGGGAAACCTGACATTTCGGCAAAGGATTTGAGAAACATTATGTATGATCACTTGCCTGGTTTTGGAACTGCTTTCCACCAATTAGTACAAGTGATTTGTAAATTGGGAAAAGATAGCAATTCATTGGACATTATTCATGCTGAGTTCCAGGCCAGCCTGGCTGAAGGAGACTCCCCTCAATGTGCCCTAATTCAAATTACAAAAAGAGTTCCAATCTTCCAAGATGCTGCTCCACCTGTCATCCACATCCGCTCTCGAGGTGACATTCCCCGAGCTTGCCAGAAGAGCTTGCGTCCAGTCCCACCATCACCCAAGATTGATCGAGGTTGGGTATGTGTTTTTCAGCTTCAAGATGGTAAAACACTTGGACTCAAAATTTGAGCCAATCTCTTTTCCCTCCGAAAGAGGCAACTAATAGCAGAGGCTTCAACTGCTGAACTATAGGGTATGTTACATTAATGATACACTTGTGAGTATCAGCCCTAGATAATATAAGTCAATTAAACAACCAAGATAAAATTGTTCATATCCCGCTAGCAGCTTTAAAGATAAATGTAATAGGAGCTATACCTCTGACAGTATTATAATTAATTGTTATTAAGTAACCCAAACCAAAAATGATGAAGATTAAGAAAAA</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>2987..2998</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>2987</INSDInterval_from>
          <INSDInterval_to>2998</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>other</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP35</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative transcription start signal</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>CDS</INSDFeature_key>
      <INSDFeature_location>3084..4106</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>3084</INSDInterval_from>
          <INSDInterval_to>4106</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP35</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>involved in encapsidation of genomic RNA</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>codon_start</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transl_table</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>polymerase complex protein</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>protein_id</INSDQualifier_name>
          <INSDQualifier_value>AIE11798.1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>db_xref</INSDQualifier_name>
          <INSDQualifier_value>GI:661348597</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>translation</INSDQualifier_name>
          <INSDQualifier_value>MTTRTKGRGHTVATTQNDRMPGPELSGWISEQLMTGRIPVNDIFCDIENNPGLCYASQMQQTKPNPKMRNSQTQTDPICNHSFEEVVQTLASLATVVQQQTIASESLEQRITSLENGLKPVYDMAKTISSLNRVCAEMVAKYDLLVMTTGRATATAAATEAYWAEHGQPPPGPSLYEESAIRGKIESRDETVPQSVREAFNNLDSTTSLTEENFGKPDISAKDLRNIMYDHLPGFGTAFHQLVQVICKLGKDSNSLDIIHAEFQASLAEGDSPQCALIQITKRVPIFQDAAPPVIHIRSRGDIPRACQKSLRPVPPSPKIDRGWVCVFQLQDGKTLGLKI</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>gene</INSDFeature_key>
      <INSDFeature_location>4345..5849</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>4345</INSDInterval_from>
          <INSDInterval_to>5849</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP40</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>mRNA</INSDFeature_key>
      <INSDFeature_location>4345..5849</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>4345</INSDInterval_from>
          <INSDInterval_to>5849</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP40</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>matrix protein</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transcription</INSDQualifier_name>
          <INSDQualifier_value>GATGAAGATTAAGAAAAACCTACCTCGACTGAGAGAGTGTTTTTTCATTAACCTTCATCTTGTAAACGTTGAGCAAAATTGTTAAAAATATGAGGCGGGTTATATTGCCTACTGCTCCTCCTGAATATATGGAGGCCATATACCCTGCCAGGTCAAATTCAACAATTGCTAGGGGTGGCAACAGCAATACAGGCTTCCTGACACCGGAGTCAGTCAATGGAGACACTCCATCGAATCCACTCAGGCCAATTGCTGATGACACCATCGACCATGCCAGCCACACACCAGGCAGTGTGTCATCAGCATTCATCCTCGAAGCTATGGTGAATGTCATATCGGGCCCCAAAGTGCTAATGAAGCAAATTCCAATTTGGCTTCCTCTAGGTGTCGCTGATCAAAAGACCTACAGCTTTGACTCAACTACGGCCGCCATCATGCTTGCTTCATATACTATCACCCATTTCGGCAAGGCAACCAATCCGCTTGTCAGAGTCAATCGGCTGGGTCCTGGAATCCCGGATCACCCCCTCAGGCTCCTGCGAATTGGAAACCAGGCTTTCCTCCAGGAGTTCGTTCTTCCACCAGTCCAACTACCCCAGTATTTCACCTTTGATTTGACAGCACTCAAACTGATCACTCAACCACTGCCTGCTGCAACATGGACCGATGACACTCCAACTGGATCAAATGGAGCGTTGCGTCCAGGAATTTCATTTCATCCAAAACTTCGCCCCATTCTTTTACCCAACAAAAGTGGGAAGAAGGGGAACAGTGCCGATCTAACATCTCCGGAGAAAATCCAAGCAATAATGACTTCACTCCAGGACTTTAAGATCGTTCCAATTGATCCAACCAAAAATATCATGGGTATCGAAGTGCCAGAAACTCTGGTCCACAAGCTGACCGGTAAGAAGGTGACTTCCAAAAATGGACAACCAATCATCCCTGTTCTTTTGCCAAAGTACATTGGGTTGGACCCGGTGGCTCCAGGAGACCTCACCATGGTAATCACACAGGATTGTGACACGTGTCATTCTCCTGCAAGTCTTCCAGCTGTGGTTGAGAAGTAATTGCAATAATTGACTCAGATCCAGTTTTACAGAATCTTCTCAGGGATAGTGATAACATCTTTTTAATAATCCGTCTACTAGAAGAGATACTTCTAATTGATCAATATACTAAAGGTGCTTTACACCATTGTCTCTTTTCTCTCCTAAATGTAGAGCTTAACAAAAGACTCATAATATACCTGTTTTTAAAAGATTGATTGATGAAAGATCATGACTAATAACATTACAAACAATCCTACTATAATCAATACGGTGATTCAAATGTCAATCTTTCTCATTGCACATACTCTTTGTCCTTATCCTCAAATTGCCTACATGCTTACATCTGAGGACAGCCAGTGTGACTTGGATTGGAGATGTGGAGGAAAAATCGGGGCCCATTTCTAAGTTGTTCACAATCTAAGTACAGACATTGCTCTTCTAATTAAGAAAAAA</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>4345..4356</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>4345</INSDInterval_from>
          <INSDInterval_to>4356</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>other</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP40</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative transcription start signal</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>4352..4362</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>4352</INSDInterval_from>
          <INSDInterval_to>4362</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>polyA_signal_sequence</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP35</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>CDS</INSDFeature_key>
      <INSDFeature_location>4434..5414</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>4434</INSDInterval_from>
          <INSDInterval_to>5414</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP40</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>codon_start</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transl_table</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>matrix protein</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>protein_id</INSDQualifier_name>
          <INSDQualifier_value>AIE11799.1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>db_xref</INSDQualifier_name>
          <INSDQualifier_value>GI:661348598</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>translation</INSDQualifier_name>
          <INSDQualifier_value>MRRVILPTAPPEYMEAIYPARSNSTIARGGNSNTGFLTPESVNGDTPSNPLRPIADDTIDHASHTPGSVSSAFILEAMVNVISGPKVLMKQIPIWLPLGVADQKTYSFDSTTAAIMLASYTITHFGKATNPLVRVNRLGPGIPDHPLRLLRIGNQAFLQEFVLPPVQLPQYFTFDLTALKLITQPLPAATWTDDTPTGSNGALRPGISFHPKLRPILLPNKSGKKGNSADLTSPEKIQAIMTSLQDFKIVPIDPTKNIMGIEVPETLVHKLTGKKVTSKNGQPIIPVLLPKYIGLDPVAPGDLTMVITQDCDTCHSPASLPAVVEK</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>5838..5849</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>5838</INSDInterval_from>
          <INSDInterval_to>5849</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>polyA_signal_sequence</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP40</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>gene</INSDFeature_key>
      <INSDFeature_location>5855..8260</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>5855</INSDInterval_from>
          <INSDInterval_to>8260</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>5855..5866</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>5855</INSDInterval_from>
          <INSDInterval_to>5866</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>other</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative transcription start signal</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>mRNA</INSDFeature_key>
      <INSDFeature_location>join(&lt;5994..6878,6878..&gt;8023)</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>5994</INSDInterval_from>
          <INSDInterval_to>6878</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
        <INSDInterval>
          <INSDInterval_from>6878</INSDInterval_from>
          <INSDInterval_to>8023</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_operator>join</INSDFeature_operator>
      <INSDFeature_partial5 value="true"/>
      <INSDFeature_partial3 value="true"/>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>virion spike glycoprotein precursor</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>ribosomal slippage</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transcription</INSDQualifier_name>
          <INSDQualifier_value>ATGGGTGTTACAGGAATATTGCAGTTACCTCGTGATCGATTCAAGAGGACATCATTCTTTCTTTGGGTAATTATCCTTTTCCAAAGAACATTTTCCATCCCGCTTGGAGTTATCCACAATAGTACATTACAGGTTAGTGATGTCGACAAACTAGTTTGTCGTGACAAACTGTCATCCACAAATCAATTGAGATCAGTTGGACTGAATCTCGAGGGGAATGGAGTGGCAACTGACGTGCCATCTGTGACTAAAAGATGGGGCTTCAGGTCCGGTGTCCCACCAAAGGTGGTCAATTATGAAGCTGGTGAATGGGCTGAAAACTGCTACAATCTTGAAATCAAAAAACCTGACGGGAGTGAGTGTCTACCAGCAGCGCCAGACGGGATTCGGGGCTTCCCCCGGTGCCGGTATGTGCACAAAGTATCAGGAACGGGACCATGTGCCGGAGACTTTGCCTTCCACAAAGAGGGTGCTTTCTTCCTGTATGATCGACTTGCTTCCACAGTTATCTACCGAGGAACGACTTTCGCTGAAGGTGTCGTTGCATTTCTGATACTGCCCCAAGCTAAGAAGGACTTCTTCAGCTCACACCCCTTGAGAGAGCCGGTCAATGCAACGGAGGACCCGTCGAGTGGCTATTATTCTACCACAATTAGATATCAGGCTACCGGTTTTGGAACTAATGAGACAGAGTACTTGTTCGAGGTTGACAATTTGACCTACGTCCAACTTGAATCAAGATTCACACCACAGTTTCTGCTCCAGCTGAATGAGACAATATATGCAAGTGGGAAGAGGAGCAACACCACGGGAAAACTAATTTGGAAGGTCAACCCCGAAATTGATACAACAATCGGGGAGTGGGCCTTCTGGGAAACTAAAAAAAACCTCACTAGAAAAATTCGCAGTGAAGAGTTGTCTTTCACAGCTGTATCAAACGGACCCAAAAACATCAGTGGTCAGAGTCCGGCGCGAACTTCTTCCGACCCAGAGACCAACACAACAAATGAAGACCACAAAATCATGGCTTCAGAAAATTCCTCTGCAATGGTTCAAGTGCACAGTCAAGGAAGGAAAGCTGCAGTGTCGCATCTGACAACCCTTGCCACAATCTCCACGAGTCCTCAACCTCCCACAACCAAAACAGGTCCGGACAACAGCACCCATAATACACCCGTGTATAAACTTGACATCTCTGAGGCAACTCAAGTTGGACAACATCACCGTAGAGCAGACAACGACAGCACAGCCTCCGACACTCCCCCCGCCACGACCGCAGCCGGACCCTTAAAAGCAGAGAACACCAACACGAGTAAGAGCGCTGACTCCCTGGACCTCGCCACCACGACAAGCCCCCAAAACTACAGCGAGACTGCTGGCAACAACAACACTCATCACCAAGATACCGGAGAAGAGAGTGCCAGCAGCGGGAAGCTAGGCTTAATTACCAATACTATTGCTGGAGTAGCAGGACTGATCACAGGCGGGAGAAGGACTCGAAGAGAAGTAATTGTCAATGCTCAACCCAAATGCAACCCCAATTTACATTACTGGACTACTCAGGATGAAGGTGCTGCAATCGGATTGGCCTGGATACCATATTTCGGGCCAGCAGCCGAAGGAATTTACACAGAGGGGCTAATGCACAACCAAGATGGTTTAATCTGTGGGTTGAGGCAGCTGGCCAACGAAACGACTCAAGCTCTCCAACTGTTCCTGAGAGCCACAACTGAGCTGCGAACCTTTTCAATCCTCAACCGTAAGGCAATTGACTTCCTGCTGCAGCGATGGGGTGGCACATGCCACATTTTGGGACCGGACTGCTGTATCGAACCACATGATTGGACCAAGAACATAACAGACAAAATTGATCAGATTATTCATGATTTTGTTGATAAAACCCTTCCGGACCAGGGGGACAATGACAATTGGTGGACAGGATGGAGACAATGGATACCGGCAGGTATTG
GAGTTACAGGTGTTATAATTGCAGTTATCGCTTTATTCTGTATATGCAAATTTGTCTTTTAG</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>CDS</INSDFeature_key>
      <INSDFeature_location>join(5994..6878,6878..8023)</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>5994</INSDInterval_from>
          <INSDInterval_to>6878</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
        <INSDInterval>
          <INSDInterval_from>6878</INSDInterval_from>
          <INSDInterval_to>8023</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_operator>join</INSDFeature_operator>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>ribosomal_slippage</INSDQualifier_name>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>additional A residue inserted during transcription; encodes two disulfide linked subunits GP1 and GP2; receptor binding and fusion</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>codon_start</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transl_table</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>virion spike glycoprotein precursor</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>protein_id</INSDQualifier_name>
          <INSDQualifier_value>AIE11800.1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>db_xref</INSDQualifier_name>
          <INSDQualifier_value>GI:661348599</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>translation</INSDQualifier_name>
          <INSDQualifier_value>MGVTGILQLPRDRFKRTSFFLWVIILFQRTFSIPLGVIHNSTLQVSDVDKLVCRDKLSSTNQLRSVGLNLEGNGVATDVPSVTKRWGFRSGVPPKVVNYEAGEWAENCYNLEIKKPDGSECLPAAPDGIRGFPRCRYVHKVSGTGPCAGDFAFHKEGAFFLYDRLASTVIYRGTTFAEGVVAFLILPQAKKDFFSSHPLREPVNATEDPSSGYYSTTIRYQATGFGTNETEYLFEVDNLTYVQLESRFTPQFLLQLNETIYASGKRSNTTGKLIWKVNPEIDTTIGEWAFWETKKNLTRKIRSEELSFTAVSNGPKNISGQSPARTSSDPETNTTNEDHKIMASENSSAMVQVHSQGRKAAVSHLTTLATISTSPQPPTTKTGPDNSTHNTPVYKLDISEATQVGQHHRRADNDSTASDTPPATTAAGPLKAENTNTSKSADSLDLATTTSPQNYSETAGNNNTHHQDTGEESASSGKLGLITNTIAGVAGLITGGRRTRREVIVNAQPKCNPNLHYWTTQDEGAAIGLAWIPYFGPAAEGIYTEGLMHNQDGLICGLRQLANETTQALQLFLRATTELRTFSILNRKAIDFLLQRWGGTCHILGPDCCIEPHDWTKNITDKIDQIIHDFVDKTLPDQGDNDNWWTGWRQWIPAGIGVTGVIIAVIALFCICKFVF</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>mRNA</INSDFeature_key>
      <INSDFeature_location>&lt;5994..&gt;7088</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>5994</INSDInterval_from>
          <INSDInterval_to>7088</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_partial5 value="true"/>
      <INSDFeature_partial3 value="true"/>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>sGP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transcription</INSDQualifier_name>
          <INSDQualifier_value>ATGGGTGTTACAGGAATATTGCAGTTACCTCGTGATCGATTCAAGAGGACATCATTCTTTCTTTGGGTAATTATCCTTTTCCAAAGAACATTTTCCATCCCGCTTGGAGTTATCCACAATAGTACATTACAGGTTAGTGATGTCGACAAACTAGTTTGTCGTGACAAACTGTCATCCACAAATCAATTGAGATCAGTTGGACTGAATCTCGAGGGGAATGGAGTGGCAACTGACGTGCCATCTGTGACTAAAAGATGGGGCTTCAGGTCCGGTGTCCCACCAAAGGTGGTCAATTATGAAGCTGGTGAATGGGCTGAAAACTGCTACAATCTTGAAATCAAAAAACCTGACGGGAGTGAGTGTCTACCAGCAGCGCCAGACGGGATTCGGGGCTTCCCCCGGTGCCGGTATGTGCACAAAGTATCAGGAACGGGACCATGTGCCGGAGACTTTGCCTTCCACAAAGAGGGTGCTTTCTTCCTGTATGATCGACTTGCTTCCACAGTTATCTACCGAGGAACGACTTTCGCTGAAGGTGTCGTTGCATTTCTGATACTGCCCCAAGCTAAGAAGGACTTCTTCAGCTCACACCCCTTGAGAGAGCCGGTCAATGCAACGGAGGACCCGTCGAGTGGCTATTATTCTACCACAATTAGATATCAGGCTACCGGTTTTGGAACTAATGAGACAGAGTACTTGTTCGAGGTTGACAATTTGACCTACGTCCAACTTGAATCAAGATTCACACCACAGTTTCTGCTCCAGCTGAATGAGACAATATATGCAAGTGGGAAGAGGAGCAACACCACGGGAAAACTAATTTGGAAGGTCAACCCCGAAATTGATACAACAATCGGGGAGTGGGCCTTCTGGGAAACTAAAAAAACCTCACTAGAAAAATTCGCAGTGAAGAGTTGTCTTTCACAGCTGTATCAAACGGACCCAAAAACATCAGTGGTCAGAGTCCGGCGCGAACTTCTTCCGACCCAGAGACCAACACAACAAATGAAGACCACAAAATCATGGCTTCAGAAAATTCCTCTGCAATGGTTCAAGTGCACAGTCAAGGAAGGAAAGCTGCAGTGTCGCATCTGA</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>CDS</INSDFeature_key>
      <INSDFeature_location>5994..7088</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>5994</INSDInterval_from>
          <INSDInterval_to>7088</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>sGP secreted as an anti-parallel oriented homodimer; small non-structural secreted glycoprotein</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>codon_start</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transl_table</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>sGP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>protein_id</INSDQualifier_name>
          <INSDQualifier_value>AIE11801.1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>db_xref</INSDQualifier_name>
          <INSDQualifier_value>GI:661348600</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>translation</INSDQualifier_name>
          <INSDQualifier_value>MGVTGILQLPRDRFKRTSFFLWVIILFQRTFSIPLGVIHNSTLQVSDVDKLVCRDKLSSTNQLRSVGLNLEGNGVATDVPSVTKRWGFRSGVPPKVVNYEAGEWAENCYNLEIKKPDGSECLPAAPDGIRGFPRCRYVHKVSGTGPCAGDFAFHKEGAFFLYDRLASTVIYRGTTFAEGVVAFLILPQAKKDFFSSHPLREPVNATEDPSSGYYSTTIRYQATGFGTNETEYLFEVDNLTYVQLESRFTPQFLLQLNETIYASGKRSNTTGKLIWKVNPEIDTTIGEWAFWETKKTSLEKFAVKSCLSQLYQTDPKTSVVRVRRELLPTQRPTQQMKTTKSWLQKIPLQWFKCTVKEGKLQCRI</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>mRNA</INSDFeature_key>
      <INSDFeature_location>join(&lt;5994..6877,6879..&gt;6888)</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>5994</INSDInterval_from>
          <INSDInterval_to>6877</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
        <INSDInterval>
          <INSDInterval_from>6879</INSDInterval_from>
          <INSDInterval_to>6888</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_operator>join</INSDFeature_operator>
      <INSDFeature_partial5 value="true"/>
      <INSDFeature_partial3 value="true"/>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>ssGP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>ribosomal slippage</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transcription</INSDQualifier_name>
          <INSDQualifier_value>ATGGGTGTTACAGGAATATTGCAGTTACCTCGTGATCGATTCAAGAGGACATCATTCTTTCTTTGGGTAATTATCCTTTTCCAAAGAACATTTTCCATCCCGCTTGGAGTTATCCACAATAGTACATTACAGGTTAGTGATGTCGACAAACTAGTTTGTCGTGACAAACTGTCATCCACAAATCAATTGAGATCAGTTGGACTGAATCTCGAGGGGAATGGAGTGGCAACTGACGTGCCATCTGTGACTAAAAGATGGGGCTTCAGGTCCGGTGTCCCACCAAAGGTGGTCAATTATGAAGCTGGTGAATGGGCTGAAAACTGCTACAATCTTGAAATCAAAAAACCTGACGGGAGTGAGTGTCTACCAGCAGCGCCAGACGGGATTCGGGGCTTCCCCCGGTGCCGGTATGTGCACAAAGTATCAGGAACGGGACCATGTGCCGGAGACTTTGCCTTCCACAAAGAGGGTGCTTTCTTCCTGTATGATCGACTTGCTTCCACAGTTATCTACCGAGGAACGACTTTCGCTGAAGGTGTCGTTGCATTTCTGATACTGCCCCAAGCTAAGAAGGACTTCTTCAGCTCACACCCCTTGAGAGAGCCGGTCAATGCAACGGAGGACCCGTCGAGTGGCTATTATTCTACCACAATTAGATATCAGGCTACCGGTTTTGGAACTAATGAGACAGAGTACTTGTTCGAGGTTGACAATTTGACCTACGTCCAACTTGAATCAAGATTCACACCACAGTTTCTGCTCCAGCTGAATGAGACAATATATGCAAGTGGGAAGAGGAGCAACACCACGGGAAAACTAATTTGGAAGGTCAACCCCGAAATTGATACAACAATCGGGGAGTGGGCCTTCTGGGAAACTAAAAAACCTCACTAG</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>CDS</INSDFeature_key>
      <INSDFeature_location>join(5994..6877,6879..6888)</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>5994</INSDInterval_from>
          <INSDInterval_to>6877</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
        <INSDInterval>
          <INSDInterval_from>6879</INSDInterval_from>
          <INSDInterval_to>6888</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_operator>join</INSDFeature_operator>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>ribosomal_slippage</INSDQualifier_name>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>one A residue is deleted or two additional A residues are inserted at the editing site during transcription; second non-structural secreted glycoprotein; secreted in a monomeric form</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>codon_start</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transl_table</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>ssGP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>protein_id</INSDQualifier_name>
          <INSDQualifier_value>AIE11802.1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>db_xref</INSDQualifier_name>
          <INSDQualifier_value>GI:661348601</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>translation</INSDQualifier_name>
          <INSDQualifier_value>MGVTGILQLPRDRFKRTSFFLWVIILFQRTFSIPLGVIHNSTLQVSDVDKLVCRDKLSSTNQLRSVGLNLEGNGVATDVPSVTKRWGFRSGVPPKVVNYEAGEWAENCYNLEIKKPDGSECLPAAPDGIRGFPRCRYVHKVSGTGPCAGDFAFHKEGAFFLYDRLASTVIYRGTTFAEGVVAFLILPQAKKDFFSSHPLREPVNATEDPSSGYYSTTIRYQATGFGTNETEYLFEVDNLTYVQLESRFTPQFLLQLNETIYASGKRSNTTGKLIWKVNPEIDTTIGEWAFWETKKPH</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>misc_feature</INSDFeature_key>
      <INSDFeature_location>7481..7495</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>7481</INSDInterval_from>
          <INSDInterval_to>7495</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>cleavage site; precursor GP is cleaved by subtilisin-like cellularprotease furin into subunits GP1 and GP2 that are linked by a disulfide bond</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>misc_feature</INSDFeature_key>
      <INSDFeature_location>7745..7825</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>7745</INSDInterval_from>
          <INSDInterval_to>7825</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>immunosuppressive motif</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>gene</INSDFeature_key>
      <INSDFeature_location>8243..9695</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>8243</INSDInterval_from>
          <INSDInterval_to>9695</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP30</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>mRNA</INSDFeature_key>
      <INSDFeature_location>8243..9695</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>8243</INSDInterval_from>
          <INSDInterval_to>9695</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP30</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>VP30</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transcription</INSDQualifier_name>
          <INSDQualifier_value>GATGAAGATTAAGAAAAAGGTAATCTTTCGATTATCTTTAGTCTTCATCCTTGATTCTACAATCATGACAGTTGTCTTTAATGAAAAAGGAAAAAAGCCTTTTTATTAAGTTGTAATAATCAGATCTGCAAACCGGTAGAATTTAGTTGTAACCTAACACACACAAAGCATTGGTAAAAAAGTCAATAGAAATTTAAACAGTGAGTGCAGACAACTCTTAAATGGAAGCTTCATATGAGAGAGGACGCCCCCGAGCTGCCAGACAGCATTCAAGGGATGGACACGACCACCATGTTCGAGCACGATCATCATCCAGAGAGAATTATCGAGGTGAGTACCGTCAATCAAGGAGCGCCTCACAAGTGCGCGTTCCTACTGTATTTCATAAGAAGAGAGTTGAACCATTAACAGTTCCTCCAGCACCTAAAGACATATGTCCGACCTTGAAAAAAGGATTTTTGTGTGACAGTAGTTTTTGCAAAAAAGACCACCAGTTAGAAAGTTTAACTGATAGGGAATTACTCCTACTAATCGCCCGTAAGACTTGTGGATCAGTAGAACAACAATTAAATATAACTGCACCCAAGGACTCGCGCTTAGCAAATCCAACGGCTGATGATTTCCAGCAAGAGGAAGGTCCAAAAATTACCTTGTTGACACTGATCAAGACGGCAGAACACTGGGCGAGACAAGACATCCGAACCATAGAGGATTCCAAATTAAGGGCATTGTTAACTCTATGTGCTGTGATGACGAGGAAATTCTCAAAATCCCAGCTGAGTCTTTTGTGTGAGACACACCTAAGGCGCGAAGGGCTTGGGCAAGATCAGGCAGAACCCGTTCTCGAAGTATATCAACGATTACACAGTGATAAAGGAGGCAGTTTTGAAGCTGCACTATGGCAACAATGGGACCGACAATCCCTAATTATGTTTATCACTGCATTCTTGAATATCGCTCTCCAGTTACCGTGTGAAAGTTCTGCTGTCGTTGTTTCAGGGTTAAGAACATTGGTTCCTCAATCAGATAATGAGGAAGCTTCAACCAACCCGGGGACATGCTCATGGTCTGATGAGGGTACCCCTTAATAAGGCTGACTAAAACACTATATAACCTTCTACTTGATCACAATACTCCGTATACCTATCATCATATATTTAATCAAGACGATATCCTTTAAAACTTATTCAGTACTATAATCACTCTCATTTCAAATTGATAAGATATGCATAATTGCCTTAATATATAAAGAGGTATGATATAACCCAAACATTGACCAAAGAAAATCATAATCTCGTATCGCTCGCAATATAACCTGCCAAGCATACCTCTTGCACAAAGTGATTCTTGTACACAAATAATGTTTGACTCTACAGGAGGTAGCAACGATCCATCTCATCAAAAAATAAGTATTTTATGATTTACTAATGATCTCTTAAAATATTAAGAAAAA</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>8243..8254</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>8243</INSDInterval_from>
          <INSDInterval_to>8254</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>other</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP30</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative transcription start signal</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>8250..8260</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>8250</INSDInterval_from>
          <INSDInterval_to>8260</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>polyA_signal_sequence</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>GP</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>CDS</INSDFeature_key>
      <INSDFeature_location>8464..9330</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>8464</INSDInterval_from>
          <INSDInterval_to>9330</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP30</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>minor nucleoprotein; polymerase complex protein</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>codon_start</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transl_table</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>VP30</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>protein_id</INSDQualifier_name>
          <INSDQualifier_value>AIE11803.1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>db_xref</INSDQualifier_name>
          <INSDQualifier_value>GI:661348602</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>translation</INSDQualifier_name>
          <INSDQualifier_value>MEASYERGRPRAARQHSRDGHDHHVRARSSSRENYRGEYRQSRSASQVRVPTVFHKKRVEPLTVPPAPKDICPTLKKGFLCDSSFCKKDHQLESLTDRELLLLIARKTCGSVEQQLNITAPKDSRLANPTADDFQQEEGPKITLLTLIKTAEHWARQDIRTIEDSKLRALLTLCAVMTRKFSKSQLSLLCETHLRREGLGQDQAEPVLEVYQRLHSDKGGSFEAALWQQWDRQSLIMFITAFLNIALQLPCESSAVVVSGLRTLVPQSDNEEASTNPGTCSWSDEGTP</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>9685..9695</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>9685</INSDInterval_from>
          <INSDInterval_to>9695</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>polyA_signal_sequence</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP30</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>gene</INSDFeature_key>
      <INSDFeature_location>9840..11473</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>9840</INSDInterval_from>
          <INSDInterval_to>11473</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP24</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>mRNA</INSDFeature_key>
      <INSDFeature_location>9840..11473</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>9840</INSDInterval_from>
          <INSDInterval_to>11473</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP24</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>VP24</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transcription</INSDQualifier_name>
          <INSDQualifier_value>GATGAAGATTAATGCGGAGGTCTGATGAGAATAAACCTTATTATTCAGATTAGGCCCCAAGAGGCATTCTTCATCTCCTTTTAGCAAAATACTATTTCAGGATAGTCCAGCTAGTGACACGTCTTTTAGCTGTATACCAGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNGAGCTAAAGTGGTCTGTACACATCTCATACATTGTATTAGGGGCAATAATATCTAATTGAACTTAGCCATTTAAAATTTAGTGCATAAATCTGGGCTAACTCCACCAGGTCAACTCCATTGGCTGAAAAGAAGCCCACCTACAACGAACATTACTTTGAGCGCCCTCACAATTAAAAAATAAGAGCGTCGTTCCAACAATCGAGCGCAAGGTTACAAGGTTGAACTGAGAGTGTCTAGACAACAAAATATCGATACTCCAGACACCAAGCAAGACCTGAGAAAAAACCATGGCCAAAGCTACGGGACGATACAATCTAATATCGCCCAAAAAGGACCTGGAGAAAGGGGTTGTCTTAAGCGACCTCTGTAACTTCTTAGTTAGTCAAACTATTCAAGGGTGGAAAGTTTATTGGGCTGGTATTGAGTTTGATGTGACTCACAAAGGAATGGCCCTATTGCATAGACTGAAAACTAATGACTTTGCCCCTGCATGGTCAATGACAAGGAACCTATTTCCCCATTTATTTCAAAATCCGAATTCCACTATTGAATCACCGCTGTGGGCACTGAGAGTCATCCTTGCAGCAGGGATACAGGACCAGTTAATTGACCAGTCTTTGATTGAACCCTTAGCAGGAGCCCTTGGTCTGATCTCTGATTGGCTGCTAACAACCAACACTAACCATTTCAACATGCGAACACAACGTGTCAAGGAACAATTGAGCCTAAAAATGCTGTCGTTGATTCGATCCAATATTCTCAAGTTTATTAACAAATTGGATGCTCTACATGTCGTGAACTACAATGGATTATTGAGCAGTATTGAAATTGGAACTCAAAATCATACAATCATCATAACTCGAACTAACATGGGTTTTCTGGTGGAGCTCCAAGAACCCGACAAATCGGCAATGAACCGCAAGAAGCCTGGGCCGGCGAAATTTTCCCTCCTTCATGAGTCCACACTGAAAGCATTTACACAAGGGTCCTCGACACGAATGCAAAGTTTAATTCTTGAATTCAATAGCTCTCTTGCTATCTAACTAAGATGGAATACTTCATATTGGGCTAACTCATATATGCTGACTCAATAGTTAACTTGACATCTCTGCCTTCATAATCAGATATATAAGCATAATAAATAAATACTCATATTTCTTGATAATTTGTTTAACCACAGATAAATCCTCACTGTAAGCCAGCTTCCAAGTTGACACCCTTACAAAAACCAGGACTCAGAATCCCTCAAATAAGAGATTCCAAGACAACATCATAGAATTGCTTTATTATATTAATAAGCATTTTATCACTAGAAATCCAATATACGAAATGGTTAATTGTAACTAAACCCGCAGGTCATGTGTGTTAGGTTTCACAAATTATATATATTACTAACTCCATACTCGTAACTAACATTAGATAAGTAGGTTAAGAAAAAAGCTTGAGGAAGATTAAGAAAAA</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>9840..9851</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>9840</INSDInterval_from>
          <INSDInterval_to>9851</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>other</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP24</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative transcription start signal</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>CDS</INSDFeature_key>
      <INSDFeature_location>10300..11055</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>10300</INSDInterval_from>
          <INSDInterval_to>11055</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP24</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>membrane-associated protein</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>codon_start</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transl_table</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>VP24</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>protein_id</INSDQualifier_name>
          <INSDQualifier_value>AIE11804.1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>db_xref</INSDQualifier_name>
          <INSDQualifier_value>GI:661348603</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>translation</INSDQualifier_name>
          <INSDQualifier_value>MAKATGRYNLISPKKDLEKGVVLSDLCNFLVSQTIQGWKVYWAGIEFDVTHKGMALLHRLKTNDFAPAWSMTRNLFPHLFQNPNSTIESPLWALRVILAAGIQDQLIDQSLIEPLAGALGLISDWLLTTNTNHFNMRTQRVKEQLSLKMLSLIRSNILKFINKLDALHVVNYNGLLSSIEIGTQNHTIIITRTNMGFLVELQEPDKSAMNRKKPGPAKFSLLHESTLKAFTQGSSTRMQSLILEFNSSLAI</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>11440..11451</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>11440</INSDInterval_from>
          <INSDInterval_to>11451</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>polyA_signal_sequence</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP24</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>gene</INSDFeature_key>
      <INSDFeature_location>11456..18237</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>11456</INSDInterval_from>
          <INSDInterval_to>18237</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>L</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>mRNA</INSDFeature_key>
      <INSDFeature_location>11456..18237</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>11456</INSDInterval_from>
          <INSDInterval_to>18237</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>L</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>polymerase</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transcription</INSDQualifier_name>
          <INSDQualifier_value>GAGGAAGATTAAGAAAAACTGCTTATTGGGTCTTTCCGTGTTTTAGATGAAGCAGTTGACATTCTTCCTCTTGATATTAAATGGCTACACAACATACCCAATACCCAGACGCCAGGTTATCATCACCAATTGTATTGGACCAATGTGACCTTGTCACTAGAGCTTGCGGGTTGTATTCATCATACTCCCTTAATCCGCAACTACGCAACTGTAAACTCCCGAAACATATATACCGTTTAAAATATGATGTAACTGTTACCAAGTTCTTAAGTGATGTACCAGTGGCGACATTGCCCATAGATTTCATAGTCCCAATTCTTCTCAAGGCACTATCAGGCAATGGGTTCTGTCCTGTTGAGCCGCGGTGCCAACAGTTCTTAGATGAAATTATTAAGTACACAATGCAAGATGCTCTCTTCCTGAAATATTATCTCAAAAATGTGGGTGCTCAAGAAGACTGTGTTGATGACCACTTTCAAGAAAAAATCTTATCTTCAATTCAGGGCAATGAATTTTTACATCAAATGTTTTTCTGGTATGACCTGGCTATTTTAACTCGAAGGGGTAGATTAAATCGAGGAAACTCTAGATCAACGTGGTTTGTTCATGATGATTTAATAGACATCTTAGGCTATGGGGACTATGTTTTTTGGAAGATCCCAATTTCACTGTTACCACTGAACACACAAGGAATCCCCCATGCTGCTATGGATTGGTATCAGACATCAGTATTCAAAGAAGCGGTTCAAGGGCATACACACATTGTTTCTGTTTCTACTGCCGATGTCTTGATAATGTGCAAAGATTTAATTACATGTCGATTCAACACAACTCTAATCTCAAAAATAGCAGAGGTTGAGGACCCAGTTTGCTCTGATTATCCCAATTTTAAGATTGTGTCTATGCTTTACCAGAGCGGAGATTACTTACTCTCCATATTAGGGTCTGATGGGTATAAAATCATTAAGTTTCTCGAACCATTGTGCTTGGCTAAAATTCAATTGTGCTCAAAGTACACCGAGAGGAAGGGCCGATTCTTAACACAAATGCATTTAGCTGTAAATCACACCCTGGAAGAAATTACAGAAATACGTGCACTAAAGCCTTCACAGGCTCACAAGATCCGTGAATTCCATAGAACATTGATAAGGCTGGAGATGACGCCACAACAACTTTGTGAGCTATTTTCCATACAAAAACACTGGGGGCATCCTGTGCTACATAGTGAAACAGCAATCCAAAAAGTTAAAAAACATGCTACGGTGCTAAAAGCATTACGCCCTATCGTGATTTTCGAGACATATTGTGTTTTTAAATATAGCATTGCAAAACATTATTTTGATAGTCAAGGATCTTGGTACAGTGTTACCTCAGATAGAAATCTAACACCAGGTCTTAATTCTTATATCAAAAGAAATCAATTCCCTCCGTTGCCAATGATTAAAGAACTGCTATGGGAATTTTACCACCTTGACCATCCTCCACTTTTCTCAACCAAAATTATTAGTGACTTAAGTATTTTTATAAAAGACAGAGCTACTGCAGTAGAAAGGACATGCTGGGATGCAGTATTCGAGCCTAATGTTCTGGGATATAATCCACCTCACAAATTCAGTACCAAACGTGTACCGGAACAATTTTTAGAGCAAGAAAACTTTTCTATTGAGAATGTTCTTTCCTACGCGCAAAAACTCGAGTATCTACTACCACAATATCGGAATTTTTCTTTCTCATTGAAAGAGAAAGAGTTGAATGTAGGTAGAACTTTCGGAAAATTGCCTTATCCGACTCGCAATGTTCAAACACTTTGTGAAGCTCTGTTAGCTGATGGTCTTGCTAAAGCATTTCCTAGCAATATGATGGTAGTTACGGAACGTGAACAAAAAGAAAGCTTATTGCATCAAGCATCATGGCACCACACAAGTGATGATTTCGGTGAGCATGCCACAGTTAGAGGGAGTAGCTTTGT
AACTGATTTAGAGAAATACAATCTTGCATTTAGGTATGAGTTTACAGCACCTTTTATAGAATATTGCAACCGTTGCTATGGTGTTAAGAATGTTTTTAATTGGATGCATTATACAATCCCACAGTGTTATATGCATGTCAGTGATTATTATAATCCACCGCATAACCTCACACTGGAAAATCGAAACAACCCCCCTGAAGGGCCTAGTTCATACAGGGGTCATATGGGAGGGATTGAAGGACTGCAACAAAAACTCTGGACAAGTATTTCATGTGCTCAAATTTCTTTAGTTGAAATTAAGACTGGTTTTAAGTTGCGCTCAGCTGTGATGGGTGACAATCAGTGCATTACCGTTTTATCAGTCTTCCCCTTAGAGACTGATGCAGGCGAGCAGGAACAGAGCGCCGAGGACAATGCAGCGAGGGTGGCCGCCAGCCTAGCAAAAGTTACAAGTGCCTGTGGAATCTTTTTAAAACCTGATGAAACATTTGTACATTCAGGTTTTATCTATTTTGGAAAAAAACAATATTTGAATGGGGTCCAATTGCCTCAGTCCCTTAAAACGGCTACAAGAATGGCACCATTGTCTGATGCAATTTTTGATGATCTTCAAGGGACCCTGGCTAGTATAGGTACTGCTTTTGAGCGATCCATCTCTGAGACACGACATATCTTTCCTTGCAGAATAACCGCAGCTTTCCATACGTTCTTTTCGGTGAGAATCTTGCAATATCATCACCTCGGATTTAATAAAGGTTTTGACCTTGGACAGTTAACACTCGGCAAACCTCTGGATTTCGGAACAATATCATTGGCACTAGCGGTACCGCAGGTGCTTGGAGGGTTATCCTTCTTGAATCCTGAGAAATGTTTCTACCGGAATCTAGGAGATCCAGTTACCTCAGGTTTATTCCAGTTAAAAACTTATCTCCGAATGATTGAGATGGATGATTTATTCTTACCTTTAATTGCGAAGAACCCTGGGAACTGCACTGCCATTGACTTTGTGCTAAATCCTAGCGGATTAAATGTTCCTGGGTCGCAAGACTTAACTTCATTTCTGCGCCAGATTGTACGTAGGACTATCACCCTAAGTGCGAAAAACAAACTTATTAATACCTTATTTCATGCATCAGCTGACTTCGAAGACGAAATGGTTTGTAAGTGGCTCTTATCATCAACTCCTGTTATGAGTCGTTTCGCAGCCGATATATTTTCACGCACGCCGAGCGGGAAGCGATTGCAAATTCTAGGATACTTGGAAGGAACACGCACATTATTAGCCTCTAAGATCATCAACAATAATACAGAGACGCCGGTTTTGGACAGACTGAGGAAGATAACATTGCAAAGGTGGAGTCTATGGTTTAGTTATCTTGATCATTGTGATAATATCCTGGCGGAGGCTTTAACCCAAATAACTTGCACAGTTGATTTAGCACAGATCCTGAGGGAATATTCATGGGCACATATTTTAGAGGGGAGACCTCTTATTGGAGCCACACTCCCATGTATGATTGAGCAATTCAAAGTGGTTTGGCTGAAACCCTACGAACAATGTCCGCAGTGTTCAAATGCCAAGCAACCTGGTGGGAAACCATTCGTGTCAGTAGCAGTCAAGAAACATATTGTTAGTGCATGGCCAAATGCATCCCGAATAAGCTGGACTATCGGGGATGGAATCCCATACATTGGATCAAGGACAGAAGATAAGATAGGGCAACCTGCTATTAAACCAAAATGTCCTTCCGCAGCCTTAAGAGAGGCCATTGAATTGGCGTCCCGTTTAACATGGGTAACTCAAGGCAGTTCGAACAGTGACTTGCTAATAAAACCATTTTTGGAAGCACGAGTAAATTTAAGTGTTCAAGAAATACTTCAAATGACCCCTTCACATTACTCGGGAAATATTGTTCATAGGTACAACGATCAATACAGTCCTCATTCTTTCATGGCCAATCGTATGAGTAACTCAGCAACGCGATTGATTGTTTCTACAA
ACACTTTAGGTGAGTTTTCAGGAGGTGGCCAATCGGCACGCGACAGCAATATTATTTTCCAGAATGTTATAAATTATGCAGTTGCACTGTTCGATATTAAATTTAGAAACACTGAGGCTACAGATATCCAGTATAATCGTGCTCACCTTCATCTAACTAAGTGTTGCACCCGGGAGGTACCAGCTCAGTACTTAACATACACATCTACATTGGATTTAGATTTAACAAGATACCGAGAAAATGAATTGATTTATGACAATAATCCTCTAAAAGGAGGACTCAATTGCAATATCTCATTTGATAACCCATTTTTCCAAGGCAAACAGCTGAACATTATAGAAGATGACCTTATTCGACTGCCTCACTTATCTGGATGGGAGCTAGCTAAGACCATCATGCAATCAATTATTTCAGATAGCAATAATTCGTCTACAGACCCAATTAGCAGTGGAGAAACAAGATCATTCACTACCCATTTCTTAACTTATCCCAAGATAGGACTTCTGTACAGTTTTGGGGCCTTTGTAAGTTATTATCTTGGCAATACAATTCTTCGGACTAAGAAATTAACACTTGACAATTTTTTATATTACTTAACTACCCAAATTCATAATCTACCACATCGCTCATTGCGAATACTTAAGCCAACATTCAAACATGCAAGCGTTATGTCACGATTAATGAGTATTGATCCCCATTTTTCTATTTACATAGGCGGTGCTGCAGGTGACAGAGGACTCTCAGATGCGGCCAGGTTATTTTTGAGAACGTCCATTTCATCTTTTCTTACATTTGTAAAGGAATGGATAATTAATCGCGGAACAATTGTCCCTTTATGGATAGTATATCCATTAGAGGGTCAAAATCCAACACCTGTTAATAATTTCCTCCATCAGATCGTAGAACTGCTGGTGCATGATTCATCAAGACACCAGGCTTTTAAAACTACCATAAATGATCATGTACATCCTCACGACAATCTTGTTTACACATGTAAGAGTACAGCCAGCAATTTCTTCCATGCGTCATTGGCGTACTGGAGGAGCAGGCACAGAAACAGCAACCGAAAAGACTTGACAAGAAACTCTTCAACTGGATCAAGCACAAACAACAGTGATGGTCATATTAAGAGAAGTCAAGAACAAACCACCAGAGATCCACATGATGGCACTGAACGGAGTCTAGTCCTGCAAATGAGCCATGAAATAAAAAGAACGACAATTCCACAAGAGAACACGCACCAGGGTCCGTCGTTCCAGTCATTTCTAAGTGACTCTGCTTGCGGTACAGCAAACCCAAAACTAAATTTCGATAGATCGAGACACAATGTGAAATCTCAGGATCATAACTCAGCATCCAAGAGGGAAGGTCATCAAATAATCTCACATCGTCTAGTCCTACCTTTCTTTACATTATCTCAAGGGACACGCCAATTAACGTCATCCAATGAGTCACAAACCCAAGATGAGATATCAAAGTACTTACGGCAATTGAGATCCGTCATTGATACCACAGTTTATTGTAGGTTTACCGGTATAGTCTCGTCCATGCATTACAAACTTGATGAGGTCCTTTGGGAAATAGAGAATTTTAAGTCGGCTGTGACGCTGGCAGAGGGAGAAGGTGCTGGTGCCTTACTATTGATTCAGAAATACCAAGTTAAGACCTTATTTTTCAACACGCTAGCTACTGAGTCCAGTATAGAGTCAGAAATAGTATCAGGAATGACTACTCCTAGGATGCTTCTACCTGTTATGTCAAAATTCCATAATGACCAAATTGAGATTATTCTTAACAACTCAGCAAGCCAAATAACAGACATAACAAATCCTACTTGGTTTAAAGACCAAAGAGCAAGGCTACCTAGGCAAGTCGAGGTTATAACCATGGATGCAGAGACGACAGAGAATATAAACAGATCGAAATTGTACGAAGCTGTACATAAATTGATCTTACACCATGTTGATCCCAGCGTATTGAAAGCAGTGGTCCTTAAAGTC
TTTCTAAGTGATACCGAGGGTATGTTATGGCTAAATGATAATCTAGCCCCGTTTTTTGCCACTGGGTATTTAATTAAGCCAATAACGTCAAGTGCCAGGTCTAGTGAGTGGTATCTTTGTCTGACGAACTTCTTATCAACTACACGTAAGATGCCACACCAAAACCATCTCAGTTGTAAGCAGGTAATACTTACGGCATTGCAACTGCAAATTCAACGGAGCCCATACTGGCTAAGTCATTTAACTCAGTATGCTGACTGCGATTTACATTTAAGCTATATCCGCCTTGGTTTTCCATCATTAGAGAAAGTACTATACCACAGGTATAACCTTGTCGATTCAAAAAGAGGTCCACTAGTCTCTGTCACTCAGCACTTAGCACATCTTAGGGCAGAGATTCGAGAATTGACCAATGATTATAATCAACAGCGACAAAGTCGGACTCAAACATATCACTTTATTCGTACTGCAAAAGGACGAATCACAAAACTAGTCAATGATTATTTAAAATTCTTTCTTATTGTACAAGCATTAAAACATAATGGGACATGGCAAGCTGAGTTTAAGAAATTACCAGAGTTGATTAGTGTGTGCAATAGGTTCTATCATATTAGAGATTGTAATTGTGAAGAACGTTTCTTAGTTCAAACCTTATATTTACATAGAATGCAGGATTCTGAAGTTAAGCTTATCGAAAGGCTGACAGGGCTTCTGAGTTTATTTCCAGATGGTCTCTACAGGTTCGATTGAATAACCGTGCATAGTATTTTGATACTTGTAAAGGTTGGTTATCAACATACAGATTATAAAAAA</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>11456..11467</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>11456</INSDInterval_from>
          <INSDInterval_to>11467</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>other</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>L</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative transcription start signal</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>11463..11473</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>11463</INSDInterval_from>
          <INSDInterval_to>11473</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>polyA_signal_sequence</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>VP24</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>putative</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>CDS</INSDFeature_key>
      <INSDFeature_location>11536..18174</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>11536</INSDInterval_from>
          <INSDInterval_to>18174</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>L</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>note</INSDQualifier_name>
          <INSDQualifier_value>involved in synthesis of viral RNAs and transcriptional RNA editing</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>codon_start</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>transl_table</INSDQualifier_name>
          <INSDQualifier_value>1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>product</INSDQualifier_name>
          <INSDQualifier_value>polymerase</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>protein_id</INSDQualifier_name>
          <INSDQualifier_value>AIE11805.1</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>db_xref</INSDQualifier_name>
          <INSDQualifier_value>GI:661348604</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>translation</INSDQualifier_name>
          <INSDQualifier_value>MATQHTQYPDARLSSPIVLDQCDLVTRACGLYSSYSLNPQLRNCKLPKHIYRLKYDVTVTKFLSDVPVATLPIDFIVPILLKALSGNGFCPVEPRCQQFLDEIIKYTMQDALFLKYYLKNVGAQEDCVDDHFQEKILSSIQGNEFLHQMFFWYDLAILTRRGRLNRGNSRSTWFVHDDLIDILGYGDYVFWKIPISLLPLNTQGIPHAAMDWYQTSVFKEAVQGHTHIVSVSTADVLIMCKDLITCRFNTTLISKIAEVEDPVCSDYPNFKIVSMLYQSGDYLLSILGSDGYKIIKFLEPLCLAKIQLCSKYTERKGRFLTQMHLAVNHTLEEITEIRALKPSQAHKIREFHRTLIRLEMTPQQLCELFSIQKHWGHPVLHSETAIQKVKKHATVLKALRPIVIFETYCVFKYSIAKHYFDSQGSWYSVTSDRNLTPGLNSYIKRNQFPPLPMIKELLWEFYHLDHPPLFSTKIISDLSIFIKDRATAVERTCWDAVFEPNVLGYNPPHKFSTKRVPEQFLEQENFSIENVLSYAQKLEYLLPQYRNFSFSLKEKELNVGRTFGKLPYPTRNVQTLCEALLADGLAKAFPSNMMVVTEREQKESLLHQASWHHTSDDFGEHATVRGSSFVTDLEKYNLAFRYEFTAPFIEYCNRCYGVKNVFNWMHYTIPQCYMHVSDYYNPPHNLTLENRNNPPEGPSSYRGHMGGIEGLQQKLWTSISCAQISLVEIKTGFKLRSAVMGDNQCITVLSVFPLETDAGEQEQSAEDNAARVAASLAKVTSACGIFLKPDETFVHSGFIYFGKKQYLNGVQLPQSLKTATRMAPLSDAIFDDLQGTLASIGTAFERSISETRHIFPCRITAAFHTFFSVRILQYHHLGFNKGFDLGQLTLGKPLDFGTISLALAVPQVLGGLSFLNPEKCFYRNLGDPVTSGLFQLKTYLRMIEMDDLFLPLIAKNPGNCTAIDFVLNPSGLNVPGSQDLTSFLRQIVRRTITLSAKNKLINTLFHASADFEDEMVCKWLLSSTPVMSRFAADIFSRTPSGKRLQILGYLEGTRTLLASKIINNNTETPVLDRLRKITLQRWSLWFSYLDHCDNILAEALTQITCTVDLAQILREYSWAHILEGRPLIGATLPCMIEQFKVVWLKPYEQCPQCSNAKQPGGKPFVSVAVKKHIVSAWPNASRISWTIGDGIPYIGSRTEDKIGQPAIKPKCPSAALREAIELASRLTWVTQGSSNSDLLIKPFLEARVNLSVQEILQMTPSHYSGNIVHRYNDQYSPHSFMANRMSNSATRLIVSTNTLGEFSGGGQSARDSNIIFQNVINYAVALFDIKFRNTEATDIQYNRAHLHLTKCCTREVPAQYLTYTSTLDLDLTRYRENELIYDNNPLKGGLNCNISFDNPFFQGKQLNIIEDDLIRLPHLSGWELAKTIMQSIISDSNNSSTDPISSGETRSFTTHFLTYPKIGLLYSFGAFVSYYLGNTILRTKKLTLDNFLYYLTTQIHNLPHRSLRILKPTFKHASVMSRLMSIDPHFSIYIGGAAGDRGLSDAARLFLRTSISSFLTFVKEWIINRGTIVPLWIVYPLEGQNPTPVNNFLHQIVELLVHDSSRHQAFKTTINDHVHPHDNLVYTCKSTASNFFHASLAYWRSRHRNSNRKDLTRNSSTGSSTNNSDGHIKRSQEQTTRDPHDGTERSLVLQMSHEIKRTTIPQENTHQGPSFQSFLSDSACGTANPKLNFDRSRHNVKSQDHNSASKREGHQIISHRLVLPFFTLSQGTRQLTSSNESQTQDEISKYLRQLRSVIDTTVYCRFTGIVSSMHYKLDEVLWEIENFKSAVTLAEGEGAGALLLIQKYQVKTLFFNTLATESSIESEIVSGMTTPRMLLPVMSKFHNDQIEIILNNSASQITDITNPTWFKDQRARLPRQVEVITMDAETTENINRSKLYEAVHKLILHHVDPSVLKAVVLKVFLSDTE
GMLWLNDNLAPFFATGYLIKPITSSARSSEWYLCLTNFLSTTRKMPHQNHLSCKQVILTALQLQIQRSPYWLSHLTQYADCDLHLSYIRLGFPSLEKVLYHRYNLVDSKRGPLVSVTQHLAHLRAEIRELTNDYNQQRQSRTQTYHFIRTAKGRITKLVNDYLKFFLIVQALKHNGTWQAEFKKLPELISVCNRFYHIRDCNCEERFLVQTLYLHRMQDSEVKLIERLTGLLSLFPDGLYRFD</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
    <INSDFeature>
      <INSDFeature_key>regulatory</INSDFeature_key>
      <INSDFeature_location>18227..18237</INSDFeature_location>
      <INSDFeature_intervals>
        <INSDInterval>
          <INSDInterval_from>18227</INSDInterval_from>
          <INSDInterval_to>18237</INSDInterval_to>
          <INSDInterval_accession>KM034549.1</INSDInterval_accession>
        </INSDInterval>
      </INSDFeature_intervals>
      <INSDFeature_quals>
        <INSDQualifier>
          <INSDQualifier_name>regulatory_class</INSDQualifier_name>
          <INSDQualifier_value>polyA_signal_sequence</INSDQualifier_value>
        </INSDQualifier>
        <INSDQualifier>
          <INSDQualifier_name>gene</INSDQualifier_name>
          <INSDQualifier_value>L</INSDQualifier_value>
        </INSDQualifier>
      </INSDFeature_quals>
    </INSDFeature>
  </INSDSeq_feature-table>
  <INSDSeq_sequence>gaataactatgaggaagattaataattttcctctcattgaaatttatatcggaatttaaattgaaattgttactgtaatcatacctggtttgtttcagagccatatcaccaagatagagaacaacctaggtctccggagggggcaagggcatcagtgtgctcagttgaaaatcccttgtcaacatctaggccttatcacatcacaagttccgccttaaactctgcagggtgatccaacaaccttaatagcaacattattgttaaaggacagcattagttcacagtcaaacaagcaagattgagaattaactttgattttgaacctgaacacccagaggactggagactcaacaaccctaaagcctggggtaaaacattagaaatagtttaaagacaaattgctcggaatcacaaaattccgagtatggattctcgtcctcagaaagtctggatgacgccgagtctcactgaatctgacatggattaccacaagatcttgacagcaggtctgtccgttcaacaggggattgttcggcaaagagtcatcccagtgtatcaagtaaacaatcttgaggaaatttgccaacttatcatacaggcctttgaagctggtgttgattttcaagagagtgcggacagtttccttctcatgctttgtcttcatcatgcgtaccaaggagattacaaacttttcttggaaagtggcgcagtcaagtatttggaagggcacgggttccgttttgaagtcaagaagcgtgatggagtgaagcgccttgaggaattgctgccagcagtatctagtgggagaaacattaagagaacacttgctgccatgccggaagaggagacgactgaagctaatgccggtcagttcctctcctttgcaagtctattccttccgaaattggtagtaggagaaaaggcttgccttgagaaggttcaaaggcaaattcaagtacatgcagagcaaggactgatacaatatccaacagcttggcaatcagtaggacacatgatggtgattttccgtttgatgcgaacaaattttttgatcaaatttcttctaatacaccaagggatgcacatggttgccggacatgatgccaacgatgctgtgatttcaaattcagtggctcaagctcgtttttcaggtctattgattgtcaaaacagtacttgatcatatcctacaaaagacagaacgaggagttcgtctccatcctcttgcaaggaccgccaaggtaaaaaatgaggtgaactccttcaaggctgcactcagctccctggccaagcatggagagtatgctcctttcgcccgacttttgaacctttctggagtaaataatcttgagcatggtcttttccctcaactgtcggcaattgcactcggagtcgccacagcccacgggagcaccctcgcaggagtaaatgttggagaacagtatcaacagctcagagaggcagccactgaggctgagaagcaactccaacaatatgcggagtctcgtgaacttgaccatcttggacttgatgatcaggaaaagaaaattcttatgaacttccatcagaaaaagaacgaaatcagcttccagcaaacaaacgcgatggtaactctaagaaaagagcgcctggccaagctgacagaagctatcactgctgcatcactgcccaaaacaagtggacattacgatgatgatgacgacattccctttccaggacccatcaatgatgacgacaatcctggccatcaagatgatgatccgactgactcacaggatacgaccattcccgatgtggtagttgaccccgatgatggaggctacggcgaataccaaagttactcggaaaacggcatgagtgcaccagatgacttggtcctattcgatctagacgaggacgacgaggacaccaagccagtgcctaacagatcgaccaagggtggacaacagaaaaacagtcaaaagggccagcatacagaggg
cagacagacacaatccacgccaactcaaaacgtcacaggccctcgcagaacaatccaccatgccagtgctccactcacggacaatgacagaagaaacgaaccctccggctcaaccagccctcgcatgctgaccccaatcaacgaagaggcagacccactggacgatgccgacgacgagacgtctagccttccgcccttagagtcagatgatgaagaacaggacagggacggaacttctaaccgcacacccactgtcgccccaccggctcccgtatacagagatcactccgaaaagaaagaactcccgcaagatgaacaacaagatcaggaccacattcaagaggccaggaaccaagacagtgacaacacccagccagaacattcttttgaggagatgtatcgccacattctaagatcacaggggccatttgatgccgttttgtattatcatatgatgaaggatgagcctgtagttttcagtaccagtgatggtaaagagtacacgtatccggactcccttgaagaggaatatccaccatggctcactgaaaaagaggccatgaatgatgagaatagatttgttacactggatggtcaacaattttattggccagtaatgaatcacaggaataaattcatggcaatcctgcaacatcatcagtgaatgagcatgtaataatgggatgatttaatcgacaaatagctaacattaaatagtcaaggaacgcaaacaggaagaatttttgatgtctaaggtgtgaattattatcacaataaaagtgattcttagttttgaatttaaagctagcttattattactagccgtttttcaaagttcaatttgagtcttaatgcaaataagcgttaagccacagttatagccataatggtaactcaatatcttagccagcgatttatctaaattaaattacattatgcttttataacttacctactagcctgcccaacatttacacgatcgttttataattaagaaaaaactaatgatgaagattaaaaccttcatcatccttacgtcaattgaattctctagcactagaagcttattgtcttcaatgtaaaagaaaagctggcctaacaagatgacaactagaacaaagggcaggggccatactgtggccacgactcaaaacgacagaatgccaggccctgagctttcgggctggatctctgagcagctaatgaccggaaggattcctgtaaacgacatcttctgtgatattgagaacaatccaggattatgctacgcatcccaaatgcaacaaacgaagccaaacccgaagatgcgcaacagtcaaacccaaacggacccaatttgcaatcatagttttgaggaggtagtacaaacattggcttcattggctactgttgtgcaacaacaaaccatcgcatcagaatcattagaacaacgcattacgagtcttgagaatggtctaaagccagtttatgatatggcaaaaacaatctcctcattgaacagggtttgtgctgagatggttgcaaaatatgatcttctggtgatgacaaccggtcgggcaacagcaaccgctgcggcaactgaggcttattgggctgaacatggtcaaccaccacctggaccatcactttatgaagaaagtgcgattcggggtaagattgaatctagagatgagactgtccctcaaagtgttagggaggcattcaacaatctagacagtaccacttcactaactgaggaaaattttgggaaacctgacatttcggcaaaggatttgagaaacattatgtatgatcacttgcctggttttggaactgctttccaccaattagtacaagtgatttgtaaattgggaaaagatagcaattcattggacattattcatgctgagttccaggccagcctggctgaaggagactcccctcaatgtgccctaattcaaattacaaaaagagttccaatcttccaagatgctgctccacctgtcatccacatccgctct
cgaggtgacattccccgagcttgccagaagagcttgcgtccagtcccaccatcacccaagattgatcgaggttgggtatgtgtttttcagcttcaagatggtaaaacacttggactcaaaatttgagccaatctcttttccctccgaaagaggcaactaatagcagaggcttcaactgctgaactatagggtatgttacattaatgatacacttgtgagtatcagccctagataatataagtcaattaaacaaccaagataaaattgttcatatcccgctagcagctttaaagataaatgtaataggagctatacctctgacagtattataattaattgttattaagtaacccaaaccaaaaatgatgaagattaagaaaaacctacctcgactgagagagtgttttttcattaaccttcatcttgtaaacgttgagcaaaattgttaaaaatatgaggcgggttatattgcctactgctcctcctgaatatatggaggccatataccctgccaggtcaaattcaacaattgctaggggtggcaacagcaatacaggcttcctgacaccggagtcagtcaatggagacactccatcgaatccactcaggccaattgctgatgacaccatcgaccatgccagccacacaccaggcagtgtgtcatcagcattcatcctcgaagctatggtgaatgtcatatcgggccccaaagtgctaatgaagcaaattccaatttggcttcctctaggtgtcgctgatcaaaagacctacagctttgactcaactacggccgccatcatgcttgcttcatatactatcacccatttcggcaaggcaaccaatccgcttgtcagagtcaatcggctgggtcctggaatcccggatcaccccctcaggctcctgcgaattggaaaccaggctttcctccaggagttcgttcttccaccagtccaactaccccagtatttcacctttgatttgacagcactcaaactgatcactcaaccactgcctgctgcaacatggaccgatgacactccaactggatcaaatggagcgttgcgtccaggaatttcatttcatccaaaacttcgccccattcttttacccaacaaaagtgggaagaaggggaacagtgccgatctaacatctccggagaaaatccaagcaataatgacttcactccaggactttaagatcgttccaattgatccaaccaaaaatatcatgggtatcgaagtgccagaaactctggtccacaagctgaccggtaagaaggtgacttccaaaaatggacaaccaatcatccctgttcttttgccaaagtacattgggttggacccggtggctccaggagacctcaccatggtaatcacacaggattgtgacacgtgtcattctcctgcaagtcttccagctgtggttgagaagtaattgcaataattgactcagatccagttttacagaatcttctcagggatagtgataacatctttttaataatccgtctactagaagagatacttctaattgatcaatatactaaaggtgctttacaccattgtctcttttctctcctaaatgtagagcttaacaaaagactcataatatacctgtttttaaaagattgattgatgaaagatcatgactaataacattacaaacaatcctactataatcaatacggtgattcaaatgtcaatctttctcattgcacatactctttgtccttatcctcaaattgcctacatgcttacatctgaggacagccagtgtgacttggattggagatgtggaggaaaaatcggggcccatttctaagttgttcacaatctaagtacagacattgctcttctaattaagaaaaaatcggcgatgaagattaagccgacagtgagcgtaatcttcatctctcttagattatttgtcttccagagtaggggtcatcaggtccttttcaattggataaccaaaataagcttcactagaaggatattgtg
aggcgacaacacaatgggtgttacaggaatattgcagttacctcgtgatcgattcaagaggacatcattctttctttgggtaattatccttttccaaagaacattttccatcccgcttggagttatccacaatagtacattacaggttagtgatgtcgacaaactagtttgtcgtgacaaactgtcatccacaaatcaattgagatcagttggactgaatctcgaggggaatggagtggcaactgacgtgccatctgtgactaaaagatggggcttcaggtccggtgtcccaccaaaggtggtcaattatgaagctggtgaatgggctgaaaactgctacaatcttgaaatcaaaaaacctgacgggagtgagtgtctaccagcagcgccagacgggattcggggcttcccccggtgccggtatgtgcacaaagtatcaggaacgggaccatgtgccggagactttgccttccacaaagagggtgctttcttcctgtatgatcgacttgcttccacagttatctaccgaggaacgactttcgctgaaggtgtcgttgcatttctgatactgccccaagctaagaaggacttcttcagctcacaccccttgagagagccggtcaatgcaacggaggacccgtcgagtggctattattctaccacaattagatatcaggctaccggttttggaactaatgagacagagtacttgttcgaggttgacaatttgacctacgtccaacttgaatcaagattcacaccacagtttctgctccagctgaatgagacaatatatgcaagtgggaagaggagcaacaccacgggaaaactaatttggaaggtcaaccccgaaattgatacaacaatcggggagtgggccttctgggaaactaaaaaaacctcactagaaaaattcgcagtgaagagttgtctttcacagctgtatcaaacggacccaaaaacatcagtggtcagagtccggcgcgaacttcttccgacccagagaccaacacaacaaatgaagaccacaaaatcatggcttcagaaaattcctctgcaatggttcaagtgcacagtcaaggaaggaaagctgcagtgtcgcatctgacaacccttgccacaatctccacgagtcctcaacctcccacaaccaaaacaggtccggacaacagcacccataatacacccgtgtataaacttgacatctctgaggcaactcaagttggacaacatcaccgtagagcagacaacgacagcacagcctccgacactccccccgccacgaccgcagccggacccttaaaagcagagaacaccaacacgagtaagagcgctgactccctggacctcgccaccacgacaagcccccaaaactacagcgagactgctggcaacaacaacactcatcaccaagataccggagaagagagtgccagcagcgggaagctaggcttaattaccaatactattgctggagtagcaggactgatcacaggcgggagaaggactcgaagagaagtaattgtcaatgctcaacccaaatgcaaccccaatttacattactggactactcaggatgaaggtgctgcaatcggattggcctggataccatatttcgggccagcagccgaaggaatttacacagaggggctaatgcacaaccaagatggtttaatctgtgggttgaggcagctggccaacgaaacgactcaagctctccaactgttcctgagagccacaactgagctgcgaaccttttcaatcctcaaccgtaaggcaattgacttcctgctgcagcgatggggtggcacatgccacattttgggaccggactgctgtatcgaaccacatgattggaccaagaacataacagacaaaattgatcagattattcatgattttgttgataaaacccttccggaccagggggacaatgacaattggtggacaggatggagacaatggataccggcaggtattggagttacaggtgttataat
tgcagttatcgctttattctgtatatgcaaatttgtcttttagtctttcttcagattgtttcacggcaaaactcaacctcaaatcaatgaaactaggatttaattatatgaatcacttgaatctaagattacttgacaaatgataacataatacactggagcttcaaacatagccaatgtgattctaactcctttaaactcacagttaatcataaacaaggtttgacatcaatctagctatatctttaagaatgataaacttgatgaagattaagaaaaaggtaatctttcgattatctttagtcttcatccttgattctacaatcatgacagttgtctttaatgaaaaaggaaaaaagcctttttattaagttgtaataatcagatctgcaaaccggtagaatttagttgtaacctaacacacacaaagcattggtaaaaaagtcaatagaaatttaaacagtgagtgcagacaactcttaaatggaagcttcatatgagagaggacgcccccgagctgccagacagcattcaagggatggacacgaccaccatgttcgagcacgatcatcatccagagagaattatcgaggtgagtaccgtcaatcaaggagcgcctcacaagtgcgcgttcctactgtatttcataagaagagagttgaaccattaacagttcctccagcacctaaagacatatgtccgaccttgaaaaaaggatttttgtgtgacagtagtttttgcaaaaaagaccaccagttagaaagtttaactgatagggaattactcctactaatcgcccgtaagacttgtggatcagtagaacaacaattaaatataactgcacccaaggactcgcgcttagcaaatccaacggctgatgatttccagcaagaggaaggtccaaaaattaccttgttgacactgatcaagacggcagaacactgggcgagacaagacatccgaaccatagaggattccaaattaagggcattgttaactctatgtgctgtgatgacgaggaaattctcaaaatcccagctgagtcttttgtgtgagacacacctaaggcgcgaagggcttgggcaagatcaggcagaacccgttctcgaagtatatcaacgattacacagtgataaaggaggcagttttgaagctgcactatggcaacaatgggaccgacaatccctaattatgtttatcactgcattcttgaatatcgctctccagttaccgtgtgaaagttctgctgtcgttgtttcagggttaagaacattggttcctcaatcagataatgaggaagcttcaaccaacccggggacatgctcatggtctgatgagggtaccccttaataaggctgactaaaacactatataaccttctacttgatcacaatactccgtatacctatcatcatatatttaatcaagacgatatcctttaaaacttattcagtactataatcactctcatttcaaattgataagatatgcataattgccttaatatataaagaggtatgatataacccaaacattgaccaaagaaaatcataatctcgtatcgctcgcaatataacctgccaagcatacctcttgcacaaagtgattcttgtacacaaataatgtttgactctacaggaggtagcaacgatccatctcatcaaaaaataagtattttatgatttactaatgatctcttaaaatattaagaaaaactgacggaacataaattctttctgcttcaagttgtggaggaggtctatggtattcgctattgttatattacaatcaataacaagcttgtaaaaatattgttcttgtttcaggaggtatattgtgaccggaaaagctaaactaatgatgaagattaatgcggaggtctgatgagaataaaccttattattcagattaggccccaagaggcattcttcatctccttttagcaaaatactatttcaggatagtccagctagtgacacgtcttttagctgtataccagn
nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnngagctaaagtggtctgtacacatctcatacattgtattaggggcaataatatctaattgaacttagccatttaaaatttagtgcataaatctgggctaactccaccaggtcaactccattggctgaaaagaagcccacctacaacgaacattactttgagcgccctcacaattaaaaaataagagcgtcgttccaacaatcgagcgcaaggttacaaggttgaactgagagtgtctagacaacaaaatatcgatactccagacaccaagcaagacctgagaaaaaaccatggccaaagctacgggacgatacaatctaatatcgcccaaaaaggacctggagaaaggggttgtcttaagcgacctctgtaacttcttagttagtcaaactattcaagggtggaaagtttattgggctggtattgagtttgatgtgactcacaaaggaatggccctattgcatagactgaaaactaatgactttgcccctgcatggtcaatgacaaggaacctatttccccatttatttcaaaatccgaattccactattgaatcaccgctgtgggcactgagagtcatccttgcagcagggatacaggaccagttaattgaccagtctttgattgaacccttagcaggagcccttggtctgatctctgattggctgctaacaaccaacactaaccatttcaacatgcgaacacaacgtgtcaaggaacaattgagcctaaaaatgctgtcgttgattcgatccaatattctcaagtttattaacaaattggatgctctacatgtcgtgaactacaatggattattgagcagtattgaaattggaactcaaaatcatacaatcatcataactcgaactaacatgggttttctggtggagctccaagaacccgacaaatcggcaatgaaccgcaagaagcctgggccggcgaaattttccctccttcatgagtccacactgaaagcatttacacaagggtcctcgacacgaatgcaaagtttaattcttgaattcaatagctctcttgctatctaactaagatggaatacttcatattgggctaactcatatatgctgactcaatagttaacttgacatctctgccttcataatcagatatataagcataataaataaatactcatatttcttgataatttgtttaaccacagataaatcctcactgtaagccagcttccaagttgacacccttacaaaaaccaggactcagaatccctcaaataagagattccaagacaacatcatagaattgctttattatattaataagcattttatcactagaaatccaatatacgaaatggttaattgtaactaaacccgcaggtcatgtgtgttaggtttcacaaattatatatattactaactccatactcgtaactaacattagataagtaggttaagaaaaaagcttgaggaagattaagaaaaactgcttattgggtctttccgtgttttagatgaagcagttgacattcttcctcttgatattaaatggctacacaacatacccaatacccagacgccaggttatcatcaccaattgtattggaccaatgtgaccttgtcactagagcttgcgggttgtattcatcatactcccttaatccgcaactacgcaactgtaaactcccgaaacatatataccgtttaaaatatgatgtaactgttaccaagttcttaagtgatgtaccagtggcgacattgcccatagatttcatagtcccaattcttctcaaggcactatcaggcaatgggttctgtcctgttgagccgcggtgccaacagttcttagatgaaattattaagtacacaatgcaagatgctctcttcctgaaatattatctcaaaaatgtgggtgctcaagaagactgtgttgatgaccactttcaagaaaaaatcttatcttcaattcagggcaatgaatttttacatcaaa
tgtttttctggtatgacctggctattttaactcgaaggggtagattaaatcgaggaaactctagatcaacgtggtttgttcatgatgatttaatagacatcttaggctatggggactatgttttttggaagatcccaatttcactgttaccactgaacacacaaggaatcccccatgctgctatggattggtatcagacatcagtattcaaagaagcggttcaagggcatacacacattgtttctgtttctactgccgatgtcttgataatgtgcaaagatttaattacatgtcgattcaacacaactctaatctcaaaaatagcagaggttgaggacccagtttgctctgattatcccaattttaagattgtgtctatgctttaccagagcggagattacttactctccatattagggtctgatgggtataaaatcattaagtttctcgaaccattgtgcttggctaaaattcaattgtgctcaaagtacaccgagaggaagggccgattcttaacacaaatgcatttagctgtaaatcacaccctggaagaaattacagaaatacgtgcactaaagccttcacaggctcacaagatccgtgaattccatagaacattgataaggctggagatgacgccacaacaactttgtgagctattttccatacaaaaacactgggggcatcctgtgctacatagtgaaacagcaatccaaaaagttaaaaaacatgctacggtgctaaaagcattacgccctatcgtgattttcgagacatattgtgtttttaaatatagcattgcaaaacattattttgatagtcaaggatcttggtacagtgttacctcagatagaaatctaacaccaggtcttaattcttatatcaaaagaaatcaattccctccgttgccaatgattaaagaactgctatgggaattttaccaccttgaccatcctccacttttctcaaccaaaattattagtgacttaagtatttttataaaagacagagctactgcagtagaaaggacatgctgggatgcagtattcgagcctaatgttctgggatataatccacctcacaaattcagtaccaaacgtgtaccggaacaatttttagagcaagaaaacttttctattgagaatgttctttcctacgcgcaaaaactcgagtatctactaccacaatatcggaatttttctttctcattgaaagagaaagagttgaatgtaggtagaactttcggaaaattgccttatccgactcgcaatgttcaaacactttgtgaagctctgttagctgatggtcttgctaaagcatttcctagcaatatgatggtagttacggaacgtgaacaaaaagaaagcttattgcatcaagcatcatggcaccacacaagtgatgatttcggtgagcatgccacagttagagggagtagctttgtaactgatttagagaaatacaatcttgcatttaggtatgagtttacagcaccttttatagaatattgcaaccgttgctatggtgttaagaatgtttttaattggatgcattatacaatcccacagtgttatatgcatgtcagtgattattataatccaccgcataacctcacactggaaaatcgaaacaacccccctgaagggcctagttcatacaggggtcatatgggagggattgaaggactgcaacaaaaactctggacaagtatttcatgtgctcaaatttctttagttgaaattaagactggttttaagttgcgctcagctgtgatgggtgacaatcagtgcattaccgttttatcagtcttccccttagagactgatgcaggcgagcaggaacagagcgccgaggacaatgcagcgagggtggccgccagcctagcaaaagttacaagtgcctgtggaatctttttaaaacctgatgaaacatttgtacattcaggttttatctattttggaaaaaaacaatatttgaatggggtccaattgcctcagtcc
cttaaaacggctacaagaatggcaccattgtctgatgcaatttttgatgatcttcaagggaccctggctagtataggtactgcttttgagcgatccatctctgagacacgacatatctttccttgcagaataaccgcagctttccatacgttcttttcggtgagaatcttgcaatatcatcacctcggatttaataaaggttttgaccttggacagttaacactcggcaaacctctggatttcggaacaatatcattggcactagcggtaccgcaggtgcttggagggttatccttcttgaatcctgagaaatgtttctaccggaatctaggagatccagttacctcaggtttattccagttaaaaacttatctccgaatgattgagatggatgatttattcttacctttaattgcgaagaaccctgggaactgcactgccattgactttgtgctaaatcctagcggattaaatgttcctgggtcgcaagacttaacttcatttctgcgccagattgtacgtaggactatcaccctaagtgcgaaaaacaaacttattaataccttatttcatgcatcagctgacttcgaagacgaaatggtttgtaagtggctcttatcatcaactcctgttatgagtcgtttcgcagccgatatattttcacgcacgccgagcgggaagcgattgcaaattctaggatacttggaaggaacacgcacattattagcctctaagatcatcaacaataatacagagacgccggttttggacagactgaggaagataacattgcaaaggtggagtctatggtttagttatcttgatcattgtgataatatcctggcggaggctttaacccaaataacttgcacagttgatttagcacagatcctgagggaatattcatgggcacatattttagaggggagacctcttattggagccacactcccatgtatgattgagcaattcaaagtggtttggctgaaaccctacgaacaatgtccgcagtgttcaaatgccaagcaacctggtgggaaaccattcgtgtcagtagcagtcaagaaacatattgttagtgcatggccaaatgcatcccgaataagctggactatcggggatggaatcccatacattggatcaaggacagaagataagatagggcaacctgctattaaaccaaaatgtccttccgcagccttaagagaggccattgaattggcgtcccgtttaacatgggtaactcaaggcagttcgaacagtgacttgctaataaaaccatttttggaagcacgagtaaatttaagtgttcaagaaatacttcaaatgaccccttcacattactcgggaaatattgttcataggtacaacgatcaatacagtcctcattctttcatggccaatcgtatgagtaactcagcaacgcgattgattgtttctacaaacactttaggtgagttttcaggaggtggccaatcggcacgcgacagcaatattattttccagaatgttataaattatgcagttgcactgttcgatattaaatttagaaacactgaggctacagatatccagtataatcgtgctcaccttcatctaactaagtgttgcacccgggaggtaccagctcagtacttaacatacacatctacattggatttagatttaacaagataccgagaaaatgaattgatttatgacaataatcctctaaaaggaggactcaattgcaatatctcatttgataacccatttttccaaggcaaacagctgaacattatagaagatgaccttattcgactgcctcacttatctggatgggagctagctaagaccatcatgcaatcaattatttcagatagcaataattcgtctacagacccaattagcagtggagaaacaagatcattcactacccatttcttaacttatcccaagataggacttctgtacagttttggggcctttgtaagttattatcttggcaatacaattcttcg
gactaagaaattaacacttgacaattttttatattacttaactacccaaattcataatctaccacatcgctcattgcgaatacttaagccaacattcaaacatgcaagcgttatgtcacgattaatgagtattgatccccatttttctatttacataggcggtgctgcaggtgacagaggactctcagatgcggccaggttatttttgagaacgtccatttcatcttttcttacatttgtaaaggaatggataattaatcgcggaacaattgtccctttatggatagtatatccattagagggtcaaaatccaacacctgttaataatttcctccatcagatcgtagaactgctggtgcatgattcatcaagacaccaggcttttaaaactaccataaatgatcatgtacatcctcacgacaatcttgtttacacatgtaagagtacagccagcaatttcttccatgcgtcattggcgtactggaggagcaggcacagaaacagcaaccgaaaagacttgacaagaaactcttcaactggatcaagcacaaacaacagtgatggtcatattaagagaagtcaagaacaaaccaccagagatccacatgatggcactgaacggagtctagtcctgcaaatgagccatgaaataaaaagaacgacaattccacaagagaacacgcaccagggtccgtcgttccagtcatttctaagtgactctgcttgcggtacagcaaacccaaaactaaatttcgatagatcgagacacaatgtgaaatctcaggatcataactcagcatccaagagggaaggtcatcaaataatctcacatcgtctagtcctacctttctttacattatctcaagggacacgccaattaacgtcatccaatgagtcacaaacccaagatgagatatcaaagtacttacggcaattgagatccgtcattgataccacagtttattgtaggtttaccggtatagtctcgtccatgcattacaaacttgatgaggtcctttgggaaatagagaattttaagtcggctgtgacgctggcagagggagaaggtgctggtgccttactattgattcagaaataccaagttaagaccttatttttcaacacgctagctactgagtccagtatagagtcagaaatagtatcaggaatgactactcctaggatgcttctacctgttatgtcaaaattccataatgaccaaattgagattattcttaacaactcagcaagccaaataacagacataacaaatcctacttggtttaaagaccaaagagcaaggctacctaggcaagtcgaggttataaccatggatgcagagacgacagagaatataaacagatcgaaattgtacgaagctgtacataaattgatcttacaccatgttgatcccagcgtattgaaagcagtggtccttaaagtctttctaagtgataccgagggtatgttatggctaaatgataatctagccccgttttttgccactgggtatttaattaagccaataacgtcaagtgccaggtctagtgagtggtatctttgtctgacgaacttcttatcaactacacgtaagatgccacaccaaaaccatctcagttgtaagcaggtaatacttacggcattgcaactgcaaattcaacggagcccatactggctaagtcatttaactcagtatgctgactgcgatttacatttaagctatatccgccttggttttccatcattagagaaagtactataccacaggtataaccttgtcgattcaaaaagaggtccactagtctctgtcactcagcacttagcacatcttagggcagagattcgagaattgaccaatgattataatcaacagcgacaaagtcggactcaaacatatcactttattcgtactgcaaaaggacgaatcacaaaactagtcaatgattatttaaaattctttcttattgtacaagcattaaaacataatgggacatggcaag
ctgagtttaagaaattaccagagttgattagtgtgtgcaataggttctatcatattagagattgtaattgtgaagaacgtttcttagttcaaaccttatatttacatagaatgcaggattctgaagttaagcttatcgaaaggctgacagggcttctgagtttatttccagatggtctctacaggttcgattgaataaccgtgcatagtattttgatacttgtaaaggttggttatcaacatacagattataaaaaactcataaattgctctcatacatcatcttgatctgatttcaataaataactatttagataacgaaaggagtccttacattatacactatatttggcctctctccctgcgtgataatcaaaaaattcacaatacagcatgtgtgacatattactgctgcaatgagtctaacgcaacataataaactccgcactctttataattaagctttaacgataggtctgggctcatattgttattgatatagtaatgttgtatcaatatcttgccagatggaatagtgctttggttgataacacgacttcttaaaacaaaactgatctttaagattaagttttttataattgtcattgctttaatttgtcgatttaaaaatggtgatagccttaatctttgtgtaaaataagagattaggtgtaataactttaacatttttgtctagtaagctactattccattcagaatgataaaattaaaagaaaagacatgactgtaaaatcagaaataccttctttacaatatagcagactagataataatcttcgtgttaatgataattaaggcattgaccacgctcatcagaaggctcactagaataaac</INSDSeq_sequence>
  <INSDSeq_xrefs>
    <INSDXref>
      <INSDXref_dbname>BioProject</INSDXref_dbname>
      <INSDXref_id>PRJNA257197</INSDXref_id>
    </INSDXref>
    <INSDXref>
      <INSDXref_dbname>BioSample</INSDXref_dbname>
      <INSDXref_id>SAMN02951952</INSDXref_id>
    </INSDXref>
  </INSDSeq_xrefs>
</INSDSeq>

To extract all the information about organism, host, sampling time, etc., that is held in the list of INSDQualifiers, I loop through all the sequences and generate a dictionary with accession as the key and a dictionary of qualifiers as the value.

I start by initialising an empty dictionary, with accession strings as keys and, as values, dictionaries that map each qualifier name to its value (both strings).


In [7]:
seq_dict=Dict{ASCIIString,Dict{ASCIIString,ASCIIString}}()


Out[7]:
Dict{ASCIIString,Dict{ASCIIString,ASCIIString}} with 0 entries

Extracting the information uses a mixture of find_element and get_elements_by_tagname to locate the right elements, and finally content to extract the text of each qualifier.


In [8]:
for i in 1:numseq
    s=sequences[i]
    accession=content(find_element(s, "INSDSeq_primary-accession"))
    # take the qualifiers from the first feature in the feature table
    feature_table=find_element(s,"INSDSeq_feature-table")
    features=get_elements_by_tagname(feature_table,"INSDFeature")
    feature_quals=get_elements_by_tagname(features[1], "INSDFeature_quals")
    qualifiers=get_elements_by_tagname(feature_quals[1], "INSDQualifier")
    qualifier_dict=Dict{ASCIIString,ASCIIString}()
    for q in qualifiers
        n=find_element(q,"INSDQualifier_name")
        v=find_element(q,"INSDQualifier_value")
        # skip qualifiers that have no value element
        if v!=nothing
            qualifier_dict[content(n)]=content(v)
        end
    end
    seq_dict[accession]=qualifier_dict
end;
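
The shape of the resulting structure can be illustrated without the XML file. The following is a minimal sketch rather than the notebook's own code: it assumes a current Julia 1.x release (where `String` takes the place of `ASCIIString`), and it uses hand-written records in place of the parsed INSDSeq elements. The name `build_seq_dict` and the qualifier `flag_only` are hypothetical; `nothing` stands in for a qualifier that has no INSDQualifier_value child.

```julia
# Hypothetical stand-in for parsed records: accession => list of (name, value) qualifiers.
# A value of `nothing` mimics a qualifier without an INSDQualifier_value element.
records = [
    "KM034549" => [("organism", "Zaire ebolavirus"), ("host", "Homo sapiens"), ("flag_only", nothing)],
    "KM034550" => [("organism", "Zaire ebolavirus"), ("country", "Sierra Leone")],
]

function build_seq_dict(records)
    seq_dict = Dict{String,Dict{String,String}}()
    for (accession, qualifiers) in records
        qualifier_dict = Dict{String,String}()
        for (name, value) in qualifiers
            # Keep only qualifiers that actually carry a value.
            if value !== nothing
                qualifier_dict[name] = value
            end
        end
        seq_dict[accession] = qualifier_dict
    end
    return seq_dict
end

seq_dict_demo = build_seq_dict(records)
println(seq_dict_demo["KM034549"]["host"])
```

Keying the outer dictionary by accession makes later lookups independent of the order in which the sequences appear in the file.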

Here is an example of the features for the first accession.


In [9]:
seq_dict[accessions[1]]


Out[9]:
Dict{ASCIIString,ASCIIString} with 8 entries:
  "organism"         => "Zaire ebolavirus"
  "isolation_source" => "serum"
  "host"             => "Homo sapiens"
  "mol_type"         => "viral cRNA"
  "collection_date"  => "25-May-2014"
  "isolate"          => "Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM095B"
  "db_xref"          => "taxon:186538"
  "country"          => "Sierra Leone"

To flatten the dictionary, I first make a dictionary of all feature names, with the number of times the field is found.
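
The counting step can be sketched in isolation. This is a minimal illustration, assuming Julia 1.x and a small hypothetical nested dictionary (`demo`) in the same shape as seq_dict; the function name `count_qualifiers` is also an assumption made for the example.

```julia
# Hypothetical nested dictionary: accession => (qualifier name => value).
demo = Dict(
    "A1" => Dict("organism" => "Zaire ebolavirus", "host" => "Homo sapiens"),
    "A2" => Dict("organism" => "Zaire ebolavirus"),
)

function count_qualifiers(d)
    counts = Dict{String,Int}()
    for qualifiers in values(d)
        for name in keys(qualifiers)
            # get returns the default 0 the first time a name is seen.
            counts[name] = get(counts, name, 0) + 1
        end
    end
    return counts
end

counts = count_qualifiers(demo)
println(counts["organism"], " ", counts["host"])
```

A count lower than the number of sequences indicates a qualifier that is missing for some records, which is why the flattening step has to allow for missing values.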


In [10]:
fn_dict=Dict{ASCIIString,Int64}()
for acc in keys(seq_dict)
    features=seq_dict[acc]
    for k in keys(features)
        current_count=get(fn_dict,k,0)
        fn_dict[k]=current_count+1
    end
end
fn_dict


Out[10]:
Dict{ASCIIString,Int64} with 9 entries:
  "organism"         => 249
  "isolation_source" => 165
  "host"             => 249
  "collected_by"     => 150
  "mol_type"         => 249
  "collection_date"  => 249
  "isolate"          => 249
  "db_xref"          => 249
  "country"          => 249

I extract the names of the qualifiers as a list, which will be used below to construct a DataFrame.


In [11]:
feature_names=collect(keys(fn_dict))


Out[11]:
9-element Array{ASCIIString,1}:
 "organism"        
 "isolation_source"
 "host"            
 "collected_by"    
 "mol_type"        
 "collection_date" 
 "isolate"         
 "db_xref"         
 "country"         

I then loop through the feature names. For each one, I go through every sequence, take the value of the feature if it is present (or NA if it is not), and collect the values in a DataArray, which is then added to the DataFrame as a column.


In [12]:
df=DataFrame(accession=accessions)
numfeatures=length(feature_names)
for i in 1:numfeatures
    key=feature_names[i]
    dv=DataArray(ASCIIString[],Bool[])
    for j in 1:numseq
        acc=accessions[j]
        f=seq_dict[acc]
        val=get(f,key,NA) # NA is the default
        push!(dv,val)
    end
    df[symbol(key)]=dv
end;
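
The same flattening can be sketched with Base Julia only. This is an illustration under stated assumptions, not the DataArrays code above: it targets Julia 1.x, where `missing` plays the role that NA plays in this notebook, and the names `demo`, `accession_order`, `names_wanted`, and `flatten_columns` are hypothetical, as is the `collected_by` value.

```julia
# Hypothetical nested dictionary: accession => (qualifier name => value).
demo = Dict(
    "A1" => Dict("host" => "Homo sapiens", "collected_by" => "hypothetical lab"),
    "A2" => Dict("host" => "Homo sapiens"),
)
accession_order = ["A1", "A2"]
names_wanted = ["host", "collected_by"]

function flatten_columns(d, accession_order, names_wanted)
    columns = Dict{String,Vector{Union{String,Missing}}}()
    for name in names_wanted
        # One entry per accession, always in the same order, so the columns line up.
        columns[name] = Union{String,Missing}[get(d[acc], name, missing) for acc in accession_order]
    end
    return columns
end

cols = flatten_columns(demo, accession_order, names_wanted)
println(cols["collected_by"])
```

Iterating over a fixed list of accessions, rather than over the dictionary's keys, matters here because dictionary iteration order is not guaranteed to match the order of the accession column.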

I now have a DataFrame that has the features in a flat format.


In [13]:
head(df)


Out[13]:
|   | accession | organism | isolation_source | host | collected_by | mol_type | collection_date | isolate | db_xref | country |
| 1 | KM034549 | Zaire ebolavirus | serum | Homo sapiens | NA | viral cRNA | 25-May-2014 | Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM095B | taxon:186538 | Sierra Leone |
| 2 | KM034550 | Zaire ebolavirus | serum | Homo sapiens | NA | viral cRNA | 25-May-2014 | Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM095 | taxon:186538 | Sierra Leone |
| 3 | KM034551 | Zaire ebolavirus | serum | Homo sapiens | NA | viral cRNA | 26-May-2014 | Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM096 | taxon:186538 | Sierra Leone |
| 4 | KM034552 | Zaire ebolavirus | serum | Homo sapiens | NA | viral cRNA | 26-May-2014 | Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM098 | taxon:186538 | Sierra Leone |
| 5 | KM034553 | Zaire ebolavirus | serum | Homo sapiens | NA | viral cRNA | 27-May-2014 | Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3670.1 | taxon:186538 | Sierra Leone |
| 6 | KM034554 | Zaire ebolavirus | serum | Homo sapiens | NA | viral cRNA | 27-May-2014 | Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3676.1 | taxon:186538 | Sierra Leone |

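Extract the patient ID from the isolate column of the DataFrame: it is the part of the isolate name after the last hyphen, with anything after a period dropped.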


In [14]:
ids = [x |> # select
    (x)->split(x,"-") |> # split on hyphen
    last |>  # take last
    (x)->split(x,".") |> # split on period
    first for x in df[:isolate]]
df[:ids] = ids;
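
The pipeline above can also be written as a small named function, which makes the two splitting steps easier to test on their own. This is a minimal sketch assuming Julia 1.x; the function name `patient_id` is hypothetical, and the two isolate names are taken from the table above.

```julia
# Take the text after the last hyphen, then keep only what precedes the first period.
patient_id(isolate::AbstractString) = String(first(split(last(split(isolate, "-")), ".")))

isolates = [
    "Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM095B",
    "Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3670.1",
]
ids_demo = [patient_id(s) for s in isolates]
println(ids_demo)
```

Note that the isolate name contains an earlier hyphen ("H.sapiens-wt"), so taking the last piece of the split, rather than the second, is what isolates the ID.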

Load in the annotations, obtained from the Sabeti/Garry labs, and select on the basis of IDs.


In [15]:
annot = readtable("ebola-data.csv")


Out[15]:
Patient_IDDiagnosisAgeGenderVillageChiefdomDistrictOutcomeDate_of_OutcomeAdmitted_at_reportPre_admission_dateDate_of_admissionDate_of_dischargeTemperatureSystolic_pressureDiastolic_pressureHearth_rateRespiratory_rateDays_since_onsetOxygen_saturationBleeding_gumsBleeding_noseBlood_in_stoolBlood_in_vomitBleeding_injectionBleeding_hematomaBlood_in_sputumBlood_in_urineVaginal_bleedingNo_bleedingAbdominal_painJoint_painMuscle_painBack_painSide_painRetrosternal_painOther_painNo_painFeverConjunctivitisEdemaInflammationRashHeadacheSore_throatVomitCoughDiarrheaWeaknessDizzinessHearingConvulsionsConfusionJaundiceOther_symptomsNo_symptomsAntimalarialsCeftriaxoneParacetamolMetronidazoleArtemisinin_Combination_TherapyCiprofloxacinAmpicillinOmeprazoleDate_of_metabolic_panel_1Alanine_Aminotransferase_U_L_day_1Albumin_g_L_day_1Alkaline_Phosphatase_U_L_day_1Aspartate_Aminotransferase_U_L_day_1Calcium_mmol_L_day_1Chloride_mmol_L_day_1Creatinine_umol_L_day_1Glucose_mmol_L_day_1Potassium_mmol_L_day_1Sodium_mmol_L_day_1Total_Bilirubin_umol_L_day_1Total_Carbon_Dioxide_mmol_L_day_1Total_Protein_g_L_day_1Blood_Urea_Nitrogen_mmol_urea_L_day_1Date_of_metabolic_panel_2Alanine_Aminotransferase_U_L_day_2Albumin_g_L_day_2Alkaline_Phosphatase_U_L_day_2Aspartate_Aminotransferase_U_L_day_2Calcium_mmol_L_day_2Chloride_mmol_L_day_2Creatinine_umol_L_day_2Glucose_mmol_L_day_2Potassium_mmol_L_day_2Sodium_mmol_L_day_2Total_Bilirubin_umol_L_day_2Total_Carbon_Dioxide_mmol_L_day_2Total_Protein_g_L_day_2Blood_Urea_Nitrogen_mmol_urea_L_day_2Date_of_metabolic_panel_3Alanine_Aminotransferase_U_L_day_3Albumin_g_L_day_3Alkaline_Phosphatase_U_L_day_3Aspartate_Aminotransferase_U_L_day_3Calcium_mmol_L_day_3Chloride_mmol_L_day_3Creatinine_umol_L_day_3Glucose_mmol_L_day_3Potassium_mmol_L_day_3Sodium_mmol_L_day_3Total_Bilirubin_umol_L_day_3Total_Carbon_Dioxide_mmol_L_day_3Total_Protein_g_L_day_3Blood_Urea_Nitrogen_mmol_urea_L_day_3Date_of_metabolic_panel_4Alanine_Aminotransferase_U_L_day_4Albumin_g_L_day_4Alkaline_Phos
phatase_U_L_day_4Aspartate_Aminotransferase_U_L_day_4Calcium_mmol_L_day_4Chloride_mmol_L_day_4Creatinine_umol_L_day_4Glucose_mmol_L_day_4Potassium_mmol_L_day_4Sodium_mmol_L_day_4Total_Bilirubin_umol_L_day_4Total_Carbon_Dioxide_mmol_L_day_4Total_Protein_g_L_day_4Blood_Urea_Nitrogen_mmol_urea_L_day_4Date_of_metabolic_panel_5Alanine_Aminotransferase_U_L_day_5Albumin_g_L_day_5Alkaline_Phosphatase_U_L_day_5Aspartate_Aminotransferase_U_L_day_5Calcium_mmol_L_day_5Chloride_mmol_L_day_5Creatinine_umol_L_day_5Glucose_mmol_L_day_5Potassium_mmol_L_day_5Sodium_mmol_L_day_5Total_Bilirubin_umol_L_day_5Total_Carbon_Dioxide_mmol_L_day_5Total_Protein_g_L_day_5Blood_Urea_Nitrogen_mmol_urea_L_day_5First_measured_viral_load_log_units_Maximum_measured_viral_load_log_units_Minimum_measured_viral_load_log_units_Averaged_viral_load_log_units_Date_of_qPCR_1EBOV_copies_mL_plasma_log_units_day_1Date_of_qPCR_2EBOV_copies_mL_plasma_log_units_day_2Date_of_qPCR_3EBOV_copies_mL_plasma_log_units_day_3Date_of_qPCR_4EBOV_copies_mL_plasma_log_units_day_4Date_of_qPCR_5EBOV_copies_mL_plasma_log_units_day_5Date_of_qPCR_6EBOV_copies_mL_plasma_log_units_day_6SNP_572SNP_800SNP_1024SNP_1288SNP_1492SNP_1849SNP_2124SNP_2185SNP_2341SNP_2364SNP_2497SNP_2931SNP_3116SNP_3388SNP_3638SNP_4340SNP_4505SNP_4709SNP_4759SNP_4976SNP_5461SNP_6175SNP_6283SNP_6909SNP_8280SNP_8928SNP_9390SNP_9536SNP_9923SNP_10005SNP_10218SNP_10252SNP_10268SNP_10509SNP_10743SNP_10801SNP_11142SNP_11811SNP_11943SNP_12878SNP_12885SNP_13856SNP_13923SNP_14019SNP_14232SNP_15599SNP_15660SNP_15963SNP_16054SNP_16455SNP_16750SNP_17142SNP_17985SNP_18412SNP_18895Allele_Frequency_10218Cluster_mutations_from_clusterSub_cluster_mutations_from_sub_cluster
1EM-095Positive42.0FemaleKoinduKissi TengKailahunNANAYesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA5.175204559195.175204559195.175204559195.175204559192014-05-275.17520455919NANANANANANANANANANANoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNACluster 10NANA
2EM-95BPositiveNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA6.124507401456.124507401456.124507401456.12450740145NA6.12450740145NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
3EM-099NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-05-27NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
4EM-100NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-02NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
5EM-101NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-02NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
6EM-102NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-02NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
7EM-103NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-02NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
8EM-105NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-02NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
9EM-108NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-02NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
10EM-109NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-02NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
11EM-112Positive65.0FemaleNjalaJawieKailahunDied2014-06-03NoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA7.748225767997.748225767997.748225767997.748225767992014-06-037.74822576799NANANANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.750850993741Cluster 30NANA
12EM-114NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-03NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
13EM-117NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-03NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
14EM-118NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-03NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
15EM-121Positive44.0MaleFoinduKissi KamaKailahunDied2014-06-06YesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA7.100693308827.100693308827.100693308827.100693308822014-06-047.10069330882NANANANANANANANANANANoYesNoNoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.0Cluster 21NANA
16EM-122Positive11.0FemaleDaruJawieKailahunDied2014-06-08NoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-07NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
17EM-123Positive46.0FemaleDaruJawieKailahunDied2014-06-09Yes2014-06-05NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-06NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
18EM-124Positive35.0FemaleDaruJawieKailahunDied2014-06-22Yes2014-06-05NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-10263.023.0335.0638.01.6295.0720.05.7NA124.01218.06827.02014-06-11187.021217351.01.751057365.82.012714166033.72014-06-1213722164212.01.83102754.04.32.112720206036.32014-06-1310322144117.01.84998263.72.012516236541.32014-06-148922117931.771007482.82.312512226440.35.162846764015.162846764012.579310130713.749398672282014-06-065.16284676401NA4.31662430974NA3.87158337583NA2.57931013071NA2.816628781092014-06-18NANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.868852459016Cluster 30NANA
19EM-125NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-06NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
20EM-127NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-06NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
21G-3670Positive20.0FemaleKoinduKissi TengKailahunDischarged2014-07-08Yes2014-05-26NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA7.699014700877.699014700877.699014700877.699014700872014-05-277.699014700872014-06-06NANANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.0Cluster 22NANA
22G-3676Positive45.0FemaleBueduKissi TengKailahunDied2014-05-30YesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA8.171336707698.807610748268.171336707698.489473727972014-05-278.17133670769NA8.80761074826NANANANANANANANANoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNo0.0Cluster 10NANA
23G-3677Positive50.0FemaleKoinduKissi TengKailahunDied2014-05-27YesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA9.144730792559.144730792559.116301381849.130516087192014-05-269.144730792552014-05-279.11630138184NANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.0Cluster 20NANA
24G-3678NegativeNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-05-29NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
25G-3679Positive15.0FemaleNyummduKissi TengKailahunNANAYesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA7.262231693577.3171576127.262231693577.289694652792014-05-267.262231693572014-05-287.317157612NANANANANANANANANoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoYesNo0.0Cluster 22Sub-cluster a0
26G-3680Positive8.0FemaleNyummduKissi TengKailahunNANANoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA8.571036395398.571036395398.571036395398.571036395392014-05-288.57103639539NANANANANANANANANANANoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNACluster 11NANA
27G-3681Positive55.0FemaleKolosuKissi TengKailahunDiedNANoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA5.392866423145.392866423145.392866423145.392866423142014-05-285.39286642314NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA
28G-3682Positive54.0FemaleKolosuKissi TengKailahunDiedNAYesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA5.920247780369.207368186015.920247780367.563807983192014-05-275.920247780362014-05-289.20736818601NANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.0Cluster 20NANA
29G-3683Positive57.0FemaleFokomaKissi TengKailahunNANANoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA7.108498052767.108498052767.108498052767.108498052762014-05-287.10849805276NANANANANANANANANANANoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNACluster 11NANA
30G-3686Positive27.0FemaleBueduKissi TongiKailahunNANANoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA9.643371168229.643371168229.643371168229.643371168222014-05-299.64337116822NANANANANANANANANANANoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNo0.0Cluster 12NANA
⋮

In [16]:
annot[:ids] = [x |> (x)->replace(x,"-","") for x in annot[:Patient_ID]]


Out[16]:
213-element Array{Any,1}:
 "EM095"
 "EM95B"
 "EM099"
 "EM100"
 "EM101"
 "EM102"
 "EM103"
 "EM105"
 "EM108"
 "EM109"
 "EM112"
 "EM114"
 "EM117"
 ⋮      
 "G3832"
 "G3833"
 "G3842"
 "G3852"
 "G3853"
 "G3854"
 "G3844"
 "G3849"
 "G3858"
 "G3859"
 "G3835"
 "G3836"

Merge sequences and annotations
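
What an inner join on the normalised IDs keeps and drops can be sketched with Base Julia alone. This is an illustration under stated assumptions, not the DataFrames join itself: it targets Julia 1.x (where `replace` takes a Pair), the tables are miniature stand-ins built from a few values shown in the outputs of this notebook, and the names `annot_demo`, `seq_demo`, `normalise`, and `inner_join` are hypothetical.

```julia
# Miniature stand-ins for the two tables, as vectors of named tuples.
annot_demo = [
    (Patient_ID = "EM-095", Diagnosis = "Positive"),
    (Patient_ID = "EM-95B", Diagnosis = "Positive"),
    (Patient_ID = "EM-099", Diagnosis = "Negative"),
]
seq_demo = [
    (ids = "EM095B", accession = "KM034549"),
    (ids = "EM095", accession = "KM034550"),
    (ids = "EM096", accession = "KM034551"),
]

# Normalise the annotation key by stripping hyphens.
normalise(id) = replace(id, "-" => "")

function inner_join(left, right)
    # Index the right table by key; assumes each key occurs at most once there.
    index = Dict(r.ids => r for r in right)
    joined = []
    for l in left
        key = normalise(l.Patient_ID)
        # Inner join: rows without a partner on the other side are dropped.
        if haskey(index, key)
            push!(joined, merge(l, index[key]))
        end
    end
    return joined
end

joined = inner_join(annot_demo, seq_demo)
seq_keys = Set(r.ids for r in seq_demo)
unmatched = [l.Patient_ID for l in annot_demo if !(normalise(l.Patient_ID) in seq_keys)]
println(length(joined), " matched; unmatched: ", unmatched)
```

In this small sample, "EM-95B" normalises to "EM95B" while the corresponding isolate name yields "EM095B", so the two do not match and an inner join would appear to drop that patient unless the IDs are reconciled first.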


In [17]:
bigdf = join(annot,df,on=:ids,kind=:inner)


Out[17]:
Patient_IDDiagnosisAgeGenderVillageChiefdomDistrictOutcomeDate_of_OutcomeAdmitted_at_reportPre_admission_dateDate_of_admissionDate_of_dischargeTemperatureSystolic_pressureDiastolic_pressureHearth_rateRespiratory_rateDays_since_onsetOxygen_saturationBleeding_gumsBleeding_noseBlood_in_stoolBlood_in_vomitBleeding_injectionBleeding_hematomaBlood_in_sputumBlood_in_urineVaginal_bleedingNo_bleedingAbdominal_painJoint_painMuscle_painBack_painSide_painRetrosternal_painOther_painNo_painFeverConjunctivitisEdemaInflammationRashHeadacheSore_throatVomitCoughDiarrheaWeaknessDizzinessHearingConvulsionsConfusionJaundiceOther_symptomsNo_symptomsAntimalarialsCeftriaxoneParacetamolMetronidazoleArtemisinin_Combination_TherapyCiprofloxacinAmpicillinOmeprazoleDate_of_metabolic_panel_1Alanine_Aminotransferase_U_L_day_1Albumin_g_L_day_1Alkaline_Phosphatase_U_L_day_1Aspartate_Aminotransferase_U_L_day_1Calcium_mmol_L_day_1Chloride_mmol_L_day_1Creatinine_umol_L_day_1Glucose_mmol_L_day_1Potassium_mmol_L_day_1Sodium_mmol_L_day_1Total_Bilirubin_umol_L_day_1Total_Carbon_Dioxide_mmol_L_day_1Total_Protein_g_L_day_1Blood_Urea_Nitrogen_mmol_urea_L_day_1Date_of_metabolic_panel_2Alanine_Aminotransferase_U_L_day_2Albumin_g_L_day_2Alkaline_Phosphatase_U_L_day_2Aspartate_Aminotransferase_U_L_day_2Calcium_mmol_L_day_2Chloride_mmol_L_day_2Creatinine_umol_L_day_2Glucose_mmol_L_day_2Potassium_mmol_L_day_2Sodium_mmol_L_day_2Total_Bilirubin_umol_L_day_2Total_Carbon_Dioxide_mmol_L_day_2Total_Protein_g_L_day_2Blood_Urea_Nitrogen_mmol_urea_L_day_2Date_of_metabolic_panel_3Alanine_Aminotransferase_U_L_day_3Albumin_g_L_day_3Alkaline_Phosphatase_U_L_day_3Aspartate_Aminotransferase_U_L_day_3Calcium_mmol_L_day_3Chloride_mmol_L_day_3Creatinine_umol_L_day_3Glucose_mmol_L_day_3Potassium_mmol_L_day_3Sodium_mmol_L_day_3Total_Bilirubin_umol_L_day_3Total_Carbon_Dioxide_mmol_L_day_3Total_Protein_g_L_day_3Blood_Urea_Nitrogen_mmol_urea_L_day_3Date_of_metabolic_panel_4Alanine_Aminotransferase_U_L_day_4Albumin_g_L_day_4Alkaline_Phos
phatase_U_L_day_4Aspartate_Aminotransferase_U_L_day_4Calcium_mmol_L_day_4Chloride_mmol_L_day_4Creatinine_umol_L_day_4Glucose_mmol_L_day_4Potassium_mmol_L_day_4Sodium_mmol_L_day_4Total_Bilirubin_umol_L_day_4Total_Carbon_Dioxide_mmol_L_day_4Total_Protein_g_L_day_4Blood_Urea_Nitrogen_mmol_urea_L_day_4Date_of_metabolic_panel_5Alanine_Aminotransferase_U_L_day_5Albumin_g_L_day_5Alkaline_Phosphatase_U_L_day_5Aspartate_Aminotransferase_U_L_day_5Calcium_mmol_L_day_5Chloride_mmol_L_day_5Creatinine_umol_L_day_5Glucose_mmol_L_day_5Potassium_mmol_L_day_5Sodium_mmol_L_day_5Total_Bilirubin_umol_L_day_5Total_Carbon_Dioxide_mmol_L_day_5Total_Protein_g_L_day_5Blood_Urea_Nitrogen_mmol_urea_L_day_5First_measured_viral_load_log_units_Maximum_measured_viral_load_log_units_Minimum_measured_viral_load_log_units_Averaged_viral_load_log_units_Date_of_qPCR_1EBOV_copies_mL_plasma_log_units_day_1Date_of_qPCR_2EBOV_copies_mL_plasma_log_units_day_2Date_of_qPCR_3EBOV_copies_mL_plasma_log_units_day_3Date_of_qPCR_4EBOV_copies_mL_plasma_log_units_day_4Date_of_qPCR_5EBOV_copies_mL_plasma_log_units_day_5Date_of_qPCR_6EBOV_copies_mL_plasma_log_units_day_6SNP_572SNP_800SNP_1024SNP_1288SNP_1492SNP_1849SNP_2124SNP_2185SNP_2341SNP_2364SNP_2497SNP_2931SNP_3116SNP_3388SNP_3638SNP_4340SNP_4505SNP_4709SNP_4759SNP_4976SNP_5461SNP_6175SNP_6283SNP_6909SNP_8280SNP_8928SNP_9390SNP_9536SNP_9923SNP_10005SNP_10218SNP_10252SNP_10268SNP_10509SNP_10743SNP_10801SNP_11142SNP_11811SNP_11943SNP_12878SNP_12885SNP_13856SNP_13923SNP_14019SNP_14232SNP_15599SNP_15660SNP_15963SNP_16054SNP_16455SNP_16750SNP_17142SNP_17985SNP_18412SNP_18895Allele_Frequency_10218Cluster_mutations_from_clusterSub_cluster_mutations_from_sub_clusteridsaccessionorganismisolation_sourcehostcollected_bymol_typecollection_dateisolatedb_xrefcountry
1EM-095Positive42.0FemaleKoinduKissi TengKailahunNANAYesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA5.175204559195.175204559195.175204559195.175204559192014-05-275.17520455919NANANANANANANANANANANoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNACluster 10NANAEM095KM034550Zaire ebolavirusserumHomo sapiensNAviral cRNA25-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM095taxon:186538Sierra Leone
2EM-112Positive65.0FemaleNjalaJawieKailahunDied2014-06-03NoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA7.748225767997.748225767997.748225767997.748225767992014-06-037.74822576799NANANANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.750850993741Cluster 30NANAEM112KM233039Zaire ebolavirusNAHomo sapiensNAviral cRNA03-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM112taxon:186538Sierra Leone
3EM-121Positive44.0MaleFoinduKissi KamaKailahunDied2014-06-06YesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA7.100693308827.100693308827.100693308827.100693308822014-06-047.10069330882NANANANANANANANANANANoYesNoNoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.0Cluster 21NANAEM121KM233044Zaire ebolavirusNAHomo sapiensNAviral cRNA04-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM121taxon:186538Sierra Leone
4EM-124Positive35.0FemaleDaruJawieKailahunDied2014-06-22Yes2014-06-05NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-10263.023.0335.0638.01.6295.0720.05.7NA124.01218.06827.02014-06-11187.021217351.01.751057365.82.012714166033.72014-06-1213722164212.01.83102754.04.32.112720206036.32014-06-1310322144117.01.84998263.72.012516236541.32014-06-148922117931.771007482.82.312512226440.35.162846764015.162846764012.579310130713.749398672282014-06-065.16284676401NA4.31662430974NA3.87158337583NA2.57931013071NA2.816628781092014-06-18NANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.868852459016Cluster 30NANAEM124KM233045Zaire ebolavirusNAHomo sapiensNAviral cRNA04-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM124.1taxon:186538Sierra Leone
5EM-124Positive35.0FemaleDaruJawieKailahunDied2014-06-22Yes2014-06-05NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-10263.023.0335.0638.01.6295.0720.05.7NA124.01218.06827.02014-06-11187.021217351.01.751057365.82.012714166033.72014-06-1213722164212.01.83102754.04.32.112720206036.32014-06-1310322144117.01.84998263.72.012516236541.32014-06-148922117931.771007482.82.312512226440.35.162846764015.162846764012.579310130713.749398672282014-06-065.16284676401NA4.31662430974NA3.87158337583NA2.57931013071NA2.816628781092014-06-18NANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.868852459016Cluster 30NANAEM124KM233046Zaire ebolavirusNAHomo sapiensNAviral cRNA06-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM124.2taxon:186538Sierra Leone
6EM-124Positive35.0FemaleDaruJawieKailahunDied2014-06-22Yes2014-06-05NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-10263.023.0335.0638.01.6295.0720.05.7NA124.01218.06827.02014-06-11187.021217351.01.751057365.82.012714166033.72014-06-1213722164212.01.83102754.04.32.112720206036.32014-06-1310322144117.01.84998263.72.012516236541.32014-06-148922117931.771007482.82.312512226440.35.162846764015.162846764012.579310130713.749398672282014-06-065.16284676401NA4.31662430974NA3.87158337583NA2.57931013071NA2.816628781092014-06-18NANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.868852459016Cluster 30NANAEM124KM233047Zaire ebolavirusNAHomo sapiensNAviral cRNA08-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM124.3taxon:186538Sierra Leone
7EM-124Positive35.0FemaleDaruJawieKailahunDied2014-06-22Yes2014-06-05NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-10263.023.0335.0638.01.6295.0720.05.7NA124.01218.06827.02014-06-11187.021217351.01.751057365.82.012714166033.72014-06-1213722164212.01.83102754.04.32.112720206036.32014-06-1310322144117.01.84998263.72.012516236541.32014-06-148922117931.771007482.82.312512226440.35.162846764015.162846764012.579310130713.749398672282014-06-065.16284676401NA4.31662430974NA3.87158337583NA2.57931013071NA2.816628781092014-06-18NANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.868852459016Cluster 30NANAEM124KM233048Zaire ebolavirusNAHomo sapiensNAviral cRNA09-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-EM124.4taxon:186538Sierra Leone
8G-3670Positive20.0FemaleKoinduKissi TengKailahunDischarged2014-07-08Yes2014-05-26NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA7.699014700877.699014700877.699014700877.699014700872014-05-277.699014700872014-06-06NANANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.0Cluster 22NANAG3670KM034553Zaire ebolavirusserumHomo sapiensNAviral cRNA27-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3670.1taxon:186538Sierra Leone
9G-3676Positive45.0FemaleBueduKissi TengKailahunDied2014-05-30YesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA8.171336707698.807610748268.171336707698.489473727972014-05-278.17133670769NA8.80761074826NANANANANANANANANoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNo0.0Cluster 10NANAG3676KM034554Zaire ebolavirusserumHomo sapiensNAviral cRNA27-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3676.1taxon:186538Sierra Leone
10G-3676Positive45.0FemaleBueduKissi TengKailahunDied2014-05-30YesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA8.171336707698.807610748268.171336707698.489473727972014-05-278.17133670769NA8.80761074826NANANANANANANANANoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNo0.0Cluster 10NANAG3676KM034555Zaire ebolavirusserumHomo sapiensNAviral cRNA06-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3676.2taxon:186538Sierra Leone
11G-3677Positive50.0FemaleKoinduKissi TengKailahunDied2014-05-27YesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA9.144730792559.144730792559.116301381849.130516087192014-05-269.144730792552014-05-279.11630138184NANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.0Cluster 20NANAG3677KM034556Zaire ebolavirusserumHomo sapiensNAviral cRNA26-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3677.1taxon:186538Sierra Leone
12G-3677Positive50.0FemaleKoinduKissi TengKailahunDied2014-05-27YesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA9.144730792559.144730792559.116301381849.130516087192014-05-269.144730792552014-05-279.11630138184NANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.0Cluster 20NANAG3677KM034557Zaire ebolavirusserumHomo sapiensNAviral cRNA27-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3677.2taxon:186538Sierra Leone
13G-3679Positive15.0FemaleNyummduKissi TengKailahunNANAYesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA7.262231693577.3171576127.262231693577.289694652792014-05-267.262231693572014-05-287.317157612NANANANANANANANANoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoYesNo0.0Cluster 22Sub-cluster a0G3679KM034558Zaire ebolavirusserumHomo sapiensNAviral cRNA28-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3679.1taxon:186538Sierra Leone
14G-3680Positive8.0FemaleNyummduKissi TengKailahunNANANoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA8.571036395398.571036395398.571036395398.571036395392014-05-288.57103639539NANANANANANANANANANANoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNACluster 11NANAG3680KM034559Zaire ebolavirusserumHomo sapiensNAviral cRNA28-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3680.1taxon:186538Sierra Leone
15G-3682Positive54.0FemaleKolosuKissi TengKailahunDiedNAYesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA5.920247780369.207368186015.920247780367.563807983192014-05-275.920247780362014-05-289.20736818601NANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.0Cluster 20NANAG3682KM034560Zaire ebolavirusserumHomo sapiensNAviral cRNA28-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3682.1taxon:186538Sierra Leone
16G-3683Positive57.0FemaleFokomaKissi TengKailahunNANANoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA7.108498052767.108498052767.108498052767.108498052762014-05-287.10849805276NANANANANANANANANANANoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNACluster 11NANAG3683KM034561Zaire ebolavirusserumHomo sapiensNAviral cRNA28-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3683.1taxon:186538Sierra Leone
17G-3686Positive27.0FemaleBueduKissi TongiKailahunNANANoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA9.643371168229.643371168229.643371168229.643371168222014-05-299.64337116822NANANANANANANANANANANoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNo0.0Cluster 12NANAG3686KM034562Zaire ebolavirusserumHomo sapiensNAviral cRNA28-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3686.1taxon:186538Sierra Leone
18G-3687Positive38.0FemaleBueduKissi TongiKailahunNANANoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA5.306107213015.306107213015.306107213015.306107213012014-05-285.30610721301NANANANANANANANANANANoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNACluster 11NANAG3687KM034563Zaire ebolavirusserumHomo sapiensNAviral cRNA28-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3687.1taxon:186538Sierra Leone
19G-3707Positive38.0FemaleDaruJawieKailahunDied2014-06-06Yes2014-05-312014-05-31NA36.7110709822NA100NoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesYesNoNoNoNoYesNoNoNoYesNoYesNoNoNoNoNoneNoNoYesYesYesNoNoNoNoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA6.911624109856.911624109856.911624109856.911624109852014-05-316.91162410985NANANANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.832594235033Cluster 30NANAG3707KM233049Zaire ebolavirusNAHomo sapiensNAviral cRNA31-May-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3707taxon:186538Sierra Leone
20G-3713Positive37.0FemaleDambuNjaluahunKailahunDied2014-06-11Yes2014-06-042014-06-042014-06-1138.4100609624797NoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNeck painNoYesYesYesNoNoYesNoYesNoNoYesYesNoNoNoNoNauseaNoYesYesYesNoNoNoNoNo2014-06-0442.039.056.0221.01.999.089.06.83.6125.0821.0712.72014-06-09740.0309002000.011.589810127.53.21231687231.32014-06-10607279122000.011.511031343.04.7NA1282676945.22014-06-11402198432000.011.1611314724.43.31273255649.7NANANANANANANANANANANANANANANA8.439906960498.439906960497.881325426338.214497925222014-06-07NA2014-06-098.43990696049NA7.88132542633NA8.32226138883NANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoYesYesNoYesNoNoNoYesNoNoNo1.0Cluster 31NANAG3713KM233050Zaire ebolavirusNAHomo sapiensNAviral cRNA09-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3713.2taxon:186538Sierra Leone
21G-3713Positive37.0FemaleDambuNjaluahunKailahunDied2014-06-11Yes2014-06-042014-06-042014-06-1138.4100609624797NoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNeck painNoYesYesYesNoNoYesNoYesNoNoYesYesNoNoNoNoNauseaNoYesYesYesNoNoNoNoNo2014-06-0442.039.056.0221.01.999.089.06.83.6125.0821.0712.72014-06-09740.0309002000.011.589810127.53.21231687231.32014-06-10607279122000.011.511031343.04.7NA1282676945.22014-06-11402198432000.011.1611314724.43.31273255649.7NANANANANANANANANANANANANANANA8.439906960498.439906960497.881325426338.214497925222014-06-07NA2014-06-098.43990696049NA7.88132542633NA8.32226138883NANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoYesYesNoYesNoNoNoYesNoNoNo1.0Cluster 31NANAG3713KM233051Zaire ebolavirusNAHomo sapiensNAviral cRNA11-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3713.3taxon:186538Sierra Leone
22G-3713Positive37.0FemaleDambuNjaluahunKailahunDied2014-06-11Yes2014-06-042014-06-042014-06-1138.4100609624797NoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNeck painNoYesYesYesNoNoYesNoYesNoNoYesYesNoNoNoNoNauseaNoYesYesYesNoNoNoNoNo2014-06-0442.039.056.0221.01.999.089.06.83.6125.0821.0712.72014-06-09740.0309002000.011.589810127.53.21231687231.32014-06-10607279122000.011.511031343.04.7NA1282676945.22014-06-11402198432000.011.1611314724.43.31273255649.7NANANANANANANANANANANANANANANA8.439906960498.439906960497.881325426338.214497925222014-06-07NA2014-06-098.43990696049NA7.88132542633NA8.32226138883NANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoYesYesNoYesNoNoNoYesNoNoNo1.0Cluster 31NANAG3713KM233052Zaire ebolavirusNAHomo sapiensNAviral cRNA13-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3713.4taxon:186538Sierra Leone
23G-3724Positive45.0FemaleBendumaJawieKailahunDied2014-06-07Yes2014-06-052014-06-052014-06-0739.91006010924894NoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoSacral PainNoYesYesYesNoYesYesNoNoYesNoYesNoYesNoNoNoSemiconsciousNoNoYesNoNoNoNoNoNo2014-06-0595.028.0682.01421.01.9106.0108.06.52.9124.01517.0706.5NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA10.56304727510.56304727510.56304727510.5630472752014-06-0510.563047275NANANANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoYesNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo1.0Cluster 31NANAG3724KM233053Zaire ebolavirusNAHomo sapiensNAviral cRNA05-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3724taxon:186538Sierra Leone
24G-3729Positive28.0FemaleNjalaJawieKailahunNANANoNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-0719.034.0124.028.01.6789.028.01.72.4122.01119.0752.12014-06-092000.012616942000.011.65876614.08.5113519NA8419.4NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA8.838634738588.838634738588.838634738588.838634738582014-06-078.83863473858NANANANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.126352395672Cluster 20NANAG3729KM233054Zaire ebolavirusNAHomo sapiensNAviral cRNA07-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3729taxon:186538Sierra Leone
25G-3734Negative39.0FemaleKailahunLuawaKailahunNANAYesNANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA2014-06-07221.045.0210.01251.02.16120.0145.02.16.3157.0816.08119.92014-06-1080.0357562.02.29107878.44.01291521786.1NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo0.0Cluster 20NANAG3734KM233055Zaire ebolavirusNAHomo sapiensNAviral cRNA07-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3734.1taxon:186538Sierra Leone
26G-3735Positive60.0MaleDaruJawieKailahunDied2014-06-09Yes2014-06-07NA2014-06-0937.41309011822796NoNoNoNoNoNoNoNoNoYesYesYesNoYesNoNoNoNoYesNoNoNoNoYesYesYesNoNoYesYesNoNoNoNoPoor appetiteNoNoYesNoNoNoNoNoNo2014-06-07818.038.0420.02000.011.98101.0237.06.05.4137.01121.06813.52014-06-091503.02611952000.011.991027185.06.413725145830.6NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA9.109590447729.372771586629.109590447729.241181017172014-06-079.109590447722014-06-099.37277158662NANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo1.0Cluster 30NANAG3735KM233056Zaire ebolavirusNAHomo sapiensNAviral cRNA07-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3735.1taxon:186538Sierra Leone
27G-3735Positive60.0MaleDaruJawieKailahunDied2014-06-09Yes2014-06-07NA2014-06-0937.41309011822796NoNoNoNoNoNoNoNoNoYesYesYesNoYesNoNoNoNoYesNoNoNoNoYesYesYesNoNoYesYesNoNoNoNoPoor appetiteNoNoYesNoNoNoNoNoNo2014-06-07818.038.0420.02000.011.98101.0237.06.05.4137.01121.06813.52014-06-091503.02611952000.011.991027185.06.413725145830.6NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA9.109590447729.372771586629.109590447729.241181017172014-06-079.109590447722014-06-099.37277158662NANANANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNo1.0Cluster 30NANAG3735KM233057Zaire ebolavirusNAHomo sapiensNAviral cRNA09-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3735.2taxon:186538Sierra Leone
28G-3750Positive50.0MaleDaruJawieKailahunDischarged2014-06-14Yes2014-06-112014-06-112014-06-1436.21207010520395NoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesYesNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoneNoYesNoNoNoNoNoNoNo2014-06-10136.032.0141.0472.01.82105.0260.06.93.0128.01215.07410.42014-06-11211.027122400.01.981081888.63.3129923758.42014-06-131172465124.01.77109120.05.83.4134830604.8NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA4.27466882364.27466882362.268516360013.479611287942014-06-104.2746688236NA3.89564868023NA2.26851636001NANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoYesNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNoNACluster 31NANAG3750KM233058Zaire ebolavirusNAHomo sapiensNAviral cRNA10-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3750.1taxon:186538Sierra Leone
29G-3750Positive50.0MaleDaruJawieKailahunDischarged2014-06-14Yes2014-06-112014-06-112014-06-1436.21207010520395NoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesYesNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoneNoYesNoNoNoNoNoNoNo2014-06-10136.032.0141.0472.01.82105.0260.06.93.0128.01215.07410.42014-06-11211.027122400.01.981081888.63.3129923758.42014-06-131172465124.01.77109120.05.83.4134830604.8NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA4.27466882364.27466882362.268516360013.479611287942014-06-104.2746688236NA3.89564868023NA2.26851636001NANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoYesNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNoNACluster 31NANAG3750KM233059Zaire ebolavirusNAHomo sapiensNAviral cRNA12-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3750.2taxon:186538Sierra Leone
30G-3750Positive50.0MaleDaruJawieKailahunDischarged2014-06-14Yes2014-06-112014-06-112014-06-1436.21207010520395NoNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoYesYesNoNoNoNoNoNoNoNoYesNoNoNoNoNoNoNoneNoYesNoNoNoNoNoNoNo2014-06-10136.032.0141.0472.01.82105.0260.06.93.0128.01215.07410.42014-06-11211.027122400.01.981081888.63.3129923758.42014-06-131172465124.01.77109120.05.83.4134830604.8NANANANANANANANANANANANANANANANANANANANANANANANANANANANANANA4.27466882364.27466882362.268516360013.479611287942014-06-104.2746688236NA3.89564868023NA2.26851636001NANANANANANANoYesNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoNoYesNoNoNoNoYesNoYesNoNoNoNoYesNoNoNoNoNoNoNoYesNoYesNoNoNoYesNoNoNoNACluster 31NANAG3750KM233060Zaire ebolavirusNAHomo sapiensNAviral cRNA14-Jun-2014Ebola virus/H.sapiens-wt/SLE/2014/Makona-G3750.3taxon:186538Sierra Leone
⋮

In [18]:
size(bigdf)


Out[18]:
(92,226)

The annotations can now be written to a tab-delimited text file, with a header row.


In [19]:
writetable("ebola-sle-2014.txt", df, separator = '\t', header = true)

I make a dictionary of the sequences, keyed by accession...


In [20]:
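# Pull the nucleotide sequence out of each INSDSeq record.
seqstrings = [content(find_element(s, "INSDSeq_sequence")) for s in sequences];
# Map accession number => sequence string.
seqdict = Dict{ASCIIString,ASCIIString}()
for i in 1:numseq
    seqdict[accessions[i]] = seqstrings[i]
end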
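
Before using the dictionary, it may be worth checking that every accession in the table actually has a sequence, since a missing key would raise an error in the lookup. The following is a minimal sketch on toy data, assuming a recent Julia version (so `String` rather than `ASCIIString`); `missing_accessions` is a hypothetical helper and the accessions and sequences are made up.

```julia
# Hypothetical helper: return the accessions that have no entry in the dictionary.
function missing_accessions(accs, seqs)
    return filter(a -> !haskey(seqs, a), accs)
end

# Toy data standing in for seqdict and the accession column (not real records).
toy_dict = Dict("ACC0001" => "ACGT", "ACC0002" => "GGCC")
toy_accs = ["ACC0001", "ACC0003"]

absent = missing_accessions(toy_accs, toy_dict)
println(absent)
```

An empty result means every accession can be looked up safely.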

...then I write the sequences for the rows of bigdf out to a FASTA file.


In [21]:
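# The do-block closes the file even if a lookup fails part-way through.
open("ebola-sle-2014.fasta", "w") do f
    # One record per row of bigdf: a header line with the accession, then the sequence.
    for i in 1:size(bigdf, 1)
        acc = bigdf[:accession][i]
        @printf(f, ">%s\n%s\n", acc, seqdict[acc])
    end
end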
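
The loop above writes each genome on a single line. FASTA files are often wrapped at a fixed width instead, so here is a minimal sketch of a wrapped writer, assuming a recent Julia version and plain ASCII sequences. `write_fasta_record` and its `width` keyword are hypothetical names, not part of the notebook, and the record is toy data.

```julia
# Hypothetical helper: write one FASTA record, wrapping the sequence at `width` characters.
# Assumes an ASCII sequence, so character positions and byte indices coincide.
function write_fasta_record(io, id, seq; width=70)
    println(io, ">", id)
    n = length(seq)
    for start in 1:width:n
        stop = min(start + width - 1, n)
        println(io, seq[start:stop])
    end
end

# Toy record written to an in-memory buffer instead of a file.
buf = IOBuffer()
write_fasta_record(buf, "ACC0001", "ACGTACGTAC"; width=4)
text = String(take!(buf))
print(text)
```

Passing an open file handle instead of the buffer would write the same record to disk.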