Repetitive DNA elements ("repeats") are DNA sequences prevalent in genomes, especially those of higher eukaryotes. Repeats make up about 50% of the human genome and over 80% of the maize genome. Repeats can be categorized as interspersed, where similar DNA sequences are spread throughout the genome, or tandem, where similar sequences are adjacent (see Treangen and Salzberg). Some interspersed repeats are long segmental duplications, but most are relatively short transposons and retrotransposons. Though repeats are sometimes referred to as “junk,” they are involved in processes of current scientific interest, including genome expansion, speciation, and epigenetic regulation (see Fedoroff). Some are still actively expressed and duplicated, including in the human genome (see Witherspoon et al., Tyekucheva et al.).

RepeatMasker

RepeatMasker is both a tool for identifying repeats in a genome sequence and a database of repeats that have been found. The database covers some well-known model species, such as human, chimpanzee, gorilla, rhesus, rat, mouse, horse, cow, cat, dog, chicken, zebrafish, bee, fruit fly and roundworm. People often use RepeatMasker to remove ("mask out") repetitive sequences from the genome so that they can be ignored (or otherwise treated specially) in later analyses, though that's not our goal here.

It's instructive to click on some of the species listed in the database and examine the associated bar and pie charts describing their repeat content. For example, note the differences between the bar charts for human and mouse, especially for SINE/Alu and LINE/L1.

Working with RepeatMasker databases

Let's obtain and parse a RepeatMasker database. We'll start with roundworm because it's relatively small (only about 2.5 megabytes compressed).


In [1]:
import urllib.request
rm_site = 'http://www.repeatmasker.org'
fn = 'ce10.fa.out.gz'
url = '%s/genomes/ce10/RepeatMasker-rm405-db20140131/%s' % (rm_site, fn)
urllib.request.urlretrieve(url, fn)


Out[1]:
('ce10.fa.out.gz', <http.client.HTTPMessage at 0x7ff3accac278>)

In [2]:
import gzip
import itertools
fh = gzip.open(fn, 'rt')
for ln in itertools.islice(fh, 10):
    print(ln, end='')


   SW  perc perc perc  query      position in query           matching       repeat              position in  repeat
score  div. del. ins.  sequence    begin     end    (left)    repeat         class/family         begin  end (left)   ID

  508   0.0  0.0  0.0  chrI            1     432 (15071991) +  (GCCTAA)n      Simple_repeat            1  432    (0)      1
 1226  10.0  0.0  0.0  chrI          566     595 (15071828) +  (GCCTAA)n      Simple_repeat            1   41  (240)      2
  344  22.2  0.0  0.0  chrI          596     676 (15071747) C  RCS5           Satellite             (41) 1387   1307      3
 1226  10.0  0.0  0.0  chrI          677     846 (15071577) +  (GCCTAA)n      Simple_repeat           42  281    (0)      2
  432  21.9  2.4  0.0  chrI         1622    1744 (15070679) +  LONGPAL1       DNA/MULE-MuDR          136  261 (2330)      4
 8509   0.6  0.0  0.1  chrI         2052    3026 (15069397) +  PALTTTAAA3     DNA                      1  974  (529)      5
 4521   1.1  0.2  0.2  chrI         3124    3652 (15068771) +  PALTTTAAA3     DNA                    974 1502    (1)      6

Above are the first several lines of the .out.gz file for the roundworm (C. elegans). The columns have headers, which are somewhat helpful. More detail is available in the RepeatMasker documentation under "How to read the results". (Note that in addition to the 14 fields described in the documentation, there's also a 15th ID field.)

Here's an extremely simple class that parses a line from these files and stores the individual values in its fields:


In [3]:
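class Repeat(object):
    def __init__(self, ln):
        # parse the 15 whitespace-separated fields of one annotation line
        (self.swsc, self.pctdiv, self.pctdel, self.pctins, self.refid,
         self.ref_i, self.ref_f, self.ref_remain, self.orient, self.rep_nm,
         self.rep_cl, self.rep_prior, self.rep_i, self.rep_f, self.unk) = ln.split()
        # note: the three repeat-coordinate columns are ordered differently on
        # '+' and 'C' rows (see the table above); self.unk holds the ID column
        # int-ize the reference coordinates (1-based, inclusive)
        self.ref_i, self.ref_f = int(self.ref_i), int(self.ref_f)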
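
As a quick check of the field layout, here is a minimal sketch that splits one of the rows printed above into the same 15 fields. The `FIELDS` tuple and the `parse_line` helper are illustrative names introduced only for this example; they are not part of RepeatMasker, and they simply mirror the attribute names used by the class above.

```python
# Hypothetical helper mirroring the Repeat class: split one .out row into named fields.
FIELDS = ('swsc', 'pctdiv', 'pctdel', 'pctins', 'refid',
          'ref_i', 'ref_f', 'ref_remain', 'orient', 'rep_nm',
          'rep_cl', 'rep_prior', 'rep_i', 'rep_f', 'unk')

def parse_line(ln):
    toks = ln.split()
    # fail loudly if a row does not have the expected number of columns
    if len(toks) != len(FIELDS):
        raise ValueError('expected %d fields, got %d' % (len(FIELDS), len(toks)))
    rec = dict(zip(FIELDS, toks))
    # only the reference coordinates are converted to integers
    rec['ref_i'], rec['ref_f'] = int(rec['ref_i']), int(rec['ref_f'])
    return rec

# one data row copied from the file excerpt shown earlier
row = '  432  21.9  2.4  0.0  chrI         1622    1744 (15070679) +  LONGPAL1       DNA/MULE-MuDR          136  261 (2330)      4'
rec = parse_line(row)
print(rec['refid'], rec['ref_i'], rec['ref_f'], rec['rep_cl'])
```

The explicit length check is a design choice for the sketch: a row with an unexpected number of columns raises a readable error instead of a bare unpacking failure.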

We can parse a file into a list of Repeat objects:


In [4]:
def parse_repeat_masker_db(fn):
    reps = []
    # open in text mode either way, so lines are str whether or not the file is gzipped
    with gzip.open(fn, 'rt') if fn.endswith('.gz') else open(fn, 'rt') as fh:
        for _ in range(3):
            fh.readline()  # skip the three header lines
        for ln in fh:
            reps.append(Repeat(ln))
    return reps

In [5]:
reps = parse_repeat_masker_db('ce10.fa.out.gz')
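
Once the annotations are in a list, a natural first look is how many fall into each class/family. The sketch below assumes only that each object has a `rep_cl` attribute; the `Rep` namedtuple and the `count_classes` helper are hypothetical stand-ins so that the example runs on its own, and the sample rows are copied from the excerpt shown earlier.

```python
from collections import Counter, namedtuple

# Stand-in for the Repeat objects; only the class/family attribute matters here.
Rep = namedtuple('Rep', ['rep_nm', 'rep_cl'])

def count_classes(reps):
    """Tally annotations by their class/family string."""
    return Counter(r.rep_cl for r in reps)

# Repeat names and classes copied from the first rows of the file shown earlier.
sample = [Rep('(GCCTAA)n', 'Simple_repeat'), Rep('(GCCTAA)n', 'Simple_repeat'),
          Rep('RCS5', 'Satellite'), Rep('(GCCTAA)n', 'Simple_repeat'),
          Rep('LONGPAL1', 'DNA/MULE-MuDR'), Rep('PALTTTAAA3', 'DNA'),
          Rep('PALTTTAAA3', 'DNA')]
class_counts = count_classes(sample)
for cl, n in class_counts.most_common():
    print(cl, n)
```

Because the helper only reads `rep_cl`, the same call should also work on the full list parsed above, i.e. `count_classes(reps)`.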

Extracting repeats from the genome in FASTA format

Now let's obtain the genome for the roundworm in FASTA format. For more information on FASTA, see the FASTA notebook. As seen above, the name of the genome assembly used by RepeatMasker is ce10. We can get it from the UCSC server. It's around 30 MB.


In [6]:
ucsc_site = 'http://hgdownload.cse.ucsc.edu/goldenPath'
fn = 'chromFa.tar.gz'
urllib.request.urlretrieve("%s/ce10/bigZips/%s" % (ucsc_site, fn), fn)


Out[6]:
('chromFa.tar.gz', <http.client.HTTPMessage at 0x7ff38f4ac518>)

In [7]:
!tar zxvf chromFa.tar.gz


chrI.fa
chrII.fa
chrIII.fa
chrIV.fa
chrM.fa
chrV.fa
chrX.fa

Let's load the chromosomes into a dictionary of strings keyed by chromosome name, so that we can see the sequences of the repeats.


In [8]:
from collections import defaultdict

def parse_fasta(fns):
    ret = defaultdict(list)
    for fn in fns:
        with open(fn, 'rt') as fh:
            for ln in fh:
                if ln[0] == '>':
                    name = ln[1:].rstrip()
                else:
                    ret[name].append(ln.rstrip())
    for k, v in ret.items():
        ret[k] = ''.join(v)
    return ret

In [9]:
genome = parse_fasta(['chrI.fa', 'chrII.fa', 'chrIII.fa', 'chrIV.fa', 'chrM.fa', 'chrV.fa', 'chrX.fa'])

In [10]:
genome['chrI'][:1000]  # printing just the first 1K nucleotides


Out[10]:
'gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaaAAAATTGAGATAAGAAAACATTTTACTTTTTCAAAATTGTTTTCATGCTAAATTCAAAACGTTTTTTTTTTAGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCTGCCTGCCAACCTATATGCTCCTGTGTTtaggcctaatactaagcctaagcctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagcctaaAAGAATATGGTAGCTACAGAAACGGTAGTACACTCTTCTGAAAATACAAAAAATTTGCAATTTTTATAGCTAGGGCACTTTTTGTCTGCCCAAATATAGGCAACCAAAAATAATTGCCAAGTTTTTAATGATTTGTTGCATATTGAAAAAAACA'

Note the mix of lowercase and uppercase, which is directly relevant here: the lowercase stretches are repeats. The UCSC genome sequences use the lowercase/uppercase distinction to make it clear where the repeats are, and they know this because they ran RepeatMasker on the genome beforehand. In this case, the two lowercase stretches you can see both consist largely of a simple hexamer repeat. Also, note that their positions in the genome correspond closely to the first few rows of the RepeatMasker database that we printed above; the first stretch, for instance, spans positions 1 to 432, just like the first row.
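
To make the lowercase convention concrete, here is a small sketch that lists the lowercase runs in a sequence as 1-based, inclusive intervals, the same coordinate style the RepeatMasker file uses. The `lowercase_runs` helper and the toy string are made up for illustration.

```python
import re

def lowercase_runs(seq):
    """Return (start, end) pairs, 1-based inclusive, for each lowercase run."""
    # m.start() is 0-based and m.end() is exclusive, so adding 1 to the start
    # and keeping the end unchanged gives 1-based inclusive coordinates
    return [(m.start() + 1, m.end()) for m in re.finditer(r'[a-z]+', seq)]

# toy sequence: 12 lowercase, 9 uppercase, 9 lowercase, 3 uppercase
toy = 'gcctaagcctaaAAAATTGAGtaggcctaaTTT'
runs = lowercase_runs(toy)
print(runs)
```

Applied to a real chromosome, for example `lowercase_runs(genome['chrI'][:1000])`, the intervals can be compared by eye with the begin and end columns of the first database rows. The boundaries need not agree exactly, since the masking in the FASTA file may come from a different RepeatMasker run.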

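Next we write a function that, given a Repeat and a dictionary containing the sequences of all the chromosomes in the genome, returns the sequence of that repeat. The coordinates in the database are 1-based and inclusive, so we subtract 1 from the start to get a Python slice.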


In [11]:
def extract_repeat(rep, genome):
    assert rep.refid in genome
    return genome[rep.refid][rep.ref_i-1:rep.ref_f]

In [12]:
extract_repeat(reps[0], genome)


Out[12]:
'gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaa'

In [13]:
extract_repeat(reps[1], genome)


Out[13]:
'CCTGTGTTtaggcctaatactaagcctaag'

In [14]:
extract_repeat(reps[2], genome)


Out[14]:
'cctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcct'
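
The third annotation above has orientation C in the database, meaning the match is to the complement strand, yet the extracted string is still the forward-strand reference sequence. If the goal were to compare a copy against its repeat consensus, one might want the reverse complement for such rows. The following is a minimal sketch of that idea, not something the notebook itself relies on; `revcomp`, `extract_stranded`, and the toy genome are hypothetical names and data.

```python
# Complement table that preserves the lowercase/uppercase masking.
_COMP = str.maketrans('ACGTNacgtn', 'TGCANtgcan')

def revcomp(seq):
    """Reverse-complement a DNA string."""
    return seq.translate(_COMP)[::-1]

def extract_stranded(refid, ref_i, ref_f, orient, genome):
    """Slice 1-based inclusive [ref_i, ref_f]; flip if the match is on 'C'."""
    seq = genome[refid][ref_i - 1:ref_f]
    return revcomp(seq) if orient == 'C' else seq

toy_genome = {'chrT': 'AAACCGTtt'}  # made-up sequence for illustration
fwd = extract_stranded('chrT', 4, 7, '+', toy_genome)
rev = extract_stranded('chrT', 4, 7, 'C', toy_genome)
print(fwd, rev)
```

Characters outside the translation table (other ambiguity codes, for instance) pass through unchanged, which is a simplification of this sketch.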

Let's specifically try to extract a repeat from the DNA/CMC-Chapaev family.


In [15]:
chapaevs = filter(lambda x: 'DNA/CMC-Chapaev' == x.rep_cl, reps)

In [16]:
[extract_repeat(chapaev, genome) for chapaev in chapaevs]


Out[16]:
['cacggcccggcggggggtacatggatgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtgtaaactttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgacatttagcaaaaattgaacataagatttttcggaattatgaagctcaattttcacaaaaataatgagttttttgtagaatttatgaaaaaacgtgaatatatagattttttgttcatgatattcaagaaaaagcgatttttagttcttcacagaggaatcctctcgcatttcacttgctcatgatgttttttgctccactttaggacgataaaaatgcgaattgttgataaaatgaatgaataatataaaaa',
 'ggggctgctgaaaccaatgtcggcatgatgagagttccggtcttctgaatccatttcctgcgtgggctgtggcgacgagctgcacgtctgaaaatcaagtttttgtaatt',
 'tttgggcgcatgatatggagctgaatcattcgattttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaatttaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcagctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaagtttcaaaaattctgagaccaatttttattgagaaaaataatttttcgctcgaattattgaattttcactaaatgcaaaaaacagtaaacttgggcccatgctacaagcctgaatctttcaaattaagaaccagcatgattttttcaatattctaggacgtttaaaaaaaatctggaccaacagtttttgaggaacgtaattttttatacaaaaatgttctgatttttcactaaactcaaaaaaatagtcaagttgggcccatgctgtacacctaaatcattaaaattcagaaccgccatgtattttttcttaccaaaggctctttaaaaaaaatctggaccaacagtttttgagatatttagaaaaacaactcacttttcgacgtttttcgccttttcgtggctcacccggttgatttttgcggcgatttgtggtctttcgctgaaaatattatttttatttcaattattaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaaacgcgaaaaaacatcgaaaaaccaccgcaacctcatgaacaaaaaaaaagcattgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg',
 'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtttaaagtttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgaaatttcgcgaaaattaacagaagatttttttcggaattatagagctgaaattgaaaaaaaaactatcaaattttcatcgaatttgtgaaaaatcgtaagtatgaagatcttttcttcactatattcaaggaaaatcgatatttcgcttttcacagacgaatgatgtctcattttactcgatgaaagtttctgatgagctgtttttatcgatttttgagcgataaaaatgcgatttgttgataaaatggatcaattatataaagaaacaacatatattgctctgagattactttttgagaatcaattctttatttttcggtcattttaaattaagcattaaaataaaaatattagaaatcataataaaaaaaacagaaaatcgatatattactttttcttcggaatttcacgacttttttggacgaattttattctgtaaactttcttcttcgaatttgtgtccacgtggctttcagtcgaagaagattctgcagcactccttcttgcttgcccacaacttactcgaattttctaaaatttttaacttattgaaattgtcatttcacctttacactcacttcagctaaactattactgcatttcggaagttgataggatactggtggagcaacaagtggatggcttctagtgattggctggcttgtcgagcaagtttgtgtgattgcctgaaataatttttgatttcaattttgagttgatttaaagcagtgaacctaccaccgggttcggacgagaaagagcattactcggtagaccacggaatccaattttcgttgaattgcctccaaatgcaatagaagtttgtacgttttgtgagaagtcgggctgaaaattttcaaaatttgaaacttttcgagaaaaataaaaatctcaccacagcatttcgagattttgtcgattgtggaagccttttcctggagcgaaaattgattttttttttcgctaaattttttcttttttgggcagccgtgacgtcccgaataactgcttttgggtcccgaagatcattttgcgaagaaattggcagaactgttgcatcttttggtacgatggaaagaccgggaatggacgtgttctgaaatagttgtgtttttaagaatgcagaaatgtttttctgtaccaaaattaccatagtcatgtcattcatgatgttacgacacatgagctctctcagaacatggatgtaacgccttttcttgtcccggtaattgcaaaatctcctctcaagtgcattgaaaatcgcgtggacagattcaactccttgttctgtgatccttccaatgtttctcacatcttttgccatttgtggtgcatggtagaccaacaagtgcagctttaaaataattgtttcttcgggaaccgctactttcaaatcctccacaaatccgcgaatcgaattttgaagtattaagacgtcggaatcatttaaaaacttgtttcccgaaagtgacataatagttgaaagctttcccattgctgatttcaatccgagcaacattgggcataaatttgggccaaaaatgttgaaagtctcctctacaacagccggcgttagcagcaatttcaaatggtttccgcaaaatgattggaaccaagcctgcttgtccgctccaaacttagcccaacactgtcccattttttcaagtgttccttcgggagtaccattcacaattgtatcgagcaacaatttttccgattgaagtgctttcagttcagcatgcgactccaatttcatctttccggtggctccttgatacttttcttccgcacttttaattaggttaacagcgttttttagagttgcttttcgtgttttcaggataggaaaagaagtagtgtt
atccaaagtatcagaatatttccagaggggattgaagatatatttgtcaaaaatacccatgataatgtgcagaagaggaatcaaatagaacatgatcgcaacgtgtggcagaagtggagtacatcctttgcgaacacccaagtcgccattttcacaacaagctttgtaaagatcgattgttcgtgggtggaatgtttcatcaacattcatatccttgattttcatcctctcttcagctccccgtggattctgtgcaaaacatttgaagcagaaattgtgggatgaatgtccttggtgtccaagaatatcagattgaaacttgcaatctccagttgcaatttgcacaatttttgcggttttttgaactcctttgtccaaatatcaaattttcgttagcttgccaagctgctcaagaacgtccggaatgaattttttcagagacgaataattgtcggatccgtcatatactgcaattaccataacgtgtctcgaagaattcggtcgagatacgtttccgattaccaatgccaactttgtgcttccacctccagcgtcaccaacgactccaatcttgattactcctttcgtgtatccgtcgtccacaaattgatttgaattgcatagaagctctattcgataggctaaaacttctgcaattttcatgcactgcacaatggtaatcacttttcctttattgtcgaacgaagtggaaactttgaaactggagatcattgataactggattgacaaatctcttgtgttctttaccgatggaagcaaatcatagccaatggcattagtcaaatagtttttgattttttccatctgacttagagataatccgcattttgataaaaagtcaacggcctcaaagtttgaaagcttgtttttgtagctttgattctcttctgaattcaggaattttgtgaattttcgaataaattgtccgacgtcatcctcgaggcagatttcgtgttgaagcaagtgaagagctttgcgaaatcgatttttgatacaacttttgcttcttagattcgaaatattaactttaaaagctgattttttaaggttttcaacttcttcggcgtgtctttgtagactcagaaccatagctttgccacttttcttcacatctgcacagcttctcaccaatcgaccttctataccactgacgatcgttcgtatattgcatacttccatttgcagcgaagaattagatgctcttatagtgatattttcatggcggactatttgcatttcttccgaaaacaccgcaaactcatcaatccgcttttgtatttcttctgatatttcatttttttcatttttcagtcgttcgatcgttagtcggagcattttgatctgcggaatttgctcaacattggagattattcgaaccctcggtgtactgaacgagtttcgtaaaggtgtcggtggaaatacgggattggagaatctcagcaaaatcatataatattagttttgaaatattgaaaaaaattacattgtgagaaaaagtcggaatttcgtcactaaaatccatttccacgtctctcgtcagaattccttcatccatattgaaacaatttgacgacctgcatgtagttgcggagctactggaagcaatgtcgggatggtgggagtttcgatcttctgaactgatttcctgattagcctgtggcgacgagctgcacgtctgaaaatcacgtttttgaagttagaacaaactactccaacttaattaaagttgacaaaattgagctgaacgaacctccactttcgaattgttcagttcttcctcttcagtttgatcttttgaaactccattagcactgttccttgctctctgggcatttgctaaaagaaggcctgcacaagatttttcttttcttttttgtttgaagtatacttttgtcatctggaaatattgcatgaatattataagggaaacaatttttaaatatcgattttcacgaaatttgaaaaaatcaataatttgg
gcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttaggcgcatgatattgagctgaatgttttgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaattcaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcggctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaaggtttcaaaaattctgagaccaatttttgttgagaaaaataatttttcgttcgaattatcgatttttcacgaaatgccaaaaacagtaaacttgggcccatgctaaaagcctgaatctttcaaattaaaaaccagcatgattttttctatattctaagacgtttaaaaaaaatctggaccaacagttcttgaggaaagtaattttttatacaaaaatgtgctgatttttcactaaattcaaaaaaataatcaagttgggcccatgctatacacctaaatcattaaaattcagaaccgccatgtatgtattttttcataccataggctctttaaaaaaaatctggaccaacagtttttgagatatgtcaaaaaaaacaactcactttttgacgtttttcgccttttcgcggatgatgcggtcgatttttgcggcgatttgtggtctttcgctgaaaatattatttttatttcaatttttaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaacacgaaaaaaacgtcgaaaaactcccgcaacctcatgaaaaaaaataaagcactgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg',
 'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcgtcaatatcagc',
 'aattcctaaattttttattaaaatcgaaaaaaaaaaatgaaatacgtgagattgagtttcgagacttttttattcagaatcagcatatatttctccatatttgagtaggttttcagaaatattgtaccataatttttggaaaaatgtaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaaataaatatgaaatacgcgagattgaggttcaagactttttaattcggaatcagcatatatttttccatatttgagtagattttcagaaatattgtaccataatttttcgagatattttgaataataacttacttttcgacgttttttgcctttgtccggtttaatccatcgaatttcgaagcggtttgcgtagattagctgaaaacattatgcttattccacgtagtaacaagaaaaaacaagaaaaaataagaaaaaacgaagaaaa',
 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaata',
 'cacggtatcacaaaaactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagtgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattaatagtttcaaggtgaaaaaaaaaataaaaaatagattttattattcaatttagtagctaacaattgagaattgaattattgaaacacagaaaaattaattttgaacagtaaaacaaagaaataatcgaaacggaagattgaaaattgaaaaatacatacaaatcgattcaaaaaacgaatttaatatgtggaatcagcctccttcttcgctttacggatgcaagttgaaatttttcttggtaaaactttcgaaaataaggaaacccggtgaaatttcttaattgcctcgactcctggtgaactttttgagataaatccatcgaaattttgctgagtagctcattgacaaggtttcaatctgaaatttcataaaatcaattattttcgaataattttaatcccaacaaaccgaaggaaatcctgaattttagcttttcgatagaatcctaaggtgtcacatcgacaattttccaagttgaaagaaaccatgtggagcatctccgcttataatatcctcgacaaaatattgatgagaaatccatcaaatttgcaccgagtatctgcttgtcaatgtttttacctgaaatttaatgaaattaataattttttaataattttaagcacaactaaccatagaaaatcccgaattttatgatattgaggaaatcccgaaattttaattgcgaaaaggttcagaattgtgtgagggagagcgctgggtggtttattgggaagacgaggcgtcctgcaaagaaaatgttacgatttgtttttttgaaaggttccacttacttcctccatatcagaaattagattgctacatgacagctccttagcaatgtacaggtatcttttccctgtgtttctaaccgagtggaagcgtctttcgaggcgattgaaaacggcatgaagcgattcgattccttgctccgaaagccttcccaaattcctttccgtttttgcgacttctactacgtgagcaaccaggacatgcaatttttgagtgattgattcttctgggtgggcagcttgtaggaattcaacgaattccttcatcgactcatccaattcgctgatgtcatcgtcagagagcaatcgattggccgacaatgacatgattttggacagcctgctcatcgcgttcttgacattcagaagcatcggagtcatgtggtttttcagatttttgaatgcagctgtgacacctttctcagacaaaatcagcttcgtgtgatttccagtgtacatttgaaaccacgctcggcgtgttgcaccaacttcttccagatcattctcaaattgtttaagatatccaccagcaagtccatccagcgtctcgtccaataacactttttcttctttcagagcagtaaactgtgctttcatctctctttttctcttaagtggtgaagcttcgaattttttgttcgcgtcttgaatcttcaaatcggcaactctttttgtctcttctttatttcttcgaatttcaaatgatgttttattgtctaaactcacaacggccatccaaatcggctcaaagatgtatttcgtgaacagtccgacaatcaagtgtagcattgcaggcaagtagtgttccaacttgacattttgaagaattggtccacttccgcatctgacgccaaagctaccgtatttagaattgagcttgtaagaattcattgttttcaagaaatatatcttttgcaagttcagatctttaagcttttgcatcagtcctcctcttggattggtttcgaaacaaaacgggcaaaaataggtggcacattgttttttatgtgacagtaaatcacatgtaaact
taaaatcaccgactactttttggacaacgccacgtgtcaccacccttccatcttccatataggtgatgctggtgaagttgttgatcttcacgataagatcagacaggtaggccatgataagttcccgcgaatcagagtcatcaaaaacagcgagaaggacgattcggtgcggcgagttcgcatgatcacaatttccgatcagaagacaaagtttcgtcgttcctcctcccgaatctccaccaattccgataacaattttgccatcggtataggaatcatggcgtaattgttttgatacagacaaacgctccagcttcggaatgacacttttctcgacatcgataattttgacaacggtcaccttttctccttttgatatatatgttgtggccttataattgtcgatagttgacattctcgttttccgttgcatcgtcaaattaatagttggcatgatttcaagatcggtgaacatcttatagttttgcttcacccgccgcaattgattgttagagaatctacatttttcttgaaaaataatcgtttgccaatgcgtcaattggacttggaaggactgcgaatcattttgttggagatatttctggaaatcaatcatgaaattacgaatatcctcttctcgactgattcgttgaagcaaatccagtgccaaatctatcctagattttccagagaacgattccttcttcgcaactctgatatccatattattctgtactttttcgcccttgacgtaatttgagcgatttgttttcttaatatcagattttcgttggttttccgttttgatgtttttcttattgaaattatctctttcctttgcatgaaaaattcatcagaaaaaacggcaaactcgtcgtctctctccatcaaccgatttagaagtgtctcttcgtcaatgtctgttcttggattgagctggaccgaatggtttgagctggacaggatggcggttctgaaatggaaaattgaataaacaatcaacaacaaagaaataaatctacctctgatatgctatccaaaatggaatcaacttctagaaatctcattcgttctctactcaagtcttctctaattcccttcacatatggtgtctcattcgaaaacacgccgacgctgtcactcgatgaatagagtggacttgacgatctttcaaaagagtgtggtgttcttggaggtgatgaagaagattccgcaatagaagaagttgatggttgtgtgtcgaacggctggaaaaattatttttaaaatattataatcgtcttaggatccgagttgtagcatggattaattcacttacaatatcagtttcaggtgttattgtcagattcaaatcgatattgcgtttcgactccacatccagattatcatcaaaatccacaaatggagacactcgtcttctctcaggagtgatctgcgcatctccgtttctgtcaagagcttctgtcaattctacattcgacaggtttaggtccagggaagcatttctttcagtatttgaattttcgtcttcttcatcttttttccatcgtcgtttccgagcatttagaagatttgaagaccatgctgattgatttgatggaggaggcatctgaaaaaaaaattttttactcaattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtacgatttgtgtatatgtttgcctttatcttgtagagaatttttagctgattctaattcggcaataaaaataaaacattttgtgatttttttcgaaaaaaatcatttttcctccattttcagtgcaaaattttagtttcgaaaaaattttcttgaagaatgtacgatttctaaaaaattctctctgtatcttgcttaaaattaacagttctttctctttgaagcaaaaaactaacacattttgtgatttttttggcaaaaaaaattattttataaattcttattacaaaaaaaaattttttcgaaaaaattttaatgaaa
aattaagatttctctatgaattgagttccatcttatacacaattaaattttgatcataaaacaactataaatcgtaaagagtttttgattttcttgaaaaagggaacgtttttaaaaactattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtacgattcgtgtatatgtttgcctttatcttgtagagattttttagctgattctaattcggcaataaaaatataacattttgtgatttttttcgaaaaaaatcatttttcctccattttcagtgcaaaattttagtttcgaaaaaaaaattaatgaagaaggtacgatttctaaaaaattcggcccgtatcttgttcagaatttttagctctctctttttcaatcaaaaaaataaaatattttgtgattttttaagagaactcgtcaaaaaactcacttaattcgttttcctttctgcccacgccgacgctcctttgttttcctcgaattttttcgcttactacctgaaacaagacacactttttcgattttacaaaaaaaaataacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatcgatgccgagtgaaaggcttgaaatttaaacaattgttcgcaatagagcgtgtttgcctccatctagagattgaaccaccgtg',
 'tgctgaaaattgctgaaaatcgaaatttcgtcagctgatgtcgattattctgcgcgggggtacggtacgcaagtccgcaaacactgtcacgccaaattgcgga',
 'cacggtagcacagaaactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagcgaaatggtccgcaatgtgtctcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattactagtttaaagtgaataaaattaaaaaaaaaagattttattattaaatttatcaactaacaattcaaaattaatgtattaaaacacagaaaagttgattttgaacagaaaaacggagtaatcatttaaaaagacaatattaaagtgaaaaaacacgcaaatcgattgaaaaaacgaatttaatatgtggaatcggccttctttttcgcttttcggctgcaagttggaatttttcttgggaaaaactttcgaaaatgaggaaatcagtgggaacttcttaatttcctcgactcctggtgaactttttgtgttaaatccatcgaaattgtgcggaatagctcattgacaaggtttcaatctgcaatttgatgaaattctatatttttaaattattttaatcacaacaaacctgaggaaatcccgaattcgatcctttcgataaaatccagagatgtcactttgccacttttccaaaatgaaggaaaccatgtggagcttctcagctttttgagttcctggccgaaatcatgatgaaaaactatcgaaattgacttgagcagcttcttggcaaggtttttgtctgaaattttaagattttaatgatttttgaacgtttttaacacaacgaaccaaaggaaatcctgaatttcacatttctgactgtttcctgggatgttacatcggcagttttccaaaatgaaggacatcatgtagagcatctccacttattgaaattctggtgaagttcttgccgacaaatccatcgacattacgttgaacgtcttctaggcaaggtgtttatctgaaaattcatga',
 'taagcagtttttgaaaagttttcgaaaaaaaAAAGAATTTCCGTTTTTTGAGATttaattttcagtgaaaaaaatttacttttggaaaatttcaagtgaaaaatgtacgattcgtgtatatgtttgcctttatcttgtagagaatttttagctgattctaattcggcaataaaaatataacattttgtgatcgttttcgaaaaaaaaatctttttctttatttttagtgcaaaattttagtttcgataatttttctatgaagaatgtacgatttctagaaaattctgcctgtatcttgctcaaaattaacagttctttctttttaaagcaaaaaattaacacattttgtgattttttggcaaaaaaaattattttataatttcttatttcaaaaaattttttttcgaaaaaatcttaatgaaaaattaagatttctctatgaatttagttccatcttatacaaaatttaatgctgatcataaaacaactataaaatgtgaagactttttgattttcttgaaaaatggaacgtttttaaaaactgttttcagtgaaaaaaatttacttttgaaaaattgtgaaaattgaaaaattgtgaataagtgaaatatgtacgatctctcaataattttgtcttcatcttgtagagaattgttagctgtttctgattcggcaagaaaaatacaacattttgtgatcgttttcgaaaaaaaaaatttttttcttaatttttagtgcaaaattttagtttcgaaaaaaaatttatgaagaaggtacgatttctagaaaattctgctcgtatcttgttcagaatttttagctctttctttttcataccaaaaaataaaatattttgtgattttttaagagaactcgtcaaaaaactcacttaattcgttttccttgccgcccacttcgacgttcctttgtttttctcgaattttttcgcttactacttgaaacaagacacactcttttgattttacaaaaaaaaattacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatcgatgccgagcgaaaggcttgaaatttaaacgattgttcgcaatagagcgtgtttaccgccatctagcgactgaaccaccgtg',
 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',
 'cacggcccggcgaaagagacgtggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacgaataaacatttttaatttggctgctaagctcatttatctttgttttttctcgttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcacaaaaccgcggcaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagagtttttt',
 'tgcgaaaaactgtttaaagtatcgattttcttcaatatcagcaacatacaatcctttaaaatgattattttttgtaaattcgataaaaattcatttatttttcacaacttctgcccgaaaattaccgaaataaccagcgtttctataactaagaaagtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',
 'ttgttcggtggagtgcgtttgcacccatctagcaactgaaccaccgtg',
 'tttttcgtgtttttttatgtttttttatgttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggacttaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttgtttt',
 'aaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaactcaacacattttcgatcatttttgaacaaaaaaattgttttctgaaaaatttgacgcttaatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacactttttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaatataaagaaaaacatgcttgttctgaattagtaagaattgcgactcaagctcgcgtatctcatatttatttgttcgatttcagtggaaaatcgacacatttttgggaaaatttttttttttcgaaatattcgctcctaatttttaatgatttccagatgaca',
 'tctctatattattcattcattttatcaacaaacttctatcgccctaacgtcgatcaaaaaagctcatcagcaactgccgtcgagt',
 'tctcgtcgagtgaaatgcgatagaatttgtctgtgaaaaaccaaatatcgattttccttgaatatcgtgaagaacaaatcttcatacttacgattcttcacaaattcgatgaaaatctgatagttttttcaattttagctctataattccgaaaaaaatcttctgttaattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaagtttacactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaatcctcgtccatgtaccccccgccgggccgtg',
 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',
 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',
 'gttttttcttattttttcttgtttttttttgttactacgtggaataagcataatgttttcagctaatctacgcaaaccgcttcgaaattcgatggattaaaccggacaaaggtaagaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattatggtacaatatttctgaaaacctactcaaatatggaaaaatatatgctgattccggattgaaaagtcttgaacctcaatctcgcgtatttcttattgagtttctcgattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaagaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatagaaaaatatatgctgattctgaattagtaacatttgaaactcaatctcacgtatttcataatttttttttcgattttaatgaaaaatatattcagtgcaattttcgattttttttcaaaa',
 'gctgatattgacgaattgtcgtcaaataaaatacgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt',
 'tgtgggctacggtagtcaagtacgcaaacaccacgagcattttcacaattgcgtacaaaatttttttcaagcttt',
 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',
 'ttttgaaaaaaagttactttttgttcgaaaatgtattgattttcacttattttcactagattcaaaaaaa',
 'ttggacccatgctatactcaacatttttttggaattctgaatcagcattctcttcataaattagacaatttctaaaaaatctggaccaa',
 'ttttccggtgatttgctaaaactataatttctatttcaattattaaccgagaaaaccagaaaaaa',
 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',
 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaata',
 'tattaattgctaaaatttatgtggactacgatagccaagtccgcaaacaccacg',
 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaatacttcccggtttcgttttacatgataatatcgttgaatttaagcaaaaaatgcagcattagtttcatgaaaaaaaaaataagaga',
 'atttttcctcaaaaactagaatttttcattgaaaaatggcttaaaaatcgatttttgttcgaaaaaa',
 'aaggttttcatcgataaagtcacgaatttgtcgaaatgctttggtgagttttatctttttcagaaaaaaaattcgaaaattttcag',
 'cacggtggttcagttgctagatgggtgcaaacgcgct',
 'attgttcggtagagcgcgtttgcactcatctagctgctgtaccaccgtg',
 'agaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcg',
 'ttttggaaaaatttaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgaaaaaaaatatatgaaatacgcgagattgaggttcaaggcttttcaattcggaatcagcctatatttttccatacttcaatgggttttcagaaatattgtatgtgaatttttggaaaaatgtgatttttaatttgaaattgcactgaatttcttgaatttttcaataaaatcgagaaaataaatatgaaatacgcgagattgaggttcaagactttttaattcggaatcagcatatatttttctatatttgagtgggttttcagaaatattgtatgtgaattttttgagaaaa',
 'tttttcgaaaagtgttggaccagatttttttgatcttagaatatgaataaaagcatgctg',
 'caaggcccggcaaaccgtacgtggctgcaaaatctctaccgtagtaaaatttggctgactgaatattcaacgtttgatactcagtgaaaagattcgtac',
 'cacggtggttcagtcgctagatggcggtaaacacgctctattgcgaacaatcgtttaaatttcaagcctttcgctcggcatcgattctctccgttttttgccgattttaaccgtttttccgtattttttttgttatttttttttgcaaaattgaaaaagtgtgtcttgtttcaggtagtaatcgaaaaatttcgaggaaaacgaaggaacgtcaccgtgagcagcaaggaaaacgaattcagtgagttttttgactatttctctctgaaaaaattaaaaaatgtattactttttgattgaaaaagaaagagcaaaaaattctgaagaagatgtaacttaatttattcagaacaaaagttttttttgtctcgaaattacaatttttcgaccaaaaatagaaaatgatcattttctcgaaaatgaatcccaaatttatgtcagttttttatgaaaaaataatcagcgtaaacttctgtactagaaatactaccttttttttggattgaaagtttgtatcgtgctcaaaattttaaaagtaaaattattttacgctgaaaaatgcaaaatcataactttttttgaggagaaacccccaaatgttatcgattttgttatcaaattgaactcagcttaaaattatctacaagatgaagctaaattcattgaga',
 'atattaattaagttttttgtatcaaattgtgtgttttctcaattttatattgcctttttatttcataactttccttttctgttcaaaatcaacttttttttgtgttttaacacttcaattatcaattgttagtttataaatttcataataaactctgattttttattttttttcatcttgaaactattaattctgctgtttttctgctaaaatttgcttcaaaaatctatttgccgcttcattgttttgcggactacggtacgcaagtacgcaaacaccgcaacgacacattgcggaccatttcgctgcgtacgctgcgagatctttctcaaattttacgagagatctagtttctgtgctactgtg',
 'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtaaaatttagctgactgaatattcaacgtttgatactcagcgaaaagtttcatacgacattttgcggacgcggcattttaattgacgacactttcttagttatagaaacgctggttatttcggcaattttcgggcagaaattgtgaaaaataaatgaatttttatcgaatttacaaaaaataataattttaaaggattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca',
 'aaaaaaaactcacttttttcgaattttcctccttccagttggcggttgagtgctgttttgccgcggttttgtgattttctgcaacaaaaaaaatatttttatcgataaaaatgagaaaaaacgagaaaaaacgaagataaatgagcttagcagtcaaattaaaaatgtttattcgttcaaaaatcttaaccataggaggcggtggcctagccggcgcagctctcgcggccacgtctctttcgccgggccgtg',
 'ttttcttcgttttttcttattttttcttgttttttcttgttactacgtggaataagcataatgttttcagctaaactacgcaaatcgcttcgaaattcgatggattaaaccggacaaaggcaaaaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattctggtacaatatttctgaaaacccactgaaatatggatatgctgatttcgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttattttctcgattttattgaaaaatttgagaaattcagtacaatttcgaattaaaaattaaatttttccaaaaattctggtacaatatttctgaaaacccactgaaatatggaaaaatatgctgatttcgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttattttctcgattttattgaaaaattcaagaaattcagtgcaatttcaaattaaaaatcacatttttccacaa',
 'gctgatattgacgaattgtcgtcaaataaaatgcgcgtacgcaatgtacacaatagtgtaaaaaactttatcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt',
 'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcgtcaatatcagc',
 'aaaatgttgaagagaaaccagagaaattgatcgagtagattcttggcaagttttgaaattatatggttttaataagttttgaacatttttaaatacaactaaccatgga',
 'ttgtttggtggagcgcgtttgcaccaatctagcaactgaaccaccgtg',
 'tttttttaaatttttttcttggctgctttactgatgtttttttctcaattttttcttgttttctttgttactaatttaaattaaaaaaactattttcagcttatcacagcaaatcggagcgaaactcgaccgcgataacaggaaaaagtcgaaaagtgagttttttgccaaaatatctcgaaaaactcatattttgttttgaaaacagatgcaaataaaaagaaatacat',
 'atgtattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',
 'tttcgtaatgtttttttcgagtttttgattgttttttctcatttttttttgtttttctttattattagttaaaatataaaaactattttaagctaatcaacgcaaatcgaggcgaaaaccgatcgcagaaagaggaaaagtcgaaaagtgagtttttttgcaaaaatatttca',
 'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactctgcgtacgcgcattttatttgacgacaattcgttaatatcagc',
 'ttcattaaaatcgaaaaaaaaattatgaaatacgtgagattgagtttcaaatgttactaattcagaatcagcatatatttttctatatttgagtgggttttcagaaatattgtaccataatttttggaaaaattaaattttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaactaaatatgaaatacgcgaaattgaggttcaagacttttaattcagaatcagcatatatttttctatatttgagtgggttttcagaaatattgtaccataatttttggaaaaatttaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaactaaatatgaaatacgcgagattgagattcaagacttttaaattcggaatcagcacatatttttccatatttgagtaggttttcagaaatattgtaccatattttttcgagatattttgaataataacttacttttcgacgttttttgcctttgtccggtttaatccatcgaatttcgaagcggtttgcgtagattagctgaaaacattatgcttattcca',
 'gcaaacatactcttttgcgaataagcgatttttttgttttttttttttggtgttttccgttttttgcttgttttcaccgtttcccctctttttttttgtttttttttgtcaaatcgagaaagagtgtgtttttttttcaggtgttaaacaagatttgcgagcaaaacgagggcacaccatcgtaagaagcgaagaaaacgagaaaagtgagttttttgaagattcctctttaaaaaatagggaaatgttttagttttgagccaaaaaagaaagagctgaatttttcaaacaagatacatgc',
 'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtgaaatttggctgactgaatattcaacgtttgatactcagcgaaaagtttcgtacgacattttgcggacgcggcattttaattgacgacactttcttagttatagaaacgctggttatttcggcaatttcgggcagaaattgtgaaaaataaatggatttttatcgaatttacagaaaataattatttgaaagtattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca',
 'aaaaaactcacttttttcggattttccgcctcccagttggcggttgagtgctgttttgtcgcggttttttaattttctgcaacaaaatgtatatttttatcgataaaaatgagaaaaaacgagaaaaaac',
 'ttgttcggtggagcgcgtttgcacccatctagcagctgaaccaccgtg',
 'cacggtggttcacttgctagatgggtgcaaacgcgctccactgaacaa',
 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',
 'gcgaaattctgcattttgtcgtgagatccgcggtgtttgcgtacttctggggctaccgtaacccggaaaa',
 'tcgagttttacattgaaaaaaaatggccaaaaatcggagaaaaatgggcaaaaaacggagagaattgatgacaaatcaaag',
 'tcgagttttacattgaaaaaaaatggccaaaaatcggagaaaaatgggcaaaaaacggagagaattgatgacaaatcaaag',
 'ccttaaaaggaagaaatttggtggaaaaatacaattttcgctctaaaaaattccgtaaattcgagaatttatgaaaaatactttggttttttat',
 'gcaaaattctgcaatatgtcgtcaaattcggtgtttgcgtattttcgacgctaccgtaccccgcggaa',
 'ttttcttcgttttttcttattttttcttgttttttcttgttactacgtggaataagcataatgttttcagctaatctacgcaaaccgcttcgaaattcgatggattaaaccggacaaaggtaaaaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattatggtacaatatttctgaaaacctactcaaatatggaaaaatatatgctgattccggattgaaaagtcttgaacctcaatctcgcgtatttcttattgagtttctcgattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatagaaaaatatatgctgattctgaattagtaacatttgaaactcaatctcacgtatttcataatttttttttcgattttaatgaaaaatctaagaattcagtgcaattttcgattttttttcaaaa',
 'gctgatattgacgaattgtcgtcaaataaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt',
 'actacggtgcgcaagtactcaaacactgcgacgtcagagcgcagac',
 'tttagacgtatttttcttttctctgctcttatgatcgattttcgcagaggtttttgattatccggtaaatattactagttattctaatttttcattaaaaaattacatcgaaaataacgaaaaaacatcgaaaaacgcgaaagatcaacgaaaccaattcatgaattaattcgaatttataattcagtacaaaagcgattcggtcgcgggactagattttgcaacttcctaggccatttccaatttgcagtgc',
 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
 'ggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',
 'ttgttcggtggagctcgtttgcacccatctagcaactgaaccaccgtg',
 'cacggtgcttcagttgctagatgggtgcgaacgcgctccaccgaacaa',
 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',
 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
 'gcatggggcgtggccgaaaattctctactaccgtttaccaatttggctaatttgccaatcaacgttgaaaagttttgtacatcg',
 'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttgactgactgcgtgctcaacgttgagtactcagtttaaagtttcgtacaccgttgcgtactacacagcgcgcattttaattgacgacatttcgcgaaaattaacagaagattttttcggaattatagagctgaaattgaaaaaaaactatcaaattttcatcgaatttgtgaaaaatcgtaagtatgaagatcttttcttcactatattcaaggaaaatcgatatttagcttttcacagacgaatgatgtctcattttact',
 'gcctcattttactcgatggaagtttctgatgagctgtttttatcgatttttgagcgataaaaatgcgatttgttgataaaatggataaattatataaagaaacaacatatattgctctgagattactttttgagaatcaattctttatttttcggtcattttaaattaagcattaaaataaaaatattagaaatcataataaaaaaaacagaaaatcgatatattactatttcttcggaatttcacgacttttttggacgaattttagtctgtaaactttcttcttcgaatttgtgtccacgtggctttcagtcgaagaagattctgcagcactccttcttgcttgcccacaacttgctcgaattttctaaaatttttaacttattgaaattgtcatttcacctttacactctcttcagctaaactattactgcatttcggaagttgataagatactggtggagcaacaagtggatggcttctagtgattggctggcttgtcgagcaagtttgtgtgattgcctgaaataatttttgatttcaattttgagttgatttaaag',
 'gatttaaagcagtgaacctaccatcgggttcggacgagaaagagcattgctcggtagaccacggaatccaattttcgttgaattgcctccaaatgcaatagaagtttgtacgttttgtgagaagtcgggctggaaattttcaaaatttgaaacttttcgtgaaaaataaaaatctcaccacagcatttcgagattttgtcgattgtggaagccttttcttggagctaaaattgattt',
 'tacgatggaaagaccgggaatggacgtgttctgaaatagttgtgtttttaagaatgcataaatttttttctgtaccaaaattaccatagtcatgtcattcatgatgttacgacacatgagctctctcagaacatggatgtaacgccttttcttgtcccggtaattgcaaaatctcctctcaagtgcattgaaaatcgcgtggacagattcaactccttgttctgtgatccttccaatgtttctcacatcttttgccatttgtggtgcatggtagaccaacaagtgcagctttaaaataattgtttcttcgggaaccgctactttcaaatcctccacaaatccgcgaatcgaattttgaagtattaagacgtcggaatcatttaaaaacttgtttcccgaaagtgacataatagttgaaagctttcccattgctgatttcaatccgagcaacattgggcataaatttgggccaaaaatgttgaaagtctcctctacaacagccggcgttagcagcaatttcaaatggtttccgcaaaatgattggaaccaagcctgcttgtccgctccaaacttagcccaacactgttccattttttcaagtgttcctccgggagtaccattcacaattgtgtcgagcaacaatttttccgattgaagtgctttcagttcagcatgcgactccaatttcatctttccggtggctgcttgatacttttcttccgcacttttgattaggttaacagcgttttttagagttgcttttcgtgttttcaggatagggaaacaagtagtgttatccaaagtgacagaatatttccagaggggattgaagatatatttgtcaaaaatacccatgataatgtgcagaagaggaatcaaatagaacatgatcgcaacgtgtggcagaagtggagtacatcctttgcgaacacccaagtcgccattttcacaacaagctttgtaaagatcgattgttcgtgggtggaatgtttcatcaacattcatatccttgattttcatcctctcttcagctccccgtggattctgtgcaaaacatttgaagcagaaattgtgggatgaatgtccttggtgtccaagaatatcagattgaaacttgcaatctccagttgcaatttgcacaatttttgcggttttttgaactcctttgtccaaatatcgaattttcgttagcttgccaagctgctcaagaacgtccggaatgaattttttcagagacgagtaattgtcggatccgtcatatactgcaattaccataacgtgtctcgaagaattcggtcgagatacgtttccgattaccaatgccaactttgtgcttccacctccagcgtcaccaacgactccaatgttgattactcctttcgtgtatccgtcgtccacaaattgatttgaattgcatagaagctctattcgataggctaaaacttctgcaattttcatgcactgcacaatggtaatcacttttcctttattgtcgaacgaagtggaaactttgaaactggagatcattgataactggattggcaaatctcttgcgttctttaccgatggaagcaaatcatagccaatggcattagtcaaatagtttttgattttttccatctgacttagagataatccgcattttgataaaaagtcaacggcctcaaagtttgaaagcttgtttttgtagctttgattctcttctgaattcaggaattttgtaaattttcgaataaattgtccgacgtcatcctcgaggcagatttcgtgttgaagcaagtgaagagctttgcgaaatcgatttttgatacaacttttgtttcttagattcgaaaatttaactttaaaagctgattttttaaggttttcaacttcttcggcgtgtctttgtagactcagaaccatagctttgccacttttcttcacatctgcacagcttctcaccaatcgaccttctataccactgacgatcgttcgtatattgcatacttccatttg
cagcgaagaattagatgctcttatagtgatattttcatggcggactatttgtatttcttccgaaaacaccgcaaacgcatcattctgcttttgtatttcttctgatatttcatttttttcatttttcagtcgttcgatcgttagtcggagcattttgatctgcggaatttgctcaacattggagattattcgaaccctcggtgtactgaacgagtttcgtgaaggtgtcggtggaaatacgggattggagaatctctgcgaaatcatataatataatattagttttgaaatattgaaaaaaattacattgtgagaaaaagtcggaatatcgtcactaaaatccatttccacgtctctcgtcagaattccttcatccatattgaaacaatttgacgacctgcatgtagttgcggagctactggaagcaatgtcgggatggtgggagtttcgatcttctgaactgatttcctgattagcctgtggcgacgaactgcacgtctgaaaatcacgtttttgaagttagaacaaactactccaacttaattaaagtagacaaaattgagctgaacgaacctccactttcgaattgttcagttcttcctcttcagtttgatcttttgaaactccattagcactgttccttgctctctgggcatttgctaaaagaaggcctgcacaagatttttcttttcttttttgtttgaagtatacttttgtcatctggaaatattgcatgaatattataagggaaacaatttttaaatatcgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaattcaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcggctgagcattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaagatttcaaaaattctgagaccaatttttgttgagaaaaataatttttcgttcgaattatcgatttttcacgaaatgccaaaaacagtaaacttgggcccatgctaaaagcctgaatctttcaaattaaaaaccaacatgattttttctatattctaagacgtttaaaaaaaatctggaccaacagttcttgaggaaagtaattttttatacaaaaatgtgctgatttttcactaaattcaaaaaaatagtcaagttgggcccatgctatacacctaaatcattaaaattcagaaccgccatgtattttttcataccataggctctttaaaaaaaatctggaccaacagtttttgagagatgtcaaaaaaacaactcacttttcgacgtttttcgtgtttccccggatgatgcggtcgatttttgctgcgatttgtggtctttcgctgaaaatattatttttatttcaatttttaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaacacgaaaaaaacgtcgaaaaactcccgcaacctcatgaaaaaaaataaagcactgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgtcgtg',
 'aaaaaaactcacttttcgactttttcctgtttctgcgatcgggttttgcgtcgatttgtggtaattagctgaaaatataaactatagtttttatattttaactattaataaagaaaacaagagaaaagtgagaaaaaacaatcaaaaactcgaaaaa',
 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',
 'TGCGAAAAACTGTTTAaagtatcgattttcttcaatatcagcaacatacaattctttaaaatgattattttttgtaaattcgataaaaattaatttatttttcacaatttctgcccgaaaattgccgaaatgaccagcgtttctagaactaaaacaagtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',
 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',
 'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaattttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgttaattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacataaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaatcttctaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctctcgacagaaacggagatgcgcagatcactcctgagagaagacgagtgtctccatttgtggattttgatgataatctggatgtggagtcgaaacgcaatatcgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaatgatttttccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcattacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgccaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgtt
cacaaagccgaaacgttgtcacgcttcccaagtataccatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttttctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaatcgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaagcttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggaaatcacacgaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggagctgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaatttcaggt
aaaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttacctagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg',
 'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',
 'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaattttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgttaattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacataaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaatcttctaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctctcgacagaaacggagatgcgcagatcactcctgagagaagacgagtgtctccatttgtggattttgatgataatctggatgtggagtcgaaacgcaatatcgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaatgatttttccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcattacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgccaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgtt
cacaaagccgaaacgttgtcacgcttcccaagtataccatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttttctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaatcgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaagcttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggaaatcacacgaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggagctgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaatttcaggt
aaaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttacctagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg',
 'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtgtaaactttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgacatttagcaaaaattgaacagaagatttttcggaattatgaagctcaattttcacaaaaataatgagttttttgtagaatttatgaaaaaacgtgaatatatagattttttgttcatgatattcaagaaaaatcgatttttagttcttcacagagtaatcctatcgcatttcacttgctcatgatgtttttgctcgactttaggacgataaaaatgcgaattgttgataaaatgaatgaacaatataaagaa',
 'ggggctgctggaaccaatgtcggcatgacgagagttccggtcttctggatccatttcctgcgtgggctgtggcgacgagctgcacgtctgaaaatcaagtttttgtaatt',
 'tttgggcgcatgatatggagctgaatcattcgattttagaatcagcaagcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggcccaacagttttcgaaaaaatttaatttttgttcagaaatgtgaatattcacgaaatcgaaaaaaataattgcaaaatccgtcagctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaaggtttcaaaaattctgagaccaatttttattgagaaaaataatttttcgctcgaattattgaattttcactaaatgcaaaaaacagtaaacttgggcccgtgctacaagcctgaatctttcaaattaaaaaccagcatgattttttcaatattctaggacgtttaaaaaaaatctggaccaacagtttttgaggaacgtaattttttatacaaaaatgtactgatttttcactaaactcaaaaaaatagtcaagttgggcccatgctatacacctaaattattaaaattcagaaccgccatgtattttttcatactataggctctttaaaaaaaatctggaccaacagtttttgagatatttagaaaaacaactcacttttcgacgtttttcgccttttcgcggctcacccggtcgatttttgcggcgatttgtgttctttcgctgaaaatattatttttatttcaattattaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaaacgcgaaaaaacatcgaaaaaccaccgaaacctcatgaaaaaaataaagcattgcagccgcgggattagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg',
 'aactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagcgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattaatagtttcaaggtgaaaaaaaaaaaaatagattttattattcaatttattagctaacaattgagaattgaattatcaaaacacagaaaaattaattttgaacagtaaaacaaagaaataatcgaaacggtagattgaaaattgaaaaatacatacaaaacgattcaaaaaacgaattaatatgtggaatcggcctccttcttcgctttacggatgcaagttgagatttttcttggaaaaactttcgaaaataaggaaatcagtgggaacttcttatttcctcgactcctgcaggatcctggtgaactttttctgttaaatccatcgaaattgtgcggagtagctcattgacaaggtttcaatctgaaattttgtgaaattttatatttttgaataattttaatcacagcaaacctagggaaatcccgaattcgagcctttcgataaaatccagagatgtcacatcgccacttttccaaaatgaaggaaaccaggtggagcttctcagctttttggcttcctggtcgaaatcttgatgaaaaaaccatcgaaatttacttgagcagcttcttggcaaggtttttgtctgaaattttaggattttaatgatttttaacatttttaaacacaactaaccataaacaatccggattttttcggttttgactgaatccttggatttatgtagaaaacatgcccagaaatcaaggaacgaggtggaacatctcatttttttgaaattctggtgtaattcttgatgaaaaatccatcgacattacgttgaacgtcttcttggcaaggtgttttcttctgaaaattcatga',
 'ctgtaacatctaagcagtttttgaaaagttttcgaaaaaaaaataaatttcagtttttgagatttaattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtaccgtttctgaaaatgtttgcttttatcgtgtagagaatttttagctggttctaatccggcaagaaaaacagaacattttgtgatcgttttcgaaaaaaaaatttttttctttaattttaagtgcaaaattttagtttcgataatttttctgtgaagaatgtacgatttctagaaaattctgcctgtatcttgcttaaaatgaacagttctttctttttaaagcaaaaaactaacacattttgtgattttttttggcaaaaaaaattattttataatttcttatttcaaaaaatgttttttcgaaaaaattttaatgaaaaattaatatttctttatgaacttagttccgtcttatacaaaatttaatgctgattataaaataactataaaacgtgaaga',
 'cacggcccggcgaaagagacttggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacgaataaacatttttaatttggctgctaagctcatttattttcgttttttctcgtttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcaaaaaaccgcgacaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagtgagtttttt',
 'tgcgaaaaactgtttaaagtatcgattttcttcaatatcagcaacatacaatcctttaaaatgattattttttgtaaattcgataaaaattaatttatttttcacaatttctgcccgaaaattgccgaaataaccagcgtttctataactaaaacaagtgtcgtcaattaaaatgccgcatccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',
 'ttgttcagtggagcgtatttgcataaatctagcaactgaaccaccatg',
 'attgaccaaaatcgagaaacattgcgaaaaactgtttaaagtgtcgattttcttcaatatcagcaacatacaatactttcaaatgattattttctgtaaattcgataaaaatccatttatttttcacaatttctgcccgaaaattgccgaaataaccagcgtttctataactaagaaagcgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagaatttacagccacgtacggttcgccgggccgtg',
 'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtaaaatttggctgactgaatattcaacgtttgatactcagcgaaaagtttcgtacgacattttgcggacgcggcattttaattgacgacacttgttttagttatagaaacgctggttatttcggcaattttcgggcagaaattgtgacaaataaatgaatttttatcgaatttacaaaaaataatcattttaaaggattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca',
 'aaattgtagtcagtatcactgcagatgctggagcaggaatcacaaagttttgtctgattatcgagaattgt',
 'ttttcgagatattttggcaaaaacctcacttttcgtcgttttcctcctactgcgatcgattttcgccccgatgattagctgaaaataattttatatgttagttagtaacaaagaaaataagaaaaattgagaaaaaacaatcaaaaactcgagaaaa',
 'cgaattgtcgtcaaattaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttcttcgcgttgaatattcagtcgcgactagtcagccaaatgggactactgtagaaatttcctcggccacgttccaaacgccgctccgtg',
 'ttgttcgatggagcgcgtttgaacccatctagcaactgaaccaccgtg',
 'cacggtggttcagttgctagttgggtgcaaacgcgctccaccgaacaa',
 'cacggtggttcagttgctagttgggtgcaaacgcgctccaccgaacaa',
 'cacggtagcacagaaactagatctctcgtaaaatttgagaaagatctcgcagcgtacgcagcgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacaatgaagcggcaaatagatttttgaagcaaattttaacagaaaaacagcagaattaatagtttcaagatgaaaaaaaaaataaaaaatcagagtttattatgaaatttataaactaacaattgataattgaagtgttaaaacacaaaaaaagttgattttgaacagaaaaggaaagttatgaaataaaaaggcaatataaaattgagaaaacacacaatttgatacaaaaaacttaattaatat',
 'tgcataaattcaatttcatcttgcccagaattttaagctgattcgaattcatatcaaaatctgaagattcttcaattttttttatcgaaaaatttttttttctaaattttgtgaaaaatttt',
 'tttagcttcatcttgtagataactttaagctgagttcaatttgataacaaaatcgataacatttggggatttctccacataaaaaattacgatttttaattttttagtgaaaaatatttttactttcgaaattttgagcatgacacaaactttcaatcgaaaaaaaagcgtacatatctggtacagaattttatgtagattattttttcataaaaaactgatccaattttgggattcgttttcgagatagtgatcattttctatttttggtctaaaatttttgatttcgaggcaaaaaaaaattattctgaataaattaagttacatcttattcagaatttttagcttaatttctagctcgaattaaataaaaaatctgaagattcttcaatttttttttaccaaaaacagttttttttctaaactttgtgaaaaaatttttttaattcaaatggatctgacaaatttttttcaatctcaattattttagctttctcttgtacagaattttaagctgttttcaattcaataacaaaatcgataacatttgggggtttctcctcaaaaaaagttatgattttgcatttttcagcgtaaaataattttacttttaaaattttgagcacgatacaaactttcaatccaaaaaaaggtagtatttctagtacagaattttacgctgattattttttcataaaaaacttacataaatttgggattcattttcgagaaaatgatgattttctatttttggccgaaaaattgtaatttcgagacaaaaaaaaaccttttgttctgaataaattaagttatatcttcttcagaattttttgctctttctttttcaatcgaaaagtaatacattttttaattttttcaaagagaaatagtcaaaaaactcactgaattcgttttccttgttgctcacggtgacgttccttcgttttcctcgaaatgtttcgattactacctaaaacaagacacactttttcaattttgcaaaaaaaaataacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatccatgccgagcgaaaggcttgaaatttaaacgattgttcgcaatagagcgtgtttgccgccatctagcgactgaaccaccgtg',
 'tcagttttcacaaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcattcggtgtataaattgtaaac',
 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
 'tttgctggttacggtaaaaaagtacgcaaacaccaaacgtgaagtgcagacattgcgttttaccgtacttccgtgttcttttt',
 'cacggcacggcgaacgggacatggcctagaaagttgcgaaaactagtcccgcggctgcaatgctttatttttttcatttggttgcggtgggcttttcgatgttttttcgtgttttttgaagttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttttttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaa',
 'tagaaagatttgagactcaagctcgcgtatttcagcttatttttttcatttggttgcggtgggcttttcgatgttttttcgtgttttttgaagttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttttttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaataagaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaataagaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcact
caaatatgaagaaaagcatgattaaccagaaaaagtaacagttgagacccaagctcgcgtatttcatattttttttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttttttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagactcaagctcgcgtatttcattttttttttcgattttagcacaaaatcaacactcttttgaaaaaaatattcgaaatattcgctcctaatttttaatgatttccagatgacaagctggttcacaccacattctactgaggaaatcgcttcaaataaccgaaatcatcccgacattggctccagcagctccgcaactacctacagctcgtcaaattgtttgaatatggatgaaggaattctgacgagagacgttgaaatggctttcagtgatgaaattccgactttttctaaaaccgtaatttttttaaaaatttcaaaaataacattatatgattttgcagaggacctccaattccgtgtttcctcctgcaccttatcgaaattcgttcagt',
 'ttctttacattattcattcattttatcaacaattcgcacttctatcgctctaaagtcgatcaaaaaagcttatcagcaactgccgtcgagtgaaatgcgatacaatttgtctgtgaaaaaacaaatatcgattttccttgaatatcgtgaagaacaaatcttcatacttacgattattcacaaattcgatgaaaatctgatagttttttccaatttcagctctataattccaaaaaaaaaccttctgttaattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtccatgtaccccccgccgggccgtg',
 'gctgatattgacgaattgtcgtcgaataaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcggccacgt',
 'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaatttttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaattattttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgtttattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacacaaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaattttttaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctcttgacagaaacggagatgcgccgatcactcctgagagaagacgagtgtctccatttgtggattttggtgataatctggatgtggagtcgaaacgcaatattgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaataattttcccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcatcacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgtcaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgt
tctcaaagccgaaacgttgtcacgcttcccaagtatatcatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttgtctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaattgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaacttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggagctgattttgtctgagaaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggaatttgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaa
atttcaggtaaaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttaccaagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg',
 'atgtcgtcaactaaaatgccgcggtacacgtccgcagtcggtgaacaaaacttttcgttgaatactcagtcagccaaatttaactactgtagaaatttctccccacacgtcgcgattgccgctccgtg',
 'ttgtttaaaataagtttccttttttttgaatacgcaaagaacttgatttttccaaaaaaaaaaattgttttcgaatttttatgataaaaaaaaatttttt',
 'cacggcacggcgaacgggacatggcctagaaagttgcgaaaactagtcccgcggctgcaatgctttattttttcatttggttgcggtgggtttttcgatgttttttcgtgtttttttaatgttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagcaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttgtttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttctgaattagaaagatttgagactcaagctcgcgtatttcaaatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaatatgaagaaaattttgattaaccggaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcattttcgaacaaaaaattcattttctcaaaaattcaacgcttattttttttgaaactgcactcaaatatgaagaaaaacatgcttgttctgaattagtaagaattgcgactcaagctcgcgtatctcatatttatttgttcgatttcagtggaaaatcaacacatttttgggaaaaattttttttcgaaa',
 'ttctaatttttaatgattttcagatgacacgccggttcacaccacattctactgaggagatcgcttcaaataaccgaaaccatcccgacattggctcca',
 'ctctatattattcattcattttatcaacaattcgcacttctatcgccctaacgtcgatcaaaaaagctcatcagcaactgccgtcgagtgaaatgcgatagaatttgtctgtgaataaccaaatatcgattttccttgtatatcgtgaagaacaaatcttcatatttacgattcttcacaaattcgatgaaaatctgatagttttttcaatttcagctctataattccaaaaaaaatcttttgtcaatttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacaaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtctatgtaccccccgccgggccgtg',
 'gctgctgcagtgttttggcgactacggtacgctgttacgcaaaccgcgcaatgacaacatt',
 'attgaacagggcatgaaaggattcaatgccctgctccgatgatcttccc',
 'cacggcccggcgaaagagacgtggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacggataaaaatttttaatttggctgctaagctcatttatcttcgttttttctcgttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcaaaaaaccacgacaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagtgagtttttt',
 'tgcgaaaaactgtttaaagtatcgattttcttagtaaatatcagcatcataaaattatttaaaattattattttctgtaaattcgataaaaatccatttatttttcacaatttctgcccgaaaattaataaccagcgtttctataactaagaaggtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',
 'gttcgttcagctcaattttgtctactttaattaagttggagtagtttgttctaacttcaaaaacgtgattttcagacgtgcagctcgtcgccacaggctaatcaggaaatcagttcagaagatcgaaactcccaccatcccgacattgcttccagtagctccgcaactacatgcaggtcgtcaaattgtttcaatatggatgaaggaattctgacgagagacgtggaaatggattttagtgacgaaattccgactttttctcacaatgtaattttttttcaatatttcaaaactaatattatatgattttgcagagattctccaatcccgtgtttccaccgacaccttcacgaaactcgttcagtacaccgagggttcgaataatctccaatgttgagcaaattccgcagatcaaaatgctccgactaacgatcgaacgactgaaaaatgaaaaaatgaaatatcagaagaaatacaaaagcggaatggtgagtttgcggtgttttcggaagaaatactaatagtccgccaagaaaatatcactgtaagagcatcgaattcttcgctgcaaatggaagtatgcaatatgcgaacgatcgtcagtggtatagaaggtcgattggtgagaagctgggcagatgtgaagaaaagtggcaaagctatggttctgagtctacaaagacacgccgaagaaattgaaaacctcaaaaaatcagcttttaaagttaaattttcgaatctaagaaacaaaagttgtatcaaaaatcgatttcgcaaagctcttcacttgcttcaacacgaaatctgcctcgaggatgacgtcggacaatttattcgaaaattcacaaaattcctgaattcagaagagaatcaaagctacaaaaacaagctttcaaactttgaggccgttgactttttatcaaaatgcggattatctctaagtcagatggaaaaaatcaaaaactatttgactaattccattggctatgatttgcttccattggtaaagaacacaagagatttgtcaatccagttatcaatgatctccagtttcaaagtttccacttcgttcgacaataaaggaaaagtgattaccattgtgcagtgcatgaaaattgcagaagttttagcctatcgaatagagcttctatgcaattcaaatcaatttgtggacgatggctacacgaaaggagtaatcaagattggagtcgttggtgacgctggaggtggaagcacaaagttggcattggtaatcggaatgtatctcgaccgaattcttcgagacacgttatggtaattgcagtatatgacggatccgacaattactcgtgtctgaaaaaattcattcccgacgttcttgagcagcttggcaagctgacaaaaattcgatatttggacaaaggagttcaaaaaaccgcaaaaattgtgcaaattgcaactggagattgcaagtttcaatttgatattcttggacaccaaggacattcatcccacaatttctgcttcaagtgttttgcccagaatccacggggagctgaagagaggatgaaaatcaaggatatgaatgttgatgaaacattccacccacgaacaatcgatctttacaaagcttgttgtactccacttctgccacacgttgcgatcatgttctatttgattcctcttctgcacattatcatgggtatttttgacaaatatatcttcaatcccctctggaaatattctgtcactttggataacactacttgtttccctatcctgaaaacgcgaaaagcaactctaaaaaacgctgttaacctaatcaaaagtgcggaagaaaagtatcaagcagccaccggaaagatgaaattggagtcgcatgctgaactgaaagcacttcaatcggaaaaattgttgctcgacacaattgtgaatggtactcccggaggaacacttgaaaaaatggaacagtgttgggctaagtttggagcggacaagcaggcttggttccaatcattt
tgcggaaaccatttgaaattgctgctaacgccggctattgtagaggagactttcaacatttttggcccaaatttatgcccaatgttgctcggattgaaatcagcaatgggaaagctttcaactattatgtcactttcgggaaacaagtttttaaatgattccgacgtcttaatacttcaaaattcgattcgcggatttgtggaggatttgaaagtagcggttcccgaagaaacaattattttaaagctgcacttggtggtctaccatgcaccacaaatggcaaaagatgtgagaaacattggaaggatcacagaaccaggagttgaatctgtccacgcgattttcaatgcacttgagaggagattttgcaattaccgggacaagaaaaggcgttacatccatgttctgagagagctcatgtgtcgtaacatcatgaatgacatgactatggtaatttttgtacagaaaaaaatttctgcattcttaaaaacacaactatttcagaacacgtccattcccggtctttccatcgtaccaaaagatgcaacagttctgccaatttcttcgcaaaatgatcttcgggacccaaaagtagttattcgggacgtcacggctgcccaaaaaagaaaaaatttagcgaaaaaaaaaatcaattttagctccaagaaaaggcttccacaatcgacaaaatctcgaaatgctgtggtgagatttttatttttcacgaaaagtttcaaattttgaacattttcagcccgacttctcacaaaacgtacaaacttctattgcatttggaggcaattcaacgaaaattggattccgtggtctaccgagcaatgctctttctcgtccgaacccgatggtaggttcactgctttaaatcaactcaaaattgaaatcaaaaattatttcaggcaatcacacaaacttgctcgacaagccagccaatcactagaagccatccacttgttgctccaccagtatcttatcaacttccgaaatgcagtaatagtttagctgaagtgagtgtaaaggtgaaatgacaatttcaataagttaaaaattttagaaaattcgagcaagttgtgggcaagcaagaaggagtgctgcagaatcttcttcgactgaaagccacgtggacacaaattcgaagaagaaagtttacagactaaaattcgtccaaaaaagtcgtgaaattccgaagaaatagtaatatatcgattttctgttttttttattatgatttctaatatttttattttaatgcttaatttaaaatgaccgaaaataaagaattgattctcaaaaagtaatctcagagcaatatatgttgtttctttatataatttatccattttatcaacaaatcgcatttttatcgctcaaaaatcgataaaaacagctcatcagaaactttcatcgagtaaaatgagacatcattcgtctgtgaaaagctaaatatcgattttccttgaatatagtgaagaaaagatcttcatacttacgatttttcacaaattcgatgaaaatttgatagttttttttcaatttcagctctataattccgaaaaaaatcttctgataattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtccatgtaccccccgccgggccgtg',
 'ttgttcggtggatcgcgtttgcacccatctagcaactgaaccacagtg',
 'GAAGCCAATGTCGGGATGGTTTCGGTTATTTGAAGCGATTTCCTTAGTAGAATGTGGTGTGAACCGGCGtgccatctggaaatc',
 'tgccatctggaaatccttaaaaatttggtgcgaatatttcgaaaaaaagttttccaaaaatgtgttgattttccactaaaatcgaaaaaataaatatgaaatacgcgagcttgagtctcaattcttactaattcagaacaagcatttttttctccatattcgagtgcagtttcaaaaaaattaaacgttgaatttttgagaaaataatttttttgttcgaaaatgtgttgattttccactaacataaaatcgaaaaagt',
 'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaataa',
 'gtcttaatacttcaaaattcgattcgcggatttgtggaggatttgaaagtagctgttcccgaagaaacaattattttaaagctgcacttgttgatctaccatgcaccacaaatgacaaaagatctgagaaacattggaaagatcacagaacaaggagttgaatctgtccacgcgattttcaatgcacttgagaggagattttgcaattaccgggacaagaaaagacgttacatccatgttctgagagagctcatgtgtcgtaacatcatgaatggtaattttggtacagaaaaacatttctgcattcttaaaaacacaactatttcagaacacgtccattcccggtctttccatcgtaccaaaatatgcaacagttctgccaatttcttcgcaaaatgatcttcgggacccaaaagcagttatacgggacgtcacggctgcccaaaaaagaaaaaatttagcgaaaaaaaaaaatcaattttcgctccaagaaaaggcttccacaatcgacaaaatctcgaaatgctgtggtgagatttttatttttcacgaaaagttttaaattttgaaaattttcagcccgacttctcacaaaacgtacaaacttctatttcatttggaggcaattcgacgaaaattggattccgtggtctaccgagcaatgctctttctcgtccgaacccgatggtacgttcactgctttaaatccactcaaaattgaaatcaaaaattatttcagccaatcacacaaacttgctcaacaagccagcaaatcactagaagccatccacttgttgctccaccagtatcttatcaacttccgaaatgcagtaatagtttagctgaagtgagtgtaaacgcgaaatgacaatttcaataagttaaaaattttagaaaattcgagcgagttgtgggcaagcaagaaggagtgctgcagaatcttcttcgactcaaagccacgtggacacaaattcgaagaagaaagtttacagactaaaattcgttcaaaaaagtcgtgaaattccgaagaaatagtaatatatcgattttctgtttaaatatttatgatttctaatatttttattttaatgcttcatttaaaatgaccgaaaaataaagaatagattttcaaaaagtaatctcagagcaatatatgttgtttctttatataatttatccattttatcaacaaatcgcatttttatccctcaaaaatcgataaaaacagctca',
 'ttgttcggtggagcgcgtttgcacccatttagcaactgaaccaccgtg',
 'actacggtgtgcaagtacgcaaacaccgcggcggcaatttgc',
 'ttcttcaaaaaaacttcttcgaaattcaaattttgcaccaaaaa',
 'ttgttcggtggagcgcgtttgcacctatttaacaactgaaccaccgtg',
 'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',
 'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
 'aaattatacacgtttgttcggtggagcgagtttgcttccatctagcaactgaaccaccgtg',
 'gtgtgcaacttgccgccgcggtgtttgcgtacttgcacaccgtagt']

How are repeats related?

Look at the repeat family/class names for the first several repeats in the roundworm database:


In [17]:
from operator import attrgetter

'  '.join(map(attrgetter('rep_cl'), reps[:60]))


Out[17]:
'Simple_repeat  Simple_repeat  Satellite  Simple_repeat  DNA/MULE-MuDR  DNA  DNA  DNA  Simple_repeat  Unknown  Unknown  DNA/PiggyBac?  DNA  DNA  DNA  DNA  DNA  DNA  DNA  DNA/TcMar-Tc1?  DNA/TcMar-Tc1?  DNA  DNA  Simple_repeat  DNA/hAT  DNA/hAT  DNA/MULE-MuDR  DNA/hAT  DNA/TcMar-Tc1?  DNA/TcMar-Tc1?  DNA/TcMar-Tc1?  DNA/hAT  DNA/MULE-MuDR  Simple_repeat  DNA/hAT  Unknown  Unknown  Unknown  DNA  DNA  DNA/CMC-Chapaev  DNA/CMC-Chapaev  DNA/CMC-Chapaev  RC/Helitron  Simple_repeat  DNA  DNA/TcMar-Tc1?  DNA/TcMar-Tc1?  DNA/TcMar-Tc1?  DNA/TcMar-Tc1?  DNA/TcMar-Pogo  Simple_repeat  DNA  Simple_repeat  DNA  DNA  DNA/TcMar-Tc1  DNA/TcMar-Tc1  DNA/hAT  Simple_repeat'

You'll notice a few things: (1) the family names seem to have some hierarchical relationships, e.g. DNA/TcMar-Tc1 seems to be more specific than DNA; (2) some of them end in a question mark; (3) some of them are Unknown. I don't really know what these mean or how best to handle them, so you'll have to navigate that issue yourself. You can often look up a family name on RepeatMasker's site and find more detailed info (e.g., here are the details for DNA/TcMar-Tc1).
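One way to start exploring these names is to split each one into its parts and tally them. The sketch below is only that, a sketch: it assumes the text before the first slash is a broad class, the text after it is a more specific family, and a trailing question mark flags a tentative assignment. None of that is confirmed here, and the helper name `split_rep_class` is made up for illustration.

```python
from collections import Counter

def split_rep_class(name):
    """Split a string such as 'DNA/TcMar-Tc1?' into (class, family, uncertain).

    Assumptions (unverified): the part before the first '/' is the broad
    class, the remainder is the family, and a trailing '?' means the
    assignment is tentative.
    """
    uncertain = name.endswith('?')
    if uncertain:
        name = name[:-1]  # drop the trailing question mark
    cls, _, family = name.partition('/')
    return cls, family or None, uncertain

# A handful of names copied from the output above
names = ['Simple_repeat', 'DNA/MULE-MuDR', 'DNA', 'Unknown',
         'DNA/PiggyBac?', 'DNA/TcMar-Tc1?', 'DNA/TcMar-Tc1', 'RC/Helitron']

parsed = [split_rep_class(n) for n in names]

# How many names fall under each broad class
by_class = Counter(cls for cls, _, _ in parsed)

# How many names carry the trailing question mark
n_uncertain = sum(1 for _, _, unc in parsed if unc)

print(by_class)
print(n_uncertain)
```

To run this over the whole database, you could replace `names` with `[r.rep_cl for r in reps]`. Whether to lump DNA/TcMar-Tc1? together with DNA/TcMar-Tc1, or to discard Unknown entries, is a judgment call that depends on your analysis.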

Alternatives to RepeatMasker

Dfam is an alternative. Note that Dfam ultimately relies on Repbase for its "seed alignments." Also, only the human genome has a pre-built Dfam database, as far as I can tell.