Repetitive DNA elements ("repeats") are DNA sequences prevalent in genomes, especially of higher eukaryotes. Repeats make up about 50% of the human genome and over 80% of the maize genome. Repeats can be categorized as interspersed, where similar DNA sequences are spread throughout the genome, or tandem, where similar sequences are adjacent (see Treangen and Salzberg). Some interspersed repeats are long segmental duplications, but most are relatively short transposons and retrotransposons. Though repeats are sometimes referred to as “junk,” they are involved in processes of current scientific interest, including genome expansion, speciation, and epigenetic regulation (see Fedoroff). Some are still actively expressed and duplicated, including in the human genome (see Witherspoon et al, Tyekucheva et al).
RepeatMasker is both a tool for identifying repeats in a genome sequence, and a database of repeats that have been found. The database covers some well known model species, like human, chimpanzee, gorilla, rhesus, rat, mouse, horse, cow, cat, dog, chicken, zebrafish, bee, fruitfly and roundworm. People often use RepeatMasker to remove ("mask out") repetitive sequences from the genome so that they can be ignored (or otherwise treated specially) in later analyses, though that's not our goal here.
It's instructive to click on some of the species listed in the database and examine the associated bar and pie charts describing their repeat content. For example, note the differences between the bar charts for human and mouse, especially for SINE/Alu and LINE/L1.
Let's obtain and parse a RepeatMasker database. We'll start with roundworm because it's relatively small (only about 2.5 megabytes compressed).
In [1]:
import urllib.request
rm_site = 'http://www.repeatmasker.org'
fn = 'ce10.fa.out.gz'
url = '%s/genomes/ce10/RepeatMasker-rm405-db20140131/%s' % (rm_site, fn)
urllib.request.urlretrieve(url, fn)
Out[1]:
('ce10.fa.out.gz', <http.client.HTTPMessage at 0x7ff3accac278>)
In [2]:
import gzip
import itertools
fh = gzip.open(fn, 'rt')
for ln in itertools.islice(fh, 10):
    print(ln, end='')
   SW  perc perc perc  query    position in query        matching   repeat        position in repeat
score  div. del. ins.  sequence begin   end     (left)   repeat     class/family   begin   end (left) ID

  508   0.0  0.0  0.0  chrI         1   432 (15071991) + (GCCTAA)n  Simple_repeat      1   432    (0)  1
 1226  10.0  0.0  0.0  chrI       566   595 (15071828) + (GCCTAA)n  Simple_repeat      1    41  (240)  2
  344  22.2  0.0  0.0  chrI       596   676 (15071747) C RCS5       Satellite       (41)  1387   1307  3
 1226  10.0  0.0  0.0  chrI       677   846 (15071577) + (GCCTAA)n  Simple_repeat     42   281    (0)  2
  432  21.9  2.4  0.0  chrI      1622  1744 (15070679) + LONGPAL1   DNA/MULE-MuDR    136   261 (2330)  4
 8509   0.6  0.0  0.1  chrI      2052  3026 (15069397) + PALTTTAAA3 DNA                1   974  (529)  5
 4521   1.1  0.2  0.2  chrI      3124  3652 (15068771) + PALTTTAAA3 DNA              974  1502    (1)  6
Above are the first several lines of the .out.gz file for the roundworm (C. elegans). The columns have headers, which are somewhat helpful. More detail is available in the RepeatMasker documentation under "How to read the results". (Note that in addition to the 14 fields described in the documentation, there's also a 15th ID field.)
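To make the column layout concrete, here is a minimal sketch that splits one of the data rows shown above into named fields. The field names (`sw_score`, `query_begin`, `rm_id` and so on) and the helper `parse_row` are our own illustrative labels, not anything defined by RepeatMasker, and the sketch assumes every data row has exactly 15 whitespace-separated fields, as the rows above do. Rows with a different number of fields are rejected rather than handled.

```python
from collections import namedtuple

# Field names are illustrative labels chosen for this sketch,
# not names defined by RepeatMasker.
RepeatRow = namedtuple('RepeatRow', [
    'sw_score', 'pct_div', 'pct_del', 'pct_ins',
    'query_seq', 'query_begin', 'query_end', 'query_left',
    'strand', 'repeat_name', 'repeat_class',
    'repeat_a', 'repeat_b', 'repeat_c', 'rm_id'])

def parse_row(ln):
    """Split one data row into a RepeatRow; assumes exactly 15 fields."""
    fields = ln.split()
    if len(fields) != 15:
        raise ValueError('expected 15 fields, got %d' % len(fields))
    return RepeatRow(*fields)

# First data row from the listing above
row = parse_row('508 0.0 0.0 0.0 chrI 1 432 (15071991) + '
                '(GCCTAA)n Simple_repeat 1 432 (0) 1')
print(row.query_seq, row.query_begin, row.query_end, row.repeat_class)
```

All values stay as strings here; coordinates would still need converting with `int` before being used for slicing.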
Here's an extremely simple class that parses a line from these files and stores the individual values in its fields:
In [3]:
class Repeat(object):
    def __init__(self, ln):
        # parse fields
        (self.swsc, self.pctdiv, self.pctdel, self.pctins, self.refid,
         self.ref_i, self.ref_f, self.ref_remain, self.orient, self.rep_nm,
         self.rep_cl, self.rep_prior, self.rep_i, self.rep_f, self.unk) = ln.split()
        # int-ize the reference coordinates
        self.ref_i, self.ref_f = int(self.ref_i), int(self.ref_f)
We can parse a file into a list of Repeat objects:
In [4]:
def parse_repeat_masker_db(fn):
    reps = []
    # open in text mode in both cases so that every line is a str
    with gzip.open(fn, 'rt') if fn.endswith('.gz') else open(fn, 'rt') as fh:
        fh.readline()  # skip header
        fh.readline()  # skip header
        fh.readline()  # skip header
        while True:
            ln = fh.readline()
            if len(ln) == 0:
                break
            reps.append(Repeat(ln))
    return reps
In [5]:
reps = parse_repeat_masker_db('ce10.fa.out.gz')
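Once the file is parsed, a natural first check is to tally how many repeats fall into each class/family. The sketch below is one way to do that with `collections.Counter`. The helper name `count_by_class` is hypothetical, and the demonstration uses small stand-in objects so the block runs on its own. It only assumes each object has a `rep_cl` attribute, as the `Repeat` class above does.

```python
from collections import Counter
from types import SimpleNamespace

def count_by_class(reps):
    """Count repeats per class/family, reading each object's rep_cl attribute."""
    return Counter(rep.rep_cl for rep in reps)

# Stand-in objects for illustration only; with the real data,
# pass the parsed list instead, e.g. count_by_class(reps).
demo = [SimpleNamespace(rep_cl='Simple_repeat'),
        SimpleNamespace(rep_cl='Satellite'),
        SimpleNamespace(rep_cl='Simple_repeat')]

counts = count_by_class(demo)
for cl, n in counts.most_common():
    print(cl, n)
```

`most_common()` lists classes from most to least frequent, which is a quick way to see which families dominate a genome.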
Now let's obtain the genome for the roundworm in FASTA format. For more information on FASTA, see the FASTA notebook. As seen above, the name of the genome assembly used by RepeatMasker is ce10. We can get it from the UCSC server. It's around 30 MB.
In [6]:
ucsc_site = 'http://hgdownload.cse.ucsc.edu/goldenPath'
fn = 'chromFa.tar.gz'
urllib.request.urlretrieve("%s/ce10/bigZips/%s" % (ucsc_site, fn), fn)
Out[6]:
('chromFa.tar.gz', <http.client.HTTPMessage at 0x7ff38f4ac518>)
In [7]:
!tar zxvf chromFa.tar.gz
chrI.fa
chrII.fa
chrIII.fa
chrIV.fa
chrM.fa
chrV.fa
chrX.fa
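The `!tar` line above relies on a shell being available from the notebook. If that isn't the case, Python's standard `tarfile` module can do the same job. The following is a minimal sketch, not the method used in this notebook. The helper name `extract_tar_gz` is hypothetical, and the demonstration builds a tiny archive in a temporary directory so the block runs without any download. As with `tar` itself, only extract archives from sources you trust.

```python
import io
import os
import tarfile
import tempfile

def extract_tar_gz(archive_fn, dest_dir='.'):
    """Extract a .tar.gz archive into dest_dir; return the sorted member names."""
    with tarfile.open(archive_fn, 'r:gz') as tf:
        names = sorted(tf.getnames())
        tf.extractall(dest_dir)
    return names

# Self-contained demonstration on a tiny archive built in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    archive_fn = os.path.join(tmp, 'demo.tar.gz')
    data = b'>chrDemo\nACGT\n'
    with tarfile.open(archive_fn, 'w:gz') as tf:
        info = tarfile.TarInfo('chrDemo.fa')
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    extracted = extract_tar_gz(archive_fn, tmp)
    with open(os.path.join(tmp, 'chrDemo.fa')) as fh:
        first_line = fh.readline().rstrip()

print(extracted, first_line)
```

With the real download, the call would be `extract_tar_gz('chromFa.tar.gz')`, which should leave the same `chr*.fa` files in the working directory.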
Let's load chromosome I into a string so that we can see the sequences of the repeats.
In [8]:
from collections import defaultdict
def parse_fasta(fns):
    ret = defaultdict(list)
    for fn in fns:
        with open(fn, 'rt') as fh:
            for ln in fh:
                if ln[0] == '>':
                    name = ln[1:].rstrip()
                else:
                    ret[name].append(ln.rstrip())
    for k, v in ret.items():
        ret[k] = ''.join(v)
    return ret
In [9]:
genome = parse_fasta(['chrI.fa', 'chrII.fa', 'chrIII.fa', 'chrIV.fa', 'chrM.fa', 'chrV.fa', 'chrX.fa'])
In [10]:
genome['chrI'][:1000] # printing just the first 1K nucleotides
Out[10]:
'gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaaAAAATTGAGATAAGAAAACATTTTACTTTTTCAAAATTGTTTTCATGCTAAATTCAAAACGTTTTTTTTTTAGTGAAGCTTCTAGATATTTGGCGGGTACCTCTAATTTTGCCTGCCTGCCAACCTATATGCTCCTGTGTTtaggcctaatactaagcctaagcctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagcctaaAAGAATATGGTAGCTACAGAAACGGTAGTACACTCTTCTGAAAATACAAAAAATTTGCAATTTTTATAGCTAGGGCACTTTTTGTCTGCCCAAATATAGGCAACCAAAAATAATTGCCAAGTTTTTAATGATTTGTTGCATATTGAAAAAAACA'
Note the combination of lowercase and uppercase. This relates directly to our discussion: the lowercase stretches are repeats! The UCSC genome sequences use the lowercase/uppercase distinction to mark where the repeats are, and UCSC knows this because they ran RepeatMasker on the genome beforehand. In this case, the two repeats you can see are both simple hexamer repeats. Also, note that their positions in the genome correspond to the first few rows of the RepeatMasker database that we printed above.
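Since the lowercase stretches mark the repeats, we can recover approximate repeat intervals straight from the sequence. Here is a minimal sketch using a regular expression. The helper name `lowercase_runs` is hypothetical, and it reports coordinates as 1-based and inclusive so they can be compared with the begin/end columns of the RepeatMasker table.

```python
import re

def lowercase_runs(seq):
    """Return (begin, end) pairs, 1-based and inclusive, for each lowercase run."""
    # m.start() is 0-based and m.end() is exclusive, so add 1 to the start only
    return [(m.start() + 1, m.end()) for m in re.finditer(r'[a-z]+', seq)]

demo = 'acgtacGGGTTTacgAA'
print(lowercase_runs(demo))  # → [(1, 6), (13, 15)]
```

On the real data this could be called as `lowercase_runs(genome['chrI'][:1000])`. Adjacent repeats of different families merge into a single lowercase run, so the intervals will not always match the table row for row.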
We write a function that, given a Repeat and a dictionary containing the sequences of all the chromosomes in the genome, returns the sequence of that repeat.
In [11]:
def extract_repeat(rep, genome):
    assert rep.refid in genome
    return genome[rep.refid][rep.ref_i-1:rep.ref_f]
In [12]:
extract_repeat(reps[0], genome)
Out[12]:
'gcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaagcctaa'
In [13]:
extract_repeat(reps[1], genome)
Out[13]:
'CCTGTGTTtaggcctaatactaagcctaag'
In [14]:
extract_repeat(reps[2], genome)
Out[14]:
'cctaagcctaatactaagcctaagcctaagactaagcctaatactaagcctaagcctaagactaagcctaagactaagcct'
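Note that `extract_repeat` always returns the forward-strand sequence, converting RepeatMasker's 1-based inclusive coordinates into a Python slice. The third repeat above has `C` in its orientation column, meaning the match is to the complement of the repeat consensus. If you want the sequence in the same orientation as the consensus, one option is to reverse-complement such repeats. The sketch below assumes that is the desired behavior. The names `reverse_complement` and `extract_oriented` are hypothetical helpers, and only the characters A, C, G, T and N (in either case) are handled.

```python
# Translation table for complementing bases; preserves upper/lowercase.
_COMPLEMENT = str.maketrans('ACGTNacgtn', 'TGCANtgcan')

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def extract_oriented(chrom_seq, begin, end, orient):
    """Extract chrom_seq[begin..end] (1-based, inclusive).

    If orient is 'C', return the reverse complement; otherwise return
    the forward-strand substring unchanged.
    """
    sub = chrom_seq[begin - 1:end]
    return reverse_complement(sub) if orient == 'C' else sub

chrom = 'AAACCCGGTT'
print(extract_oriented(chrom, 2, 5, '+'))  # → AACC
print(extract_oriented(chrom, 2, 5, 'C'))  # → GGTT
```

With the objects from this notebook, the equivalent call would be `extract_oriented(genome[rep.refid], rep.ref_i, rep.ref_f, rep.orient)`.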
Let's specifically try to extract a repeat from the DNA/CMC-Chapaev family.
In [15]:
chapaevs = filter(lambda x: 'DNA/CMC-Chapaev' == x.rep_cl, reps)
In [16]:
[extract_repeat(chapaev, genome) for chapaev in chapaevs]
Out[16]:
['cacggcccggcggggggtacatggatgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtgtaaactttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgacatttagcaaaaattgaacataagatttttcggaattatgaagctcaattttcacaaaaataatgagttttttgtagaatttatgaaaaaacgtgaatatatagattttttgttcatgatattcaagaaaaagcgatttttagttcttcacagaggaatcctctcgcatttcacttgctcatgatgttttttgctccactttaggacgataaaaatgcgaattgttgataaaatgaatgaataatataaaaa',
'ggggctgctgaaaccaatgtcggcatgatgagagttccggtcttctgaatccatttcctgcgtgggctgtggcgacgagctgcacgtctgaaaatcaagtttttgtaatt',
'tttgggcgcatgatatggagctgaatcattcgattttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaatttaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcagctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaagtttcaaaaattctgagaccaatttttattgagaaaaataatttttcgctcgaattattgaattttcactaaatgcaaaaaacagtaaacttgggcccatgctacaagcctgaatctttcaaattaagaaccagcatgattttttcaatattctaggacgtttaaaaaaaatctggaccaacagtttttgaggaacgtaattttttatacaaaaatgttctgatttttcactaaactcaaaaaaatagtcaagttgggcccatgctgtacacctaaatcattaaaattcagaaccgccatgtattttttcttaccaaaggctctttaaaaaaaatctggaccaacagtttttgagatatttagaaaaacaactcacttttcgacgtttttcgccttttcgtggctcacccggttgatttttgcggcgatttgtggtctttcgctgaaaatattatttttatttcaattattaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaaacgcgaaaaaacatcgaaaaaccaccgcaacctcatgaacaaaaaaaaagcattgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg',
'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtttaaagtttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgaaatttcgcgaaaattaacagaagatttttttcggaattatagagctgaaattgaaaaaaaaactatcaaattttcatcgaatttgtgaaaaatcgtaagtatgaagatcttttcttcactatattcaaggaaaatcgatatttcgcttttcacagacgaatgatgtctcattttactcgatgaaagtttctgatgagctgtttttatcgatttttgagcgataaaaatgcgatttgttgataaaatggatcaattatataaagaaacaacatatattgctctgagattactttttgagaatcaattctttatttttcggtcattttaaattaagcattaaaataaaaatattagaaatcataataaaaaaaacagaaaatcgatatattactttttcttcggaatttcacgacttttttggacgaattttattctgtaaactttcttcttcgaatttgtgtccacgtggctttcagtcgaagaagattctgcagcactccttcttgcttgcccacaacttactcgaattttctaaaatttttaacttattgaaattgtcatttcacctttacactcacttcagctaaactattactgcatttcggaagttgataggatactggtggagcaacaagtggatggcttctagtgattggctggcttgtcgagcaagtttgtgtgattgcctgaaataatttttgatttcaattttgagttgatttaaagcagtgaacctaccaccgggttcggacgagaaagagcattactcggtagaccacggaatccaattttcgttgaattgcctccaaatgcaatagaagtttgtacgttttgtgagaagtcgggctgaaaattttcaaaatttgaaacttttcgagaaaaataaaaatctcaccacagcatttcgagattttgtcgattgtggaagccttttcctggagcgaaaattgattttttttttcgctaaattttttcttttttgggcagccgtgacgtcccgaataactgcttttgggtcccgaagatcattttgcgaagaaattggcagaactgttgcatcttttggtacgatggaaagaccgggaatggacgtgttctgaaatagttgtgtttttaagaatgcagaaatgtttttctgtaccaaaattaccatagtcatgtcattcatgatgttacgacacatgagctctctcagaacatggatgtaacgccttttcttgtcccggtaattgcaaaatctcctctcaagtgcattgaaaatcgcgtggacagattcaactccttgttctgtgatccttccaatgtttctcacatcttttgccatttgtggtgcatggtagaccaacaagtgcagctttaaaataattgtttcttcgggaaccgctactttcaaatcctccacaaatccgcgaatcgaattttgaagtattaagacgtcggaatcatttaaaaacttgtttcccgaaagtgacataatagttgaaagctttcccattgctgatttcaatccgagcaacattgggcataaatttgggccaaaaatgttgaaagtctcctctacaacagccggcgttagcagcaatttcaaatggtttccgcaaaatgattggaaccaagcctgcttgtccgctccaaacttagcccaacactgtcccattttttcaagtgttccttcgggagtaccattcacaattgtatcgagcaacaatttttccgattgaagtgctttcagttcagcatgcgactccaatttcatctttccggtggctccttgatacttttcttccgcacttttaattaggttaacagcgttttttagagttgcttttcgtgttttcaggataggaaaagaagtagtgtta
tccaaagtatcagaatatttccagaggggattgaagatatatttgtcaaaaatacccatgataatgtgcagaagaggaatcaaatagaacatgatcgcaacgtgtggcagaagtggagtacatcctttgcgaacacccaagtcgccattttcacaacaagctttgtaaagatcgattgttcgtgggtggaatgtttcatcaacattcatatccttgattttcatcctctcttcagctccccgtggattctgtgcaaaacatttgaagcagaaattgtgggatgaatgtccttggtgtccaagaatatcagattgaaacttgcaatctccagttgcaatttgcacaatttttgcggttttttgaactcctttgtccaaatatcaaattttcgttagcttgccaagctgctcaagaacgtccggaatgaattttttcagagacgaataattgtcggatccgtcatatactgcaattaccataacgtgtctcgaagaattcggtcgagatacgtttccgattaccaatgccaactttgtgcttccacctccagcgtcaccaacgactccaatcttgattactcctttcgtgtatccgtcgtccacaaattgatttgaattgcatagaagctctattcgataggctaaaacttctgcaattttcatgcactgcacaatggtaatcacttttcctttattgtcgaacgaagtggaaactttgaaactggagatcattgataactggattgacaaatctcttgtgttctttaccgatggaagcaaatcatagccaatggcattagtcaaatagtttttgattttttccatctgacttagagataatccgcattttgataaaaagtcaacggcctcaaagtttgaaagcttgtttttgtagctttgattctcttctgaattcaggaattttgtgaattttcgaataaattgtccgacgtcatcctcgaggcagatttcgtgttgaagcaagtgaagagctttgcgaaatcgatttttgatacaacttttgcttcttagattcgaaatattaactttaaaagctgattttttaaggttttcaacttcttcggcgtgtctttgtagactcagaaccatagctttgccacttttcttcacatctgcacagcttctcaccaatcgaccttctataccactgacgatcgttcgtatattgcatacttccatttgcagcgaagaattagatgctcttatagtgatattttcatggcggactatttgcatttcttccgaaaacaccgcaaactcatcaatccgcttttgtatttcttctgatatttcatttttttcatttttcagtcgttcgatcgttagtcggagcattttgatctgcggaatttgctcaacattggagattattcgaaccctcggtgtactgaacgagtttcgtaaaggtgtcggtggaaatacgggattggagaatctcagcaaaatcatataatattagttttgaaatattgaaaaaaattacattgtgagaaaaagtcggaatttcgtcactaaaatccatttccacgtctctcgtcagaattccttcatccatattgaaacaatttgacgacctgcatgtagttgcggagctactggaagcaatgtcgggatggtgggagtttcgatcttctgaactgatttcctgattagcctgtggcgacgagctgcacgtctgaaaatcacgtttttgaagttagaacaaactactccaacttaattaaagttgacaaaattgagctgaacgaacctccactttcgaattgttcagttcttcctcttcagtttgatcttttgaaactccattagcactgttccttgctctctgggcatttgctaaaagaaggcctgcacaagatttttcttttcttttttgtttgaagtatacttttgtcatctggaaatattgcatgaatattataagggaaacaatttttaaatatcgattttcacgaaatttgaaaaaatcaataatttggg
cgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttaggcgcatgatattgagctgaatgttttgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaattcaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcggctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaaggtttcaaaaattctgagaccaatttttgttgagaaaaataatttttcgttcgaattatcgatttttcacgaaatgccaaaaacagtaaacttgggcccatgctaaaagcctgaatctttcaaattaaaaaccagcatgattttttctatattctaagacgtttaaaaaaaatctggaccaacagttcttgaggaaagtaattttttatacaaaaatgtgctgatttttcactaaattcaaaaaaataatcaagttgggcccatgctatacacctaaatcattaaaattcagaaccgccatgtatgtattttttcataccataggctctttaaaaaaaatctggaccaacagtttttgagatatgtcaaaaaaaacaactcactttttgacgtttttcgccttttcgcggatgatgcggtcgatttttgcggcgatttgtggtctttcgctgaaaatattatttttatttcaatttttaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaacacgaaaaaaacgtcgaaaaactcccgcaacctcatgaaaaaaaataaagcactgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg',
'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcgtcaatatcagc',
'aattcctaaattttttattaaaatcgaaaaaaaaaaatgaaatacgtgagattgagtttcgagacttttttattcagaatcagcatatatttctccatatttgagtaggttttcagaaatattgtaccataatttttggaaaaatgtaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaaataaatatgaaatacgcgagattgaggttcaagactttttaattcggaatcagcatatatttttccatatttgagtagattttcagaaatattgtaccataatttttcgagatattttgaataataacttacttttcgacgttttttgcctttgtccggtttaatccatcgaatttcgaagcggtttgcgtagattagctgaaaacattatgcttattccacgtagtaacaagaaaaaacaagaaaaaataagaaaaaacgaagaaaa',
'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaata',
'cacggtatcacaaaaactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagtgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattaatagtttcaaggtgaaaaaaaaaataaaaaatagattttattattcaatttagtagctaacaattgagaattgaattattgaaacacagaaaaattaattttgaacagtaaaacaaagaaataatcgaaacggaagattgaaaattgaaaaatacatacaaatcgattcaaaaaacgaatttaatatgtggaatcagcctccttcttcgctttacggatgcaagttgaaatttttcttggtaaaactttcgaaaataaggaaacccggtgaaatttcttaattgcctcgactcctggtgaactttttgagataaatccatcgaaattttgctgagtagctcattgacaaggtttcaatctgaaatttcataaaatcaattattttcgaataattttaatcccaacaaaccgaaggaaatcctgaattttagcttttcgatagaatcctaaggtgtcacatcgacaattttccaagttgaaagaaaccatgtggagcatctccgcttataatatcctcgacaaaatattgatgagaaatccatcaaatttgcaccgagtatctgcttgtcaatgtttttacctgaaatttaatgaaattaataattttttaataattttaagcacaactaaccatagaaaatcccgaattttatgatattgaggaaatcccgaaattttaattgcgaaaaggttcagaattgtgtgagggagagcgctgggtggtttattgggaagacgaggcgtcctgcaaagaaaatgttacgatttgtttttttgaaaggttccacttacttcctccatatcagaaattagattgctacatgacagctccttagcaatgtacaggtatcttttccctgtgtttctaaccgagtggaagcgtctttcgaggcgattgaaaacggcatgaagcgattcgattccttgctccgaaagccttcccaaattcctttccgtttttgcgacttctactacgtgagcaaccaggacatgcaatttttgagtgattgattcttctgggtgggcagcttgtaggaattcaacgaattccttcatcgactcatccaattcgctgatgtcatcgtcagagagcaatcgattggccgacaatgacatgattttggacagcctgctcatcgcgttcttgacattcagaagcatcggagtcatgtggtttttcagatttttgaatgcagctgtgacacctttctcagacaaaatcagcttcgtgtgatttccagtgtacatttgaaaccacgctcggcgtgttgcaccaacttcttccagatcattctcaaattgtttaagatatccaccagcaagtccatccagcgtctcgtccaataacactttttcttctttcagagcagtaaactgtgctttcatctctctttttctcttaagtggtgaagcttcgaattttttgttcgcgtcttgaatcttcaaatcggcaactctttttgtctcttctttatttcttcgaatttcaaatgatgttttattgtctaaactcacaacggccatccaaatcggctcaaagatgtatttcgtgaacagtccgacaatcaagtgtagcattgcaggcaagtagtgttccaacttgacattttgaagaattggtccacttccgcatctgacgccaaagctaccgtatttagaattgagcttgtaagaattcattgttttcaagaaatatatcttttgcaagttcagatctttaagcttttgcatcagtcctcctcttggattggtttcgaaacaaaacgggcaaaaataggtggcacattgttttttatgtgacagtaaatcacatgtaaactt
aaaatcaccgactactttttggacaacgccacgtgtcaccacccttccatcttccatataggtgatgctggtgaagttgttgatcttcacgataagatcagacaggtaggccatgataagttcccgcgaatcagagtcatcaaaaacagcgagaaggacgattcggtgcggcgagttcgcatgatcacaatttccgatcagaagacaaagtttcgtcgttcctcctcccgaatctccaccaattccgataacaattttgccatcggtataggaatcatggcgtaattgttttgatacagacaaacgctccagcttcggaatgacacttttctcgacatcgataattttgacaacggtcaccttttctccttttgatatatatgttgtggccttataattgtcgatagttgacattctcgttttccgttgcatcgtcaaattaatagttggcatgatttcaagatcggtgaacatcttatagttttgcttcacccgccgcaattgattgttagagaatctacatttttcttgaaaaataatcgtttgccaatgcgtcaattggacttggaaggactgcgaatcattttgttggagatatttctggaaatcaatcatgaaattacgaatatcctcttctcgactgattcgttgaagcaaatccagtgccaaatctatcctagattttccagagaacgattccttcttcgcaactctgatatccatattattctgtactttttcgcccttgacgtaatttgagcgatttgttttcttaatatcagattttcgttggttttccgttttgatgtttttcttattgaaattatctctttcctttgcatgaaaaattcatcagaaaaaacggcaaactcgtcgtctctctccatcaaccgatttagaagtgtctcttcgtcaatgtctgttcttggattgagctggaccgaatggtttgagctggacaggatggcggttctgaaatggaaaattgaataaacaatcaacaacaaagaaataaatctacctctgatatgctatccaaaatggaatcaacttctagaaatctcattcgttctctactcaagtcttctctaattcccttcacatatggtgtctcattcgaaaacacgccgacgctgtcactcgatgaatagagtggacttgacgatctttcaaaagagtgtggtgttcttggaggtgatgaagaagattccgcaatagaagaagttgatggttgtgtgtcgaacggctggaaaaattatttttaaaatattataatcgtcttaggatccgagttgtagcatggattaattcacttacaatatcagtttcaggtgttattgtcagattcaaatcgatattgcgtttcgactccacatccagattatcatcaaaatccacaaatggagacactcgtcttctctcaggagtgatctgcgcatctccgtttctgtcaagagcttctgtcaattctacattcgacaggtttaggtccagggaagcatttctttcagtatttgaattttcgtcttcttcatcttttttccatcgtcgtttccgagcatttagaagatttgaagaccatgctgattgatttgatggaggaggcatctgaaaaaaaaattttttactcaattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtacgatttgtgtatatgtttgcctttatcttgtagagaatttttagctgattctaattcggcaataaaaataaaacattttgtgatttttttcgaaaaaaatcatttttcctccattttcagtgcaaaattttagtttcgaaaaaattttcttgaagaatgtacgatttctaaaaaattctctctgtatcttgcttaaaattaacagttctttctctttgaagcaaaaaactaacacattttgtgatttttttggcaaaaaaaattattttataaattcttattacaaaaaaaaattttttcgaaaaaattttaatgaaaa
attaagatttctctatgaattgagttccatcttatacacaattaaattttgatcataaaacaactataaatcgtaaagagtttttgattttcttgaaaaagggaacgtttttaaaaactattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtacgattcgtgtatatgtttgcctttatcttgtagagattttttagctgattctaattcggcaataaaaatataacattttgtgatttttttcgaaaaaaatcatttttcctccattttcagtgcaaaattttagtttcgaaaaaaaaattaatgaagaaggtacgatttctaaaaaattcggcccgtatcttgttcagaatttttagctctctctttttcaatcaaaaaaataaaatattttgtgattttttaagagaactcgtcaaaaaactcacttaattcgttttcctttctgcccacgccgacgctcctttgttttcctcgaattttttcgcttactacctgaaacaagacacactttttcgattttacaaaaaaaaataacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatcgatgccgagtgaaaggcttgaaatttaaacaattgttcgcaatagagcgtgtttgcctccatctagagattgaaccaccgtg',
'tgctgaaaattgctgaaaatcgaaatttcgtcagctgatgtcgattattctgcgcgggggtacggtacgcaagtccgcaaacactgtcacgccaaattgcgga',
'cacggtagcacagaaactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagcgaaatggtccgcaatgtgtctcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattactagtttaaagtgaataaaattaaaaaaaaaagattttattattaaatttatcaactaacaattcaaaattaatgtattaaaacacagaaaagttgattttgaacagaaaaacggagtaatcatttaaaaagacaatattaaagtgaaaaaacacgcaaatcgattgaaaaaacgaatttaatatgtggaatcggccttctttttcgcttttcggctgcaagttggaatttttcttgggaaaaactttcgaaaatgaggaaatcagtgggaacttcttaatttcctcgactcctggtgaactttttgtgttaaatccatcgaaattgtgcggaatagctcattgacaaggtttcaatctgcaatttgatgaaattctatatttttaaattattttaatcacaacaaacctgaggaaatcccgaattcgatcctttcgataaaatccagagatgtcactttgccacttttccaaaatgaaggaaaccatgtggagcttctcagctttttgagttcctggccgaaatcatgatgaaaaactatcgaaattgacttgagcagcttcttggcaaggtttttgtctgaaattttaagattttaatgatttttgaacgtttttaacacaacgaaccaaaggaaatcctgaatttcacatttctgactgtttcctgggatgttacatcggcagttttccaaaatgaaggacatcatgtagagcatctccacttattgaaattctggtgaagttcttgccgacaaatccatcgacattacgttgaacgtcttctaggcaaggtgtttatctgaaaattcatga',
'taagcagtttttgaaaagttttcgaaaaaaaAAAGAATTTCCGTTTTTTGAGATttaattttcagtgaaaaaaatttacttttggaaaatttcaagtgaaaaatgtacgattcgtgtatatgtttgcctttatcttgtagagaatttttagctgattctaattcggcaataaaaatataacattttgtgatcgttttcgaaaaaaaaatctttttctttatttttagtgcaaaattttagtttcgataatttttctatgaagaatgtacgatttctagaaaattctgcctgtatcttgctcaaaattaacagttctttctttttaaagcaaaaaattaacacattttgtgattttttggcaaaaaaaattattttataatttcttatttcaaaaaattttttttcgaaaaaatcttaatgaaaaattaagatttctctatgaatttagttccatcttatacaaaatttaatgctgatcataaaacaactataaaatgtgaagactttttgattttcttgaaaaatggaacgtttttaaaaactgttttcagtgaaaaaaatttacttttgaaaaattgtgaaaattgaaaaattgtgaataagtgaaatatgtacgatctctcaataattttgtcttcatcttgtagagaattgttagctgtttctgattcggcaagaaaaatacaacattttgtgatcgttttcgaaaaaaaaaatttttttcttaatttttagtgcaaaattttagtttcgaaaaaaaatttatgaagaaggtacgatttctagaaaattctgctcgtatcttgttcagaatttttagctctttctttttcataccaaaaaataaaatattttgtgattttttaagagaactcgtcaaaaaactcacttaattcgttttccttgccgcccacttcgacgttcctttgtttttctcgaattttttcgcttactacttgaaacaagacacactcttttgattttacaaaaaaaaattacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatcgatgccgagcgaaaggcttgaaatttaaacgattgttcgcaatagagcgtgtttaccgccatctagcgactgaaccaccgtg',
'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',
'cacggcccggcgaaagagacgtggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacgaataaacatttttaatttggctgctaagctcatttatctttgttttttctcgttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcacaaaaccgcggcaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagagtttttt',
'tgcgaaaaactgtttaaagtatcgattttcttcaatatcagcaacatacaatcctttaaaatgattattttttgtaaattcgataaaaattcatttatttttcacaacttctgcccgaaaattaccgaaataaccagcgtttctataactaagaaagtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',
'ttgttcggtggagtgcgtttgcacccatctagcaactgaaccaccgtg',
'tttttcgtgtttttttatgtttttttatgttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggacttaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttgtttt',
'aaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaactcaacacattttcgatcatttttgaacaaaaaaattgttttctgaaaaatttgacgcttaatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacactttttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaatataaagaaaaacatgcttgttctgaattagtaagaattgcgactcaagctcgcgtatctcatatttatttgttcgatttcagtggaaaatcgacacatttttgggaaaatttttttttttcgaaatattcgctcctaatttttaatgatttccagatgaca',
'tctctatattattcattcattttatcaacaaacttctatcgccctaacgtcgatcaaaaaagctcatcagcaactgccgtcgagt',
'tctcgtcgagtgaaatgcgatagaatttgtctgtgaaaaaccaaatatcgattttccttgaatatcgtgaagaacaaatcttcatacttacgattcttcacaaattcgatgaaaatctgatagttttttcaattttagctctataattccgaaaaaaatcttctgttaattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaagtttacactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaatcctcgtccatgtaccccccgccgggccgtg',
'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',
'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',
'gttttttcttattttttcttgtttttttttgttactacgtggaataagcataatgttttcagctaatctacgcaaaccgcttcgaaattcgatggattaaaccggacaaaggtaagaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattatggtacaatatttctgaaaacctactcaaatatggaaaaatatatgctgattccggattgaaaagtcttgaacctcaatctcgcgtatttcttattgagtttctcgattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaagaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatagaaaaatatatgctgattctgaattagtaacatttgaaactcaatctcacgtatttcataatttttttttcgattttaatgaaaaatatattcagtgcaattttcgattttttttcaaaa',
'gctgatattgacgaattgtcgtcaaataaaatacgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt',
'tgtgggctacggtagtcaagtacgcaaacaccacgagcattttcacaattgcgtacaaaatttttttcaagcttt',
'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',
'ttttgaaaaaaagttactttttgttcgaaaatgtattgattttcacttattttcactagattcaaaaaaa',
'ttggacccatgctatactcaacatttttttggaattctgaatcagcattctcttcataaattagacaatttctaaaaaatctggaccaa',
'ttttccggtgatttgctaaaactataatttctatttcaattattaaccgagaaaaccagaaaaaa',
'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',
'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaata',
'tattaattgctaaaatttatgtggactacgatagccaagtccgcaaacaccacg',
'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcagttaatacttcccggtttcgttttacatgataatatcgttgaatttaagcaaaaaatgcagcattagtttcatgaaaaaaaaaataagaga',
'atttttcctcaaaaactagaatttttcattgaaaaatggcttaaaaatcgatttttgttcgaaaaaa',
'aaggttttcatcgataaagtcacgaatttgtcgaaatgctttggtgagttttatctttttcagaaaaaaaattcgaaaattttcag',
'cacggtggttcagttgctagatgggtgcaaacgcgct',
'attgttcggtagagcgcgtttgcactcatctagctgctgtaccaccgtg',
'agaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcg',
'ttttggaaaaatttaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgaaaaaaaatatatgaaatacgcgagattgaggttcaaggcttttcaattcggaatcagcctatatttttccatacttcaatgggttttcagaaatattgtatgtgaatttttggaaaaatgtgatttttaatttgaaattgcactgaatttcttgaatttttcaataaaatcgagaaaataaatatgaaatacgcgagattgaggttcaagactttttaattcggaatcagcatatatttttctatatttgagtgggttttcagaaatattgtatgtgaattttttgagaaaa',
'tttttcgaaaagtgttggaccagatttttttgatcttagaatatgaataaaagcatgctg',
'caaggcccggcaaaccgtacgtggctgcaaaatctctaccgtagtaaaatttggctgactgaatattcaacgtttgatactcagtgaaaagattcgtac',
'cacggtggttcagtcgctagatggcggtaaacacgctctattgcgaacaatcgtttaaatttcaagcctttcgctcggcatcgattctctccgttttttgccgattttaaccgtttttccgtattttttttgttatttttttttgcaaaattgaaaaagtgtgtcttgtttcaggtagtaatcgaaaaatttcgaggaaaacgaaggaacgtcaccgtgagcagcaaggaaaacgaattcagtgagttttttgactatttctctctgaaaaaattaaaaaatgtattactttttgattgaaaaagaaagagcaaaaaattctgaagaagatgtaacttaatttattcagaacaaaagttttttttgtctcgaaattacaatttttcgaccaaaaatagaaaatgatcattttctcgaaaatgaatcccaaatttatgtcagttttttatgaaaaaataatcagcgtaaacttctgtactagaaatactaccttttttttggattgaaagtttgtatcgtgctcaaaattttaaaagtaaaattattttacgctgaaaaatgcaaaatcataactttttttgaggagaaacccccaaatgttatcgattttgttatcaaattgaactcagcttaaaattatctacaagatgaagctaaattcattgaga',
'atattaattaagttttttgtatcaaattgtgtgttttctcaattttatattgcctttttatttcataactttccttttctgttcaaaatcaacttttttttgtgttttaacacttcaattatcaattgttagtttataaatttcataataaactctgattttttattttttttcatcttgaaactattaattctgctgtttttctgctaaaatttgcttcaaaaatctatttgccgcttcattgttttgcggactacggtacgcaagtacgcaaacaccgcaacgacacattgcggaccatttcgctgcgtacgctgcgagatctttctcaaattttacgagagatctagtttctgtgctactgtg',
'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtaaaatttagctgactgaatattcaacgtttgatactcagcgaaaagtttcatacgacattttgcggacgcggcattttaattgacgacactttcttagttatagaaacgctggttatttcggcaattttcgggcagaaattgtgaaaaataaatgaatttttatcgaatttacaaaaaataataattttaaaggattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca',
'aaaaaaaactcacttttttcgaattttcctccttccagttggcggttgagtgctgttttgccgcggttttgtgattttctgcaacaaaaaaaatatttttatcgataaaaatgagaaaaaacgagaaaaaacgaagataaatgagcttagcagtcaaattaaaaatgtttattcgttcaaaaatcttaaccataggaggcggtggcctagccggcgcagctctcgcggccacgtctctttcgccgggccgtg',
'ttttcttcgttttttcttattttttcttgttttttcttgttactacgtggaataagcataatgttttcagctaaactacgcaaatcgcttcgaaattcgatggattaaaccggacaaaggcaaaaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattctggtacaatatttctgaaaacccactgaaatatggatatgctgatttcgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttattttctcgattttattgaaaaatttgagaaattcagtacaatttcgaattaaaaattaaatttttccaaaaattctggtacaatatttctgaaaacccactgaaatatggaaaaatatgctgatttcgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttattttctcgattttattgaaaaattcaagaaattcagtgcaatttcaaattaaaaatcacatttttccacaa',
'gctgatattgacgaattgtcgtcaaataaaatgcgcgtacgcaatgtacacaatagtgtaaaaaactttatcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt',
'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactttgcgtacgcgcattttatttgacgacaattcgtcaatatcagc',
'aaaatgttgaagagaaaccagagaaattgatcgagtagattcttggcaagttttgaaattatatggttttaataagttttgaacatttttaaatacaactaaccatgga',
'ttgtttggtggagcgcgtttgcaccaatctagcaactgaaccaccgtg',
'tttttttaaatttttttcttggctgctttactgatgtttttttctcaattttttcttgttttctttgttactaatttaaattaaaaaaactattttcagcttatcacagcaaatcggagcgaaactcgaccgcgataacaggaaaaagtcgaaaagtgagttttttgccaaaatatctcgaaaaactcatattttgttttgaaaacagatgcaaataaaaagaaatacat',
'atgtattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',
'tttcgtaatgtttttttcgagtttttgattgttttttctcatttttttttgtttttctttattattagttaaaatataaaaactattttaagctaatcaacgcaaatcgaggcgaaaaccgatcgcagaaagaggaaaagtcgaaaagtgagtttttttgcaaaaatatttca',
'acgtggctgaagaaatttctacagtagtcccatttggctgactgaatattcaacgcgaataagttttgtacactattgcgtactctgcgtacgcgcattttatttgacgacaattcgttaatatcagc',
'ttcattaaaatcgaaaaaaaaattatgaaatacgtgagattgagtttcaaatgttactaattcagaatcagcatatatttttctatatttgagtgggttttcagaaatattgtaccataatttttggaaaaattaaattttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaactaaatatgaaatacgcgaaattgaggttcaagacttttaattcagaatcagcatatatttttctatatttgagtgggttttcagaaatattgtaccataatttttggaaaaatttaatttttaattcgaaattgcactgaatttctcgaatttttcactaaaatcgagaaactaaatatgaaatacgcgagattgagattcaagacttttaaattcggaatcagcacatatttttccatatttgagtaggttttcagaaatattgtaccatattttttcgagatattttgaataataacttacttttcgacgttttttgcctttgtccggtttaatccatcgaatttcgaagcggtttgcgtagattagctgaaaacattatgcttattcca',
'gcaaacatactcttttgcgaataagcgatttttttgttttttttttttggtgttttccgttttttgcttgttttcaccgtttcccctctttttttttgtttttttttgtcaaatcgagaaagagtgtgtttttttttcaggtgttaaacaagatttgcgagcaaaacgagggcacaccatcgtaagaagcgaagaaaacgagaaaagtgagttttttgaagattcctctttaaaaaatagggaaatgttttagttttgagccaaaaaagaaagagctgaatttttcaaacaagatacatgc',
'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtgaaatttggctgactgaatattcaacgtttgatactcagcgaaaagtttcgtacgacattttgcggacgcggcattttaattgacgacactttcttagttatagaaacgctggttatttcggcaatttcgggcagaaattgtgaaaaataaatggatttttatcgaatttacagaaaataattatttgaaagtattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca',
'aaaaaactcacttttttcggattttccgcctcccagttggcggttgagtgctgttttgtcgcggttttttaattttctgcaacaaaatgtatatttttatcgataaaaatgagaaaaaacgagaaaaaac',
'ttgttcggtggagcgcgtttgcacccatctagcagctgaaccaccgtg',
'cacggtggttcacttgctagatgggtgcaaacgcgctccactgaacaa',
'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',
'gcgaaattctgcattttgtcgtgagatccgcggtgtttgcgtacttctggggctaccgtaacccggaaaa',
'tcgagttttacattgaaaaaaaatggccaaaaatcggagaaaaatgggcaaaaaacggagagaattgatgacaaatcaaag',
'tcgagttttacattgaaaaaaaatggccaaaaatcggagaaaaatgggcaaaaaacggagagaattgatgacaaatcaaag',
'ccttaaaaggaagaaatttggtggaaaaatacaattttcgctctaaaaaattccgtaaattcgagaatttatgaaaaatactttggttttttat',
'gcaaaattctgcaatatgtcgtcaaattcggtgtttgcgtattttcgacgctaccgtaccccgcggaa',
'ttttcttcgttttttcttattttttcttgttttttcttgttactacgtggaataagcataatgttttcagctaatctacgcaaaccgcttcgaaattcgatggattaaaccggacaaaggtaaaaaacgtcgaaaagtaagttattattcaaaatatctcgaaaaattatggtacaatatttctgaaaacctactcaaatatggaaaaatatatgctgattccggattgaaaagtcttgaacctcaatctcgcgtatttcttattgagtttctcgattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatggaaaaatatatgctgattctgaattaaaaagtcttgaacctcaatctcgcgtatttcatatttagtttctcaattttagtgaaaaattcgagaaattcagtgcaatttcgaattaaaaattaaatttttccaaaaattatggtacaatatttctgaaaacccactcaaatatagaaaaatatatgctgattctgaattagtaacatttgaaactcaatctcacgtatttcataatttttttttcgattttaatgaaaaatctaagaattcagtgcaattttcgattttttttcaaaa',
'gctgatattgacgaattgtcgtcaaataaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcagccacgt',
'actacggtgcgcaagtactcaaacactgcgacgtcagagcgcagac',
'tttagacgtatttttcttttctctgctcttatgatcgattttcgcagaggtttttgattatccggtaaatattactagttattctaatttttcattaaaaaattacatcgaaaataacgaaaaaacatcgaaaaacgcgaaagatcaacgaaaccaattcatgaattaattcgaatttataattcagtacaaaagcgattcggtcgcgggactagattttgcaacttcctaggccatttccaatttgcagtgc',
'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
'ggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',
'ttgttcggtggagctcgtttgcacccatctagcaactgaaccaccgtg',
'cacggtgcttcagttgctagatgggtgcgaacgcgctccaccgaacaa',
'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',
'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
'gcatggggcgtggccgaaaattctctactaccgtttaccaatttggctaatttgccaatcaacgttgaaaagttttgtacatcg',
'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttgactgactgcgtgctcaacgttgagtactcagtttaaagtttcgtacaccgttgcgtactacacagcgcgcattttaattgacgacatttcgcgaaaattaacagaagattttttcggaattatagagctgaaattgaaaaaaaactatcaaattttcatcgaatttgtgaaaaatcgtaagtatgaagatcttttcttcactatattcaaggaaaatcgatatttagcttttcacagacgaatgatgtctcattttact',
'gcctcattttactcgatggaagtttctgatgagctgtttttatcgatttttgagcgataaaaatgcgatttgttgataaaatggataaattatataaagaaacaacatatattgctctgagattactttttgagaatcaattctttatttttcggtcattttaaattaagcattaaaataaaaatattagaaatcataataaaaaaaacagaaaatcgatatattactatttcttcggaatttcacgacttttttggacgaattttagtctgtaaactttcttcttcgaatttgtgtccacgtggctttcagtcgaagaagattctgcagcactccttcttgcttgcccacaacttgctcgaattttctaaaatttttaacttattgaaattgtcatttcacctttacactctcttcagctaaactattactgcatttcggaagttgataagatactggtggagcaacaagtggatggcttctagtgattggctggcttgtcgagcaagtttgtgtgattgcctgaaataatttttgatttcaattttgagttgatttaaag',
'gatttaaagcagtgaacctaccatcgggttcggacgagaaagagcattgctcggtagaccacggaatccaattttcgttgaattgcctccaaatgcaatagaagtttgtacgttttgtgagaagtcgggctggaaattttcaaaatttgaaacttttcgtgaaaaataaaaatctcaccacagcatttcgagattttgtcgattgtggaagccttttcttggagctaaaattgattt',
'tacgatggaaagaccgggaatggacgtgttctgaaatagttgtgtttttaagaatgcataaatttttttctgtaccaaaattaccatagtcatgtcattcatgatgttacgacacatgagctctctcagaacatggatgtaacgccttttcttgtcccggtaattgcaaaatctcctctcaagtgcattgaaaatcgcgtggacagattcaactccttgttctgtgatccttccaatgtttctcacatcttttgccatttgtggtgcatggtagaccaacaagtgcagctttaaaataattgtttcttcgggaaccgctactttcaaatcctccacaaatccgcgaatcgaattttgaagtattaagacgtcggaatcatttaaaaacttgtttcccgaaagtgacataatagttgaaagctttcccattgctgatttcaatccgagcaacattgggcataaatttgggccaaaaatgttgaaagtctcctctacaacagccggcgttagcagcaatttcaaatggtttccgcaaaatgattggaaccaagcctgcttgtccgctccaaacttagcccaacactgttccattttttcaagtgttcctccgggagtaccattcacaattgtgtcgagcaacaatttttccgattgaagtgctttcagttcagcatgcgactccaatttcatctttccggtggctgcttgatacttttcttccgcacttttgattaggttaacagcgttttttagagttgcttttcgtgttttcaggatagggaaacaagtagtgttatccaaagtgacagaatatttccagaggggattgaagatatatttgtcaaaaatacccatgataatgtgcagaagaggaatcaaatagaacatgatcgcaacgtgtggcagaagtggagtacatcctttgcgaacacccaagtcgccattttcacaacaagctttgtaaagatcgattgttcgtgggtggaatgtttcatcaacattcatatccttgattttcatcctctcttcagctccccgtggattctgtgcaaaacatttgaagcagaaattgtgggatgaatgtccttggtgtccaagaatatcagattgaaacttgcaatctccagttgcaatttgcacaatttttgcggttttttgaactcctttgtccaaatatcgaattttcgttagcttgccaagctgctcaagaacgtccggaatgaattttttcagagacgagtaattgtcggatccgtcatatactgcaattaccataacgtgtctcgaagaattcggtcgagatacgtttccgattaccaatgccaactttgtgcttccacctccagcgtcaccaacgactccaatgttgattactcctttcgtgtatccgtcgtccacaaattgatttgaattgcatagaagctctattcgataggctaaaacttctgcaattttcatgcactgcacaatggtaatcacttttcctttattgtcgaacgaagtggaaactttgaaactggagatcattgataactggattggcaaatctcttgcgttctttaccgatggaagcaaatcatagccaatggcattagtcaaatagtttttgattttttccatctgacttagagataatccgcattttgataaaaagtcaacggcctcaaagtttgaaagcttgtttttgtagctttgattctcttctgaattcaggaattttgtaaattttcgaataaattgtccgacgtcatcctcgaggcagatttcgtgttgaagcaagtgaagagctttgcgaaatcgatttttgatacaacttttgtttcttagattcgaaaatttaactttaaaagctgattttttaaggttttcaacttcttcggcgtgtctttgtagactcagaaccatagctttgccacttttcttcacatctgcacagcttctcaccaatcgaccttctataccactgacgatcgttcgtatattgcatacttccatttgc
agcgaagaattagatgctcttatagtgatattttcatggcggactatttgtatttcttccgaaaacaccgcaaacgcatcattctgcttttgtatttcttctgatatttcatttttttcatttttcagtcgttcgatcgttagtcggagcattttgatctgcggaatttgctcaacattggagattattcgaaccctcggtgtactgaacgagtttcgtgaaggtgtcggtggaaatacgggattggagaatctctgcgaaatcatataatataatattagttttgaaatattgaaaaaaattacattgtgagaaaaagtcggaatatcgtcactaaaatccatttccacgtctctcgtcagaattccttcatccatattgaaacaatttgacgacctgcatgtagttgcggagctactggaagcaatgtcgggatggtgggagtttcgatcttctgaactgatttcctgattagcctgtggcgacgaactgcacgtctgaaaatcacgtttttgaagttagaacaaactactccaacttaattaaagtagacaaaattgagctgaacgaacctccactttcgaattgttcagttcttcctcttcagtttgatcttttgaaactccattagcactgttccttgctctctgggcatttgctaaaagaaggcctgcacaagatttttcttttcttttttgtttgaagtatacttttgtcatctggaaatattgcatgaatattataagggaaacaatttttaaatatcgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggaccaacagttttcgaaaaaattcaatttttgttcagaaatgtgaatattcactaaatcgaaaaaaataattgcaaaatccgtcggctgagcattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaagatttcaaaaattctgagaccaatttttgttgagaaaaataatttttcgttcgaattatcgatttttcacgaaatgccaaaaacagtaaacttgggcccatgctaaaagcctgaatctttcaaattaaaaaccaacatgattttttctatattctaagacgtttaaaaaaaatctggaccaacagttcttgaggaaagtaattttttatacaaaaatgtgctgatttttcactaaattcaaaaaaatagtcaagttgggcccatgctatacacctaaatcattaaaattcagaaccgccatgtattttttcataccataggctctttaaaaaaaatctggaccaacagtttttgagagatgtcaaaaaaacaactcacttttcgacgtttttcgtgtttccccggatgatgcggtcgatttttgctgcgatttgtggtctttcgctgaaaatattatttttatttcaatttttaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaacacgaaaaaaacgtcgaaaaactcccgcaacctcatgaaaaaaaataaagcactgcagccgcgggactagttttcgcaactttctaggccatgtcccgttcgccgtgtcgtg',
'aaaaaaactcacttttcgactttttcctgtttctgcgatcgggttttgcgtcgatttgtggtaattagctgaaaatataaactatagtttttatattttaactattaataaagaaaacaagagaaaagtgagaaaaaacaatcaaaaactcgaaaaa',
'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',
'TGCGAAAAACTGTTTAaagtatcgattttcttcaatatcagcaacatacaattctttaaaatgattattttttgtaaattcgataaaaattaatttatttttcacaatttctgcccgaaaattgccgaaatgaccagcgtttctagaactaaaacaagtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',
'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaacaa',
'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaattttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgttaattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacataaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaatcttctaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctctcgacagaaacggagatgcgcagatcactcctgagagaagacgagtgtctccatttgtggattttgatgataatctggatgtggagtcgaaacgcaatatcgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaatgatttttccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcattacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgccaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgttc
acaaagccgaaacgttgtcacgcttcccaagtataccatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttttctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaatcgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaagcttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggaaatcacacgaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggagctgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaatttcaggta
aaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttacctagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg',
'ttgttcggtggagcgcgtttgcacccatctagcaactgaaccaccgtg',
'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaattttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgttaattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacataaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaatcttctaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctctcgacagaaacggagatgcgcagatcactcctgagagaagacgagtgtctccatttgtggattttgatgataatctggatgtggagtcgaaacgcaatatcgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaatgatttttccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcattacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgccaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgttc
acaaagccgaaacgttgtcacgcttcccaagtataccatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttttctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaatcgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaagcttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggaaatcacacgaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggagctgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaatttcaggta
aaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttacctagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg',
'cacggcccggcggggggtacatggacgagaattctctaccgtattccaatttggctgactgcgtgctcaacgttgaatactcagtgtaaactttcgtacaccgttgcgtactgcacagcgcgcattttaattgacgacatttagcaaaaattgaacagaagatttttcggaattatgaagctcaattttcacaaaaataatgagttttttgtagaatttatgaaaaaacgtgaatatatagattttttgttcatgatattcaagaaaaatcgatttttagttcttcacagagtaatcctatcgcatttcacttgctcatgatgtttttgctcgactttaggacgataaaaatgcgaattgttgataaaatgaatgaacaatataaagaa',
'ggggctgctggaaccaatgtcggcatgacgagagttccggtcttctggatccatttcctgcgtgggctgtggcgacgagctgcacgtctgaaaatcaagtttttgtaatt',
'tttgggcgcatgatatggagctgaatcattcgattttagaatcagcaagcttttattcatattttaggatctttttaaaaaatctggaccaacagtttttgaaaaaaaatacttttcgttcagaaatgtactgattttccactgattttcacgaaatttgaaaaaatcaataatttgggcgcatgatattgagctgaatgtttcgaatttagaatcagcatgcttttattcatattttaggatctttttaaaaaatctggcccaacagttttcgaaaaaatttaatttttgttcagaaatgtgaatattcacgaaatcgaaaaaaataattgcaaaatccgtcagctgaacattcaaaacttatcaatttgaaatcagcatatttcagtgtataattaaaaaaggtttcaaaaattctgagaccaatttttattgagaaaaataatttttcgctcgaattattgaattttcactaaatgcaaaaaacagtaaacttgggcccgtgctacaagcctgaatctttcaaattaaaaaccagcatgattttttcaatattctaggacgtttaaaaaaaatctggaccaacagtttttgaggaacgtaattttttatacaaaaatgtactgatttttcactaaactcaaaaaaatagtcaagttgggcccatgctatacacctaaattattaaaattcagaaccgccatgtattttttcatactataggctctttaaaaaaaatctggaccaacagtttttgagatatttagaaaaacaactcacttttcgacgtttttcgccttttcgcggctcacccggtcgatttttgcggcgatttgtgttctttcgctgaaaatattatttttatttcaattattaacgaagaaaacaagaaaaaacgacgagaaaacatcaaaaaaacgcgaaaaaacatcgaaaaaccaccgaaacctcatgaaaaaaataaagcattgcagccgcgggattagttttcgcaactttctaggccatgtcccgttcgccgtgccgtg',
'aactagatctctcgtaaaatttgagaaagatctcgcaggtacgcagcgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacattgcagcggcaaatagatttttgaagcaaattttagcagaaaaaaggcagaattaatagtttcaaggtgaaaaaaaaaaaaatagattttattattcaatttattagctaacaattgagaattgaattatcaaaacacagaaaaattaattttgaacagtaaaacaaagaaataatcgaaacggtagattgaaaattgaaaaatacatacaaaacgattcaaaaaacgaattaatatgtggaatcggcctccttcttcgctttacggatgcaagttgagatttttcttggaaaaactttcgaaaataaggaaatcagtgggaacttcttatttcctcgactcctgcaggatcctggtgaactttttctgttaaatccatcgaaattgtgcggagtagctcattgacaaggtttcaatctgaaattttgtgaaattttatatttttgaataattttaatcacagcaaacctagggaaatcccgaattcgagcctttcgataaaatccagagatgtcacatcgccacttttccaaaatgaaggaaaccaggtggagcttctcagctttttggcttcctggtcgaaatcttgatgaaaaaaccatcgaaatttacttgagcagcttcttggcaaggtttttgtctgaaattttaggattttaatgatttttaacatttttaaacacaactaaccataaacaatccggattttttcggttttgactgaatccttggatttatgtagaaaacatgcccagaaatcaaggaacgaggtggaacatctcatttttttgaaattctggtgtaattcttgatgaaaaatccatcgacattacgttgaacgtcttcttggcaaggtgttttcttctgaaaattcatga',
'ctgtaacatctaagcagtttttgaaaagttttcgaaaaaaaaataaatttcagtttttgagatttaattttcagtgaaaaaaatttacttttggaaaattttaagtgaaaaatgtaccgtttctgaaaatgtttgcttttatcgtgtagagaatttttagctggttctaatccggcaagaaaaacagaacattttgtgatcgttttcgaaaaaaaaatttttttctttaattttaagtgcaaaattttagtttcgataatttttctgtgaagaatgtacgatttctagaaaattctgcctgtatcttgcttaaaatgaacagttctttctttttaaagcaaaaaactaacacattttgtgattttttttggcaaaaaaaattattttataatttcttatttcaaaaaatgttttttcgaaaaaattttaatgaaaaattaatatttctttatgaacttagttccgtcttatacaaaatttaatgctgattataaaataactataaaacgtgaaga',
'cacggcccggcgaaagagacttggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacgaataaacatttttaatttggctgctaagctcatttattttcgttttttctcgtttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcaaaaaaccgcgacaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagtgagtttttt',
'tgcgaaaaactgtttaaagtatcgattttcttcaatatcagcaacatacaatcctttaaaatgattattttttgtaaattcgataaaaattaatttatttttcacaatttctgcccgaaaattgccgaaataaccagcgtttctataactaaaacaagtgtcgtcaattaaaatgccgcatccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',
'ttgttcagtggagcgtatttgcataaatctagcaactgaaccaccatg',
'attgaccaaaatcgagaaacattgcgaaaaactgtttaaagtgtcgattttcttcaatatcagcaacatacaatactttcaaatgattattttctgtaaattcgataaaaatccatttatttttcacaatttctgcccgaaaattgccgaaataaccagcgtttctataactaagaaagcgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagaatttacagccacgtacggttcgccgggccgtg',
'cacggcccggcgaaccgtacgtggctgtaaaatctctaccgtagtaaaatttggctgactgaatattcaacgtttgatactcagcgaaaagtttcgtacgacattttgcggacgcggcattttaattgacgacacttgttttagttatagaaacgctggttatttcggcaattttcgggcagaaattgtgacaaataaatgaatttttatcgaatttacaaaaaataatcattttaaaggattgtatgttgctgatattgaagaaaatcgatactttaaacagtttttcgca',
'aaattgtagtcagtatcactgcagatgctggagcaggaatcacaaagttttgtctgattatcgagaattgt',
'ttttcgagatattttggcaaaaacctcacttttcgtcgttttcctcctactgcgatcgattttcgccccgatgattagctgaaaataattttatatgttagttagtaacaaagaaaataagaaaaattgagaaaaaacaatcaaaaactcgagaaaa',
'cgaattgtcgtcaaattaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttcttcgcgttgaatattcagtcgcgactagtcagccaaatgggactactgtagaaatttcctcggccacgttccaaacgccgctccgtg',
'ttgttcgatggagcgcgtttgaacccatctagcaactgaaccaccgtg',
'cacggtggttcagttgctagttgggtgcaaacgcgctccaccgaacaa',
'cacggtggttcagttgctagttgggtgcaaacgcgctccaccgaacaa',
'cacggtagcacagaaactagatctctcgtaaaatttgagaaagatctcgcagcgtacgcagcgaaatggtccgcaatgtgtcatcgcggtgtttgcgtacttgcgtaccgtagtccgcaaaacaatgaagcggcaaatagatttttgaagcaaattttaacagaaaaacagcagaattaatagtttcaagatgaaaaaaaaaataaaaaatcagagtttattatgaaatttataaactaacaattgataattgaagtgttaaaacacaaaaaaagttgattttgaacagaaaaggaaagttatgaaataaaaaggcaatataaaattgagaaaacacacaatttgatacaaaaaacttaattaatat',
'tgcataaattcaatttcatcttgcccagaattttaagctgattcgaattcatatcaaaatctgaagattcttcaattttttttatcgaaaaatttttttttctaaattttgtgaaaaatttt',
'tttagcttcatcttgtagataactttaagctgagttcaatttgataacaaaatcgataacatttggggatttctccacataaaaaattacgatttttaattttttagtgaaaaatatttttactttcgaaattttgagcatgacacaaactttcaatcgaaaaaaaagcgtacatatctggtacagaattttatgtagattattttttcataaaaaactgatccaattttgggattcgttttcgagatagtgatcattttctatttttggtctaaaatttttgatttcgaggcaaaaaaaaattattctgaataaattaagttacatcttattcagaatttttagcttaatttctagctcgaattaaataaaaaatctgaagattcttcaatttttttttaccaaaaacagttttttttctaaactttgtgaaaaaatttttttaattcaaatggatctgacaaatttttttcaatctcaattattttagctttctcttgtacagaattttaagctgttttcaattcaataacaaaatcgataacatttgggggtttctcctcaaaaaaagttatgattttgcatttttcagcgtaaaataattttacttttaaaattttgagcacgatacaaactttcaatccaaaaaaaggtagtatttctagtacagaattttacgctgattattttttcataaaaaacttacataaatttgggattcattttcgagaaaatgatgattttctatttttggccgaaaaattgtaatttcgagacaaaaaaaaaccttttgttctgaataaattaagttatatcttcttcagaattttttgctctttctttttcaatcgaaaagtaatacattttttaattttttcaaagagaaatagtcaaaaaactcactgaattcgttttccttgttgctcacggtgacgttccttcgttttcctcgaaatgtttcgattactacctaaaacaagacacactttttcaattttgcaaaaaaaaataacaaaaaagatacggaaaaacggttaaaatcggcaaaaaacggagagaatccatgccgagcgaaaggcttgaaatttaaacgattgttcgcaatagagcgtgtttgccgccatctagcgactgaaccaccgtg',
'tcagttttcacaaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcattcggtgtataaattgtaaac',
'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
'tttgctggttacggtaaaaaagtacgcaaacaccaaacgtgaagtgcagacattgcgttttaccgtacttccgtgttcttttt',
'cacggcacggcgaacgggacatggcctagaaagttgcgaaaactagtcccgcggctgcaatgctttatttttttcatttggttgcggtgggcttttcgatgttttttcgtgttttttgaagttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttttttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaa',
'tagaaagatttgagactcaagctcgcgtatttcagcttatttttttcatttggttgcggtgggcttttcgatgttttttcgtgttttttgaagttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagtaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttttttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaataagaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttcggaattagaaagatttgagactcaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaataagaagaaaagcatgattaaccagaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactc
aaatatgaagaaaagcatgattaaccagaaaaagtaacagttgagacccaagctcgcgtatttcatattttttttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttttttttttgaaactgcactcaaatatgaagaaaagcatgattaaccagaaaaagtaacatttgagactcaagctcgcgtatttcattttttttttcgattttagcacaaaatcaacactcttttgaaaaaaatattcgaaatattcgctcctaatttttaatgatttccagatgacaagctggttcacaccacattctactgaggaaatcgcttcaaataaccgaaatcatcccgacattggctccagcagctccgcaactacctacagctcgtcaaattgtttgaatatggatgaaggaattctgacgagagacgttgaaatggctttcagtgatgaaattccgactttttctaaaaccgtaatttttttaaaaatttcaaaaataacattatatgattttgcagaggacctccaattccgtgtttcctcctgcaccttatcgaaattcgttcagt',
'ttctttacattattcattcattttatcaacaattcgcacttctatcgctctaaagtcgatcaaaaaagcttatcagcaactgccgtcgagtgaaatgcgatacaatttgtctgtgaaaaaacaaatatcgattttccttgaatatcgtgaagaacaaatcttcatacttacgattattcacaaattcgatgaaaatctgatagttttttccaatttcagctctataattccaaaaaaaaaccttctgttaattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtccatgtaccccccgccgggccgtg',
'gctgatattgacgaattgtcgtcgaataaaatgcgcgtacgcaaagtacgcaatagtgtacaaaacttattcgcgttgaatattcagtcagccaaatgggactactgtagaaatttcttcggccacgt',
'cacggtggttcaatcgctagatggaggcaaacacgctctattgcgaacaattgtttaaatttcaagcctttcactcggcatcgattctctccgttttttgccgattttaaccgtttttccgtatcttttttgttatttttttttgtaaaatcgaaaaagtgtgtcttgtttcaggtagtaagcgaaaaaattcgaggaaaacaaaggagcgtcggcgtgggcagcaaggaaaacgaattaagtgagttttttgacgagttctcttaaaaaatcacaaaatattttatttttttgattgaaaaagagagagctaaaaattctgaacaagatacgggccgaattttttagaaatcgtaccttcttcattaatttttttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttatatttttattgccgaattagaatcagctaaaaaatctctacaagataaaggcaaacatatacacgaatcgtacatttttcacttaaaattttccaaaagtaaattattttcactgaaaatagtttttaaaaacgttccctttttcaagaaaatcaaaaactctttacgatttatagttgttttatgatcaaaatttaattgtgtataagatggaactcaattcatagagaaatcttaatttttcattaaaattttttcgaaaaaatttttttttgtaataagaatttataaaataattttttttgccaaaaaaatcacaaaatgtgttagttttttgcttcaaagagaaagaactgtttattttaagcaagatacagagagaattttttagaaatcgtacattcttcaagaaaattttttcgaaactaaaattttgcactgaaaatggaggaaaaatgatttttttcgaaaaaaatcacaaaatgttttatttttattgccgaattagaatcagctaaaaattctctacaagataaaggcaaacatatacacaaatcgtacatttttcacttaaaattttccaaaagtaaatttttttcactgaaaattgaggaaaaaatttttttttcagatgcctcctccatcaaatcaatcagcatggtcttcaaattttttaaatgctcggaaacgacgatggaaaaaagatgaagaagacgaaaattcaaatactgaaagaaatgcttccctggacctaaacctgtcgaatgtagaattgacagaagctcttgacagaaacggagatgcgccgatcactcctgagagaagacgagtgtctccatttgtggattttggtgataatctggatgtggagtcgaaacgcaatattgatttgaatctgacaataacacctgaaactgatattgtaagtgaattaatccatgctacaactcggatcctaagacgattataatattttaaaaataattttcccagccgttcgacacacaaccatcaacttcttctattgcggaatcttcttcatcacctccaagaacaccacactcttttgaaagatcgtcaagtccactctattcatcgagtgacagcgtcggcgtattttcgaatgagacaccatatgtgaagggaattagagaagacttgagtagagaacgaatgagattgctagaagttgattccattttggatagcatatcagaggtagatttatttctttgttgttgattgtttattcaattttccatttcagaaccttgacgccatcctgtccagctcaaaccattcggtccagctcaatccaagaacagacattgacgaagagacacttctaaatcggttgatggagagagacgacgagtttgccgttttttctgatgaatttttcatgcaaaggaaagagataatttcaataagaaaaacatcaaaacggaaaaccaacgaaaatctgatattaagaaaacaaatcgctcaaattacgtcaagggcgaaaaagtacagaataatatggatatcagagttgcgaagaaggaatcgtt
ctcaaagccgaaacgttgtcacgcttcccaagtatatcatcgaatggagcaagctcgaaaataagagctctggaaaatctaggatagatttggcactggatttgcttcaacgaatcagtcgagaagaggatattcgtaatttcatgattgatttccagaaatatctccaacaaaatgattcgcagtccttccaagtccaattgacgcattggcaaacgattatttttcaagaaaaatgtagattctctaacaatcaattgcggcgggtgaagcaatactataagatgttcaccgatcttgaaatcatgccaactattaatttgacgatgcaactgaaaacgagaatgtcaactatcgacaattataaggccacaacatatatatcaaaaggagaaaaggtgaccgttgtcaaaattatcgatgtcgagaaaagtgtcattccgaagctggagcgtttgtctgtatcaaaacaattacgccatgattcctataccgatggcaaaattgttatcggaattggtggagattcgggaggaggaacgacgaaactttgtcttctgatcggaaattgtgatcatgcgaactcgccgcaccgaattgtccttctcgctgtttttgatgactctgattcgcgggaacttatcatggcctacctgtctgatcttatcgtgaagatcaacaacttcaccagcatcacctatatggaagatggaagggtggtgacacgtggcgttgtccaaaaagtagtcggtgattttaagtttacatgtgatttactgtcacataaaaaacaatgtgccacctatttttgcccgttttgtttcgaaaccaatccaagaggaggactgatgcaaaacttaaagatctgaacttgcaaaagatatatttcttgagaacaatgaattcttacaagctcaattctaaatacggtagctttggtgtcagatgcggaagtggaccaattcttcaaaatgtcaagttggaacactacttgcctgcaatgctacacttgattgtcggactgttcacgaaatacatctttgagccgatttggatggccgttgtgagtttagacaataaaacatcatttgaaattcgaagaaataaagaagagacaaaaagagttgccgatttgaagattcaagacgcgaacaaaaaattcgaagcttcaccacttaagagaaaaagagagatgaaagcacagtttactgctctgaaagaagaaaaagtgttattggacgagacgctggatggacttgctggtggatatcttaaacaatttgagaatgatctggaagaagttggtgcaacacgccgagcgtggtttcaaatgtacactggagctgattttgtctgagaaagctgattttgtctgagaaaggtgtcacagctgcattcaaaaatctgaaaaaccacatgactccgatgcttctgaatgtcaagaacgcgatgagcaggctgtccaaaatcatgtcattgtcggccaatcgattgctctctgacgatgacatcagcgaattggatgagtcgatgaaggaattcgttgaattcctacaagctgcccacccagaagaatcaatcactcaaaaattgcatgtcctggttgctcacgtagtagaagtcgcaaaaacggaaaggaatttgggaaggctttcggagcaaggaatcgaatcgcttcatgccgttttcaatcgcctcgaaagacgcttccactcggttagaaacacagggaaaagatacctgtacattgctaaggagctgtcatgtagcaatctaatttctgatatggaggaagtaagtggaacctttcaaaaaaacaaatcgtaacattttctttgcaggacgcctcgtcttcccaataaaccacccagcgctctccctcacacaattctgaaccttttcgcaattaaaatttcgggatttcctcaatatcataaaattcgggattttctatggttagttgtgcttaaaattattaaaaaattattaatttcattaaa
tttcaggtaaaaacattgacaagcagatactcggtgcaaatttgatggatttctcatcaatattttgtcgaggatttcataagcggagatgctccacatggtttctttcaacttggaaaattgtcgatgtgacaccttaggattctatcgaaaagctaaaattcaggatttccttcggtttgttgggattaaaattattcgaaaataattgattttatgaaatttcagattgaaaccttgtcaatgagctactcagcaaaatttcgatggatttatctcaaaaagttcaccaggagtcgaggcaattaagaaatttcaccgggtttccttattttcgaaagttttaccaagaaaaatttcaacttgcatccgtaaagcgaagaaggaggctgattccacatattaaattcgttttttgaatcgatttgtatgtatttttcaattttcaatcttccgtttcgattatttctttgttttactgttcaaaattaatttttctgtgtttcaataattcaattctcaattgttagctactaaattgaataataaaatctattttttatttttttttcaccttgaaactattaattctgccttttttctgctaaaatttgcttcaaaaatctatttgccgctgcaatgttttgcggactacggtacgcaagtacgcaaacaccgcgatgacacattgcggaccatttcgctgcgtacctgcgagatctttctcaaattttacgagagatctagtttttgtgataccgtg',
'atgtcgtcaactaaaatgccgcggtacacgtccgcagtcggtgaacaaaacttttcgttgaatactcagtcagccaaatttaactactgtagaaatttctccccacacgtcgcgattgccgctccgtg',
'ttgtttaaaataagtttccttttttttgaatacgcaaagaacttgatttttccaaaaaaaaaaattgttttcgaatttttatgataaaaaaaaatttttt',
'cacggcacggcgaacgggacatggcctagaaagttgcgaaaactagtcccgcggctgcaatgctttattttttcatttggttgcggtgggtttttcgatgttttttcgtgtttttttaatgttttttcgtcgttttttcttgttttcttcgttaataattgaatttaaaataatattttcagcaaaaggccacaaatcgccgcaaaaatcgaccgcgtgagccgcgaaacggcggaaaacgtctaaaagtgagttgtttttctaaatatctcaaaaatttgacgcttatttttttcgaaaaagctctcaaatatggagaaaaacatgtttgttctgaattagaaagatttgagactcaagctcgcgtatttcaaatttattttttcgattttagcgaaaaatcaacacattttcgatcatttttgaacaaaaaaattgttttctcaaaaatttgacgcttattttttttgaaactgcactcaaatatgaagaaaattttgattaaccggaaaaagtaacatttgagacccaagctcgcgtatttcatatttattttttcgattttagcgaaaaatcaacacattttcgatcattttcgaacaaaaaattcattttctcaaaaattcaacgcttattttttttgaaactgcactcaaatatgaagaaaaacatgcttgttctgaattagtaagaattgcgactcaagctcgcgtatctcatatttatttgttcgatttcagtggaaaatcaacacatttttgggaaaaattttttttcgaaa',
'ttctaatttttaatgattttcagatgacacgccggttcacaccacattctactgaggagatcgcttcaaataaccgaaaccatcccgacattggctcca',
'ctctatattattcattcattttatcaacaattcgcacttctatcgccctaacgtcgatcaaaaaagctcatcagcaactgccgtcgagtgaaatgcgatagaatttgtctgtgaataaccaaatatcgattttccttgtatatcgtgaagaacaaatcttcatatttacgattcttcacaaattcgatgaaaatctgatagttttttcaatttcagctctataattccaaaaaaaatcttttgtcaatttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacaaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtctatgtaccccccgccgggccgtg',
'gctgctgcagtgttttggcgactacggtacgctgttacgcaaaccgcgcaatgacaacatt',
'attgaacagggcatgaaaggattcaatgccctgctccgatgatcttccc',
'cacggcccggcgaaagagacgtggccgcgagagctgcgccggctaggccaccgcctcctatggttaagatttttgaacggataaaaatttttaatttggctgctaagctcatttatcttcgttttttctcgttttttctcatttttatcgataaaaatatattttttgttgcagaaaatcaaaaaaccacgacaaaacagcactcaaccgccaactgggaggaggaaaatccgaaaaaagtgagtttttt',
'tgcgaaaaactgtttaaagtatcgattttcttagtaaatatcagcatcataaaattatttaaaattattattttctgtaaattcgataaaaatccatttatttttcacaatttctgcccgaaaattaataaccagcgtttctataactaagaaggtgtcgtcaattaaaatgccgcgtccgcaaaatgtcgtacgaaacttttcgctgagtatcaaacgttgaatattcagtcagccaaattttactacggtagagattttacagccacgtacggttcgccgggccgtg',
'gttcgttcagctcaattttgtctactttaattaagttggagtagtttgttctaacttcaaaaacgtgattttcagacgtgcagctcgtcgccacaggctaatcaggaaatcagttcagaagatcgaaactcccaccatcccgacattgcttccagtagctccgcaactacatgcaggtcgtcaaattgtttcaatatggatgaaggaattctgacgagagacgtggaaatggattttagtgacgaaattccgactttttctcacaatgtaattttttttcaatatttcaaaactaatattatatgattttgcagagattctccaatcccgtgtttccaccgacaccttcacgaaactcgttcagtacaccgagggttcgaataatctccaatgttgagcaaattccgcagatcaaaatgctccgactaacgatcgaacgactgaaaaatgaaaaaatgaaatatcagaagaaatacaaaagcggaatggtgagtttgcggtgttttcggaagaaatactaatagtccgccaagaaaatatcactgtaagagcatcgaattcttcgctgcaaatggaagtatgcaatatgcgaacgatcgtcagtggtatagaaggtcgattggtgagaagctgggcagatgtgaagaaaagtggcaaagctatggttctgagtctacaaagacacgccgaagaaattgaaaacctcaaaaaatcagcttttaaagttaaattttcgaatctaagaaacaaaagttgtatcaaaaatcgatttcgcaaagctcttcacttgcttcaacacgaaatctgcctcgaggatgacgtcggacaatttattcgaaaattcacaaaattcctgaattcagaagagaatcaaagctacaaaaacaagctttcaaactttgaggccgttgactttttatcaaaatgcggattatctctaagtcagatggaaaaaatcaaaaactatttgactaattccattggctatgatttgcttccattggtaaagaacacaagagatttgtcaatccagttatcaatgatctccagtttcaaagtttccacttcgttcgacaataaaggaaaagtgattaccattgtgcagtgcatgaaaattgcagaagttttagcctatcgaatagagcttctatgcaattcaaatcaatttgtggacgatggctacacgaaaggagtaatcaagattggagtcgttggtgacgctggaggtggaagcacaaagttggcattggtaatcggaatgtatctcgaccgaattcttcgagacacgttatggtaattgcagtatatgacggatccgacaattactcgtgtctgaaaaaattcattcccgacgttcttgagcagcttggcaagctgacaaaaattcgatatttggacaaaggagttcaaaaaaccgcaaaaattgtgcaaattgcaactggagattgcaagtttcaatttgatattcttggacaccaaggacattcatcccacaatttctgcttcaagtgttttgcccagaatccacggggagctgaagagaggatgaaaatcaaggatatgaatgttgatgaaacattccacccacgaacaatcgatctttacaaagcttgttgtactccacttctgccacacgttgcgatcatgttctatttgattcctcttctgcacattatcatgggtatttttgacaaatatatcttcaatcccctctggaaatattctgtcactttggataacactacttgtttccctatcctgaaaacgcgaaaagcaactctaaaaaacgctgttaacctaatcaaaagtgcggaagaaaagtatcaagcagccaccggaaagatgaaattggagtcgcatgctgaactgaaagcacttcaatcggaaaaattgttgctcgacacaattgtgaatggtactcccggaggaacacttgaaaaaatggaacagtgttgggctaagtttggagcggacaagcaggcttggttccaatcatttt
gcggaaaccatttgaaattgctgctaacgccggctattgtagaggagactttcaacatttttggcccaaatttatgcccaatgttgctcggattgaaatcagcaatgggaaagctttcaactattatgtcactttcgggaaacaagtttttaaatgattccgacgtcttaatacttcaaaattcgattcgcggatttgtggaggatttgaaagtagcggttcccgaagaaacaattattttaaagctgcacttggtggtctaccatgcaccacaaatggcaaaagatgtgagaaacattggaaggatcacagaaccaggagttgaatctgtccacgcgattttcaatgcacttgagaggagattttgcaattaccgggacaagaaaaggcgttacatccatgttctgagagagctcatgtgtcgtaacatcatgaatgacatgactatggtaatttttgtacagaaaaaaatttctgcattcttaaaaacacaactatttcagaacacgtccattcccggtctttccatcgtaccaaaagatgcaacagttctgccaatttcttcgcaaaatgatcttcgggacccaaaagtagttattcgggacgtcacggctgcccaaaaaagaaaaaatttagcgaaaaaaaaaatcaattttagctccaagaaaaggcttccacaatcgacaaaatctcgaaatgctgtggtgagatttttatttttcacgaaaagtttcaaattttgaacattttcagcccgacttctcacaaaacgtacaaacttctattgcatttggaggcaattcaacgaaaattggattccgtggtctaccgagcaatgctctttctcgtccgaacccgatggtaggttcactgctttaaatcaactcaaaattgaaatcaaaaattatttcaggcaatcacacaaacttgctcgacaagccagccaatcactagaagccatccacttgttgctccaccagtatcttatcaacttccgaaatgcagtaatagtttagctgaagtgagtgtaaaggtgaaatgacaatttcaataagttaaaaattttagaaaattcgagcaagttgtgggcaagcaagaaggagtgctgcagaatcttcttcgactgaaagccacgtggacacaaattcgaagaagaaagtttacagactaaaattcgtccaaaaaagtcgtgaaattccgaagaaatagtaatatatcgattttctgttttttttattatgatttctaatatttttattttaatgcttaatttaaaatgaccgaaaataaagaattgattctcaaaaagtaatctcagagcaatatatgttgtttctttatataatttatccattttatcaacaaatcgcatttttatcgctcaaaaatcgataaaaacagctcatcagaaactttcatcgagtaaaatgagacatcattcgtctgtgaaaagctaaatatcgattttccttgaatatagtgaagaaaagatcttcatacttacgatttttcacaaattcgatgaaaatttgatagttttttttcaatttcagctctataattccgaaaaaaatcttctgataattttcgcgaaatgtcgtcaattaaaatgcgcgctgtgcagtacgcaacggtgtacgaaactttaaactgagtattcaacgttgagcacgcagtcagccaaattggaatacggtagagaattctcgtccatgtaccccccgccgggccgtg',
'ttgttcggtggatcgcgtttgcacccatctagcaactgaaccacagtg',
'GAAGCCAATGTCGGGATGGTTTCGGTTATTTGAAGCGATTTCCTTAGTAGAATGTGGTGTGAACCGGCGtgccatctggaaatc',
'tgccatctggaaatccttaaaaatttggtgcgaatatttcgaaaaaaagttttccaaaaatgtgttgattttccactaaaatcgaaaaaataaatatgaaatacgcgagcttgagtctcaattcttactaattcagaacaagcatttttttctccatattcgagtgcagtttcaaaaaaattaaacgttgaatttttgagaaaataatttttttgttcgaaaatgtgttgattttccactaacataaaatcgaaaaagt',
'cacggtggttcagttgctagatgggtgcaaacgcgctccaccgaataa',
'gtcttaatacttcaaaattcgattcgcggatttgtggaggatttgaaagtagctgttcccgaagaaacaattattttaaagctgcacttgttgatctaccatgcaccacaaatgacaaaagatctgagaaacattggaaagatcacagaacaaggagttgaatctgtccacgcgattttcaatgcacttgagaggagattttgcaattaccgggacaagaaaagacgttacatccatgttctgagagagctcatgtgtcgtaacatcatgaatggtaattttggtacagaaaaacatttctgcattcttaaaaacacaactatttcagaacacgtccattcccggtctttccatcgtaccaaaatatgcaacagttctgccaatttcttcgcaaaatgatcttcgggacccaaaagcagttatacgggacgtcacggctgcccaaaaaagaaaaaatttagcgaaaaaaaaaaatcaattttcgctccaagaaaaggcttccacaatcgacaaaatctcgaaatgctgtggtgagatttttatttttcacgaaaagttttaaattttgaaaattttcagcccgacttctcacaaaacgtacaaacttctatttcatttggaggcaattcgacgaaaattggattccgtggtctaccgagcaatgctctttctcgtccgaacccgatggtacgttcactgctttaaatccactcaaaattgaaatcaaaaattatttcagccaatcacacaaacttgctcaacaagccagcaaatcactagaagccatccacttgttgctccaccagtatcttatcaacttccgaaatgcagtaatagtttagctgaagtgagtgtaaacgcgaaatgacaatttcaataagttaaaaattttagaaaattcgagcgagttgtgggcaagcaagaaggagtgctgcagaatcttcttcgactcaaagccacgtggacacaaattcgaagaagaaagtttacagactaaaattcgttcaaaaaagtcgtgaaattccgaagaaatagtaatatatcgattttctgtttaaatatttatgatttctaatatttttattttaatgcttcatttaaaatgaccgaaaaataaagaatagattttcaaaaagtaatctcagagcaatatatgttgtttctttatataatttatccattttatcaacaaatcgcatttttatccctcaaaaatcgataaaaacagctca',
'ttgttcggtggagcgcgtttgcacccatttagcaactgaaccaccgtg',
'actacggtgtgcaagtacgcaaacaccgcggcggcaatttgc',
'ttcttcaaaaaaacttcttcgaaattcaaattttgcaccaaaaa',
'ttgttcggtggagcgcgtttgcacctatttaacaactgaaccaccgtg',
'tattaattgctaaaatttatgtggactacggtagtcaagtccgcaaacaccacg',
'cgtggtgtttgcggacttgactaccgtagtccacataaattttagcaattaata',
'aaattatacacgtttgttcggtggagcgagtttgcttccatctagcaactgaaccaccgtg',
'gtgtgcaacttgccgccgcggtgtttgcgtacttgcacaccgtagt']
In [17]:
from operator import attrgetter
' '.join(map(attrgetter('rep_cl'), reps[:60]))
Out[17]:
'Simple_repeat Simple_repeat Satellite Simple_repeat DNA/MULE-MuDR DNA DNA DNA Simple_repeat Unknown Unknown DNA/PiggyBac? DNA DNA DNA DNA DNA DNA DNA DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA DNA Simple_repeat DNA/hAT DNA/hAT DNA/MULE-MuDR DNA/hAT DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/hAT DNA/MULE-MuDR Simple_repeat DNA/hAT Unknown Unknown Unknown DNA DNA DNA/CMC-Chapaev DNA/CMC-Chapaev DNA/CMC-Chapaev RC/Helitron Simple_repeat DNA DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Tc1? DNA/TcMar-Pogo Simple_repeat DNA Simple_repeat DNA DNA DNA/TcMar-Tc1 DNA/TcMar-Tc1 DNA/hAT Simple_repeat'
You'll notice a few things: (1) the family names seem to have some hierarchical relationships, e.g. DNA/TcMar-Tc1 seems to be more specific than DNA; (2) some of them end in a question mark; and (3) some of them are Unknown. I don't really know what these mean or what to do as a result, so you'll have to navigate that issue yourself. You can often look up the family names on RepeatMasker's site and find more detailed info (e.g., the details for DNA/TcMar-Tc1).
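One way to start navigating these labels is to split each one into its parts and tally them. The sketch below is only that: it assumes the text before the slash is a broader class, the text after it is a more specific family, and a trailing question mark marks an uncertain assignment. The helper name split_rep_class is hypothetical, and the labels are a handful copied from the output above rather than the full reps list.

```python
from collections import Counter

def split_rep_class(name):
    """Split a label like 'DNA/TcMar-Tc1?' into (class, family, uncertain).

    Assumption: '/' separates a broad class from a specific family, and a
    trailing '?' flags an uncertain assignment.
    """
    uncertain = name.endswith('?')
    if uncertain:
        name = name[:-1]  # drop the trailing question mark
    rep_class, _, family = name.partition('/')
    return rep_class, family or None, uncertain

# A few labels copied from the output above
labels = ['Simple_repeat', 'DNA/MULE-MuDR', 'DNA', 'Unknown',
          'DNA/PiggyBac?', 'DNA/TcMar-Tc1?', 'DNA/TcMar-Tc1', 'RC/Helitron']

parsed = [split_rep_class(label) for label in labels]

# Tally by broad class, ignoring family and uncertainty
by_class = Counter(rep_class for rep_class, _, _ in parsed)

# Labels whose assignment is flagged as uncertain
uncertain_labels = [label for label, p in zip(labels, parsed) if p[2]]
```

To run this on the real data, you could replace labels with map(attrgetter('rep_cl'), reps). Grouping by the broad class first makes it easy to decide later whether to treat uncertain or Unknown entries separately.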
Dfam is an alternative. Note that Dfam ultimately relies on Repbase for its "seed alignments." Also, only the human genome has a pre-built Dfam database, as far as I can tell.
Content source: BenLangmead/comp-genomics-class