Exercise 2:

Modules, I/O, Files

1

Guessing game: Generate a random number and accept input for user to guess and output if their guess is correct or not.


In [ ]:
import random #module random
n = 20
to_be_guessed = int(n * random.random()) + 1
guess = 0
while guess != to_be_guessed:
    guess = int(input("New number: "))
    if guess > 0:
        if guess > to_be_guessed:
            print("Number too large")
        elif guess < to_be_guessed:
            print("Number too small")
    else:
        print("Sorry that you're giving up!")
        break
else:
    print("Congratulation. You made it!")

Might add one exercise to work with Math Module

3

Write a Function to read input from file (Sample1.fasta) line by line and store it to list


In [1]:
def file_read(fname):
        with open (fname, "r") as myfile:
                data=myfile.readlines()
                print(data)
file_read('Sample1.fasta')


['>YP_008320337.1 terminase small subunit [Paenibacillus phage phiIBB_Pl23]\n', 'MKGGEPEMAVPTSKLIREYLGETYEESDEQLIQLYIETHQFYRRLQKEIKNSELMYEYTNKAGATNLVKNPLSIELTKTV\n', 'QTLNNLLKSLGLTPAQRKKVVSEDDDDFDDF\n', '>YP_008320338.1 terminase large subunit [Paenibacillus phage phiIBB_Pl23]\n', 'MTMTSTTSNLPGILSQPSSELLTNWYAEQVVQGHILASHKVMLAGKRHLDDLKRQGSKDFPYVFDEEKGHRPIVFIERFC\n', 'KPSKGKFKQMIMQPWQHFILGNLYGWVHKETGLRRFTEGLIFIARKNGKSGLASGISIYGCTKDGERGADVYVLANSMKQ\n', 'VRKTIFDECKKMIKASPQLKKKMKALRDVIEYKQTNSIIEPQASDSEKLDGLNTHLAVFDEIHEYKNYDLINIIKNSTDT\n', 'REQPLLLYITTAGYQLDGPLVDYYELGADVLEGVVSDERTFYYMAELDSEEEIDNPDMWGKANPNLGVTYDLEKLKNAWE\n', 'KRKNIPAERSDMIVKRFNIFVKADEMSFIDFNTLRKNNKHLDIDSLNGKTAIGSFDLSESEDFTSACLEFPLDTGEIFVL\n', 'SHSWIPRKKVLANNEKIPYMQFVEDGSLTVCEAEYVEYEMIYDWFVNHSKTFSIEKIAYDRAKAFRLVKALESYGFQTEI\n', 'VRQGAETLTKPLSDLKEMFYDGKVITNENKLLRWYINNVKLTQDRNRNWHPTKQNRYRKIDGFAALLNAHVFVMEKLVAP\n', 'KGNGNIEFLSVGDLFH\n', '>YP_008320339.1 portal protein [Paenibacillus phage phiIBB_Pl23]\n', 'MKWFGKMKSAVRGAISGWKGGSGDFSTWFGRRFWGIDNTKLATNETIFSVVSRLANALSCLPLKLYKDYDIQMNETADML\n', 'IHHPNPNMSGFEWLNKMEVSRNETGNGYAVIMRDIRLQPEALIPIDPVYVTPILNQDDGHLWYEVRGIDGTYYLHNMNMF\n', 'HVKHITGAARWKGISPIEVLKNTLEYDKAVQEFSLSEMQKKDSFILEYGASVDTEKRQRIVDDFKRFYKENGGILFQEPG\n', 'VTVTNMERKYVASDTLASEKITRSRVANVFNLPVNFLNEEGQGSHAEQMMIQFVQMTLTPTVRQYEQEMNRKLLTSEERQ\n', 'AGYYFKFNLGALLRGDTAARTQFYQMMLRSAGMKPDEVRMYEDLPPEGGKAAELWISGDMYPLNMDPAERKGVKERGETK\n', 'KEHVLGDEDVG\n', '>YP_008320340.1 Clp protease-like protein [Paenibacillus phage phiIBB_Pl23]\n', 'MSADDSSSADIFIYGDIVTYQWDEVDTSATSFKEDLDRLGDVSNLNLYINSPGGSVFEGIAIHNMLKRHKAKVNVYVDAL\n', 'AASIASVIAMAGDTIYMPKNSMLMIHNPWTYAWGNASEMRKIADDLDRIGNSSKQVYLQKAGDKLSDEKLQEMLDAETWL\n', 'SADEAFEYGLCDVVQEANTMAASISDACMNRYKNVPKQLISQQQTPISAGDMAKRQQIADESKAHAAYIQTILGGIFE\n', '>YP_008320341.1 major capsid protein [Paenibacillus phage phiIBB_Pl23]\n', 'MKTLYELKQNLATIGQQLQKTESDLAAKAIDPSTTMEAIQALQKSKEDLKMRFDVVKQQHDALEAEQAAKLKADKGIQNT\n', 'ADPVQKKIQAKAELIRATMQKQAVTQDVFQALGDNDTTGGNKFLPKTVSTDILVEPTVKNPLRQLSSVTQITNLEIPKLH\n', 'FTLDDDDFIADTETAKEMKADGDTVTFGRNKFKVLAGVSETVINGSDANLVSYVETALQSGVAAKEKKVAFATKPKTGEE\n', 'HMSFYKSGIKEIVAENMFDAITDAIADLHEDYRENATIVMRYQDYKNIIKILANGSATLYTAQPEQVLGKPVVFCDSAES\n', 'PVIGDFAYSHFNYDLNALYDREKDVKTGIEQFVVTAWFDHQIKLKSAFRIAKVQTP\n', '>YP_008320342.1 hypothetical protein IBBPl23_06 [Paenibacillus phage phiIBB_Pl23]\n', 'MPKYKLPNPSPGEPEQPEKPEKPEQPEQPEQPKKESEEAKPAGTPKSKRSKADDVNG\n', '>YP_008320343.1 head-tail connector protein [Paenibacillus phage phiIBB_Pl23]\n', 'MANISLEEVKEYLRVDDDAGDQTLAILLESAKEYLANAGVTESNHALYKLAVMVWVAIHYEMDDRTLHKLKQSLQTMILQ\n', 'LREVSAT\n', '>YP_008320344.1 head-tail joining protein [Paenibacillus phage phiIBB_Pl23]\n', 'MNPGKLNKRITIKKPSPNPDGAGGYDDGLADVATIWANIRPLRGREYWQSQQTQAEVTHSIMIRYRKDIDRSHVVSYSGR\n', 'LFDIQHIINVDEANRTLILHCVEKI\n', '>YP_008320345.1 hypothetical protein IBBPl23_09 [Paenibacillus phage phiIBB_Pl23]\n', 'MANIQVLGVPETVRKIGLFEMERKQAAIVLVKKTATSIQKEGKSLAPSSPAGRKKSKGKPGDLKRSIRPKYMEGGLSATV\n', 'VPRKPKGAHRHLVEYGTRQRKNKKGANRGKMPKKPFMSIAEKHAEGRYNKELERIFSRDETI\n', '>YP_008320346.1 hypothetical protein IBBPl23_10 [Paenibacillus phage phiIBB_Pl23]\n', 'MTKLYEVQEAVYRRLTSDTALMPMIKGVYDYVPEKTLLPYVTFSRVYSEPFETKTSTGEIVTLTLDVFSEAKGKKESIHI\n', 'LKQIEASLTPELEVEGAFLMDQSVVSREVQEIAESLYQATIEYKIKLDWSE\n', '>YP_008320347.1 major tail protein [Paenibacillus phage phiIBB_Pl23]\n', 'MATKLAGMKCKLFVGSAKEKGKILAGQRSATISRSAETIDATSKDTEGYWKESLQGFKEWSIDADGVFVESDQAYKELED\n', 'AWLNSENVKIYIELPSGRRYAGEATITDASLEMPYDDLVTYSLSFQGSGALQMIETIPGKGETKE\n', '>YP_008320348.1 hypothetical protein IBBPl23_12 [Paenibacillus phage phiIBB_Pl23]\n', 'MKKVTEFHLEDAVYKIRITYSTLLKMRDDGIDMMTEKGSAEIKKDPGKLAKIFWFGLNGVKGQEYTFEQAMDILDDILSE\n', 'IYMEDFMEILQDSVQIKSRQAEEQIAKKKKRNNDR\n']

3

Write a script to do the following to "Python1.txt"

  • Open and read the contents.
  • Uppercase each line
  • Print each line to the STDOUT

In [16]:
with open('Python1.txt', 'r') as song:
  for line in song:
    print(line.upper(), end='')


RUNNIN' DOWN A DREAM
BY TOM PETTY
IT WAS A BEAUTIFUL DAY, THE SUN BEAT DOWN
I HAD THE RADIO ON, I WAS DRIVIN'
TREES FLEW BY, ME AND DEL WERE SINGIN' LITTLE RUNAWAY
I WAS FLYIN'
YEAH RUNNIN' DOWN A DREAM
THAT NEVER WOULD COME TO ME
WORKIN' ON A MYSTERY, GOIN' WHEREVER IT LEADS
RUNNIN' DOWN A DREAM
I FELT SO GOOD LIKE ANYTHING WAS POSSIBLE
I HIT CRUISE CONTROL AND RUBBED MY EYES
THE LAST THREE DAYS THE RAIN WAS UNSTOPPABLE
IT WAS ALWAYS COLD, NO SUNSHINE
YEAH RUNNIN' DOWN A DREAM
THAT NEVER WOULD COME TO ME
WORKIN' ON A MYSTERY, GOIN' WHEREVER IT LEADS
RUNNIN' DOWN A DREAM
I ROLLED ON AS THE SKY GREW DARK
I PUT THE PEDAL DOWN TO MAKE SOME TIME
THERE'S SOMETHING GOOD WAITIN' DOWN THIS ROAD
I'M PICKIN' UP WHATEVER'S MINE
YEAH RUNNIN' DOWN A DREAM
THAT NEVER WOULD COME TO ME
WORKIN' ON A MYSTERY, GOIN' WHEREVER IT LEADS
RUNNIN' DOWN A DREAM

4

Modifiy the script in the previous problem to write the contents to a new file called "Python1_uc.txt"


In [17]:
song_uppercase=open('Python1_uc.txt', 'w')
with open('Python1.txt', 'r') as song:
  for line in song:
    print(line.upper(), end='', file=song_uppercase)
song_uppercase.close()

5

Write a list of content to file and print content of file to STDOUT
Input: top-10 most-studied genes are: TP53; TNF; EGFR; VEGFA; APOE; IL6; TGFBI; MTHFR; ESR1; AKT1.


In [18]:
color = ['TP53','TNF','EGFR','VEGFA']
with open('genes.txt', "w") as myfile:
        for c in color:
                myfile.write("%s\n" % c)

content = open('genes.txt')
print(content.read())


TP53
TNF
EGFR
VEGFA

6

Given Dictionary with restriction enzyme and motifs:

Accept enzyme name from the command line and print the value(motif) of the enzyme from the dictionary. Maybe you want to print out all the keys to the user so that they know what to pick from.
Hint: use input()


In [5]:
enzymes = { 'EcoRI':'GAATTC','AvaII':'GGACC', 'BisI':'GCATGCGC' , 'SacII': r'CCGCGG','BamHI': 'GGATCC'}
keyOptions = []

for key in enzymes.keys():
    keyOptions.append(key)

print(keyOptions)

chosen = input("From the list of enzymes above, Type enzyme name to modify the motif:\n")

if chosen in keyOptions:
    enzyme = input("What is your favorite thing in the category you chose?\n")
    enzymes[chosen] = enzyme

print(enzymes)

print('\nDone.\n')


['EcoRI', 'AvaII', 'BisI', 'SacII', 'BamHI']
From the list of enzymes above, Type enzyme name to modify the motif:

{'EcoRI': 'GAATTC', 'AvaII': 'GGACC', 'BisI': 'GCATGCGC', 'SacII': 'CCGCGG', 'BamHI': 'GGATCC'}

Done.

6

Open and print the reverse complement of each sequence in "Python1_Seqeunces.txt". Each line is the following format: seqName\tsequence\n. Make sure to print the output in fasta format including the sequence name and a note in the description that this is the reverse complement. Print to STDOUT and capture the output into a file with a command line redirect '>'.

Remember is is always a good idea to start with a test set for which you know the correct output.


In [10]:
with open('Python1_Seq.txt', 'r') as sf:
  for line in sf:
    identifier, sequence = line.rstrip().split("\t")
    sequence_RC = sequence.replace('A', 't')
    sequence_RC = sequence.replace('T', 'a')
    sequence_RC = sequence.replace('C', 'g')
    sequence_RC = sequence.replace('G', 'c')
   # sequence_RC = sequence_RC.upper()[::-1]
    print(">{}_reverse_complement\n{}".format(identifier,sequence))


>c0_g1_i1_reverse_complement
GAACTCCAAAAATGAAAACATAGTAGCAATCAAAGCATCCCACTATTTTTTGTCTCTCGTTTCATTAGCGTTGTAAATTACTGATACCCTACTATACCTCTACAAGGCCTTTGTCATCTTTTTACTCAAGTGTGAAATCATCACTTATTGTATGAAGGATGAGCTTTCCGTTCGCTAGTTTGCTGAAAAGGCCTTCTGCAATAAGCTCTCTATTATCTTTAAAAAAACCTGGTTCCTGGTCTTCCATTCTGCTAAAAGCTGTAGGGGTTTTATCACGAGATTCCCGTTGGCATTCTGACTTATTAAAAATGCTTACAGAAGAAATGGATTCTTTAAATGGTCAAATTAATACGTGGACAGATAATAATCCTTTATTAGATGAAATTACGAAGCCATACAGAAAATCTTCAACTCGTTTTTTTCATCCGCTTCTTGTACTTCTAATGTCTAGAGCATCAGTAAATGGGGATCCACCGAGTCAGCAACTATTTCAAAGGTACAAACAACTTGCCCGTGTAACAGAATTGATTCATGCTGCCAATATAATTCATATTAATATTGGAGAAGAACAAAGCAACGAACAGATTAAACTTGCAACGTTGGTTGGAGATTATTTACTCGGAAAGGCGTCTGTTGATTTAGCACATTTAGAAAACAACGCTATTACAGAAATTATGGCTTCTGTTATTGCAAACTTAGTTGAAGGGCACTTCGGAAGCCGACAAAATGGCTCTGTTGGTTTGTCAAACGAACGAACCATCCTTCTGCAATCAGCCTTTATGCCAGCAAAGGCATGTTTATGCGCAAGCATATTGAATAACTCATCACAATACATTAATGATGCGTGTTTCAATTATGGAAAATTTCTAGGCTTATCGCTGCAACTGGCCCATAAGCCTGTATCTCCTGACGCCCAAGTTTTGCAAAAGAATAATGACATTTTGAAAACATATGTTGAGAATGCCAAGAGCTCATTGTCTGTTTTCCCCGATATAGAGGCTAAGCAAGCTCTCATGGAAATCGCTAATAGTGTTTCGAAGTAATCGACAGGTATTGTATCCTGGATTAATATTAGGGTGGCTCATGCATGCTCGTGCAATCGTAACAAATATGTCTTTCTTTTACGAATTTTAACGCTTCAATATAAATCATATTTTTCCTCA
>c1_g1_i1_reverse_complement
ACGAAACGTTGAATTGATTTTATATCAATAATATCGATCATTTTTATTCTATTATTTGTTTGTTTGCTTGGCTTTCATCTATTTCTACAGACTATCTTTCCCTAATGTCTATTGCAAAAGGAAAAAATGCATTGCTTGTTGCCAGCAGTTATTATGGCCCATTTTATCCAGATGGTAAAAACACTGGAGTCCATTTTTCAGAGCTTTTAATCCCTTACAATGTTTTCAAAAAAGCAGGTTTTAACGTGCAATTCGTTTCAGAAAATGGCTCTTACAAATTTGACGATCATTCCATTGAGGAGTCAAAATTAGGGGACTTTGAAAGAAAAGTATTTAATGATAAAAACGACGATTTTTGGACTAATCTTAACAATATGAAAAAGGCTTCGGACATAGTTGGAAAAGACTATCAGCTTTTATTTGTGGCAGGTGGGCATGCTGCGATGTTTGACTTACCCAAAGCCACGAATTTACAGGCGGTAGCAAGAGAAGTGTTTACAAATGGTGGTGTTTTATCGGCAGTCTGTCATGGCCCTGTTTTGCTTGCCAATGTAAAGAATCCACAATCAGTTGAAGGCAAAACAGTCGTGTATCATAAGCATGTTACAGCTTTTAATAAGGCTGGAGAGGAAAAAATGGGTGTTATGGATGAATTGAAAAAACGAGGGATGAAATCCTTAAATGAAATATTTGCTGAAGCAGGGGCAACTTTCATTGATCCACCAAACCCCAATGTCAACTTTACTCAGATCGATGGAAAAATTGTAACAGGGGTAAATCCGCA
>c3_g1_i1_reverse_complement
ACGAATCGTACCAAAAATTAAAATTTGCAGCAGACAGATTCATATTACCTTGCAAGGCCAAAAATTTATTAATTTTATTTTTTTTATTTATGTTTATAAGTGCTTGTTAACGAAAGAGATATTTTGTTTTTGTTTCTGGCATTATTCCATTCTATTCCTACTACAGTGATCATAGCAATTAACAATACATCATCACTGTGCTCTTGTAATTTGATTGAATCGTTTTTGACACAATCTTTCATTCTTTAAAATAAAATAAATTTATATATTTCTACATTACATTCTTATTTGTTAGGTAGAAATGGGATTGGGATTTTCTTCGAAAAAGCAATTACCTGCCTACTGTGGTCCCTTGCCTGTGGGGTCTTTAGTCTTAGAGCTGAGTGTTCCAGAGGAATTTCGATGTGAATATAAAACCATTGAACACAAACTAAGAACCGTAAAAGTACGCATTTTTTATCCATTGGACCCAACAAAAGATGTAGAGCCTCGCACCGATGAGCTTTGGTTACCATTCCATGAGGGTATACCCGAAGTTGCGAAAGGTTTTCGTTGGTGGCTTCTTCGTGCGTTCGCTTCTGGTTTGACCAACTTAGCTTTGCCCGTCTACAAAGGAGAGTTGTTTCATCCACCGAACAATGGGAAACTGCCAGTATTTATCTTCAGCCATGGATTGGTTGGCTCGAGAAATGTGTATTCTTCGTTATGTGGTACAATCGCTTCCTATGGTATCGTCGTCTTGGCCATGGAGCATAGAGATAACTCGGCCATCATATCTACAGTGCGTGATCCATTACATCCTGAAGAACCCCCGTACGTTGTTCAGTATCGCGAGATAAGCGACTTTTATGCAGACGCTACGGTTGTGCTTCAGAATGAACGACTTTTATTTCGACAGCAGGAAATCCAAATAGCCCTCCAGATGATTCGAAATATCAATGACCTTGGAACTCCGGACGAAAACTTACCCTTTCTTTGCTCTGTGGACTCTTCTTTTTATAATTCTGTTTTCCAATCCATGAAGGGTAATTTGAATACCGCTCAAGGAGAATTGATTGTTGCTGGTCATTCTTTTGGTGCCGCTACTTGCGCATTCATTTCCGGTTCTTCTACCAAGTCCTTATATAATGACTATATGTTTCACACTGAGTTTAAGTGCTCCATTTTATACGATATATGGATGCTTCCCGTACGCCAGCTCCATTTGAGTACGATGAGGTATCCTACGCTCATGATTATATCTTACGAATTTCGTCGGTTTGTCGACAATTTTCAAGCTCTCGAAAGTTGGCTTGTAAACAAGGATTCAGAAAACCAAAACGCAGGCGAATCCGCTGATGAGAAAATGTCTGTCGTTCCTTTAAAAAAATATTCCCACGTGTTTGTTTATGATGGAACAGTACATGCTAACCAAAGCGATTTACCGATTTTGCTCCCTCGCATGGTACTAAGGGTATTGAAAGGAAAATTTGAAGCAGACCCTTATGAAGCTTTACGCATTAATACTCGTTCATCTGTACAATTCCTTCGTGAAAATCATGTTGAAAATGTTCAGGGAGATAATGATCCTTCGTCGTTGCAAACAAATATTATTCCTGGGTGGGAAAGAATTATGTAGTATTTACCACTTTTACAAAGAGCTAAAAACTCTTAAAAACAAAACTCTTTTGACTTTTCTTGTCTTTTTATCTATCGCCGATGCTAATTATAATATTTATTGACTGTCATGCTAATGAGTTAACTGTTTGGCACAAATTTCTTATAGCTTATAAAGAAGATTTTCGAAAGCTTACCTGTTAATGTGGATTGAATATGGTAACTATTACTTTTTGGGAGCCTGTGCTTAGGACCGCGATGATTAAGGACGAACATTTACGCTAAG
>c4_g1_i1_reverse_complement
CACTAACACCAAACCAGCGACGTTCGGGTACAAACTTCGGAATTCCTTGCTCAAGCATCTATTATAGTGATTTTTGAGATTTTATACAGCTCAAATATTAAATTTCCCCAAATAATGATCCGTTTTGCTCAATATGCTCGATATCCGGTAATTTCAAGGTTGATGAAGCCAACAGTTATTTCACCATTTCAGGCTCAAGCGTTTTCTAGCTCTTCAGTTATGTTAAAGACATTAAATCAAACGATAAGAAATAAAGAGAAACGTCCTGAAAAAACAAATAAGCAATCTGTTGCTTTAGAGGGAAGTCCGTTTCGACGGGGTGTGTGCACGAGGGTCTTTACAGTAAAACCCAAAAAACCTAATTCGGCAGTAAGAAAGGTTGCCCGTGTACGCCTTTCAACCGGCCGTTCTGTTACTGCTTACATTCCAGGTATCGGTCATAATGCACAGGAGCATGCTGTGGTATTGTTAAGAGGCGGCAGAGCACAGGATTGTCCAGGAGTCCAATATCATGTTGTTCGGGGTGTCTATGACATTGCTGGTGTAGCGGGACGTGTCACATCCCGAAGCAAGTATGGAGTAAAAAAACCGAAAGCAGCCTAAACTTCATGACCTCAAAGACTTTCTTTTTCGAAAACCATGATATTCAGAAACGTTATCTTTATAATTATTCCACCTCCTCTTTTGATATCGATTCACAAGGACCTTATTAAAATGACTGAAACAAAAATTTAAATGAGAAAAAAGTTGAGTCAACCCATTTGTCATTTCTCGCAACCTTGCTTCTTGTTGCAACTATACATATTCAATAATCATTTACTCTTGCA
>c5_g1_i1_reverse_complement
GGCGGGCAGCAGTCAAGGTGAAACGTTTGTTAGGATTTTTTTTGTTTACACTGGAGAGAGACAACACAACTCATTATTAATTTTTTCATCTTCGTCTCTCTCGTCTCTTATTATATCATCTCATCTCACCACTCGGTTTCTAGCCCAACTTAACTCAGCTCAACTCAGTTCTACTTAGAACCGATTTGCTCCTCTTTCTTAAAAACTTACTTCAGCTAGGATTCACCTGGCCAGCTTATTCAACTTCATTCCCCGTTTCTTGTTGATTCAGGTTTCGTTTACTCCATCATGAAAAAAGATCTCGACGAAATCGACACTGACATTGTCACACTTTCCTCCTTTATTTTGCAGGAACAGCGCCGATACAATCAGAAGCATAAAAATGAAGAAGGAAAACCTTGCATTATTCAAGAGGCTAGTGGGGAATTGTCATTACTGCTAAACAGTTTACAATTCAGCTTCAAATTCATCGCCAACACCATTCGTAAAGCAGAGTTGGTAAATCTCATTGGTTTATCGGGAATCGTCAATTCCACTGGCGATGAACAAAAGAAGCTAGATAAAATTTGCAATGATATCTTCATTACTGCGATGAAGTCGAACGGATGCTGCAAACTCATCGTTAGTGAAGAAGAGGAGGATCTTATCGTAGTTGATAGCAACGGTTCCTATGCCGTCACTTGCGATCCCATCGATGGCTCTTCCAACATTGACGCCGGTGTTAGTGTAGGTACCATATTTGGTATCTATAAACTGCGACCCGGCAGTCAAGGCGATATTAGCGATGTCTTAAGGCCTGGTAAAGAAATGGTTGCTGCTGGCTATACTATGTATGGTGCTTCGGCTCATTTATTGCTTACCACAGGTCATCGTGTCAATGGCTTCACCTTGGACACTGATATTGGCGAATTCATCTTGACCCATCGAAACATGAAAATGCCTTTGCAACATAGTATTTATTCCATTAATGAGGGTTACACTGCATTTTGGGATGAAAAAATCGCTCGTTTTATTGCTCATCTCAAAGAAAGCACACCAGACAAAAAACCATACTCTGCTCGATACATTGGTTCTATGGTCGCCGATATGCACCGTACCATTCTTTATGGTGGGTTGTTTGCTTATCCATGCTCAAAAGGAAATAACGGAAAGTTGAGACTTCTTTACGAATGTTTCCCCATGGCCTTCCTTGTTGAACAAGCAGGAGGTATTGCGGTAAACGACAAAGGAGATCGCATTCTGGATTTGGTACCCAAAACATTACACGGAAAAAGCAGTATTTGGCTTGGAAGTAAACATGAAGTCGAGGAATATATTAATTTCATAAAATAGTAATTCCTAGTTCTTTGAAGTCCTCTCCGTTACTTTTTATCTTCTTATCACCAAACATGGTTATTCCATTCCAATAATGCAGAAACTGAAACTTTTGTCAATTATTCTCTGCATCATCATAAACATGGATCTC
>c6_g1_i1_reverse_complement
ACGAAGCTTTACCAATTTAATTTAATCTCCGTTTGCATTATTAAAATATGTCTAAAGGTCCAGGAGATTTTAAGAAGTCTTGGAATGGGTTTGCTGCTCAAACGCCACAAAATACTCCATCTTCAGATGTACATCTTAGTAAGGCCGCATTAGAAAAAGCTAGACAAACCTTGCAATCACAACTGGAGGACAAATCAGCACATGATGAAGTTAGCGGGTTATTGCGCAACAATCCGGCCATGTTATCAATGATTGAGGGACGCCTTAGCTCGTTGGTTGGAAAGTCGTCGGGATATATAGAGAGCCTTGCGCCAGCAGTCCAAAATCGTATTACAGCATTGAAAGGCCTTCAAAAAGATTGTGATGCAATTCAATATGAATTTCGTCAAAAAATGTTAGATCTGGAAACAAAATACGAGAAAAAATATCAACCTATTTTCTCTAGAAGAGCCGAAATTATCAAGGGTGTTTCTGAACCAGTGGACGACGAATTAGATCACGAGGAAGAAATATTTCAAAATAATCTTCCAGATCCTAAGGGAATTCCCGAATTTTGGCTGACTTGTCTTCATAATGTTTTCTTGGTTGGCGAAATGATAACACCAGAAGATGAAAATGTCTTGCGATCACTCAGCGACATTCGTTTTACCAACCTTTCGGGAGATGTTCATGGTTACAAACTAGAATTCGAATTTGACTCAAATGATTACTTTACAAACAAGATTTTAACGAAAACTTATTACTATAAAGATGATTTAAGTCCTTCGGGTGAGTTTTTGTATGATCATGCTGAGGGTGATAAGATTAATTGGATCAAGCCTGAAAAAAATTTAACGGTTAGAGTGGAGACAAAAAAACAACGTAACCGTAAAACAAATCAAACTAGATTGGTCCGTACTACGGTCCCTAATGACAGTTTTTTCAACTTTTTCTCACCTCCTCAGTTGGACGATGATGAATCTGATGATGGTTTGGATGACAAAACAGAGCTTTTAGAACTCGACTATCAATTGGGAGAAGTCTTTAAAGATCAAATCATACCCTTGGCAATTGACTGTTTCTTAGAAGAAGGTGACTTGAGCGATTTTAACCAAATGGATGAAGAAGACTCTGAAGACGCATATACTGATGAGGAGGATCTGTCTTCAGATGACGAAGAAATTTTGAGTAGCGAAATTAGTGATTGATATTTATATTGTGTTCGTTAGATTATGAGTAAGTTCAATGTGAGCAAAAATGAGAAATTGACGCTGAATGATAATTTCGCTTATAGTTTGGTAAA
>c7_g1_i1_reverse_complement
TAAGGCAAGTGTAAAATTTTATCAAGTTTTGAGAGAAAACTTCAAATCTAGTACTCAAGGTTATGTTTATACCCCCAGGGACTTAACGCGTTGGCTAATTTCTTTTAAAAACTATGCGGAGTCATATGCAGAAACCAATAACTTGAGTTTAATAAAAGTTTGGTATCACGAAGCGTGTCGTGTTTTATTAGATCGTCTCGTTTCACAAAAGGAATGTTCATGGGGCATGACCGAATTGCAAAAAGTTATTGTGACAGATTTTGGAGAGTTTGAAGTATCCGTGATTTTCGAAAAACAGATCATTTTTACTGATATATTAAAAAATGGTTTGGAAT
>c8_g1_i1_reverse_complement
CTGTCTTTATTCCATCTTTTTTAATACTTTTTCTAAGTTTTCTACTTATTTCCCTACATTTTTTTTTTAAATTTCCCCATCTTTGTTTTGTAAACATTTTAAGCTTCAGTTTTCTATATACTGCGTTTGATTTACTTTTGACACATTTATTTATCTTAAATATTTCTTGTTATATATATACGCATCAGACTGGCTAACTTCTCGTTTGTAGGCTTTACAATGCAGAGGCTAATGTGATTGTGCTATATGGAAACTCGATGGACACATCGGCACGATTATCAGGTATCGTAGTTCTTTCTGTATCTTCCCCCATTCGGGTAAAGAATATTAAATTACGGCTAAGTGGCCGTTCTTTCGTTTGTTGGGCTGATGAGTCCCGTCATGCATCGCCTGGTAACAGAATTCGACGTCAAGTTGTACAAATTCTGGATAAAAGCTGGTCTTTCCTTGCTCCCAACGAGTCAGCTAAAGTAATTGACCAAGGAAACTACGAGTATCCTTTTTATTACGAGCTCCCTCCAGATATTCCTGATTCAATTGAAGGAATTCCAGGATGTCATATTATCTATACACTCACTGCTTCTTTGGAGCGGGCAACTCAGCCTCCTACCAATTTAGAAACTGCTCTGCAGTTTCGTGTTATTAGGACTATTCCTCCAAATAGTCTTGATCTTATGCATTCAGTCAGCGTAAGTGATATTTGGCCCCTTAAAGTTAATTATGAAACAAGTATTCCATCTAAGGTGTATGCAATTGGCTCAGAAATACCTGTTAACATTACTTTGTATCCTTTATTAAAAGGCCTTGATGTAGGAAAAGTGACTTTAGTTCTTAAAGAATACTGTACGCTTTTTATTACTTCCAAGGCCTATTCTTCTACATGTAGAAAGGAGTTTAAGCGTGCATTAGTAAAGAAGACCATTCCGGGATTACCAATGGTGGATGATTATTGGCAAGACCAAATTATGGTAAAGATTCCAGATTCCCTTGGCGAATGTACTCAAGACTGTGACCTAAATTGTATCCGAGTCCATCACAAACTAAGGTTGTCCATTTCGTTATTGAACCCTGATGGCCATGTATCAGAACTACGGAATTCTTTACCTTTAAGTCTTGTTATATCCCCAGTGATGTTTGGTGCTCGACCCACTGAAGGAGTATTCACTGGTGATCACAATTCCTATGTGAATGAAAATATTCTTCCAAGTTACGACAAGCATGTATTTGATGTATTGTGGGATGGCATACCCTCGGAAAATCCACAGTTGCAAAGCGGGTTTACTACTCCTAACTTAAGTCGTAGGAATTCTTCCGATTTCGGTCCAAACAGTCCAGTGAATATTCATTCTAATCCAGTTCCTATTTCTGGACAACAACCGTCTTCTCCCGCTTCAAATTCCAATGCCAATTTTTTCTTTGGAAGCTCTCCACAGTCTATGTCTAGTGAGCAGACAGATATGATGTCGCCAATAACTTCTCCATTAGCACCTTTTTCAGGAGTTACGCGGAGAGCCGCTAGGACGAGAGCTAACAGTGCCTCTTCTGTTTTCAATTCTCAGTTGCAACCATTACAAACGGATTTGCTTTCACCACTTCCTTCTCCAACCAGCAGTAATTCGAGGTTACCTCGTGTACGTAGCGCATGTACCCTTAATGTACAGGAATTGAGCAAAATTCCTCCGTACTACGAAGCTCACAGTGCGTTTACTAATGTCTTGCCGCTTGATGGGCTTCCCCGTTATGAAGAAGCTACAAGACCTTCAAGTCCTACTGAGTCTGTCGAGATTCCTAGTAATACGACGACTATAGCGCCATCTCCTGTACCCACTATCATCGCACCAGCGCTTCCTTCAACTCCTGCGCCTCCGCTTCCTTCCCATCCTATGGCAACTAGGAAAAGTCTTTCTTCTACAAATCTCGTTCGAAGGGGAGTTCGATGAGCGTCTGTTAAAGTGGTTTATTTTCATTCCCTATGATCTAAGATCATTAATTTTAATGATTTTAAAAGTAAATTGGGTTACAAGCATGTTTGTTTTAAGCATTGATCTTTCTGTTTGTGAAAATGCGTGACGCCTTTTTTTCCAAAATATGGATTGTCTGAATTGAATCCAAAAACACTCAAAGTTCGCATTCTATGTTTTCAATTGCATATTATTTGCTAACATTGAAATTATCTTACCGAAATATATAATCTGAAAAAGCATGAAAAATTATGATTACAGCATGTCAAGCATACAAATTATTAGCGAA
>c8_g1_i2_reverse_complement
CCCTATTCAGTCCCGCAACCGTTTTGGTTGAAAATTGCTACATTCCTTCCGATTTTTTGGATTCAAATCTGTTCGTAGATATTCAGAGATACAAAATCTGGTACAGCACCAGAACTTATTATGAATGAATAAGATTAAAGGTGGCCGGCACAAAAGCCCCGTAAAGCTCTTTGAAGTTCGGCTTTACAATGCAGAGGCTAATGTGATTGTGCTATATGGAAACTCGATGGACACATCGGCACGATTATCAGGTATCGTAGTTCTTTCTGTATCTTCCCCCATTCGGGTAAAGAATATTAAATTACGGCTAAGTGGCCGTTCTTTCGTTTGTTGGGCTGATGAGTCCCGTCATGCATCGCCTGGTAACAGAATTCGACGTCAAGTTGTACAAATTCTGGATAAAAGCTGGTCTTTCCTTGCTCCCAACGAGTCAGCTAAAGTAATTGACCAAGGAAACTACGAGTATCCTTTTTATTACGAGCTCCCTCCAGATATTCCTGATTCAATTGAAGGAATTCCAGGATGTCATATTATCTATACACTCACTGCTTCTTTGGAGCGGGCAACTCAGCCTCCTACCAATTTAGAAACTGCTCTGCAGTTTCGTGTTATTAGGACTATTCCTCCAAATAGTCTTGATCTTATGCATTCAGTCAGCGTAAGTGATATTTGGCCCCTTAAAGTTAATTATGAAACAAGTATTCCATCTAAGGTGTATGCAATTGGCTCAGAAATACCTGTTAACATTACTTTGTATCCTTTATTAAAAGGCCTTGATGTAGGAAAAGTGACTTTAGTTCTTAAAGAATACTGTACGCTTTTTATTACTTCCAAGGCCTATTCTTCTACATGTAGAAAGGAGTTTAAGCGTGCATTAGTAAAGAAGACCATTCCGGGATTACCAATGGTGGATGATTATTGGCAAGACCAAATTATGGTAAAGATTCCAGATTCCCTTGGCGAATGTACTCAAGACTGTGACCTAAATTGTATCCGAGTCCATCACAAACTAAGGTTGTCCATTTCGTTATTGAACCCTGATGGCCATGTATCAGAACTACGGAATTCTTTACCTTTAAGTCTTGTTATATCCCCAGTGATGTTTGGTGCTCGACCCACTGAAGGAGTATTCACTGGTGATCACAATTCCTATGTGAATGAAAATATTCTTCCAAGTTACGACAAGCATGTATTTGATGTATTGTGGGATGGCATACCCTCGGAAAATCCACAGTTGCAAAGCGGGTTTACTACTCCTAACTTAAGTCGTAGGAATTCTTCCGATTTCGGTCCAAACAGTCCAGTGAATATTCATTCTAATCCAGTTCCTATTTCTGGACAACAACCGTCTTCTCCCGCTTCAAATTCCAATGCCAATTTTTTCTTTGGAAGCTCTCCACAGTCTATGTCTAGTGAGCAGACAGATATGATGTCGCCAATAACTTCTCCATTAGCACCTTTTTCAGGAGTTACGCGGAGAGCCGCTAGGACGAGAGCTAACAGTGCCTCTTCTGTTTTCAATTCTCAGTTGCAACCATTACAAACGGATTTGCTTTCACCACTTCCTTCTCCAACCAGCAGTAATTCGAGGTTACCTCGTGTACGTAGCGCATGTACCCTTAATGTACAGGAATTGAGCAAAATTCCTCCGTACTACGAAGCTCACAGTGCGTTTACTAATGTCTTGCCGCTTGATGGGCTTCCCCGTTATGAAGAAGCTACAAGACCTTCAAGTCCTACTGAGTCTGTCGAGATTCCTAGTAATACGACGACTATAGCGCCATCTCCTGTACCCACTATCATCGCACCAGCGCTTCCTTCAACTCCTGCGCCTCCGCTTCCTTCCCATCCTATGGCAACTAGGAAAAGTCTTTCTTCTACAAATCTCGTTCGAAGGGGAGTTCGATGAGCGTCTGTTAAAGTGGTTTATTTTCATTCCCTATGATCTAAGATCATTAATTTTAATGATTTTAAAAGTAAATTGGGTTACAAGCATGTTTGTTTTAAGCATTGATCTTTCTGTTTGTGAAAATGCGTGACGCCTTTTTTTCCAAAATATGGATTGTCTGAATTGAATCCAAAAACACTCAAAGTTCGCATTCTATGTTTTCAATTGCATATTATTTGCTAACATTGAAATTATCTTACCGAAATATATAATCTGAAAAAGCATGAAAAATTATGATTACAGCATGTCAAGCATACAAATTATTAGCGAA
>c9_g1_i1_reverse_complement
CACATAAGGTACTTATATTCGTAACGCATTCATCCAAACGTGTAGCTCTTTTAGATTCTATTAGTTTCTTTTTTTTAGAAGATAGTTGTGAATGATTTTTTTACGCGTTTTCACTAAGAATTGTTTTGCTGTACAACCTGACTTGAAAGCTTAAAGATCTTGAAGACATGGATAGTATAGGGGATAATGTAACGGAAATGGATGCTTTTGTTCGAGATCTTACTATGAAACCGTATGCAGAACTGGAGAAACGAAAAGCCGAATTGCATGCTCAAAAATTAAAATTGGTGCAAAAACGCCAGCAACTGTTACGAGACAATTACAATGTATTAGTCGATTATGCTAGGAATCAAGATGCCTTTTATCAACTTCTCGAGAACAGTAGACACGATTTTAAAGAGCTTGTTTTACATACAAATCAGCTATATGAACCTGTCAAGAGAAGCCAAAACTTTCTAACTTCAATTTCAGAGCATTACAGAGATGCAAAACTAATGCATCAAGTTCAGCCGCAATTAAGTAGTATACTGGAATTACCAGAGTTGATGAACGCTTGTATCGAAAGGAATTACTTTTCCGAAACGTTGGAATTTCAAGCACTTGCGTATCGATTAAAAGATCGATTTGGTACAAACTCTATCATTCAAGAACTAATTACTCAAGTTGAAACGCTGGTGGTTAAACTTACTGAGAAACTTATTTTACAGCTACAGAAGCCATTAAAGCTTTACTCATTAATCAAAGTAGTGACCTATCTCCGGGTAACTGCAAAACTGTCCGAAGCCCAATTGAAATACGTTTTTCTTTATTTTTCATGGAAACAACTACAAACAAGCCTTCGAAATTTAGTGCCATTGCTTGACTACAATAATCCTGAGTTGTACTTACGGAGATATATACAGGTCATTCGTGATCGCGCATTTTCTCTCCTTTTTCAATATCAAAGTGTATTCGGTGAGTCATCAAATGATCGTTTAAACGCTGCAGGTACAGTAGATATACCTAATTCCACCAGCACTTCAGCGTCACCTTTTGAAATGGATCCCGAAGGCTTCAATAACTTTGGTCAAAATATTCTTTCGTCCTTCGTTCGTAAACTACAACTTGAAATTTGCTATGTCTTACAAAAATTTATGCCCAATGTTAAAGATTCTACTTCCAAATTCTCACTACTCCTTCAGCTTTATTATTGCAATCAAAGCTTAACGAAAGTGGGAACAGACATTTCTAT
>c12_g1_i1_reverse_complement
GGGAACGAGGCTCATAATCCGATATCATGGGGAACGTACAAGTTCCATCACCGGTTCTTAGGCCTCCCAACTTATGCAATAAATAGTGGCTTTCTCGCATCTTTTTAAACCTGTCTTCCTCCATAACTCTACGTTGCAAGGTAAAGAAACTCGAGATAGCCTTTTTCTCTTTGATTCATCGCCAGTCTTCTTTGTGAACTAAACCCATTACCACTCGCATTAGCATTAGGAAATTGGAATTCAGAAATGTCTATCCCTACTCGTAAAATTGGAAACGATACTGTGCCTGCAATCGGCTTTGGTTGCATGGGTTTGCACGCCATGTATGGGCCTTCTTCCGAAGAAGCCAACCAAGCTGTCTTGACTCATGCCGCTGACTTGGGTTGCACCTTTTGGGACAGCTCTGACATGTACGGATTTGGTGCCAATGAGGAGTGCATTGGTCGTTGGTTCAAGCAAACTGGACGCCGTAAGGAGATTTTCTTGGCTACCAAGTTTGGCTATGAGAAGAATCCCGAGACTGGCGAGCTCTCTTTGAACAACGAGCCTGATTACATTGAAAAGGCACTTGATCTTTCTTTGAAGCGTTTGGGTATTGATTGCATTGATCTTTACTACGTCCATCGCTTCAGCGGTGAGACTCCTATTGAAAAGATCATGGGCGCCCTTAAGAAGTGCGTTGAAGCTGGAAAGATTCGTTACATTGGTCTTTCTGAATGCAGTGCTAACACCATTCGTCGTGCAGCCGCTGTTTATCCGGTCAGTGCCGTTCAAGTTGAGTATTCTCCCTTCTCTTTGGAGATTGAGCGCCCTGAAATTGGTGTCATGAAGGCTTGCCGCGAAAACAATATCACCATTGTTTGCTATGCCCCTCTTGGCCGTGGTTTCTTGACTGGTGCATACAAATCCCCTGATGATTTCCCTGAGGGCGATTTCCGCAGAAAGGCCCCTCGTTATCAAAAGGAAAACTTCTACAAGAACTTGGAGCTTGTTACAAAGATTGAAAAAATTGCCACTGCCAACAATATCACACCTGGCCAGCTTAGTTTGGCTTGGTTGTTAGCCCAAGGTGACGATATTCTTCCCATTCCTGGTACCAAACGTGTTAAATACTTGGAGGAGAACTTTGGCGCTCTCAAGGTTAAATTATCCGATGCCACTGTTAAGGAAATTCGTGAAGCTTGTGACAACGCTGAGGTAATTGGTGCTAGATATCCTCCTGGTGCTGGATCCAAGATTTTCATGGATACCCCTCCTATGCCCAAGTAGATTTAGTCTTTCTCGTATATGGTTTTAAAATTGTGCATGTTTATAGTTACCTTCATTTATGCTGGAAGAAATGAAATGAATTTTTGTTAGTTGTGC

7

Create script to open the FASTQ file "Python1_FASTQ.fastq" and go through each line of the file. Count the number of lines and the number of characters per line. Have your program report the: total number of lines total number of characters average line length


In [19]:
n_lines=0
n_chars=0

with open('Python1_FASTAQ.fastq', 'r') as fq:
  for line in fq:
    line=line.rstrip()
    n_lines+=1
    n_chars+=len(line)
print("There are {} lines and {} characters in the file.\nThe average line length is {}".format(n_lines, n_chars, n_chars/n_lines))


There are 120 lines and 7800 characters in the file.
The average line length is 65.0