Lesson 11 Individual Assignment

Individual means that you do it yourself. You won't learn to code if you don't struggle for yourself and write your own code. Remember that while you can discuss the general (algorithmic) way to solve a problem, you should not even be looking at anyone else's code or showing anyone else your code for an individual assignment.
Review the Group Work guidelines on Canvas and/or ask an instructor if you have any questions.

Background info

As you saw in Lesson 10, computers are powerful tools for analyzing DNA sequences. One common computerized task in genome analysis is to have a computer predict open reading frames (ORFs), split them into codons (the subject of some of Lesson 10 Individual), and then predict the amino acid sequence of the protein encoded by each predicted ORF. While today we will only look at how to translate ORFs, computers are used for many more steps to predict the structure and function of these predicted proteins. (Some of you will be familiar with the BLAST algorithm for comparing DNA and protein sequences to a database of known sequences.)

Here is a dictionary of DNA codons and their corresponding amino acids (this is the same information as in Table 1 in Lesson 10 but in a more computer readable format) that you will find useful for this assignment:

translation={'TTT':'Phe','TTC':'Phe','TTA':'Leu','TTG':'Leu','TCT':'Ser',
             'TCC':'Ser','TCA':'Ser','TCG':'Ser','TAT':'Tyr','TAC':'Tyr',
             'TAA':'Stop','TAG':'Stop','TGT':'Cys','TGC':'Cys','TGA':'Stop',
             'TGG':'Trp','CTT':'Leu','CTC':'Leu','CTA':'Leu','CTG':'Leu',
             'CCT':'Pro','CCC':'Pro','CCA':'Pro','CCG':'Pro','CAT':'His',
             'CAC':'His','CAA':'Gln','CAG':'Gln','CGT':'Arg','CGC':'Arg',
             'CGA':'Arg','CGG':'Arg','ATT':'Ile','ATC':'Ile','ATA':'Ile',
             'ATG':'Met','ACT':'Thr','ACC':'Thr','ACA':'Thr','ACG':'Thr',
             'AAT':'Asn','AAC':'Asn','AAA':'Lys','AAG':'Lys','AGT':'Ser',
             'AGC':'Ser','AGA':'Arg','AGG':'Arg','GTT':'Val','GTC':'Val',
             'GTA':'Val','GTG':'Val','GCT':'Ala','GCC':'Ala','GCA':'Ala',
             'GCG':'Ala','GAT':'Asp','GAC':'Asp','GAA':'Glu','GAG':'Glu',
             'GGT':'Gly','GGC':'Gly','GGA':'Gly','GGG':'Gly'}
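
As a quick reminder of how a dictionary like this is used, here is a minimal sketch of looking up single codons. The name translation_excerpt is hypothetical and holds only three entries copied from the table above so the cell runs on its own; in your notebook you would use the full translation dictionary.

```python
# Hypothetical three-entry excerpt of the codon table above (not the full table).
translation_excerpt = {'ATG': 'Met', 'TTT': 'Phe', 'TAA': 'Stop'}

# Square brackets look up the value (amino acid) stored for a key (codon).
print(translation_excerpt['ATG'])    # Met
print(translation_excerpt['TAA'])    # Stop

# The "in" operator checks whether a key exists before you look it up.
print('NNN' in translation_excerpt)  # False
```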

Programming Practice

Be sure to spell all function names correctly - misspelled functions will lose points (and often break anyway since no one is sure what to type to call it). If you prefer showing your earlier, scratch work as you figure out what you are doing, please be sure that you make a final, complete, correct last function in its own cell that you then call several times to test. In other words, separate your thought process/working versions from the final one (a comment that tells us which is the final version would be lovely).

Every function should have at least a docstring at the start that states what it does (see Lesson 3 Team Notebook if you need a reminder). Make other comments as necessary.
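
If it helps, here is one possible shape for a docstring. The function count_bases is a made-up example and is not part of this assignment; it is only here to show where the docstring goes.

```python
def count_bases(dna):
    """Return the number of bases (characters) in the DNA string dna."""
    return len(dna)

# The docstring is stored with the function and can be printed.
print(count_bases.__doc__)
```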

Make sure that you are running test cases (plural) for everything and commenting on the results in markdown. Your comments should discuss how you know that the test case results are correct.
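
One way to organize several test cases is sketched below, assuming you work out each expected output by hand before running the function. The function count_a is a hypothetical stand-in and is not one of the functions you are asked to write.

```python
def count_a(dna):
    """Return how many times the base 'A' appears in the DNA string dna."""
    return dna.count('A')

# Each test case pairs an input with the output worked out by hand beforehand.
test_cases = [('', 0), ('ATGA', 2), ('CCGG', 0), ('AAAA', 4)]

for dna, expected in test_cases:
    result = count_a(dna)
    print(repr(dna), 'expected:', expected, 'got:', result, 'match:', result == expected)
```

Printing the expected and actual values side by side makes it easier to explain in markdown why each result is correct.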

preA. Copy and paste the codon function from your Lesson 10 individual notebook here. You might have to modify it depending on how you are returning results. The version you want here should:

  • take a DNA sequence string as a parameter
  • return all of the ORFs in the DNA sequence
  • the return should be a list of codon lists (e.g. [['ATG',..., 'TAG'], ['ATG',..., 'TAA'], ...])
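
To make the expected return shape concrete, here is a small sketch of a list of codon lists and how to index into it. The values in example_orfs are made up for illustration and are not the output of any real sequence.

```python
# Hypothetical return value: two ORFs, each a list of codon strings.
example_orfs = [['ATG', 'TTT', 'TAG'], ['ATG', 'TAA']]

print(len(example_orfs))    # 2 (number of ORFs)
print(example_orfs[0])      # ['ATG', 'TTT', 'TAG'] (the first ORF)
print(example_orfs[1][-1])  # TAA (last codon of the second ORF)
```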

In [ ]:

Test your codon function with at least four test cases. One test case should be an empty string for DNA input. Make sure you also include expected output for all test cases.

When you are done testing, write a brief interpretation of the test results.


In [ ]:

A. Define a translate function that takes a string of DNA sequence, calls codon, and uses the output/return of the codon function to translate the ORF and return the translated amino acid sequence as a list of amino acids (e.g. ['Met', 'Phe', ...]).
The translate function should:

  • take a string of DNA sequence as its parameter
  • call the codon function to get ORFs as a list of codons
  • translate the list of codons into amino acids
  • return the correct list of amino acids for that ORF

In [ ]:

Test your translate function with at least four test cases. Because translate takes a DNA string as its parameter, be sure that at least one test case is an empty string ''. Make sure you also include expected output for all test cases.

When you are done testing, write a brief interpretation of the test results.


In [ ]:

B. Start with your pseudocode from the Lesson 11 Team Notebook (question 14a).
Create a convert_to_one function that:

  • takes a list of amino acids, each represented by a three letter string (e.g. ['Met', 'Phe', ...]) as its single parameter
  • converts the list of three letter amino acid codes to a list of one letter codes (e.g. ['M', 'F', ...])
  • returns the list of single letter code amino acids

Note: All the amino acids and their corresponding one letter codes are given in columns 2 and 3 in Table 1 from Lesson 10. Also note that stop codons are * in the single letter code.


In [ ]:

Test your convert_to_one function with at least four test cases. Be sure that at least one test case is an empty list []. Make sure you also include expected output for all test cases.

When you are done testing, write a brief interpretation of the test results.


In [ ]:

C. Use your pseudocode from Lesson 11 Team Notebook (question 15) to define a reverse_complement function that:

  • takes a single string of DNA sequence (5'→3') as its parameter
  • and returns the reverse complement sequence string in the conventional 5'→3' orientation.

Note: This function should use an appropriate dictionary to create the complement.
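
As a hint only, the two building blocks can be sketched separately: a dictionary that maps each base to its complement, and slicing to reverse a string. The names complement_excerpt and example are hypothetical, and combining the two ideas into reverse_complement is left to you.

```python
# Hypothetical dictionary mapping each DNA base to its complementary base.
complement_excerpt = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
print(complement_excerpt['A'])  # T

# Slicing with a step of -1 returns the characters of a string in reverse order.
example = 'ATGC'
print(example[::-1])            # CGTA
```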


In [ ]:

Test your reverse_complement function with this sequence (from Lesson 10 Team Notebook):

TACGTATGATCGGCTATAGCCGATGCATTAGCTAGTGCTGATACTGATCG

predicted output (see below for possible tool for finding it):


In [ ]:

Add at least two more test cases of your own. Make sure that you define the correct output using another tool (suggestion) before you test your function.

When you are done testing, write a brief interpretation of the test results.


In [ ]:

Additional question ideas:

  • add a final function that incorporates these functions together (like we did with stats) and include a dictionary as output?
  • or dictionary question - something about proteins and AA seqs, searching through dict to find proteins with specific sub sequences ?

In [ ]: