In [ ]:
def trim(s):
    # implement this function
    pass
# test case
import Bio.Seq as BS
s = BS.Seq("ACGCGGCGTG")
print(s, "has length", len(s))
# write a piece of code here which will
# print the translated sequence 'TRR'
# without any errors
Write a function GC(s) which calculates the GC content of a Seq sequence s and returns it as a real number in the range 0-1. The GC content is the proportion of "G" and "C" characters in the sequence. Make sure that your function also works correctly for lower-case sequences.
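As a worked instance of this definition (the symbols n_G, n_C and |s| are notation introduced here for illustration and do not come from the exercise), the sequence ACGATTAA contains one G and one C among its eight letters:

```latex
\mathrm{GC}(s) = \frac{n_G + n_C}{|s|}, \qquad
\mathrm{GC}(\texttt{ACGATTAA}) = \frac{1 + 1}{8} = 0.25
```

The lower-case input acgattaa should give the same value.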
In [ ]:
def GC(s):
    # implement this function
    pass
# test case
import Bio.Seq as BS
s = BS.Seq("ACGATTAA")
print("GC content of", s, "is", GC(s))
In [ ]:
def hamming(s1, s2):
    # implement this function
    pass
# test case
import Bio.Seq as BS
s1 = BS.Seq("ACGCAGTTGCAGTAG")
s2 = BS.Seq("ACGCACTTGCAGAAG")
s3 = BS.Seq("AAAAAAAAAA")
print("Hamming distance of", s1, "and", s2, "is", hamming(s1,s2))
print("Hamming distance of", s1, "and", s3, "is", hamming(s1,s3))
DNA sequences are translated into protein (amino acid) sequences three letters at a time, so every three letters of the DNA sequence produce a single letter of the protein sequence. The translation is governed by a coding table, of which there are many; see https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi .
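The three-letters-at-a-time reading can be sketched in plain Python. This is only an illustration: CODON_EXCERPT is a hypothetical three-entry excerpt of a coding table (a real table has 64 entries), the helper names are made up for this sketch, and the input is assumed to have a length that is a multiple of three.

```python
# Hypothetical excerpt of a coding table: only the three codons used below.
CODON_EXCERPT = {"ATG": "M", "GTC": "V", "GAT": "D"}


def split_codons(dna):
    """Split a DNA string into consecutive, non-overlapping triplets.

    Assumes len(dna) is a multiple of three.
    """
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate_excerpt(dna):
    """Translate codon by codon using the excerpt table above."""
    return "".join(CODON_EXCERPT[codon] for codon in split_codons(dna))


codons = split_codons("ATGGTCGAT")
protein = translate_excerpt("ATGGTCGAT")
print(codons)   # ['ATG', 'GTC', 'GAT']
print(protein)  # MVD
```

Nine DNA letters give three protein letters, one per codon.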
Suppose you have a DNA sequence and its translation, and you would like to find out which coding table was used in the translation. Given a sequence s_dna and its translation s_protein, find the coding table(s) which could have produced it. Print the number of the table (or one number per line if there are several possible tables).
The tables are defined in Bio.Data.CodonTable, and they are numbered. In particular, generic_by_id is a mapping from these numbers to the tables.
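A minimal sketch of how the numbered tables behave, assuming Biopython is installed. It uses only the generic_by_id dictionary and the table argument of Seq.translate. Tables 1 (standard) and 2 (vertebrate mitochondrial) are chosen purely for illustration and are not meant as the answer to the exercise.

```python
from Bio.Data import CodonTable
from Bio.Seq import Seq

# generic_by_id is a dictionary keyed by the NCBI table number.
print(sorted(CodonTable.generic_by_id)[:3])

# The same codon can be read differently under different tables.
codon = Seq("TGA")
standard = str(codon.translate(table=1))       # table 1: standard code
mitochondrial = str(codon.translate(table=2))  # table 2: vertebrate mitochondrial code
print(standard)       # *
print(mitochondrial)  # W
```

TGA is a stop codon under table 1 but codes for tryptophan under table 2, so a single codon can be enough to rule a table in or out.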
In [ ]:
import Bio.Seq as BS
s_dna = BS.Seq("ATGGTCGATGACCTGTGAACTTAA")
s_protein = BS.Seq("MVDDLCT*")
# Hint: Bio.Data.CodonTable.generic_by_id
# print the number(s) of the table(s) which
# could have produced s_protein from s_dna
In [ ]:
def clean(s):
    # implement this function
    pass
# test case
print("ACGTHABJHHBAGATGATB")
print(clean("ACGTHABJHHBAGATGATB"))
In [ ]:
def sort_by_unknown(s):
    # implement this function
    pass
# test case
import Bio.Seq as BS
s = [BS.Seq('NGTACCTTGCTACTC'),
     BS.Seq('NCGTGNN'),
     BS.Seq('NNNNN'),
     BS.Seq('ACGGT'),
     BS.Seq('ANNTGGT'),
     BS.Seq('ACGNGT'),
     BS.Seq('AACGTCCGTNNN'),
     ]
print(s)
print(sort_by_unknown(s))
Consider the following hypothetical experiment and sketch an analysis to reach the given objectives. At this stage of the course, it is enough to outline the overall flow of the analysis and propose methods that could be used in the analysis. The more details you can write down, the better.
Researchers have found a new prokaryote species that can thrive in unexpectedly harsh environmental conditions. To understand how the species can survive, the researchers have obtained RNA molecules expressed by the cells. Given a list of RNA sequences, find out what functions the proteins (possibly) have and how the proteins differ from the corresponding proteins in other species.