DNA, or deoxyribonucleic acid, is the molecule that carries most of the genetic instructions used by living organisms. It is often modeled as a sequence of the letters G, C, A and T. These letters stand for guanine, cytosine, adenine and thymine, the four nucleotide bases that make up DNA. Watch the following video:
In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo("MvuYATh7Y74",width=640,height=360)
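As a minimal sketch of the letter model described above, a DNA sequence can be stored as an ordinary Python string. The sequence below is made up for illustration, and the names `dna` and `base_counts` are hypothetical, not part of the assignment.

```python
from collections import Counter

# A made-up DNA sequence, used purely for illustration.
dna = "GATTACAGGCTA"

# Count how many times each base letter occurs in the string.
base_counts = Counter(dna)

for base in "GCAT":
    print(base, base_counts[base])

# Check that the string uses only the four letters of the model.
uses_only_gcat = set(dna) <= set("GCAT")
print(uses_only_gcat)
```

Treating DNA as a string means the string tools covered later in this notebook (indexing and slicing) apply to it directly.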
A DNA molecule is very long, and no one has developed technology that reads the sequence of an entire DNA (or RNA) molecule in a single pass. We can, however, sequence smaller pieces ("sequences") of DNA. The trick then becomes figuring out how to use this small-sequence technology to measure the entire big sequence (i.e., the entire DNA molecule).
A solution that seems to work quite well is to take DNA (actually, a lot of DNA molecules) and cut it up into short pieces that we can read. We can read these small pieces easily, but then we need to figure out how to put them together to get the original long DNA sequence. This is challenging because there are many little pieces that overlap with each other, and reassembling them into the correct sequence may take quite a bit of effort. This approach is often called "shotgun sequencing."
Watch the following video about shotgun sequencing:
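The "cutting up" step described above can be imitated on a string. The sketch below is only a toy simulation under simplifying assumptions: the genome is made up, every read has the same length, and reads contain no errors. The function `make_reads` and its parameters are hypothetical names chosen for this example. It covers only the generation of overlapping pieces, not the reassembly.

```python
import random

def make_reads(genome, read_length, num_reads, seed=0):
    """Return num_reads substrings of genome, each read_length long,
    taken from random start positions (so the reads may overlap)."""
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    last_start = len(genome) - read_length
    reads = []
    for _ in range(num_reads):
        start = rng.randint(0, last_start)  # randint includes both endpoints
        reads.append(genome[start:start + read_length])
    return reads

# A made-up "genome", far shorter than any real one.
genome = "GATTACAGGCTAACCGTTAGCATG"
reads = make_reads(genome, read_length=6, num_reads=10)
for read in reads:
    print(read)
```

Because the start positions are random, some reads share letters with their neighbors. Those shared stretches are the overlaps that an assembler has to find.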
In [ ]:
from IPython.display import YouTubeVideo
YouTubeVideo("vg7Y5EeZsjk",width=640,height=360)
The goal of shotgun sequencing is to take all of the short reads and assemble them back into the original genome.
Question 1: Describe the model that is being used in shotgun sequencing.
//put your answer here
Question 2: What are some of the limitations of this model?
//put your answer here
Question 3: Write some high level pseudocode or create a flowchart that explains how you would develop your own shotgun sequencer.
//put your answer here or reference an attached file.
In [ ]:
st = 'This is a string'
print(st)
A string is a sequence of characters. You can access individual characters using brackets and an index. For example, the following prints out characters 4 through 9:
In [ ]:
print(st[4:10])
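To see exactly which characters a slice picks up, it can help to print it with `repr()`, which makes spaces visible. The sketch below reuses the same string. The names `piece` and `same_piece` are hypothetical and used only for illustration.

```python
st = 'This is a string'

# st[4:10] starts at index 4 and stops just before index 10.
piece = st[4:10]
print(repr(piece))   # repr() puts quotes around the text so spaces are visible
print(len(piece))    # st[a:b] has b - a characters when both indexes are in range

# Building the same slice one index at a time.
same_piece = ''.join(st[i] for i in range(4, 10))
print(piece == same_piece)
```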
Here's a neat trick. You can access the string using both positive and negative indexes. A positive index counts from the beginning of the string and a negative index counts from the end of the string. For example:
In [ ]:
print(st[4:-4])
Remember that indexing starts at zero (0), and that in a slice the left index is "inclusive" while the right index is "exclusive". So, to print the second-to-last letter in a string, all we need to do is the following:
In [ ]:
print(st[len(st)-2])
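The positive index used above and a negative index are two ways of naming the same position. The short sketch below, which assumes the same string as before, compares the two forms side by side.

```python
st = 'This is a string'

# For a negative index -k (with 1 <= k <= len(st)), st[-k] is the same
# character as st[len(st) - k].
print(st[-2], st[len(st) - 2])   # second-to-last character, written two ways
print(st[-1], st[len(st) - 1])   # last character, written two ways

# The same rule applies inside slices: -4 stands for len(st) - 4.
print(st[4:-4] == st[4:len(st) - 4])
```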
Question 4: Write a loop that loops over the string backwards using negative indexing. Have the loop print one character per line, starting from the end of the string. It should look something like this:
g
n
i
r
t
s
a
s
i
s
i
h
T
In [ ]:
#put your code here
# st[-1] is the last character, st[-2] the one before it, and so on
for c in range(0,len(st)):
    print(st[-(c+1)])
Question 5: What questions do you have, if any, about any of the topics discussed in this assignment after watching the video and reading the links?
//put your answer here
Question 6: Do you have any further questions or comments about this material, or anything else that's going on in class?
//put your answer here
Now, you just need to submit this assignment by uploading the notebook to the course Desire2Learn web page. Go to the "Pre-class assignments" folder, find the dropbox link for Day 17, and upload it there.
In class we are going to have you use what you have learned to write your own assembler. See you there!