Individual means that you do it yourself. You won't learn to code if you don't struggle for yourself and write your own code. Remember that while you can discuss the general (algorithmic) way to solve a problem, you should not even be looking at anyone else's code or showing anyone else your code for an individual assignment.
Review the Group Work guidelines on Canvas and/or ask an instructor if you have any questions.
Downloading files, parsing them, and extracting and processing the necessary data are essential to modern science. Here we continue our examination of files from biological sequence databases. Most scientific data is housed in a database with its own file format, so writing code to open, extract, process, and otherwise manipulate records or data files in an automated way is a very common task. Here are some examples from other disciplines:
Chemistry:
https://www.rsc.org/merck-index
http://www.chemspider.com/
Physics:
http://ned.ipac.caltech.edu/
Neuroscience:
https://ida.loni.usc.edu/
Environmental Science:
https://edg.epa.gov/metadata/catalog/main/home.page
Be sure to spell all function names correctly - misspelled functions will lose points (and often break anyway since no one is sure what to type to call it). If you prefer showing your earlier, scratch work as you figure out what you are doing, please be sure that you make a final, complete, correct last function in its own cell that you then call several times to test. In other words, separate your thought process/working versions from the final one (a comment that tells us which is the final version would be lovely).
Every function should have at least a docstring at the start that states what it does (see Lesson3 Team Notebook if you need a reminder). Make other comments as necessary.
Make sure that you are running test cases (plural) for everything and commenting on the results in markdown. Your comments should discuss how you know that the test case results are correct.
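As a reminder of what a docstring plus several test cases can look like, here is a minimal sketch. The function `count_atom_records` is a hypothetical example and is not part of this assignment.

```python
def count_atom_records(lines):
    """Return the number of lines that begin with the record name 'ATOM'."""
    return sum(1 for line in lines if line.startswith("ATOM"))


# Test case 1: two of the three lines start with ATOM, so we expect 2.
print(count_atom_records(["ATOM      1", "HETATM    2", "ATOM      3"]))

# Test case 2: an empty list has no ATOM lines, so we expect 0.
print(count_atom_records([]))
```

In your own notebook, follow each test with a markdown comment explaining how you know the result is correct.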
A. Pseudocode
Copy and paste your high-level pseudocode for your readXYZfromPDB function from Lesson 13 Team Notebook. High-level pseudocode is an outline - for example, you might just say "loop" instead of a while-loop with the condition and how the variable is incremented inside the loop.
Next, either:
readXYZfromPDB function from Lesson 13 Team Notebook, or
B. You should have a PDB file 2LDJ.pdb from the pre-activity. Make sure that either you have the PDB file in the same directory as this Jupyter notebook, or you provide a complete file path.
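If you are unsure whether Python can see the file, a quick check along these lines may help. This is only a sketch; `locate_file` is a hypothetical helper name, and it assumes the notebook's working directory is the notebook's own folder.

```python
from pathlib import Path


def locate_file(filename, directory="."):
    """Return the resolved path to filename if it exists, otherwise None.

    filename may be a bare name (looked up inside directory) or a complete
    file path (in which case directory is ignored).
    """
    path = Path(directory) / filename
    return path.resolve() if path.is_file() else None


# Prints the full path if 2LDJ.pdb sits next to the notebook, or None if not.
print(locate_file("2LDJ.pdb"))
```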
Next, download 1L2Y.pdb from the PDB (http://www.rcsb.org/pdb/home/home.do) as a second test case.
For each of the two files, list the first three x coordinates, the first three y coordinates, and the first three z coordinates.
2LDJ.pdb:
1L2Y.pdb:
Next, define a readXYZfromPDB function that:
To clarify the output, for the PDB file specified in the argument, your function should return three lists that each contain the floating-point numbers of the x, y, or z coordinates, respectively. For example, the x-coordinate list for 2LDJ.pdb would be [11.030, 9.640, 8.650, 8.185, ...].
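The core idea can be sketched as below. This is an illustration under stated assumptions, not the required solution: the names `xyz_from_lines` and `read_xyz_sketch` are hypothetical, only records beginning with ATOM are read, and the coordinates are taken from the fixed PDB columns 31-38 (x), 39-46 (y), and 47-54 (z). Files that contain several models would contribute the atoms of every model.

```python
def xyz_from_lines(lines):
    """Return three lists (x, y, z) of floats from the ATOM records in lines."""
    xs, ys, zs = [], [], []
    for line in lines:
        # Assumption: only ATOM records are wanted (HETATM is ignored).
        if line.startswith("ATOM"):
            # PDB is a fixed-width format; Python slices are 0-based,
            # so columns 31-38 become line[30:38], and so on.
            xs.append(float(line[30:38]))
            ys.append(float(line[38:46]))
            zs.append(float(line[46:54]))
    return xs, ys, zs


def read_xyz_sketch(filename):
    """Open a PDB file and return its x, y, and z coordinate lists."""
    with open(filename) as handle:
        return xyz_from_lines(handle)
```

Slicing by column rather than calling `split()` is a deliberate choice here: adjacent fields in a PDB line can run together with no space between them, which would break whitespace splitting.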
Test your readXYZfromPDB function with the files 2LDJ.pdb and 1L2Y.pdb as test cases. Make sure you compare your function's output to your predictions from part A.
When you are done testing, write a brief interpretation of the test results.
C. You are about to define a plain_coord function that extracts the coordinates from a PDB file. It should:
Example output in the file would look like this:
6 -1.415689945221 0.020083000883 0.255733013153
6 -0.670023024082 1.240668058395 -0.075638003647
6 0.656983017921 1.203186035156 -0.208253994584
1 1.132277011871 -2.126605033875 -0.221330001950
1 1.316627025604 -0.007073000073 1.437821984291
1 0.618321001530 -0.923030018806 -1.357839941978
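One way to produce a line in this layout is sketched below. It assumes that the first column is the atomic number of the element (6 for carbon, 1 for hydrogen) and that each coordinate is written with 12 decimal places, as in the example above. The helper name `plain_line` and the small `ATOMIC_NUMBERS` table are hypothetical and cover only a few common elements.

```python
# Assumption: the leading integer on each output line is the atomic number.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8, "S": 16}


def plain_line(element, x, y, z):
    """Return one output line: atomic number, then x, y, z to 12 decimals."""
    number = ATOMIC_NUMBERS[element.strip().capitalize()]
    return f"{number} {x:.12f} {y:.12f} {z:.12f}"


print(plain_line("C", -1.415689945221, 0.020083000883, 0.255733013153))
# prints: 6 -1.415689945221 0.020083000883 0.255733013153
```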
Before writing any Python code, provide pseudocode that shows how the plain_coord function will work with the readXYZfromPDB function from Part A.
Now define the plain_coord function:
Test your plain_coord function with at least the files 2LDJ.pdb and 1L2Y.pdb. Make sure you also predict and report at least part of the expected output for these test cases.
When you are done testing, write a brief interpretation of the test results.
D. First write high-level pseudocode for a function called multi_translate that will read in a single DNA sequence in FASTA file format (see below and Lesson 13 Pre-Activity) and output all of the translated protein sequence(s) (ORFs) from that DNA in FASTA format. You have already written the "guts" of this in Lesson 10 and Lesson 11 Individual: you should have functions that make codons and translate. The new parts are the input and output files and parsing/outputting FASTA format.
Your function should:
More info on FASTA format here.
Examples:
(input file test.fasta, contents below)
>short test seq
TCATGTTGGTGAAGGCGCCACGACTCGAGATGCAGTAGGAGG
(expected output, file name test_ORFS.fasta)
>ORF1
MLVKAPRLEMQ
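The example above can be reproduced with a sketch like the following. It is an illustration under assumptions, not the required design: the helper names (`first_fasta_record`, `find_orfs`, `orfs_to_fasta`) are hypothetical, only the forward strand is scanned, an ORF runs from an ATG to the first in-frame stop codon, scanning resumes after that stop codon, and an ATG with no in-frame stop is skipped. File reading and writing are left out so the logic stays visible.

```python
from itertools import product

# Standard genetic code, built in T, C, A, G order; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def first_fasta_record(text):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # a second record starts here; stop reading
            header = line[1:]
        elif header is not None:
            parts.append(line.upper())
    return header, "".join(parts)


def find_orfs(dna):
    """Return the protein strings for ORFs found on the forward strand."""
    proteins = []
    i = 0
    while i <= len(dna) - 3:
        if dna[i:i + 3] == "ATG":
            protein, j, stopped = [], i, False
            while j <= len(dna) - 3:
                # Codons with unexpected letters are translated as 'X'.
                aa = CODON_TABLE.get(dna[j:j + 3], "X")
                j += 3
                if aa == "*":
                    stopped = True
                    break
                protein.append(aa)
            if stopped:
                proteins.append("".join(protein))
                i = j  # assumption: resume scanning after the stop codon
                continue
        i += 1
    return proteins


def orfs_to_fasta(proteins):
    """Format proteins as FASTA text with headers >ORF1, >ORF2, ..."""
    return "".join(f">ORF{n}\n{p}\n" for n, p in enumerate(proteins, start=1))


header, dna = first_fasta_record(
    ">short test seq\nTCATGTTGGTGAAGGCGCCACGACTCGAGATGCAGTAGGAGG\n")
print(orfs_to_fasta(find_orfs(dna)), end="")
# prints:
# >ORF1
# MLVKAPRLEMQ
```

Check these assumptions against the requirements for your own function before relying on any of them, since a different rule for overlapping ORFs or for the reverse strand would change the output.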
Test your multi_translate function with the following files (from Canvas):
test.fasta
test2.fasta
EU285557.fasta
The expected output for test.fasta is included above.
When you are done testing, write a brief interpretation of the test results.