Exercise 2.1 Database search

a) Print the number of hits for protein name "p53" in the NCBI Protein database with human as the organism.

b) Print the ids of those proteins (one per line).
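
One possible approach, sketched under assumptions: the search goes through Biopython's `Bio.Entrez.esearch`, the query uses the `[Protein Name]` and `[Organism]` field tags, and the e-mail address is a placeholder you must replace. The helper names `build_term` and `search_protein` are made up for this illustration. Note that `esearch` returns only 20 ids by default, so `retmax` has to be raised to list all hits.

```python
def build_term(protein_name, organism):
    """Build an Entrez query restricted to a protein name and an organism."""
    return f"{protein_name}[Protein Name] AND {organism}[Organism]"


def search_protein(term, email, retmax=20):
    """Return (hit count, list of ids) for a query against the Protein database.

    Requires network access and Biopython; the import is local so that
    build_term() can be used without either.
    """
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(db="protein", term=term, retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return int(result["Count"]), list(result["IdList"])


# Example use (not run here, needs network access):
# term = build_term("p53", "human")
# count, _ = search_protein(term, "you@example.org")           # a) number of hits
# _, ids = search_protein(term, "you@example.org", retmax=count)
# print(count)
# print("\n".join(ids))                                        # b) one id per line
```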



In [ ]:

Exercise 2.2 Database entry

a) Fetch the entry NP_004955 from the NCBI Protein database and save it to a file named NP_004955.gb. (Use the "gb" format.)

b) Load the file using Bio.SeqIO and find the following information about the protein:

id
name
description
organism
number of features
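
A minimal sketch of both steps, assuming Biopython's `Bio.Entrez.efetch` and `Bio.SeqIO.read`. The function names `fetch_genbank` and `summarize` and the e-mail address are placeholders chosen for this illustration. The organism is assumed to be stored under the `organism` key of `record.annotations`, which is where Biopython's GenBank parser normally puts it.

```python
def fetch_genbank(accession, email, path):
    """Download one Protein entry in GenBank ("gb") format and save it to path.

    Requires network access and Biopython.
    """
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.efetch(db="protein", id=accession, rettype="gb", retmode="text")
    with open(path, "w") as out:
        out.write(handle.read())
    handle.close()


def summarize(record):
    """Collect the requested fields from a SeqRecord-like object."""
    return {
        "id": record.id,
        "name": record.name,
        "description": record.description,
        "organism": record.annotations.get("organism"),
        "number of features": len(record.features),
    }


# Example use (not run here, needs network access):
# from Bio import SeqIO
# fetch_genbank("NP_004955", "you@example.org", "NP_004955.gb")
# record = SeqIO.read("NP_004955.gb", "genbank")
# for key, value in summarize(record).items():
#     print(key, value, sep=": ")
```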



In [ ]:

Exercise 2.3 Phosphorylation sites

a) How many phosphorylation sites does the entry NP_004955 contain? (Use the file you acquired in the previous exercise.)

b) Print the phosphorylated amino acids and their positions (one per line).
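
A hedged sketch of one way to do this. It assumes that phosphorylation sites appear in the GenPept file as features of type `Site` whose `site_type` qualifier contains `phosphorylation`; inspect the file to confirm this before relying on it. The function name `phosphorylation_sites` is made up for this illustration. Positions are reported 1-based, whereas Biopython stores feature locations 0-based.

```python
def phosphorylation_sites(record):
    """Return (amino acid, 1-based position) pairs for phosphorylation sites.

    Assumes Site features carry a "site_type" qualifier; this is an
    assumption about the annotation, not something checked against the file.
    """
    sites = []
    for feature in record.features:
        if feature.type != "Site":
            continue
        if "phosphorylation" not in feature.qualifiers.get("site_type", []):
            continue
        start = int(feature.location.start)  # 0-based index into the sequence
        sites.append((str(record.seq[start]), start + 1))
    return sites


# Example use (needs Biopython and the file from the previous exercise):
# from Bio import SeqIO
# record = SeqIO.read("NP_004955.gb", "genbank")
# sites = phosphorylation_sites(record)
# print(len(sites))                  # a) number of sites
# for residue, position in sites:    # b) one site per line
#     print(residue, position)
```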



In [ ]:

Exercise 2.4 Gene names

The entry 223468685 in the NCBI Nucleotide database contains several genes. Print their names in the order of appearance.
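
One way this could look, assuming the entry has been fetched in GenBank format and parsed with `Bio.SeqIO`, and that each gene is annotated as a feature of type `gene` with a `gene` qualifier. The function name `gene_names` and the e-mail address are placeholders for this illustration. Biopython keeps features in file order, so no sorting is done.

```python
def gene_names(record):
    """Return the names of all gene features in their order of appearance."""
    names = []
    for feature in record.features:
        if feature.type == "gene":
            # qualifier values are lists, usually holding a single name
            names.extend(feature.qualifiers.get("gene", []))
    return names


# Example use (not run here, needs network access and Biopython):
# from Bio import Entrez, SeqIO
# Entrez.email = "you@example.org"
# handle = Entrez.efetch(db="nucleotide", id="223468685", rettype="gb", retmode="text")
# record = SeqIO.read(handle, "genbank")
# handle.close()
# print("\n".join(gene_names(record)))
```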



In [ ]:

Exercise 2.5 Protein sequence

Print the protein sequence produced by the gene named "BTK" in the entry 223468685 in the NCBI Nucleotide database.
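
A sketch under the assumption that the protein sequence is stored in the `translation` qualifier of the `CDS` feature whose `gene` qualifier is `BTK`, which is the usual GenBank layout. The function name `translations_for_gene` is hypothetical. It returns a list because a gene may have more than one CDS.

```python
def translations_for_gene(record, gene):
    """Return the translation of every CDS feature annotated with the given gene."""
    proteins = []
    for feature in record.features:
        if feature.type == "CDS" and gene in feature.qualifiers.get("gene", []):
            proteins.extend(feature.qualifiers.get("translation", []))
    return proteins


# Example use, with `record` parsed from the GenBank entry via Bio.SeqIO:
# for protein in translations_for_gene(record, "BTK"):
#     print(protein)
```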



In [ ]:

Exercise 2.6 Protein weight

Calculate and print the molecular weight of the protein produced by the BTK gene. You can use a function available in Biopython or write your own function. The monoisotopic masses of amino acid residues are given in the table below.

Amino acid residue	Monoisotopic mass
A	71.03711
C	103.00919
D	115.02694
E	129.04259
F	147.06841
G	57.02146
H	137.05891
I	113.08406
K	128.09496
L	113.08406
M	131.04049
N	114.04293
P	97.05276
Q	128.05858
R	156.10111
S	87.03203
T	101.04768
V	99.06841
W	186.07931
Y	163.06333



In [ ]:

    
# monoisotopic masses of amino acid residues
masses = {'A': 71.03711,
          'C': 103.00919,
          'D': 115.02694,
          'E': 129.04259,
          'F': 147.06841,
          'G': 57.02146,
          'H': 137.05891,
          'I': 113.08406,
          'K': 128.09496,
          'L': 113.08406,
          'M': 131.04049,
          'N': 114.04293,
          'P': 97.05276,
          'Q': 128.05858,
          'R': 156.10111,
          'S': 87.03203,
          'T': 101.04768,
          'V': 99.06841,
          'W': 186.07931,
          'Y': 163.06333}
# input peptide (no alphabet argument: Bio.Alphabet was removed in Biopython 1.78)
import Bio.Seq as BS
protein = BS.Seq('MAAVILESIFLKRSQQKKKTSPLNFKKRLFLLTVHKLSYYEYDFERGRRGSKKGSIDVEKITCVETVVPEKNPPPERQIPRRGEESSEMEQISIIERFPYPFQVVYDEGPLYVFSPTEELRKRWIHQLKNVIRYNSDLVQKYHPCFWIDGQYLCCSQTAKNAMGCQILENRNGSLKPGSSHRKTKKPLPPTPEEDQILKKPLPPEPAAAPVSTSELKKVVALYDYMPMNANDLQLRKGDEYFILEESNLPWWRARDKNGQEGYIPSNYVTEAEDSIEMYEWYSKHMTRSQAEQLLKQEGKEGGFIVRDSSKAGKYTVSVFAKSTGDPQGVIRHYVVCSTPQSQYYLAEKHLFSTIPELINYHQHNSAGLISRLKYPVSQQNKNAPSTAGLGYGSWEIDPKDLTFLKELGTGQFGVVKYGKWRGQYDVAIKMIKEGSMSEDEFIEEAKVMMNLSHEKLVQLYGVCTKQRPIFIITEYMANGCLLNYLREMRHRFQTQQLLEMCKDVCEAMEYLESKQFLHRDLAARNCLVNDQGVVKVSDFGLSRYVLDDEYTSSVGSKFPVRWSPPEVLMYSKFSSKSDIWAFGVLMWEIYSLGKMPYERFTNSETAEHIAQGLRLYRPHLASEKVYTIMYSCWHEKADERPTFKILLSNILDVMDEES')
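
A minimal sketch of the "write your own function" route. It repeats the residue masses from the table so that it runs on its own. The function name `peptide_mass` and its `add_water` parameter are made up for this illustration. Residue masses exclude the water molecule of the free peptide termini, so by default the sketch adds the monoisotopic mass of water (about 18.01056); whether that term belongs in the answer depends on the convention the exercise expects, so treat it as an assumption. Biopython's `Bio.SeqUtils.molecular_weight` with `seq_type="protein"` and `monoisotopic=True` should give a comparable figure.

```python
# monoisotopic residue masses, copied from the table in this exercise
MASSES = {
    "A": 71.03711, "C": 103.00919, "D": 115.02694, "E": 129.04259,
    "F": 147.06841, "G": 57.02146, "H": 137.05891, "I": 113.08406,
    "K": 128.09496, "L": 113.08406, "M": 131.04049, "N": 114.04293,
    "P": 97.05276, "Q": 128.05858, "R": 156.10111, "S": 87.03203,
    "T": 101.04768, "V": 99.06841, "W": 186.07931, "Y": 163.06333,
}

WATER = 18.01056  # monoisotopic mass of H2O (assumed convention, see above)


def peptide_mass(sequence, add_water=True):
    """Sum the monoisotopic residue masses of a protein sequence.

    Raises ValueError for letters that are not in the table.
    """
    total = 0.0
    for residue in str(sequence).upper():
        if residue not in MASSES:
            raise ValueError(f"unknown residue: {residue!r}")
        total += MASSES[residue]
    return total + WATER if add_water else total


# Example use with the Seq object defined in the cell above:
# print(round(peptide_mass(protein), 5))
```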



In [ ]:

Exercise 2.7 Query history

Fetch the sequences of those NCBI Protein database entries that concern Saccharomyces boulardii and save them to a file in the FASTA format. Use the history feature of the NCBI service to fetch 100 sequences per query.
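
A sketch of the history-based batching, following the pattern in the Biopython tutorial: `esearch` is called with `usehistory="y"`, and the returned `WebEnv` and `QueryKey` are passed to repeated `efetch` calls. The function names, the output file name, the query term and the e-mail address are all placeholders chosen for this illustration.

```python
def batch_starts(total, batch_size=100):
    """Return the retstart offsets needed to cover `total` records."""
    return list(range(0, total, batch_size))


def fetch_fasta_with_history(term, email, out_path, batch_size=100):
    """Save all Protein hits for `term` as FASTA, batch_size records per request.

    Requires network access and Biopython. Returns the number of hits.
    """
    from Bio import Entrez

    Entrez.email = email
    handle = Entrez.esearch(db="protein", term=term, usehistory="y")
    result = Entrez.read(handle)
    handle.close()

    total = int(result["Count"])
    with open(out_path, "w") as out:
        for start in batch_starts(total, batch_size):
            # the history server remembers the search; only the window moves
            handle = Entrez.efetch(
                db="protein", rettype="fasta", retmode="text",
                retstart=start, retmax=batch_size,
                webenv=result["WebEnv"], query_key=result["QueryKey"],
            )
            out.write(handle.read())
            handle.close()
    return total


# Example use (not run here, needs network access):
# fetch_fasta_with_history('"Saccharomyces boulardii"[Organism]',
#                          "you@example.org", "boulardii.fasta")
```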



In [ ]: