a) Search the NCBI Protein database for the protein name "p53" with human as the organism, and print the number of hits.

b) Print the ids of those proteins (one per line).
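
A minimal sketch for Exercise 2.1, assuming Biopython's Bio.Entrez module and network access to NCBI. The query uses the [Protein Name] and [Organism] search field tags, which we assume correspond to what the exercise means by "protein name" and "organism". The helper names (build_query, search_protein, hit_count_and_ids, main) are hypothetical, and Biopython is imported inside the network functions so the parsing helpers can be tried without it.

```python
def build_query(protein_name, organism):
    """Combine a protein name and an organism into one Entrez query string."""
    # Field tags restrict each term to one search field (assumed tags, see above)
    return f"{protein_name}[Protein Name] AND {organism}[Organism]"


def search_protein(term, email, retmax=20):
    """Run ESearch on the NCBI Protein database and return the parsed result."""
    from Bio import Entrez  # lazy import: only this function talks to NCBI

    Entrez.email = email  # NCBI asks every client to identify itself
    handle = Entrez.esearch(db="protein", term=term, retmax=retmax)
    try:
        return Entrez.read(handle)
    finally:
        handle.close()


def hit_count_and_ids(result):
    """Extract the total hit count (as an int) and the returned ids."""
    return int(result["Count"]), list(result["IdList"])


def main(email):
    term = build_query("p53", "human")
    count, _ = hit_count_and_ids(search_protein(term, email))
    print(count)  # part a)
    # ESearch returns only `retmax` ids (20 by default), so ask again for all of them
    _, ids = hit_count_and_ids(search_protein(term, email, retmax=count))
    print("\n".join(ids))  # part b): one id per line
```

Call main("you@example.org") with your own address. For very large hit counts, the history-based paging of Exercise 2.7 is the safer route.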

Exercise 2.2 Database entry

a) Fetch the entry NP_004955 from the NCBI Protein database and save it to a file named NP_004955.gb. (Use the "gb" format.)

b) Load the file using Bio.SeqIO and find the following information about the protein:

  • id
  • name
  • description
  • organism
  • number of features


Exercise 2.3 Phosphorylation sites

a) How many phosphorylation sites does the entry NP_004955 contain? (Use the file you acquired in the previous exercise.)

b) Print the phosphorylated amino acids and their positions (one per line).
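
A sketch for Exercise 2.3, assuming (as we understand RefSeq protein records to do) that each phosphorylation site is annotated as a "Site" feature whose site_type qualifier is "phosphorylation". It counts residues rather than features and converts Biopython's 0-based positions to the usual 1-based numbering. The function names are hypothetical.

```python
def phosphorylation_sites(record):
    """Return (residue, 1-based position) pairs for phosphorylation Site features."""
    sites = []
    for feature in record.features:
        if feature.type != "Site":
            continue
        # GenPept stores the kind of site in the site_type qualifier (a list of strings)
        if "phosphorylation" not in feature.qualifiers.get("site_type", []):
            continue
        # .parts covers both simple and compound (e.g. order(...)) locations
        for part in feature.location.parts:
            for pos in range(int(part.start), int(part.end)):
                sites.append((record.seq[pos], pos + 1))  # 0-based -> 1-based
    return sites


def main(path="NP_004955.gb"):
    from Bio import SeqIO

    sites = phosphorylation_sites(SeqIO.read(path, "genbank"))
    print(len(sites))  # part a)
    for residue, position in sites:  # part b)
        print(residue, position)
```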

Exercise 2.4 Gene names

The entry 223468685 in the NCBI Nucleotide database contains several genes. Print their names in order of appearance.
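
A minimal sketch for Exercise 2.4, assuming Biopython and network access. It takes gene names from the gene qualifier of "gene" features and keeps the order in which they appear in the record; reading only "gene" features (not CDS or mRNA) is our assumption about what counts as a gene here. The function names are hypothetical.

```python
def fetch_nucleotide(entry_id, email):
    """Fetch one NCBI Nucleotide entry in GenBank format and parse it."""
    from Bio import Entrez, SeqIO  # lazy import: only this function needs the network

    Entrez.email = email
    handle = Entrez.efetch(db="nucleotide", id=entry_id, rettype="gb", retmode="text")
    try:
        return SeqIO.read(handle, "genbank")
    finally:
        handle.close()


def gene_names(record):
    """Return the gene qualifier of every 'gene' feature, in file order."""
    names = []
    for feature in record.features:
        if feature.type == "gene" and "gene" in feature.qualifiers:
            name = feature.qualifiers["gene"][0]
            if name not in names:  # skip a name that was already listed
                names.append(name)
    return names


def main(email):
    for name in gene_names(fetch_nucleotide("223468685", email)):
        print(name)
```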

Exercise 2.5 Protein sequence

Print the protein sequence encoded by the gene named "BTK" in the entry 223468685 in the NCBI Nucleotide database.
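
A sketch for Exercise 2.5, assuming Biopython and network access. GenBank CDS features normally carry the encoded protein in their translation qualifier, so the sketch returns that string for the first CDS whose gene qualifier is "BTK" and returns None when no such CDS (or no translation) is annotated. The function names are hypothetical.

```python
def protein_of_gene(record, gene):
    """Return the translation of the first CDS whose gene qualifier matches `gene`."""
    for feature in record.features:
        if feature.type == "CDS" and gene in feature.qualifiers.get("gene", []):
            # GenBank CDS features normally carry the protein sequence here
            translation = feature.qualifiers.get("translation")
            return translation[0] if translation else None
    return None  # no CDS annotated with that gene name


def main(email):
    from Bio import Entrez, SeqIO  # lazy import: only main() needs the network

    Entrez.email = email
    handle = Entrez.efetch(db="nucleotide", id="223468685", rettype="gb", retmode="text")
    record = SeqIO.read(handle, "genbank")
    handle.close()
    print(protein_of_gene(record, "BTK"))
```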

Exercise 2.6 Protein weight

Calculate and print the molecular weight of the protein encoded by the BTK gene. You can use a function available in Biopython or write your own function. The monoisotopic masses of the amino acid residues are given in the table below.

Amino acid residue    Monoisotopic mass (Da)
A                      71.03711
C                     103.00919
D                     115.02694
E                     129.04259
F                     147.06841
G                      57.02146
H                     137.05891
I                     113.08406
K                     128.09496
L                     113.08406
M                     131.04049
N                     114.04293
P                      97.05276
Q                     128.05858
R                     156.10111
S                      87.03203
T                     101.04768
V                      99.06841
W                     186.07931
Y                     163.06333

# monoisotopic masses of amino acid residues
masses = {'A': 71.03711,
          'C': 103.00919,
          'D': 115.02694,
          'E': 129.04259,
          'F': 147.06841,
          'G': 57.02146,
          'H': 137.05891,
          'I': 113.08406,
          'K': 128.09496,
          'L': 113.08406,
          'M': 131.04049,
          'N': 114.04293,
          'P': 97.05276,
          'Q': 128.05858,
          'R': 156.10111,
          'S': 87.03203,
          'T': 101.04768,
          'V': 99.06841,
          'W': 186.07931,
          'Y': 163.06333}
# input peptide: the BTK protein sequence
# (Bio.Alphabet was removed in Biopython 1.78, so Seq takes no alphabet argument)
from Bio.Seq import Seq
protein = Seq('MAAVILESIFLKRSQQKKKTSPLNFKKRLFLLTVHKLSYYEYDFERGRRGSKKGSIDVEKITCVETVVPEKNPPPERQIPRRGEESSEMEQISIIERFPYPFQVVYDEGPLYVFSPTEELRKRWIHQLKNVIRYNSDLVQKYHPCFWIDGQYLCCSQTAKNAMGCQILENRNGSLKPGSSHRKTKKPLPPTPEEDQILKKPLPPEPAAAPVSTSELKKVVALYDYMPMNANDLQLRKGDEYFILEESNLPWWRARDKNGQEGYIPSNYVTEAEDSIEMYEWYSKHMTRSQAEQLLKQEGKEGGFIVRDSSKAGKYTVSVFAKSTGDPQGVIRHYVVCSTPQSQYYLAEKHLFSTIPELINYHQHNSAGLISRLKYPVSQQNKNAPSTAGLGYGSWEIDPKDLTFLKELGTGQFGVVKYGKWRGQYDVAIKMIKEGSMSEDEFIEEAKVMMNLSHEKLVQLYGVCTKQRPIFIITEYMANGCLLNYLREMRHRFQTQQLLEMCKDVCEAMEYLESKQFLHRDLAARNCLVNDQGVVKVSDFGLSRYVLDDEYTSSVGSKFPVRWSPPEVLMYSKFSSKSDIWAFGVLMWEIYSLGKMPYERFTNSETAEHIAQGLRLYRPHLASEKVYTIMYSCWHEKADERPTFKILLSNILDVMDEES')
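
A hand-written sketch for Exercise 2.6. The table gives residue masses, i.e. amino acids that have each lost one water when joined into a chain, so the monoisotopic mass of the intact protein is the sum of its residue masses plus one water molecule (18.01056 Da) for the free N- and C-termini. The dictionary is repeated here so the block is self-contained; monoisotopic_mass is a hypothetical name, and sequences containing letters outside the table (such as X) raise a KeyError.

```python
# Monoisotopic residue masses (Da), copied from the table above
RESIDUE_MASS = {
    'A': 71.03711, 'C': 103.00919, 'D': 115.02694, 'E': 129.04259, 'F': 147.06841,
    'G': 57.02146, 'H': 137.05891, 'I': 113.08406, 'K': 128.09496, 'L': 113.08406,
    'M': 131.04049, 'N': 114.04293, 'P': 97.05276, 'Q': 128.05858, 'R': 156.10111,
    'S': 87.03203, 'T': 101.04768, 'V': 99.06841, 'W': 186.07931, 'Y': 163.06333,
}
WATER = 18.01056  # monoisotopic mass of H2O for the free chain termini


def monoisotopic_mass(sequence, masses=RESIDUE_MASS):
    """Sum the residue masses of a protein sequence and add one water."""
    # str() lets this accept both plain strings and Bio.Seq.Seq objects
    return sum(masses[aa] for aa in str(sequence).upper()) + WATER
```

With the protein defined in the cell above, print(round(monoisotopic_mass(protein), 5)) prints the result. As a cross-check, Bio.SeqUtils.molecular_weight(protein, seq_type="protein", monoisotopic=True) should give a closely matching value.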

Exercise 2.7 Query history

Fetch the sequences of all NCBI Protein database entries for Saccharomyces boulardii and save them to a file in FASTA format. Use the history feature of the NCBI E-utilities to fetch the sequences in batches of 100 per query.
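
A sketch for Exercise 2.7, assuming Biopython and network access. ESearch is run with usehistory="y" so NCBI keeps the result set on its history server; the returned WebEnv and QueryKey are then passed to EFetch together with retstart/retmax to page through the hits 100 at a time. The organism query "Saccharomyces boulardii[Organism]", the output file name and the function names are our assumptions.

```python
def batch_starts(count, batch_size=100):
    """Offsets (retstart values) needed to page through `count` hits."""
    return list(range(0, count, batch_size))


def download_fasta(term, filename, email, batch_size=100):
    """Search NCBI Protein with the history server and save all hits as FASTA."""
    from Bio import Entrez  # lazy import: only this function needs the network

    Entrez.email = email
    # Keep the result set on NCBI's history server instead of downloading all ids
    handle = Entrez.esearch(db="protein", term=term, usehistory="y")
    search = Entrez.read(handle)
    handle.close()
    count = int(search["Count"])
    webenv, query_key = search["WebEnv"], search["QueryKey"]

    with open(filename, "w") as out:
        for start in batch_starts(count, batch_size):
            fetch = Entrez.efetch(db="protein", rettype="fasta", retmode="text",
                                  retstart=start, retmax=batch_size,
                                  webenv=webenv, query_key=query_key)
            out.write(fetch.read())
            fetch.close()
    return count


def main(email):
    hits = download_fasta("Saccharomyces boulardii[Organism]",
                          "s_boulardii.fasta", email)  # hypothetical file name
    print(hits, "hits")
```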
