Exercise 3.1 UniProt entry

a) Download the entry P0A183 from UniProt in XML format and save it to a file named P0A183.xml.

b) Print the following information about P0A183:

  • name
  • peptide length
  • number of $\alpha$-helices
  • number of metal ion-binding sites
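
One possible starting point for part b) is sketched below with the standard library's ElementTree. It assumes the file uses the namespace http://uniprot.org/uniprot and that helices and metal-binding sites appear as feature elements with the type values "helix" and "metal ion-binding site". The exact type strings may differ between UniProt releases, so check them against the downloaded file. The function name summarize_entry and the SAMPLE document are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for UniProt XML files; verify it in the downloaded file.
NS = {"u": "http://uniprot.org/uniprot"}


def summarize_entry(xml_data, site_type="metal ion-binding site"):
    """Return (name, length, helix count, site count) for the first entry."""
    root = ET.fromstring(xml_data)
    entry = root.find("u:entry", NS)
    name = entry.findtext("u:name", namespaces=NS)
    length = int(entry.find("u:sequence", NS).get("length"))
    features = entry.findall("u:feature", NS)
    helices = sum(1 for f in features if f.get("type") == "helix")
    # The feature type string is an assumption and may vary by release.
    sites = sum(1 for f in features if f.get("type") == site_type)
    return name, length, helices, sites


# Tiny made-up document in the same shape, only to exercise the function.
SAMPLE = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <name>DEMO_ENTRY</name>
    <feature type="helix"/>
    <feature type="helix"/>
    <feature type="metal ion-binding site"/>
    <sequence length="5">MKTAY</sequence>
  </entry>
</uniprot>"""

print(summarize_entry(SAMPLE))
```

For the real entry, the same function could be called on the bytes of the saved file, for example summarize_entry(open("P0A183.xml", "rb").read()).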

In [ ]:

Exercise 3.2 UniProt ids to FASTA

a) Manually create a file containing the UniProt ids below, one id per line. (You can use Notepad, for example.)

P00915
P00918
P43166
P07451
Q8N1Q1
P35219
P35218
Q9Y2D0
P23280
O75493
Q9NS85
Q16790
O43570
Q9ULX7
P22748
Q99N23

b) Write a program that reads the file and produces a FASTA file containing the sequences of the corresponding proteins.
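
A minimal sketch of one way to structure part b) follows. It assumes that a single entry can be fetched as FASTA from a URL of the form https://rest.uniprot.org/uniprotkb/<id>.fasta. That pattern is an assumption about the current UniProt REST service and should be checked in a browser first. The helper names read_ids, fasta_url, download and write_fasta are made up for this example.

```python
from urllib.request import urlopen


def read_ids(path):
    """Read one id per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def fasta_url(uniprot_id):
    # Assumed URL pattern of the UniProt REST service; verify before use.
    return f"https://rest.uniprot.org/uniprotkb/{uniprot_id}.fasta"


def download(url):
    """Fetch a URL and return its body as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


def write_fasta(ids, out_path, fetch=download):
    """Write the FASTA record of every id into one output file."""
    with open(out_path, "w") as out:
        for uniprot_id in ids:
            record = fetch(fasta_url(uniprot_id))
            # Make sure consecutive records are separated by a newline.
            out.write(record if record.endswith("\n") else record + "\n")
```

With the ids saved in a file such as ids.txt (a hypothetical name), write_fasta(read_ids("ids.txt"), "proteins.fasta") would perform the downloads. The fetch parameter exists only so that the function can be tried with a stand-in instead of the network.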


In [ ]:

Exercise 3.3 Summary of found entries

a) Search UniProt for manually annotated entries that concern proteins produced by genes named "merR". Fetch and print the following information in a tabular format.

  • entry name
  • organism
  • peptide length

b) Parse the table into a dictionary such that the keys are entry names and the values are named tuples containing the fetched data.
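
The parsing step in part b) could look roughly like the sketch below. It assumes the table was fetched as tab-separated text with a header row and the columns in the order entry name, organism, length. The rows in SAMPLE are invented placeholders, not real UniProt data, and the names Entry and parse_table are chosen for this example only.

```python
import csv
import io
from collections import namedtuple

# One record per table row; field names are chosen for this sketch.
Entry = namedtuple("Entry", ["entry_name", "organism", "length"])


def parse_table(tsv_text):
    """Map entry name -> Entry for a tab-separated table with a header row."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(reader)  # skip the header row
    table = {}
    for row in reader:
        if not row:  # ignore empty lines
            continue
        name, organism, length = row
        table[name] = Entry(name, organism, int(length))
    return table


# Invented placeholder rows in the assumed column order.
SAMPLE = (
    "Entry Name\tOrganism\tLength\n"
    "DEMO1_ORGA\tOrganism A\t144\n"
    "DEMO2_ORGB\tOrganism B\t132\n"
)

table = parse_table(SAMPLE)
print(table["DEMO1_ORGA"].length)
```

Named tuples keep the fields readable (entry.organism rather than entry[1]) while remaining as lightweight as plain tuples.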


In [ ]:

Exercise 3.4 The 10 longest human proteins in UniProt

a) Download the 10 longest manually annotated human proteins from UniProt and store them in a file in XML format. (You can first run the search in a browser to work out the sorting, and then turn the URL shown in the browser into a query in Python.)

b) Parse the file and print the ids and lengths of the fetched proteins.
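
Turning a browser search into a Python query mostly means assembling the URL parameters. The sketch below shows how that could be done with urllib.parse.urlencode. The endpoint and the parameter names and values (query, sort, size, format) are assumptions about the current UniProt REST service and should be compared with the URL seen in the browser. The function name longest_human_url is made up.

```python
from urllib.parse import urlencode


def longest_human_url(n=10):
    """Build a search URL for the n longest reviewed human entries."""
    params = {
        # Assumed query syntax: reviewed entries, human taxonomy id 9606.
        "query": "reviewed:true AND organism_id:9606",
        "sort": "length desc",  # assumed sort syntax: longest first
        "size": n,
        "format": "xml",
    }
    # urlencode takes care of escaping spaces and colons.
    return "https://rest.uniprot.org/uniprotkb/search?" + urlencode(params)


print(longest_human_url())
```

The resulting URL can then be downloaded and saved, and the file parsed in the same way as a single-entry XML file, looping over all entry elements instead of reading only the first one.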


In [ ]:

Exercise 3.5 UniProt id mapping service

Map the following UniProt protein ids to RefSeq Protein ids using the UniProt id mapping service. The abbreviations of the available databases (to be used as query arguments) are listed at https://www.uniprot.org/help/api_idmapping . Produce a dictionary where the keys are UniProt ids and the values are lists of RefSeq Protein ids. (Note that one UniProt entry may be mapped to several RefSeq Protein entries.)

CC106_HUMAN FBX3_HUMAN BTBDA_HUMAN C2C4B_HUMAN FRG2_HUMAN ANR12_HUMAN PP1R7_HUMAN GID8_HUMAN CC110_HUMAN FA50A_HUMAN FA60A_HUMAN CCNJ_HUMAN IN35_HUMAN BRAT1_HUMAN TCP1L_HUMAN NP11_HUMAN NP8_HUMAN NPIL2_HUMAN EP2A2_HUMAN TEX37_HUMAN CYTSB_HUMAN DCNP1_HUMAN RSBNL_HUMAN DET1_HUMAN CROC4_HUMAN SYAP1_HUMAN BTBD8_HUMAN ST3L2_HUMAN FRG2B_HUMAN PTMA_HUMAN F10C1_HUMAN DGCR6_HUMAN AHNK_HUMAN LDOC1_HUMAN SYC2L_HUMAN RNH2C_HUMAN TAC2N_HUMAN CDCA4_HUMAN INT8_HUMAN TDIF1_HUMAN SSF1_HUMAN CK001_HUMAN HESRG_HUMAN UBN2_HUMAN F192A_HUMAN FA50B_HUMAN NEMF_HUMAN FAM9B_HUMAN RED_HUMAN NP12_HUMAN MYCT1_HUMAN NARF_HUMAN SNURF_HUMAN ANKR7_HUMAN PCIF1_HUMAN ACRC_HUMAN F220A_HUMAN INT6_HUMAN C2C4A_HUMAN ST3L3_HUMAN FRG2C_HUMAN NUB1_HUMAN COMD5_HUMAN LUZP4_HUMAN HN1_HUMAN MAGAA_HUMAN CTF8A_HUMAN FA71B_HUMAN CCNJL_HUMAN KLDC2_HUMAN IMUP_HUMAN BUD31_HUMAN NP10_HUMAN TEX35_HUMAN HIRP3_HUMAN RSBN1_HUMAN AKIR1_HUMAN PWP1_HUMAN FA53A_HUMAN FAM9C_HUMAN INT4_HUMAN CA174_HUMAN ST3L1_HUMAN TEX19_HUMAN THYN1_HUMAN NARG2_HUMAN WDR13_HUMAN WDR75_HUMAN DGC6L_HUMAN GPS2_HUMAN AHNK2_HUMAN
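
Because one UniProt id may map to several RefSeq ids, the results have to be grouped rather than stored one to one. The sketch below shows only that grouping step, assuming the mapping results have already been reduced to (source id, target id) pairs. The ids in the example are invented placeholders, and group_mapping is a made-up helper name.

```python
from collections import defaultdict


def group_mapping(pairs):
    """Collect (source id, target id) pairs into {source id: [target ids]}."""
    grouped = defaultdict(list)
    for source, target in pairs:
        # Skip repeated pairs so that each target is listed once.
        if target not in grouped[source]:
            grouped[source].append(target)
    return dict(grouped)


# Invented placeholder pairs, not real mapping results.
pairs = [
    ("DEMO1_HUMAN", "REFSEQ_ID_1"),
    ("DEMO1_HUMAN", "REFSEQ_ID_2"),
    ("DEMO2_HUMAN", "REFSEQ_ID_3"),
    ("DEMO1_HUMAN", "REFSEQ_ID_1"),
]

print(group_mapping(pairs))
```

Ids for which the service returns no mapping will simply be absent from the dictionary, so it may be worth comparing its keys with the submitted list afterwards.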


In [ ]:

Exercise 3.6 SignalP and TMHMM services

Run the sequences from Exercise 3.2 through the SecretomeP and TMHMM services (preferably programmatically, but please don't crash those services). Collect the results into nicely structured dictionaries where the keys are ids and the values are predictions.
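
One simple way to avoid overloading a remote service is to submit the sequences one at a time with a pause in between. The sketch below illustrates that idea only. The predict argument stands for a hypothetical function that sends one sequence to a service and returns its prediction. How to talk to the actual services is not shown and has to be taken from their own documentation.

```python
import time


def run_throttled(sequences, predict, delay=1.0):
    """Call predict(sequence) for every id, pausing between calls.

    sequences: dict mapping id -> sequence string
    predict:   hypothetical callable that queries one service
    delay:     seconds to wait between two consecutive calls
    """
    results = {}
    for i, (seq_id, sequence) in enumerate(sequences.items()):
        if i > 0:
            time.sleep(delay)  # be gentle with the remote service
        results[seq_id] = predict(sequence)
    return results


# Stand-in predictor for a dry run: reports the sequence length only.
demo = run_throttled({"id1": "MKT", "id2": "MKTAY"}, predict=len, delay=0)
print(demo)
```

Running the helper once per service gives one dictionary per service, each keyed by id.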


In [ ]: