Download http://purl.obolibrary.org/obo/go.obo. Write a function that parses the file and returns a dictionary with GO term ids as keys and named tuples (id, name, namespace) as values. If necessary, reread the week 3 lecture notes to see how named tuples are created.
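A minimal sketch of such a parser follows. It assumes the standard OBO layout, in which each `[Term]` stanza contains one `id:`, one `name:` and one `namespace:` line. The names `GOTerm`, `parse_go_obo` and `read_go_obo` are hypothetical choices for illustration. Other stanza types such as `[Typedef]` are skipped, and obsolete terms are kept because annotations may still refer to them.

```python
from collections import namedtuple

# Named tuple holding the three fields requested by the exercise.
GOTerm = namedtuple("GOTerm", ["id", "name", "namespace"])


def parse_go_obo(lines):
    """Parse OBO-formatted lines into a dict mapping GO ids to GOTerm tuples.

    Only [Term] stanzas are kept; [Typedef] and other stanzas are skipped.
    """
    terms = {}
    current = None  # fields of the [Term] stanza being read, or None elsewhere

    def flush():
        # Store the finished stanza if it carried all three required fields.
        if current and all(k in current for k in ("id", "name", "namespace")):
            terms[current["id"]] = GOTerm(
                current["id"], current["name"], current["namespace"]
            )

    for raw in lines:
        line = raw.strip()
        if line.startswith("["):
            # A new stanza starts: save the previous one first.
            flush()
            current = {} if line == "[Term]" else None
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            # Keep only the first occurrence of each field we care about.
            if key in ("id", "name", "namespace") and key not in current:
                current[key] = value
    flush()  # the last stanza has no following "[" line
    return terms


def read_go_obo(path):
    """Open an OBO file and parse it line by line."""
    with open(path) as handle:
        return parse_go_obo(handle)
```

Calling `read_go_obo("go.obo")` on the downloaded file (the local path is an assumption) should give a dictionary whose values support `term.name` and `term.namespace` lookups. Accepting any iterable of lines keeps the parser easy to test without touching the file system.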
Print the human-friendly names of the GO terms within the cellular component namespace associated with the 10 longest human proteins (see Exercise 3.4). Use the function written in Exercise 4.1 to find which of the GO annotations belong to this namespace and to look up their human-friendly names. Your output could look like this:
P35555: microfibril, extracellular region, proteinaceous extracellular matrix, basement membrane, extracellular space, extracellular matrix, extracellular exosome
P50851: lysosome, endoplasmic reticulum, Golgi apparatus, plasma membrane, endomembrane system, membrane, integral component of membrane, cytoplasmic, membrane-bounded vesicle, extrinsic component of membrane
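The lookup can be sketched as below, assuming the Exercise 3.4 result is a dictionary from UniProt accession to a list of GO ids (a hypothetical shape; adapt it to your own data structure). `GOTerm` is redefined so the block runs on its own, `component_names` and `print_component_names` are hypothetical names, and GO ids absent from the parsed file are silently skipped. Note that `go.obo` spells the namespace `cellular_component`.

```python
from collections import namedtuple

# Same shape as the tuples produced by the Exercise 4.1 parser.
GOTerm = namedtuple("GOTerm", ["id", "name", "namespace"])


def component_names(annotations, go_terms, namespace="cellular_component"):
    """Map each protein id to the names of its GO terms in `namespace`.

    annotations: dict protein id -> iterable of GO ids (assumed shape)
    go_terms:    dict GO id -> GOTerm
    """
    result = {}
    for protein, go_ids in annotations.items():
        names = []
        for go_id in go_ids:
            term = go_terms.get(go_id)  # None if the id is not in go.obo
            # Keep first-seen order and drop duplicate names.
            if term is not None and term.namespace == namespace and term.name not in names:
                names.append(term.name)
        result[protein] = names
    return result


def print_component_names(annotations, go_terms):
    """Print one 'accession: name, name, ...' line per protein."""
    for protein, names in component_names(annotations, go_terms).items():
        print(f"{protein}: {', '.join(names)}")
```

Filtering on the `namespace` field of each named tuple keeps the parser generic: the same function answers the question for `biological_process` or `molecular_function` by changing one argument.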
Filter the hits obtained in Exercise 4.3 so that only HSPs with an E-value $<10^{-4}$ remain. Sort the hits by E-value in ascending order and print each E-value along with the corresponding hit id.
(If a hit has more than one HSP, only consider the one with the smallest E-value.)
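One way to do this is sketched below, under the assumption that the hits come from Biopython's `Bio.Blast.NCBIXML` parser, where each alignment exposes `hit_id` and `hsps` and each HSP exposes `expect`. The function names `significant_hits` and `print_significant_hits` are hypothetical. Taking the best HSP per hit and then applying the threshold selects the same hits as filtering all HSPs first, since a hit keeps at least one HSP exactly when its smallest E-value is below the cutoff.

```python
def significant_hits(alignments, threshold=1e-4):
    """Return (best E-value, hit id) pairs below `threshold`, sorted ascending.

    Each alignment must expose `.hit_id` and `.hsps`, and each HSP an
    `.expect` attribute (as in Biopython's NCBIXML records).
    """
    best = []
    for alignment in alignments:
        if not alignment.hsps:
            continue  # a hit without HSPs has no E-value to compare
        # Only the HSP with the smallest E-value counts for each hit.
        min_e = min(hsp.expect for hsp in alignment.hsps)
        if min_e < threshold:
            best.append((min_e, alignment.hit_id))
    # Tuples sort by E-value first; the hit id only breaks ties.
    best.sort()
    return best


def print_significant_hits(alignments, threshold=1e-4):
    """Print one tab-separated 'E-value  hit id' line per significant hit."""
    for evalue, hit_id in significant_hits(alignments, threshold):
        print(f"{evalue:.2e}\t{hit_id}")
```

With Biopython, the `alignments` attribute of the record returned by `NCBIXML.read(handle)` can be passed in directly; any objects with the same three attributes work as well.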
Print the following pieces of information for each HSP obtained in Exercise 4.4: