For some analyses, it is sufficient to use only the is_a definitions given in the Gene Ontology. However, is_a edges do not capture every link between terms: GO also defines relationships such as part_of and regulates, and some analyses need them. For this reason, GOATOOLS can optionally load these relationship definitions as well.
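To illustrate why this matters, here is a minimal sketch on a hand-made toy ontology (the term names and the `TOY_EDGES` and `ancestors` names are hypothetical, not GO data or the GOATOOLS API). It collects all ancestors of a term by following only is_a edges, then again by also following the regulates-type edges, and shows which term is only reachable through a relationship.

```python
# Hypothetical mini-ontology (NOT real GO data): each term maps to its
# parents, grouped by relationship type.
TOY_EDGES = {
    "reg_growth":     {"is_a": {"reg_process"}, "regulates": {"growth"}},
    "pos_reg_growth": {"is_a": {"reg_growth"}, "positively_regulates": {"growth"}},
    "reg_process":    {"is_a": {"process"}},
    "growth":         {"is_a": {"process"}},
    "process":        {},
}

def ancestors(term, rel_types):
    """Return every term reachable from `term` via the given relationship types."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for rel in rel_types:
            for parent in TOY_EDGES[current].get(rel, set()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
    return seen

is_a_only = ancestors("pos_reg_growth", ["is_a"])
with_regulates = ancestors(
    "pos_reg_growth", ["is_a", "regulates", "positively_regulates"]
)
print(sorted(is_a_only))                    # ['process', 'reg_growth', 'reg_process']
print(sorted(with_regulates - is_a_only))   # ['growth']
```

In this toy example, the regulated process ("growth") never appears among the is_a ancestors; it is only found once the regulates-type edges are followed.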
In [1]:
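import os
from goatools.obo_parser import GODag

# Download the basic GO file once (the ! line is an IPython shell escape)
if not os.path.exists('go-basic.obo'):
    !wget http://geneontology.org/ontology/go-basic.obo

# optional_attrs=['relationship'] also loads the non-is_a relationships
go = GODag('go-basic.obo', optional_attrs=['relationship'])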
Now, when you look at an individual term that has relationships defined in GO, those relationships are listed in a nested manner. As an example, look at GO:1901990, which has a single regulates relationship.
In [2]:
eg_term = go['GO:1901990']
In [3]:
eg_term
The relationships are stored as a dictionary in the term's relationship attribute, keyed by relationship type (for example, 'regulates'); each value holds the GO terms reached through that relationship.
In [4]:
print(eg_term.relationship.keys())
In [5]:
print(eg_term.relationship['regulates'])
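The shape of that dictionary can be sketched without downloading GO. The `ToyTerm` class below is a hypothetical stand-in that only mimics the `id`, `name` and `relationship` attributes used in this notebook; it assumes, as the cells above suggest, that each dictionary value is a set of term objects. The helper `describe_relationships` (also hypothetical) flattens the nested structure into (type, target id) pairs.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # keep identity-based hashing so terms can live in sets
class ToyTerm:
    """Hypothetical stand-in for a GO term; not the goatools GOTerm class."""
    id: str
    name: str
    relationship: dict = field(default_factory=dict)

target = ToyTerm("GO:TARGET", "some process")
regulator = ToyTerm("GO:REG", "regulation of some process",
                    relationship={"regulates": {target}})

def describe_relationships(term):
    """Flatten term.relationship into sorted (type, target id) pairs."""
    return sorted(
        (rel_type, other.id)
        for rel_type, others in term.relationship.items()
        for other in others
    )

print(describe_relationships(regulator))  # [('regulates', 'GO:TARGET')]
```

Because the values hold term objects rather than IDs, a membership test such as `target in regulator.relationship['regulates']` works directly, which is what the search further down relies on.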
One example use case for relationships is to look for all terms that regulate pseudohyphal growth (GO:0007124), which is defined as:
A pattern of cell growth that occurs in conditions of nitrogen limitation and abundant fermentable carbon source. Cells become elongated, switch to a unipolar budding pattern, remain physically attached to each other, and invade the growth substrate.
Source: https://www.ebi.ac.uk/QuickGO/GTerm?id=GO:0007124#term=info&info=1
In [6]:
term_of_interest = go['GO:0007124']
First, find the relationship types which contain "regulates":
In [7]:
regulates = frozenset([typedef
                       for typedef in go.typedefs.keys()
                       if 'regulates' in typedef])
print(regulates)
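The substring filter is easy to check offline. The typedef names below are an illustrative list, not necessarily the exact contents of `go.typedefs`; the point is that matching on "regulates" picks up the plain, positive and negative variants, and that the resulting frozenset can be intersected with a term's relationship keys, as the next cell does. `term_relationship_keys` is a hypothetical example term.

```python
# Illustrative typedef names; in the notebook they come from go.typedefs.keys()
typedef_names = ["part_of", "regulates", "negatively_regulates",
                 "positively_regulates", "has_part"]

# Keep every relationship type whose name contains "regulates"
regulates = frozenset(name for name in typedef_names if "regulates" in name)
print(sorted(regulates))
# ['negatively_regulates', 'positively_regulates', 'regulates']

# Hypothetical term with one part_of and one positively_regulates relationship
term_relationship_keys = {"part_of", "positively_regulates"}
print(sorted(regulates.intersection(term_relationship_keys)))
# ['positively_regulates']
```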
Now, search through all terms in the DAG for those with a relationship of one of these types that points to the term of interest, and collect them in a dictionary keyed by the type of regulation.
In [8]:
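from collections import defaultdict

regulating_terms = defaultdict(list)
for goid, t in go.items():
    # GODag also maps alternate IDs to their main term; skip those aliases
    # so that each term is visited (and reported) only once
    if goid != t.id:
        continue
    if hasattr(t, 'relationship'):
        for typedef in regulates.intersection(t.relationship.keys()):
            if term_of_interest in t.relationship[typedef]:
                # e.g. 'positively_regulates' -> 'positively_regulated_by'
                regulating_terms['{:s}d_by'.format(typedef[:-1])].append(t)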
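The search above inverts forward edges ("X regulates Y") into a reverse lookup ("Y is regulated by X"). Here is a minimal sketch of the same inversion on hypothetical toy data, using plain IDs instead of GO term objects so it runs on its own; `toy_relationships`, `passive_label` and `find_regulators` are illustrative names, not part of GOATOOLS.

```python
from collections import defaultdict

# Hypothetical toy data: term id -> {relationship type: set of target ids}
toy_relationships = {
    "T1": {"regulates": {"TARGET"}},
    "T2": {"positively_regulates": {"TARGET"}},
    "T3": {"negatively_regulates": {"TARGET"}, "part_of": {"OTHER"}},
    "T4": {"regulates": {"OTHER"}},
}
regulates = frozenset({"regulates", "positively_regulates", "negatively_regulates"})

def passive_label(typedef):
    """'positively_regulates' -> 'positively_regulated_by' (drop 's', add 'd_by')."""
    return typedef[:-1] + "d_by"

def find_regulators(target, relationships):
    """Invert forward 'X regulates target' edges into label -> [regulator ids]."""
    found = defaultdict(list)
    for term_id, rels in sorted(relationships.items()):
        # Only look at the regulation-type relationships this term actually has
        for typedef in sorted(regulates.intersection(rels)):
            if target in rels[typedef]:
                found[passive_label(typedef)].append(term_id)
    return dict(found)

print(find_regulators("TARGET", toy_relationships))
# {'regulated_by': ['T1'], 'positively_regulated_by': ['T2'], 'negatively_regulated_by': ['T3']}
```

T4 is left out because its regulates edge points at a different term, mirroring the `term_of_interest in t.relationship[typedef]` check in the notebook cell.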
Now regulating_terms contains the GO terms that regulate pseudohyphal growth, grouped by type of regulation.
In [9]:
print('{:s} ({:s}) is:'.format(term_of_interest.name, term_of_interest.id))
for regulate_desc, goterms in regulating_terms.items():
    print('\n - {:s}:'.format(regulate_desc))
    for goterm in goterms:
        print('   -- {:s} {:s}'.format(goterm.id, goterm.name))
        # Also list the is_a children of each regulating term, indented further
        for gochild in goterm.children:
            print('      --- {:s} {:s}'.format(gochild.id, gochild.name))