This tutorial shows how to read an OBO-formatted ontology into networkx using the obonet package. The notebook is written for Python 3.6, but obonet itself works with Python 3.4+.
In [1]:
import networkx
import obonet
Learn more about the Gene Ontology (GO) downloads here. Note how we can read the OBO file from a URL: obonet.read_obo automatically detects whether it is passed a local path, a URL, or an open file. In addition, obonet.read_obo automatically decompresses files ending in .gz, .bz2, or .xz.
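To make the extension-based decompression concrete, here is a minimal standard-library sketch of the idea. It is not obonet's actual code: the `OPENERS` mapping and the `open_maybe_compressed` helper are hypothetical names, and the file contents are a toy OBO-like snippet.

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Hypothetical mapping from file extension to an opener. It mirrors the
# behaviour described above; it is not obonet's actual implementation.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


def open_maybe_compressed(path):
    """Open a text file, decompressing it when the extension calls for it."""
    extension = os.path.splitext(path)[1]
    opener = OPENERS.get(extension, open)  # fall back to the built-in open
    return opener(path, mode="rt", encoding="utf-8")


# Round-trip a tiny OBO-like snippet through a gzip-compressed file.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "toy.obo.gz")
    with gzip.open(path, mode="wt", encoding="utf-8") as handle:
        handle.write("[Term]\nid: TOY:0000001\nname: toy term\n")
    with open_maybe_compressed(path) as handle:
        first_line = handle.readline().strip()
```

With obonet itself none of this is needed: passing the path or URL to obonet.read_obo is enough.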
In [2]:
%%time
url = 'http://purl.obolibrary.org/obo/go/go-basic.obo'
graph = obonet.read_obo(url)
In [3]:
# Number of nodes
len(graph)
Out[3]:
In [4]:
# Number of edges
graph.number_of_edges()
Out[4]:
In [5]:
# Check if the ontology is a DAG
networkx.is_directed_acyclic_graph(graph)
Out[5]:
In [6]:
# Retrieve properties of phagocytosis
graph.nodes['GO:0006909']
Out[6]:
In [7]:
# Retrieve properties of pilus shaft
graph.nodes['GO:0009418']
Out[7]:
Note that for some OBO ontologies, some nodes only have an id and not a name (see issue).
In [8]:
id_to_name = {id_: data.get('name') for id_, data in graph.nodes(data=True)}
name_to_id = {data['name']: id_ for id_, data in graph.nodes(data=True) if 'name' in data}
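The two comprehensions above treat nameless nodes differently, which is easiest to see on toy data. The following sketch uses hypothetical node records (the `TOY:` identifiers are made up) in the same `(id, data)` shape that `graph.nodes(data=True)` yields.

```python
# Toy node records (hypothetical): the second one has no 'name' attribute.
nodes = [
    ("TOY:0000001", {"name": "toy process"}),
    ("TOY:0000002", {}),
]

# .get() returns None for a nameless node instead of raising KeyError.
id_to_name = {id_: data.get("name") for id_, data in nodes}

# The `if 'name' in data` filter leaves nameless nodes out of the mapping.
name_to_id = {data["name"]: id_ for id_, data in nodes if "name" in data}
```

So every id appears in `id_to_name`, possibly mapped to `None`, while `name_to_id` only contains nodes that actually have a name.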
In [9]:
# Get the name for GO:0042552
id_to_name['GO:0042552']
Out[9]:
In [10]:
# Get the id for myelination
name_to_id['myelination']
Out[10]:
In [11]:
# Find edges to parent terms
node = name_to_id['pilus part']
for child, parent, key in graph.out_edges(node, keys=True):
    print(f'• {id_to_name[child]} ⟶ {key} ⟶ {id_to_name[parent]}')
In [12]:
# Find edges to child terms
node = name_to_id['pilus part']
for parent, child, key in graph.in_edges(node, keys=True):
    print(f'• {id_to_name[child]} ⟵ {key} ⟵ {id_to_name[parent]}')
In [13]:
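# Find all superterms of GO:0042552
# (edges point from child to parent, so superterms are graph descendants)
sorted(id_to_name[superterm] for superterm in networkx.descendants(graph, 'GO:0042552'))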
Out[13]:
In [14]:
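# Find all subterms of GO:0042552
# (edges point from child to parent, so subterms are graph ancestors)
sorted(id_to_name[subterm] for subterm in networkx.ancestors(graph, 'GO:0042552'))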
Out[14]:
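It can look backwards that networkx.descendants returns superterms and networkx.ancestors returns subterms. The reason is that the edges run from child to parent, as the out_edges and in_edges cells show. The plain-Python sketch below illustrates this on three hypothetical terms; the `reachable` helper is an illustrative breadth-first traversal, not networkx's implementation.

```python
from collections import deque

# Toy child -> parent edges (hypothetical terms), the direction used above.
parents = {
    "term C": ["term B"],
    "term B": ["term A"],
    "term A": [],
}


def reachable(edges, start):
    """Return every node reachable from start by following edges forward."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in edges.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Reverse the edges so that we can also walk from a term to its subterms.
children = {node: [] for node in parents}
for child, node_parents in parents.items():
    for parent in node_parents:
        children[parent].append(child)

superterms = reachable(parents, "term B")  # forward walk, like descendants
subterms = reachable(children, "term B")   # backward walk, like ancestors
```

Here the forward walk from "term B" reaches only "term A" (its superterm), and the backward walk reaches only "term C" (its subterm).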
In [15]:
# Find all paths from 'starch binding' up to the root term 'molecular_function'
paths = networkx.all_simple_paths(
    graph,
    source=name_to_id['starch binding'],
    target=name_to_id['molecular_function']
)
for path in paths:
    print('•', ' ⟶ '.join(id_to_name[node] for node in path))
In [16]:
# Graph-level attributes, including metadata from the OBO file header
graph.graph
Out[16]: