BEL to Natural Language

Author: Charles Tapley Hoyt

Estimated Run Time: 5 seconds

This notebook shows how the PyBEL-INDRA integration can be used to turn a BEL graph into natural language. Special thanks to John Bachman and Ben Gyori for all of their efforts in making this possible.

To view the interactive JavaScript output in this notebook, open it in the Jupyter NBViewer.

Imports


In [1]:
import sys
import time

import indra
import indra.util.get_version
import ndex2
import pybel

from indra.assemblers.english_assembler import EnglishAssembler
from indra.sources.bel.bel_api import process_pybel_graph

from pybel.examples import sialic_acid_graph
from pybel_tools.visualization import to_jupyter

Environment


In [2]:
print(sys.version)


3.6.3 (default, Oct  9 2017, 09:47:56) 
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]

In [3]:
print(time.asctime())


Thu Mar 15 13:55:44 2018

Dependencies


In [4]:
pybel.utils.get_version()


Out[4]:
'0.11.2-dev'

In [5]:
indra.util.get_version.get_version()


Out[5]:
"1.5.0-b'dcf2f45592f9c96b58622f42ba58ca342157488d'"

Data

The sialic acid graph bundled with PyBEL (pybel.examples.sialic_acid_graph) is used as an example.


In [6]:
to_jupyter(sialic_acid_graph)


Out[6]:

Conversion

The PyBEL BELGraph instance is converted to INDRA statements with the function process_pybel_graph, which returns an instance of PybelProcessor that stores the INDRA statements.


In [7]:
pbp = process_pybel_graph(sialic_acid_graph)


INFO: [2018-03-15 13:55:44] indra/pybel_processor - Unable to get identifier information for node: a(CHEBI:"sialic acid")
INFO: [2018-03-15 13:55:44] indra/pybel_processor - Unable to get identifier information for node: a(CHEBI:"sialic acid")
Unhandled namespace with identifier: CHEBI: sialic acid (a(CHEBI:"sialic acid"))
Unhandled namespace with identifier: CHEBI: sialic acid (a(CHEBI:"sialic acid"))

A list of INDRA statements is extracted from the BEL graph and stored in the field PybelProcessor.statements. Note that INDRA is built to consider mechanistic information, and therefore excludes most associative relationships.


In [8]:
stmts = pbp.statements
stmts


Out[8]:
[Phosphorylation(CD33(activity), CD33()),
 Activation(CD33(mods: (phosphorylation), activity), PTPN6(), phosphatase),
 Activation(CD33(mods: (phosphorylation), activity), PTPN11(), phosphatase),
 Inhibition(PTPN6(activity), SYK()),
 Inhibition(PTPN11(activity), SYK()),
 Activation(SYK(activity), TREM2()),
 Activation(SYK(activity), TYROBP())]
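
To see at a glance which mechanistic relationship types survived the conversion, the statement list can be tallied by class name. The following is a minimal sketch, not part of the original notebook: count_statement_types is a hypothetical helper, and the three empty classes are stand-ins that mirror the types in the output above so the example runs without INDRA installed. In the notebook itself, pbp.statements would be passed instead.

```python
from collections import Counter


def count_statement_types(statements):
    """Count statements by class name, e.g. 'Activation' or 'Inhibition'."""
    return Counter(type(stmt).__name__ for stmt in statements)


# Hypothetical stand-ins for the INDRA statement classes, used only so this
# sketch is self-contained. They mirror the types seen in Out[8].
class Phosphorylation:
    pass


class Activation:
    pass


class Inhibition:
    pass


# Same sequence of statement types as in Out[8].
example_statements = [
    Phosphorylation(),
    Activation(),
    Activation(),
    Inhibition(),
    Inhibition(),
    Activation(),
    Activation(),
]

counts = count_statement_types(example_statements)
for name, number in counts.most_common():
    print(name, number)
```

Because the helper only inspects class names, it should work unchanged on the real statement objects, assuming each statement is an instance of a class named after its relationship type.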

The list of INDRA statements is converted to plain English using the EnglishAssembler.


In [9]:
asm = EnglishAssembler(stmts)
# make_model() returns a single string, so no separator argument is needed
print(asm.make_model())


Active CD33 leads to the phosphorylation of CD33. Active phosphorylated CD33 activates PTPN6. Active phosphorylated CD33 activates PTPN11. Active PTPN6 inhibits SYK. Active PTPN11 inhibits SYK. Active SYK activates TREM2. Active SYK activates TYROBP.
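
The assembled text above arrives as one paragraph, which can be hard to scan. A minimal sketch of printing one sentence per line follows. It assumes the sentences are separated by a period followed by whitespace, as in the output above. The text is copied from that output so the example runs on its own, and split_sentences is a hypothetical helper rather than part of INDRA.

```python
import re

# Text copied verbatim from the EnglishAssembler output above.
text = (
    "Active CD33 leads to the phosphorylation of CD33. "
    "Active phosphorylated CD33 activates PTPN6. "
    "Active phosphorylated CD33 activates PTPN11. "
    "Active PTPN6 inhibits SYK. "
    "Active PTPN11 inhibits SYK. "
    "Active SYK activates TREM2. "
    "Active SYK activates TYROBP."
)


def split_sentences(paragraph):
    """Split on whitespace that follows a period, keeping the period."""
    return [s for s in re.split(r"(?<=\.)\s+", paragraph.strip()) if s]


sentences = split_sentences(text)
print(*sentences, sep="\n")
```

This naive split would break on names that contain a period followed by a space, so it is only suitable for simple output such as the example here.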

Conclusion

While knowledge assembly is indeed difficult and precarious, the true scientific task is to use the assembled knowledge to generate mechanistic hypotheses. By far the most common approach is for a scientist to use their intuition to choose an explanatory subgraph or pathway. This notebook has demonstrated that, once this has been done, the results can be serialized to English prose in a precise manner.