This notebook demonstrates how BioThings Explorer can be used to answer the following query:
"What biosamples are associated with diseases related to the gene SLC15A4?"
To experiment with an executable version of this notebook, .
Background: BioThings Explorer can answer two classes of queries, "EXPLAIN" and "PREDICT". EXPLAIN queries are described in EXPLAIN_demo.ipynb, and PREDICT queries are described in PREDICT_demo.ipynb. Here, we describe PREDICT queries and how to use BioThings Explorer to execute them. A more detailed overview of the BioThings Explorer system is provided in these slides.
In the first stage of the query, BTE will call all APIs which can provide association data between SLC15A4 and diseases, including:
In the second stage of the query, BTE will then retrieve association data between those diseases and biosamples through the Stanford Biosample API.
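The two-stage flow described above can be sketched with a toy, in-memory stand-in for the APIs. This is only an illustration of the chaining logic, not BTE's actual implementation: the API names (`api_1`, `api_2`, `biosample_api`), the identifiers (`GENE:X`, `DISEASE:A`, `SAMPLE:1`, ...), and both helper functions are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical stand-ins for remote APIs: each maps an input ID to associated
# output IDs. All names and identifiers are placeholders, not real data.
GENE_TO_DISEASE_APIS = {
    "api_1": {"GENE:X": ["DISEASE:A", "DISEASE:B"]},
    "api_2": {"GENE:X": ["DISEASE:B", "DISEASE:C"]},
}
DISEASE_TO_BIOSAMPLE_APIS = {
    "biosample_api": {
        "DISEASE:A": ["SAMPLE:1"],
        "DISEASE:B": ["SAMPLE:1", "SAMPLE:2"],
    },
}


def query_stage(apis, input_ids):
    """Query every API with every input ID.

    Returns {input_id: {output_id: [names of APIs that reported the link]}}.
    """
    results = defaultdict(lambda: defaultdict(list))
    for api_name, table in apis.items():
        for input_id in input_ids:
            for output_id in table.get(input_id, []):
                results[input_id][output_id].append(api_name)
    return results


def two_stage_query(gene_id):
    """Chain gene -> disease -> biosample and return the complete paths."""
    stage1 = query_stage(GENE_TO_DISEASE_APIS, [gene_id])
    diseases = sorted(stage1[gene_id])
    # The outputs of stage one become the inputs of stage two.
    stage2 = query_stage(DISEASE_TO_BIOSAMPLE_APIS, diseases)
    paths = []
    for disease in diseases:
        for sample in sorted(stage2.get(disease, {})):
            paths.append((gene_id, disease, sample))
    return paths


paths = two_stage_query("GENE:X")
for path in paths:
    print(" -> ".join(path))
```

Note that a disease with no biosample association (here the placeholder `DISEASE:C`) simply produces no path, so only diseases that are linked on both sides appear in the final result.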
Install the biothings_explorer package, as described in this README. This only needs to be done once (it is included here for compatibility with ).
In [ ]:
!pip install git+https://github.com/biothings/biothings_explorer#egg=biothings_explorer
In [1]:
from biothings_explorer.user_query_dispatcher import FindConnection
from biothings_explorer.hint import Hint
In this step, BioThings Explorer translates our query string "SLC15A4" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the Hint module will be the correct item, but you should confirm that using the identifiers shown.
Search terms can correspond to any child of BiologicalEntity from the Biolink Model, including DiseaseOrPhenotypicFeature (e.g., "lupus"), ChemicalSubstance (e.g., "acetaminophen"), Gene (e.g., "CDK2"), BiologicalProcess (e.g., "T cell differentiation"), and Pathway (e.g., "Citric acid cycle").
In [2]:
ht = Hint()
SLC15A4 = ht.query("SLC15A4")['Gene'][0]
SLC15A4
Out[2]:
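Since the top hit is usually, but not always, the intended entity, it can help to make the selection step check itself. The sketch below assumes the Hint result is a dictionary that maps each semantic type to a list of hit dictionaries, and that gene hits carry a `SYMBOL` key; both are assumptions about the result layout, and `pick_top_hit` is a hypothetical helper, not part of biothings_explorer. The sample data is a placeholder standing in for `ht.query("SLC15A4")`.

```python
def pick_top_hit(hint_result, semantic_type, expected_symbol=None):
    """Return the first hit of the given type, failing loudly on surprises.

    hint_result: dict mapping a semantic type to a list of hit dicts
                 (assumed layout of the Hint output).
    expected_symbol: if given, the hit's "SYMBOL" value must match it.
    """
    hits = hint_result.get(semantic_type, [])
    if not hits:
        raise LookupError(f"no {semantic_type} hits returned")
    top = hits[0]
    if expected_symbol is not None and top.get("SYMBOL") != expected_symbol:
        raise ValueError(
            f"top {semantic_type} hit is {top.get('SYMBOL')!r}, "
            f"expected {expected_symbol!r}"
        )
    return top


# Placeholder result; the values are illustrative, not real identifiers.
example_result = {
    "Gene": [{"SYMBOL": "SLC15A4", "name": "placeholder name"}],
    "Disease": [],
}

gene = pick_top_hit(example_result, "Gene", expected_symbol="SLC15A4")
print(gene["SYMBOL"])
```

With the real library, the same helper could be called as `pick_top_hit(ht.query("SLC15A4"), "Gene", expected_symbol="SLC15A4")`, provided the result layout matches the assumption above.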
In this section, we find all paths in the knowledge graph that connect SLC15A4 to any entity that is a biosample. To do that, we will use FindConnection. This class is a convenient wrapper around two advanced functions for query path planning and query path execution.
In [3]:
fc = FindConnection(input_obj=SLC15A4, output_obj='Biosample', intermediate_nodes=['DiseaseOrPhenotypicFeature'])
In [4]:
fc.connect(verbose=True)
In [5]:
fc.display_table_view()
Out[5]:
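The table view has one row per path, so a natural next step is to group the biosamples by the intermediate disease. The sketch below works on plain row dictionaries so that it runs without any dependencies. It assumes that `fc.display_table_view()` returns a pandas DataFrame (in which case `df.to_dict("records")` yields such rows) and that the intermediate-node and output names live in columns called `node1_name` and `output_name`; these column names are assumptions and should be checked against the actual table. The helper `biosamples_by_disease` and the sample rows are hypothetical.

```python
from collections import defaultdict


def biosamples_by_disease(records, disease_col="node1_name",
                          sample_col="output_name"):
    """Group output biosample names under each intermediate disease name.

    records: iterable of row dicts, one per path in the table view.
    Rows missing either column value are skipped.
    """
    grouped = defaultdict(set)
    for row in records:
        disease = row.get(disease_col)
        sample = row.get(sample_col)
        if disease is None or sample is None:
            continue
        grouped[disease].add(sample)
    # Sort for a stable, readable summary.
    return {d: sorted(s) for d, s in sorted(grouped.items())}


# Placeholder rows standing in for the real table; not actual query results.
example_rows = [
    {"node1_name": "disease A", "output_name": "sample 2"},
    {"node1_name": "disease A", "output_name": "sample 1"},
    {"node1_name": "disease B", "output_name": "sample 1"},
    {"node1_name": "disease B", "output_name": None},
]

summary = biosamples_by_disease(example_rows)
for disease, samples in summary.items():
    print(disease, "->", ", ".join(samples))
```

Using a set per disease removes duplicate rows that arise when several APIs report the same disease and biosample pair.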