Introduction

This notebook demonstrates basic usage of BioThings Explorer, an engine for autonomously querying a distributed knowledge graph. BioThings Explorer can answer two classes of queries -- "PREDICT" and "EXPLAIN". PREDICT queries are described in PREDICT_demo.ipynb. Here, we describe EXPLAIN queries and how to use BioThings Explorer to execute them. A more detailed overview of the BioThings Explorer system is provided in these slides.

EXPLAIN queries are designed to identify plausible reasoning chains to explain the relationship between two entities. For example, in this notebook, we explore the question:

         "Why does imatinib have an effect on the treatment of chronic myelogenous leukemia (CML)?"

Later, we also compare those results to a similar query looking at imatinib's role in treating gastrointestinal stromal tumors (GIST).


Step 0: Load BioThings Explorer modules

First, install the biothings_explorer and biothings_schema packages, as described in this README. This only needs to be done once (it is included here for compatibility with ).


In [1]:
%%capture
!pip install git+https://github.com/biothings/biothings_explorer#egg=biothings_explorer

Next, import the relevant modules:

  • Hint: Finds the bio-entity representation used in BioThings Explorer that corresponds to the user input (which can be any database ID, symbol, or name)
  • FindConnection: Finds intermediate bio-entities that connect the user-specified input and output

In [1]:
# import modules from biothings_explorer
from biothings_explorer.hint import Hint
from biothings_explorer.user_query_dispatcher import FindConnection

Step 1: Find representation of "chronic myelogenous leukemia" and "imatinib" in BTE

In this step, BioThings Explorer translates our query strings "chronic myelogenous leukemia" and "imatinib" into BioThings objects, which contain mappings to many common identifiers. Generally, the top result returned by the Hint module will be the correct item, but you should confirm that using the identifiers shown.

Search terms can correspond to any child of BiologicalEntity from the Biolink Model, including DiseaseOrPhenotypicFeature (e.g., "lupus"), ChemicalSubstance (e.g., "acetaminophen"), Gene (e.g., "CDK2"), BiologicalProcess (e.g., "T cell differentiation"), and Pathway (e.g., "Citric acid cycle").


In [2]:
ht = Hint()
# find all potential representations of CML
cml_hint = ht.query("chronic myelogenous leukemia")
# select the correct representation of CML
cml = cml_hint['Disease'][0]
cml


Out[2]:
{'MONDO': 'MONDO:0011996',
 'DOID': 'DOID:8552',
 'UMLS': 'C1292772',
 'name': 'chronic myelogenous leukemia',
 'MESH': 'D015464',
 'OMIM': '608232',
 'ORPHANET': '521',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0011996'},
 'display': 'MONDO(MONDO:0011996) DOID(DOID:8552) OMIM(608232) ORPHANET(521) UMLS(C1292772) MESH(D015464) name(chronic myelogenous leukemia)',
 'type': 'Disease'}

In [3]:
# find all potential representations of imatinib
imatinib_hint = ht.query("imatinib")
# select the correct representation of imatinib
imatinib = imatinib_hint['ChemicalSubstance'][0]
imatinib


Out[3]:
{'DRUGBANK': 'DB00619',
 'CHEBI': 'CHEBI:45783',
 'name': 'imatinib',
 'primary': {'identifier': 'CHEBI',
  'cls': 'ChemicalSubstance',
  'value': 'CHEBI:45783'},
 'display': 'CHEBI(CHEBI:45783) DRUGBANK(DB00619) name(imatinib)',
 'type': 'ChemicalSubstance'}
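
Because the top hit is not guaranteed to be the intended entity, a quick identifier check can guard the selection. The sketch below is illustrative only: `confirm_hit` is a hypothetical helper (it is not part of biothings_explorer), and the dictionaries are trimmed copies of the outputs shown above so that the snippet runs without network access.

```python
def confirm_hit(hit, expected):
    """Return True if every expected identifier matches the Hint result.

    `hit` is a dict shaped like the Hint outputs above; `expected` maps
    identifier names (e.g. 'DOID') to the values we already trust.
    """
    return all(hit.get(key) == value for key, value in expected.items())


# Trimmed copies of Out[2] and Out[3] above (assumed shape of a Hint result).
cml_example = {'MONDO': 'MONDO:0011996', 'DOID': 'DOID:8552',
               'name': 'chronic myelogenous leukemia', 'type': 'Disease'}
imatinib_example = {'DRUGBANK': 'DB00619', 'CHEBI': 'CHEBI:45783',
                    'name': 'imatinib', 'type': 'ChemicalSubstance'}

# A matching identifier confirms the hit; a deliberately wrong one does not.
print(confirm_hit(cml_example, {'DOID': 'DOID:8552', 'type': 'Disease'}))
print(confirm_hit(imatinib_example, {'DRUGBANK': 'DB00000'}))
```

If the check fails for the first entry, the same test can be applied to the remaining entries of the list returned for that semantic type.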

Step 2: Find intermediate nodes connecting imatinib and chronic myelogenous leukemia

In this section, we find all paths in the knowledge graph that connect imatinib and chronic myelogenous leukemia. To do that, we will use FindConnection. This class is a convenient wrapper around two advanced functions for query path planning and query path execution. More advanced features for both query path planning and query path execution are in development and will be documented in the coming months.

The parameters for FindConnection are described below:


In [5]:
help(FindConnection.__init__)


Help on function __init__ in module biothings_explorer.user_query_dispatcher:

__init__(self, input_obj, output_obj, intermediate_nodes, registry=None)
    Find relationships in the Knowledge Graph between an Input Object and an Output Object.
    
    Args:
        input_obj (required): must be an object returned from Hint corresponding to a specific biomedical entity.
                            Examples:
                Hint().query("Fanconi anemia")['DiseaseOrPhenotypicFeature'][0]
                Hint().query("acetaminophen")['ChemicalSubstance'][0]
    
        output_obj (required): must EITHER be an object returned from Hint corresponding to a specific biomedical
                            entity, OR be a string or list of strings corresponding to Biolink Entity classes.
                            Examples:
                Hint().query("acetaminophen")['ChemicalSubstance'][0]
                'Gene'
                ['Gene','ChemicalSubstance']
    
        intermediate_nodes (required): the semantic type(s) of the intermediate node(s).  Examples:
                None                         : no intermediate node, find direct connections only
                []                           : no intermediate node, find direct connections only
                ['BiologicalEntity']         : one intermediate node of any semantic type
                ['Gene']                     : one intermediate node that must be a Gene
                [('Gene','Pathway')]         : one intermediate node that must be a Gene or a Pathway
                ['Gene','Pathway']           : two intermediate nodes, first must be a Gene, second must be a Pathway.
                ['Gene',('Pathway','Gene')]  : two intermediate nodes, first must be a Gene, second must be a Pathway or Gene.
                                                **NOTE**: queries with more than one intermediate node are currently not supported
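
To make the accepted forms of intermediate_nodes concrete, the sketch below converts each documented spelling into a uniform list with one tuple of allowed types per intermediate node. It is only a reading of the help text above: `normalize_intermediate_nodes` is a hypothetical helper, not BioThings Explorer's own parsing code, and treating a bare string as a one-element list is an assumption.

```python
def normalize_intermediate_nodes(intermediate_nodes):
    """Return a list with one tuple of allowed semantic types per node."""
    if intermediate_nodes is None:
        return []  # direct connections only
    if isinstance(intermediate_nodes, str):
        # Assumption: a bare string behaves like a one-element list.
        intermediate_nodes = [intermediate_nodes]
    normalized = []
    for constraint in intermediate_nodes:
        if isinstance(constraint, str):
            normalized.append((constraint,))      # exactly one allowed type
        else:
            normalized.append(tuple(constraint))  # any of several types
    return normalized


examples = [None, [], ['Gene'], [('Gene', 'Pathway')],
            ['Gene', ('Pathway', 'Gene')]]
for example in examples:
    print(repr(example), '->', normalize_intermediate_nodes(example))
```

The length of the returned list is the number of intermediate nodes, which makes the difference between `[('Gene','Pathway')]` (one node) and `['Gene','Pathway']` (two nodes) easy to see.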

Here, we formulate a FindConnection query with "CML" as the input_obj and "imatinib" as the output_obj. We further specify with the intermediate_nodes parameter that we are looking for paths joining chronic myelogenous leukemia and imatinib with one intermediate node that is a Gene. (The ability to search for longer reasoning paths that include additional intermediate nodes will be added shortly.)


In [4]:
fc = FindConnection(input_obj=cml, output_obj=imatinib, intermediate_nodes='Gene')

We next execute the connect method, which performs the query path planning and query path execution process. In short, BioThings Explorer is deconstructing the query into individual API calls, executing those API calls, then assembling the results.
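
The execute-then-assemble pattern can be pictured with a small, self-contained sketch. This is not BioThings Explorer's implementation: the three "APIs" are hypothetical local functions standing in for remote services, and `run_query` is an illustrative name. It only shows how calls dispatched in parallel finish in completion order and are merged into one result set.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for remote APIs: each maps an input ID to gene symbols.
FAKE_APIS = {
    'api_a': lambda entity_id: ['BCR', 'ABL1'],
    'api_b': lambda entity_id: ['ABL1', 'KIT'],
    'api_c': lambda entity_id: [],
}


def run_query(entity_id, apis):
    """Call every API in parallel and merge the hits into one set."""
    hits = set()
    with ThreadPoolExecutor(max_workers=len(apis)) as pool:
        futures = {pool.submit(call, entity_id): name
                   for name, call in apis.items()}
        # as_completed yields in completion order, not submission order.
        for future in as_completed(futures):
            result = future.result()
            print(f"{futures[future]}: {len(result) or 'No'} hits")
            hits.update(result)
    return hits


genes = run_query('MONDO:0011996', FAKE_APIS)
print(sorted(genes))
```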

A verbose log of this process is displayed below:


In [5]:
# setting verbose=True displays all steps that BTE takes to find the connection
fc.connect(verbose=True)


==========
========== QUERY PARAMETER SUMMARY ==========
==========

BTE will find paths that join 'chronic myelogenous leukemia' and 'imatinib'. Paths will have 1 intermediate node.

Intermediate node #1 will have these type constraints: Gene


==========
========== QUERY #1 -- fetch all Gene entities linked to 'chronic myelogenous leukemia' ==========
==========

==== Step #1: Query path planning ====

Because chronic myelogenous leukemia is of type 'Disease', BTE will query our meta-KG for APIs that can take 'Disease' as input and 'Gene' as output

BTE found 10 apis:

API 1. semmed_disease(15 API calls)
API 2. cord_disease(1 API call)
API 3. mgi_gene2phenotype(1 API call)
API 4. scibite(1 API call)
API 5. hetio(1 API call)
API 6. mydisease(1 API call)
API 7. DISEASES(1 API call)
API 8. pharos(1 API call)
API 9. scigraph(1 API call)
API 10. biolink(1 API call)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 6.1: http://mydisease.info/v1/query?fields=disgenet.genes_related_to_disease.gene_id (POST -d q=C1292772,C0023473&scopes=mondo.xrefs.umls, disgenet.xrefs.umls)
API 2.1: https://biothings.ncats.io/cord_disease/query?fields=associated_with (POST -d q=DOID:8552&scopes=doid)
API 1.9: https://biothings.ncats.io/semmed/query?fields=disrupted_by (POST -d q=C1292772,C0023473&scopes=umls)
API 1.11: https://biothings.ncats.io/semmed/query?fields=negatively_regulates (POST -d q=C1292772,C0023473&scopes=umls)
API 1.5: https://biothings.ncats.io/semmed/query?fields=physically_interacts_with (POST -d q=C1292772,C0023473&scopes=umls)
API 1.3: https://biothings.ncats.io/semmed/query?fields=prevented_by (POST -d q=C1292772,C0023473&scopes=umls)
API 1.8: https://biothings.ncats.io/semmed/query?fields=derives_from (POST -d q=C1292772,C0023473&scopes=umls)
API 1.4: https://biothings.ncats.io/semmed/query?fields=positively_regulates (POST -d q=C1292772,C0023473&scopes=umls)
API 1.7: https://biothings.ncats.io/semmed/query?fields=positively_regulated_by (POST -d q=C1292772,C0023473&scopes=umls)
API 1.13: https://biothings.ncats.io/semmed/query?fields=disrupts (POST -d q=C1292772,C0023473&scopes=umls)
API 1.15: https://biothings.ncats.io/semmed/query?fields=negatively_regulated_by (POST -d q=C1292772,C0023473&scopes=umls)
API 10.1: https://api.monarchinitiative.org/api/bioentity/disease/MONDO:0011996/genes?rows=200
API 1.1: https://biothings.ncats.io/semmed/query?fields=affects (POST -d q=C1292772,C0023473&scopes=umls)
API 1.14: https://biothings.ncats.io/semmed/query?fields=caused_by (POST -d q=C1292772,C0023473&scopes=umls)
API 1.10: https://biothings.ncats.io/semmed/query?fields=affected_by (POST -d q=C1292772,C0023473&scopes=umls)
API 7.1: https://pending.biothings.io/DISEASES/query?fields=DISEASES.associatedWith (POST -d q=DOID:8552&scopes=DISEASES.doid)
API 3.1: https://pending.biothings.io/mgigene2phenotype/query?fields=_id&size=300 (POST -d q=DOID:8552&scopes=mgi.associated_with_disease.doid)
API 1.2: https://biothings.ncats.io/semmed/query?fields=coexists_with (POST -d q=C1292772,C0023473&scopes=umls)
API 1.6: https://biothings.ncats.io/semmed/query?fields=treated_by (POST -d q=C1292772,C0023473&scopes=umls)
API 1.12: https://biothings.ncats.io/semmed/query?fields=related_to (POST -d q=C1292772,C0023473&scopes=umls)
API 8.1: https://automat.renci.org/pharos/disease/gene/MONDO:0011996
API 5.1: https://automat.renci.org/hetio/disease/gene/MONDO:0011996
API 9.1: https://automat.renci.org/cord19_scigraph_v2/disease/gene/MONDO:0011996
API 4.1: https://automat.renci.org/cord19_scibite_v2/disease/gene/MONDO:0011996


==== Step #3: Output normalization ====

API 3.1 mgi_gene2phenotype: 8 hits
API 1.1 semmed_disease: No hits
API 1.2 semmed_disease: No hits
API 7.1 DISEASES: 3 hits
API 1.3 semmed_disease: 7 hits
API 1.4 semmed_disease: No hits
API 1.5 semmed_disease: No hits
API 1.6 semmed_disease: 133 hits
API 1.7 semmed_disease: No hits
API 8.1 pharos: 7 hits
API 1.8 semmed_disease: No hits
API 9.1 scigraph: 3 hits
API 1.9 semmed_disease: 26 hits
API 6.1 mydisease: 52 hits
API 2.1 cord_disease: 146 hits
API 1.10 semmed_disease: 65 hits
API 4.1 scibite: 12 hits
API 5.1 hetio: No hits
API 1.11 semmed_disease: No hits
API 1.12 semmed_disease: 550 hits
API 1.13 semmed_disease: No hits
API 1.14 semmed_disease: 90 hits
API 10.1 biolink: 4 hits
API 1.15 semmed_disease: No hits

After id-to-object translation, BTE retrieved 759 unique objects.


==========
========== QUERY #2 -- fetch all Gene entities linked to 'imatinib' ==========
==========

==== Step #1: Query path planning ====

Because imatinib is of type 'ChemicalSubstance', BTE will query our meta-KG for APIs that can take 'ChemicalSubstance' as input and 'Gene' as output

BTE found 10 apis:

API 1. mychem(3 API calls)
API 2. ctd(2 API calls)
API 3. semmed_chemical(13 API calls)
API 4. chembio(1 API call)
API 5. scibite(1 API call)
API 6. cord_chemical(1 API call)
API 7. pharos(1 API call)
API 8. scigraph(1 API call)
API 9. hmdb(1 API call)
API 10. dgidb(2 API calls)


==== Step #2: Query path execution ====
NOTE: API requests are dispatched in parallel, so the list of APIs below is ordered by query time.

API 3.4: https://biothings.ncats.io/semmedchemical/query?fields=disrupted_by (POST -d q=C0935989&scopes=umls)
API 3.12: https://biothings.ncats.io/semmedchemical/query?fields=affected_by (POST -d q=C0935989&scopes=umls)
API 2.2: http://ctdbase.org/tools/batchQuery.go?inputType=chem&report=genes_curated&format=json&inputTerms=C097613
API 3.5: https://biothings.ncats.io/semmedchemical/query?fields=related_to (POST -d q=C0935989&scopes=umls)
API 3.2: https://biothings.ncats.io/semmedchemical/query?fields=positively_regulates (POST -d q=C0935989&scopes=umls)
API 3.13: https://biothings.ncats.io/semmedchemical/query?fields=positively_regulated_by (POST -d q=C0935989&scopes=umls)
API 3.8: https://biothings.ncats.io/semmedchemical/query?fields=produces (POST -d q=C0935989&scopes=umls)
API 3.9: https://biothings.ncats.io/semmedchemical/query?fields=produced_by (POST -d q=C0935989&scopes=umls)
API 3.10: https://biothings.ncats.io/semmedchemical/query?fields=negatively_regulates (POST -d q=C0935989&scopes=umls)
API 3.3: https://biothings.ncats.io/semmedchemical/query?fields=negatively_regulated_by (POST -d q=C0935989&scopes=umls)
API 6.1: https://biothings.ncats.io/cord_chemical/query?fields=associated_with (POST -d q=CHEBI:45783&scopes=chebi)
API 1.2: https://mychem.info/v1/query?fields=drugcentral.bioactivity (POST -d q=CHEMBL941,CHEMBL1642&scopes=chembl.molecule_chembl_id)
API 2.1: http://ctdbase.org/tools/batchQuery.go?inputType=chem&report=genes_curated&format=json&inputTerms=D000068877
API 1.1: https://mychem.info/v1/query?fields=drugbank.targets (POST -d q=DB00619&scopes=drugbank.id)
API 1.3: https://mychem.info/v1/query?fields=drugbank.enzymes (POST -d q=DB00619&scopes=drugbank.id)
API 3.7: https://biothings.ncats.io/semmedchemical/query?fields=coexists_with (POST -d q=C0935989&scopes=umls)
API 3.11: https://biothings.ncats.io/semmedchemical/query?fields=affects (POST -d q=C0935989&scopes=umls)
API 10.1: http://dgidb.genome.wustl.edu/api/v2/interactions.json?drugs=CHEMBL941
API 3.6: https://biothings.ncats.io/semmedchemical/query?fields=disrupts (POST -d q=C0935989&scopes=umls)
API 10.2: http://dgidb.genome.wustl.edu/api/v2/interactions.json?drugs=CHEMBL1642
API 3.1: https://biothings.ncats.io/semmedchemical/query?fields=physically_interacts_with (POST -d q=C0935989&scopes=umls)
API 8.1: https://automat.renci.org/cord19_scigraph_v2/chemical_substance/gene/CHEBI:45783
API 5.1: https://automat.renci.org/cord19_scibite_v2/chemical_substance/gene/CHEBI:45783
API 4.1: https://automat.renci.org/chembio/chemical_substance/gene/CHEBI:45783
API 9.1: https://automat.renci.org/hmdb/chemical_substance/gene/CHEBI:45783
API 7.1: https://automat.renci.org/pharos/chemical_substance/gene/CHEBI:45783


==== Step #3: Output normalization ====

API 3.1 semmed_chemical: 300 hits
API 4.1 chembio: No hits
API 1.1 mychem: 9 hits
API 1.2 mychem: 80 hits
API 1.3 mychem: 9 hits
API 5.1 scibite: 8 hits
API 7.1 pharos: 6 hits
API 3.2 semmed_chemical: 98 hits
API 3.3 semmed_chemical: 38 hits
API 3.4 semmed_chemical: No hits
API 3.5 semmed_chemical: No hits
API 8.1 scigraph: 3 hits
API 10.1 dgidb: 34 hits
API 10.2 dgidb: 5 hits
API 9.1 hmdb: No hits
API 3.6 semmed_chemical: No hits
API 3.7 semmed_chemical: 138 hits
API 3.8 semmed_chemical: No hits
API 3.9 semmed_chemical: 12 hits
API 6.1 cord_chemical: 172 hits
API 2.1 ctd: 199 hits
API 2.2 ctd: No hits
API 3.10 semmed_chemical: 171 hits
API 3.11 semmed_chemical: No hits
API 3.12 semmed_chemical: No hits
API 3.13 semmed_chemical: 39 hits

After id-to-object translation, BTE retrieved 730 unique objects.

==========
========== Final assembly of results ==========
==========


BTE found 255 unique intermediate nodes connecting 'chronic myelogenous leukemia' and 'imatinib'
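
One way to picture the final assembly step is as an intersection: a gene qualifies as an intermediate node only if it was returned by both queries, and each qualifying gene yields one path for every pairing of a disease-side edge with a drug-side edge. The sketch below uses made-up edges and a hypothetical `assemble_paths` function; it illustrates the idea and is not BioThings Explorer's actual code.

```python
# Toy edge lists (hypothetical data): (gene, predicate, source API).
disease_edges = [('BCR', 'related_to', 'api_a'),
                 ('ABL1', 'caused_by', 'api_a'),
                 ('TP53', 'related_to', 'api_b')]
drug_edges = [('ABL1', 'physically_interacts_with', 'api_c'),
              ('KIT', 'physically_interacts_with', 'api_c'),
              ('BCR', 'negatively_regulated_by', 'api_d'),
              ('BCR', 'physically_interacts_with', 'api_c')]


def assemble_paths(left_edges, right_edges):
    """Pair every left edge with every right edge that shares the same gene."""
    shared = {gene for gene, _, _ in left_edges} & {gene for gene, _, _ in right_edges}
    paths = [(left, right)
             for left in left_edges if left[0] in shared
             for right in right_edges if right[0] == left[0]]
    return shared, paths


shared_genes, paths = assemble_paths(disease_edges, drug_edges)
print(sorted(shared_genes))  # genes reachable from both sides
print(len(paths))            # one path per pairing of edges
```

Because every pairing of edges counts as a separate path, the number of paths can be much larger than the number of unique intermediate nodes.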

Step 3: Display and Filter results

This section demonstrates post-query filtering done in Python. Later, more advanced filtering functions will be added to the query path execution module for interleaved filtering, thereby enabling longer query paths. More details to come...

First, all matching paths can be exported to a data frame. Let's examine a sample of those results.


In [6]:
df = fc.display_table_view()

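# because UMLS is not currently well-integrated in our ID-to-object translation system, removing UMLS-only entries here
# (raw string avoids an invalid "\d" escape; the mask name avoids shadowing the built-in filter)
umls_pattern = r"^UMLS:C\d+"
umls_only = df.node1_id.str.contains(umls_pattern)
df = df[~umls_only]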

print(df.shape)
df.sample(10)


(1621, 16)
Out[6]:
input input_type pred1 pred1_source pred1_api pred1_pubmed node1_type node1_name node1_id pred2 pred2_source pred2_api pred2_pubmed output_type output_name output_id
1125 CHRONIC MYELOGENOUS LEUKEMIA Disease related_to SEMMED SEMMED Disease API 18406870 Gene ABL1 NCBIGene:25 positively_regulated_by SEMMED SEMMED Chemical API 26251899 Gene IMATINIB name:IMATINIB
725 CHRONIC MYELOGENOUS LEUKEMIA Disease related_to None BioLink API None Gene BCR NCBIGene:613 negatively_regulated_by SEMMED SEMMED Chemical API 19307018 Gene IMATINIB name:IMATINIB
816 CHRONIC MYELOGENOUS LEUKEMIA Disease related_to pharos Automat PHAROS API None Gene BCR NCBIGene:613 physically_interacts_with None DGIdb API None Gene IMATINIB name:IMATINIB
1537 CHRONIC MYELOGENOUS LEUKEMIA Disease caused_by SEMMED SEMMED Disease API 1815390,11587372,10676660,10092207,11905636,12... Gene MTTP NCBIGene:4547 coexists_with SEMMED SEMMED Chemical API 12669727,15283151,21505592,23434731,22713161 Gene IMATINIB name:IMATINIB
803 CHRONIC MYELOGENOUS LEUKEMIA Disease related_to scibite Automat CORD19 Scibite API None Gene BCR NCBIGene:613 positively_regulated_by SEMMED SEMMED Chemical API 16343892,22904675,24269846,27444277 Gene IMATINIB name:IMATINIB
1695 CHRONIC MYELOGENOUS LEUKEMIA Disease treated_by SEMMED SEMMED Disease API 23818300,26966074 Gene TP53 NCBIGene:7157 positively_regulates SEMMED SEMMED Chemical API 20094798,23598363,25280212 Gene IMATINIB name:IMATINIB
1532 CHRONIC MYELOGENOUS LEUKEMIA Disease caused_by SEMMED SEMMED Disease API 1815390,11587372,10676660,10092207,11905636,12... Gene MTTP NCBIGene:4547 negatively_regulated_by SEMMED SEMMED Chemical API 23434731 Gene IMATINIB name:IMATINIB
589 CHRONIC MYELOGENOUS LEUKEMIA Disease affected_by SEMMED SEMMED Disease API 8161775,2395384,12829610,16045749,20520635 Gene BCR NCBIGene:613 physically_interacts_with drugcentral MyChem.info API None Gene IMATINIB name:IMATINIB
1890 CHRONIC MYELOGENOUS LEUKEMIA Disease related_to Translator Text Mining Provider CORD Disease API None Gene ESR1 NCBIGene:2099 coexists_with SEMMED SEMMED Chemical API 12783377,12682876,24860788 Gene IMATINIB name:IMATINIB
1417 CHRONIC MYELOGENOUS LEUKEMIA Disease related_to Translator Text Mining Provider CORD Disease API None Gene 7322 HGNC:7322 related_to Translator Text Mining Provider CORD Chemical API None Gene IMATINIB name:IMATINIB

While most results are based on edges from semmed, edges from DGIdb, biolink, disgenet, mydisease.info and drugcentral were also retrieved from their respective APIs.
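
The UMLS filter used in the cell above can be checked in isolation. The sketch below applies the same regular expression to a short list of identifiers with the standard `re` module; the list is illustrative, and the `UMLS:`-prefixed form of the identifier is an assumption about how such entries appear in node1_id.

```python
import re

# The same regular expression as in the filter above, as a raw string.
UMLS_ONLY = re.compile(r"^UMLS:C\d+")

# Illustrative identifiers; only the second one should be dropped.
node_ids = ['NCBIGene:25', 'UMLS:C0023473', 'HGNC:7322', 'NCBIGene:613']
kept = [node_id for node_id in node_ids if not UMLS_ONLY.search(node_id)]
print(kept)
```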

Next, let's look to see which genes are mentioned the most.


In [7]:
df.node1_name.value_counts().head(10)


Out[7]:
BCR      273
KIT      135
ABL1     126
MTTP      45
AKT1      44
ABCB1     39
TP53      35
CD34      24
ABCG2     24
LYN       21
Name: node1_name, dtype: int64

Not surprisingly, the top three genes that BioThings Explorer found joining imatinib to CML include BCR and ABL1, the two genes that are fused in the "Philadelphia chromosome", the genetic abnormality that underlies CML and whose product is the validated target of imatinib.

Let's examine some of the PubMed articles linking CML to ABL1 and ABL1 to imatinib.


In [8]:
# fetch all articles connecting 'chronic myelogenous leukemia' and 'ABL1'
articles = []
for info in fc.display_edge_info('chronic myelogenous leukemia', 'ABL1').values():
    if 'pubmed' in info['info']:
        articles += info['info']['pubmed']
print(f"There are {len(articles)} articles supporting the edge between CML and ABL1. Sampling of 10 of those:")
for pmid in articles[:10]:
    print(f"http://pubmed.gov/{pmid}")


There are 17 articles supporting the edge between CML and ABL1. Sampling of 10 of those:
http://pubmed.gov/24662807
http://pubmed.gov/26179066
http://pubmed.gov/11979553
http://pubmed.gov/10498618
http://pubmed.gov/10822991
http://pubmed.gov/11368359
http://pubmed.gov/20809971
http://pubmed.gov/18082628
http://pubmed.gov/18243808
http://pubmed.gov/23287430

In [11]:
# fetch all articles connecting 'ABL1' and 'imatinib'
articles = []
for info in fc.display_edge_info('ABL1', 'imatinib').values():
    if 'pubmed' in info['info']:
        articles += info['info']['pubmed']
print(f"There are {len(articles)} articles supporting the edge between ABL1 and imatinib. Sampling of 10 of those:")
for pmid in articles[:10]:
    print(f"http://pubmed.gov/{pmid}")


There are 32 articles supporting the edge between ABL1 and imatinib. Sampling of 10 of those:
http://pubmed.gov/15799618
http://pubmed.gov/15917650
http://pubmed.gov/15949566
http://pubmed.gov/16153117
http://pubmed.gov/16205964
http://pubmed.gov/15713800
http://pubmed.gov/15713800
http://pubmed.gov/15713800
http://pubmed.gov/19166098
http://pubmed.gov/26030291
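
The same PubMed ID can appear more than once in such a list, as with 15713800 above, presumably because several edges cite the same article. A minimal sketch of order-preserving de-duplication follows; the IDs are copied from the sample output, and representing them as strings is an assumption.

```python
# PubMed IDs copied from the sample output above; 15713800 appears three times.
articles = ['15799618', '15917650', '15949566', '16153117', '16205964',
            '15713800', '15713800', '15713800', '19166098', '26030291']

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_articles = list(dict.fromkeys(articles))

print(f"{len(articles)} listed, {len(unique_articles)} unique")
```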

Comparing results between CML and GIST

Let's perform another BioThings Explorer query, this time looking to EXPLAIN the relationship between imatinib and gastrointestinal stromal tumors (GIST), another disease treated by imatinib.


In [12]:
ht = Hint()
# find all potential representations of GIST
gist_hint = ht.query("gastrointestinal stromal tumor")
# select the correct representation of GIST
gist = gist_hint['Disease'][0]
gist


Out[12]:
{'MONDO': 'MONDO:0011719',
 'DOID': 'DOID:9253',
 'UMLS': 'C3179349',
 'name': 'gastrointestinal stromal tumor',
 'MESH': 'D046152',
 'OMIM': '606764',
 'ORPHANET': '44890',
 'primary': {'identifier': 'MONDO',
  'cls': 'Disease',
  'value': 'MONDO:0011719'},
 'display': 'MONDO(MONDO:0011719) DOID(DOID:9253) OMIM(606764) ORPHANET(44890) UMLS(C3179349) MESH(D046152) name(gastrointestinal stromal tumor)',
 'type': 'Disease'}

In [13]:
fc = FindConnection(input_obj=gist, output_obj=imatinib, intermediate_nodes='Gene')

In [14]:
fc.connect(verbose=False) # skipping the verbose log here

In [15]:
df = fc.display_table_view()

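# because UMLS is not currently well-integrated in our ID-to-object translation system, removing UMLS-only entries here
# (raw string avoids an invalid "\d" escape; the mask name avoids shadowing the built-in filter)
umls_pattern = r"^UMLS:C\d+"
umls_only = df.node1_id.str.contains(umls_pattern)
df = df[~umls_only]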

print(df.shape)
df.sample(10)


(1757, 16)
Out[15]:
input input_type pred1 pred1_source pred1_api pred1_pubmed node1_type node1_name node1_id pred2 pred2_source pred2_api pred2_pubmed output_type output_name output_id
1076 GASTROINTESTINAL STROMAL SARCOMA Disease related_to SEMMED SEMMED Disease API 17193822,16900856,15838387,15685537,15917417,1... Gene PDGFRA NCBIGene:5156 negatively_regulates SEMMED SEMMED Chemical API 24963404,28020350,28768491,17437861 Gene IMATINIB name:IMATINIB
463 GASTROINTESTINAL STROMAL SARCOMA Disease related_to DISEASE DISEASES API None Gene KIT NCBIGene:3815 negatively_regulated_by SEMMED SEMMED Chemical API 20425130,20043176,20109338,26722383 Gene IMATINIB name:IMATINIB
1030 GASTROINTESTINAL STROMAL SARCOMA Disease related_to None BioLink API None Gene PDGFRA NCBIGene:5156 positively_regulates SEMMED SEMMED Chemical API 21828142 Gene IMATINIB name:IMATINIB
715 GASTROINTESTINAL STROMAL SARCOMA Disease treated_by SEMMED SEMMED Disease API 21666577 Gene KIT NCBIGene:3815 coexists_with SEMMED SEMMED Chemical API 17438095,20975605,23787115 Gene IMATINIB name:IMATINIB
1247 GASTROINTESTINAL STROMAL SARCOMA Disease related_to SEMMED SEMMED Disease API 15297464 Gene FLT3 NCBIGene:2322 physically_interacts_with SEMMED SEMMED Chemical API 14976243 Gene IMATINIB name:IMATINIB
483 GASTROINTESTINAL STROMAL SARCOMA Disease related_to disgenet mydisease.info API None Gene KIT NCBIGene:3815 negatively_regulated_by SEMMED SEMMED Chemical API 20425130,20043176,20109338,26722383 Gene IMATINIB name:IMATINIB
617 GASTROINTESTINAL STROMAL SARCOMA Disease caused_by SEMMED SEMMED Disease API 12938260,10779223,10779223,11706520,27771813 Gene KIT NCBIGene:3815 physically_interacts_with SEMMED SEMMED Chemical API 12441322,15846297,16797704,21586300,21249321,2... Gene IMATINIB name:IMATINIB
1372 GASTROINTESTINAL STROMAL SARCOMA Disease treated_by SEMMED SEMMED Disease API 20489620,26098203 Gene PDGFA NCBIGene:5154 negatively_regulates SEMMED SEMMED Chemical API 18312355 Gene IMATINIB name:IMATINIB
603 GASTROINTESTINAL STROMAL SARCOMA Disease related_to disgenet mydisease.info API None Gene KIT NCBIGene:3815 physically_interacts_with SEMMED SEMMED Chemical API 12481435,12969987,16087693,20109338,25985771 Gene IMATINIB name:IMATINIB
888 GASTROINTESTINAL STROMAL SARCOMA Disease related_to SEMMED SEMMED Disease API 12672043,16077968,17582306,18486988,21300610 Gene KIT NCBIGene:3815 related_to CTD CTD API 21295132 Gene IMATINIB name:IMATINIB

In [16]:
df.node1_name.value_counts().head(10)


Out[16]:
KIT       594
PDGFRA    192
BCR        63
ABL1       28
TP53       28
VEGFA      24
CD34       24
EGFR       18
MTTP       18
BRAF       16
Name: node1_name, dtype: int64

Here, the top two genes that BioThings Explorer found that join imatinib to GIST are PDGFRA and KIT, the most commonly mutated genes found in GIST and validated targets of imatinib.

While several of the listed genes would be considered positive controls, others on the list could be viewed as testable hypotheses and discovery opportunities to be evaluated by domain experts.
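
To compare the two result sets directly, the top-10 lists can be split into genes shared by both diseases and genes specific to each. The sketch below uses the gene names copied from Out[7] and Out[16]; it is a simple illustration in plain Python, not a BioThings Explorer feature.

```python
# Top-10 genes copied from Out[7] (CML) and Out[16] (GIST) above.
cml_top = ['BCR', 'KIT', 'ABL1', 'MTTP', 'AKT1',
           'ABCB1', 'TP53', 'CD34', 'ABCG2', 'LYN']
gist_top = ['KIT', 'PDGFRA', 'BCR', 'ABL1', 'TP53',
            'VEGFA', 'CD34', 'EGFR', 'MTTP', 'BRAF']

# List comprehensions keep the original ranking order within each group.
shared = [gene for gene in cml_top if gene in gist_top]
cml_only = [gene for gene in cml_top if gene not in gist_top]
gist_only = [gene for gene in gist_top if gene not in cml_top]

print('shared:   ', shared)
print('CML only: ', cml_only)
print('GIST only:', gist_only)
```

Genes in the shared group rank highly for both diseases, while the disease-specific groups (for example PDGFRA for GIST) point to where the two explanations diverge.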

Conclusions and caveats

This notebook demonstrated the use of BioThings Explorer in EXPLAIN mode to investigate the relationship between imatinib and two diseases that it treats -- chronic myelogenous leukemia (CML) and gastrointestinal stromal tumors (GIST). In each case, BioThings Explorer autonomously queried a distributed knowledge graph of biomedical APIs to find the most common genes, and in each case the relevant targets were retrieved.

There are still many areas for improvement (and some areas in which BioThings Explorer is still buggy). And of course, BioThings Explorer is dependent on the accessibility of the APIs that comprise the distributed knowledge graph. Nevertheless, we encourage users to try other variants of the EXPLAIN queries demonstrated in this notebook.