Fear of Bees: Extracting Ontologies from Wikidata

Wikidata links entities with predicates such as "subclass of" (P279). These links form a classification hierarchy, but because the statements come from many sources, the hierarchy may not follow the rules expected of a formal ontology; for example, it can contain cycles.
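To make the idea of a P279 hierarchy concrete, here is a minimal sketch that builds a SPARQL query for every transitive subclass of a given item, together with each subclass's direct parents. The helper name `subclass_closure_query` is made up for this illustration, and the query is not necessarily the one ontobio sends to Wikidata; it only relies on the standard `wd:` and `wdt:` prefixes and the `*` property-path operator.

```python
def subclass_closure_query(root_qid: str) -> str:
    """Build a SPARQL query listing subclass edges beneath a Wikidata item.

    Hypothetical helper for illustration only; not part of ontobio.
    """
    if not (root_qid.startswith("Q") and root_qid[1:].isdigit()):
        raise ValueError("expected a Wikidata item ID such as 'Q544006'")
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?child ?parent WHERE {\n"
        # P279* matches zero or more 'subclass of' steps up to the root
        f"  ?child wdt:P279* wd:{root_qid} .\n"
        # report each matched class together with its direct parents
        "  ?child wdt:P279 ?parent .\n"
        "}"
    )


query = subclass_closure_query("Q544006")  # Anxiety disorder
print(query)
```

Because the path is transitive, the result size grows quickly near the top of the hierarchy, which is one reason to start from a fairly specific item.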

OntoBio includes a Wikidata ontology factory, so we can transparently create an Ontology object from Wikidata and use the same methods that ontobio provides for any other ontology.

This example focuses on anxiety disorders.


In [1]:
from ontobio.ontol_factory import OntologyFactory
f = OntologyFactory()

## OntologyFactory recognizes the prefix wdq for wikidata queries;
## We use this to make a sub-ontology
## (currently we have no lazy wrapper for WD, only Eager, so we limit the size)
ont = f.create('wdq:Q544006') # Anxiety disorder


WARNING:rdflib.term:  does not look like a valid URI, trying to serialize this will break.

In [2]:
## Find terms starting with Anxiety in the sub-ontology
qids = ont.search('Anxiety%')
qids


Out[2]:
[rdflib.term.URIRef('http://www.wikidata.org/entity/Q544006')]
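The search above uses `%` as a wildcard and returns a single hit. The following sketch shows one plausible way such a pattern could be interpreted, by translating it into an anchored, case-sensitive regular expression. This is an assumption made for illustration, not a description of ontobio's internals; `wildcard_to_regex` is a hypothetical helper and the label list is a small sample taken from this notebook's output.

```python
import re


def wildcard_to_regex(term):
    """Translate a '%'-wildcard search term into an anchored regex (hypothetical helper)."""
    # Escape each literal piece so characters like '.' or '(' are matched verbatim
    parts = [re.escape(piece) for piece in term.split("%")]
    return re.compile("^" + ".*".join(parts) + "$")


# Sample labels copied from the sub-ontology shown in this notebook
labels = [
    "Anxiety disorder",
    "generalized anxiety disorder",
    "separation anxiety disorder",
    "phobia",
]

pattern = wildcard_to_regex("Anxiety%")
matches = [label for label in labels if pattern.match(label)]
print(matches)
```

Under this case-sensitive reading, lower-case labels such as "generalized anxiety disorder" do not match, which is consistent with the single result shown above.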

In [3]:
## Traverse up and down from query node in our sub-ontology
nodes = ont.traverse_nodes(qids, up=True, down=True)
labels = [ont.label(n) for n in nodes]
labels[:25]


Out[3]:
['Aktualneurosen',
 'cognitive disorder',
 'Anti-French sentiment in the United States',
 'acarophobia',
 'Organic disease',
 'identifier',
 'Alektorophobia',
 'Katagelasticism',
 'answer',
 'Counterphobic attitude',
 'compulsive act',
 'physical condition',
 'Piblokto',
 'blood phobia',
 'category of being',
 'Childhood phobias',
 'ability',
 'disposition',
 'Entomophobia',
 'physiological condition',
 'property',
 'Cynophobia',
 'neurosis effects',
 'bowel-control anxiety',
 'Anxiety disorder']
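The mixed list above is what one would expect from walking both up (ancestors) and down (descendants) from the query node. As a rough illustration of that idea, and not of ontobio's actual implementation, the sketch below runs a breadth-first traversal over a toy child-to-parent edge list. Labels below the root are taken from this notebook; the parent of the root is a hypothetical placeholder, and the function names are invented for the example.

```python
from collections import defaultdict, deque

# Toy (child, parent) edges; "hypothetical parent" is a placeholder, not Wikidata data
EDGES = [
    ("Anxiety disorder", "hypothetical parent"),
    ("phobia", "Anxiety disorder"),
    ("specific phobia", "phobia"),
    ("animal phobia", "specific phobia"),
    ("Entomophobia", "animal phobia"),
    ("Fear of bees", "Entomophobia"),
]

parents = defaultdict(set)
children = defaultdict(set)
for child, parent in EDGES:
    parents[child].add(parent)
    children[parent].add(child)


def reachable(start, neighbours):
    """Return every node reachable from start; the seen set makes this safe on cycles."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def traverse(start, up=True, down=True):
    """Collect start plus its ancestors and/or descendants."""
    result = {start}
    if up:
        result |= reachable(start, parents)
    if down:
        result |= reachable(start, children)
    return result


print(sorted(traverse("phobia", up=True, down=True)))
```

Setting `up=False` restricts the result to the query node and its descendants, which is the shape used for the tree display later in this notebook.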

In [16]:
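## Test for cycles
import networkx as nx
g = ont.get_graph()
def show_cycle(nl):
    """Print each node in a cycle as 'IRI label'."""
    print(["{} {}".format(n, ont.label(n)) for n in nl])

## Enumerate simple cycles and show the first one, if any exist
cycles_list = list(nx.simple_cycles(g))
if cycles_list:
    show_cycle(cycles_list[0])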


['http://www.wikidata.org/entity/Q1347367 ability', 'http://www.wikidata.org/entity/Q151885 concept', 'http://www.wikidata.org/entity/Q9081 knowledge', 'http://www.wikidata.org/entity/Q3695082 sign', 'http://www.wikidata.org/entity/Q853614 identifier', 'http://www.wikidata.org/entity/Q937228 property']
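For readers without networkx at hand, the following dependency-free sketch shows the underlying idea: a depth-first search that reports a cycle when it meets a node that is still on the current path. It is a simplified stand-in rather than the algorithm `nx.simple_cycles` uses, and it returns only one cycle. The toy graph reuses the labels from the cycle printed above; the edge directions are assumed to follow the order in which the nodes were listed.

```python
def find_cycle(graph):
    """Return one directed cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour = {node: WHITE for node in graph}
    for targets in graph.values():
        for target in targets:
            colour.setdefault(target, WHITE)
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for nxt in graph.get(node, ()):
            if colour[nxt] == GREY:
                # nxt is already on the current path, so we have closed a loop
                return path[path.index(nxt):]
            if colour[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        colour[node] = BLACK
        return None

    for node in list(colour):
        if colour[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Toy graph: labels from the cycle above, edge directions assumed
toy = {
    "ability": ["concept"],
    "concept": ["knowledge"],
    "knowledge": ["sign"],
    "sign": ["identifier"],
    "identifier": ["property"],
    "property": ["ability"],
    "phobia": ["Anxiety disorder"],
}
cycle = find_cycle(toy)
print(cycle)
```

The recursive form is fine for small extracts like this one; a very deep hierarchy would call for an iterative version to stay within Python's recursion limit.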

In [5]:
## Show our extract of the sub-ontology as an ascii tree
## (note this is resilient to cycles)

## only traverse down from our query nodes
## (including ancestors causes multiple paths, and a verbose display)
nodes = ont.traverse_nodes(qids, up=False, down=True)

from ontobio.io.ontol_renderers import GraphRenderer
w = GraphRenderer.create('tree')
w.write_subgraph(ont, nodes, query_ids=qids)


. http://www.wikidata.org/entity/Q544006 ! Anxiety disorder * 
 % http://www.wikidata.org/entity/Q741713 ! panic disorder
 % http://www.wikidata.org/entity/Q6374996 ! Katagelasticism
 % http://www.wikidata.org/entity/Q845224 ! generalized anxiety disorder
 % http://www.wikidata.org/entity/Q377493 ! selective mutism
  % http://www.wikidata.org/entity/Q5354941 ! Elective mutism
 % http://www.wikidata.org/entity/Q202387 ! post-traumatic stress disorder
 % http://www.wikidata.org/entity/Q10547816 ! Counterphobic attitude
 % http://www.wikidata.org/entity/Q13604751 ! lovesickness
 % http://www.wikidata.org/entity/Q1316515 ! School refusal
 % http://www.wikidata.org/entity/Q4386741 ! Olfactory Reference Syndrome
 % http://www.wikidata.org/entity/Q424221 ! acute stress disorder
  % http://www.wikidata.org/entity/Q1482034 ! combat disorder
  % http://www.wikidata.org/entity/Q18967153 ! mixed disorder as reaction to stress
  % http://www.wikidata.org/entity/Q18967156 ! acute stress reaction with predominant disturbance of consciousness
 % http://www.wikidata.org/entity/Q178190 ! obsessive-compulsive disorder
  % http://www.wikidata.org/entity/Q7458802 ! Sexual obsessions
  % http://www.wikidata.org/entity/Q231624 ! compulsive act
  % http://www.wikidata.org/entity/Q7310756 ! Relationship obsessive–compulsive disorder
 % http://www.wikidata.org/entity/Q19000444 ! neurotic disorder
  % http://www.wikidata.org/entity/Q181032 ! neurosis effects
   % http://www.wikidata.org/entity/Q144119 ! hysteria
    % http://www.wikidata.org/entity/Q336203 ! Abwehrhysterie
    % http://www.wikidata.org/entity/Q1779438 ! Piblokto
   % http://www.wikidata.org/entity/Q423509 ! Aktualneurosen
 % http://www.wikidata.org/entity/Q2300749 ! separation anxiety disorder
 % http://www.wikidata.org/entity/Q19000931 ! organic anxiety disorder
 % http://www.wikidata.org/entity/Q175854 ! phobia
  % http://www.wikidata.org/entity/Q560107 ! Tryophobia
  % http://www.wikidata.org/entity/Q1343559 ! ochlophobia
  % http://www.wikidata.org/entity/Q980010 ! Tokophobia
  % http://www.wikidata.org/entity/Q5097985 ! Childhood phobias
  % http://www.wikidata.org/entity/Q909355 ! Francophobia
   % http://www.wikidata.org/entity/Q3427834 ! Anti-French sentiment in the United States
  % http://www.wikidata.org/entity/Q174589 ! agoraphobia
  % http://www.wikidata.org/entity/Q22906231 ! Afrophobia
  % http://www.wikidata.org/entity/Q1363791 ! erythrophobia
  % http://www.wikidata.org/entity/Q13 ! triskaidekaphobia
  % http://www.wikidata.org/entity/Q2015728 ! specific phobia
   % http://www.wikidata.org/entity/Q944108 ! animal phobia
    % http://www.wikidata.org/entity/Q619261 ! Ornithophobia
    % http://www.wikidata.org/entity/Q4694196 ! Agrizoophobia
    % http://www.wikidata.org/entity/Q3321265 ! Fear of fish
    % http://www.wikidata.org/entity/Q596505 ! Ophidiophobia
    % http://www.wikidata.org/entity/Q4422074 ! Vermiphobia
    % http://www.wikidata.org/entity/Q405385 ! Ailurophobia
    % http://www.wikidata.org/entity/Q4297397 ! Fear of frogs
    % http://www.wikidata.org/entity/Q2319444 ! Herpetophobia
    % http://www.wikidata.org/entity/Q38579 ! Cynophobia
    % http://www.wikidata.org/entity/Q5384517 ! Equinophobia
    % http://www.wikidata.org/entity/Q2157130 ! Entomophobia
     % http://www.wikidata.org/entity/Q2160101 ! Fear of bees
     % http://www.wikidata.org/entity/Q2822642 ! acarophobia
    % http://www.wikidata.org/entity/Q220783 ! arachnophobia
    % http://www.wikidata.org/entity/Q3440772 ! Fear of mice
    % http://www.wikidata.org/entity/Q16002436 ! Alektorophobia
    % http://www.wikidata.org/entity/Q5439392 ! Fear of bats
   % http://www.wikidata.org/entity/Q3381344 ! Blood-injection-injury type phobia
    % http://www.wikidata.org/entity/Q886731 ! blood phobia
    % http://www.wikidata.org/entity/Q6034425 ! Injury phobia
    % http://www.wikidata.org/entity/Q169922 ! Fear of needles
   % http://www.wikidata.org/entity/Q1127417 ! flying phobia
   % http://www.wikidata.org/entity/Q3052614 ! nosophobia
    % http://www.wikidata.org/entity/Q18557105 ! cancerophobia
    % http://www.wikidata.org/entity/Q18557109 ! AIDS phobia
  % http://www.wikidata.org/entity/Q281928 ! social phobia
   % http://www.wikidata.org/entity/Q17147649 ! Specific social phobia
    % http://www.wikidata.org/entity/Q1335831 ! paruresis
    % http://www.wikidata.org/entity/Q612851 ! Telephone phobia
    % http://www.wikidata.org/entity/Q7136497 ! Parcopresis
    % http://www.wikidata.org/entity/Q2540262 ! Glossophobia
   % http://www.wikidata.org/entity/Q3219948 ! bowel-control anxiety
  % http://www.wikidata.org/entity/Q168995 ! Surdophobia
  % http://www.wikidata.org/entity/Q1131359 ! Amaxophobia



In [6]:
## Show as graph using GraphViz
## We can do this for both descendants and ancestors
import os
nodes = ont.traverse_nodes(qids, up=True, down=True)

## Make sure the output directory exists before rendering
os.makedirs('output', exist_ok=True)
w = GraphRenderer.create('png')
w.outfile = 'output/anxiety-disorder.png'
w.write_subgraph(ont, nodes, query_ids=qids)

Querying for associated entities

TODO: Drugs


In [4]:
## What proteins are associated with PTSD? (via GWAS)
[ptsd] = ont.search('post-traumatic stress disorder')
import ontobio.sparql.wikidata as wd
proteins = wd.canned_query('disease2protein', ptsd)

In [5]:
proteins


Out[5]:
['UniProtKB:Q92831',
 'UniProtKB:P17252',
 'UniProtKB:Q8N9K7',
 'UniProtKB:O75899',
 'UniProtKB:Q92597',
 'UniProtKB:P40145',
 'UniProtKB:Q9HA38',
 'UniProtKB:P42658',
 'UniProtKB:Q9Y243',
 'UniProtKB:Q9NUQ9',
 'UniProtKB:Q9P272',
 'UniProtKB:Q9BY07',
 'UniProtKB:O43897',
 'UniProtKB:A0A024R9G4',
 'UniProtKB:Q4F7X0',
 'UniProtKB:E5RIR1',
 'UniProtKB:Q8IYG9',
 'UniProtKB:A7E2E4']

In [10]:
## Find GO terms for all genes/products associated with all nodes in Anxiety sub-ontology

## First create a GO handle and get association sets for GO (in human)
go = f.create('go')

from ontobio.assoc_factory import AssociationSetFactory
afactory = AssociationSetFactory()
aset = afactory.create(ontology=go,
                       subject_category='gene',
                       object_category='function',
                       taxon='NCBITaxon:9606')

In [19]:
for n in ont.nodes():
    proteins = wd.canned_query('disease2protein', n)
    anns = [a for p in proteins for a in aset.annotations(p)]
    if len(anns) > 0:
        print("{} {}".format(n,ont.label(n)))
        for a in anns:
            print("  {} {}".format(a, go.label(a)))


http://www.wikidata.org/entity/Q202387 post-traumatic stress disorder
  GO:0007616 long-term memory
  GO:0006171 cAMP biosynthetic process
  GO:0007193 adenylate cyclase-inhibiting G-protein coupled receptor signaling pathway
  GO:0016021 integral component of membrane
  GO:0005524 ATP binding
  GO:0003091 renal water homeostasis
  GO:0005886 plasma membrane
  GO:0004016 adenylate cyclase activity
  GO:0004383 guanylate cyclase activity
  GO:0006182 cGMP biosynthetic process
  GO:0007165 signal transduction
  GO:0007190 activation of adenylate cyclase activity
  GO:0008294 calcium- and calmodulin-responsive adenylate cyclase activity
  GO:0008074 guanylate cyclase complex, soluble
  GO:0007189 adenylate cyclase-activating G-protein coupled receptor signaling pathway
  GO:0046872 metal ion binding
  GO:0007611 learning or memory
  GO:0071377 cellular response to glucagon stimulus
  GO:0016020 membrane
  GO:0035556 intracellular signal transduction
  GO:0034199 activation of protein kinase A activity
  GO:0008198 ferrous iron binding
  GO:0016706 oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, 2-oxoglutarate as one donor, and incorporation of one atom each of oxygen into both donors
  GO:0005634 nucleus
  GO:0005737 cytoplasm
  GO:0055114 oxidation-reduction process
  GO:0016300 tRNA (uracil) methyltransferase activity
  GO:0030488 tRNA methylation
  GO:0002098 tRNA wobble uridine modification
  GO:0000049 tRNA binding
  GO:0006400 tRNA modification
  GO:0008175 tRNA methyltransferase activity
http://www.wikidata.org/entity/Q741713 panic disorder
  GO:0003713 transcription coactivator activity
  GO:0030374 ligand-dependent nuclear receptor transcription coactivator activity
  GO:0043565 sequence-specific DNA binding
  GO:0044212 transcription regulatory region DNA binding
  GO:0005515 protein binding
  GO:0005634 nucleus
  GO:0007165 signal transduction
  GO:0045893 positive regulation of transcription, DNA-templated
  GO:0003682 chromatin binding
  GO:0001047 core promoter binding
  GO:0003712 transcription cofactor activity
  GO:0008022 protein C-terminus binding
  GO:0043231 intracellular membrane-bounded organelle
  GO:0045944 positive regulation of transcription from RNA polymerase II promoter
  GO:0030518 intracellular steroid hormone receptor signaling pathway
  GO:0006351 transcription, DNA-templated
  GO:0008013 beta-catenin binding
  GO:0070016 armadillo repeat domain binding
  GO:0010628 positive regulation of gene expression
  GO:0016055 Wnt signaling pathway
  GO:0005829 cytosol
  GO:0000790 nuclear chromatin
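The loop above is essentially a two-step join: disease to proteins, then proteins to GO terms. The sketch below reproduces that join on invented placeholder data so it can be run offline, and additionally removes duplicate terms while keeping first-seen order. All `EX:` identifiers and both lookup tables are hypothetical; only the GO IDs are borrowed from the output above, and their assignment to the placeholder proteins is made up.

```python
# Hypothetical lookup tables standing in for the Wikidata query and the GO association set
disease_to_proteins = {
    "EX:diseaseA": ["EX:protein1", "EX:protein2"],
    "EX:diseaseB": [],
}
protein_to_terms = {
    "EX:protein1": ["GO:0005634", "GO:0007165"],
    "EX:protein2": ["GO:0007165", "GO:0005829"],
}


def terms_for_disease(disease):
    """Join disease -> proteins -> GO terms, dropping repeats but keeping order."""
    ordered = []
    seen = set()
    for protein in disease_to_proteins.get(disease, []):
        for term in protein_to_terms.get(protein, []):
            if term not in seen:
                seen.add(term)
                ordered.append(term)
    return ordered


for disease in disease_to_proteins:
    terms = terms_for_disease(disease)
    if terms:  # skip diseases with no annotated proteins, as in the loop above
        print(disease)
        for term in terms:
            print("  {}".format(term))
```

Deduplicating is a design choice: it gives a cleaner per-disease summary, at the cost of hiding how many proteins support each term.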
