DBpedia Schema Queries

Overview

In this notebook, I begin the process of analyzing the schema of the DBpedia Ontology. This is a local notebook: I load data from the filesystem into an in-memory graph, so it can run as part of the unit tests for Gastrodon. This is feasible because the schema is much smaller than DBpedia as a whole.

The following diagram illustrates the relationships between the DBpedia Ontology, its parts, DBpedia, and the world it describes.

The numbers above are really rough (by as much as 30 orders of magnitude), but some key points are:

  • The DBpedia Ontology has its own Ontology, which is a subset of RDFS, OWL, Dublin Core, Prov and similar vocabularies.
  • The DBpedia Ontology is much smaller (thousands of times) than DBpedia itself.
  • DBpedia does not directly describe the external universe, but instead describes Wikipedia, which itself describes the universe.

It's important to keep these layers straight, because in this notebook, we are looking at a description of the vocabulary used in DBpedia that uses RDFS, OWL, etc. vocabulary. RDF is unusual among data representations in that schemas in RDF are themselves written in RDF, and can be joined together with the data they describe. In this case, however, I've separated out a small amount of schema data that I intend to use to control operations against a larger database, much like the program of a numerically controlled machine tool or the punched cards that control a Jacquard Loom.

This notebook is part of the test suite for the Gastrodon framework, and a number of bugs were squashed and API improvements made in the process of creating it. It will be of use to anyone who wants to better understand RDF, SPARQL, DBpedia, Pandas, and how to put it all together with Gastrodon.

Setup

As always, I import names from the Python packages I use:


In [1]:
%matplotlib inline

import sys
from rdflib import Graph,URIRef
from gastrodon import LocalEndpoint,one,QName
import gzip
import pandas as pd
pd.set_option("display.width",100)
pd.set_option("display.max_colwidth",80)

Loading the graph

I Zopfli compressed a copy of the DBpedia Ontology from version 2015-10, so I can load it like so:


In [2]:
g=Graph()

In [3]:
g.parse(gzip.open("data/dbpedia_2015-10.nt.gz"),format="nt")


Out[3]:
<Graph identifier=N1ff69bed952145658b2800fb345c46b7 (<class 'rdflib.graph.Graph'>)>

Now it is loaded in memory in an RDF graph which I can do SPARQL queries on; think of it as a hashtable on steroids. I can get the size of the graph (number of triples) the same way I would get the size of any Python object:


In [4]:
len(g)


Out[4]:
30318
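To make the "hashtable on steroids" picture concrete, here is a minimal sketch in plain Python. It does not use RDFLib; the `match` helper and the three sample triples are made up for illustration. A graph is treated as a set of (subject, predicate, object) tuples, so its length is the number of triples, and a pattern with `None` as a wildcard plays the role of a one-line query.

```python
# A toy stand-in for an RDF graph: a set of (subject, predicate, object) tuples.
# The sample triples are illustrative, not copied from the DBpedia Ontology file.
triples = {
    ("dbo:Painter", "rdfs:subClassOf", "dbo:Artist"),
    ("dbo:Painter", "rdf:type", "owl:Class"),
    ("dbo:Artist", "rdf:type", "owl:Class"),
}

def match(graph, s=None, p=None, o=None):
    """Return the triples that agree with every non-None position of the pattern."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

print(len(triples))                    # number of triples, as with len(g)
print(match(triples, p="rdf:type"))    # every triple whose predicate is rdf:type
```

A real triple store keeps indexes so that such lookups do not scan every triple; the linear scan here is only meant to show the idea.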

The Graph is supplied by RDFLib, but I wrap it in an Endpoint object supplied by Gastrodon. This provides a bridge between RDFLib and pandas, and it smooths away the difference between a local endpoint and a remote endpoint (a SPARQL database running in another process or on another computer).


In [5]:
e=LocalEndpoint(g)

Counting properties and discovering namespaces

Probably the best query to run on an unfamiliar database is to count the properties (predicates) used in it. Note that the predicates I'm working with at this stage are the ones that make up the DBpedia Ontology itself; they are not the predicates used in the larger DBpedia dataset. I'll show you those later.


In [6]:
properties=e.select("""
   SELECT ?p (COUNT(*) AS ?cnt) {
      ?s ?p ?o .
   } GROUP BY ?p ORDER BY DESC(?cnt)
""")
properties


Out[6]:
cnt
p
rdfs:label 11645
rdf:type 6681
http://www.w3.org/ns/prov#wasDerivedFrom 3434
rdfs:range 2558
rdfs:domain 2407
rdfs:comment 1208
rdfs:subPropertyOf 971
rdfs:subClassOf 748
http://www.w3.org/2002/07/owl#equivalentClass 407
http://www.w3.org/2002/07/owl#equivalentProperty 222
http://www.w3.org/2002/07/owl#disjointWith 24
http://creativecommons.org/ns#license 2
http://purl.org/vocab/vann/preferredNamespacePrefix 1
http://purl.org/dc/terms/description 1
http://www.w3.org/2002/07/owl#versionInfo 1
http://purl.org/dc/terms/title 1
http://purl.org/dc/terms/modified 1
http://purl.org/dc/terms/source 1
http://xmlns.com/foaf/0.1/homepage 1
http://purl.org/dc/terms/publisher 1
http://purl.org/dc/terms/creator 1
http://purl.org/vocab/vann/preferredNamespaceUri 1
http://purl.org/dc/terms/issued 1

Note that the leftmost column is bold; this is because gastrodon recognized that this query groups on the ?p variable and it made this an index of the pandas dataframe. Gastrodon uses the SPARQL parser from RDFLib to understand your queries to support you in writing and displaying them. One advantage of this is that if you want to make a plot from the above data frame (which I'll do in a moment after cleaning the data) the dependent and independent variables will be automatically determined and things will 'just work'.
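For readers newer to SPARQL, the GROUP BY query above is essentially a tally over the predicate position of every triple. A rough plain-Python equivalent, assuming the graph is available as an iterable of (s, p, o) tuples (the sample below is made up), might look like this:

```python
from collections import Counter

# Made-up sample triples; the real graph has about thirty thousand.
triples = [
    ("dbo:Person", "rdfs:label", "person@en"),
    ("dbo:Person", "rdfs:label", "Person@de"),
    ("dbo:Person", "rdf:type", "owl:Class"),
    ("dbo:birthDate", "rdfs:domain", "dbo:Person"),
    ("dbo:birthDate", "rdfs:label", "birth date@en"),
]

# COUNT(*) ... GROUP BY ?p: count how often each predicate occurs.
counts = Counter(p for _s, p, _o in triples)

# ORDER BY DESC(?cnt): most_common() sorts by descending count.
for predicate, cnt in counts.most_common():
    print(predicate, cnt)
```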

Another thing to note is that the table shows short names such as rdfs:label as well as full URIs for predicates. The full URIs are tedious to work with, so I add a number of namespace declarations and make a new LocalEndpoint:


In [7]:
g.bind("prov","http://www.w3.org/ns/prov#")
g.bind("owl","http://www.w3.org/2002/07/owl#")
g.bind("cc","http://creativecommons.org/ns#")
g.bind("foaf","http://xmlns.com/foaf/0.1/")
g.bind("dc","http://purl.org/dc/terms/")
g.bind("vann","http://purl.org/vocab/vann/")

In [8]:
e=LocalEndpoint(g)
properties=e.select("""
   SELECT ?p (COUNT(*) AS ?cnt) {
      ?s ?p ?o .
   } GROUP BY ?p ORDER BY DESC(?cnt)
""")
properties


Out[8]:
cnt
p
rdfs:label 11645
rdf:type 6681
prov:wasDerivedFrom 3434
rdfs:range 2558
rdfs:domain 2407
rdfs:comment 1208
rdfs:subPropertyOf 971
rdfs:subClassOf 748
owl:equivalentClass 407
owl:equivalentProperty 222
owl:disjointWith 24
cc:license 2
vann:preferredNamespacePrefix 1
dc:description 1
owl:versionInfo 1
dc:title 1
dc:modified 1
dc:source 1
foaf:homepage 1
dc:publisher 1
dc:creator 1
vann:preferredNamespaceUri 1
dc:issued 1
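What the namespace bindings buy us can be sketched in a few lines of plain Python. This is an illustration of the idea, not how RDFLib or Gastrodon implement it; the `shorten` helper is hypothetical, and the prefix table repeats only some of the bindings above.

```python
# Prefix table mirroring some of the g.bind(...) calls above.
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "dc": "http://purl.org/dc/terms/",
}

def shorten(uri, prefixes=PREFIXES):
    """Return prefix:local if a bound namespace matches, else the URI unchanged."""
    for prefix, namespace in prefixes.items():
        if uri.startswith(namespace):
            return prefix + ":" + uri[len(namespace):]
    return uri

print(shorten("http://www.w3.org/ns/prov#wasDerivedFrom"))  # prov:wasDerivedFrom
print(shorten("http://xmlns.com/foaf/0.1/homepage"))         # unchanged: foaf is not in this table
```

This is why the first table mixed short and long names: only URIs whose namespace had a bound prefix could be abbreviated.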

Metadata about the DBpedia Ontology

I find it suspicious that so many properties occur only once, so I investigate:


In [9]:
single=e.select("""
   SELECT ?s {
      ?s dc:source ?o .
    }
""")
single


Out[9]:
s
0 http://dbpedia.org/ontology/

The one function will extract the single member of any list, iterable, DataFrame, or Series that has just one member.


In [10]:
ontology=one(single)

The select function can see variables in the stack frame that calls it. Simply put, if you use the ?_ontology variable in a SPARQL query, select will look for a Python variable called ontology, and substitute the value of ontology into ?_ontology. The underscore sigil prevents substitutions from happening by accident.

Here we see a number of facts about the DBpedia Ontology itself, that is, the data set we are working with.


In [11]:
meta=e.select("""
    SELECT ?p ?o {
        ?_ontology ?p ?o .
    } ORDER BY ?p
""")
meta


Out[11]:
p o
0 cc:license http://creativecommons.org/licenses/by-sa/3.0/
1 cc:license http://www.gnu.org/copyleft/fdl.html
2 dc:creator DBpedia Maintainers and Contributors
3 dc:description \n The DBpedia ontology provides the classes and properties use...
4 dc:issued 2008-11-17T12:00Z
5 dc:modified 2015-11-02T09:36Z
6 dc:publisher DBpedia Maintainers
7 dc:source http://mappings.dbpedia.org
8 dc:title The DBpedia Ontology
9 vann:preferredNamespacePrefix dbo
10 vann:preferredNamespaceUri http://dbpedia.org/ontology/
11 rdf:type owl:Ontology
12 rdf:type http://purl.org/vocommons/voaf#Vocabulary
13 rdfs:comment \n This ontology is generated from the manually created specifi...
14 owl:versionInfo 4.1-SNAPSHOT
15 foaf:homepage http://wiki.dbpedia.org/Ontology

In [12]:
ontology


Out[12]:
rdflib.term.URIRef('http://dbpedia.org/ontology/')
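The substitution of `?_ontology` can be pictured with a small sketch. This is not Gastrodon's actual code, only an illustration of the general technique under my own assumptions: the hypothetical `find_bindings` function scans the query for `?_name` variables and looks each name up among the local variables of the calling frame.

```python
import inspect
import re

def find_bindings(query):
    """Map each ?_name variable in the query to the caller's local variable `name`.

    Hypothetical helper for illustration; names with no matching local are skipped.
    """
    caller_locals = inspect.currentframe().f_back.f_locals
    names = re.findall(r"\?_(\w+)", query)
    return {"?_" + n: caller_locals[n] for n in names if n in caller_locals}

def demo():
    ontology = "http://dbpedia.org/ontology/"   # local variable in the calling frame
    return find_bindings("SELECT ?p ?o { ?_ontology ?p ?o . }")

bindings = demo()
print(bindings)
```

Ordinary variables such as `?p` and `?o` are left alone because they lack the underscore, which is the point of the sigil. The sketch stops at collecting the values; turning them into valid SPARQL terms is a separate step.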

How Gastrodon handles URI References

Gastrodon tries to display things in a simple way while watching your back to prevent mistakes. One potential mistake is that RDF makes a distinction between a literal string such as "http://dbpedia.org/ontology/" and a URI reference such as <http://dbpedia.org/ontology/>. Use the wrong one and your queries won't work!

This could particularly be a problem with abbreviated names. For instance, let's look at the first predicate in the meta frame. When displayed as a result in a Jupyter notebook or in a Pandas DataFrame, the short name looks just like a string:


In [13]:
license=meta.at[0,'p']
license


Out[13]:
'cc:license'

That's because it is a string! It is more than a plain string, however: its class is a subclass of str:


In [14]:
type(license)


Out[14]:
gastrodon.GastrodonURI

and in fact has the full URI reference hidden away inside of it


In [15]:
meta.at[0,'p'].to_uri_ref()


Out[15]:
rdflib.term.URIRef('http://creativecommons.org/ns#license')

When you access this value in a SPARQL query, the select function recognizes the type of the variable and automatically inserts the full URI reference


In [16]:
e.select("""
    SELECT ?s ?o {
        ?s ?_license ?o .
    }
""")


Out[16]:
s o
0 http://dbpedia.org/ontology/ http://creativecommons.org/licenses/by-sa/3.0/
1 http://dbpedia.org/ontology/ http://www.gnu.org/copyleft/fdl.html
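A string subclass that carries extra information is an ordinary Python technique. The class below is a hypothetical stand-in, not the real `GastrodonURI`; it only shows how a value can display as a short name while keeping the full URI.

```python
class ShortURI(str):
    """A str that displays as a short name but remembers the full URI.

    Hypothetical illustration; the real gastrodon class may differ in detail.
    """

    def __new__(cls, short, full):
        obj = super().__new__(cls, short)   # the string value is the short name
        obj.full = full                     # extra attribute holding the full URI
        return obj

    def to_uri_ref(self):
        # The real method returns an rdflib URIRef; a plain string keeps this sketch stdlib-only.
        return self.full

lic = ShortURI("cc:license", "http://creativecommons.org/ns#license")
print(lic)                     # cc:license
print(isinstance(lic, str))    # True
print(lic.to_uri_ref())        # http://creativecommons.org/ns#license
```

Because the type differs from a plain `str`, code that receives the value can tell a URI reference apart from a literal string that happens to look the same.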

Counting properties that are not about the Ontology

Since the metadata properties that describe this dataset really aren't part of it, it makes sense to remove these from the list so that we don't have so many properties that are used just once


In [17]:
properties=e.select("""
   SELECT ?p (COUNT(*) AS ?cnt) {
      ?s ?p ?o .
      FILTER(?s!=?_ontology)
   } GROUP BY ?p ORDER BY DESC(?cnt)
""")
properties


Out[17]:
cnt
p
rdfs:label 11645
rdf:type 6679
prov:wasDerivedFrom 3434
rdfs:range 2558
rdfs:domain 2407
rdfs:comment 1207
rdfs:subPropertyOf 971
rdfs:subClassOf 748
owl:equivalentClass 407
owl:equivalentProperty 222
owl:disjointWith 24

At this point it is about as easy to make a pie chart as it is with Excel. A pie chart is a good choice here because each fact has exactly one property in it:


In [18]:
properties["cnt"].plot.pie(figsize=(6,6)).set_ylabel('')


Out[18]:
Text(0,0.5,'')

My favorite method for understanding this kind of distribution is to sort the most common properties first and then compute the Cumulative Distribution Function, which is the percentage of facts that have used the predicates we've seen so far.

This is easy to compute with Pandas


In [19]:
100.0*properties["cnt"].cumsum()/properties["cnt"].sum()


Out[19]:
p
rdfs:label                 38.429807
rdf:type                   60.471256
prov:wasDerivedFrom        71.803841
rdfs:range                 80.245528
rdfs:domain                88.188898
rdfs:comment               92.172134
rdfs:subPropertyOf         95.376543
rdfs:subClassOf            97.845027
owl:equivalentClass        99.188172
owl:equivalentProperty     99.920797
owl:disjointWith          100.000000
Name: cnt, dtype: float64

Note that this result looks different than the DataFrames you've seen so far because it is not a DataFrame, it is a series, which has just one index column and one data column. It's possible to stick several series together to make a DataFrame, however.


In [20]:
# DataFrame.from_items was removed in pandas 1.0; a dict keeps the column order
pd.DataFrame({
    'count': properties["cnt"],
    "frequency": 100.0*properties["cnt"]/properties["cnt"].sum(),
    "distribution": 100.0*properties["cnt"].cumsum()/properties["cnt"].sum()
})


Out[20]:
count frequency distribution
p
rdfs:label 11645 38.429807 38.429807
rdf:type 6679 22.041449 60.471256
prov:wasDerivedFrom 3434 11.332585 71.803841
rdfs:range 2558 8.441687 80.245528
rdfs:domain 2407 7.943370 88.188898
rdfs:comment 1207 3.983235 92.172134
rdfs:subPropertyOf 971 3.204409 95.376543
rdfs:subClassOf 748 2.468484 97.845027
owl:equivalentClass 407 1.343146 99.188172
owl:equivalentProperty 222 0.732625 99.920797
owl:disjointWith 24 0.079203 100.000000

Unlike many graphical depictions, the above table is fair to both highly common and unusually rare predicates.
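The arithmetic behind the frequency and distribution columns is simple enough to check by hand. The sketch below repeats it in plain Python on the counts from the table, without Pandas:

```python
from itertools import accumulate

# Counts copied from the property table above, most common first.
counts = [11645, 6679, 3434, 2558, 2407, 1207, 971, 748, 407, 222, 24]

total = sum(counts)
frequency = [100.0 * c / total for c in counts]                  # share of each property
distribution = [100.0 * c / total for c in accumulate(counts)]   # running share (the CDF)

print(total)
print(round(frequency[0], 2), round(distribution[1], 2))
```

The last entry of `distribution` is always 100, since the running total ends at the grand total.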

Languages

It makes sense to start with rdfs:label, which is the most common property in this database.

Unlike many data formats, RDF supports language tagging for strings. Objects (mainly properties and classes used in DBpedia) that are talked about in the DBpedia Ontology are described in multiple human languages, and counting the language tags involves a query that is very similar to the property counting query:


In [21]:
e.select("""
   SELECT (LANG(?label) AS ?lang) (COUNT(*) AS ?cnt) {
      ?s rdfs:label ?label .
   } GROUP BY LANG(?label) ORDER BY DESC(?cnt)
""")


Out[21]:
lang cnt
0 en 3953
1 de 2049
2 nl 1296
3 el 1227
4 fr 755
5 ga 469
6 ja 374
7 sr 259
8 es 256
9 it 244
10 ko 237
11 pt 221
12 pl 120
13 gl 39
14 tr 27
15 sl 26
16 ca 26
17 ru 19
18 bg 11
19 zh 11
20 id 6
21 ar 5
22 bn 4
23 be 4
24 eu 4
25 hy 1
26 lv 1
27 cs 1

A detail you might notice is that the lang column is not bolded; instead, a sequential numeric index was created when I made the data frame. This is because Gastrodon, at this moment, isn't smart enough to understand a function that appears in the GROUP BY clause.

This is easy to work around by assigning the output of this function to a variable in a BIND clause.


In [22]:
lang=e.select("""
   SELECT ?lang (COUNT(*) AS ?cnt) {
      ?s rdfs:label ?label .
      BIND (LANG(?label) AS ?lang)
   } GROUP BY ?lang ORDER BY DESC(?cnt)
""")
lang


Out[22]:
cnt
lang
en 3953
de 2049
nl 1296
el 1227
fr 755
ga 469
ja 374
sr 259
es 256
it 244
ko 237
pt 221
pl 120
gl 39
tr 27
sl 26
ca 26
ru 19
bg 11
zh 11
id 6
ar 5
bn 4
be 4
eu 4
hy 1
lv 1
cs 1

One key to getting correct results in a data analysis is to test your assumptions. English is the most prevalent language by far, but can we assume that every object has an English name? There are 3953 objects with English labels, but


In [23]:
distinct_s=one(e.select("""
   SELECT (COUNT(DISTINCT ?s) AS ?cnt) {
      ?s rdfs:label ?o .
   }
"""))
distinct_s


Out[23]:
3954

objects with labels overall, so there must be at least one object without an English label. SPARQL has negation operators so we can find objects like that:


In [24]:
black_sheep=one(e.select("""
   SELECT ?s {
      ?s rdfs:label ?o .
      FILTER NOT EXISTS {
          ?s rdfs:label ?o2 .
          FILTER(LANG(?o2)='en')
      }
   }
"""))
black_sheep


Out[24]:
rdflib.term.URIRef('http://dbpedia.org/ontology/hasSurfaceForm')

Looking up all the facts for that object (which is a property used in DBpedia) shows that it has a name in Greek, but not in any other language.


In [25]:
meta=e.select("""
    SELECT ?p ?o {
        ?_black_sheep ?p ?o .
    } ORDER BY ?p
""")
meta


Out[25]:
p o
0 rdf:type rdf:Property
1 rdf:type owl:DatatypeProperty
2 rdfs:comment Reserved for DBpedia.
3 rdfs:label επιφάνεια από
4 rdfs:range xsd:string
5 prov:wasDerivedFrom http://mappings.dbpedia.org/index.php/OntologyProperty:hasSurfaceForm

I guess that's the exception that proves the rule. Everything else has a name in English, about half of the schema objects have a name in German, and the percentage falls off pretty rapidly from there:


In [26]:
lang_coverage=100*lang["cnt"]/distinct_s
lang_coverage


Out[26]:
lang
en    99.974709
de    51.820941
nl    32.776935
el    31.031866
fr    19.094588
ga    11.861406
ja     9.458776
sr     6.550329
es     6.474456
it     6.170966
ko     5.993930
pt     5.589277
pl     3.034901
gl     0.986343
tr     0.682853
sl     0.657562
ca     0.657562
ru     0.480526
bg     0.278199
zh     0.278199
id     0.151745
ar     0.126454
bn     0.101163
be     0.101163
eu     0.101163
hy     0.025291
lv     0.025291
cs     0.025291
Name: cnt, dtype: float64
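The coverage figures and the earlier FILTER NOT EXISTS query both come down to set arithmetic: everything that has a label, minus everything that has an English label. A plain-Python sketch, using a made-up handful of (subject, language) pairs rather than the real graph:

```python
# (subject, language tag) pairs for a few labels; an illustrative sample only.
labels = [
    ("dbo:Person", "en"),
    ("dbo:Person", "de"),
    ("dbo:birthDate", "en"),
    ("dbo:hasSurfaceForm", "el"),
]

labeled = {s for s, _lang in labels}                   # subjects with any label
english = {s for s, lang in labels if lang == "en"}    # subjects with an English label

missing_english = labeled - english                    # the NOT EXISTS part
coverage = 100.0 * len(english) / len(labeled)         # percent with an English label

print(sorted(missing_english))
print(coverage)
```

Note that this sketch counts distinct subjects per language, whereas the series above divides label counts by the number of distinct subjects; the two agree as long as no subject has two labels in the same language.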

As the percentages add up to more than 100 (an object can have names in many languages), the pie chart would be a wrong choice, but a bar chart is effective.


In [27]:
lang_coverage.plot(kind="barh",figsize=(10,6))


Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x17ccba20780>

Classes used in the DBpedia Ontology

I use another GROUP BY query to count classes used in the DBpedia Ontology. In the name of keeping the levels of abstraction straight, I'll point out that there are eight classes in the DBpedia Ontology, but that the DBpedia Ontology describes 739 classes used in DBpedia.


In [28]:
types=e.select("""
   SELECT ?type (COUNT(*) AS ?cnt) {
      ?s a ?type .
   } GROUP BY ?type ORDER BY DESC(?cnt)
""")
types


Out[28]:
cnt
type
rdf:Property 2695
owl:DatatypeProperty 1734
owl:ObjectProperty 1099
owl:Class 739
rdfs:Datatype 382
owl:FunctionalProperty 30
owl:Ontology 1
http://purl.org/vocommons/voaf#Vocabulary 1

739 classes is really a lot of classes! You personally might be interested in some particular domain (say Pop Music), but to survey the whole thing, I need some way to pick out classes which are important.

If I had access to the whole DBpedia database, I could count how many instances of these classes occur, and that would be one measure of importance. (I have access to this database, as do you, but I'm not using it for this notebook because I want this notebook to be self-contained)

As it is, one proxy for importance is how many properties apply to a particular class, or, in RDFS speak, how many properties have this class as the domain. The assumption here is that important classes are well documented, and we get a satisfying list of the top 20 classes this way


In [29]:
types=e.select("""
   SELECT ?s (COUNT(*) AS ?cnt) {
      ?s a owl:Class .
      ?p rdfs:domain ?s .
   } GROUP BY ?s ORDER BY DESC(?cnt) LIMIT 20
""")
types


Out[29]:
cnt
s
http://dbpedia.org/ontology/Person 253
http://dbpedia.org/ontology/Place 183
http://dbpedia.org/ontology/PopulatedPlace 151
http://dbpedia.org/ontology/Athlete 94
http://dbpedia.org/ontology/Settlement 56
http://dbpedia.org/ontology/School 47
http://dbpedia.org/ontology/SpaceMission 43
http://dbpedia.org/ontology/Island 38
http://dbpedia.org/ontology/MilitaryUnit 29
http://dbpedia.org/ontology/Planet 27
http://dbpedia.org/ontology/Organisation 27
http://dbpedia.org/ontology/Work 27
http://dbpedia.org/ontology/Species 27
http://dbpedia.org/ontology/Spacecraft 25
http://dbpedia.org/ontology/Broadcaster 25
http://dbpedia.org/ontology/MeanOfTransportation 25
http://dbpedia.org/ontology/Film 24
http://dbpedia.org/ontology/ArchitecturalStructure 24
http://dbpedia.org/ontology/Artist 22
http://dbpedia.org/ontology/AutomobileEngine 22

Adding another namespace binding makes sense to make the output more manageable.


In [30]:
g.bind("dbo","http://dbpedia.org/ontology/")
e=LocalEndpoint(g)
types=e.select("""
   SELECT ?s (COUNT(*) AS ?cnt) {
      ?s a owl:Class .
      ?p rdfs:domain ?s .
   } GROUP BY ?s ORDER BY DESC(?cnt) LIMIT 5
""")
types.head()


Out[30]:
cnt
s
dbo:Person 253
dbo:Place 183
dbo:PopulatedPlace 151
dbo:Athlete 94
dbo:Settlement 56

Common properties for People

To survey some important properties that apply to a dbo:Person I need some estimate of importance. I choose to count how many languages a property is labeled with as a proxy for importance -- after all, if a property is broadly interesting, it will be translated into many languages. The result is pretty satisfying.


In [31]:
person_types=e.select("""
   SELECT ?p (COUNT(*) AS ?cnt) {
       ?p rdfs:domain dbo:Person .
       ?p rdfs:label ?l .
   } GROUP BY ?p ORDER BY DESC(?cnt) LIMIT 30
""")
person_types


Out[31]:
cnt
p
dbo:birthDate 10
dbo:birthPlace 9
http://dbpedia.org/ontology/Person/weight 8
http://dbpedia.org/ontology/Person/height 8
dbo:nationality 7
dbo:knownFor 7
dbo:eyeColor 6
dbo:residence 6
dbo:child 6
dbo:achievement 6
dbo:deathPlace 6
dbo:birthName 6
dbo:deathDate 6
dbo:placeOfBurial 6
dbo:school 6
dbo:bloodType 6
dbo:education 5
dbo:birthYear 5
dbo:sibling 5
dbo:college 5
dbo:deathYear 5
dbo:spouse 5
dbo:university 5
dbo:relative 5
dbo:parent 5
dbo:waistSize 5
dbo:hairColor 5
dbo:bustSize 5
dbo:weddingParentsDate 4
dbo:partner 4

To make something that looks like a real report, I reach into my bag of tricks.

Since the predicate URI contains an English name for the predicate, I decide to show a label in German. The OPTIONAL clause is essential so that we don't lose properties that don't have a German label (there is exactly one in the list below). I use a subquery to compute the language count, and then filter for properties that are labeled in more than four languages.


In [32]:
e.select("""
   SELECT ?p ?range ?label ?cnt {
        ?p rdfs:range ?range .
        OPTIONAL { 
            ?p rdfs:label ?label .
             FILTER(LANG(?label)='de')
        }
        {
           SELECT ?p (COUNT(*) AS ?cnt) {
               ?p rdfs:domain dbo:Person .
               ?p rdfs:label ?l .
           } GROUP BY ?p ORDER BY DESC(?cnt)
        }
       FILTER(?cnt>4)
   } ORDER BY DESC(?cnt)
""")


Out[32]:
p range label cnt
0 dbo:birthDate xsd:date Geburtsdatum 10
1 dbo:birthPlace dbo:Place Geburtsort 9
2 http://dbpedia.org/ontology/Person/weight http://dbpedia.org/datatype/kilogram Gewicht (kg) 8
3 http://dbpedia.org/ontology/Person/height http://dbpedia.org/datatype/centimetre Höhe (cm) 8
4 dbo:nationality dbo:Country Nationalität 7
5 dbo:deathPlace dbo:Place Sterbeort 6
6 dbo:school dbo:EducationalInstitution schule 6
7 dbo:residence dbo:Place Residenz 6
8 dbo:eyeColor xsd:string Augenfarbe 6
9 dbo:placeOfBurial dbo:Place Ort der Bestattung 6
10 dbo:deathDate xsd:date Sterbedatum 6
11 dbo:birthName rdf:langString Geburtsname 6
12 dbo:child dbo:Person Kind 6
13 dbo:parent dbo:Person Elternteil 5
14 dbo:spouse dbo:Person Ehepartner 5
15 dbo:waistSize xsd:double Taillenumfang (μ) 5
16 dbo:college dbo:EducationalInstitution College 5
17 dbo:university dbo:EducationalInstitution Universität 5
18 dbo:deathYear xsd:gYear Sterbejahr 5
19 dbo:relative dbo:Person Verwandter 5
20 dbo:birthYear xsd:gYear Geburtsjahr 5
21 dbo:bustSize xsd:double None 5
22 dbo:sibling dbo:Person Geschwister 5
23 dbo:hairColor xsd:string Haarfarbe 5

Towards a simple schema browser

You'd probably agree with me that the query above is getting to be a bit much, but now that I have it, I can bake it into a function which makes it easy to ask questions of the schema. The following function lets us make a similar report for any class and any language. (I use the German word for 'class' because class is a reserved word in Python and the synonymous word type is the name of a built-in.)


In [33]:
def top_properties(klasse='dbo:Person',lang='de',threshold=4):
    klasse=QName(klasse)
    df=e.select("""
       SELECT ?p ?range ?label ?cnt {
            ?p rdfs:range ?range .
            OPTIONAL { 
                ?p rdfs:label ?label .
                 FILTER(LANG(?label)=?_lang)
            }
            {
               SELECT ?p (COUNT(*) AS ?cnt) {
                   ?p rdfs:domain ?_klasse .
                   ?p rdfs:label ?l .
               } GROUP BY ?p ORDER BY DESC(?cnt)
            }
           FILTER(?cnt>?_threshold)
       } ORDER BY DESC(?cnt)
    """)
    # The color is passed positionally because newer pandas renamed null_color to color
    return df.style.highlight_null('red')

Note that the select here can see variables in the immediately enclosing scope, that is, the function definition. As it is inside a function definition, it does not see variables defined in the Jupyter notebook. The handling of missing values is a big topic in Pandas, so I take the liberty of highlighting the label that is missing in German.


In [34]:
top_properties()


Out[34]:
p range label cnt
0 dbo:birthDate xsd:date Geburtsdatum 10
1 dbo:birthPlace dbo:Place Geburtsort 9
2 http://dbpedia.org/ontology/Person/weight http://dbpedia.org/datatype/kilogram Gewicht (kg) 8
3 http://dbpedia.org/ontology/Person/height http://dbpedia.org/datatype/centimetre Höhe (cm) 8
4 dbo:nationality dbo:Country Nationalität 7
5 dbo:deathPlace dbo:Place Sterbeort 6
6 dbo:school dbo:EducationalInstitution schule 6
7 dbo:residence dbo:Place Residenz 6
8 dbo:eyeColor xsd:string Augenfarbe 6
9 dbo:placeOfBurial dbo:Place Ort der Bestattung 6
10 dbo:deathDate xsd:date Sterbedatum 6
11 dbo:birthName rdf:langString Geburtsname 6
12 dbo:child dbo:Person Kind 6
13 dbo:parent dbo:Person Elternteil 5
14 dbo:spouse dbo:Person Ehepartner 5
15 dbo:waistSize xsd:double Taillenumfang (μ) 5
16 dbo:college dbo:EducationalInstitution College 5
17 dbo:university dbo:EducationalInstitution Universität 5
18 dbo:deathYear xsd:gYear Sterbejahr 5
19 dbo:relative dbo:Person Verwandter 5
20 dbo:birthYear xsd:gYear Geburtsjahr 5
21 dbo:bustSize xsd:double None 5
22 dbo:sibling dbo:Person Geschwister 5
23 dbo:hairColor xsd:string Haarfarbe 5

In Japanese, a different set of labels is missing. It's nice to see that Unicode characters outside the Latin-1 code page work just fine.


In [35]:
top_properties(lang='ja')


Out[35]:
p range label cnt
0 dbo:birthDate xsd:date 生年月日 10
1 dbo:birthPlace dbo:Place 生地 9
2 http://dbpedia.org/ontology/Person/weight http://dbpedia.org/datatype/kilogram 体重 (kg) 8
3 http://dbpedia.org/ontology/Person/height http://dbpedia.org/datatype/centimetre 身長 (cm) 8
4 dbo:nationality dbo:Country 国籍 7
5 dbo:deathPlace dbo:Place 死没地 6
6 dbo:school dbo:EducationalInstitution None 6
7 dbo:residence dbo:Place 居住地 6
8 dbo:eyeColor xsd:string None 6
9 dbo:placeOfBurial dbo:Place None 6
10 dbo:deathDate xsd:date 没年月日 6
11 dbo:birthName rdf:langString None 6
12 dbo:child dbo:Person 子供 6
13 dbo:parent dbo:Person 5
14 dbo:spouse dbo:Person 配偶者 5
15 dbo:waistSize xsd:double ウエスト (μ) 5
16 dbo:college dbo:EducationalInstitution None 5
17 dbo:university dbo:EducationalInstitution 大学 5
18 dbo:deathYear xsd:gYear 没年 5
19 dbo:relative dbo:Person 親戚 5
20 dbo:birthYear xsd:gYear 生年 5
21 dbo:bustSize xsd:double バスト (μ) 5
22 dbo:sibling dbo:Person 兄弟 5
23 dbo:hairColor xsd:string None 5

And of course it can be fun to look at other classes and languages:


In [36]:
top_properties('dbo:SpaceMission',lang='fr',threshold=1)


Out[36]:
p range label cnt
0 dbo:spacecraft dbo:Spacecraft véhicule spatial 4
1 http://dbpedia.org/ontology/SpaceMission/distanceTraveled http://dbpedia.org/datatype/kilometre None 3
2 http://dbpedia.org/ontology/SpaceMission/mass http://dbpedia.org/datatype/kilogram None 3
3 dbo:distanceTraveled xsd:double None 3
4 dbo:nextMission dbo:SpaceMission mision siguiente 3
5 dbo:landingSite xsd:string None 2
6 dbo:launchPad dbo:LaunchPad None 2
7 dbo:landingDate xsd:date None 2
8 http://dbpedia.org/ontology/SpaceMission/missionDuration http://dbpedia.org/datatype/day None 2
9 dbo:numberOfOrbits xsd:nonNegativeInteger None 2
10 dbo:spacestation dbo:SpaceStation None 2
11 dbo:launchSite dbo:Building None 2
12 http://dbpedia.org/ontology/SpaceMission/lunarOrbitTime http://dbpedia.org/datatype/hour None 2
13 dbo:spacewalkEnd xsd:date None 2
14 dbo:crewMember dbo:Astronaut None 2
15 dbo:lunarModule xsd:string None 2
16 dbo:launchDate xsd:date None 2
17 dbo:spacewalkBegin xsd:date None 2
18 dbo:lunarOrbitTime xsd:double None 2
19 dbo:missionDuration xsd:double None 2
20 dbo:booster dbo:Rocket None 2
21 dbo:lunarRover dbo:MeanOfTransportation None 2
22 dbo:orbitalInclination xsd:float None 2
23 dbo:previousMission dbo:SpaceMission None 2
24 dbo:crewSize xsd:nonNegativeInteger None 2

About "prov:wasDerivedFrom"

The prov:wasDerivedFrom property links properties and classes defined in the DBpedia Ontology to the places where they are defined on the mappings web site.


In [37]:
e.select("""
    SELECT ?s ?o {
       ?s prov:wasDerivedFrom ?o .
    } LIMIT 10
""")


Out[37]:
s o
0 dbo:dateOfAbandonment http://mappings.dbpedia.org/index.php/OntologyProperty:dateOfAbandonment
1 dbo:Moss http://mappings.dbpedia.org/index.php/OntologyClass:Moss
2 dbo:CountrySeat http://mappings.dbpedia.org/index.php/OntologyClass:CountrySeat
3 dbo:Archeologist http://mappings.dbpedia.org/index.php/OntologyClass:Archeologist
4 dbo:numberOfOfficials http://mappings.dbpedia.org/index.php/OntologyProperty:numberOfOfficials
5 dbo:nflCode http://mappings.dbpedia.org/index.php/OntologyProperty:nflCode
6 dbo:parkingLotsCars http://mappings.dbpedia.org/index.php/OntologyProperty:parkingLotsCars
7 dbo:personName http://mappings.dbpedia.org/index.php/OntologyProperty:personName
8 dbo:Jockey http://mappings.dbpedia.org/index.php/OntologyClass:Jockey
9 dbo:licensee http://mappings.dbpedia.org/index.php/OntologyProperty:licensee

In [38]:
_.at[0,'o']


Out[38]:
rdflib.term.URIRef('http://mappings.dbpedia.org/index.php/OntologyProperty:dateOfAbandonment')

Subclasses

Subclasses can be queried with queries like the following, which lists direct subtypes of dbo:Person.


In [39]:
e.select("""
   SELECT ?type {
      ?type rdfs:subClassOf dbo:Person .
   }
""")


Out[39]:
type
0 dbo:MovieDirector
1 dbo:Religious
2 dbo:Cleric
3 dbo:Scientist
4 dbo:Psychologist
5 dbo:Linguist
6 dbo:Artist
7 dbo:Criminal
8 dbo:TelevisionDirector
9 dbo:Philosopher
10 dbo:Architect
11 dbo:Noble
12 dbo:Economist
13 dbo:Judge
14 dbo:Politician
15 dbo:HorseTrainer
16 dbo:OrganisationMember
17 dbo:Orphan
18 dbo:Monarch
19 dbo:Model
20 dbo:FictionalCharacter
21 dbo:Athlete
22 dbo:Royalty
23 dbo:PoliticianSpouse
24 dbo:OfficeHolder
25 dbo:BeautyQueen
26 dbo:Aristocrat
27 dbo:Chef
28 dbo:Celebrity
29 dbo:Writer
30 dbo:Coach
31 dbo:Archeologist
32 dbo:RomanEmperor
33 dbo:Producer
34 dbo:PlayboyPlaymate
35 dbo:Ambassador
36 dbo:SportsManager
37 dbo:Lawyer
38 dbo:Farmer
39 dbo:Engineer
40 dbo:Presenter
41 dbo:TelevisionPersonality
42 dbo:MemberResistanceMovement
43 dbo:Astronaut
44 dbo:Journalist
45 dbo:BusinessPerson
46 dbo:Referee
47 dbo:TheatreDirector
48 dbo:MilitaryPerson
49 dbo:Egyptologist

SPARQL 1.1 has property path operators that will make the query engine recurse through multiple rdfs:subClassOf property links.


In [40]:
e.select("""
   SELECT ?type {
      ?type rdfs:subClassOf* dbo:Person .
   }
""")


Out[40]:
type
0 dbo:Person
1 dbo:MovieDirector
2 dbo:Religious
3 dbo:Cleric
4 dbo:Priest
5 dbo:Cardinal
6 dbo:Vicar
7 dbo:ChristianPatriarch
8 dbo:Pope
9 dbo:Saint
10 dbo:ChristianBishop
11 dbo:Scientist
12 dbo:Biologist
13 dbo:Entomologist
14 dbo:Professor
15 dbo:Medician
16 dbo:Psychologist
17 dbo:Linguist
18 dbo:Artist
19 dbo:Photographer
20 dbo:Dancer
21 dbo:MusicalArtist
22 dbo:Singer
23 dbo:MusicDirector
24 dbo:Instrumentalist
25 dbo:Guitarist
26 dbo:ClassicalMusicArtist
27 dbo:BackScene
28 dbo:Painter
29 dbo:Comedian
... ...
154 dbo:ScreenWriter
155 dbo:Historian
156 dbo:SongWriter
157 dbo:Coach
158 dbo:AmericanFootballCoach
159 dbo:CollegeCoach
160 dbo:VolleyballCoach
161 dbo:Archeologist
162 dbo:RomanEmperor
163 dbo:Producer
164 dbo:PlayboyPlaymate
165 dbo:Ambassador
166 dbo:SportsManager
167 dbo:SoccerManager
168 dbo:Lawyer
169 dbo:Farmer
170 dbo:Engineer
171 dbo:Presenter
172 dbo:RadioHost
173 dbo:TelevisionHost
174 dbo:TelevisionPersonality
175 dbo:Host
176 dbo:MemberResistanceMovement
177 dbo:Astronaut
178 dbo:Journalist
179 dbo:BusinessPerson
180 dbo:Referee
181 dbo:TheatreDirector
182 dbo:MilitaryPerson
183 dbo:Egyptologist

184 rows × 1 columns

The previous queries work "down" from a higher-level class, but by putting a '^' before the property name, I can reverse the direction of traversal to find all classes which dbo:Painter is a subclass of.


In [41]:
e.select("""
   SELECT ?type {
      ?type ^rdfs:subClassOf* dbo:Painter .
   }
""")


Out[41]:
type
0 dbo:Painter
1 dbo:Artist
2 dbo:Person
3 dbo:Agent
4 owl:Thing

In [42]:
e.select("""
   SELECT ?type {
      dbo:Painter rdfs:subClassOf* ?type .
   }
""")


Out[42]:
type
0 dbo:Painter
1 dbo:Artist
2 dbo:Person
3 dbo:Agent
4 owl:Thing

The same outcome can be had by switching the subject and object positions in the triple, as the previous query did for dbo:Painter. Here is the same form applied to dbo:City:


In [43]:
e.select("""
   SELECT ?type {
      dbo:City rdfs:subClassOf* ?type .
   }
""")


Out[43]:
type
0 dbo:City
1 dbo:Settlement
2 dbo:PopulatedPlace
3 dbo:Place
4 owl:Thing
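What the `*` path operator does can be sketched as a walk over a parent table. The edges below are assumed from the result order in the outputs above. Real class hierarchies may give a class more than one superclass or contain cycles, which this single-parent sketch ignores.

```python
# child -> parent edges for rdfs:subClassOf, assumed from the query results above.
SUBCLASS_OF = {
    "dbo:Painter": "dbo:Artist",
    "dbo:Artist": "dbo:Person",
    "dbo:Person": "dbo:Agent",
    "dbo:Agent": "owl:Thing",
}

def superclasses(cls, parent=SUBCLASS_OF):
    """Follow rdfs:subClassOf zero or more times, starting with cls itself."""
    chain = [cls]              # '*' matches a path of length zero, so cls is included
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain

print(superclasses("dbo:Painter"))
```

The zero-length path is why dbo:Painter and dbo:City appear in their own result lists; using `+` instead of `*` in the SPARQL query would require at least one step and leave them out.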

Equivalent Classes

The DBpedia Ontology uses owl:equivalentClass to specify equivalency between DBpedia Ontology types and types used in other popular systems such as Wikidata and schema.org:


In [44]:
e.select("""
   SELECT ?a ?b {
    ?a owl:equivalentClass ?b .
   } LIMIT 10
""")


Out[44]:
a b
0 dbo:Road http://www.wikidata.org/entity/Q34442
1 dbo:Pope http://www.wikidata.org/entity/Q19546
2 dbo:Ligament http://www.wikidata.org/entity/Q39888
3 dbo:MilitaryVehicle http://schema.org/Product
4 dbo:Astronaut http://www.wikidata.org/entity/Q11631
5 dbo:WaterRide http://www.wikidata.org/entity/Q2870166
6 dbo:Deity http://www.wikidata.org/entity/Q178885
7 dbo:Beer http://www.wikidata.org/entity/Q44
8 dbo:Event http://schema.org/Event
9 dbo:Treadmill http://www.wikidata.org/entity/Q683267
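
A quick way to see which vocabularies these targets come from is to bucket them by namespace. The sketch below does this in plain Python for the ten rows shown above; the labels in `NAMESPACES` and the helper `namespace_of` are names chosen for this illustration, and the prefix test is the same idea as SPARQL's STRSTARTS.

```python
from collections import Counter

# The ten owl:equivalentClass targets from the LIMIT 10 result above.
targets = [
    "http://www.wikidata.org/entity/Q34442",
    "http://www.wikidata.org/entity/Q19546",
    "http://www.wikidata.org/entity/Q39888",
    "http://schema.org/Product",
    "http://www.wikidata.org/entity/Q11631",
    "http://www.wikidata.org/entity/Q2870166",
    "http://www.wikidata.org/entity/Q178885",
    "http://www.wikidata.org/entity/Q44",
    "http://schema.org/Event",
    "http://www.wikidata.org/entity/Q683267",
]

# Namespaces of interest; the labels are just names chosen for this sketch.
NAMESPACES = {
    "schema.org": "http://schema.org/",
    "wikidata": "http://www.wikidata.org/",
}


def namespace_of(uri):
    """Return the label of the first namespace the URI starts with."""
    for label, prefix in NAMESPACES.items():
        if uri.startswith(prefix):   # same test as STRSTARTS(STR(?b), prefix)
            return label
    return "other"


counts = Counter(namespace_of(uri) for uri in targets)
print(counts)
```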

Here are all of the equivalencies between the DBpedia Ontology and schema.org.


In [45]:
e.select("""
   SELECT ?a ?b {
    ?a owl:equivalentClass ?b .
    FILTER(STRSTARTS(STR(?b),"http://schema.org/"))
   }
""")


Out[45]:
a b
0 dbo:MilitaryVehicle http://schema.org/Product
1 dbo:Event http://schema.org/Event
2 dbo:Automobile http://schema.org/Product
3 dbo:Canal http://schema.org/Canal
4 dbo:Airport http://schema.org/Airport
5 dbo:Painting http://schema.org/Painting
6 dbo:SportsTeam http://schema.org/SportsTeam
7 dbo:City http://schema.org/City
8 dbo:Sea http://schema.org/SeaBodyOfWater
9 dbo:SkiArea http://schema.org/SkiResort
10 dbo:Organisation http://schema.org/Organization
11 dbo:MusicFestival http://schema.org/Festival
12 dbo:Book http://schema.org/Book
13 dbo:Lake http://schema.org/LakeBodyOfWater
14 dbo:Hotel http://schema.org/Hotel
15 dbo:Mountain http://schema.org/Mountain
16 dbo:Annotation http://schema.org/Comment
17 dbo:Website http://schema.org/WebPage
18 dbo:University http://schema.org/CollegeOrUniversity
19 dbo:Work http://schema.org/CreativeWork
20 dbo:Film http://schema.org/Movie
21 dbo:Locomotive http://schema.org/Product
22 dbo:Restaurant http://schema.org/Restaurant
23 dbo:Ship http://schema.org/Product
24 dbo:Sculpture http://schema.org/Sculpture
25 dbo:Country http://schema.org/Country
26 dbo:Library http://schema.org/Library
27 dbo:Language http://schema.org/Language
28 dbo:EducationalInstitution http://schema.org/EducationalOrganization
29 dbo:TelevisionEpisode http://schema.org/TVEpisode
30 dbo:Album http://schema.org/MusicAlbum
31 dbo:TelevisionStation http://schema.org/TelevisionStation
32 dbo:ShoppingMall http://schema.org/ShoppingCenter
33 dbo:HistoricPlace http://schema.org/LandmarksOrHistoricalBuildings
34 dbo:RadioStation http://schema.org/RadioStation
35 dbo:Arena http://schema.org/StadiumOrArena
36 dbo:SportsEvent http://schema.org/SportsEvent
37 dbo:School http://schema.org/School
38 dbo:Hospital http://schema.org/Hospital
39 dbo:College http://schema.org/CollegeOrUniversity
40 dbo:Stadium http://schema.org/StadiumOrArena
41 dbo:HistoricBuilding http://schema.org/LandmarksOrHistoricalBuildings
42 dbo:Place http://schema.org/Place
43 dbo:Park http://schema.org/Park
44 dbo:Person http://schema.org/Person
45 dbo:AdministrativeRegion http://schema.org/AdministrativeArea
46 dbo:BodyOfWater http://schema.org/BodyOfWater
47 dbo:Bank http://schema.org/BankOrCreditUnion
48 dbo:Continent http://schema.org/Continent
49 dbo:Song http://schema.org/MusicRecording
50 dbo:River http://schema.org/RiverBodyOfWater
51 dbo:Aircraft http://schema.org/Product

Many of these are as you would expect, but there are some that are not correct, given the definition of owl:equivalentClass from the OWL specification.

9.1.2 Equivalent Classes

An equivalent classes axiom EquivalentClasses( CE1 ... CEn ) states that all of the class expressions CEi, 1 ≤ i ≤ n, are semantically equivalent to each other. This axiom allows one to use each CEi as a synonym for each CEj — that is, in any expression in the ontology containing such an axiom, CEi can be replaced with CEj without affecting the meaning of the ontology. An axiom EquivalentClasses( CE1 CE2 ) is equivalent to the following two axioms:

SubClassOf( CE1 CE2 )
SubClassOf( CE2 CE1 )

Put differently, anything that is a member of one class is a member of the other class and vice versa. That's true for dbo:TelevisionEpisode and schema:TVEpisode, but not true for many cases involving schema:Product.


In [46]:
g.bind("schema","http://schema.org/")
e=LocalEndpoint(g)
e.select("""
   SELECT ?a ?b {
    ?a owl:equivalentClass ?b .
    FILTER(?b=<http://schema.org/Product>)
   }
""")


Out[46]:
a b
0 dbo:MilitaryVehicle schema:Product
1 dbo:Automobile schema:Product
2 dbo:Locomotive schema:Product
3 dbo:Ship schema:Product
4 dbo:Aircraft schema:Product

I think you'd agree that an Automobile is a Product, but that a Product is not necessarily an automobile. In these cases,

dbo:Automobile rdfs:subClassOf schema:Product .

is more accurate.
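
Because equivalence is symmetric and transitive, these five statements have a consequence that is easy to overlook: taken literally, they make every one of the vehicle classes equivalent to every other. Here is a minimal sketch that groups classes connected by owl:equivalentClass, using only the rows shown above; `equivalence_groups` is a hypothetical helper, not part of Gastrodon or rdflib.

```python
# owl:equivalentClass statements from the query result above.
pairs = [
    ("dbo:MilitaryVehicle", "schema:Product"),
    ("dbo:Automobile", "schema:Product"),
    ("dbo:Locomotive", "schema:Product"),
    ("dbo:Ship", "schema:Product"),
    ("dbo:Aircraft", "schema:Product"),
]


def equivalence_groups(pairs):
    """Merge classes into groups, treating each pair as symmetric and transitive."""
    group_of = {}
    for a, b in pairs:
        merged = group_of.get(a, {a}) | group_of.get(b, {b})
        for member in merged:
            group_of[member] = merged   # every member now shares one group
    # Deduplicate the group objects while keeping a stable order.
    unique = []
    for group in group_of.values():
        if group not in unique:
            unique.append(group)
    return unique


groups = equivalence_groups(pairs)
print(groups)
```

All six classes land in a single group, so a reasoner that trusted these axioms would conclude that every dbo:Ship is a dbo:Automobile. With rdfs:subClassOf instead, no such conclusion follows.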

Let's take a look at external classes which aren't part of schema.org or wikidata:


In [47]:
e.select("""
    SELECT ?a ?b {
        ?a owl:equivalentClass ?b .
        FILTER(!STRSTARTS(STR(?b),"http://schema.org/"))
        FILTER(!STRSTARTS(STR(?b),"http://www.wikidata.org/"))    
    }
""")


Out[47]:
a b
0 dbo:Organisation http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#SocialPerson
1 dbo:TopicalConcept http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Concept
2 dbo:UnitOfWork http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Situation
3 dbo:Food http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#FunctionalSubstance
4 dbo:Unknown http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Entity
5 dbo:Activity http://www.ontologydesignpatterns.org/ont/d0.owl#Activity
6 dbo:ChemicalSubstance http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#ChemicalObject
7 dbo:PenaltyShootOut http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Event
8 dbo:Annotation http://purl.org/ontology/bibo/Note
9 dbo:GovernmentType http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Concept
10 dbo:Event http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Event
11 dbo:Monastery http://www.ontologydesignpatterns.org/ont/d0.owl#Location
12 dbo:Article http://purl.org/ontology/bibo/Article
13 dbo:LegalCase http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Situation
14 dbo:Polyhedron http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#SpaceRegion
15 dbo:Project http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#PlanExecution
16 dbo:List http://www.w3.org/2004/02/skos/core#OrderedCollection
17 dbo:Tax http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Description
18 dbo:Agent http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Agent
19 dbo:Sales http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Situation
20 dbo:List http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Collection
21 dbo:Ideology http://www.ontologydesignpatterns.org/ont/d0.owl#CognitiveEntity
22 dbo:Book http://purl.org/ontology/bibo/Book
23 dbo:Document foaf:Document
24 dbo:MeanOfTransportation http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#DesignedArtifact
25 dbo:Film http://dbpedia.org/ontology/Wikidata:Q11424
26 dbo:Holiday http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#TimeInterval
27 dbo:Abbey dbo:Monastery
28 dbo:Person foaf:Person
29 dbo:Year http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#TimeInterval
30 dbo:Person http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#NaturalPerson
31 dbo:Place dbo:Location
32 dbo:Database http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#InformationObject
33 dbo:MusicGenre http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Concept

To keep track of them all, I add a few more namespace declarations.


In [48]:
g.bind("dzero","http://www.ontologydesignpatterns.org/ont/d0.owl#")
g.bind("dul","http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#")
g.bind("bibo","http://purl.org/ontology/bibo/")
g.bind("skos","http://www.w3.org/2004/02/skos/core#")
e=LocalEndpoint(g)

In [49]:
e.select("""
    SELECT ?a ?b {
        ?a owl:equivalentClass ?b .
        FILTER(!STRSTARTS(STR(?b),"http://schema.org/"))
        FILTER(!STRSTARTS(STR(?b),"http://www.wikidata.org/"))    
    }
""")


Out[49]:
a b
0 dbo:Organisation dul:SocialPerson
1 dbo:TopicalConcept dul:Concept
2 dbo:UnitOfWork dul:Situation
3 dbo:Food dul:FunctionalSubstance
4 dbo:Unknown dul:Entity
5 dbo:Activity dzero:Activity
6 dbo:ChemicalSubstance dul:ChemicalObject
7 dbo:PenaltyShootOut dul:Event
8 dbo:Annotation bibo:Note
9 dbo:GovernmentType dul:Concept
10 dbo:Event dul:Event
11 dbo:Monastery dzero:Location
12 dbo:Article bibo:Article
13 dbo:LegalCase dul:Situation
14 dbo:Polyhedron dul:SpaceRegion
15 dbo:Project dul:PlanExecution
16 dbo:List skos:OrderedCollection
17 dbo:Tax dul:Description
18 dbo:Agent dul:Agent
19 dbo:Sales dul:Situation
20 dbo:List dul:Collection
21 dbo:Ideology dzero:CognitiveEntity
22 dbo:Book bibo:Book
23 dbo:Document foaf:Document
24 dbo:MeanOfTransportation dul:DesignedArtifact
25 dbo:Film http://dbpedia.org/ontology/Wikidata:Q11424
26 dbo:Holiday dul:TimeInterval
27 dbo:Abbey dbo:Monastery
28 dbo:Person foaf:Person
29 dbo:Year dul:TimeInterval
30 dbo:Person dul:NaturalPerson
31 dbo:Place dbo:Location
32 dbo:Database dul:InformationObject
33 dbo:MusicGenre dul:Concept

The mapping from dbo:Film to <http://dbpedia.org/ontology/Wikidata:Q11424> is almost certainly a typo.

Disjoint Classes

Another bit of OWL vocabulary used in the DBpedia Ontology is owl:disjointWith:


In [50]:
e.select("""
    SELECT ?b ?a {
        ?a owl:disjointWith ?b .   
    } ORDER BY ?b
""")


Out[50]:
b a
0 dbo:Fish dbo:Mammal
1 dbo:HistoricalPeriod dbo:PrehistoricalPeriod
2 dbo:Person dbo:On-SiteTransportation
3 dbo:Person dbo:GeologicalPeriod
4 dbo:Person dbo:ProtohistoricalPeriod
5 dbo:Person dbo:MeanOfTransportation
6 dbo:Person dbo:HistoricalPeriod
7 dbo:Person dbo:Mine
8 dbo:Person dbo:UnitOfWork
9 dbo:Person dbo:TimePeriod
10 dbo:Person dbo:Tower
11 dbo:Person dbo:Mountain
12 dbo:Person dbo:PeriodOfArtisticStyle
13 dbo:Person dbo:ConveyorSystem
14 dbo:Person dbo:MovingWalkway
15 dbo:Person dbo:Building
16 dbo:Person dbo:Gate
17 dbo:Person dbo:Activity
18 dbo:Person dbo:Event
19 dbo:Person dbo:Escalator
20 dbo:Place dbo:Agent
21 http://dbpedia.org/ontology/wgs84_pos:SpatialThing dbo:ReligiousOrganisation
22 http://dbpedia.org/ontology/wgs84_pos:SpatialThing dbo:Work
23 http://dbpedia.org/ontology/wgs84_pos:SpatialThing dbo:Organisation

If two classes are disjoint, that means that nothing can be an instance of both classes. For instance, a Fish cannot be a Mammal, a Person is not a Building, etc. These sorts of facts are helpful for validation, but one should resist the impulse to make statements of disjointness which aren't strictly true. (For instance, it would be unlikely, but not impossible, to be the winner of both a Heisman Trophy and a Fields Medal, so these are not disjoint categories.)
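
As an illustration of the validation use case, here is a minimal sketch of a disjointness check in plain Python. The disjoint pairs come from the result above; the instances under the `ex:` prefix and the function `violations` are invented for this example and do not come from DBpedia.

```python
# A few owl:disjointWith pairs taken from the result above.
DISJOINT = [
    ("dbo:Mammal", "dbo:Fish"),
    ("dbo:Building", "dbo:Person"),
    ("dbo:Agent", "dbo:Place"),
]

# Hypothetical instance data, invented for this sketch.
types_of = {
    "ex:Flipper": {"dbo:Mammal"},
    "ex:Nemo": {"dbo:Fish", "dbo:Mammal"},          # deliberately inconsistent
    "ex:EmpireState": {"dbo:Building", "dbo:Place"},
}


def violations(types_of, disjoint=DISJOINT):
    """Return (instance, class_a, class_b) for every disjointness conflict."""
    found = []
    for instance, types in sorted(types_of.items()):
        for a, b in disjoint:
            if a in types and b in types:
                found.append((instance, a, b))
    return found


conflicts = violations(types_of)
print(conflicts)
```

This only compares the types asserted directly on each instance. A fuller check would first expand each instance's types through rdfs:subClassOf, so that, say, a dbo:Painter is also tested as a dbo:Person.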

Datatypes

RDF not only has "types" (classes) that represent named concepts, but it also has literal datatypes. These include the standard datatypes from XML Schema such as xsd:integer and xsd:dateTime, but also derived types that specialize those types. This makes it possible to tag quantities in terms of physical units, currency units, etc.


In [51]:
e.select("""
    SELECT ?type {
        ?type a rdfs:Datatype .
    } LIMIT 10
""")


Out[51]:
type
0 http://dbpedia.org/datatype/laoKip
1 http://dbpedia.org/datatype/mexicanPeso
2 http://dbpedia.org/datatype/erg
3 xsd:gYearMonth
4 http://dbpedia.org/datatype/millibar
5 xsd:gYear
6 http://dbpedia.org/datatype/samoanTala
7 http://dbpedia.org/datatype/squareFoot
8 http://dbpedia.org/datatype/terahertz
9 http://dbpedia.org/datatype/uruguayanPeso

In [52]:
g.bind("type","http://dbpedia.org/datatype/")
e=LocalEndpoint(g)

In [53]:
e.select("""
    SELECT ?type {
        ?type a rdfs:Datatype .
    } LIMIT 10
""")


Out[53]:
type
0 type:laoKip
1 type:mexicanPeso
2 type:erg
3 xsd:gYearMonth
4 type:millibar
5 xsd:gYear
6 type:samoanTala
7 type:squareFoot
8 type:terahertz
9 type:uruguayanPeso

The information about these datatypes is currently sparse: the type queried below, type:lightYear, has just a label and a type.


In [54]:
e.select("""
    SELECT ?p ?o {
        type:lightYear ?p ?o .
    }
""")


Out[54]:
p o
0 rdf:type rdfs:Datatype
1 rdfs:label lightYear

These turn out to be the only properties that any datatype has; pretty clearly, datatypes are not labeled in the rich set of languages that properties and classes are labeled in. (Note that vocabulary exists in RDFS and OWL for describing datatypes more fully, such as specifying that type:lightYear would be represented as a number, specifying that a particular type of numeric value is in a particular range, etc.)


In [55]:
e.select("""
    SELECT ?p (COUNT(*) AS ?cnt) {
        ?s a rdfs:Datatype .
        ?s ?p ?o .
    } GROUP BY ?p
""")


Out[55]:
cnt
p
rdf:type 382
rdfs:label 382

Another approach is to look at how datatypes get used, that is, how frequently various datatypes are used as the range of a property.


In [56]:
e.select("""
    SELECT ?type (COUNT(*) AS ?cnt) {
        ?p rdfs:range ?type .
        ?type a rdfs:Datatype .
    } GROUP BY ?type ORDER BY DESC(?cnt)
""")


Out[56]:
cnt
type
xsd:string 779
xsd:nonNegativeInteger 265
xsd:double 187
xsd:date 146
xsd:gYear 60
xsd:integer 53
rdf:langString 49
type:millimetre 26
type:kilogram 25
xsd:float 25
xsd:positiveInteger 18
type:kilometre 15
type:kelvin 12
type:squareKilometre 10
type:metre 9
type:hour 6
type:day 6
xsd:boolean 5
xsd:dateTime 4
type:cubicMetrePerSecond 4
type:inhabitantsPerSquareKilometre 4
type:kilogramPerCubicMetre 3
type:minute 2
type:cubicKilometre 2
type:cubicMetre 2
type:kilometrePerSecond 2
xsd:gYearMonth 1
type:squareMetre 1
type:newtonMetre 1
type:kilometrePerHour 1
xsd:anyURI 1
type:valvetrain 1
type:fuelType 1
type:litre 1
type:megabyte 1
type:second 1
type:centimetre 1
type:engineConfiguration 1
type:gramPerKilometre 1
type:cubicCentimetre 1
type:kilowatt 1

In [57]:
len(_)


Out[57]:
41

Out of 382 datatypes, only 41 actually appear as the range of a property in the schema. Here are a few datatypes that are unused in the schema.


In [58]:
e.select("""
    SELECT ?type {
        ?type a rdfs:Datatype .
        MINUS { ?s ?p ?type }
    } LIMIT 20
""")


Out[58]:
type
0 type:laoKip
1 type:mexicanPeso
2 type:erg
3 type:millibar
4 type:samoanTala
5 type:squareFoot
6 type:terahertz
7 type:uruguayanPeso
8 type:turkmenistaniManat
9 type:Density
10 type:millinewton
11 type:stone
12 type:nanonewton
13 type:millivolt
14 type:netherlandsAntilleanGuilder
15 type:unitedArabEmiratesDirham
16 type:haitiGourde
17 type:pond
18 type:mile
19 type:kilolightYear

According to the DBpedia Ontology documentation, there are two kinds of datatype declarations in mappings. In some cases the unit is explicitly specified in the mapping field (ex. a field that contains a length is specified in meters) and in other cases, a particular datatype is specific to the field.

It turns out most of the knowledge about datatypes in the DBpedia Ontology system is hard coded into a Scala file; this file contains rich information that is not exposed in the RDF form of the Ontology, such as conversion factors, the fact that miles per hour is a speed, etc.
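
To give a feel for the kind of information that is missing from the RDF, here is a small sketch of a unit table and a converter. This is a hypothetical stand-in written for this notebook, not a transcription of the Scala source: the datatype names appear in the query results above, but the dimension labels, the table layout and the `convert` function are my own assumptions.

```python
# Hypothetical stand-in for metadata that the RDF form does not expose:
# datatype -> (dimension, factor that converts a value to the SI unit).
UNITS = {
    "type:metre": ("Length", 1.0),
    "type:kilometre": ("Length", 1000.0),
    "type:millimetre": ("Length", 0.001),
    "type:mile": ("Length", 1609.344),
    "type:kilogram": ("Mass", 1.0),
    "type:kilometrePerHour": ("Speed", 1000.0 / 3600.0),
}


def convert(value, from_type, to_type, units=UNITS):
    """Convert a value between two datatypes that share a dimension."""
    from_dimension, from_factor = units[from_type]
    to_dimension, to_factor = units[to_type]
    if from_dimension != to_dimension:
        raise ValueError(
            "cannot convert %s to %s" % (from_dimension, to_dimension)
        )
    return value * from_factor / to_factor


miles_in_km = convert(1.0, "type:mile", "type:kilometre")
print(miles_in_km)
```

If facts like these were published as RDF alongside each rdfs:Datatype, a query could check that two properties are comparable before joining on them.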

It is quite possible to encode datatypes directly into a fact, for example,

:Iron :meltsAt "1811 K"^^type:kelvin .


It is possible that such facts could be found in DBpedia or some other database, but I'm not going to check for that in this notebook, because this notebook is only considering facts that are in the ontology file supplied with this notebook.

Properties Measured in Kilograms


In [59]:
e.select("""
    SELECT ?p {
        ?p rdfs:range type:kilogram
    }
""")


Out[59]:
p
0 http://dbpedia.org/ontology/Spacecraft/totalMass
1 http://dbpedia.org/ontology/Spacecraft/cargoGas
2 http://dbpedia.org/ontology/SpaceMission/mass
3 http://dbpedia.org/ontology/Person/weight
4 http://dbpedia.org/ontology/MovingWalkway/mass
5 http://dbpedia.org/ontology/MeanOfTransportation/mass
6 http://dbpedia.org/ontology/ConveyorSystem/weight
7 http://dbpedia.org/ontology/Escalator/mass
8 http://dbpedia.org/ontology/Planet/mass
9 http://dbpedia.org/ontology/Escalator/weight
10 http://dbpedia.org/ontology/Spacecraft/dryCargo
11 http://dbpedia.org/ontology/Rocket/lowerEarthOrbitPayload
12 http://dbpedia.org/ontology/Galaxy/mass
13 http://dbpedia.org/ontology/MovingWalkway/weight
14 http://dbpedia.org/ontology/ConveyorSystem/mass
15 http://dbpedia.org/ontology/Weapon/weight
16 http://dbpedia.org/ontology/Engine/weight
17 http://dbpedia.org/ontology/Rocket/mass
18 http://dbpedia.org/ontology/On-SiteTransportation/mass
19 http://dbpedia.org/ontology/On-SiteTransportation/weight
20 http://dbpedia.org/ontology/SpaceMission/lunarSampleMass
21 http://dbpedia.org/ontology/Spacecraft/cargoFuel
22 http://dbpedia.org/ontology/Spacecraft/totalCargo
23 http://dbpedia.org/ontology/Spacecraft/cargoWater
24 http://dbpedia.org/ontology/MeanOfTransportation/weight

One unfortunate thing is that the DBpedia Ontology sometimes composes property URIs by putting together the class (ex. "Galaxy") and the property (ex. "mass") with a slash between them. An unescaped slash is not allowed in a local name, which means that you can't write ontology:Galaxy/mass. You can write the full URI, or you could define a prefix Galaxy such that you can write Galaxy:mass. Yet another approach is to set the base URI to

http://dbpedia.org/ontology/

in which case you could write <Galaxy/mass>. I was tempted to do that for this notebook, but decided against it, because soon I will be joining the schema with more DBpedia data, where I like to set the base to

http://dbpedia.org/resource/

In a slightly better world, the property might be composed with a period, so that the URI is just "ontology:Galaxy.mass". (Hmm... Could Gastrodon do that for you?)
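
Here is a minimal sketch of what such a display helper could look like. To be clear, this is not an existing Gastrodon feature; `dotted_name` is a hypothetical function, and the default prefix dbo is simply the one that shows up in the query results in this notebook.

```python
ONTOLOGY = "http://dbpedia.org/ontology/"


def dotted_name(uri, namespace=ONTOLOGY, prefix="dbo"):
    """Abbreviate a URI for display, turning Class/property into Class.property."""
    if not uri.startswith(namespace):
        return uri                      # leave foreign URIs untouched
    local = uri[len(namespace):]
    return prefix + ":" + local.replace("/", ".")


short = dotted_name("http://dbpedia.org/ontology/Galaxy/mass")
print(short)
print(dotted_name("http://dbpedia.org/ontology/Galaxy/mass", prefix="ontology"))
```

The abbreviation is only safe as a display format: a local name that already contains a period would make the reverse mapping ambiguous, so the full URI should remain the identifier used in queries.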

Datatype properties vs Object Properties

RDFS has a single class to represent a property, rdf:Property; OWL makes it a little more complex by defining both owl:DatatypeProperty and owl:ObjectProperty. The difference between these two kinds of property is the range: a Datatype Property has a literal value (object), while an Object Property has a Resource (URI or blank node) as a value.

I'd imagine that every rdf:Property should be either an owl:DatatypeProperty or owl:ObjectProperty, so that the sums would match. I wouldn't take it for granted, so I'll check it:


In [60]:
counts=e.select("""
   SELECT ?type (COUNT(*) AS ?cnt) {
      ?s a ?type .
      FILTER (?type IN (rdf:Property,owl:DatatypeProperty,owl:ObjectProperty))
   } GROUP BY ?type ORDER BY DESC(?cnt)
""")["cnt"]
counts


Out[60]:
type
rdf:Property            2695
owl:DatatypeProperty    1734
owl:ObjectProperty      1099
Name: cnt, dtype: int64

In [61]:
counts["rdf:Property"]


Out[61]:
2695

In [62]:
counts["owl:DatatypeProperty"]+counts["owl:ObjectProperty"]


Out[62]:
2833

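The sums don't match: the two OWL classes together account for 2833 properties, 138 more than the 2695 instances of rdf:Property.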

I'd expect the two kinds of OWL properties to be disjoint; and they are, because I can't find any properties which are an instance of both.


In [63]:
e.select("""
   SELECT ?klasse {
      ?klasse a owl:DatatypeProperty .
      ?klasse a owl:ObjectProperty .
   }
""")


Out[63]:
klasse

However, there are cases where a property is registered as an OWL property but not as an RDFS property:


In [64]:
e.select("""
   SELECT ?klasse {
      ?klasse a owl:DatatypeProperty .
      MINUS {?klasse a rdf:Property}
   }
""")


Out[64]:
klasse
0 http://dbpedia.org/ontology/Spacecraft/cargoWater
1 http://dbpedia.org/ontology/MovingWalkway/length
2 http://dbpedia.org/ontology/Galaxy/minimumTemperature
3 http://dbpedia.org/ontology/Planet/maximumTemperature
4 http://dbpedia.org/ontology/MovingWalkway/mass
5 http://dbpedia.org/ontology/Engine/height
6 http://dbpedia.org/ontology/On-SiteTransportation/weight
7 http://dbpedia.org/ontology/Planet/mass
8 http://dbpedia.org/ontology/Planet/meanRadius
9 http://dbpedia.org/ontology/Engine/cylinderBore
10 http://dbpedia.org/ontology/Person/weight
11 http://dbpedia.org/ontology/MeanOfTransportation/mass
12 http://dbpedia.org/ontology/Engine/diameter
13 http://dbpedia.org/ontology/MeanOfTransportation/weight
14 http://dbpedia.org/ontology/SpaceMission/distanceTraveled
15 http://dbpedia.org/ontology/SpaceStation/volume
16 http://dbpedia.org/ontology/Spacecraft/freeFlightTime
17 http://dbpedia.org/ontology/ConveyorSystem/length
18 http://dbpedia.org/ontology/Software/fileSize
19 http://dbpedia.org/ontology/Galaxy/surfaceArea
20 http://dbpedia.org/ontology/Galaxy/apoapsis
21 http://dbpedia.org/ontology/PopulatedPlace/populationUrbanDensity
22 http://dbpedia.org/ontology/Galaxy/meanTemperature
23 http://dbpedia.org/ontology/GrandPrix/distance
24 http://dbpedia.org/ontology/Spacecraft/cargoGas
25 http://dbpedia.org/ontology/ConveyorSystem/mass
26 http://dbpedia.org/ontology/Weapon/height
27 http://dbpedia.org/ontology/PopulatedPlace/populationMetroDensity
28 http://dbpedia.org/ontology/SpaceMission/lunarSampleMass
29 http://dbpedia.org/ontology/On-SiteTransportation/mass
... ...
108 http://dbpedia.org/ontology/Escalator/length
109 http://dbpedia.org/ontology/Weapon/length
110 http://dbpedia.org/ontology/MovingWalkway/height
111 http://dbpedia.org/ontology/Stream/discharge
112 http://dbpedia.org/ontology/Galaxy/averageSpeed
113 http://dbpedia.org/ontology/Stream/watershed
114 http://dbpedia.org/ontology/GeopoliticalOrganisation/populationDensity
115 http://dbpedia.org/ontology/Stream/maximumDischarge
116 http://dbpedia.org/ontology/Planet/apoapsis
117 http://dbpedia.org/ontology/Planet/temperature
118 http://dbpedia.org/ontology/Engine/width
119 http://dbpedia.org/ontology/Canal/maximumBoatLength
120 http://dbpedia.org/ontology/Rocket/lowerEarthOrbitPayload
121 http://dbpedia.org/ontology/SpaceMission/stationEvaDuration
122 http://dbpedia.org/ontology/Lake/volume
123 http://dbpedia.org/ontology/Planet/averageSpeed
124 http://dbpedia.org/ontology/MovingWalkway/diameter
125 http://dbpedia.org/ontology/Planet/surfaceArea
126 http://dbpedia.org/ontology/ConveyorSystem/width
127 http://dbpedia.org/ontology/Galaxy/meanRadius
128 http://dbpedia.org/ontology/Planet/orbitalPeriod
129 http://dbpedia.org/ontology/PopulatedPlace/populationDensity
130 http://dbpedia.org/ontology/SpaceMission/lunarOrbitTime
131 http://dbpedia.org/ontology/MeanOfTransportation/diameter
132 http://dbpedia.org/ontology/Galaxy/orbitalPeriod
133 http://dbpedia.org/ontology/MeanOfTransportation/length
134 http://dbpedia.org/ontology/SpaceShuttle/timeInSpace
135 http://dbpedia.org/ontology/Engine/topSpeed
136 http://dbpedia.org/ontology/ChemicalSubstance/density
137 http://dbpedia.org/ontology/ChemicalSubstance/boilingPoint

138 rows × 1 columns


In [65]:
e.select("""
   SELECT ?klasse {
      ?klasse a owl:ObjectProperty .
      MINUS {?klasse a rdf:Property}
   }
""")


Out[65]:
klasse

Every owl:ObjectProperty, by contrast, is also declared as an rdf:Property. In the other direction, there are no properties declared as an rdf:Property that are not also declared in OWL:


In [66]:
e.select("""
   SELECT ?p {
      ?p a rdf:Property .
      MINUS {
          { ?p a owl:DatatypeProperty }
          UNION
          { ?p a owl:ObjectProperty }
      }
   }
""")


Out[66]:
p

Conclusion: to get a complete list of properties defined in the DBpedia Ontology, it is necessary and sufficient to use the OWL property declarations. The analysis above that uses rdf:Property should use the OWL property classes to get complete results.
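
The counts from the queries above fit together exactly, which is a reassuring sanity check. The sketch below just redoes the arithmetic with the numbers copied from the outputs (nothing here queries the graph, and the variable names are my own).

```python
# Counts copied from the query outputs above.
rdf_property = 2695        # instances of rdf:Property
owl_datatype = 1734        # instances of owl:DatatypeProperty
owl_object = 1099          # instances of owl:ObjectProperty
datatype_only = 138        # owl:DatatypeProperty without rdf:Property

# The two OWL classes share no members, so their counts simply add up.
total_properties = owl_datatype + owl_object

# The gap between the OWL total and rdf:Property is exactly the set of
# datatype properties that lack an rdf:Property declaration.
gap = total_properties - rdf_property

# Removing those from the datatype count recovers the rdf:Property count.
declared_as_both = (owl_datatype - datatype_only) + owl_object

print(total_properties, gap, declared_as_both)
```

The gap equals the number of rows returned by the MINUS query on owl:DatatypeProperty, so those Class/property style datatype properties account for the whole discrepancy.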

Subproperties

Subproperties are used in RDF to gather together properties that more or less say the same thing.

For instance, the mass of a galaxy is comparable (in principle) to the mass of objects like stars and planets that make it up. Thus in a perfect world, the mass of a galaxy would be related to a more general "mass" property that could apply to anything from coins to aircraft carriers.

I go looking for one...


In [67]:
galaxyMass=URIRef("http://dbpedia.org/ontology/Galaxy/mass")
e.select("""
   SELECT ?p {
      ?_galaxyMass rdfs:subPropertyOf ?p .
   }
""")


Out[67]:
p

... and don't find it. That's not really a problem, because I can always add one by adding a few more facts to my copy of the DBpedia Ontology. Let's see what is really there...


In [68]:
e.select("""
   SELECT ?from ?to {
      ?from rdfs:subPropertyOf ?to .
   }
""")


Out[68]:
from to
0 dbo:teachingStaff dul:hasMember
1 dbo:handedness dul:hasQuality
2 dbo:bronzeMedalist dul:hasParticipant
3 dbo:editing dul:coparticipatesWith
4 dbo:makeupArtist dul:hasParticipant
5 dbo:goldMedalist dul:hasParticipant
6 dbo:delegateMayor dul:sameSettingAs
7 dbo:playRole dul:hasRole
8 dbo:royalAnthem dul:sameSettingAs
9 dbo:formerTeam dul:isMemberOf
10 dbo:isPartOf dul:isPartOf
11 dbo:codeNationalMonument dbo:code
12 dbo:aircraftUser dul:coparticipatesWith
13 dbo:settlement dul:hasLocation
14 dbo:narrator dul:coparticipatesWith
15 dbo:musicians dul:coparticipatesWith
16 dbo:dutchNAIdentifier dbo:code
17 dbo:parentOrganisation dul:sameSettingAs
18 dbo:associateEditor dul:coparticipatesWith
19 dbo:linkedTo dul:hasLocation
20 dbo:isPartOfMilitaryConflict dbo:isPartOf
21 dbo:basedOn dul:coparticipatesWith
22 dbo:committeeInLegislature dul:hasPart
23 dbo:person dul:isRoleOf
24 dbo:capitalDistrict dul:hasLocation
25 dbo:soccerLeaguePromoted dul:isSettingFor
26 dbo:curator dul:coparticipatesWith
27 dbo:lastRace dul:isParticipantIn
28 dbo:event dul:hasParticipant
29 dbo:associationOfLocalGovernment dul:coparticipatesWith
... ... ...
941 dbo:lowestMountain dul:isLocationOf
942 dbo:iso6393Code dbo:LanguageCode
943 dbo:endingTheme dul:coparticipatesWith
944 dbo:winsAtChallenges dul:isParticipantIn
945 dbo:winsAtKLPGA dul:isParticipantIn
946 dbo:child dul:sameSettingAs
947 dbo:highwaySystem dul:isPartOf
948 dbo:currency dul:sameSettingAs
949 dbo:championInDoubleMale dbo:championInDouble
950 dbo:programmeFormat dul:isClassifiedBy
951 dbo:guest dul:hasParticipant
952 dbo:homeColourHexCode dbo:colourHexCode
953 dbo:highestMountain dul:isLocationOf
954 dbo:rightChild dul:hasCommonBoundary
955 dbo:territory dul:hasParticipant
956 dbo:sportSpecialty dul:isParticipantIn
957 dbo:parish dul:isPartOf
958 dbo:subClassis dbo:classis
959 dbo:elementAbove dul:sameSettingAs
960 dbo:relative dul:sameSettingAs
961 dbo:builder dul:coparticipatesWith
962 dbo:zipCode dbo:postalCode
963 dbo:kingdom dul:specializes
964 dbo:artery dul:coparticipatesWith
965 dbo:school dul:coparticipatesWith
966 dbo:nationalOlympicCommittee dul:hasParticipant
967 dbo:hasVariant dul:isSpecializedBy
968 dbo:grandsire dul:sameSettingAs
969 dbo:poleDriver dul:hasParticipant
970 dbo:ekatteCode dbo:codeSettlement

971 rows × 2 columns

It looks like terms on the left are always part of the DBpedia Ontology:


In [69]:
e.select("""
   SELECT ?from ?to {
      ?from rdfs:subPropertyOf ?to .
      FILTER(!STRSTARTS(STR(?from),"http://dbpedia.org/ontology/"))
   }
""")


Out[69]:
from to

Terms on the right are frequently part of the DOLCE+DnS Ultralite (DUL) ontology (http://ontologydesignpatterns.org/wiki/Ontology:DOLCE%2BDnS_Ultralite) and are a way to explain the meaning of DBpedia Ontology terms in terms of DUL. Let's look at superproperties that aren't from the DUL ontology:


In [70]:
e.select("""
   SELECT ?from ?to {
      ?from rdfs:subPropertyOf ?to .
      FILTER(!STRSTARTS(STR(?to),"http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#"))
   }
""")


Out[70]:
from to
0 dbo:codeNationalMonument dbo:code
1 dbo:dutchNAIdentifier dbo:code
2 dbo:isPartOfMilitaryConflict dbo:isPartOf
3 dbo:championInDouble dbo:champion
4 dbo:alias dbo:alternativeName
5 dbo:senator dbo:MemberOfParliament
6 dbo:codeMunicipalMonument dbo:code
7 dbo:premiereDate dbo:releaseDate
8 dbo:provinceIsoCode dbo:isoCode
9 dbo:southWestPlace dbo:closeTo
10 dbo:commandant dbo:keyPerson
11 dbo:championInMixedDouble dbo:championInDouble
12 dbo:dutchMIPCode dbo:Code
13 dbo:originalLanguage dbo:language
14 dbo:codeListOfHonour dbo:code
15 dbo:championInSingle dbo:champion
16 dbo:ngcName dbo:name
17 dbo:ofsCode dbo:isoCode
18 dbo:nextEvent dbo:followedBy
19 dbo:codeLandRegistry dbo:Code
20 dbo:codeStockExchange dbo:code
21 dbo:chorusCharacterInPlay dbo:characterInPlay
22 dbo:southPlace dbo:closeTo
23 dbo:silverMedalist dbo:Medalist
24 dbo:isPartOfAnatomicalStructure dbo:isPartOf
25 dbo:olympicOathSwornByAthlete dbo:olympicOathSwornBy
26 dbo:distanceToCapital dbo:Distance
27 dbo:dutchPPNCode dbo:code
28 dbo:averageDepth dbo:depth
29 dbo:southEastPlace dbo:closeTo
... ... ...
46 dbo:northWestPlace dbo:closeTo
47 dbo:eastPlace dbo:closeTo
48 dbo:northEastPlace dbo:closeTo
49 dbo:awayColourHexCode dbo:colourHexCode
50 dbo:olympicOathSwornByJudge dbo:olympicOathSwornBy
51 dbo:protectionStatus dbo:Status
52 dbo:codeIndex dbo:code
53 dbo:communityIsoCode dbo:isoCode
54 dbo:bronzeMedalist dbo:Medalist
55 dbo:goldMedalist dbo:Medalist
56 dbo:locationCountry dbo:location
57 dbo:literaryGenre dbo:genre
58 dbo:dutchWinkelID dbo:code
59 dbo:locationCity dbo:location
60 dbo:playRole dbo:uses
61 dbo:codeProvincialMonument dbo:code
62 dbo:subTribus dbo:Tribus
63 dbo:rankingWins dbo:Wins
64 dbo:otherWins dbo:Wins
65 dbo:superTribus dbo:Tribus
66 dbo:officialSchoolColour dbo:ColourName
67 dbo:northPlace dbo:closeTo
68 dbo:capital dbo:administrativeHeadCity
69 dbo:iso6392Code dbo:LanguageCode
70 dbo:iso6393Code dbo:LanguageCode
71 dbo:championInDoubleMale dbo:championInDouble
72 dbo:homeColourHexCode dbo:colourHexCode
73 dbo:subClassis dbo:classis
74 dbo:zipCode dbo:postalCode
75 dbo:ekatteCode dbo:codeSettlement

76 rows × 2 columns

Out of those 76 relationships, I bet many of them point to the same superproperties:


In [71]:
e.select("""
   SELECT ?to (COUNT(*) AS ?cnt) {
      ?from rdfs:subPropertyOf ?to .
      FILTER(!STRSTARTS(STR(?to),"http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#"))
   } GROUP BY ?to ORDER BY DESC(?cnt)
""")


Out[71]:
cnt
to
dbo:code 9
dbo:closeTo 8
dbo:LanguageCode 4
dbo:isPartOf 3
dbo:isoCode 3
dbo:championInDouble 3
dbo:Code 3
dbo:Medalist 3
dbo:champion 2
dbo:name 2
dbo:followedBy 2
dbo:characterInPlay 2
dbo:olympicOathSwornBy 2
dbo:depth 2
dbo:championInSingle 2
dbo:codeSettlement 2
dbo:colourHexCode 2
dbo:location 2
dbo:Tribus 2
dbo:Wins 2
dbo:alternativeName 1
dbo:MemberOfParliament 1
dbo:releaseDate 1
dbo:keyPerson 1
dbo:language 1
dbo:Distance 1
dbo:Department 1
dbo:releaseYear 1
dbo:owner 1
dbo:Status 1
dbo:genre 1
dbo:uses 1
dbo:ColourName 1
dbo:administrativeHeadCity 1
dbo:classis 1
dbo:postalCode 1

The most common superproperty is dbo:code, which represents identifying codes. For instance, this could be a postal code, a UPC code, or a country or regional code. Unfortunately, only a small number of code-containing fields are so identified.


In [72]:
e.select("""
   SELECT ?about ?from {
      ?from 
          rdfs:subPropertyOf dbo:code ;
          rdfs:domain ?about .
   }
""")


Out[72]:
about from
0 dbo:MemberResistanceMovement dbo:dutchNAIdentifier
1 dbo:UndergroundJournal dbo:dutchWinkelID
2 dbo:MemberResistanceMovement dbo:codeIndex
3 dbo:Place dbo:codeProvincialMonument
4 dbo:Place dbo:codeNationalMonument
5 dbo:Company dbo:codeStockExchange
6 dbo:Place dbo:codeMunicipalMonument
7 dbo:MemberResistanceMovement dbo:codeListOfHonour
8 dbo:WrittenWork dbo:dutchPPNCode

Looking at the superproperty dbo:closeTo, the subproperties represent (right-hand) locations that are adjacent to (left-hand) locations in the cardinal (north, east, south, west) and ordinal (northeast, southeast, southwest, northwest) directions.


In [73]:
e.select("""
   SELECT ?about ?from {
      ?from 
          rdfs:subPropertyOf dbo:closeTo ;
          rdfs:domain ?about .
   }
""")


Out[73]:
about from
0 dbo:Place dbo:northEastPlace
1 dbo:Place dbo:eastPlace
2 dbo:Place dbo:southPlace
3 dbo:Place dbo:westPlace
4 dbo:Place dbo:southEastPlace
5 dbo:Place dbo:southWestPlace
6 dbo:Place dbo:northWestPlace
7 dbo:Place dbo:northPlace
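
What a superproperty buys you in practice is that a fact stated with the specific property also counts as a fact about the general one. Here is a minimal sketch of that inference in plain Python. The rdfs:subPropertyOf statements are taken from the results above; the `ex:` triples and the function `with_superproperties` are hypothetical and exist only for this illustration.

```python
# rdfs:subPropertyOf statements taken from the results above
# (property -> list of direct superproperties).
SUBPROPERTY_OF = {
    "dbo:northPlace": ["dbo:closeTo"],
    "dbo:championInDoubleMale": ["dbo:championInDouble"],
    "dbo:championInDouble": ["dbo:champion"],
}

# Hypothetical instance triples, invented for this sketch.
triples = {
    ("ex:TownA", "dbo:northPlace", "ex:TownB"),
    ("ex:Cup", "dbo:championInDoubleMale", "ex:TeamX"),
}


def with_superproperties(triples, sub_of=SUBPROPERTY_OF):
    """Add (s, q, o) for every (s, p, o) where p is a subproperty of q."""
    inferred = set(triples)
    pending = list(triples)
    while pending:
        s, p, o = pending.pop()
        for parent in sub_of.get(p, []):
            fact = (s, parent, o)
            if fact not in inferred:    # also guards against cycles
                inferred.add(fact)
                pending.append(fact)
    return inferred


inferred = with_superproperties(triples)
for fact in sorted(inferred):
    print(fact)
```

With the inferred triples in place, a single query on dbo:closeTo would find neighbours in all eight directions without listing each directional property.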

Looking at the superproperties in DUL, these look much like the kind of properties one would expect to be defined in an upper or middle ontology:


In [74]:
e.select("""
   SELECT ?to (COUNT(*) AS ?cnt) {
      ?from rdfs:subPropertyOf ?to .
      FILTER(STRSTARTS(STR(?to),"http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#"))
   } GROUP BY ?to ORDER BY DESC(?cnt)
""")


Out[74]:
cnt
to
dul:coparticipatesWith 226
dul:sameSettingAs 169
dul:hasLocation 85
dul:hasParticipant 63
dul:isParticipantIn 42
dul:isPartOf 39
dul:isClassifiedBy 38
dul:hasCommonBoundary 33
dul:isMemberOf 31
dul:hasPart 26
dul:isLocationOf 24
dul:hasQuality 15
dul:isSettingFor 15
dul:hasMember 13
dul:isDescribedBy 9
dul:specializes 8
dul:nearTo 8
dul:hasComponent 7
dul:isSpecializedBy 6
dul:isExpressedBy 6
dul:hasSetting 6
dul:hasRole 4
dul:conceptualizes 3
dul:hasConstituent 3
dul:precedes 3
dul:isAbout 2
dul:unifies 2
dul:follows 2
dul:hasRegion 2
dul:associatedWith 2
dul:isRoleOf 1
dul:overlaps 1
dul:concretelyExpresses 1

A really common kind of property is a "part-of" relationship, known as meronymy if you like Greek.


In [75]:
e.select("""
   SELECT ?domain ?p ?range {
      ?p 
          rdfs:subPropertyOf dul:isPartOf ;
          rdfs:domain ?domain ;
          rdfs:range ?range .
   }
""")


Out[75]:
domain p range
0 dbo:Place dbo:provinceLink dbo:Province
1 dbo:PopulatedPlace dbo:oldDistrict dbo:PopulatedPlace
2 dbo:Department dbo:prefecture dbo:PopulatedPlace
3 dbo:PopulatedPlace dbo:sheading dbo:PopulatedPlace
4 dbo:MilitaryConflict dbo:isPartOfMilitaryConflict dbo:MilitaryConflict
5 dbo:Mountain dbo:mountainRange dbo:MountainRange
6 dbo:Island dbo:governmentRegion dbo:PopulatedPlace
7 dbo:AnatomicalStructure dbo:organSystem dbo:AnatomicalStructure
8 dbo:Settlement dbo:geolocDepartment dbo:PopulatedPlace
9 dbo:Settlement dbo:federalState dbo:PopulatedPlace
10 dbo:PopulatedPlace dbo:department dbo:PopulatedPlace
11 dbo:WineRegion dbo:isPartOfWineRegion dbo:WineRegion
12 http://dbpedia.org/ontology/Diocese,_Parish dbo:deanery dbo:Deanery
13 dbo:Brain dbo:isPartOfAnatomicalStructure dbo:AnatomicalStructure
14 dbo:Settlement dbo:jointCommunity dbo:PopulatedPlace
15 dbo:PopulatedPlace dbo:lieutenancyArea dbo:PopulatedPlace
16 dbo:Settlement dbo:isoCodeRegion xsd:string
17 dbo:Place dbo:sovereignCountry dbo:PopulatedPlace
18 dbo:PopulatedPlace dbo:oldProvince dbo:PopulatedPlace
19 dbo:PopulatedPlace dbo:parish dbo:PopulatedPlace
20 dbo:Country dbo:continent dbo:Continent
21 dbo:Place dbo:district dbo:PopulatedPlace
22 http://dbpedia.org/ontology/Parish,_Deanery dbo:diocese dbo:Diocese
23 dbo:PopulatedPlace dbo:councilArea dbo:PopulatedPlace
24 dbo:PopulatedPlace dbo:metropolitanBorough dbo:PopulatedPlace
25 dbo:Island dbo:lowestState dbo:PopulatedPlace
26 dbo:Place dbo:province dbo:Province

Equivalent Property

The case of "part of" properties is a good example of a subproperty relationship in that, say, "Mountain X is a part of Mountain Range Y" is clearly a specialization of "X is a part of Y." That's different from the case where two properties mean exactly the same thing.
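To make the distinction concrete, here is a minimal sketch in plain Python. It is a toy model rather than anything Gastrodon or rdflib does internally: facts are simple tuples, the `entail` function and the subject and object names (MountainX, PartX, FilmX and so on) are made up for illustration, and each rule is applied only once. The two property pairs are taken from the query results in this notebook.

```python
# Toy model (an assumption for illustration): facts are (subject, property, object) tuples.
sub_property_of = {("dbo:mountainRange", "dul:isPartOf")}   # child -> parent
equivalent = {("dbo:genre", "schema:genre")}                # holds in both directions

def entail(facts, sub_property_of=(), equivalent=()):
    """Apply each rule once and return the original facts plus the inferred ones."""
    inferred = set(facts)
    for s, p, o in facts:
        for child, parent in sub_property_of:
            if p == child:          # a specific fact implies the general one, never the reverse
                inferred.add((s, parent, o))
        for a, b in equivalent:
            if p == a:              # equivalence works left to right ...
                inferred.add((s, b, o))
            if p == b:              # ... and right to left
                inferred.add((s, a, o))
    return inferred

# A mountainRange fact also yields an isPartOf fact.
specific = entail({("MountainX", "dbo:mountainRange", "RangeY")},
                  sub_property_of=sub_property_of)

# A bare isPartOf fact yields nothing new: we cannot conclude it is about mountains.
general = entail({("PartX", "dul:isPartOf", "WholeY")},
                 sub_property_of=sub_property_of)

# An equivalent property can be rewritten in either direction.
genre = entail({("FilmX", "schema:genre", "GenreY")}, equivalent=equivalent)
```

The asymmetry is the point: `specific` gains a `dul:isPartOf` fact, `general` is unchanged, and `genre` gains a `dbo:genre` fact even though the input used the schema.org name.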

Let's take a look at equivalent properties defined in the DBpedia Ontology:


In [76]:
e.select("""
    SELECT ?a ?b {
        ?a owl:equivalentProperty ?b
    }
""")


Out[76]:
a b
0 dbo:runtime schema:duration
1 dbo:manager http://www.wikidata.org/entity/P286
2 dbo:party http://www.wikidata.org/entity/P102
3 dbo:albumRuntime schema:duration
4 dbo:composer http://www.wikidata.org/entity/P86
5 dbo:deathCause http://www.wikidata.org/entity/P509
6 dbo:parentOrganisation schema:branchOf
7 dbo:doctoralAdvisor http://www.wikidata.org/entity/P184
8 dbo:foundingDate http://www.wikidata.org/entity/P571
9 dbo:sex http://www.wikidata.org/entity/P21
10 dbo:nutsCode http://www.wikidata.org/entity/P605
11 dbo:icd10 http://www.wikidata.org/entity/P494
12 dbo:builder http://www.wikidata.org/entity/P176
13 dbo:genre schema:genre
14 dbo:spouse http://www.wikidata.org/entity/P26
15 dbo:almaMater http://www.wikidata.org/entity/P69
16 dbo:endDate http://www.wikidata.org/entity/P582
17 dbo:award http://www.wikidata.org/entity/P166
18 dbo:city http://www.wikidata.org/entity/P131
19 dbo:rkdArtistsId http://www.wikidata.org/entity/P650
20 dbo:causeOfDeath http://www.wikidata.org/entity/P509
21 dbo:absoluteMagnitude http://www.wikidata.org/entity/P1457
22 dbo:musicalArtist http://www.wikidata.org/entity/P175
23 dbo:terytCode http://www.wikidata.org/entity/P1653
24 dbo:killedBy http://www.wikidata.org/entity/P157
25 dbo:basedOn http://www.wikidata.org/entity/P144
26 dbo:genre http://www.wikidata.org/entity/P136
27 dbo:unloCode http://www.wikidata.org/entity/P1937
28 dbo:continent http://www.wikidata.org/entity/P30
29 dbo:picture schema:image
... ... ...
192 dbo:country http://www.wikidata.org/entity/P17
193 dbo:isniId http://www.wikidata.org/entity/P213
194 dbo:einecsNumber http://www.wikidata.org/entity/P232
195 dbo:deathDate http://www.wikidata.org/entity/P570
196 dbo:bnfId http://www.wikidata.org/entity/P268
197 dbo:nisCode http://www.wikidata.org/entity/P1567
198 dbo:birthYear http://www.wikidata.org/entity/P569
199 dbo:isbn http://www.wikidata.org/entity/P212
200 dbo:iso6391Code http://www.wikidata.org/entity/P218
201 dbo:startDate schema:startDate
202 dbo:restingDate schema:deathDate
203 dbo:spouse schema:spouse
204 dbo:district http://www.wikidata.org/entity/P131
205 dbo:discipline http://www.wikidata.org/entity/P101
206 dbo:instrument http://www.wikidata.org/entity/P1303
207 dbo:inflow http://www.wikidata.org/entity/P200
208 dbo:filmRuntime schema:duration
209 dbo:ofsCode http://www.wikidata.org/entity/P771
210 dbo:kingdom http://www.wikidata.org/entity/P75
211 dbo:launchSite http://www.wikidata.org/entity/P448
212 dbo:amgid http://www.wikidata.org/entity/P1562
213 dbo:iataLocationIdentifier http://www.wikidata.org/entity/P238
214 dbo:chromosome http://www.wikidata.org/entity/P1057
215 dbo:coordinates http://www.wikidata.org/entity/P625
216 dbo:lccnId http://www.wikidata.org/entity/P244
217 dbo:meshId http://www.wikidata.org/entity/P486
218 dbo:isPartOf http://www.wikidata.org/entity/P361
219 dbo:isbn http://www.wikidata.org/entity/P957
220 dbo:landArea dbo:area
221 dbo:episodeNumber schema:episodeNumber

222 rows × 2 columns

Many of these properties are from Wikidata, so it probably makes sense to bind a namespace for Wikidata.


In [77]:
g.bind("wikidata","http://www.wikidata.org/entity/")
e=LocalEndpoint(g)
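
The effect of binding a prefix can be sketched in plain Python. This is only an illustration of the idea of abbreviating a URI with the longest matching namespace; the `shorten` helper is hypothetical and is not how rdflib or Gastrodon implement it.

```python
def shorten(uri, bindings):
    """Abbreviate uri as prefix:local using the longest matching namespace."""
    best = None
    for prefix, namespace in bindings.items():
        if uri.startswith(namespace) and (best is None or len(namespace) > len(best[1])):
            best = (prefix, namespace)
    if best is None:
        return uri  # no binding applies, so the full URI is shown
    prefix, namespace = best
    return f"{prefix}:{uri[len(namespace):]}"

bindings = {"dbo": "http://dbpedia.org/ontology/"}
uri = "http://www.wikidata.org/entity/P17"

before = shorten(uri, bindings)   # still the full URI
bindings["wikidata"] = "http://www.wikidata.org/entity/"
after = shorten(uri, bindings)    # now abbreviated with the new prefix
```

Before the binding is added the Wikidata URI comes back unchanged, as in Out[76]; afterwards it is abbreviated to `wikidata:P17`.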

This kind of equivalency with Wikidata is meaningful precisely because DBpedia and Wikidata are competitive (and cooperative) databases that cover the same domain. Let's take a look at equivalencies to databases other than Wikidata:


In [78]:
e.select("""
    SELECT ?a ?b {
        ?a owl:equivalentProperty ?b
        FILTER(!STRSTARTS(STR(?b),"http://www.wikidata.org/entity/"))
    }
""")


Out[78]:
a b
0 dbo:runtime schema:duration
1 dbo:albumRuntime schema:duration
2 dbo:parentOrganisation schema:branchOf
3 dbo:genre schema:genre
4 dbo:picture schema:image
5 dbo:endDate schema:endDate
6 dbo:musicComposer schema:musicBy
7 dbo:publisher schema:publisher
8 dbo:deathDate schema:deathDate
9 dbo:producer schema:producer
10 dbo:director schema:director
11 dbo:deFactoLanguage dbo:language
12 dbo:starring schema:actors
13 dbo:relative schema:relatedTo
14 dbo:illustrator schema:illustrator
15 dbo:mediaType schema:bookFormat
16 dbo:firstPublisher schema:publisher
17 dbo:isbn schema:isbn
18 dbo:club dbo:team
19 dbo:birthDate schema:birthDate
20 dbo:award schema:awards
21 dbo:nationality schema:nationality
22 dbo:waterArea dbo:area
23 dbo:locatedInArea schema:containedIn
24 dbo:map schema:maps
25 dbo:language schema:inLanguage
26 dbo:jureLanguage dbo:language
27 dbo:artist schema:byArtist
28 dbo:numberOfEpisodes schema:numberOfEpisodes
29 dbo:duration schema:duration
30 dbo:numberOfPages schema:numberOfPages
31 dbo:author schema:author
32 dbo:startDate schema:startDate
33 dbo:restingDate schema:deathDate
34 dbo:spouse schema:spouse
35 dbo:filmRuntime schema:duration
36 dbo:landArea dbo:area
37 dbo:episodeNumber schema:episodeNumber

The vast majority of those link to schema.org; only a handful link to other DBpedia Ontology properties.


In [79]:
e.select("""
    SELECT ?a ?b {
        ?a owl:equivalentProperty ?b
        FILTER(STRSTARTS(STR(?b),"http://dbpedia.org/ontology/"))
    }
""")


Out[79]:
a b
0 dbo:deFactoLanguage dbo:language
1 dbo:club dbo:team
2 dbo:waterArea dbo:area
3 dbo:jureLanguage dbo:language
4 dbo:landArea dbo:area

The quality of these equivalencies is questionable to me; for instance, in geography, people often publish separate "land area" and "water area" figures for a region. Still, out of 30,000 facts, I've seen fewer than 30 that looked obviously wrong. An error rate of 0.1% is not bad by some standards, but if we put these facts into a reasoning system, small errors in the schema can trigger an avalanche of inferred facts and have a disproportionately large impact on results.
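
A small sketch shows how such an avalanche can start. Because owl:equivalentProperty is symmetric and transitive, a reasoner may merge everything that is chained together by equivalence statements. The code below is plain Python, not a real reasoner; the `equivalence_classes` helper is a hypothetical name, and the five input pairs are the ones returned by the DBpedia-to-DBpedia query in this section.

```python
from collections import defaultdict
from itertools import combinations

# Equivalences between DBpedia Ontology properties, copied from the query result.
equivalences = [
    ("dbo:deFactoLanguage", "dbo:language"),
    ("dbo:club", "dbo:team"),
    ("dbo:waterArea", "dbo:area"),
    ("dbo:jureLanguage", "dbo:language"),
    ("dbo:landArea", "dbo:area"),
]

def equivalence_classes(pairs):
    """Group properties connected by any chain of equivalences (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return [frozenset(g) for g in groups.values()]

classes = equivalence_classes(equivalences)

# Every unordered pair inside a class is an equivalence a reasoner could use.
implied = {frozenset(p) for c in classes for p in combinations(sorted(c), 2)}
stated = {frozenset(p) for p in equivalences}
new_pairs = implied - stated
```

Here `new_pairs` contains the pair of dbo:landArea and dbo:waterArea, which nobody asserted directly and which is plainly wrong, along with the pair of dbo:deFactoLanguage and dbo:jureLanguage. Two questionable statements about dbo:area are enough to make a reasoner treat land area and water area as the same property.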

Namespaces

Rather than starting with a complete list of namespaces used in the DBpedia Ontology, I gradually added them as they turned up in queries. It would be nice to have a tool that automatically generates this kind of list, but for the time being, I am saving this list here for future reference.


In [80]:
e.namespaces()


Out[80]:
prefix namespace
bibo http://purl.org/ontology/bibo/
cc http://creativecommons.org/ns#
dbo http://dbpedia.org/ontology/
dc http://purl.org/dc/terms/
dul http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#
dzero http://www.ontologydesignpatterns.org/ont/d0.owl#
foaf http://xmlns.com/foaf/0.1/
owl http://www.w3.org/2002/07/owl#
prov http://www.w3.org/ns/prov#
rdf http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs http://www.w3.org/2000/01/rdf-schema#
schema http://schema.org/
skos http://www.w3.org/2004/02/skos/core#
type http://dbpedia.org/datatype/
vann http://purl.org/vocab/vann/
wikidata http://www.wikidata.org/entity/
xml http://www.w3.org/XML/1998/namespace
xsd http://www.w3.org/2001/XMLSchema#
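
As a rough idea of what a tool for generating such a list could look like, here is a minimal sketch in plain Python. The helpers `namespace_of` and `collect_namespaces` are hypothetical, and the rule they use (cut after the last '#', otherwise after the last '/') is a heuristic assumption that fits the namespaces in the table above but will not be right for every vocabulary. Prefix names would still have to be chosen by hand.

```python
def namespace_of(uri):
    """Return the part of uri up to and including the last '#', or else the last '/'."""
    cut = uri.rfind("#")
    if cut == -1:
        cut = uri.rfind("/")
    return uri[: cut + 1]

def collect_namespaces(uris):
    """Return the sorted set of namespaces seen in an iterable of URI strings."""
    return sorted({namespace_of(u) for u in uris})

# URIs of the kind that turn up in query results in this notebook.
sample = [
    "http://dbpedia.org/ontology/Place",
    "http://dbpedia.org/ontology/area",
    "http://www.w3.org/2000/01/rdf-schema#subPropertyOf",
    "http://www.wikidata.org/entity/P17",
]
namespaces = collect_namespaces(sample)
```

Feeding every URI in the graph through `collect_namespaces` would give a candidate list of namespaces to compare against the bindings already in place.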

Conclusion and next steps

In this notebook, I've made a quick survey of the contents of the DBpedia Ontology. This data set is useful to build into the "local" tests for Gastrodon because it is small enough to work with in memory, but complex enough to be a real-life example. For other notebooks, I work over the queries and data repeatedly to eliminate imperfections that make the notebooks unclear, but here the data set is a fixed target, which makes it a good shakedown cruise for Gastrodon in which I was able to fix a number of bugs and make a number of improvements.

One longer-term goal is to explore data from DBpedia and use it as a basis for visualization and data analysis. The next step towards that is to gather data from the full DBpedia that will help prioritize the exploration (which properties really get used?) and answer some questions that are still unclear (what about the data types which aren't used in the schema?).

Another goal is to develop tools to further simplify the exploration of data sets and schemas. The top_properties function defined above is an example of the kind of function that could be built into a function library that would reduce the need to write so many SPARQL queries by hand.
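
As one small example of that direction, the queries for dbo:code and dbo:closeTo in this section differ only in the name of the superproperty, so the query text could come from a helper. The sketch below is hypothetical: `subproperty_domain_query` is not part of Gastrodon, and it assumes the argument is a trusted prefixed name, since it is pasted into the query verbatim.

```python
def subproperty_domain_query(superproperty):
    """Build SPARQL text listing subproperties of superproperty with their domains."""
    return f"""
   SELECT ?about ?from {{
      ?from
          rdfs:subPropertyOf {superproperty} ;
          rdfs:domain ?about .
   }}
"""

query = subproperty_domain_query("dbo:closeTo")

# With the endpoint from this notebook, the result would then be fetched with:
#     e.select(query)
```

Keeping the helper as a pure string builder makes it easy to test without a graph, and leaves the call to `e.select` unchanged.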