SPARQL from Python

SPARQLWrapper is a simple Python wrapper around a SPARQL service for remote query execution. Instead of downloading RDF and querying it locally with a library like rdflib, we can send arbitrary SPARQL queries straight to an endpoint, and SPARQLWrapper can also convert the query results into other formats such as JSON and CSV.
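SPARQLWrapper exposes a `CSV` constant alongside `JSON`. As far as we know, after `setReturnFormat(CSV)` the `convert()` call returns the raw response body as bytes rather than parsed rows. A minimal sketch of turning such a body into rows with the standard library; the helper name `csv_results_to_rows` is hypothetical and the sample body is made up, not real endpoint output:

```python
import csv
import io


def csv_results_to_rows(raw: bytes) -> list[dict[str, str]]:
    """Decode a SPARQL CSV result body and return one dict per solution.

    Assumes `raw` is the UTF-8 response body, e.g. what SPARQLWrapper's
    convert() hands back after setReturnFormat(CSV).
    """
    text = raw.decode("utf-8")
    # The first CSV row holds the variable names (without the leading "?").
    return list(csv.DictReader(io.StringIO(text)))


# Illustrative response body, not real endpoint output.
sample = b"item,itemLabel\r\nhttp://www.wikidata.org/entity/Q335016,Tabasco sauce\r\n"
rows = csv_results_to_rows(sample)
print(rows[0]["itemLabel"])  # Tabasco sauce
```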

First, what is SPARQL?

SPARQL ("SPARQL Protocol And RDF Query Language") is a W3C standard for querying RDF and can be used to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for querying required and optional graph patterns along with their conjunctions and disjunctions. SPARQL also supports extensible value testing and constraining queries by source RDF graph. The results of SPARQL queries can be result sets or RDF graphs.

A basic SPARQL SELECT query has three parts:

"""
PREFIX ... # declares a short alias for a namespace URI used in the query
SELECT ... # lists the variables to return (variable names start with ?)
WHERE  ... # a graph pattern of triples that constrains those variables
"""
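To make the three parts concrete, here is a small hypothetical helper, `build_query` (not part of SPARQLWrapper or any library), that assembles a SELECT query from a prefix map, a list of variable names and a WHERE pattern. The example rebuilds the DBpedia capsaicin query used below, minus its language filter:

```python
def build_query(prefixes: dict[str, str], variables: list[str], where: str) -> str:
    """Assemble a SELECT query from its PREFIX, SELECT and WHERE parts.

    `build_query` is a hypothetical helper for illustration only.
    """
    # One PREFIX line per alias -> namespace URI.
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items()]
    # SELECT lists the variables, each written with a leading "?".
    select_line = "SELECT " + " ".join(f"?{v}" for v in variables)
    # WHERE wraps the triple pattern in braces.
    where_block = "WHERE {\n  " + where + "\n}"
    return "\n".join(prefix_lines + [select_line, where_block])


query = build_query(
    {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"},
    ["comment"],
    "<http://dbpedia.org/resource/Capsaicin> rdfs:comment ?comment .",
)
print(query)
```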

`SPARQLWrapper`

The Python library SPARQLWrapper (installable via pip) lets us send SPARQL queries to remote or local SPARQL endpoints, such as DBpedia's:

from SPARQLWrapper import SPARQLWrapper, JSON

# Specify the DBpedia endpoint
sparql = SPARQLWrapper("http://dbpedia.org/sparql")

# Query for the English-language description (rdfs:comment) of Capsaicin
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?comment
    WHERE { <http://dbpedia.org/resource/Capsaicin> rdfs:comment ?comment 
    FILTER (LANG(?comment)='en')
    }
""")

# Convert results to JSON format
sparql.setReturnFormat(JSON)
result = sparql.query().convert()

# The return data contains "bindings" (a list of dictionaries)
for hit in result["results"]["bindings"]:
    # We want the "value" attribute of the "comment" field
    print(hit["comment"]["value"])

Capsaicin (/kæpˈseɪ.ᵻsɪn/ (INN); 8-methyl-N-vanillyl-6-nonenamide) is an active component of chili peppers, which are plants belonging to the genus Capsicum. It is an irritant for mammals, including humans, and produces a sensation of burning in any tissue with which it comes into contact. Capsaicin and several related compounds are called capsaicinoids and are produced as secondary metabolites by chili peppers, probably as deterrents against certain mammals and fungi. Pure capsaicin is a volatile, hydrophobic, colorless, odorless, crystalline to waxy compound.
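The JSON that comes back follows the W3C SPARQL 1.1 Query Results JSON Format: `head.vars` lists the selected variables, and `results.bindings` holds one dict per solution, where each bound variable maps to a dict with `type` and `value` (plus `xml:lang` for language-tagged literals). A minimal sketch of a hypothetical helper, `bindings_to_rows`, that flattens this into plain rows; the sample is hand-written in that layout, not real DBpedia output:

```python
def bindings_to_rows(result: dict) -> list[dict]:
    """Flatten SPARQL JSON results into plain {variable: value} dicts."""
    variables = result["head"]["vars"]
    rows = []
    for binding in result["results"]["bindings"]:
        # A variable left unbound in a solution (e.g. via OPTIONAL) is simply
        # missing from the binding, so it becomes None here.
        rows.append({var: binding[var]["value"] if var in binding else None for var in variables})
    return rows


# Hand-written sample in the W3C JSON results layout (not real DBpedia output).
sample = {
    "head": {"vars": ["comment"]},
    "results": {
        "bindings": [
            {"comment": {"type": "literal", "xml:lang": "en", "value": "An example comment."}}
        ]
    },
}
print(bindings_to_rows(sample))  # [{'comment': 'An example comment.'}]
```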

Querying Wikidata

We can also use the Wikidata Query Service (WDQS) endpoint to query Wikidata.
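SPARQLWrapper hides the HTTP details, but under the SPARQL 1.1 Protocol a query can be sent as a GET request with the query text in a `query` parameter and an `Accept` header choosing the result format. Wikimedia also asks clients of its public services, WDQS included, to identify themselves with a descriptive User-Agent. A standard-library sketch that builds (but does not send) such a request; the helper name and the agent string are assumptions for illustration:

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request


def make_sparql_request(endpoint: str, query: str) -> Request:
    """Build, without sending, a SPARQL 1.1 Protocol "query via GET" request."""
    url = endpoint + "?" + urlencode({"query": query})
    headers = {
        # Ask for the SPARQL 1.1 Query Results JSON format.
        "Accept": "application/sparql-results+json",
        # Placeholder agent string (an assumption); put your own contact details here.
        "User-Agent": "hot-sauce-tutorial/0.1 (someone@example.org)",
    }
    return Request(url, headers=headers)


req = make_sparql_request(
    "https://query.wikidata.org/sparql",
    "SELECT ?item WHERE { ?item wdt:P279 wd:Q522171 . }",
)
print(req.get_method())          # GET
print(req.get_header("Accept"))  # application/sparql-results+json
# The query text survives URL encoding intact.
print(parse_qs(urlparse(req.full_url).query)["query"][0])
```

Passing `req` to `urllib.request.urlopen` would actually send it; the JSON body could then be read with `json.load`.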

Let's say we want to continue our research into spicy things by searching for information about hot sauces in Wikidata. The first step is to find the identifier Wikidata uses for "hot sauce", which we can do by searching on Wikidata. It turns out to be Q522171. This is an item (an "entity"), and items are referenced with the "wd" prefix.

To get back the kinds of hot sauce cataloged in Wikidata, we query for items linked to hot sauce by the property "subclass of", which Wikidata encodes as P279. Used as a direct property in a simple triple, it takes the "wdt" prefix: ?item wdt:P279 wd:Q522171.
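The pattern `?item wdt:P279 wd:Q522171` only matches items one hop below hot sauce; a subclass of a subclass would not be returned. SPARQL 1.1 property paths can follow the property repeatedly: `wdt:P279+` means "one or more P279 hops" (while `wdt:P279*` would also match hot sauce itself). A sketch using a hypothetical helper, `subclass_query`, that produces either variant as a string ready for `sparql.setQuery`:

```python
def subclass_query(class_qid: str, transitive: bool = False) -> str:
    """Build a WDQS query for subclasses of `class_qid`, with labels.

    With transitive=True the property path wdt:P279+ matches subclasses at
    any depth (one or more P279 hops), not just direct ones.
    """
    path = "wdt:P279+" if transitive else "wdt:P279"
    # Doubled braces are literal { } inside the f-string.
    return f"""SELECT ?item ?itemLabel
WHERE {{
  ?item {path} wd:{class_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }}
}}"""


print(subclass_query("Q522171", transitive=True))
```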

NOTE: For simple WDQS triples, items should be prefixed with wd: and properties with wdt:. We don't need to declare any prefixes ourselves, because WDQS predefines many common external prefixes (e.g. rdf, skos, owl, schema) as well as Wikidata's internal ones, such as:

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wds: <http://www.wikidata.org/entity/statement/>
PREFIX wdv: <http://www.wikidata.org/value/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX bd: <http://www.bigdata.com/rdf#>
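A prefixed name is only shorthand: wd:Q522171 stands for the full IRI made by appending the local name Q522171 to the namespace bound to wd. A minimal sketch, with hypothetical helper names and only a subset of the prefixes above, that expands prefixed names and compacts full IRIs (like those in the query results) back to prefixed form:

```python
# Subset of the WDQS prefixes listed above.
WDQS_PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(prefixed_name: str, prefixes: dict[str, str] = WDQS_PREFIXES) -> str:
    """Expand a prefixed name such as 'wd:Q522171' into a full IRI."""
    prefix, _, local = prefixed_name.partition(":")
    return prefixes[prefix] + local


def compact(iri: str, prefixes: dict[str, str] = WDQS_PREFIXES) -> str:
    """Shorten a full IRI to prefix:local if a known namespace matches."""
    # Try longer namespaces first so a more specific one (e.g. wds:) would win.
    for prefix, namespace in sorted(prefixes.items(), key=lambda kv: -len(kv[1])):
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri  # no known namespace: leave the IRI as-is


print(expand("wdt:P279"))  # http://www.wikidata.org/prop/direct/P279
print(compact("http://www.wikidata.org/entity/Q335016"))  # wd:Q335016
```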


sparql = SPARQLWrapper("https://query.wikidata.org/sparql")

# SELECT both the hot sauce items and their labels.
# The wikibase:label SERVICE fills in ?itemLabel for each ?item, trying the
# user's language ([AUTO_LANGUAGE]) first and falling back to English.
sparql.setQuery("""
SELECT ?item ?itemLabel 

WHERE {
  ?item wdt:P279 wd:Q522171.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
""")
sparql.setReturnFormat(JSON)
results = sparql.query().convert()

Let's use pandas to review the results as a dataframe:

import pandas as pd

# pd.json_normalize replaces the deprecated pd.io.json.json_normalize
results_df = pd.json_normalize(results['results']['bindings'])
results_df[['item.value', 'itemLabel.value']]

	item.value	itemLabel.value
0	http://www.wikidata.org/entity/Q249114	salsa
1	http://www.wikidata.org/entity/Q335016	Tabasco sauce
2	http://www.wikidata.org/entity/Q360459	Adobo
3	http://www.wikidata.org/entity/Q460439	Blair's 16 Million Reserve
4	http://www.wikidata.org/entity/Q966327	harissa
5	http://www.wikidata.org/entity/Q1026822	Chili oil
6	http://www.wikidata.org/entity/Q1392674	sriracha sauce
7	http://www.wikidata.org/entity/Q2227032	mojo
8	http://www.wikidata.org/entity/Q2279518	Shito
9	http://www.wikidata.org/entity/Q2402909	Valentina
10	http://www.wikidata.org/entity/Q3273096	Doubanjiang
11	http://www.wikidata.org/entity/Q3474141	sauce samouraï
12	http://www.wikidata.org/entity/Q3474250	Q3474250
13	http://www.wikidata.org/entity/Q4922876	Nam phrik
14	http://www.wikidata.org/entity/Q5104402	Cholula Hot Sauce
15	http://www.wikidata.org/entity/Q6961170	Nam chim
16	http://www.wikidata.org/entity/Q16628511	Q16628511
17	http://www.wikidata.org/entity/Q16642516	Q16642516
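Rows 12, 16 and 17 show a QID such as Q3474250 where a name should be. This appears to be the label service's fallback when an item has no label in the requested languages. A sketch for spotting those rows with pandas; the two bindings are hand-written in the same shape as the response, not fetched live:

```python
import pandas as pd

# Illustrative bindings shaped like the WDQS response (not fetched live).
bindings = [
    {
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q335016"},
        "itemLabel": {"type": "literal", "xml:lang": "en", "value": "Tabasco sauce"},
    },
    {
        "item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q3474250"},
        "itemLabel": {"type": "literal", "value": "Q3474250"},
    },
]

flat = pd.json_normalize(bindings)
df = pd.DataFrame({"item": flat["item.value"], "label": flat["itemLabel.value"]})

# The QID is the last path segment of each entity IRI.
df["qid"] = df["item"].str.rsplit("/", n=1).str[-1]

# Rows whose "label" is just the QID had no label in the requested languages.
unlabeled = df[df["label"] == df["qid"]]
print(unlabeled["qid"].tolist())  # ['Q3474250']
```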
