Using grlc from Python

Because grlc is itself written in Python, it is easy to use from Python code. Here we show how to use grlc to run a SPARQL query that is stored on GitHub.

We start by importing grlc, along with a couple of other libraries we use for working with the data.


In [1]:
import pandas as pd
from io import StringIO

import grlc
import grlc.utils as utils
import grlc.swagger as swagger


INFO:rdflib:RDFLib Version: 4.2.1

We can load the grlc specification for a GitHub repository. For example, my GitHub username is c-martinez and my SPARQL queries are in the grlc-queries repo.


In [2]:
user = 'c-martinez'
repo = 'grlc-queries'
spec = swagger.build_spec(user, repo)


INFO:grlc.gquery:Decorator guessed endpoint: http://dbpedia.org/sparql
INFO:grlc.gquery:Decorator guessed endpoint: http://dbpedia.org/sparql

In [11]:
# list(...) keeps the printed output a plain list on both Python 2 and Python 3
print(list(spec[0].keys()))


['call_name', 'params', 'description', 'tags', 'item_properties', 'query', 'method', 'summary']
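Each entry in the spec is a dictionary with the keys printed above, so the whole list can be summarised with an ordinary loop. The following is a minimal sketch: sample_spec is a hypothetical stand-in for the list returned by swagger.build_spec, its values are illustrative rather than taken from the real repository, and list_operations is a helper name introduced here, not part of grlc.

```python
# Hypothetical stand-in for the list returned by swagger.build_spec(user, repo).
# Only keys shown in the printed output are used; the values are illustrative.
sample_spec = [
    {
        'call_name': 'dbpediaCapitals',  # assumed value, for illustration only
        'method': 'GET',                 # assumed value, for illustration only
        'summary': 'Example summary',
    },
]


def list_operations(spec):
    """Return a (call_name, method) pair for every entry in a spec list."""
    return [(item['call_name'], item['method']) for item in spec]


for call_name, method in list_operations(sample_spec):
    print(method, call_name)
```

With the real spec, the same helper could be called as list_operations(spec).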

In [10]:
print(spec[0]['query'])


PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT ?country_name ?capital_name ?population
WHERE {
  ?country rdf:type dbo:Country .
  ?country dbo:capital ?capital .

  ?capital rdfs:label  ?capital_name .
  ?country rdfs:label  ?country_name .
  ?country dbo:populationTotal ?population .

  FILTER (lang(?capital_name) = 'en')
  FILTER (lang(?country_name) = 'en')
  FILTER NOT EXISTS { ?country dbo:dissolutionYear ?yearEnd }
  FILTER (?population > 500000)
} LIMIT 1000

We can use the dispatch_query function to load data from a specific query (dbpediaCapitals in this case). For this example, we request the data in text/csv format.

NOTE: the dbpediaCapitals query loads data from dbpedia.org; the endpoint is specified via the endpoint decorator in the query file itself.


In [4]:
query_name = 'dbpediaCapitals'
acceptHeader = 'text/csv'

data, code, headers = utils.dispatch_query(user, repo, query_name, acceptHeader=acceptHeader)


INFO:grlc.gquery:Decorator guessed endpoint: http://dbpedia.org/sparql
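The call above unpacks three values: data, code, and headers. Assuming that code is an HTTP-style status code where 200 means success (the notebook does not state this explicitly), a small guard can stop a failed query from being parsed as CSV. The helper name ensure_ok is hypothetical and not part of grlc.

```python
def ensure_ok(data, code):
    """Return data unchanged when code is 200, otherwise raise RuntimeError.

    Assumes code is an HTTP-style status code; this is an assumption about
    dispatch_query's second return value, not something the notebook states.
    """
    if code != 200:
        raise RuntimeError('grlc query failed with status %s: %s' % (code, data))
    return data


# Illustrative call with placeholder values standing in for real results.
csv_text = ensure_ok('country_name,capital_name,population\n', 200)
```

In the notebook this would be used as data = ensure_ok(data, code) right after the dispatch_query call.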

Now we load these results into a pandas DataFrame.


In [5]:
data_grlc = pd.read_csv(StringIO(data))
data_grlc.head(10)


Out[5]:
     country_name capital_name  population
0         Albania       Tirana     2886026
1         Algeria      Algiers    40400000
2          Greece       Athens    10955000
3      Azerbaijan         Baku     9754830
4         Germany       Berlin    82175700
5          Brazil     Brasília   206440850
6  European Union     Brussels   510056011
7           Syria     Damascus    17064854
8         Finland     Helsinki     5488543
9       Indonesia      Jakarta   255461700
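Since the query result arrives as CSV text, it can also be inspected without pandas. The sketch below uses only the standard library; sample_csv re-types the first three rows of the output above and assumes a comma delimiter and a header row matching the column names, and parse_capitals is a hypothetical helper name.

```python
import csv
from io import StringIO

# First three rows of the output above, written as the CSV text they are
# assumed to come from (comma-delimited, with a header row).
sample_csv = (
    "country_name,capital_name,population\n"
    "Albania,Tirana,2886026\n"
    "Algeria,Algiers,40400000\n"
    "Greece,Athens,10955000\n"
)


def parse_capitals(csv_text):
    """Parse CSV text into a list of dicts, converting population to int."""
    rows = []
    for row in csv.DictReader(StringIO(csv_text)):
        row['population'] = int(row['population'])
        rows.append(row)
    return rows


rows = parse_capitals(sample_csv)
# Pick the row with the largest population among the parsed rows.
largest = max(rows, key=lambda r: r['population'])
print(largest['country_name'], largest['population'])
```

For anything beyond a quick check, the pandas DataFrame used in the notebook is the more convenient structure.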

grlc via HTTP

An alternative is to load the data over HTTP from a running grlc server (in this case grlc.io).


In [6]:
import requests

In [7]:
headers = {'accept': 'text/csv'}
resp = requests.get("http://grlc.io/api/c-martinez/grlc-queries/dbpediaCapitals", headers=headers)

In [8]:
data_requests = pd.read_csv(StringIO(resp.text))
data_requests.head(10)


Out[8]:
     country_name capital_name  population
0         Albania       Tirana     2886026
1         Algeria      Algiers    40400000
2          Greece       Athens    10955000
3      Azerbaijan         Baku     9754830
4         Germany       Berlin    82175700
5          Brazil     Brasília   206440850
6  European Union     Brussels   510056011
7           Syria     Damascus    17064854
8         Finland     Helsinki     5488543
9       Indonesia      Jakarta   255461700
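The request URL used above follows the pattern of a server base path followed by the user, repository, and query name. As a rough sketch, the two hypothetical helpers below (grlc_api_url and fetch_csv, neither of which is part of grlc) rebuild that URL from its parts and add a timeout and a status check to the request. The URL pattern is inferred only from the single example in this notebook, and the 30-second timeout is an arbitrary choice.

```python
def grlc_api_url(user, repo, query_name, base='http://grlc.io/api'):
    """Build a grlc API URL of the form <base>/<user>/<repo>/<query_name>.

    The pattern is inferred from the example URL in this notebook.
    """
    return '/'.join([base.rstrip('/'), user, repo, query_name])


def fetch_csv(user, repo, query_name, timeout=30):
    """Fetch a query result as CSV text, raising on HTTP error statuses."""
    # Imported here so that grlc_api_url can be used without requests installed.
    import requests

    resp = requests.get(
        grlc_api_url(user, repo, query_name),
        headers={'accept': 'text/csv'},
        timeout=timeout,
    )
    resp.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
    return resp.text


url = grlc_api_url('c-martinez', 'grlc-queries', 'dbpediaCapitals')
print(url)
```

The text returned by fetch_csv can be passed to pd.read_csv(StringIO(...)) exactly as resp.text is in the cell above.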