In our last episode, I did a number of queries against the DBpedia Ontology to map out the information available. In that notebook, I gave myself the restriction that I would only do queries against a copy of the DBpedia Ontology that is stored with the notebook.
Because the Ontology contains roughly 740 types and 2700 properties (more than 250 for Person alone), this turned out to be a serious limitation -- unless I know how much information is available for these properties, I can't know which ones are important, and thus can't make a visualization that makes sense.
Gastrodon is capable of querying the public DBpedia SPARQL endpoint, but that endpoint has some limitations; in particular, it returns at most 10,000 results for a query, and complex queries can time out. I could certainly write a series of smaller queries to compute statistics, but then I'd face a balancing act between too many small queries (which take a long time to run) and queries that are too large (and sometimes time out).
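For example, one way to work within the 10,000-row limit is to page through results with ORDER BY, LIMIT and OFFSET. The following is a rough, untested sketch against the public endpoint (the ORDER BY is needed because OFFSET is only meaningful over a stable ordering, and each page costs a separate round trip -- which is exactly the slowness I want to avoid):

import pandas as pd
from gastrodon import RemoteEndpoint

public=RemoteEndpoint("http://dbpedia.org/sparql")
pages=[]
offset=0
while True:
    # fetch one page of (type,count) rows; ORDER BY makes OFFSET deterministic
    page=public.select("""
        SELECT ?type (COUNT(*) AS ?cnt) { ?s a ?type }
        GROUP BY ?type ORDER BY ?type
        LIMIT 10000 OFFSET %d
    """ % offset)
    if len(page)==0:
        break
    pages.append(page)
    offset+=10000
type_counts=pd.concat(pages)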
Fortunately I have a product in the AWS Marketplace, the Ontology2 Edition of DBpedia 2016-04, which is a private SPARQL endpoint already loaded with data from DBpedia. By starting this product and waiting about an hour for it to initialize, I can run as many SPARQL queries of arbitrary complexity as I like, and shut it down when I'm through.
In this notebook, I use this private SPARQL endpoint to count the prevalence of types, properties, and datatypes. I use SPARQL Construct to save this information into an RDF graph that I'll later be able to combine with the DBpedia Ontology RDF graph to better explore the schema.
I start with the usual preliminaries, importing Python modules and defining prefixes.
In [30]:
%load_ext autotime
import sys
from os.path import expanduser
from gastrodon import RemoteEndpoint,QName,ttl,URIRef,inline
import pandas as pd
import json
pd.options.display.width=120
pd.options.display.max_colwidth=100
In [2]:
prefixes=inline("""
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix summary: <http://rdf.ontology2.com/summary/> .
""").graph
It wouldn't be safe for me to check database connection information into Git, so I store it in a file in my home directory named ~/.dbpedia/config.json, which looks like this:
{
"url":"http://130.21.14.234:8890/sparql-auth",
"user":"dba",
"passwd":"vKUcW1eSVkruDOtT",
"base_uri":"http://dbpedia.org/resource/"
}
(Note that this is not my real IP address and password. If you want to reproduce this, put in the IP address and password for your own server and save the file to ~/.dbpedia/config.json.)
In [3]:
connection_data=json.load(open(expanduser("~/.dbpedia/config.json")))
connection_data["prefixes"]=prefixes
In [4]:
endpoint=RemoteEndpoint(**connection_data)
The Ontology2 Edition of DBpedia 2016-04 is divided into a number of different named graphs, one for each dataset described here.
It's important to pay attention to this for two reasons.
One of them is that facts can appear in the output of a SPARQL query more than once if the query covers multiple graphs and the same facts are repeated in those graphs. This can throw off the accuracy of our counts.
The other is that some queries take a long time to run if they are run over all graphs; this particularly affects queries that filter on a prefix in the predicate field, for example:
FILTER(STRSTARTS(STR(?p),"http://dbpedia.org/ontology/"))
Considering both of these factors, it is wise to know which graphs hold the facts we want, so I start exploring:
In [5]:
endpoint.select("""
select ?g (COUNT(*) AS ?cnt) {
GRAPH ?g { ?a <http://dbpedia.org/ontology/Person/height> ?b } .
} GROUP BY ?g
""")
Out[5]:
Thus I find one motherlode of properties right away; I save this graph name in a variable so I can use it later.
In [6]:
pgraph=URIRef("http://downloads.dbpedia.org/2016-04/core-i18n/en/specific_mappingbased_properties_en.ttl.bz2")
Looking up types, I find a number of graphs and choose the transitive types:
In [8]:
endpoint.select("""
select ?g (COUNT(*) AS ?cnt) {
GRAPH ?g { ?a a dbo:Person } .
} GROUP BY ?g
""")
Out[8]:
In [9]:
tgraph=URIRef("http://downloads.dbpedia.org/2016-04/core-i18n/en/instance_types_transitive_en.ttl.bz2")
In [10]:
endpoint.select("""
SELECT ?type (COUNT(*) AS ?cnt) {
GRAPH ?_tgraph { ?s a ?type . }
FILTER(STRSTARTS(STR(?type),"http://dbpedia.org/ontology/"))
} GROUP BY ?type
""")
Out[10]:
I can store these facts in an RDF graph (instead of a Pandas DataFrame) by using a CONSTRUCT query (instead of a SELECT query). To capture the results of a GROUP BY query, however, I have to use a subquery -- this is because SPARQL does not allow expressions in the CONSTRUCT template, only variables and constants, so I have to evaluate expressions (such as COUNT(*)) somewhere else.
The resulting query is straightforward, even if it looks a little awkward with all the braces: roughly, I cut and pasted the above SELECT query into a CONSTRUCT query that defines the facts that will be emitted.
In [11]:
t_counts=endpoint.construct("""
CONSTRUCT {
?type summary:count ?cnt .
} WHERE {
{
SELECT ?type (COUNT(*) AS ?cnt) {
GRAPH ?_tgraph { ?s a ?type . }
FILTER(STRSTARTS(STR(?type),"http://dbpedia.org/ontology/"))
} GROUP BY ?type
}
}
""")
I can count the facts in the resulting graph (the same as the number of rows in the SELECT query):
In [31]:
len(t_counts)
Out[31]:
And here is a sample fact:
In [40]:
next(iter(t_counts))
Out[40]:
Note that the DBpedia Ontology contains a number of other facts about dbo:Book, so if I add the above fact to my copy of the DBpedia Ontology, SPARQL queries will be able to pick up the count together with all the other facts.
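For example (a sketch, not something run in this notebook): assuming the DBpedia Ontology from the last notebook has been loaded into an rdflib graph named ontology, merging the two graphs and querying them locally might look like this. Gastrodon's LocalEndpoint queries an in-memory rdflib graph the same way RemoteEndpoint queries a server.

from gastrodon import LocalEndpoint

# `ontology` is assumed to hold the DBpedia Ontology; rdflib graphs
# can be merged with `+`, just like `all_counts` later in this notebook
combined=ontology+t_counts
local=LocalEndpoint(combined)
local.select("""
    SELECT ?label ?cnt {
        <http://dbpedia.org/ontology/Book>
            <http://www.w3.org/2000/01/rdf-schema#label> ?label ;
            <http://rdf.ontology2.com/summary/count> ?cnt .
    }
""")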
In [12]:
endpoint.select("""
SELECT ?p (COUNT(*) AS ?cnt) {
GRAPH ?_pgraph { ?s ?p ?o . }
} GROUP BY ?p
""")
Out[12]:
In [13]:
sp_count=endpoint.construct("""
CONSTRUCT {
?p summary:count ?cnt .
} WHERE { {
SELECT ?p (COUNT(*) AS ?cnt) {
GRAPH ?_pgraph { ?s ?p ?o . }
} GROUP BY ?p
} }
""")
A search for dbo:birthDate turns up datatype properties (which point to literal values):
In [14]:
endpoint.select("""
select ?g (COUNT(*) AS ?cnt) {
GRAPH ?g { ?a dbo:birthDate ?b } .
} GROUP BY ?g
""")
Out[14]:
A search for dbo:child turns up object properties (which point to a URI reference):
In [15]:
endpoint.select("""
select ?g (COUNT(*) AS ?cnt) {
GRAPH ?g { ?a dbo:child ?b } .
} GROUP BY ?g
""")
Out[15]:
In [16]:
lgraph=URIRef("http://downloads.dbpedia.org/2016-04/core-i18n/en/mappingbased_literals_en.ttl.bz2")
ograph=URIRef("http://downloads.dbpedia.org/2016-04/core-i18n/en/mappingbased_objects_en.ttl.bz2")
In [17]:
endpoint.select("""
SELECT ?p (COUNT(*) AS ?cnt) {
{
GRAPH ?_pgraph {
?s ?p ?o .
}
} UNION {
GRAPH ?_ograph {
?s ?p ?o .
}
} UNION {
GRAPH ?_lgraph {
?s ?p ?o .
}
}
} GROUP BY ?p
""")
Out[17]:
In [18]:
p_counts=endpoint.construct("""
CONSTRUCT {
?p summary:count ?cnt .
} WHERE {
{
SELECT ?p (COUNT(*) AS ?cnt) {
{
GRAPH ?_pgraph {
?s ?p ?o .
}
} UNION {
GRAPH ?_ograph {
?s ?p ?o .
}
} UNION {
GRAPH ?_lgraph {
?s ?p ?o .
}
}
} GROUP BY ?p
}
}
""")
In [19]:
len(p_counts)
Out[19]:
In RDF, a Class is a kind of type which represents a "Thing" in the world. Datatypes, on the other hand, are types that represent literal values. The best-known datatypes in RDF come from XML Schema and represent things such as integers, dates, and strings.
RDF also allows us to define custom datatypes, which are specified with URIs, like most things in RDF.
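In rdflib, for instance, a literal with one of DBpedia's custom datatypes can be written like this (the value shown is made up):

from rdflib import Literal, URIRef

# a height of 1.85 metres, typed with DBpedia's custom `metre` datatype
height=Literal("1.85",datatype=URIRef("http://dbpedia.org/datatype/metre"))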
A GROUP BY query reveals the prevalence of various datatypes, which I then dump to a graph.
There are still some big questions to research, such as "does the same property turn up with different units?" For instance, it is quite possible that a length could be represented in kilometers, centimeters, feet, or furlongs. You won't get the right answer, however, if you try to add multiple lengths in different units that are all represented as floats. Thus it may be necessary at some point to build a bridge to a package like numericalunits, or alternatively to build something that canonicalizes units, as sketched below.
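A canonicalizer could be as simple as a lookup table keyed on the datatype URI. The sketch below converts lengths to metres; the conversion table is illustrative, not a complete inventory of DBpedia's length datatypes.

# hypothetical conversion factors from DBpedia length datatypes to metres
TO_METRES={
    "http://dbpedia.org/datatype/metre": 1.0,
    "http://dbpedia.org/datatype/centimetre": 0.01,
    "http://dbpedia.org/datatype/kilometre": 1000.0,
    "http://dbpedia.org/datatype/foot": 0.3048,
}

def to_metres(value,datatype_uri):
    # refuse to guess when we see a datatype we don't know how to convert
    factor=TO_METRES.get(str(datatype_uri))
    if factor is None:
        raise ValueError("unknown length datatype: %s" % datatype_uri)
    return float(value)*factor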
In [20]:
endpoint.select("""
SELECT ?datatype (COUNT(*) AS ?cnt) {
{
GRAPH ?_pgraph {
?s ?p ?o .
}
} UNION {
GRAPH ?_lgraph {
?s ?p ?o .
}
}
BIND(DATATYPE(?o) AS ?datatype)
} GROUP BY ?datatype
""")
Out[20]:
In [21]:
dt_counts=endpoint.construct("""
CONSTRUCT {
?datatype summary:count ?cnt .
} WHERE {
SELECT ?datatype (COUNT(*) AS ?cnt) {
{
GRAPH ?_pgraph {
?s ?p ?o .
}
} UNION {
GRAPH ?_lgraph {
?s ?p ?o .
}
}
BIND(DATATYPE(?o) AS ?datatype)
} GROUP BY ?datatype
}
""")
In [25]:
all_counts = t_counts + p_counts + dt_counts
I add a few prefix declarations for (human) readability, then write the data to disk in Turtle format. I was tempted to write it to a relative path which would put this file in its final destination (underneath the local notebook directory, where it could be found by the notebooks), but decided against it, since I don't want to take the chance of me (or you) trashing the project by mistake. Instead I'll copy the file into place later.
In [28]:
all_counts.bind("datatype","http://dbpedia.org/datatype/")
all_counts.bind("dbo","http://dbpedia.org/ontology/")
all_counts.bind("summary","http://rdf.ontology2.com/summary/")
all_counts.serialize("/data/schema_counts.ttl",format='ttl',encoding='utf-8')
In [22]:
dimensions=endpoint.select("""
select ?p ?height ?weight {
GRAPH ?_pgraph {
?p <http://dbpedia.org/ontology/Person/weight> ?weight .
?p <http://dbpedia.org/ontology/Person/height> ?height .
}
}
""")
In [23]:
dimensions
Out[23]:
The data looks a bit messy. Most noticeably, I see quite a few facts which, instead of pointing to DBpedia concepts, point to synthetic URLs (such as <Ron_Clarke__2>) which are supposed to represent 'topics', such as the time that a particular employee worked for a particular employer. (See this notebook for some discussion of the phenomenon.)
Filtering these out will not be hard, as these synthetic URLs all contain two consecutive underscores.
I also think it's suspicious that a few people have a height of 0.0, which might be in the underlying data, or might be because Gastrodon is not properly handling a missing data value.
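Both cleanups fit in a couple of lines of Pandas. This sketch assumes the ?p values come back in a column named p; depending on the Gastrodon version, the first variable may instead land in the index, in which case you would filter on dimensions.index.

# drop the synthetic '__' topic URLs, then the implausible zero heights
clean=dimensions[~dimensions["p"].astype(str).str.contains("__")]
clean=clean[clean["height"]>0]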
It would certainly be possible to serialize these results into an RDF graph, but instead I write them to a CSV file for simplicity.
In [24]:
dimensions.to_csv("/data/people_weight.csv.gz",compression="gzip",encoding="utf-8")
To continue the analysis I began here, I needed a count of how often various classes, properties, and datatypes were used in DBpedia. API limits could make getting this data from the public SPARQL endpoint challenging, so I decided to run queries against my own private SPARQL endpoint powered by the Ontology2 Edition of DBpedia.
After setting up connection information, connecting to this private endpoint turned out to be as simple as connecting to a public endpoint, and I was able to efficiently get the data I needed into an RDF graph, ready to merge with the DBpedia Ontology graph to make a more meaningful analysis of the data in DBpedia, towards the goal of producing interesting and attractive visualizations.