In [18]:
%matplotlib inline
wikimap create exited with these statistics:
Graphed a network of 11398 nodes and 9344 edges from 26202 unrendered : rendered attribute pairs, spanning 3595 infoboxes
Ran the wikimap status command in wikimap/wikimap/ like this:
$ wikimap status ./data-folder/clean_attribute_synonyms_redone.gpickle ./data-folder/infoboxes.xlsx ./data-folder/empty_report.json ./data-folder/explosion_report.json > status.log
with the following excerpted results:
This includes statistics about infoboxes for which Python WikipediaBase does not provide mappings.
Statistics:
Missing 2339 Infobox templates out of 3595 total, or 65.06%
Missing 430375 Wikipedia pages out of 2607241 total, or 16.51%
Missed template with most pages: [Template:Infobox french commune] with 36791 pages
This includes statistics about the largest subgraph (see breakdown of subgraphs by size in "Sample Subgraphs" below).
Statistics:
There are 2131532 Wikipedia pages in the explosion out of 2607241 total, or 81.75%
In order, missed infoboxes with the most pages:
('settlement', 354090.0)
('person', 146915.0)
('album', 122492.0)
('football-biography', 118713.0)
('film', 92138.0)
('musical-artist', 75975.0)
('nrhp', 51425.0)
('company', 50172.0)
('single', 47227.0)
('officeholder', 37139.0)
('book', 30794.0)
('television', 29280.0)
('ship-characteristics', 27818.0)
('military-person', 27662.0)
('ship-career', 25193.0)
('school', 22192.0)
('sportsperson', 19344.0)
('writer', 19319.0)
('scientist', 19272.0)
('radio-station', 18663.0)
('road', 18055.0)
('university', 17891.0)
('football-club', 17065.0)
('mountain', 16137.0)
('station', 16026.0)
('military-unit', 15748.0)
('ice-hockey-player', 14392.0)
('airport', 14094.0)
('cricketer', 13781.0)
('river', 12909.0)
('organization', 12855.0)
('video-game', 12534.0)
('politician', 11483.0)
('planet', 11055.0)
('artist', 10836.0)
('lake', 10244.0)
('building', 9962.0)
('software', 9822.0)
('royalty', 8840.0)
('stadium', 8669.0)
('language', 8021.0)
('football-club-season', 7850.0)
('cyclist', 7407.0)
('television-episode', 7237.0)
('basketball-biography', 6972.0)
('church', 6836.0)
('college-coach', 6630.0)
('protected-area', 6418.0)
('newspaper', 6244.0)
('state-representative', 6030.0)
('journal', 5939.0)
('rugby-league-biography', 5891.0)
('gridiron-football-person', 5845.0)
('rugby-biography', 5690.0)
('rockunit', 5383.0)
('political-party', 5315.0)
('japan-station', 5215.0)
('weapon', 5184.0)
('song', 4960.0)
('automobile', 4903.0)
('military-structure', 4794.0)
('government-agency', 4760.0)
('magazine', 4641.0)
('museum', 4571.0)
('tennis-biography', 4288.0)
('islands', 4224.0)
('prepared-food', 4221.0)
('swimmer', 4052.0)
('congressman', 3929.0)
('award', 3833.0)
('bridge', 3764.0)
('artist-discography', 3732.0)
('artwork', 3675.0)
('airline', 3650.0)
('boxer', 3629.0)
('character', 3624.0)
('historic-site', 3384.0)
('broadcast', 3367.0)
('website', 3332.0)
('religious-building', 3292.0)
('locomotive', 3225.0)
('diocese', 3153.0)
('martial-artist', 3146.0)
('rail-line', 3139.0)
('governor', 3094.0)
('athlete', 3087.0)
('saint', 3059.0)
('dam', 3005.0)
('figure-skater', 2913.0)
('television-season', 2891.0)
('state-senator', 2864.0)
('hospital', 2834.0)
('vg', 2826.0)
('professional-wrestler', 2818.0)
('golfer', 2800.0)
('record-label', 2757.0)
('thoroughbred-racehorse', 2729.0)
('former-country', 2640.0)
('park', 2636.0)
('comics-creator', 2628.0)
('international-football-competition', 2605.0)
('monarch', 2598.0)
('district', 2502.0)
('shopping-mall', 2502.0)
('given-name', 2493.0)
('football-match', 2481.0)
('pageant-titleholder', 2380.0)
('football-tournament-season', 2344.0)
('mountain-range', 2311.0)
('concert-tour', 2217.0)
('president', 2178.0)
('rugby-team', 2098.0)
('sports-league', 2086.0)
('court-case', 2026.0)
('spaceflight', 2014.0)
('power-station', 2004.0)
('nobility', 1956.0)
('political-post', 1950.0)
('sports-season', 1932.0)
('judge', 1928.0)
('military-award', 1841.0)
('anatomy', 1838.0)
('horseraces', 1836.0)
('racing-driver', 1824.0)
('comic-book-title', 1819.0)
('play', 1772.0)
('secondary-school', 1718.0)
('handball-biography', 1686.0)
('prime-minister', 1661.0)
('criminal', 1646.0)
('constituency', 1641.0)
('china-station', 1629.0)
('cultivar', 1627.0)
('education-in-canada', 1626.0)
('mine', 1612.0)
('lighthouse', 1612.0)
('architect', 1611.0)
('philosopher', 1598.0)
('model', 1591.0)
('union', 1568.0)
('mayor', 1560.0)
('gymnast', 1544.0)
('historic-building', 1530.0)
('rugby-union-biography', 1530.0)
('adult-biography', 1515.0)
('legislature', 1498.0)
('ancient-site', 1498.0)
('school-district', 1487.0)
('soap-character-2', 1480.0)
('civilian-attack', 1462.0)
('short-story', 1447.0)
('theatre', 1432.0)
('football-league', 1414.0)
('bus-transit', 1394.0)
('former-subdivision', 1383.0)
('song-contest-entry', 1382.0)
('comedian', 1303.0)
('law-enforcement-agency', 1300.0)
('publisher', 1281.0)
('cycling-race-report', 1272.0)
('volleyball-player', 1271.0)
('basketball-club', 1269.0)
('musical', 1258.0)
('chess-player', 1252.0)
('hurricane', 1219.0)
('hotel', 1203.0)
('rail-service', 1166.0)
('skier', 1160.0)
('music-festival', 1152.0)
('street', 1135.0)
('radio-show', 1118.0)
('national-football-team', 1102.0)
('soap-character', 1087.0)
('train', 1087.0)
('mobile-phone', 1081.0)
('recurring-event', 1080.0)
('motorcycle-rider', 1079.0)
('protein-family', 1078.0)
('comics-character', 1075.0)
('language-family', 1063.0)
('venue', 1053.0)
('city', 1049.0)
('mountain-pass', 1028.0)
('hockey-team', 1020.0)
('holiday', 1019.0)
('motorcycle', 988.0)
('economist', 987.0)
('wrestling-event', 973.0)
('game', 961.0)
('rfam', 946.0)
('film-awards', 936.0)
('family-name', 934.0)
('prison', 915.0)
('zoo', 889.0)
('waterfall', 855.0)
('senator', 851.0)
('film-festival', 850.0)
('religious-biography', 846.0)
('public-transit', 838.0)
('swimming-event', 828.0)
('restaurant', 820.0)
('library', 814.0)
('racehorse', 811.0)
('body-of-water', 799.0)
('sports-team', 792.0)
('college-football-player', 786.0)
('standard', 777.0)
('engineer', 768.0)
('hiking-trail', 766.0)
('beverage', 757.0)
('mandir', 751.0)
('horseracing-personality', 745.0)
('sport-governing-body', 740.0)
('minister', 724.0)
('cricket-ground', 719.0)
('os', 709.0)
('aircraft-occurrence', 706.0)
('roller-coaster', 703.0)
('cemetery', 703.0)
('individual-golf-tournament', 698.0)
('rail', 689.0)
('speedway-rider', 681.0)
('television-channel', 676.0)
('network', 673.0)
('college-athletics', 673.0)
('cycling-race', 662.0)
('glacier', 661.0)
('poker-player', 657.0)
('convention', 655.0)
('historic-subdivision', 647.0)
('cricket-tournament', 645.0)
('astronaut', 644.0)
('london-station', 631.0)
('football-tournament', 631.0)
('ski-area', 629.0)
('camera', 627.0)
('instrument', 625.0)
('badminton-player', 622.0)
('curler', 620.0)
('curling', 617.0)
('diplomatic-mission', 610.0)
('coat-of-arms', 608.0)
('broadcasting-network', 594.0)
('information-appliance', 594.0)
('college', 593.0)
('official-post', 582.0)
('pro-wrestling-championship', 566.0)
('flag', 557.0)
('treaty', 557.0)
('golf-facility', 551.0)
('terrorist-attack', 551.0)
('programming-language', 544.0)
('brand', 542.0)
('monastery', 540.0)
('amusement-park', 531.0)
('football-official', 530.0)
('darts-player', 529.0)
('oil-field', 527.0)
('country-at-games', 525.0)
('baseball-team', 525.0)
('pro-football-player', 516.0)
('chef', 508.0)
('golf-tournament', 502.0)
('international-ice-hockey-competition', 502.0)
('kibbutz', 497.0)
('historical-event', 494.0)
('serial-killer', 491.0)
('cricket-team', 488.0)
('file-format', 488.0)
('rugby-union-season', 481.0)
('table-tennis-player', 479.0)
('monument', 479.0)
('amateur-wrestler', 474.0)
('ambassador', 474.0)
('sailor', 474.0)
('port', 467.0)
('tunnel', 467.0)
('fashion-designer', 465.0)
('sumo-wrestler', 464.0)
('presenter', 461.0)
('institute', 459.0)
('civil-conflict', 458.0)
('anthem', 458.0)
('country', 457.0)
('observatory', 455.0)
('military-installation', 451.0)
('cave', 445.0)
('dogbreed', 442.0)
('automobile-engine', 437.0)
('war-faction', 436.0)
('alpine-ski-racer', 433.0)
('wrestling-team', 431.0)
('noble', 426.0)
('news-event', 423.0)
('kommune', 422.0)
('graphic-novel', 420.0)
('handball-club', 419.0)
('rail-accident', 418.0)
('government-cabinet', 415.0)
('australian-football-club', 413.0)
('reality-music-competition', 413.0)
('casino', 410.0)
('fraternity', 410.0)
('squash-player', 409.0)
('bus-line', 407.0)
('criminal-organization', 404.0)
('motorsport-championship', 397.0)
('earthquake', 396.0)
('deity', 396.0)
('canal', 391.0)
('windmill', 389.0)
('cricket-tour', 389.0)
('writing-system', 384.0)
('bone', 383.0)
('racing-car', 381.0)
('bishop', 380.0)
('games', 378.0)
('grape-variety', 377.0)
('temple', 377.0)
('wine-region', 375.0)
('snooker-player', 373.0)
('speed-skater', 370.0)
('clergy', 367.0)
('currency', 362.0)
('rugby-football-league-season', 362.0)
('novel-series', 361.0)
('lacrosse-player', 361.0)
('artery', 358.0)
('given-name2', 354.0)
('biodatabase', 353.0)
('airliner-accident', 347.0)
('top-level-domain', 345.0)
('fencer', 342.0)
('comic-strip', 340.0)
('national-basketball-team', 340.0)
('cycling-team', 339.0)
('pharaoh', 336.0)
('orchestra', 333.0)
('tram-network', 333.0)
('athletic-conference', 333.0)
('sailboat-specifications', 331.0)
('roller-derby-league', 328.0)
('nerve', 328.0)
('police-officer', 326.0)
('winery', 324.0)
('medical-person', 323.0)
('peer', 320.0)
('cheese', 313.0)
('scholar', 312.0)
('animal', 311.0)
('horse', 310.0)
('musical-composition', 306.0)
('painting', 300.0)
('horse-race', 300.0)
('book-series', 297.0)
('font', 297.0)
('protein', 294.0)
('sports-rivalry', 285.0)
('guitar-model', 284.0)
('military-memorial', 284.0)
('high-court', 284.0)
('lt-governor', 282.0)
('muscle', 280.0)
('comics-organization', 279.0)
('comics-story-arc', 269.0)
('international-handball-competition', 261.0)
('rocket', 259.0)
('rugby-league-club', 256.0)
('surname', 254.0)
('netball-biography', 253.0)
('political-youth-organization', 252.0)
('castrum', 252.0)
('fire-department', 251.0)
('coin', 248.0)
('event', 248.0)
('volleyball-biography', 244.0)
('rowing-club', 244.0)
('badminton-event', 244.0)
('martial-art', 240.0)
('beauty-pageant', 238.0)
('bodybuilder', 237.0)
('college-sports-rivalry', 237.0)
('sport', 230.0)
('national-military', 229.0)
('esl-club', 228.0)
('color', 228.0)
('dancer', 228.0)
('comet', 228.0)
('go-player', 226.0)
('podcast', 225.0)
('regency', 224.0)
('football-country-season', 223.0)
('animanga-character', 222.0)
('mythical-creature', 222.0)
('song-contest-country', 221.0)
('college-baseball-team', 217.0)
('economy', 217.0)
('tennis-tournament', 214.0)
('wrestling-promotion', 213.0)
('cycling-championship', 212.0)
('individual-darts-tournament', 211.0)
('clan', 211.0)
('law-school', 210.0)
('ski-jumper', 210.0)
('frazione', 207.0)
('artefact', 207.0)
('rally', 205.0)
('computer', 204.0)
('organisation', 202.0)
('family', 201.0)
('field-hockey-player', 200.0)
('hockey', 199.0)
('document', 198.0)
('brewery', 195.0)
('defunct-tennis-tournament', 194.0)
('ligament', 190.0)
('exchange', 183.0)
('first-lady', 182.0)
('racecourse', 182.0)
('valley', 180.0)
('squash-tournament', 178.0)
('pipeline', 176.0)
('tornado-outbreak', 173.0)
('synthesizer', 173.0)
('card-game', 172.0)
('comics-location', 171.0)
('named-horse', 169.0)
('video-game-series', 168.0)
('rugby-tournament', 168.0)
('motor-race', 165.0)
('ecoregion', 165.0)
('flood', 164.0)
('cricket-tournament-main', 164.0)
('artifact', 163.0)
('playwright', 162.0)
('volleyball-club', 162.0)
('sport-club', 159.0)
('laboratory', 158.0)
('patriarch', 157.0)
('occupation', 157.0)
('meteorite', 157.0)
('summit', 156.0)
('dava', 156.0)
('central-bank', 155.0)
('sport-wrestler', 154.0)
('rugby-league-football-competition', 153.0)
('song-contest', 153.0)
('national-volleyball-team', 152.0)
('nuclear-weapons-test', 152.0)
('toy', 150.0)
('squash-championship', 150.0)
('sport-tournament', 148.0)
('skating-event', 148.0)
('spacecraft', 147.0)
('college-football-season', 147.0)
('aviator', 143.0)
('tennis-cup-team', 143.0)
('chess-opening', 142.0)
('residential-college', 140.0)
('rugby-league-season', 139.0)
('poultry-breed', 139.0)
('ice-hockey-award', 138.0)
('winter-storm', 138.0)
('drug', 137.0)
('cricket-club-season', 137.0)
('rocket-stage', 136.0)
('national-handball-team', 135.0)
('house', 135.0)
('wildfire', 135.0)
('country-telephone-plan', 135.0)
('convention-center', 134.0)
('mosque', 134.0)
('theologian', 134.0)
('spy', 132.0)
('trolleybus-system', 132.0)
('nebula', 132.0)
('college-marching-band', 130.0)
('pirate', 130.0)
('surfer', 129.0)
('oilfield', 128.0)
('waterlock', 128.0)
('academic-division', 127.0)
('sport-event', 127.0)
('actor', 127.0)
('telescope', 125.0)
('dog-breed', 125.0)
('polyhedron', 124.0)
('college-ice-hockey-team', 124.0)
('webcomic', 123.0)
('fictional-location', 122.0)
('pinball', 122.0)
('geopolitical-organization', 121.0)
('militant-organization', 121.0)
('speedway-league', 121.0)
('radar', 121.0)
('haplogroup', 120.0)
('seamount', 120.0)
('architectural-practice', 120.0)
('bus-company', 118.0)
('students-union', 117.0)
('college-soccer-team', 117.0)
('college-basketball-team', 116.0)
('theatre-group', 115.0)
('ice-hockey-game', 114.0)
('motorcycle-speedway-team', 112.0)
('caste', 111.0)
('department', 109.0)
('comics-meta-series', 107.0)
('international-baseball-tournament', 107.0)
('photographic-lenses', 107.0)
('national-baseball-team', 107.0)
('state', 106.0)
('governor-general', 106.0)
('heritage-railway', 105.0)
('media-franchise', 105.0)
('motorsport-venue', 104.0)
('water-park', 104.0)
('national-political-convention', 104.0)
('name-module', 103.0)
('opera', 103.0)
('former-monarchy', 103.0)
('college-basketball-tournament', 103.0)
('basketball-player', 102.0)
('referendum', 102.0)
('sheep-breed', 102.0)
('mill-building', 99.0)
('triathlete', 99.0)
('team-golf-tournament', 99.0)
('comic', 98.0)
('rebbe', 96.0)
('climber', 96.0)
('gold-mine', 96.0)
('chancellor', 95.0)
('athletics-race', 95.0)
('sporting-event-organization', 95.0)
('pig-breed', 95.0)
('leadership-election', 95.0)
('equestrian', 94.0)
('murderer', 93.0)
('football-derby', 93.0)
('medieval-text', 92.0)
('banknote', 92.0)
('nonhuman-protein', 92.0)
('hot-spring', 92.0)
('media', 92.0)
('web-browser', 91.0)
('surah', 91.0)
('noble-house', 90.0)
('mass-murderer', 89.0)
('candidate', 89.0)
('constellation', 88.0)
('launch-pad', 87.0)
('biathlete', 86.0)
('premier', 86.0)
('spacecraft-class', 86.0)
('motorway-services', 86.0)
('office-holder', 86.0)
('government-budget', 85.0)
('comics-species', 85.0)
('manhwa', 84.0)
('national-futsal-team', 84.0)
('rugby-league-cup', 84.0)
('power-transmission-line', 84.0)
('national-field-hockey-team', 83.0)
('vice-president', 83.0)
('ballet-company', 82.0)
('fragrance', 82.0)
('bus-accident', 82.0)
('floorball-club', 82.0)
('rugby-league-representative-team', 82.0)
('rugby-league-international-tournament', 82.0)
('sport-horse', 81.0)
('forest', 81.0)
('national-hockey-team', 80.0)
('rocket-engine', 79.0)
('automobile-platform', 79.0)
('pigeonbreed', 79.0)
('rugby-league-football-match', 79.0)
('lymph', 79.0)
('athletics-competition', 78.0)
('basketball-league-season', 77.0)
('product', 76.0)
('attraction', 76.0)
('comics-character-and-title', 76.0)
('field-hockey-club', 75.0)
('stage-production', 75.0)
('computer-virus', 75.0)
('ballet', 74.0)
('food', 74.0)
('poem', 74.0)
('connector', 73.0)
('academic', 73.0)
('parliament', 73.0)
('pipe-band', 72.0)
('summit-meeting', 72.0)
('chess-biography', 71.0)
('monarchy', 71.0)
('research-institute', 71.0)
('snooker-tournament', 70.0)
('academic-conference', 69.0)
('athletics-club', 69.0)
('landform', 69.0)
('sports-club', 68.0)
('tram', 68.0)
('technology-festival', 68.0)
('legislative-session', 67.0)
('aircraft', 67.0)
('townlands', 66.0)
('reality-talent-competition', 66.0)
('electric-vehicle', 66.0)
('college-softball-team', 66.0)
('motorcycle-club', 65.0)
('rugby-league-team-season', 64.0)
('themed-area', 63.0)
('cycling-team-season', 63.0)
('fictional-spacecraft', 63.0)
('tea', 62.0)
('comics-elements', 62.0)
('classical-composer', 62.0)
('song-contest-national-year', 62.0)
('ski-jumping-hill', 62.0)
('laboratory-equipment', 61.0)
('concentration-camp', 61.0)
('comics-team-and-title', 61.0)
('beer', 61.0)
('handball-league', 60.0)
('nhsc', 59.0)
('sculpture', 59.0)
('globular-cluster', 59.0)
('cluster', 58.0)
('lunar-mare', 58.0)
('comics-set-index', 58.0)
('congressional-candidate', 57.0)
('piercing', 57.0)
('disputed-islands', 57.0)
('filesystem', 57.0)
('shinty-club', 56.0)
('software-license', 56.0)
('factory', 56.0)
('superhero', 56.0)
('oil-refinery', 55.0)
('national-sports-federations', 54.0)
('aqueduct-navigable', 54.0)
('cycling-season', 54.0)
('railway-depot', 53.0)
('knot', 53.0)
('accounting-body', 51.0)
('space-telescope', 51.0)
('online-music-service', 51.0)
('boxing-match', 51.0)
('sports-division', 51.0)
('encyclical', 51.0)
('sports-announcer', 51.0)
('algorithm', 51.0)
('hurling-championship', 49.0)
('country-demographics', 49.0)
('tennis-player-season', 49.0)
('manuscript', 49.0)
('pier', 49.0)
('gotra', 49.0)
('diamond', 48.0)
('hurricane-impact', 48.0)
('hut', 48.0)
('college-lacrosse-championship', 48.0)
('tornado-single', 48.0)
('sports-conference', 48.0)
('swimming-venue', 47.0)
('college-lacrosse-team', 47.0)
('goat-breed', 47.0)
('future-infrastructure-project', 47.0)
('professional-inline-hockey-team', 47.0)
('ferry-route', 47.0)
('religion', 47.0)
('rugby-league', 47.0)
('national-motorcycle-speedway-team', 46.0)
('cricket-series-begin', 45.0)
('unit', 45.0)
('choir', 45.0)
('postage-stamp', 45.0)
('koryu', 45.0)
('particle', 45.0)
('interbank-network', 45.0)
('cattle-breed', 44.0)
('university-school', 44.0)
('folk-tale', 44.0)
('project', 44.0)
('international-hockey-competition', 44.0)
('space-agency', 44.0)
('cloud', 44.0)
('programming-block', 44.0)
('national-water-polo-team', 44.0)
('fictional-country', 43.0)
('international-floorball-competition', 43.0)
('cross-country-championships', 43.0)
('speedway-league-season', 43.0)
('operational-plan', 42.0)
('fictional-race', 42.0)
('swimming-meet', 42.0)
('tram-route', 41.0)
('island', 41.0)
('isotope', 41.0)
('rail-company', 41.0)
('water-transit', 41.0)
('interval', 41.0)
('meteorite-subdivision', 40.0)
('snooker-season', 40.0)
('climbing-route', 40.0)
('neuron', 40.0)
('region-symbols', 40.0)
('industrial-process', 39.0)
('bus-route', 39.0)
('ambulance-company', 39.0)
('martial-art-school', 39.0)
('bottled-water', 38.0)
('command-structure', 38.0)
('video-game-character', 37.0)
('comics-in-other-media', 37.0)
('superfund', 36.0)
('fictional-planet', 35.0)
('curling-club', 35.0)
('holocaust-event', 35.0)
('darts-tournament', 35.0)
('housing-project', 35.0)
('martial-art-group', 35.0)
('model-rail-scale', 35.0)
('fictional-artifact', 34.0)
('martyrs', 34.0)
('dance', 33.0)
('college-volleyball-team', 33.0)
('concert', 33.0)
('sailing-yacht', 33.0)
('swimming-pool', 33.0)
('computer-hardware', 33.0)
('sport-supporter-group', 31.0)
('whisky-distillery', 31.0)
('space-station', 31.0)
('military-attack', 31.0)
('swim-team', 31.0)
('bibliographic-database', 30.0)
('oil-spill', 30.0)
('geopolitical-organisation', 30.0)
('international-softball-tournament', 30.0)
('national-roller-hockey-team', 30.0)
('sea', 30.0)
('hockey-league', 30.0)
('custom-computer', 30.0)
('wheelchair-tennis-player', 30.0)
('royal-house', 30.0)
('drums-corps', 29.0)
('national-netball-team', 29.0)
('clothing-item', 29.0)
('limited-overs-final', 29.0)
('school-athletics', 29.0)
('manhua', 29.0)
('civil-servant', 28.0)
('urban-feature', 28.0)
('drug-class', 28.0)
('historian', 28.0)
('aerial-lift-line', 28.0)
('television-family', 27.0)
('typeface', 27.0)
('bus', 27.0)
('tennis-circuit-season', 27.0)
('short-track-speed-skater', 27.0)
('football-association', 27.0)
('athletics-event', 26.0)
('defense-minister', 26.0)
('tool', 26.0)
('speaker', 26.0)
('theatre-festival', 26.0)
('computer-hardware-bus', 26.0)
('sports-award', 26.0)
('examination', 25.0)
('festival', 25.0)
('machinima', 25.0)
('mountaineer', 25.0)
('index', 25.0)
('storm', 25.0)
('water-ride', 25.0)
('gaming-group', 25.0)
('census', 25.0)
('comics-nationality', 25.0)
('military-test-site', 25.0)
('netball-team', 24.0)
('basketball-tournament-at-games', 24.0)
('skyscraper', 24.0)
('calculator', 24.0)
('astronomical-survey', 24.0)
('deputy-prime-minister', 24.0)
('swimming-association', 24.0)
('machine', 23.0)
('electronic-component', 23.0)
('dual-roller-coaster', 23.0)
('livery-company', 23.0)
('horse-breed', 23.0)
('eruption', 23.0)
('basketball-game', 23.0)
('mtgset', 23.0)
('field-hockey-league-season', 23.0)
('olive-cultivar', 22.0)
('rare-stamps', 22.0)
('college-wrestling-team', 22.0)
('photographic-film', 22.0)
('technology-standard', 22.0)
('archaeological-culture', 22.0)
('circus', 22.0)
('storage-medium', 21.0)
('wuxia-fiction-character', 21.0)
('certification-mark', 21.0)
('cricketer-tour-biography', 21.0)
('aqueduct', 21.0)
('presidential-library', 21.0)
('paranormal-term', 20.0)
('advertising', 20.0)
('terrestrial-impact-site', 20.0)
('campground', 20.0)
('yacht-club', 20.0)
('data-structure', 20.0)
('retail-market', 20.0)
('comedy-group', 20.0)
('biography', 19.0)
('holiday-camp', 19.0)
('electricity-sector', 19.0)
('genome', 19.0)
('cycling-championships', 18.0)
('single-nucleotide-polymorphism', 18.0)
('country-geography', 18.0)
('watch', 18.0)
('space-shuttle', 18.0)
('fictional-organisation', 18.0)
('pictish-stone', 18.0)
('standardref', 18.0)
('surf-club', 18.0)
('pretender', 18.0)
('photographic-lens', 17.0)
('musician', 17.0)
('castle', 17.0)
('tribe', 17.0)
('hillclimb-venue', 17.0)
('mancala', 17.0)
('historical-era', 17.0)
('tennis-player-season-2', 17.0)
('continent', 17.0)
('synchronized-skating-team', 17.0)
('basketball-club-season', 17.0)
('polygon', 17.0)
('comics-set-and-title', 16.0)
('rugby-league-team', 16.0)
('distributed-computing-project', 16.0)
('fictional-vehicle', 16.0)
('computer-hardware-generic', 16.0)
('legislative-district', 16.0)
('whitewater-course', 16.0)
('high-school', 15.0)
('shipping-job', 15.0)
('robot', 15.0)
('boat-race', 15.0)
('property-development', 15.0)
('spree-killer', 15.0)
('yoga-school', 15.0)
('journalist', 15.0)
('chipset', 15.0)
('tornado', 14.0)
('tennis-match', 14.0)
('networking-protocol', 14.0)
('artificial-fly', 14.0)
('novel', 14.0)
('spring', 14.0)
('multichoice-referendum', 14.0)
('zone', 14.0)
('county', 13.0)
('vacuum-tube', 13.0)
('baseball-game', 13.0)
('reactor', 13.0)
('tennis-player', 13.0)
('college-football-bowl-game', 13.0)
('football-club-season2', 13.0)
('film-movement', 12.0)
('pope', 12.0)
('historic-area', 12.0)
('sports-centre', 12.0)
('province', 12.0)
('soap-opera-family', 12.0)
('meteor-shower', 12.0)
('actress', 12.0)
('gunpowder-plot', 12.0)
('college-swim-team', 12.0)
('zodiac', 12.0)
('cable', 11.0)
('martial-arts-tournament', 11.0)
('college-tennis-team', 11.0)
('alternative-medicine', 11.0)
('professional-bowler', 11.0)
('softball-team', 11.0)
('climbing-area', 11.0)
('pandemic', 11.0)
('first-minister', 11.0)
('college-track-and-field-team', 11.0)
('fictional-creature', 11.0)
('supernova', 11.0)
('statue', 11.0)
('pelotari', 11.0)
('intangible-heritage', 11.0)
('basketball-official', 11.0)
('bus-station', 10.0)
('college-field-hockey-team', 10.0)
('field-hockey-league', 10.0)
('movie-quote', 10.0)
('desalination-plant', 10.0)
('bridge-type', 10.0)
('business-park', 10.0)
('speedway-grand-prix-event', 10.0)
('hymnal', 10.0)
('cricketer-biography', 10.0)
('colour', 10.0)
('highway-system', 10.0)
('famine', 10.0)
('clothing-type', 10.0)
('college-golf-team', 10.0)
('pseudoscience', 10.0)
('expansion-draft', 10.0)
('pulps-character', 10.0)
('martial-art-form', 9.0)
('mascot', 9.0)
('school-marching-band', 9.0)
('animal-breed', 9.0)
('airline-alliance', 9.0)
('dog-crossbreed', 9.0)
('performing-art', 9.0)
('college-gymnastics-team', 9.0)
('animated-superhero', 9.0)
('docks', 9.0)
('tram-depot', 9.0)
('space-station-module', 9.0)
('computing-standard', 9.0)
('video-game-online-service', 9.0)
('motor-racing-team', 8.0)
('sailing-competition', 8.0)
('integer-sequence', 8.0)
('band', 8.0)
('director', 8.0)
('nrl-club', 8.0)
('bay', 8.0)
('bullfighting-career', 8.0)
('village', 8.0)
('water-buffalo-breed', 8.0)
('protocol', 8.0)
('diving-equipment', 8.0)
('furniture', 8.0)
('college-cross-country-team', 8.0)
('sport-overview', 8.0)
('cycling-hill-climb', 8.0)
('beer-style', 8.0)
('fictional-business', 7.0)
('botanical-product', 7.0)
('heraldic-knot', 7.0)
('mining', 7.0)
('tree', 7.0)
('donkey', 7.0)
('night-vision-device', 7.0)
('television-advert', 7.0)
('deputy-first-minister', 7.0)
('serial-publication', 7.0)
('league-commissioner', 7.0)
('television-show', 7.0)
('railway-line', 7.0)
('literary-genre', 7.0)
('tv', 6.0)
('lens-design', 6.0)
('waterway', 6.0)
('movie-camera', 6.0)
('runestone', 6.0)
('recurring-sailing-competition', 6.0)
('private-school', 6.0)
('executive-government', 6.0)
('singer', 6.0)
('rugby-match', 6.0)
('motorsport-round', 6.0)
('rail-franchise', 6.0)
('combat-robot', 6.0)
('telecommunications-network', 6.0)
('wifi-network', 6.0)
('rink-hockey-club', 5.0)
('movie', 5.0)
('software2', 5.0)
('sports-draft', 5.0)
('town', 5.0)
('cardinal', 5.0)
('military-aviation-unit', 5.0)
('mobile-suit', 5.0)
('gem', 5.0)
('financial-index', 5.0)
('bowl-series', 5.0)
('region', 5.0)
('national-football-team-season', 5.0)
('fictional-element', 5.0)
('photographer', 5.0)
('theater', 5.0)
('fieldbus-protocol', 5.0)
('spacecraft-instrument', 5.0)
('attraction-model', 5.0)
('wargame', 5.0)
('art-movement', 4.0)
('sport-centre', 4.0)
('hotspot-custom', 4.0)
('official-football-team', 4.0)
('phone', 4.0)
('room', 4.0)
('cricket-club', 4.0)
('field-hockey', 4.0)
('equestrian-championships', 4.0)
('dance-company', 4.0)
('comics-artificial-species', 4.0)
('engineering-career', 4.0)
('identifier', 4.0)
('snowboarder', 4.0)
('halftime-show', 4.0)
('peninsulas', 4.0)
('crown', 4.0)
('doll', 4.0)
('sports-series', 4.0)
('peninsula', 4.0)
('garden', 3.0)
('comics-object-and-title', 3.0)
('mobile-network', 3.0)
('author', 3.0)
('hostel', 3.0)
('terrorist-organization', 3.0)
('place', 3.0)
('character-encoding', 3.0)
('presidential-government', 3.0)
('college-rowing-team', 3.0)
('flash-series', 3.0)
('magnetosphere', 3.0)
('rugby-tour', 3.0)
('feature-on-celestial-object', 3.0)
('rebreather', 3.0)
('fictional-ship', 3.0)
('college-track-and-field', 3.0)
('location', 3.0)
('file-system', 3.0)
('comics-species-and-title', 3.0)
('military-operation', 3.0)
('country-deforestation', 2.0)
('year-in-country', 2.0)
('videogame', 2.0)
('tropical-cyclone', 2.0)
('composer', 2.0)
('dark-ride', 2.0)
('goat', 2.0)
('square', 2.0)
('books', 2.0)
('historical-continent', 2.0)
('solar-eclipse', 2.0)
('crater-data', 2.0)
('television-series', 2.0)
('drug-mechanism', 2.0)
('miniseries', 2.0)
('sports-in-region', 2.0)
('knot-details', 2.0)
('performer', 2.0)
('film-director', 2.0)
('football-biography-2', 2.0)
('conference', 2.0)
('controversial-invention', 2.0)
('football-player', 2.0)
('footballer', 2.0)
('political-coalition', 2.0)
('water-supply-and-sanitation', 2.0)
('body-process', 2.0)
('stausee', 2.0)
('urban-development-project', 2.0)
('aus-sport-club', 2.0)
('residency-show', 1.0)
('college-inline-hockey-team', 1.0)
('military', 1.0)
('planetary-system', 1.0)
('missile', 1.0)
('cometary-globule', 1.0)
('caliph', 1.0)
('horse-color', 1.0)
('archaeological-site', 1.0)
('open-cluster', 1.0)
('badminton-team', 1.0)
('television-station', 1.0)
('space-mission', 1.0)
('township', 1.0)
('particular-church', 1.0)
('cartoon-character', 1.0)
('area', 1.0)
('cricket-biography', 1.0)
('mtgblockset', 1.0)
('municipality', 1.0)
('periodical', 1.0)
('publication', 1.0)
('doge', 1.0)
('cartoon', 1.0)
('sports-announcer-details', 1.0)
('paranormal-creature', 1.0)
('cricket-player', 1.0)
('computer-peripheral', 1.0)
('automobile-generation', 1.0)
('film-actor', 1.0)
('graphics-processing-unit', 1.0)
('reservoir', 1.0)
('tabletennis-player', 1.0)
('radio-show2', 1.0)
('internet-video', 1.0)
('ice-hockey-team-season', 1.0)
('country-languages', 1.0)
('railway', 1.0)
('journal-series', 1.0)
('minister-office', 1.0)
('poet', 1.0)
('independent-baseball-team', 1.0)
('church2', 1.0)
('physician', 1.0)
('pyramid', 1.0)
('gunpowder-plotter', 1.0)
('actor-voice', 1.0)
('genetically-modified-organism', 1.0)
('victim', 1.0)
('people', 1.0)
('firearm', 1.0)
('college-ski-team', 1.0)
('faunal-age', 1.0)
('primeval-creature', 1.0)
('visual-artist', 1.0)
('cinema', 1.0)
('military-post', 1.0)
('subdivision', 1.0)
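The percentages in the status output above can be reproduced from the raw counts. This is a minimal sketch; the helper name `percent` is hypothetical and not part of wikimap.

```python
def percent(part, total):
    """Return part/total as a percentage rounded to two decimals."""
    return round(100.0 * part / total, 2)

# Counts copied from the status output above.
missing_templates = percent(2339, 3595)       # missing infobox templates
missing_pages = percent(430375, 2607241)      # pages using a missing template
explosion_pages = percent(2131532, 2607241)   # pages in the largest subgraph

print("templates: {}%".format(missing_templates))
print("pages: {}%".format(missing_pages))
print("explosion: {}%".format(explosion_pages))
```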
Import the libraries, load the cleaned-up graph as G, and show the distribution of subgraphs by size.
In [1]:
from wikimap import data, stats, graph
In [2]:
import networkx as nx
In [3]:
import matplotlib.pyplot as plt
In [4]:
G = data.read_graph("./wikimap/data-folder/wikimap-live.gpickle")
In [5]:
G.connected_component_statistics(printStats=True)
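The internals of wikimap's `connected_component_statistics` are not shown in this notebook, so the following is only a sketch of the underlying idea in plain Python: group nodes into connected components, then count how many components there are of each size. The helper `connected_components` is hypothetical (it is neither wikimap's nor networkx's implementation), and the edge list is toy data built from attribute names that come up later in this notebook.

```python
from collections import Counter, deque

def connected_components(edges):
    """Yield the node set of each connected component, ignoring edge direction."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        component = {start}
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        yield component

# Toy edges (assumed for illustration). An edge list cannot represent
# isolated one-node groups, so none appear here.
edges = [
    ("wusopen", "us women's open"),
    ("cartridge weight", "shell weight"),
    ("exposure metering", "metering"),
    ("share of household metering", "metering"),
]

size_distribution = Counter(len(c) for c in connected_components(edges))
for size, count in sorted(size_distribution.items()):
    print("{} component(s) with {} nodes".format(count, size))
```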
One-node groups are basically useless to us, so let's first take a look at a few two-node groups.
Because matplotlib is a large dependency that is needed only for drawing networks, it is not installed by default. To draw networks, including those in this notebook, simply install matplotlib:
$ pip install matplotlib
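Since matplotlib is optional, code that draws graphs can guard the import. The snippet below is one common way to do that, not something taken from wikimap; the flag name `HAVE_MATPLOTLIB` is hypothetical.

```python
# Try the optional dependency and remember whether it is available.
try:
    import matplotlib.pyplot as plt
    HAVE_MATPLOTLIB = True
except ImportError:
    plt = None
    HAVE_MATPLOTLIB = False

if not HAVE_MATPLOTLIB:
    print("matplotlib is not installed; run 'pip install matplotlib' to draw graphs")
```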
In [6]:
two_node_subgraphs = G.connected_components_with_size(2)
These graphs have a simple structure: just two nodes connected by an edge. Unfortunately, directionality is lost in the above command, but that is okay for our purposes. Let's just see what nodes there are in a few groups.
In [7]:
two_node_subgraphs[0].nodes()
Out[7]:
In [8]:
two_node_subgraphs[1].nodes()
Out[8]:
Hm... let's see what the rendering state is for these nodes.
In [9]:
G.rendering_of_graph_node('wusopen')
Out[9]:
In [10]:
G.rendering_of_graph_node("us women's open")
Out[10]:
In [11]:
G.infoboxes_of_pair('wusopen', "us women's open")
Out[11]:
In [12]:
G.rendering_of_graph_node('cartridge weight')
Out[12]:
In [13]:
G.rendering_of_graph_node('shell weight')
Out[13]:
In [14]:
G.infoboxes_of_pair('cartridge weight', "shell weight")
Out[14]:
Looks like two-node subgraphs are just simple unrend ==> rend mappings. Some are valuable, such as "cartridge weight" and "shell weight" above.
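Because component selection drops direction, the unrend ==> rend orientation has to be looked up in the original directed edges. The sketch below shows that idea in plain Python. The helpers `undirected` and `recover_direction` are hypothetical, and the direction given to the example edge is an assumption made for illustration only; the cell outputs that would confirm it are not shown here.

```python
# Directed edges stored as (unrendered attribute, rendered attribute) pairs.
# ASSUMPTION: this orientation is illustrative, not taken from the notebook output.
directed_edges = {("cartridge weight", "shell weight")}

def undirected(edges):
    """Drop direction by storing each edge as a frozenset of its endpoints."""
    return {frozenset(edge) for edge in edges}

def recover_direction(node_a, node_b, edges):
    """Return the original (source, target) pair for two nodes, or None."""
    if (node_a, node_b) in edges:
        return (node_a, node_b)
    if (node_b, node_a) in edges:
        return (node_b, node_a)
    return None

pairs = undirected(directed_edges)
# Either ordering names the same undirected edge, so direction is gone here.
print(frozenset(("shell weight", "cartridge weight")) in pairs)
# The original edge set still records which way the mapping points.
print(recover_direction("shell weight", "cartridge weight", directed_edges))
```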
In [15]:
three_node_subgraphs = G.connected_components_with_size(3)
In [16]:
three_node_subgraphs[0].nodes()
Out[16]:
In [20]:
nx.draw(three_node_subgraphs[0], with_labels=True)
So "metering" is the hub between "exposure metering" and "share of household metering". BTW, the edges look bi-directional because directionality is eliminated when selecting a subgraph.
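A hub like this can also be found without drawing anything, by counting how many edges touch each node. This is a small sketch in plain Python; `hub_of` is a hypothetical helper, and the two edges are the ones described in the sentence above.

```python
from collections import Counter

# Undirected edges of the three-node subgraph described above.
edges = [
    ("exposure metering", "metering"),
    ("share of household metering", "metering"),
]

def hub_of(edges):
    """Return the node with the most incident edges (ties broken arbitrarily)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    node, _ = degree.most_common(1)[0]
    return node

print(hub_of(edges))
```

In a three-node chain the hub is the only node with two edges, so the result is unambiguous; in larger subgraphs several nodes may tie.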
In [21]:
three_node_subgraphs[1].nodes()
Out[21]:
In [22]:
three_node_subgraphs[2].nodes()
Out[22]:
In [23]:
nx.draw(three_node_subgraphs[2], with_labels=True)
In [24]:
G.infoboxes_of_pair("type fauna", "principal")
Out[24]:
In [25]:
G.infoboxes_of_pair("principal", "team principal")
Out[25]:
In [26]:
three_node_subgraphs[3].nodes()
Out[26]:
In [27]:
nx.draw(three_node_subgraphs[3], with_labels=True)
In [28]:
G.infoboxes_of_pair("ms", "men's singles")
Out[28]:
In [29]:
G.infoboxes_of_pair("country ms", "men's singles")
Out[29]:
In [30]:
G.rendering_of_graph_node("country ms")
Out[30]:
In [31]:
G.rendering_of_graph_node("men's singles")
Out[31]:
In [32]:
G.rendering_of_graph_node("ms")
Out[32]:
In [33]:
three_node_subgraphs[4].nodes()
Out[33]:
In [34]:
three_node_subgraphs[5].nodes()
Out[34]:
In [35]:
three_node_subgraphs[6].nodes()
Out[35]:
In [36]:
three_node_subgraphs[7].nodes()
Out[36]:
In [37]:
nx.draw(three_node_subgraphs[7], with_labels=True)
In [40]:
three_node_subgraphs[8].nodes()
Out[40]:
In [41]:
three_node_subgraphs[9].nodes()
Out[41]:
In [42]:
three_node_subgraphs[10].nodes()
Out[42]:
In [43]:
nx.draw(three_node_subgraphs[10], with_labels=True)
In [44]:
three_node_subgraphs[11].nodes()
Out[44]:
In [45]:
nx.draw(three_node_subgraphs[11], with_labels=True)
In [46]:
G.rendering_of_graph_node("provideragency")
Out[46]:
In [47]:
G.rendering_of_graph_node("services provided by")
Out[47]:
In [48]:
G.rendering_of_graph_node("provider agency")
Out[48]:
So "services provided by" is the hub between "provideragency" and "provider agency". Interesting that "provider agency" is unrend -- looks like I actually did a good job of cleaning up nodes!
In [49]:
four_node_subgraphs = G.connected_components_with_size(4)
In [50]:
four_node_subgraphs[0].nodes()
Out[50]:
In [51]:
four_node_subgraphs[1].nodes()
Out[51]:
In [52]:
nx.draw(four_node_subgraphs[1], with_labels=True)
In [53]:
four_node_subgraphs[2].nodes()
Out[53]:
In [54]:
four_node_subgraphs[3].nodes()
Out[54]:
In [55]:
four_node_subgraphs[4].nodes()
Out[55]:
In [56]:
four_node_subgraphs[5].nodes()
Out[56]:
In [57]:
four_node_subgraphs[6].nodes()
Out[57]:
In [58]:
four_node_subgraphs[7].nodes()
Out[58]:
In [59]:
four_node_subgraphs[8].nodes()
Out[59]:
In [60]:
four_node_subgraphs[9].nodes()
Out[60]:
In [61]:
nx.draw(four_node_subgraphs[9], with_labels=True)
In [62]:
eight_node_subgraphs = G.connected_components_with_size(8)
In [63]:
eight_node_subgraphs[0].nodes()
Out[63]:
In [64]:
nx.draw(eight_node_subgraphs[0], with_labels=True)
In [65]:
eight_node_subgraphs[1].nodes()
Out[65]:
In [66]:
nx.draw(eight_node_subgraphs[1], with_labels=True)
In [67]:
G.rendering_of_graph_node("identifiers")
Out[67]:
In [68]:
G.rendering_of_graph_node("lid")
Out[68]:
In [69]:
G.rendering_of_graph_node("iata")
Out[69]:
In [70]:
G.rendering_of_graph_node("icao")
Out[70]:
In [71]:
G.rendering_of_graph_node("wmo")
Out[71]:
In [72]:
G.rendering_of_graph_node("faa")
Out[72]:
In [73]:
G.rendering_of_graph_node("tc")
Out[73]:
In [74]:
G.rendering_of_graph_node("gps")
Out[74]:
Well, at least we know that those nonsense nodes are all unrend! I should probably ignore unrend entirely.
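One way to ignore unrend nodes would be to keep only edges whose endpoints are both rendered. The sketch below is an assumption about how that could look, not wikimap code: the node names and rendering states are made up, the `"rend"`/`"unrend"` strings simply borrow this notebook's shorthand, and `drop_unrendered` is a hypothetical helper.

```python
# HYPOTHETICAL data: node names and rendering states are invented for illustration.
rendering = {
    "raw_code_1": "unrend",
    "raw_code_2": "unrend",
    "Label One": "rend",
    "Label Two": "rend",
}
edges = [
    ("raw_code_1", "Label One"),
    ("raw_code_2", "Label One"),
    ("Label One", "Label Two"),
]

def drop_unrendered(edges, rendering):
    """Keep only edges whose endpoints are both marked as rendered."""
    return [
        (u, v)
        for u, v in edges
        if rendering.get(u) == "rend" and rendering.get(v) == "rend"
    ]

print(drop_unrendered(edges, rendering))
```

Note that filtering this way can break a subgraph apart when an unrendered node was the only link between two rendered ones.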
In [75]:
eight_node_subgraphs[2].nodes()
Out[75]:
In [76]:
nx.draw(eight_node_subgraphs[2], with_labels=True)
In [77]:
eight_node_subgraphs[3].nodes()
Out[77]:
In [78]:
nx.draw(eight_node_subgraphs[3], with_labels=True)
In [79]:
G.infoboxes_of_pair("sibling names", "related")
Out[79]:
In [80]:
eight_node_subgraphs[4].nodes()
Out[80]:
In [81]:
nx.draw(eight_node_subgraphs[4], with_labels=True)
In [82]:
thirteen_node_subgraphs = G.connected_components_with_size(13)
In [83]:
thirteen_node_subgraphs[0].nodes()
Out[83]:
In [84]:
thirteen_node_subgraphs[1].nodes()
Out[84]:
In [85]:
nx.draw(thirteen_node_subgraphs[1], with_labels=True)
In [86]:
fourteen_node_subgraphs = G.connected_components_with_size(14)
In [87]:
fourteen_node_subgraphs[0].nodes()
Out[87]:
In [88]:
fourteen_node_subgraphs[1].nodes()
Out[88]:
In [89]:
fifteen_node_subgraphs = G.connected_components_with_size(15)
In [90]:
fifteen_node_subgraphs[0].nodes()
Out[90]:
In [91]:
nx.draw(fifteen_node_subgraphs[0], with_labels=True)
In [97]:
thirty_three_node_subgraphs = G.connected_components_with_size(33)
In [98]:
thirty_three_node_subgraphs[0].nodes()
Out[98]:
In [106]:
nx.draw(thirty_three_node_subgraphs[0], with_labels=True)
In [107]:
max_size = max(G.connected_component_lengths())
In [108]:
print("Size of largest subgraph: " + str(max_size) + " nodes")
In [109]:
max_graph = G.connected_components_with_size(max_size)[0]
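The "find the maximum size, then select the component with that size" pattern used here can be illustrated on plain Python sets. The component list is toy data assumed for illustration, reusing attribute names from earlier in this notebook.

```python
# Toy components (assumed for illustration), each a set of node names.
components = [
    {"wusopen", "us women's open"},
    {"cartridge weight", "shell weight"},
    {"exposure metering", "metering", "share of household metering"},
]

# Step 1: the largest size. Step 2: the first component with that size.
max_size = max(len(c) for c in components)
largest = [c for c in components if len(c) == max_size][0]

print("Size of largest subgraph: {} nodes".format(max_size))
print(sorted(largest))
```

When the components are already in memory, `max(components, key=len)` picks the same set in a single pass.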
In [110]:
max_graph.nodes()
Out[110]:
In [111]:
nx.draw(max_graph, with_labels=True)
Enough said.