Venn diagrams, Euler Diagrams, Meetups and VennEuler.jl

Harlan Harris

TrackMaven Monthly Challenge

January 19, 2015

Table of Contents:

  1. Venn and Euler Diagrams
  2. Problem: What does the overlap of Meetup members look like?
  3. Solution: VennEuler.jl

Venn Diagrams show, conceptually, how sets or ideas overlap.

  • Drives intuition; clarifies framework.
  • Shows all possible combinations; can shade to show observed overlaps.
  • Size usually equal; if different, qualitative.

Euler Diagrams only show observed relationships.

  • Often used with 4+ sets.
  • Size usually suggests set size.

Area-proportional Euler Diagrams approximately show relative size.

  • Sometimes used in scientific communication.
  • Not always just circles/ovals.

Degrees-of-freedom issue:

With n = 3 sets, the power set has 2^n = 8 elements, but we only have 3⋅(n−1) = 6 degrees of freedom.
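To see how quickly the gap grows, here is a small sketch that tabulates both counts. It assumes the slide's 3⋅(n−1) formula, which is only stated for n = 3, carries over to other n; that extension is for illustration only. `region_count` and `free_params` are hypothetical names, written for current Julia (1.x).

```julia
# Number of membership patterns for n sets (the size of the power set).
region_count(n) = 2^n

# Free parameters, using the slide's formula.
# Applying it to n other than 3 is an assumption made for illustration.
free_params(n) = 3 * (n - 1)

for n in 2:5
    println("n = $n: $(region_count(n)) regions, $(free_params(n)) degrees of freedom")
end
```

The region count doubles with every added set while the parameter count grows linearly, which is why the diagram can only be approximately area-proportional.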

Optimization Problem:

  • Determine location and size (and perhaps orientation) of shapes,
  • to minimize error (normalized set overlap vs normalized "pixel" overlap).
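A minimal sketch of what such an error function could look like, for circles only. This is not VennEuler.jl's actual objective: the function names, the 200×200 pixel grid, and the sum-of-squared-differences metric are all assumptions. It is written for current Julia (1.x).

```julia
# Turn a vector of Bool flags into a 0-based pattern index (bit k is set when flag k is true).
function pattern_index(flags)
    idx = 0
    for (k, flag) in enumerate(flags)
        if flag
            idx += 2^(k - 1)
        end
    end
    return idx
end

# Count how often each pattern occurs, drop the "in no set" pattern, and normalize to sum to 1.
function region_fractions(patterns, n)
    counts = zeros(Int, 2^n)
    for p in patterns
        counts[p + 1] += 1
    end
    inside = counts[2:end]
    return inside ./ max(sum(inside), 1)
end

# Normalized "pixel" overlap: circles are (x, y, r) tuples in the unit square.
function pixel_fractions(circles; gridsize=200)
    centres = [(i - 0.5) / gridsize for i in 1:gridsize]
    patterns = [pattern_index([(px - x)^2 + (py - y)^2 <= r^2 for (x, y, r) in circles])
                for px in centres, py in centres]
    return region_fractions(patterns, length(circles))
end

# Normalized set overlap: memb is a members-by-sets Bool matrix.
function set_fractions(memb)
    patterns = [pattern_index(memb[i, :]) for i in 1:size(memb, 1)]
    return region_fractions(patterns, size(memb, 2))
end

# Assumed error metric: sum of squared differences between the two normalized vectors.
euler_error(circles, memb) = sum((pixel_fractions(circles) .- set_fractions(memb)) .^ 2)

# Two disjoint, equal circles against two disjoint, equal sets: the error should be close to zero.
circles = [(0.25, 0.5, 0.2), (0.75, 0.5, 0.2)]
memb = [true false; false true]
println(euler_error(circles, memb))
```

An optimizer would then search over the (x, y, r) values to drive this number down.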

Prior Work

Wilkinson (2012) -- R/Java app for optimizing circles

  • Efficient brute-force approach, but minimal customization possible

VennEuler.jl

Julia package for drawing area-proportional Euler diagrams

https://github.com/HarlanH/VennEuler.jl

Features:

  • circles, triangles, squares, rectangles (can be per-set)
  • optionally lock parameters

Limitations:

  • hacky API
  • needs more shapes!
  • lots of room for performance improvements
  • finicky hyperparameters for optimization

Demo!


In [3]:
# packages to get JSON data from an HTTP API
using Requests
using JSON

In [4]:
# no, I'm not providing my Meetup API key -- get your own!
apikey = open(readchomp, "apikey");
apikey[1:5]


Out[4]:
"732e2"

In [5]:
# ask the Meetup API for deets about a group
function getGroupInfo(apikey, urlname) 
    request = "https://api.meetup.com/2/groups?key=$apikey&sign=true&group_urlname=$urlname"
    ret = get(request)
    dat = JSON.parse(ret.data)
    dat["results"][1]
end
gi = getGroupInfo(apikey, "TrackMaven-Monthly-Challenge")


Out[5]:
Dict{String,Any} with 21 entries:
  "lat"         => 38.90999984741211
  "visibility"  => "public"
  "who"         => "'); DROP TABLE Members;--"
  "utc_offset"  => -18000000
  "rating"      => 4.84
  "link"        => "http://www.meetup.com/TrackMaven-Monthly-Challenge/"
  "timezone"    => "US/Eastern"
  "lon"         => -77.04000091552734
  "state"       => "DC"
  "organizer"   => ["name"=>"TrackMaven","member_id"=>176927822]
  "name"        => "Monthly Challenge"
  "urlname"     => "TrackMaven-Monthly-Challenge"
  "id"          => 17538822
  "created"     => 1412977454000
  "topics"      => {["name"=>"Open Source","urlkey"=>"opensource","id"=>563],["…
  "description" => "<p>TrackMaven's Monthly Challenge is a chance to experiment…
  "country"     => "US"
  "join_mode"   => "open"
  "members"     => 192
  "category"    => ["shortname"=>"tech","name"=>"tech","id"=>34]
  "city"        => "Washington"

In [6]:
# ask the Meetup API for Member IDs from a group
# requires chunked requests (a better way would be to use the "next" field in the response)
function getMembers(apikey, group_id, memberCt; verbose=true)
    chunksize = 200 
    memberIds = Array(Int,0)
    if verbose print(group_id) end
    for page in 0:ifloor(memberCt/chunksize)
        request = "https://api.meetup.com/2/members?key=$apikey&sign=true&group_id=$group_id&page=$chunksize&offset=$page&only=id"
        ret = get(request)
        dat = JSON.parse(ret.data)
        if verbose print('.') end
        for x in dat["results"]
            push!(memberIds, x["id"])
        end
    end
    if verbose println() end
    memberIds
end
member_ids = getMembers(apikey, gi["id"], gi["members"])
member_ids[1:20]


17538822.
Out[6]:
20-element Array{Int64,1}:
 183610329
  25039192
 178081672
 183308791
 182515126
 159890452
 183630308
 182676997
 128539712
 162010302
 122890472
   3823597
  43552892
 128150592
   1494356
  86895462
   3878583
   6171938
 147919352
  12004411
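One detail of the chunking loop above: offsets run from 0 through floor(memberCt/chunksize), so a group whose size is an exact multiple of the chunk size gets one extra, empty request. A sketch of an alternative page range, using `cld` (ceiling division) from current Julia; `page_offsets` is a hypothetical helper, not part of the notebook.

```julia
# Hypothetical helper: the page offsets needed to fetch member_count records,
# chunksize at a time. cld is ceiling division, so no empty trailing page is requested.
page_offsets(member_count, chunksize) = 0:(cld(member_count, chunksize) - 1)

println(collect(page_offsets(192, 200)))   # 192 members fit in a single page
println(collect(page_offsets(400, 200)))   # exactly two full pages
println(collect(page_offsets(401, 200)))   # one extra member needs a third page
```

Following the "next" field in each response, as the comment in the cell suggests, would avoid needing the member count at all.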

In [7]:
# great, seems to work, now get all the members for relevant Meetups, storing as a dict of sets
group_names = ["stats-prog-dc", "Data-Visualization-DC",
    "DC-Hack-and-Tell", "TrackMaven-Monthly-Challenge", "hack-edu"]
group_members_struct = Dict()
for grname in group_names
    gi = getGroupInfo(apikey, grname)
    group_members_struct[grname] = 
        Set(getMembers(apikey, gi["id"], gi["members"])...)
end
group_members_struct
# takes a couple minutes -- jump to the end!


1503964............
6957082............
7361532...
17538822.
1800681..
Out[7]:
Dict{Any,Any} with 5 entries:
  "TrackMaven-Monthly-Cha… => Set{Int64}({179185442,126598012,71076272,6171938,…
  "hack-edu"               => Set{Int64}({3125661,25940092,86478972,54136702,72…
  "DC-Hack-and-Tell"       => Set{Int64}({8294317,123939182,50878822,182855518,…
  "Data-Visualization-DC"  => Set{Int64}({87735072,152363432,143026012,12443103…
  "stats-prog-dc"          => Set{Int64}({117652872,108337692,152363432,1611067…

In [8]:
# then convert that dict of sets to a bool matrix
everyone = union([v for (k,v) in group_members_struct]...)
memb_group = [in(memb, group_members_struct[group]) 
                for memb in everyone, group in group_names]


Out[8]:
4508x5 Array{Any,2}:
 false  false  false  false   true
 false   true  false  false  false
 false   true  false  false  false
  true  false  false  false  false
 false   true  false  false  false
  true  false  false  false  false
 false   true  false  false  false
 false   true  false  false  false
 false  false  false  false   true
 false  false   true  false  false
  true  false  false  false  false
  true  false  false  false  false
 false   true  false  false  false
     ⋮                            
  true  false  false  false  false
 false  false  false  false   true
  true   true  false  false  false
 false   true  false  false  false
 false   true  false  false  false
 false   true  false  false  false
 false   true  false  false  false
  true  false  false  false  false
 false  false   true  false  false
  true  false  false  false  false
  true  false  false  false  false
 false  false  false   true  false
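Before drawing anything, a quick sanity check on a matrix like this is to count pairwise overlaps. The sketch below uses a small made-up matrix rather than the real `memb_group`, and is written for current Julia (1.x); `memb`, `ngroups` and `overlap` are illustrative names.

```julia
# Toy stand-in for memb_group: 6 members (rows) by 3 groups (columns).
memb = [true  false false;
        true  true  false;
        false true  false;
        false true  true;
        true  true  true;
        false false true]

# overlap[a, b] counts members who belong to both group a and group b;
# the diagonal holds each group's size.
ngroups = size(memb, 2)
overlap = [count(memb[:, a] .& memb[:, b]) for a in 1:ngroups, b in 1:ngroups]

println(overlap)
```

Large off-diagonal entries relative to the diagonal are the overlaps the diagram should make visible.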

In [9]:
# and now we're good to make a VennEuler diagram!
using VennEuler

In [14]:
eo = make_euler_object(group_names, 
    memb_group, 
    EulerSpec(:circle),                       # rectangles > circles!
    sizesum=.3)                                  # scaling in unit square

(minf,minx,ret) = optimize_iteratively(          # greedy meta-optimization algorithm
    eo,                                          # problem we're trying to solve
    random_state(eo),                            # where to start
    ftol=-1, xtol=0.0025, maxtime=5, pop=100)    # quick 'n dirty

(minf,minx,ret) = optimize(eo,                   # global optimization
    minx,                                        # start where we left off
    ftol=.00005, xtol=0.001, maxtime=40, pop=250) # more horsepower this time...

println("FINALLY:\ngot $minf at $minx (returned $ret)")


got 0.0011773963420000366 at [0.650209546980835,0.29412465794848536,0.4143890776568645,0.24038180208588278,0.635374802412045,0.7066282263498384,0.70421457527918,0.6940634608142564,0.7530034338352188,0.0778524675577283] (returned XTOL_REACHED)
got 0.001097581516628944 at [0.650209546980835,0.29412465794848536,0.4911071019432083,0.47711157718278674,0.635374802412045,0.7066282263498384,0.70421457527918,0.6940634608142564,0.7530034338352188,0.0778524675577283] (returned XTOL_REACHED)
got 0.00047159343626657915 at [0.650209546980835,0.29412465794848536,0.4911071019432083,0.47711157718278674,0.7240794617536277,0.5969255219773719,0.70421457527918,0.6940634608142564,0.7530034338352188,0.0778524675577283] (returned XTOL_REACHED)
got 0.0004316348260640822 at [0.650209546980835,0.29412465794848536,0.4911071019432083,0.47711157718278674,0.7240794617536277,0.5969255219773719,0.70421457527918,0.6940634608142564,0.8061775925103083,0.4872497840819942] (returned XTOL_REACHED)
got 0.00042398528007448605 at [0.650209546980835,0.29412465794848536,0.4911071019432083,0.47711157718278674,0.7240794617536277,0.5969255219773719,0.7315248381279312,0.6995983117612904,0.8061775925103083,0.4872497840819942] (returned XTOL_REACHED)
FINALLY:
got 0.00021240823980772351 at [0.591641983359685,0.31187005506717463,0.4377781091518288,0.4964465306252557,0.6830493211435485,0.5411796209502959,0.7680725352146964,0.6235625407053891,0.8555626166057588,0.5903671760831991] (returned FTOL_REACHED)

In [15]:
render("circles1.svg", eo, minx)
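In the progress lines printed by `optimize_iteratively` above, each step changes only one pair of coordinates, which suggests it improves one shape at a time while holding the others fixed. A rough sketch of that general idea follows, using plain random local search on a toy objective. This is an assumption about the approach, not the package's code; `greedy_blocks` and all of its parameters are hypothetical. It is written for current Julia (1.x).

```julia
using Random

# Hypothetical greedy loop: visit one block of parameters at a time (e.g. one shape's x and y),
# try random perturbations of just that block, and keep a candidate only if it lowers f.
function greedy_blocks(f, x0, blocks; rounds=3, tries=200, step=0.1, rng=MersenneTwister(1))
    x = copy(x0)
    best = f(x)
    for _ in 1:rounds, block in blocks
        for _ in 1:tries
            cand = copy(x)
            # Perturb only this block, staying inside the unit square.
            cand[block] .= clamp.(x[block] .+ step .* randn(rng, length(block)), 0.0, 1.0)
            fc = f(cand)
            if fc < best
                x, best = cand, fc
            end
        end
    end
    return best, x
end

# Toy objective standing in for the diagram error: squared distance from 0.3 in every coordinate.
toy_error(x) = sum((x .- 0.3) .^ 2)

start = fill(0.9, 4)                       # two "shapes", each with an (x, y) position
best, x = greedy_blocks(toy_error, start, [1:2, 3:4])
println("improved from $(toy_error(start)) to $best")
```

Because a candidate is only accepted when it improves the objective, the result is never worse than the starting point, but it can stall in a local minimum, which is why the notebook follows up with a global `optimize` pass.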


In [24]:
# fancier shapes
eo = make_euler_object(group_names, 
    memb_group, 
    [EulerSpec(:circle), EulerSpec(:square, [.5, .5], [0, 0]), EulerSpec(:triangle),
    EulerSpec(:rectangle), EulerSpec(:rectangle)],
    sizesum=.3)                                  # scaling in unit square

(minf,minx,ret) = optimize_iteratively(          # greedy meta-optimization algorithm
    eo,                                          # problem we're trying to solve
    random_state(eo),                            # where to start
    ftol=-1, xtol=0.0025, maxtime=5, pop=100)    # quick 'n dirty

(minf,minx,ret) = optimize_iteratively(          # greedy meta-optimization algorithm
    eo,                                          # problem we're trying to solve
    minx,                                        # start where we left off
    ftol=-1, xtol=0.0025, maxtime=5, pop=100)    # quick 'n dirty

(minf,minx,ret) = optimize(eo,                   # global optimization
    minx,                                        # start where we left off
    ftol=.000005, xtol=0.001, maxtime=40, pop=300) # more horsepower this time...

println("FINALLY:\ngot $minf at $minx (returned $ret)")


got 0.004944795751371201 at [0.45811905420160093,0.7319118208388237,0.5,0.5,0.7913087467793177,0.27038630558537546,0.785956094131496,0.17077318174769082,0.9363293380463487,0.6164767580160874,0.6003365658444567,0.758179211959874] (returned XTOL_REACHED)
got 0.004944795751371201 at [0.45811905420160093,0.7319118208388237,0.5,0.5,0.7913087467793177,0.27038630558537546,0.785956094131496,0.17077318174769082,0.9363293380463487,0.6164767580160874,0.6003365658444567,0.758179211959874] (returned SUCCESS)
got 0.004155674367803061 at [0.45811905420160093,0.7319118208388237,0.5,0.5,0.6828625642147557,0.6531516839657346,0.785956094131496,0.17077318174769082,0.9363293380463487,0.6164767580160874,0.6003365658444567,0.758179211959874] (returned MAXTIME_REACHED)
got 0.0010821106786490355 at [0.45811905420160093,0.7319118208388237,0.5,0.5,0.6828625642147557,0.6531516839657346,0.785956094131496,0.17077318174769082,0.9363293380463487,0.8844981230792499,0.19145739927285846,0.0] (returned MAXTIME_REACHED)
got 0.0009946304264307143 at [0.45811905420160093,0.7319118208388237,0.5,0.5,0.6828625642147557,0.6531516839657346,0.8971197589856064,0.25464836519206296,0.6921985881610219,0.8844981230792499,0.19145739927285846,0.0] (returned MAXTIME_REACHED)
got 0.00046184719280331426 at [0.7297087716847865,0.42072931401159486,0.5,0.5,0.6828625642147557,0.6531516839657346,0.8971197589856064,0.25464836519206296,0.6921985881610219,0.8844981230792499,0.19145739927285846,0.0] (returned XTOL_REACHED)
got 0.00046184719280331426 at [0.7297087716847865,0.42072931401159486,0.5,0.5,0.6828625642147557,0.6531516839657346,0.8971197589856064,0.25464836519206296,0.6921985881610219,0.8844981230792499,0.19145739927285846,0.0] (returned SUCCESS)
got 0.00024860527382811323 at [0.7297087716847865,0.42072931401159486,0.5,0.5,0.5248214456200593,0.27226659341461446,0.8971197589856064,0.25464836519206296,0.6921985881610219,0.8844981230792499,0.19145739927285846,0.0] (returned XTOL_REACHED)
got 0.00019289573265385284 at [0.7297087716847865,0.42072931401159486,0.5,0.5,0.5248214456200593,0.27226659341461446,0.8971197589856064,0.25464836519206296,0.6921985881610219,0.8844981230792497,0.11916031160554405,0.8313982877824047] (returned MAXTIME_REACHED)
got 0.00019142319215200115 at [0.7297087716847865,0.42072931401159486,0.5,0.5,0.5248214456200593,0.27226659341461446,0.8971197589856064,0.2495888886840622,0.6883837215365771,0.8844981230792497,0.11916031160554405,0.8313982877824047] (returned MAXTIME_REACHED)
FINALLY:
got 0.00018900585033865315 at [0.7298741112572973,0.4222905100647036,0.5,0.5,0.5269262468305396,0.2725481412912217,0.8971197589856064,0.25046916076055137,0.688481711131562,0.8844981230792499,0.11923493328730325,0.8283348185148465] (returned FTOL_REACHED)

In [25]:
render("multi4.svg", eo, minx)

Thanks!