VennEuler.jl and the Meetup API

Harlan Harris

Statistical Programming DC

October 23, 2014

Table of Contents:

  1. Demo First!
  2. Algorithm
  3. References

Problem:

What does the overlap of Meetup members look like?


Demo!


In [1]:
# packages to get JSON data from an HTTP API
using Requests
using JSON

In [2]:
# no, I'm not providing my Meetup API key -- get your own!
apikey = open(readchomp, "apikey");
apikey[1:5]


Out[2]:
"732e2"

In [3]:
# ask the Meetup API for deets about a group
function getGroupInfo(apikey, urlname) 
    request = "https://api.meetup.com/2/groups?key=$apikey&sign=true&group_urlname=$urlname"
    ret = get(request)
    dat = JSON.parse(ret.data)
    dat["results"][1]
end
gi = getGroupInfo(apikey, "stats-prog-DC")


Out[3]:
Dict{String,Any} with 21 entries:
  "lat"         => 38.90999984741211
  "visibility"  => "public"
  "who"         => "dc-hackeRs"
  "rating"      => 4.5
  "link"        => "http://www.meetup.com/stats-prog-dc/"
  "timezone"    => "US/Eastern"
  "lon"         => -77.0199966430664
  "state"       => "DC"
  "organizer"   => ["name"=>"Data Community DC","member_id"=>69461342]
  "name"        => "Statistical Programming DC"
  "urlname"     => "stats-prog-dc"
  "id"          => 1503964
  "created"     => 1249911519000
  "topics"      => {["name"=>"Open Source","urlkey"=>"opensource","id"=>563],["…
  "description" => "<p>The DC Area R Users group has become <strong>Statistical…
  "country"     => "US"
  "join_mode"   => "open"
  "members"     => 2101
  "category"    => ["shortname"=>"tech","name"=>"tech","id"=>34]
  "city"        => "Washington"
  "group_photo" => ["thumb_link"=>"http://photos1.meetupstatic.com/photos/event…

In [4]:
# ask the Meetup API for Member IDs from a group
# requires chunked requests (a better way would be to use the "next" field in the response)
function getMembers(apikey, group_id, memberCt; verbose=true)
    chunksize = 200 
    memberIds = Array(Int,0)
    if verbose print(group_id) end
    for page in 0:ifloor(memberCt/chunksize)
        request = "https://api.meetup.com/2/members?key=$apikey&sign=true&group_id=$group_id&page=$chunksize&offset=$page&only=id"
        ret = get(request)
        dat = JSON.parse(ret.data)
        if verbose print('.') end
        for x in dat["results"]
            push!(memberIds, x["id"])
        end
    end
    if verbose println() end
    memberIds
end
member_ids = getMembers(apikey, gi["id"], gi["members"])
member_ids[1:20]


1503964...........
Out[4]:
20-element Array{Int64,1}:
   9042718
 126079712
  18800821
  52767982
  13617288
  53551712
   9788971
  11793079
  10903209
  49594582
   8783792
  45403702
 174018832
  25039192
   5725085
  25168992
   4192734
  14190659
  69315702
  68982492

In [5]:
# great, seems to work, now get all the members for relevant Meetups, storing as a dict of sets
group_names = ["stats-prog-dc", "data-science-dc",
    "DC-Hack-and-Tell", "code-for-dc", "hack-edu"]
group_members_struct = Dict()
for grname in group_names
    gi = getGroupInfo(apikey, grname)
    group_members_struct[grname] = 
        Set(getMembers(apikey, gi["id"], gi["members"])...)
end
group_members_struct
# takes a couple minutes -- jump to the end!


1503964...........
2215331......................
7361532..
9724492.....
1800681..
Out[5]:
Dict{Any,Any} with 5 entries:
  "data-science-dc"  => Set{Int64}({25940092,2209921,62380632,13049952,71267812…
  "hack-edu"         => Set{Int64}({3125661,25940092,86478972,54136702,72680042…
  "DC-Hack-and-Tell" => Set{Int64}({8294317,9009918,10814564,13282990,138029572…
  "stats-prog-dc"    => Set{Int64}({152363432,108337692,1611067,65378112,982128…
  "code-for-dc"      => Set{Int64}({87207012,23412601,79967,85232752,118766812,…

In [6]:
# then convert that dict of sets to a bool matrix
everyone = union([v for (k,v) in group_members_struct]...)
memb_group = [in(memb, group_members_struct[group]) 
                for memb in everyone, group in group_names]


Out[6]:
6164x5 Array{Any,2}:
 false   true  false  false   true
 false   true  false  false  false
 false   true  false  false  false
 false   true  false  false  false
 false   true  false  false  false
  true  false  false  false  false
 false   true  false  false  false
 false   true  false  false  false
 false   true  false  false  false
 false   true  false  false  false
 false  false  false  false   true
 false  false  false   true  false
 false  false   true  false  false
     ⋮                            
 false  false  false  false   true
 false  false  false   true  false
  true   true  false  false  false
 false   true  false  false  false
 false   true  false  false  false
 false  false  false   true  false
  true  false  false   true  false
 false  false   true  false  false
  true  false  false  false  false
 false   true  false  false  false
  true  false  false  false  false
  true   true  false  false  false

In [7]:
# and now we're good to make a VennEuler diagram!
using VennEuler

In [9]:
eo = make_euler_object(group_names, 
    memb_group, 
    EulerSpec(:rectangle),                       # rectangles > circles!
    sizesum=.4)                                  # scaling in unit square

(minf,minx,ret) = optimize_iteratively(          # greedy meta-optimization algorithm
    eo,                                          # problem we're trying to solve
    random_state(eo),                            # where to start
    ftol=-1, xtol=0.0025, maxtime=5, pop=100)    # quick 'n dirty

(minf,minx,ret) = optimize(eo,                   # global optimization
    minx,                                        # start where we left off
    ftol=.0001, xtol=0.0025, maxtime=20, pop=200) # more horsepower this time...

println("FINALLY:\ngot $minf at $minx (returned $ret)")


got 0.0113237122320136 at [0.4741004373902662,0.3825840845903019,0.8782651078836512,0.4715779753148258,0.5247271884376687,0.0,0.5387456623730961,0.3800214151500866,0.134473787789543,0.7243367342342032,0.5190731728821334,0.8868721170806606,0.4884720710925132,0.6877304249127528,0.683954226467042] (returned MAXTIME_REACHED)
got 0.007015433061189635 at [0.590384506112969,0.6279980834557536,0.7032552127681516,0.4715779753148258,0.5247271884376687,0.0,0.5387456623730961,0.3800214151500866,0.134473787789543,0.7243367342342032,0.5190731728821334,0.8868721170806606,0.4884720710925132,0.6877304249127528,0.683954226467042] (returned MAXTIME_REACHED)
got 0.0014034413985353305 at [0.590384506112969,0.6279980834557536,0.7032552127681516,0.4715779753148258,0.5247271884376687,0.0,0.5387456623730961,0.3800214151500866,0.134473787789543,0.4416809624167229,0.30852856269771484,0.9852162725860679,0.4884720710925132,0.6877304249127528,0.683954226467042] (returned MAXTIME_REACHED)
got 0.001315343678035554 at [0.590384506112969,0.6279980834557536,0.7032552127681516,0.4715779753148258,0.5247271884376687,0.0,0.3398688689791761,0.3855958226252807,0.0,0.4416809624167229,0.30852856269771484,0.9852162725860679,0.4884720710925132,0.6877304249127528,0.683954226467042] (returned XTOL_REACHED)
got 0.00036347997740958273 at [0.590384506112969,0.6279980834557536,0.7032552127681516,0.4715779753148258,0.5247271884376687,0.0,0.3398688689791761,0.3855958226252807,0.0,0.4416809624167229,0.30852856269771484,0.9852162725860679,0.5125506219364654,0.8774113594189542,0.49005950813688737] (returned MAXTIME_REACHED)
FINALLY:
got 0.00034416320520549385 at [0.5957019444424811,0.6240471610454105,0.7042858368455648,0.4715779753148258,0.5246593795287489,0.0,0.3616229811060519,0.37909313675135886,0.0,0.4510939511920299,0.30300489799881863,0.9808305198908757,0.5202562246620901,0.8712123903033603,0.49079902250618945] (returned FTOL_REACHED)

In [10]:
render("lkjh.svg", eo, minx)


Algorithm

Due to Wilkinson (2012).

Previous implementations in Java with R wrapper.

My work: Julia. More shapes.

Venn Diagrams and Euler Diagrams

  • Venn: Show all possible combinations.
  • Euler: Show only extant combinations.
  • Area-proportional Euler: Areas are approximately proportional.

Nota Bene: If using circles, can't generally do this perfectly if more than 2 sets! (With 3 sets, power set is $2^n = 8$, with $3 \cdot (n-1) = 6$ degrees of freedom.)

But, we can approximate and minimize a cost function!

Cost Function

  1. Suppose we have (for circles) X, Y, R for each set.
  2. Plot 'em on a virtual screen -- boolean 2-D array per set.
  3. Iterate over the powerset and compute overlapping pixels with logical AND.
  4. Normalize and compare with actual overlap seen in the data.
  5. Cost is squared mismatch in overlap.

Cool bit: This works with any shape we can plot. No need for a fancy-pants geometric overlap algorithm.

Less cool bits: There's some error if the virtual screen's too small. I use 200px square, which is good enough to be a smaller issue than the inadequate DoF problem.

Optimization

  1. Non-convex; not super-fast evaluation, so use global optimization.
  2. Using Controlled Random Search (CRS) with local mutation, from NLopt, an open-source nonlinear optimization package out of MIT, with a Julia wrapper.

Other Features

  1. Lock one or more sets into place. (E.g., target set in the center.)
  2. Use all-the-same or specific shapes per set.

TODO

  1. Wilkinson's force-directed initial state heuristic
  2. Better API and documentation
  3. More shapes!
  4. Performance enhancements

Thanks!