This notebook showcases the general improvement performance of the flow algorithm. We will be considering graphs with one million vertices generated by the chimera function. The maximum allowed size for a component given by localImprove will be 10,000. We will also be considering only cases where the set generated by prn has at least 30 vertices, in order to give consistency to our results.


In [1]:
using Laplacians

In [2]:
function condTest(minSize)
    a = chimera(1000000)
    s = prn(a, [1,2,3], 0.5, 5);
    conds = compConductance(a, s)
    if length(s) < minSize
        return -1,0,0,0
    end
    
    minEpsSigma = getVolume(a, s) / getVolume(a, setdiff(collect(1:max(a.n, a.m)), s))
    cut, flow = localImprove(a, s, epsSigma = minEpsSigma, maxSize = 10000)
    condcut = compConductance(a, cut)
    impr_cut = (conds - condcut) / conds * 100
    
    heur_refcut = refineCut(a, s)
    condref = compConductance(a, heur_refcut)
    impr_ref = (conds - condref) / conds * 100
    
    heur_dumb = dumb(a, s)
    conddumb = compConductance(a, heur_dumb)
    impr_dumb = (conds - conddumb) / conds * 100
    
    return conds, (condcut, impr_cut), (condref, impr_ref), (conddumb, impr_dumb)
end

initial = []
flowbased = []
heurbased = []
dumbbased = []

print("Progress: ")
@time for i in 1:500
    x,y,z,t = condTest(30)
    if x == -1
        continue
    end
    
    print("*")
    
    push!(initial, x)
    push!(flowbased, y)
    push!(heurbased, z)
    push!(dumbbased, t)
end
print("\n")


Progress: *****************************************************************************************************************************************************************7508.104603 seconds (19.87 G allocations: 1.375 TB, 7.58% gc time)

Below are the mean and median values for conductance given in order by prn (initial clustering), the flow improvement and the two heuristic improvements. The smaller the conductance the better the result.


In [3]:
println(length(initial), " successful tests.")
println("Initial values (by prn): ", mean(map(x -> x[1], initial)), " ", median(map(x -> x[1], initial)))
println("Flow values: ", mean(map(x -> x[1], flowbased)), " ", median(map(x -> x[1], flowbased)))
println("refineCut values: ", mean(map(x -> x[1], heurbased)), " ", median(map(x -> x[1], heurbased)))
println("dumb values: ", mean(map(x -> x[1], dumbbased)), " ", median(map(x -> x[1], dumbbased)))


161 successful tests.
Initial values (by prn): 0.49714463155963573 0.49825783972125437
Flow values: 0.2770073389226315 0.2496676492688284
refineCut values: 0.518430278451099 0.5
dumb values: 0.513380534483483 0.5161290322580645

Below are the mean and median improvements given by the flow clustering and the two heuristics in this order.


In [4]:
println("Flow improvement: ", mean(map(x -> x[2], flowbased)), "% ", median(map(x -> x[2], flowbased)), "%")
println("refineCut improvement: ", mean(map(x -> x[2], heurbased)), "% ", median(map(x -> x[2], heurbased)), "%")
println("dumb improvement: ", mean(map(x -> x[2], dumbbased)), "% ", median(map(x -> x[2], dumbbased)), "%")


Flow improvement: 44.291589880932136% 49.81257084189314%
refineCut improvement: -4.256256501276661% -0.7531137164551578%
dumb improvement: -3.249088843396663% -3.787375415282394%

Notice that the heuristics have negative mean and median improvement. This is because they ultimately construct their answer sets by a greedy criterion that may or may not eventually lead to a better conductance. However, as they are reasonably fast, they are still worth running.


In [8]:
println("Maximum refineCut improvement: ", maximum(map(x -> x[2], heurbased)), "%")
println("Maximum dumb improvement: ", maximum(map(x -> x[2], dumbbased)), "%")


Maximum refineCut improvement: 59.77011494252874%
Maximum dumb improvement: 54.22116527942925%

In [ ]: