The Graph500 Kronecker graph generator in Julia

The Graph500 Benchmark includes a generator for a random Graph which is the basis for the benchmark computation. The theory for the generator is explained in Leskovic et al.'s "Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication" but the Kronecker structure is not present in the Graph500 and, more importantly, the Kronecker structure is almost impossible to see in the Octave reference code.

Here is a Julia implementation that uses the Kronecker product structure while maintaining speed comparable to the reference C++ implementation.

In terms of the Kronecker structure, the problem of generating the random graph consists of two parts

Generate an edge
Sum $n$ edges

and the Kronecker structure shows up in step 1. The procedure for 1. is the following. Let $E^{(1)}$ be a $2\times 2$ random one-hot-matrix, i.e. with a single unit element and three zeros, e.g. $\left( \begin{smallmatrix}0 & 1\\ 0 & 0\end{smallmatrix}\right)$ then the adjacency matrix of a single edge of the Kronecker graph is $$ E^{(n)}=\bigotimes_{i=1}^n E^{(1)}_i $$ and the complete graph of (about) $m$ edges is therefore $$ \sum_{i=1}^m E^{(n)}_i = \sum_{i=1}^m \bigotimes_{j=1}^n E^{(1)}_j $$

Fairly simpel, but for a usual matrix implementation, a direct calculation of this quantity by, say two mapreduce operations, would be prohitively expensive if the $X$s were stored in a normal dense format, e.g. [0 1; 0 0]. For each of the Kronecker products, a new matrix would be allocated. Storing all elements would be out of question both in terms of storage ($2^{n+1}$ stored elements) and arithmetic operation ($2(2^n - 2)$ multiplications).

Typical sparse representations such as compressed sparse column (CSC) or coordinate (COO) would in theory reduce the storage costs to a constant factor but these formats still carries more baggage than needed. Instead, we can define a custom type



In [1]:

    
immutable OneHotMatrix <: AbstractMatrix{Int}
    n::Int
    i::Int
    j::Int
end

Base.size(A::OneHotMatrix)    = (A.n, A.n)
Base.size(A::OneHotMatrix, i) = i < 1 ? error("") :
                                i < 3 ? A.n : 1

Base.getindex(A::OneHotMatrix, i::Integer, j::Integer) = Int(i == A.i && j == A.j)

Base.kron(A::OneHotMatrix, B::OneHotMatrix) = OneHotMatrix(A.n * B.n, (A.i - 1) * B.n + B.i, (A.j - 1) * B.n + B.j)

const ⊗ = kron;

When a type is a subtype of AbstractMatrix, the only two methods needed for minimal functionality are size and getindex. With this, we are able to create, show



In [2]:

    
A = OneHotMatrix(4,3,2)









    Out[2]:





4×4 OneHotMatrix:
 0  0  0  0
 0  0  0  0
 0  1  0  0
 0  0  0  0

and e.g. multiply



In [3]:

    
A*randn(4)









    Out[3]:





4-element Array{Float64,1}:
  0.0     
  0.0     
 -0.577848
  0.0

Mext, we define a type we can use for generating random instances of the OneHotMatrix. We call is OneHotMatrixRNG and it simply stores three probabilities



In [4]:

    
immutable OneHotMatrixRNG
    p1::Float64
    p2::Float64
    p3::Float64
    OneHotMatrixRNG(p1, p2, p3) = new(p1, p1 + p2, p1 + p2 + p3)
end

@inline function Base.rand(X::OneHotMatrixRNG)
    u = rand(Float64)
    ifelse(u < X.p1, OneHotMatrix(2, 1, 2),
    ifelse(u < X.p2, OneHotMatrix(2, 2, 1),
    ifelse(u < X.p3, OneHotMatrix(2, 2, 2), OneHotMatrix(2, 1, 1))))
end

We can now generate random instances



In [5]:

    
for i = 1:3
    display(rand(OneHotMatrixRNG(0.57, 0.19, 0.19)))
end









    





2×2 OneHotMatrix:
 0  1
 0  0






    





2×2 OneHotMatrix:
 0  0
 0  1






    





2×2 OneHotMatrix:
 0  1
 0  0

We now define a matrix type accumulating the indices of the non-zero elements. Internally, it is simply a vector of the index tuples and a size. Again, we define size and getindex but here we also define + for two OneHotMatrix types which generates a COOSquare and then an in-place addition function add! to avoid excessive realloactions when accumulating instances of OneHotMatrix.



In [6]:

    
# Define a sparse matrix type to store the accumulation of Kronecker terms
immutable COOSquare <: AbstractSparseMatrix{Int}
    keys::Vector{Tuple{Int,Int}}
    n::Int
end

Base.size(A::COOSquare)    = (A.n, A.n)
Base.size(A::COOSquare, i) = i < 1 ? error("") :
                             i < 3 ? A.n : 1

Base.getindex(A::COOSquare, i::Integer, j::Integer) = mapreduce(a -> i == a[1] && j == a[2], +, A.keys)

import Base: +
function (+)(A::OneHotMatrix, B::OneHotMatrix)
    if A.n != B.n
        error("")
    end
    return COOSquare(Tuple{Int,Int}[(A.i, A.j), (B.i, B.j)], A.n)
end
function add!(A::COOSquare, B::OneHotMatrix)
    push!(A.keys, (B.i, B.j))
    return A
end
function add!(A::COOSquare, B::COOSquare)
    append!(A.keys, B.keys)
    return A
end

# Make sure that the reduction initialized correctly for our new type
Base.r_promote(::typeof(add!), x::OneHotMatrix) = COOSquare([(x.i, x.j)], x.n)



In [7]:

    
X = OneHotMatrixRNG(0.57, 0.19, 0.19)









    Out[7]:





OneHotMatrixRNG(0.57,0.76,0.95)



In [8]:

    
E₁, E₂, E₃, E₄ = rand(X), rand(X), rand(X), rand(X)
E₁⊗E₂ + E₃⊗E₄









    Out[8]:





4×4 COOSquare:
 0  0  0  2
 0  0  0  0
 0  0  0  0
 0  0  0  0

Finally, we can define the graph generator



In [9]:

    
immutable Kronecker
    m::Int
    n::Int
    rng::OneHotMatrixRNG
end

function Base.rand(x::Kronecker)
    E1(_) = rand(x.rng)
    En(_) = mapreduce(E1, ⊗, 1:x.m)
    return mapreduce(En , add!, 1:x.n)
end

To show the sparsity pattern, we define a few convenience functions for printing sparse matrices



In [10]:

    
type SpyMatrix{T}
    data::Matrix{T}
    sz::Tuple{Int,Int}
end
function SpyMatrix(A::SparseMatrixCSC, s1 = 40, s2 = 80)
    m, n = size(A)
    c = A.colptr
    r = A.rowval
    B = zeros(Int, s1, s2)
    for j = 1:length(c) - 1
        for k = c[j]:c[j + 1] - 1
            i = r[k]
            iB = ceil(Int, i/m*s1)
            jB = ceil(Int, j/n*s2)
            B[iB, jB] += 1
        end
    end
    SpyMatrix(B/size(A,1)/size(A,2)*s1*s2, (m,n))
end
function show(io::IO, A::SpyMatrix)
    charArray = [' ', '·', '•', '●']
    AA = A.data
    m, n = size(AA)
    B = similar(AA, Char)
    mm, nn = div(A.sz[1], size(AA, 1)), div(A.sz[2], size(AA, 2))
    md2 = div(size(AA, 1), 2)

    print(io, '┌')
    for j = 1:n
        print(io, '─')
    end
    println(io, '┐')
    my, ny = A.sz
    for i = 1:m
        print(io, '│')
        for j = 1:n
            v = AA[i,j]
            print(io, charArray[ifelse(2mm*nn/min(mm, nn)*v < 1, 1,
                ifelse(sqrt(mm*nn)*v < 1, 2, 
                ifelse(2*v < 1, 3, 4)))])
        end
        print(io, '│')
        if i == md2 - 3
            println(io, "  ┌──────────────────┐")
        elseif i == md2 - 2
            println(io, "  │$(charArray[1]) density ≈ 0     │")
        elseif i == md2 - 1
            println(io, "  │$(charArray[2]) density < 1/√mn │")
        elseif i == md2
            println(io, "  │$(charArray[3]) density < 50%   │")
        elseif i == md2 + 1
            println(io, "  │$(charArray[4]) density ≥ 50%   │")
        elseif i == md2 + 2
            println(io, "  └──────────────────┘")
        else
            println(io)
        end
    end
    print(io, '└')
    for j = 1:size(AA, 2)
        print(io, '─')
    end
    println(io, '┘')
    nothing
end
function spy(io::IO, A::SparseMatrixCSC, sz = 40)
    println(io, summary(A))
    show(io, SpyMatrix(A, div(sz, 2), sz))
    nzs = @sprintf("density: %.3e", nnz(A)/length(A))
    lpad = div(sz - length(nzs), 2)
    for i = 1:lpad
        print(io, ' ')
    end
    println(io, nzs)
end
spy(A::SparseMatrixCSC, sz = 40) = spy(STDOUT, A, sz)
spy(A::COOSquare, sz = 40) =
    spy(sparse([x[1] for x in A.keys], [x[2] for x in A.keys], ones(length(A.keys)), size(A)...), sz)









    Out[10]:





spy (generic function with 6 methods)

Now we are ready to generate a Kronecker graph and display the sparcity pattern its adjacency matrix. This one will be $1048576\times 1048576$ and have $16777216$ edges



In [11]:

    
spy(rand(Kronecker(20, 2^20*16, OneHotMatrixRNG(0.57, 0.19, 0.19))), 80)









    



1048576×1048576 SparseMatrixCSC{Float64,Int64}
┌────────────────────────────────────────────────────────────────────────────────┐
│                                      ·•                  ·•        ·•   ·• ••••│
│                                    ···•                ···•      ···• ···•·••••│
│                                  ·    ·              ·    ·    ·    · ··•••••••│
│                                  ·    ·              ·    ·    ·    · ·••• ·•••│
│                                                                      ·••••·••••│
│                             •         •         •         •    • ·•••    • ·•••│
│                             ·         ·         ·         ·    · ••••    · ••••│
│                                                                •····•    •····•│
│                                                              ·••  ·••  ·••  ·••│
│                                                             ···· ···· ···· ····│
│                   •                   •         •    • ·•••         •    • ·•••│
│                   ·                   ·         ·    · ••••         ·    · ••••│
│                                                      •····•              •····•│
│                                                    ·••  ·••            ·••  ·••│
│                                                   ···· ····           ···· ····│
│         ·         ·         ·         ·    ·   ••    ·   ••    ·  ·••    ·   ••│
│                                              ·•·•      ·•·•      ·•·•      ·•·•│  ┌──────────────────┐
│                                            •    •    •    •    •    •    •    •│  │  density ≈ 0     │
│                                           ··   ··   ··   ··   ··   ··   ··   ··│  │· density < 1/√mn │
│                                                                                │  │• density < 50%   │
│                   •         •    • ·•••                   •         •    • ·•••│  │● density ≥ 50%   │
│                   ·         ·    · ••••                   ·         ·    · ••••│  └──────────────────┘
│                                 ·•····•                                  •····•│
│                                ·••  ·••                                ·••  ·••│
│                               ···· ····                               ···· ····│
│         ·         ·    ·  ·••    ·  ·••         ·         ·    ·  ·••    ·  ·••│
│                          ·•·•      ·•·•                          ·•·•      ·•·•│
│                        •    •    •    •                        •    •    •    •│
│                       ··   ··   ··   ··                       ··   ··   ··   ··│
│                                                                                │
│         ·    ·  ·••         ·    ·   ••         ·    ·   ••         ·    ·  ·••│
│                ·•·•                ·•·•                ·•·•                ·•·•│
│              •    •              •    •              •    •              •    •│
│             ··   ··             ··   ··             ··   ··             ··   ··│
│                                                                                │
│        ·•        ·•        ·•        ·•        ·•        ·•        ·•        ·•│
│       · ·       · ·       · ·       · ·         ·       · ·       · ·       · ·│
│         ·         ·    ·    ·         ·         ·    ·    ·         ·         ·│
│                                                                                │
│                                                                                │
└────────────────────────────────────────────────────────────────────────────────┘
                               density: 1.463e-05