The Graph500 Benchmark includes a generator for a random Graph which is the basis for the benchmark computation. The theory for the generator is explained in Leskovic et al.'s "Realistic, mathematically tractable graph generation and evolution, using kronecker multiplication" but the Kronecker structure is not present in the Graph500 and, more importantly, the Kronecker structure is almost impossible to see in the Octave reference code.
Here is a Julia implementation that uses the Kronecker product structure while maintaining speed comparable to the reference C++ implementation.
In terms of the Kronecker structure, the problem of generating the random graph consists of two parts
and the Kronecker structure shows up in step 1. The procedure for 1. is the following. Let $E^{(1)}$ be a $2\times 2$ random one-hot-matrix, i.e. with a single unit element and three zeros, e.g. $\left( \begin{smallmatrix}0 & 1\\ 0 & 0\end{smallmatrix}\right)$ then the adjacency matrix of a single edge of the Kronecker graph is $$ E^{(n)}=\bigotimes_{i=1}^n E^{(1)}_i $$ and the complete graph of (about) $m$ edges is therefore $$ \sum_{i=1}^m E^{(n)}_i = \sum_{i=1}^m \bigotimes_{j=1}^n E^{(1)}_j $$
Fairly simpel, but for a usual matrix implementation, a direct calculation of this quantity by, say two mapreduce
operations, would be prohitively expensive if the $X$s were stored in a normal dense format, e.g. [0 1; 0 0]
. For each of the Kronecker products, a new matrix would be allocated. Storing all elements would be out of question both in terms of storage ($2^{n+1}$ stored elements) and arithmetic operation ($2(2^n - 2)$ multiplications).
Typical sparse representations such as compressed sparse column (CSC) or coordinate (COO) would in theory reduce the storage costs to a constant factor but these formats still carries more baggage than needed. Instead, we can define a custom type
In [1]:
immutable OneHotMatrix <: AbstractMatrix{Int}
n::Int
i::Int
j::Int
end
Base.size(A::OneHotMatrix) = (A.n, A.n)
Base.size(A::OneHotMatrix, i) = i < 1 ? error("") :
i < 3 ? A.n : 1
Base.getindex(A::OneHotMatrix, i::Integer, j::Integer) = Int(i == A.i && j == A.j)
Base.kron(A::OneHotMatrix, B::OneHotMatrix) = OneHotMatrix(A.n * B.n, (A.i - 1) * B.n + B.i, (A.j - 1) * B.n + B.j)
const ⊗ = kron;
When a type is a subtype of AbstractMatrix
, the only two methods needed for minimal functionality are size
and getindex
. With this, we are able to create, show
In [2]:
A = OneHotMatrix(4,3,2)
Out[2]:
and e.g. multiply
In [3]:
A*randn(4)
Out[3]:
Mext, we define a type we can use for generating random instances of the OneHotMatrix
. We call is OneHotMatrixRNG
and it simply stores three probabilities
In [4]:
immutable OneHotMatrixRNG
p1::Float64
p2::Float64
p3::Float64
OneHotMatrixRNG(p1, p2, p3) = new(p1, p1 + p2, p1 + p2 + p3)
end
@inline function Base.rand(X::OneHotMatrixRNG)
u = rand(Float64)
ifelse(u < X.p1, OneHotMatrix(2, 1, 2),
ifelse(u < X.p2, OneHotMatrix(2, 2, 1),
ifelse(u < X.p3, OneHotMatrix(2, 2, 2), OneHotMatrix(2, 1, 1))))
end
We can now generate random instances
In [5]:
for i = 1:3
display(rand(OneHotMatrixRNG(0.57, 0.19, 0.19)))
end
We now define a matrix type accumulating the indices of the non-zero elements. Internally, it is simply a vector of the index tuples and a size. Again, we define size
and getindex
but here we also define +
for two OneHotMatrix
types which generates a COOSquare
and then an in-place addition function add!
to avoid excessive realloactions when accumulating instances of OneHotMatrix
.
In [6]:
# Define a sparse matrix type to store the accumulation of Kronecker terms
immutable COOSquare <: AbstractSparseMatrix{Int}
keys::Vector{Tuple{Int,Int}}
n::Int
end
Base.size(A::COOSquare) = (A.n, A.n)
Base.size(A::COOSquare, i) = i < 1 ? error("") :
i < 3 ? A.n : 1
Base.getindex(A::COOSquare, i::Integer, j::Integer) = mapreduce(a -> i == a[1] && j == a[2], +, A.keys)
import Base: +
function (+)(A::OneHotMatrix, B::OneHotMatrix)
if A.n != B.n
error("")
end
return COOSquare(Tuple{Int,Int}[(A.i, A.j), (B.i, B.j)], A.n)
end
function add!(A::COOSquare, B::OneHotMatrix)
push!(A.keys, (B.i, B.j))
return A
end
function add!(A::COOSquare, B::COOSquare)
append!(A.keys, B.keys)
return A
end
# Make sure that the reduction initialized correctly for our new type
Base.r_promote(::typeof(add!), x::OneHotMatrix) = COOSquare([(x.i, x.j)], x.n)
In [7]:
X = OneHotMatrixRNG(0.57, 0.19, 0.19)
Out[7]:
In [8]:
E₁, E₂, E₃, E₄ = rand(X), rand(X), rand(X), rand(X)
E₁⊗E₂ + E₃⊗E₄
Out[8]:
Finally, we can define the graph generator
In [9]:
immutable Kronecker
m::Int
n::Int
rng::OneHotMatrixRNG
end
function Base.rand(x::Kronecker)
E1(_) = rand(x.rng)
En(_) = mapreduce(E1, ⊗, 1:x.m)
return mapreduce(En , add!, 1:x.n)
end
To show the sparsity pattern, we define a few convenience functions for printing sparse matrices
In [10]:
type SpyMatrix{T}
data::Matrix{T}
sz::Tuple{Int,Int}
end
function SpyMatrix(A::SparseMatrixCSC, s1 = 40, s2 = 80)
m, n = size(A)
c = A.colptr
r = A.rowval
B = zeros(Int, s1, s2)
for j = 1:length(c) - 1
for k = c[j]:c[j + 1] - 1
i = r[k]
iB = ceil(Int, i/m*s1)
jB = ceil(Int, j/n*s2)
B[iB, jB] += 1
end
end
SpyMatrix(B/size(A,1)/size(A,2)*s1*s2, (m,n))
end
function show(io::IO, A::SpyMatrix)
charArray = [' ', '·', '•', '●']
AA = A.data
m, n = size(AA)
B = similar(AA, Char)
mm, nn = div(A.sz[1], size(AA, 1)), div(A.sz[2], size(AA, 2))
md2 = div(size(AA, 1), 2)
print(io, '┌')
for j = 1:n
print(io, '─')
end
println(io, '┐')
my, ny = A.sz
for i = 1:m
print(io, '│')
for j = 1:n
v = AA[i,j]
print(io, charArray[ifelse(2mm*nn/min(mm, nn)*v < 1, 1,
ifelse(sqrt(mm*nn)*v < 1, 2,
ifelse(2*v < 1, 3, 4)))])
end
print(io, '│')
if i == md2 - 3
println(io, " ┌──────────────────┐")
elseif i == md2 - 2
println(io, " │$(charArray[1]) density ≈ 0 │")
elseif i == md2 - 1
println(io, " │$(charArray[2]) density < 1/√mn │")
elseif i == md2
println(io, " │$(charArray[3]) density < 50% │")
elseif i == md2 + 1
println(io, " │$(charArray[4]) density ≥ 50% │")
elseif i == md2 + 2
println(io, " └──────────────────┘")
else
println(io)
end
end
print(io, '└')
for j = 1:size(AA, 2)
print(io, '─')
end
println(io, '┘')
nothing
end
function spy(io::IO, A::SparseMatrixCSC, sz = 40)
println(io, summary(A))
show(io, SpyMatrix(A, div(sz, 2), sz))
nzs = @sprintf("density: %.3e", nnz(A)/length(A))
lpad = div(sz - length(nzs), 2)
for i = 1:lpad
print(io, ' ')
end
println(io, nzs)
end
spy(A::SparseMatrixCSC, sz = 40) = spy(STDOUT, A, sz)
spy(A::COOSquare, sz = 40) =
spy(sparse([x[1] for x in A.keys], [x[2] for x in A.keys], ones(length(A.keys)), size(A)...), sz)
Out[10]:
Now we are ready to generate a Kronecker graph and display the sparcity pattern its adjacency matrix. This one will be $1048576\times 1048576$ and have $16777216$ edges
In [11]:
spy(rand(Kronecker(20, 2^20*16, OneHotMatrixRNG(0.57, 0.19, 0.19))), 80)