We will measure the running time of enumerating the neighbor vertices of a vertex, for three different data structures for representing the graph: an adjacency matrix, an adjacency list, and an edge list. Per vertex, we expect the adjacency matrix to require scanning a full row of n entries, the edge list to require scanning all of the edges, and the adjacency list to touch only the neighbors themselves.
First, we import all the R packages that we will need for this exercise:
In [ ]:
library(igraph)
Define a function that enumerates the neighbors of a vertex i, when the
graph is stored in adjacency matrix format (ones and zeroes). Use the which
function.
In [ ]:
enumerate_matrix <- function(gmat, i) {
    # the neighbors of vertex i are the positions of the 1s in row i
    which(gmat[i, ] == 1)
}
Define a function that enumerates the neighbors of a vertex i, when the
graph is stored in adjacency list format. Use the [[
operator (i.e., list indexing).
In [ ]:
enumerate_adj_list <- function(adj_list, i) {
    # element i of the adjacency list is exactly the neighbor set of vertex i
    adj_list[[i]]
}
Define a function that enumerates the neighbors of a vertex i, when the
graph is stored in edge-list format. Use the which
function and then the c
(concatenate) function.
In [ ]:
enumerate_edge_list <- function(edge_list, i) {
    # find the edges in which vertex i appears in either column,
    # then take the vertex in the other column of each such edge
    inds1 <- which(edge_list[,1] == i)
    inds2 <- which(edge_list[,2] == i)
    c(edge_list[inds1,2], edge_list[inds2,1])
}
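As a quick sanity check, all three functions should return the same neighbor set for the same graph. Here is a minimal sketch using a five-vertex ring graph, where vertex 1's neighbors are vertices 2 and 5:
In [ ]:
g_test <- make_ring(5)
m_test <- as.matrix(as_adjacency_matrix(g_test))
al_test <- as_adj_list(g_test)
el_test <- as_edgelist(g_test)
enumerate_matrix(m_test, 1)                 # 2 5
as.integer(enumerate_adj_list(al_test, 1))  # 2 5
sort(enumerate_edge_list(el_test, 1))       # 2 5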
This next function is the simulation function. Its argument n is the number of vertices. It returns a three-element named vector containing, for each of the three data structures, the average per-vertex running time for enumerating the neighbors of a vertex in the graph.
In [19]:
dosim <- function(n) {
    nrep <- 10
    nsubrep <- 10

    simdf <- do.call(rbind,
                     replicate(nrep, {
                         # make a random undirected preferential-attachment graph;
                         # each new vertex attaches five edges
                         g <- sample_pa(n, out.seq=rep(5, n), directed=FALSE)

                         # build all three representations of the same graph
                         g_matrix <- as.matrix(as_adjacency_matrix(g))
                         g_adj_list <- as_adj_list(g)
                         g_edge_list <- as_edgelist(g)

                         # time neighbor enumeration for every vertex, nsubrep times
                         time_mat <- system.time(
                             replicate(nsubrep, {
                                 lapply(1:n, function(i) {
                                     enumerate_matrix(g_matrix, i)
                                 })
                             })
                         )[1]

                         time_adj_list <- system.time(
                             replicate(nsubrep, {
                                 lapply(1:n, function(i) {
                                     enumerate_adj_list(g_adj_list, i)
                                 })
                             })
                         )[1]

                         time_edge_list <- system.time(
                             replicate(nsubrep, {
                                 lapply(1:n, function(i) {
                                     enumerate_edge_list(g_edge_list, i)
                                 })
                             })
                         )[1]

                         rowdf <- data.frame(matrix=time_mat,
                                             adjlist=time_adj_list,
                                             edgelist=time_edge_list)
                         rowdf
                     }, simplify=FALSE)
    )

    # average over replicates
    simres <- apply(simdf, 2, mean)

    # divide by n so that the running times are on a per-vertex basis
    simres/n
}
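Before running the full sweep, a quick smoke test on a small graph (a sketch; n = 100 is an arbitrary small choice) confirms that the simulation runs end-to-end and returns a named three-element vector of per-vertex times in seconds:
In [ ]:
# smoke test on a small graph; should return per-vertex times for all three structures
dosim(100)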
Run the function for graphs of 1000, 2000, 3000, 4000, and 5000 vertices, and gather the results in a list:
In [20]:
nvals <- c(1000, 2000, 3000, 4000, 5000)
sim_data_list <- lapply(nvals, dosim)
Rescale the values in the list so that the times are expressed in milliseconds:
In [110]:
sim_data_list_rescaled <- lapply(sim_data_list, function(v) {v*1000})
Let's have a look at the data in the list format:
In [28]:
sim_data_list_rescaled
Convert the list to a wide data frame, using do.call and rbind; make the network size the first column of the data frame (stored as a factor):
In [78]:
sim_data_list_df <- data.frame(network_size=as.factor(nvals),
                               do.call(rbind, sim_data_list_rescaled))
Let's have a look at the wide data frame format:
In [79]:
sim_data_list_df
"Melt" the data into a narrow format:
In [80]:
library(dplyr)
library(reshape2)
sim_data_list_melted <- melt(sim_data_list_df, id.vars="network_size")
Let's have a look at the narrow data frame format:
In [81]:
sim_data_list_melted
Let's plot the data. The "barchart" function in lattice graphics is a nice way to plot a grouped bar plot from a narrow data frame. (Note from SAR: ggplot2 seems not to be playing nicely with JupyterHub/IRkernel; that's an issue I will work on fixing...)
In [108]:
library(lattice)
colors <- c('red', 'orange', 'yellow')
barchart(value~network_size, data=sim_data_list_melted, groups=variable,
         xlab='number of vertices', ylab='average time per vertex (ms)',
         auto.key=list(space='right'),
         par.settings=list(superpose.polygon=list(col=colors)))
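For environments where ggplot2 does play nicely with IRkernel, a roughly equivalent grouped bar plot might look like the following sketch (the axis labels are illustrative choices):
In [ ]:
library(ggplot2)
# grouped (dodged) bar plot of the same narrow data frame
ggplot(sim_data_list_melted, aes(x=network_size, y=value, fill=variable)) +
    geom_col(position='dodge') +
    labs(x='number of vertices', y='average time per vertex (ms)', fill='data structure')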