Class Session 4 Exercise:

Comparing asymptotic running time for enumerating neighbors of all vertices in a graph

We will measure the running time for enumerating the neighbor vertices for three different data structures for representing the graph: adjacency matrix adjacency list edge list

First, we import all the R packages that we will need for this exercise:


In [ ]:
library(igraph)

Define a function that enumerates the neighbors of a vertex i, when the graph is stored in adjacency matrix format (ones and zeroes). Use the which function.


In [ ]:
enumerate_matrix <- function(gmat, i) {
    which(gmat[i, ] == 1)
}

Define a function that enumerates the neighbors of a vertex i, when the graph is stored in adjacency list format. Use the [[ function (i..e, array indexing).


In [ ]:
enumerate_adj_list <- function(adj_list, i) {
    adj_list[[i]]
}

Define a function that enumerates the neighbors of a vertex i, when the graph is stored in edge-list format. Use the which function and then the c (concatenate) function.


In [ ]:
enumerate_edge_list <- function(edge_list, i) {
    inds1 <- which(edge_list[,1] == i)
    inds2 <- which(edge_list[,2] == i)
    c(edge_list[inds1,2], edge_list[inds2,1])
}

This next function is the simulation funtion. "n" is the number of vertices. It returns a length-three list containing the average running time for enumerating the neighbor vertices of a vertex in the graph.


In [19]:
dosim <- function(n) {

    nrep <- 10
    nsubrep <- 10
    
    simdf <- do.call(rbind,
                     replicate(nrep, {
                        # make a random undirected graph with fixed (average) vertex degree = 5
                         g <- sample_pa(n, out.seq=rep(5, n), directed=FALSE)
                         
                         g_matrix <- as.matrix(as_adjacency_matrix(g))
                         g_adj_list <- as_adj_list(g)
                         g_edge_list <- as_edgelist(g)

                         time_mat <- system.time(
                             replicate(nsubrep, {
                                 lapply(1:n,
                                        function(i) {
                                            enumerate_matrix(g_matrix, i)
                                        })
                             })
                         )[1]
                         
                         time_adj_list <- system.time(
                             replicate(nsubrep, {
                                 lapply(1:n,
                                        function(i) {
                                            enumerate_adj_list(g_adj_list, i)
                                        })
                             })
                         )[1]
                         
                         time_edge_list <- system.time(
                             replicate(nsubrep, {
                                 lapply(1:n,
                                        function(i) {
                                            enumerate_edge_list(g_edge_list, i)
                                        })
                             })
                         )[1]

                         rowdf <- data.frame(matrix=time_mat,
                                             adjlist=time_adj_list,
                                             edgelist=time_edge_list)
                         
                         rowdf
                     }, simplify=FALSE)
                     )
    
    # average over replicates
    simres <- apply(simdf, 2, mean)
    
    # divide by n so that the running time results are on a per-vertex basis
    simres/n
}

Run the function for the graphs of 1000, 2000, 3000, 4000, and 5000 vertices, and gather the results in a list:


In [20]:
nvals <- c(1000, 2000, 3000, 4000, 5000)
sim_data_list <- lapply(netvals, dosim)

Rescale the list to make the numbers in milliseconds:


In [110]:
sim_data_list_rescaled <- lapply(sim_data_list, function(v) {v*1000})

Let's have a look at the data in the list format:


In [28]:
sim_data_list_rescaled


  1. matrix
    0.0287599999999975
    adjlist
    0.00176000000000386
    edgelist
    0.18432
  2. matrix
    0.0696000000000026
    adjlist
    0.00171999999999798
    edgelist
    0.353239999999998
  3. matrix
    0.10284
    adjlist
    0.00173333333332721
    edgelist
    0.516826666666667
  4. matrix
    0.138470000000001
    adjlist
    0.00184999999999832
    edgelist
    0.675100000000001
  5. matrix
    0.168968
    adjlist
    0.00185600000000068
    edgelist
    0.806472

Convert the list to a wide data frame, using do.call and rbind; make the network size the first column of the data frame (and as a factor)


In [78]:
sim_data_list_df <- data.frame(network_size=as.factor(nvals),
                               do.call(rbind, sim_data_list_rescaled))

Let's have a look at the wide data frame format:


In [79]:
sim_data_list_df


network_sizematrixadjlistedgelist
1000 0.028760 0.0017600000.1843200
2000 0.069600 0.0017200000.3532400
3000 0.102840 0.0017333330.5168267
4000 0.138470 0.0018500000.6751000
5000 0.168968 0.0018560000.8064720

"Melt" the data into a narrow format:


In [80]:
library(dplyr)
library(reshape2)
sim_data_list_melted <- melt(sim_data_list_df, id.vars="network_size")

Let's have a look at the narrow data frame format:


In [81]:
sim_data_list_melted


network_sizevariablevalue
1000 matrix 0.028760000
2000 matrix 0.069600000
3000 matrix 0.102840000
4000 matrix 0.138470000
5000 matrix 0.168968000
1000 adjlist 0.001760000
2000 adjlist 0.001720000
3000 adjlist 0.001733333
4000 adjlist 0.001850000
5000 adjlist 0.001856000
1000 edgelist 0.184320000
2000 edgelist 0.353240000
3000 edgelist 0.516826667
4000 edgelist 0.675100000
5000 edgelist 0.806472000

Let's plot the data. The "barchart" function in lattice graphics is a nice way to plot a grouped bar plot from a narrow data frame. (Note from SAR: ggplot2 seems to not be playing nice with JupyterHub/IRkernel; that's an issue I will work on fixing....)


In [108]:
library(lattice)
colors = c('red', 'orange', 'yellow')
barchart(value~network_size, data=sim_data_list_melted, groups=variable,
         auto.key=list(space='right'),
        par.settings=list(superpose.polygon=list(col=colors)))