CS446/546 Class Session 5 - Adjacency Forests

Comparing asymptotic running time for testing two vertices for an edge

In this exercise, we'll compare the asymptotic computational running time for testing if there is an edge between a pair of vertices, averaged over all pairs of vertices in the graph. We'll do it for a series of undirected graphs (each generated using an Barabasi-Albert model), each with 1000 vertices. We will vary the number of edges in the graph; each graph will have a different average number of vertex neighbors for a vertex (i.e., each graph will have a different average vertex degree). We will time how long it takes to test all possible pairs of vertices for whether or not there is an edge between them, for each of four different graph data structures (adjacency matrix, adjacency list, edge list, and adjacency forest).

First, we import all the R packages that we will need for this exercise:


In [12]:
suppressPackageStartupMessages(library(igraph))

We'll need to start by creating a function get_adj_tree that will accept an adjacency list data structure and will create an "adjacency forest" data structure representing the graph. NOTE: I have deleted the line of code that creates a new environment; see ?new.env for help.


In [2]:
get_adj_tree <- function(adj_list) {
    n <- length(adj_list)
    myforest <- list()
    for (i in 1:n) {
        newenv <- new.env()
        for (j in as.vector(adj_list[[i]])) {
            newenv[[as.character(j)]] <- 1
        }
        myforest[[i]] <- newenv
    }
    myforest
}

Now, define a function that will test whether vertices i and j are connected, using an adjacency matrix:


In [3]:
find_matrix <- function(gmat, i, j) {
    gmat[i, j] == 1
}

Now, define a function that will test whether vertices i and j are connected, using an adjacency list. You may find the function %in% useful:


In [4]:
find_adj_list <- function(adj_list, i, j) {
    j %in% adj_list[[i]]
}

Now, define a function that will test whether vertices i and j are connected, using an edge list. You may find the function any useful:


In [5]:
find_edge_list <- function(edge_list, i, j) {
    any((edge_list[,1] == i) & (edge_list[,2] == j)) | 
        any((edge_list[,2] == i) & (edge_list[,1] == j))
}

Now, define a function that will test whether vertices i and j are connected, using an adjacency forest. You may find the function is.null useful:


In [6]:
find_adj_tree <- function(adj_tree, i, jstr) {
    ! is.null(adj_tree[[i]][[jstr]])
}

This is the simulation code; note that we now have two parameters, "n" and "k" (n is the number of vertices in the graph, and k is the average vertex degree. We'll actually be keeping n fixed and varying k for this exercise.


In [9]:
do_sim <- function(n, k) {

    nrep <- 1
    nsubrep <- 1
    
    simdf <- do.call(rbind,
                     replicate(nrep, {
                         g <- sample_pa(n, out.seq=rep(k, n), directed=FALSE)
                         
                         g_matrix <- as.matrix(as_adjacency_matrix(g))
                         g_adj_list <- as_adj_list(g)
                         g_edge_list <- as_edgelist(g)
                         g_adj_tree <- get_adj_tree(g_adj_list)
                                                 
                         # this is for setting up the (admittedly weird) R way of doing a 
                         # double "for" loop (see "mapply" below)
                         allvals <- expand.grid(1:n, 1:n)
                         
                         # need this because "as.character" is kind of slow
                         jstrs <- as.character(1:n)
                         
                         time_mat <- system.time(
                             replicate(nsubrep, {
                                 mapply(function(i, j) {
                                            find_matrix(g_matrix, i, j)
                                        }, allvals$Var1, allvals$Var2)
                             })
                         )[1]
                         
                         time_adj_list <- system.time(
                             replicate(nsubrep, {
                                 mapply(function(i, j) {
                                            find_adj_list(g_adj_list, i, jstrs[j])
                                        }, allvals$Var1, allvals$Var2)
                             })
                         )[1]
                         
                         time_adjacency_forest <- system.time(
                             replicate(nsubrep, {
                                 mapply(function(i, j) {
                                     find_adj_tree(g_adj_tree, i, jstrs[j])
                                     }, allvals$Var1, allvals$Var2)
                             })                           
                         )[1]
                         
                         rowdf <- data.frame(matrix=time_mat,
                                             adjlist=time_adj_list,
                                             adjforest=time_adjacency_forest)
                         

                         rowdf
                     }, simplify=FALSE)
                     )
    
    # average over replicates
    simres <- apply(simdf, 2, mean)
    
    # get results in microseconds, on a per-vertex-pair basis
    1000000*simres/(n*(n-1)/2)
}

Call the do_sim function for three different values of "k" (the average vertex degree), and convert the resulting list (of single-row data frames) to a three-row data frame:


In [10]:
sim_data_df <- do.call(rbind, lapply(c(1, 5, 10, 100), 
                                     function(k) {do_sim(1000, k)}))

In [11]:
sim_data_df


matrixadjlistadjforest
5.421421 9.4174175.029029
5.381381 13.7817825.341341
5.325325 18.9869875.205205
5.493493 97.4734736.334334