CS446/546 Class Session 5 - Adjacency Forests

Comparing asymptotic running time for testing two vertices for an edge

In this exercise, we'll compare the asymptotic computational running time for testing if there is an edge between a pair of vertices, averaged over all pairs of vertices in the graph. We'll do it for a series of undirected graphs (each generated using an Barabasi-Albert model), each with 1000 vertices. We will vary the number of edges in the graph; each graph will have a different average number of vertex neighbors for a vertex (i.e., each graph will have a different average vertex degree). We will time how long it takes to test all possible pairs of vertices for whether or not there is an edge between them, for each of four different graph data structures (adjacency matrix, adjacency list, edge list, and adjacency forest).

First, we import all the R packages that we will need for this exercise:


In [1]:
library(igraph)


Attaching package: 'igraph'

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

We'll need to start by creating a function get_adj_tree that will accept an adjacency list data structure and will create an "adjacency forest" data structure representing the graph. NOTE: I have deleted the line of code that creates a new environment; see ?new.env for help.


In [1]:
get_adj_tree <- function(adj_list) {
    n <- length(adj_list)
    myforest <- list()
    for (i in 1:n) {
        ## EVIL PERSON DELETED A LINE OF CODE HERE;  ADD CODE TO CREATE A NEW ENVIRONMENT AND CALL IT `newenv`
        for (j in as.vector(adj_list[[i]])) {
            newenv[[as.character(j)]] <- 1
        }
        myforest[[i]] <- newenv
    }
    myforest
}

Now, define a function that will test whether vertices i and j are connected, using an adjacency matrix:


In [3]:
find_matrix <- function(gmat, i, j) {

}

Now, define a function that will test whether vertices i and j are connected, using an adjacency list. You may find the function %in% useful:


In [4]:
find_adj_list <- function(adj_list, i, j) {

}

Now, define a function that will test whether vertices i and j are connected, using an edge list. You may find the function any useful:


In [5]:
find_edge_list <- function(edge_list, i, j) {

}

Now, define a function that will test whether vertices i and j are connected, using an adjacency forest. You may find the function is.null useful:


In [6]:
find_adj_tree <- function(adj_tree, i, jstr) {

}

This is the simulation code; note that we now have two parameters, "n" and "k" (n is the number of vertices in the graph, and k is the average vertex degree. We'll actually be keeping n fixed and varying k for this exercise.


In [7]:
do_sim <- function(n, k) {

    nrep <- 1
    nsubrep <- 1
    
    simdf <- do.call(rbind,
                     replicate(nrep, {
                         g <- sample_pa(n, out.seq=rep(k, n), directed=FALSE)
                         
                         g_matrix <- as.matrix(as_adjacency_matrix(g))
                         g_adj_list <- as_adj_list(g)
                         g_edge_list <- as_edgelist(g)
                         g_adj_tree <- get_adj_tree(g_adj_list)
                                                 
                         # this is for setting up the (admittedly weird) R way of doing a 
                         # double "for" loop (see "mapply" below)
                         allvals <- expand.grid(1:n, 1:n)
                         
                         # need this because "as.character" is kind of slow
                         jstrs <- as.character(1:n)
                         
                         time_mat <- system.time(
                             replicate(nsubrep, {
                                 mapply(function(i, j) {
                                            find_matrix(g_matrix, i, j)
                                        }, allvals$Var1, allvals$Var2)
                             })
                         )[1]
                         
                         time_adj_list <- system.time(
                             replicate(nsubrep, {
                                 mapply(function(i, j) {
                                            find_adj_list(g_adj_list, i, jstrs[j])
                                        }, allvals$Var1, allvals$Var2)
                             })
                         )[1]
                         
                         time_adjacency_forest <- system.time(
                             replicate(nsubrep, {
                                 mapply(function(i, j) {
                                     find_adj_tree(g_adj_tree, i, jstrs[j])
                                     }, allvals$Var1, allvals$Var2)
                             })                           
                         )[1]
                         
                         rowdf <- data.frame(matrix=time_mat,
                                             adjlist=time_adj_list,
                                             adjforest=time_adjacency_forest)
                         

                         rowdf
                     }, simplify=FALSE)
                     )
    
    # average over replicates
    simres <- apply(simdf, 2, mean)
    
    # get results in microseconds, on a per-vertex-pair basis
    1000000*simres/(n*(n-1)/2)
}

Call the do_sim function for three different values of "k" (the average vertex degree), and convert the resulting list (of single-row data frames) to a three-row data frame:


In [ ]:
sim_data_df <- do.call(rbind, lapply(c(1, 5, 10, 100), 
                                     function(k) {do_sim(1000, k)}))

In [87]:
sim_data_df


matrixadjlistadjforest
11.707708 20.54855 9.641642
9.001001 26.37037 9.545546
9.609610 32.3603612.324324
12.668669211.5635610.978979