Class Session 4 Exercise:

Comparing asymptotic running time for enumerating neighbors of all vertices in a graph

We will measure the running time for enumerating the neighbor vertices for three different data structures for representing an undirected graph: adjacency matrix adjacency list edge list

Let's assume that each vertex is labeled with a unique integer number. So if there are N vertices, the vertices are labeled 0, 2, 3, 4, ..., N-1.

First, we will import all of the Python modules that we will need for this exercise: note how we assign a short name, "np" to the numpy module. This will save typing.


In [1]:
import numpy as np
import igraph
import timeit
import itertools

Now, define a function that returns the index numbers of the neighbors of a vertex i, when the graph is stored in adjacency matrix format. So your function will accept as an input a NxN numpy matrix.


In [2]:
def enumerate_matrix(gmat, i):
    return np.nonzero(gmat[i,:])[1].tolist()

Define a function that enumerates the neighbors of a vertex i, when the graph is stored in adjacency list format (a list of lists):


In [3]:
def enumerate_adj_list(adj_list, i):
    return adj_list[i]

Define a function that enumerates the neighbors of a vertex i, when the graph is stored in edge-list format (a numpy array of length-two-lists); use numpy.where and numpy.unique:


In [4]:
def enumerate_edge_list(edge_list, i):
    inds1 = np.where(edge_list[:,0] == i)[0]
    elems1 = edge_list[inds1, 1].tolist()
    inds2 = np.where(edge_list[:,1] == i)[0]
    elems2 = edge_list[inds2, 0].tolist()
    return np.unique(elems1 + elems2).tolist()

This next function is the simulation funtion. "n" is the number of vertices. It returns a length-three list containing the average running time for enumerating the neighbor vertices of a vertex in the graph.


In [5]:
def do_sim(n):

    retlist = []
    
    nrep = 10
    nsubrep = 10
    
    # this is (sort of) a Python way of doing the R function "replicate":
    for _ in itertools.repeat(None, nrep):
        
        # make a random undirected graph with fixed (average) vertex degree = 5
        g = igraph.Graph.Barabasi(n, 5)
        
        # get the graph in three different representations
        g_matrix = np.matrix(g.get_adjacency().data)
        g_adj_list = g.get_adjlist()
        g_edge_list = np.array(g.get_edgelist())
    
        start_time = timeit.default_timer()
    
        for _ in itertools.repeat(None, nsubrep):
            for i in range(0, n):
                enumerate_matrix(g_matrix, i)
    
        matrix_elapsed = timeit.default_timer() - start_time
        
        start_time = timeit.default_timer()
        
        for _ in itertools.repeat(None, nsubrep):
             for i in range(0, n):
                enumerate_adj_list(g_adj_list, i)        
        
        adjlist_elapsed = timeit.default_timer() - start_time
        
        start_time = timeit.default_timer()
        
        for _ in itertools.repeat(None, nsubrep):
             for i in range(0, n):
                enumerate_edge_list(g_edge_list, i)        
        
        edgelist_elapsed = timeit.default_timer() - start_time
        
        retlist.append([matrix_elapsed, adjlist_elapsed, edgelist_elapsed])

        # average over replicates and then
        # divide by n so that the running time results are on a per-vertex basis
    return np.mean(np.array(retlist), axis=0)/n

A simulation with 1000 vertices clearly shows that adjacency list is fastest: (I multiply by 1000 just so the results are in ms.)


In [6]:
do_sim(1000)*1000


Out[6]:
array([ 0.13381058,  0.00126239,  0.36279364])

We see the expected behavior, with the running time for the adjacency-matrix and edge-list formats going up when we increase "n", but there is hardly any change in the running time for the graph stored in adjacency list format:


In [7]:
do_sim(2000)*1000


Out[7]:
array([ 0.20205601,  0.00127165,  0.51099831])

In [ ]: