We will measure the running time for enumerating the neighbor vertices of a vertex, for three different data structures for representing an undirected graph:
- adjacency matrix
- adjacency list
- edge list
Let's assume that each vertex is labeled with a unique integer number. So if there are N vertices, the vertices are labeled 0, 1, 2, ..., N-1.
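To make the three representations concrete, here is a small illustrative example (not part of the exercise itself): a hypothetical triangle graph on three vertices, with edges 0-1, 0-2, and 1-2, written out in each format:

```python
import numpy as np

# Adjacency matrix: entry [i, j] is 1 if vertices i and j are connected
adj_matrix = np.matrix([[0, 1, 1],
                        [1, 0, 1],
                        [1, 1, 0]])

# Adjacency list: entry i is the list of neighbors of vertex i
adj_list = [[1, 2], [0, 2], [0, 1]]

# Edge list: one row per edge, as a pair of vertex labels
edge_list = np.array([[0, 1],
                      [0, 2],
                      [1, 2]])
```

Note that the matrix stores N*N entries regardless of how many edges exist, while the adjacency list and edge list only grow with the number of edges.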
First, we will import all of the Python modules that we will need for this exercise; note how we assign a short name, "np", to the numpy module. This will save typing.
In [1]:
import numpy as np
import igraph
import timeit
import itertools
Now, define a function that returns the index numbers of the neighbors of a vertex i, when the graph is stored in adjacency matrix format. So your function will accept as an input an NxN numpy matrix.
In [2]:
def enumerate_matrix(gmat, i):
    return np.nonzero(gmat[i,:])[1].tolist()
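As a quick sanity check, here is a minimal sketch of the function applied to a hypothetical triangle graph (this small test graph is my own assumption, not part of the exercise):

```python
import numpy as np

def enumerate_matrix(gmat, i):
    # np.nonzero on a matrix row returns (row_indices, col_indices);
    # the column indices [1] are the neighbor labels
    return np.nonzero(gmat[i, :])[1].tolist()

# triangle graph: edges 0-1, 0-2, 1-2
gmat = np.matrix([[0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 0]])
print(enumerate_matrix(gmat, 0))  # [1, 2]
```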
Define a function that enumerates the neighbors of a vertex i, when the graph is stored in adjacency list format (a list of lists):
In [3]:
def enumerate_adj_list(adj_list, i):
    return adj_list[i]
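For the adjacency list, the neighbors are already stored per vertex, so the lookup is a single index. A minimal sketch on the same hypothetical triangle graph (the test graph is my own assumption):

```python
# adjacency list for the triangle graph: entry i holds vertex i's neighbors
adj_list = [[1, 2], [0, 2], [0, 1]]

def enumerate_adj_list(adj_list, i):
    # constant-time lookup: the neighbor list is stored directly
    return adj_list[i]

print(enumerate_adj_list(adj_list, 1))  # [0, 2]
```

This is why the adjacency list wins in the timings below: no scan over the matrix row or the edge array is needed.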
Define a function that enumerates the neighbors of a vertex i, when the graph is stored in edge-list format (a numpy array of length-two lists); use numpy.where and numpy.unique:
In [4]:
def enumerate_edge_list(edge_list, i):
    inds1 = np.where(edge_list[:,0] == i)[0]
    elems1 = edge_list[inds1, 1].tolist()
    inds2 = np.where(edge_list[:,1] == i)[0]
    elems2 = edge_list[inds2, 0].tolist()
    return np.unique(elems1 + elems2).tolist()
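Because the graph is undirected, vertex i can appear in either column of the edge array, so both columns must be scanned. A minimal sketch on the same hypothetical triangle graph (the test graph is my own assumption):

```python
import numpy as np

def enumerate_edge_list(edge_list, i):
    # edges where i is the first endpoint: take the second column
    inds1 = np.where(edge_list[:, 0] == i)[0]
    elems1 = edge_list[inds1, 1].tolist()
    # edges where i is the second endpoint: take the first column
    inds2 = np.where(edge_list[:, 1] == i)[0]
    elems2 = edge_list[inds2, 0].tolist()
    # merge and deduplicate the two partial neighbor lists
    return np.unique(elems1 + elems2).tolist()

edge_list = np.array([[0, 1], [0, 2], [1, 2]])
print(enumerate_edge_list(edge_list, 2))  # [0, 1]
```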
This next function is the simulation function. "n" is the number of vertices. It returns a length-three array containing the average per-vertex running time for enumerating the neighbor vertices, one entry for each of the three graph representations.
In [5]:
def do_sim(n):
    retlist = []
    nrep = 10
    nsubrep = 10
    # this is (sort of) a Python way of doing the R function "replicate":
    for _ in itertools.repeat(None, nrep):
        # make a random undirected Barabasi-Albert graph, adding 5 edges per vertex
        g = igraph.Graph.Barabasi(n, 5)
        # get the graph in three different representations
        g_matrix = np.matrix(g.get_adjacency().data)
        g_adj_list = g.get_adjlist()
        g_edge_list = np.array(g.get_edgelist())
        start_time = timeit.default_timer()
        for _ in itertools.repeat(None, nsubrep):
            for i in range(0, n):
                enumerate_matrix(g_matrix, i)
        matrix_elapsed = timeit.default_timer() - start_time
        start_time = timeit.default_timer()
        for _ in itertools.repeat(None, nsubrep):
            for i in range(0, n):
                enumerate_adj_list(g_adj_list, i)
        adjlist_elapsed = timeit.default_timer() - start_time
        start_time = timeit.default_timer()
        for _ in itertools.repeat(None, nsubrep):
            for i in range(0, n):
                enumerate_edge_list(g_edge_list, i)
        edgelist_elapsed = timeit.default_timer() - start_time
        retlist.append([matrix_elapsed, adjlist_elapsed, edgelist_elapsed])
    # average over replicates and then
    # divide by n so that the running time results are on a per-vertex basis
    return np.mean(np.array(retlist), axis=0)/n
A simulation with 1000 vertices clearly shows that the adjacency-list format is fastest (the results are multiplied by 1000 so that they are in ms):
In [6]:
do_sim(1000)*1000
Out[6]:
We see the expected behavior: the running times for the adjacency-matrix and edge-list formats grow when we increase "n", while the running time for the adjacency-list format hardly changes:
In [7]:
do_sim(2000)*1000
Out[7]: