Exploring the Labyrinth

Chapter 2 of Real World Algorithms.


Panos Louridas
Athens University of Economics and Business

Graphs in Python

The most common way to represent graphs in Python is with adjacency lists.

The adjacency lists are put into a Python dictionary.

The keys of the dictionary are the nodes, and the value for each node is its adjacency list.

With a slight abuse of terminology, we could use other data structures instead of a list to represent an adjacency list: for example, a set is a sensible choice, as we don't care about the order of the items in the list, and checking for membership (i.e., checking if a node is a neighbor of another node) is much faster in a set than in a list. In a well-implemented set it takes constant time, while in a list the time is linear and depends on the length of the list.

For example, here is a graph with 8 nodes (from 0 to 7) and its adjacency lists, represented as lists:


In [1]:
g = {
    0: [1, 2, 3],
    1: [0, 4],
    2: [0],
    3: [0, 5],
    4: [1, 5],
    5: [3, 4, 6, 7],
    6: [5],
    7: [5],
}

# print whole graph
print(g)

# print adjacency list of node 0
print(g[0])

# print adjacency list of node 5
print(g[5])


{0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 5], 4: [1, 5], 5: [3, 4, 6, 7], 6: [5], 7: [5]}
[1, 2, 3]
[3, 4, 6, 7]

Similarly, here is the same graph, but this time the nodes are strings (single-character strings, which are still strings in Python).

Nodes can be anything: numbers, strings, or anything else that can be used as a key in a Python dictionary.


In [2]:
g = {
    'a': ['b', 'c', 'd'],
    'b': ['a', 'e'],
    'c': ['a'],
    'd': ['a', 'f'],
    'e': ['b', 'f'],
    'f': ['d', 'e', 'g', 'h'],
    'g': ['f'],
    'h': ['f'],
}

# print whole graph
print(g)

# print adjacency list of node 'a'
print(g['a'])

# print adjacency list of node 'e'
print(g['e'])


{'a': ['b', 'c', 'd'], 'b': ['a', 'e'], 'c': ['a'], 'd': ['a', 'f'], 'e': ['b', 'f'], 'f': ['d', 'e', 'g', 'h'], 'g': ['f'], 'h': ['f']}
['b', 'c', 'd']
['b', 'f']

Depth-first Search

Suppose we have the following graph and we want to explore it depth-first.

In depth-first search, we follow a path as far as we can; when we reach a dead-end, that is, a node with no-unvisited neighbours, we backgrack to the previous unvisited node.

The graph is represented in Python as follows:


In [3]:
g = {
    0: [1, 2, 3],
    1: [0, 4],
    2: [0, 4],
    3: [0, 5],
    4: [5],
    5: [4, 6, 7],
    6: [],
    7: []
}

The depth-first recursive search algorithm is then simply:


In [4]:
visited = [ False ] * len(g)

def dfs(g, node):
    print("Visiting", node)
    visited[node] = True
    for v in g[node]:
        if not visited[v]:
            dfs(g, v)
            
dfs(g, 0)


Visiting 0
Visiting 1
Visiting 4
Visiting 5
Visiting 6
Visiting 7
Visiting 2
Visiting 3

It is possible to implement depth-first search without recursion.

To do that, we have to emulate recursion ourselves, by using a stack.


In [5]:
def dfs_stack(g, node):
    s = []
    visited = [ False ] * len(g)

    s.append(node)
    while len(s) != 0:
        print("Stack", s)
        c = s.pop()
        print("Visiting", c)
        visited[c] = True
        for v in g[c]:
            if not visited[v]:
                s.append(v)
    return visited

dfs_stack(g, 0)


Stack [0]
Visiting 0
Stack [1, 2, 3]
Visiting 3
Stack [1, 2, 5]
Visiting 5
Stack [1, 2, 4, 6, 7]
Visiting 7
Stack [1, 2, 4, 6]
Visiting 6
Stack [1, 2, 4]
Visiting 4
Stack [1, 2]
Visiting 2
Stack [1]
Visiting 1
Out[5]:
[True, True, True, True, True, True, True, True]

The stack-based depth-first search may insert a node in the stack multiple times.

For example, consider the following graph:

The graph is represented as follows:


In [6]:
g2 = {
  0: [1, 2, 3],
  1: [0, 4],
  2: [0],
  3: [0, 5],
  4: [1, 5],
  5: [3, 4, 6, 7],
  6: [5],
  7: [5]
}

Then we can traverse with with the stack-based version of depth-first search:


In [7]:
dfs_stack(g2, 0)


Stack [0]
Visiting 0
Stack [1, 2, 3]
Visiting 3
Stack [1, 2, 5]
Visiting 5
Stack [1, 2, 4, 6, 7]
Visiting 7
Stack [1, 2, 4, 6]
Visiting 6
Stack [1, 2, 4]
Visiting 4
Stack [1, 2, 1]
Visiting 1
Stack [1, 2]
Visiting 2
Stack [1]
Visiting 1
Out[7]:
[True, True, True, True, True, True, True, True]

You may notice that node 1 enters the stack twice.

That does not affect the correctness of the algorithm, as the algorithm will explore the whole graph, but we can fix it anyway.

One way to fix it would be to search in the stack and if the node is already there, we would not put it.

However, searching in a list takes place in linear time, depending on the length of the list.

It is faster to keep a separate structure in which we record if something is in the stack.

That requires more space: an instance of speed-space trade-off.


In [8]:
def dfs_nd_stack(g, node):
    s = []
    visited = [ False ] * len(g)
    instack = [ False ] * len(g)

    s.append(node)
    instack[node] = True
    while len(s) != 0:
        print("Stack", s)
        c = s.pop()
        instack[c] = False
        print("Visiting", c)
        visited[c] = True
        for v in g[c]:
            if not visited[v] and not instack[v]:
                s.append(v)
                instack[v] = True
    return visited
  
dfs_nd_stack(g2, 0)


Stack [0]
Visiting 0
Stack [1, 2, 3]
Visiting 3
Stack [1, 2, 5]
Visiting 5
Stack [1, 2, 4, 6, 7]
Visiting 7
Stack [1, 2, 4, 6]
Visiting 6
Stack [1, 2, 4]
Visiting 4
Stack [1, 2]
Visiting 2
Stack [1]
Visiting 1
Out[8]:
[True, True, True, True, True, True, True, True]

Breadth-first Search

In breadth-first search we visit all neighbours of a node, then all the neighbours of the neighbours, and so on.

The exploration is like a ripple spreading outwards.

We can implement breadth-first search using a First-In First-Out (FIFO) queue; in Python this is provided by collections.deque.


In [9]:
from collections import deque 

g = {
    0: [1, 2, 3],
    1: [0, 4],
    2: [0, 4],
    3: [0, 5],
    4: [5],
    5: [4, 6, 7],
    6: [],
    7: []
}

def bfs(g, node):
    
    q = deque()
    
    visited = [ False ] * len(g)
    inqueue = [ False ] * len(g)
    
    q.appendleft(node)
    inqueue[node] = True
    
    while not (len(q) == 0):
        print("Queue", q)
        c = q.pop()
        print("Visiting", c)
        inqueue[c] = False
        visited[c] = True
        for v in g[c]:
            if not visited[v] and not inqueue[v]:
                q.appendleft(v)
                inqueue[v] = True

    
bfs(g, 0)


Queue deque([0])
Visiting 0
Queue deque([3, 2, 1])
Visiting 1
Queue deque([4, 3, 2])
Visiting 2
Queue deque([4, 3])
Visiting 3
Queue deque([5, 4])
Visiting 4
Queue deque([5])
Visiting 5
Queue deque([7, 6])
Visiting 6
Queue deque([7])
Visiting 7

Reading a Graph from a File

Usually we read graphs from files, typically text files.

A common way to store graphs is in text files where each line contains a link between two nodes.

For example, the file containing the first graph we saw would be:

0 1
0 2
0 3
1 0
1 4
2 0
2 4
3 0
3 5
4 5
5 4
5 6
5 7

To read this file we would go line-by-line.

We would split each line on whitespace.

We would then get the two parts and treat them as nodes.

Note that we assume that the nodes are integers, so we convert the split pieces with int(x). If nodes were strings, this would not be required.

We also assume that the graph is directed.

The following example will read file example_graph_1.txt, the directed graph we used for the depth-first example, which has the following contents:

0 1
0 2
0 3
1 0
1 4
2 0
2 4
3 0
3 5
4 5
5 4
5 6
5 7

In [10]:
input_filename = "example_graph_1.txt"

g = {}

with open(input_filename) as graph_input:
    for line in graph_input:
        # Split line and convert line parts to integers.
        nodes = [int(x) for x in line.split()]
        if len(nodes) != 2:
            continue
        # If a node is not already in the graph
        # we must create a new empty list.
        if nodes[0] not in g:
            g[nodes[0]] = []
        if nodes[1] not in g:
            g[nodes[1]] = []
        # We need to append the "to" node
        # to the existing list for the "from" node.
        g[nodes[0]].append(nodes[1])

print(g)


{0: [1, 2, 3], 1: [0, 4], 2: [0, 4], 3: [0, 5], 4: [5], 5: [4, 6, 7], 6: [], 7: []}

Printing a graph like that is not very convenient.

Python offers the pprint (pretty-print) library that can help us output stuff in a more meaningful manner.


In [11]:
import pprint

pprint.pprint(g)


{0: [1, 2, 3],
 1: [0, 4],
 2: [0, 4],
 3: [0, 5],
 4: [5],
 5: [4, 6, 7],
 6: [],
 7: []}

For undirected graphs, the code is pretty much the same; we only need to take care to enter the edge $(v, u)$ for every edge $(u, v)$ that we meet in the file.

Here is the equivalent for the file example_graph_2.txt, which is the undirected graph we used for depth-first search.


In [12]:
input_filename = "example_graph_2.txt"

g = {}

with open(input_filename) as graph_input:
    for line in graph_input:
        # Split line and convert line parts to integers.
        nodes = [int(x) for x in line.split()]
        if len(nodes) != 2:
            continue
        # If a node is not already in the graph
        # we must create a new empty list.
        if nodes[0] not in g:
            g[nodes[0]] = []
        if nodes[1] not in g:
            g[nodes[1]] = []
        # We need to append the "to" node
        # to the existing list for the "from" node.
        g[nodes[0]].append(nodes[1])
        # And also the other way round.
        g[nodes[1]].append(nodes[0])

pprint.pprint(g)


{0: [1, 2, 3],
 1: [0, 4],
 2: [0],
 3: [0, 5],
 4: [1, 5],
 5: [3, 4, 6, 7],
 6: [5],
 7: [5]}

Real Graph Processing in Python

In a real program we would not use our own hand-crafted code for handling graphs in Python.

we would use instead use a respected library, used by many developers around the world, and optimized over time.

For that purpose, check the NetworkX library.