In [1]:
%load_ext watermark
%watermark -a 'Sebastian Raschka' -u -d -v


Sebastian Raschka 
last updated: 2017-08-06 

CPython 3.6.1
IPython 6.1.0

Breadth-First Search

Breadt-first search (BFS) is algorithm that can find the closest members in a graph that match a certain search criterion.

BFS requires that we model our problem as a graph (nodes connected through edges). BFS can be applied to directed and undirected graph, where it can be applied to answer to types of question:

  1. Is there are connection between a particular pair of nodes?
  2. Which is the closest node to a given node that satisfies a certain criterion?

To answer these questions, BFS starts by checking all direct neighbors of a given node -- neighbors are nodes that have a direct connection to a particular node. Then, if none of those neighbors satisfies the criterion that we are looking for, the search is expanded to the neighbors of the nodes we just checked, and so on, until a match is found or all nodes in the graph were checked.

To keep track of the nodes that we have already checked and that we are going to check, we need two additional data structures:

1) A hash table to keep track of nodes we have already checked. If we don't check for previously checked nodes, we may end up in cycles depending on the structure of the graph.

2) A queue that stores the items to be checked.

Representing the graph

To represent the graph, its nodes and edges, we can simply use a hash table such as Python's built-in dictionaries. Imagine we have an undirected, social network graph that lists our direct friends (Elijah, Marissa, Nikolai) and friends of friends:

Say we are going to move to a new apartment next weekend, and we want to ask our friends if they have a pick-up truck that can be helpful in this endeavor. First, we would reach out to our directed friends (or 1st degree connections). If none of these have a pick-up truck, we ask them to ask their 1st degree connections (which are our 2nd degree connections), and so forth:

We can represent such a graph using a simple hash table (here: Python dictionary) as follows:


In [2]:
graph = {}

graph['You'] = ['Elijah', 'Marissa', 'Nikolai', 'Cassidy']
graph['Elijah'] = ['You']
graph['Marissa'] = ['You']
graph['Nikolai'] = ['John', 'Thomas', 'You']
graph['Cassidy'] = ['John', 'You']
graph['John'] = ['Cassidy', 'Nikolai']
graph['Thomas'] = ['Nikolai', 'Mario']
graph['Mario'] = ['Thomas']

The Queue data structure

Next, let's setup a simple queue data structure. Of course, we can also use a regular Python list like a queue (using .insert(0, x) and .pop(), but this way, our breadth-first search implementation is maybe more illustrative. For more information about queues, please see the Queues and Deques notebook.


In [3]:
class QueueItem():
    def __init__(self, value, pointer=None):
        self.value = value
        self.pointer = pointer

class Queue():
    def __init__(self):
        self.last = None
        self.first = None
        self.length = 0
    
    def enqueue(self, value):
        item = QueueItem(value, None)
        if self.last:
            self.last.pointer = item
        if not self.first:
            self.first = item
        self.last = item
        self.length += 1
    
    def dequeue(self):
        if self.first is not None:
            value = self.first.value
            self.first = self.first.pointer
            self.length -= 1
        else:
            value = None
        return value

In [4]:
qe = Queue()
qe.enqueue('a')

print('First element:', qe.first.value)
print('Last element:', qe.last.value)
print('Queue length:', qe.length)


First element: a
Last element: a
Queue length: 1

In [5]:
qe.enqueue('b')

print('First element:', qe.first.value)
print('Last element:', qe.last.value)
print('Queue length:', qe.length)


First element: a
Last element: b
Queue length: 2

In [6]:
qe.enqueue('c')

print('First element:', qe.first.value)
print('Last element:', qe.last.value)
print('Queue length:', qe.length)


First element: a
Last element: c
Queue length: 3

In [7]:
val = qe.dequeue()

print('Dequeued value:', val)
print('Queue length:', qe.length)


Dequeued value: a
Queue length: 2

In [8]:
val = qe.dequeue()

print('Dequeued value:', val)
print('Queue length:', qe.length)


Dequeued value: b
Queue length: 1

In [9]:
val = qe.dequeue()

print('Dequeued value:', val)
print('Queue length:', qe.length)


Dequeued value: c
Queue length: 0

In [10]:
val = qe.dequeue()

print('Dequeued value:', val)
print('Queue length:', qe.length)


Dequeued value: None
Queue length: 0

In [11]:
qe.enqueue('c')

print('First element:', qe.first.value)
print('Last element:', qe.last.value)
print('Queue length:', qe.length)


First element: c
Last element: c
Queue length: 1

Implementing freadth-first search to find the shortest path

Now, back to the graph, where we want to identify the closest connection that owns a truck, which can be helpful for moving (if we are allowed to borrow it, that is):


In [12]:
graph = {}

graph['You'] = ['Elijah', 'Marissa', 'Nikolai', 'Cassidy']
graph['Elijah'] = ['You']
graph['Marissa'] = ['You']
graph['Nikolai'] = ['John', 'Thomas', 'You']
graph['Cassidy'] = ['John', 'You']
graph['John'] = ['Cassidy', 'Nikolai']
graph['Thomas'] = ['Nikolai', 'Mario']
graph['Mario'] = ['Thomas']

For simplicity, let's assume we have function that checks if a person ows a pick-up truck. (Say, Mario owns a pick-up truck, the check function knows it but we don't know it.)


In [13]:
def has_truck(person):
    if person == 'Mario':
        return True
    else:
        return False

Now, the breadth_first_search implementation below will check our closest neighbors first, then, it will check the neighnors of our neighbors and so forth. We will make use both of the graph we constructed and the Queue data structure that we implemented. Also, note that we are keeping track of people we already checked to prevent cycles in our search:


In [14]:
def breadth_first_search(graph):

    # initialize queue
    queue = Queue()
    for person in graph['You']:
        queue.enqueue(person)

    people_checked = set()
    degree = 0
    
    while queue.length:
        
        person = queue.dequeue()

        if has_truck(person):
            return person
        else:
            degree += 1
            people_checked.add(person)
            for next_person in graph[person]:
                # check to prevent endless cycles
                if next_person not in people_checked:
                    queue.enqueue(next_person)

In [15]:
breadth_first_search(graph)


Out[15]:
'Mario'