Dijkstra Shortest-Path Algorithm

The file dijkstraData.txt contains an adjacency-list representation of an undirected weighted graph with 200 vertices labeled 1 to 200. Each row lists a vertex followed by the vertices adjacent to it, each paired with the length of the connecting edge. For example, the 6th row has 6 as its first entry, indicating that this row corresponds to the vertex labeled 6. The next entry of this row, "141,8200", indicates that there is an edge between vertex 6 and vertex 141 of length 8200. The remaining pairs in this row give the other vertices adjacent to vertex 6 and the lengths of the corresponding edges.
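A minimal sketch of parsing one row of this format, assuming the fields are tab-separated (as the loading cell further down splits them) and each neighbor field is `vertex,length`. The function name `parse_row` is hypothetical, and only the "141,8200" pair in the sample row comes from the description above; the second pair is made up for illustration.

```python
def parse_row(line):
    """Parse 'v<TAB>u1,w1<TAB>u2,w2...' into (v, [(u1, w1), (u2, w2), ...])."""
    fields = line.strip().split("\t")  # strip() also drops a trailing tab/newline
    vertex = int(fields[0])
    neighbors = []
    for field in fields[1:]:
        u, w = field.split(",")
        neighbors.append((int(u), int(w)))
    return vertex, neighbors

# Hypothetical row: the "98,5594" pair is invented for this example.
print(parse_row("6\t141,8200\t98,5594\n"))
# → (6, [(141, 8200), (98, 5594)])
```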

Your task is to run Dijkstra's shortest-path algorithm on this graph, using 1 (the first vertex) as the source vertex, and to compute the shortest-path distances between 1 and every other vertex of the graph. If there is no path between a vertex v and vertex 1, we'll define the shortest-path distance between 1 and v to be 1000000.

You should report the shortest-path distances to the following ten vertices, in order: 7,37,59,82,99,115,133,165,188,197. You should encode the distances as a comma-separated string of integers. So if you find that all ten of these vertices except 115 are at distance 1000 away from vertex 1 and 115 is 2000 distance away, then your answer should be 1000,1000,1000,1000,1000,2000,1000,1000,1000,1000. Remember the order of reporting DOES MATTER, and the string should be in the same order in which the above ten vertices are given. The string should not contain any spaces. Please type your answer in the space provided.
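A minimal sketch of building the answer string, assuming the computed distances are in a dict keyed by integer vertex labels (the notebook cells below use string keys, so the lookups there would need `str(v)`). `format_answer` is a hypothetical helper; any target missing from the dict is reported as 1000000, matching the convention above for vertices with no path from vertex 1.

```python
UNREACHABLE = 1000000
TARGETS = [7, 37, 59, 82, 99, 115, 133, 165, 188, 197]

def format_answer(dist, targets=TARGETS):
    """Comma-separated distances in the required order, with no spaces."""
    return ",".join(str(dist.get(v, UNREACHABLE)) for v in targets)

# The example from the text: every target at distance 1000 except 115 at 2000.
example = {v: 1000 for v in TARGETS}
example[115] = 2000
print(format_answer(example))
# → 1000,1000,1000,1000,1000,2000,1000,1000,1000,1000
```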

IMPLEMENTATION NOTES: This graph is small enough that the straightforward O(mn)-time implementation of Dijkstra's algorithm should work fine. OPTIONAL: For those of you seeking an additional challenge, try implementing the heap-based version. Note that this requires a heap that supports deletions, and you'll probably need to maintain some kind of mapping between vertices and their positions in the heap.
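The heap-based version can also be sketched without a heap that supports deletions. The minimal sketch below swaps in *lazy deletion* with Python's `heapq`: whenever a tentative distance improves, it pushes a new `(distance, vertex)` entry and skips outdated entries when they are popped, for O(m log m) time overall. This is a different technique from the delete-with-position-map heap described above. It assumes the graph is a dict mapping each vertex to a list of `(neighbor, length)` pairs with non-negative lengths; `dijkstra_lazy` and `UNREACHABLE` are hypothetical names.

```python
import heapq

UNREACHABLE = 1000000

def dijkstra_lazy(graph, source):
    """Shortest-path distances from source, using heapq with lazy deletion."""
    dist = {source: 0}
    done = set()          # vertices whose distance is final (the set X)
    heap = [(0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if v in done:
            continue      # stale entry: v was already finalized with a smaller key
        done.add(v)
        for u, length in graph.get(v, []):
            nd = d + length
            if u not in done and nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))  # the older entry for u becomes stale
    # Every vertex mentioned anywhere in the graph gets a distance.
    vertices = set(graph)
    for adj in graph.values():
        vertices.update(u for u, _ in adj)
    return {v: dist.get(v, UNREACHABLE) for v in vertices}

# The same 4-vertex example used later in this notebook, with integer labels.
g = {1: [(2, 1), (3, 4)], 2: [(3, 2), (4, 6)], 3: [(4, 3)], 4: []}
print(sorted(dijkstra_lazy(g, 1).items()))
# → [(1, 0), (2, 1), (3, 3), (4, 6)]
```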

**TODO**:

  1. Fix the remove function for an element in the middle of the heap
  2. Finish the implementation of Dijkstra with the heap structure

Dijkstra O(mn) Algorithm, No Heap


In [3]:
class djikstra(object):
    MAX_WEIGHT = 1000000
    
    def __init__(self, graph, vertices, edges):
        self.graph = graph
        self.vertices = vertices
        self.edges = edges 
        
        self.X = [] # Vertices processed so far
        self.unprocessed_vertices = vertices.copy() # V-X

        self.A = {}  # computed shortest path from source s to key


    def compute_next_min_edge(self):
        minD = djikstra.MAX_WEIGHT
        minE = ""
        minS = ""
        #print ("X", self.X, "V-X", self.unprocessed_vertices )
        
        for edge in self.edges:
            src = edge[0]
            dst = edge[1]
            weight = edge[2]


            if src in self.X and dst in self.unprocessed_vertices:

                d = self.A[src] + weight
                #print ("Consider edge", src, dst, d)

                if d < minD:
                    #print ("add Edge", src, dst, weight)
                    minD = d                    
                    minE = dst
                    minS = src
        
        #print ("final choice", minS, minE, self.A[minS], minD)
        if minE:
            self.A[minE] = minD           
            return minE
        else:
            return None
        
    
    def reinit(self, s):
        self.X = [s]
        self.unprocessed_vertices = self.vertices.copy()
        self.unprocessed_vertices.remove(s)
        self.A = {s: 0}  # reset distances so repeated runs start clean


    def run(self, s, d):
        self.reinit(s)
        
        n = len(self.vertices)
        v = s
        
        while (n > 0):
            
            w = self.compute_next_min_edge()
            
            if w is None:
                # No more edges between X and V-X: every remaining vertex is unreachable
                #print ("unprocessed", self.unprocessed_vertices)
                for i in self.unprocessed_vertices:
                    self.A[i] = djikstra.MAX_WEIGHT
                    
                break
                
            #print ("pick", w)
            #print ("processed", self.X, self.A)

            self.unprocessed_vertices.remove(w)
            self.X.append(w)
            n -= 1
            
#             if w == d:
#                 break
    
        return self.A

Heap Data Structure


In [400]:
# Heap structure: a binary tree in which every child's key is at least as large as its parent's key.
# Index 0 is the root, whose only child is index 1; every node i >= 1 has children 2*i and 2*i + 1.
        
class myHeapArray(object):
    
    def __init__(self):        
        self.heap = []
        self.elem_idx = {}
        self.idx_to_elem = {}

    """
    Utilities to handle information associated with the tree nodes
    """
    def move_element_info(self, src_i, dst_i):
        """
            Move element info from index src_i to index dst_i and delete the src_i information
            Before:
               elem_idx[src] = src_i
               idx_to_elem[src_i] = src

               elem_idx[dst] = dst_i
               idx_to_elem[dst_i] = dst
               
            After:
               delete elem_idx[src]
               delete idx_to_elem[src_i]
               
               elem_idx[src] = dst_i            
               idx_to_elem[dst_i] = src
               
        """
        src = self.idx_to_elem[src_i]
        
        del self.idx_to_elem[src_i]
        del self.elem_idx[src]

        self.idx_to_elem[dst_i] = src
        self.elem_idx[src] = dst_i

    def add_element_info(self, elem, info):
        self.heap.append(elem)
            
        n = len(self.heap) - 1
        self.elem_idx[info] = n
        self.idx_to_elem[n] = info


    def remove_element_info(self, a_i):
        a = self.idx_to_elem[a_i]
        
        del self.idx_to_elem[a_i]
        del self.elem_idx[a]


    def swap_elements_info(self, a_i, b_i):
        """
            Swap the info of the elements at index a_i and index b_i
            Before:
               elem_idx[a] = a_i
               idx_to_elem[a_i] = a

               elem_idx[b] = b_i
               idx_to_elem[b_i] = b

            After:
               elem_idx[a] = b_i
               idx_to_elem[a_i] = b

               elem_idx[b] = a_i
               idx_to_elem[b_i] = a
        """
        a = self.idx_to_elem[a_i]
        b = self.idx_to_elem[b_i]

        self.elem_idx[a] = b_i
        self.idx_to_elem[a_i] = b

        self.elem_idx[b] = a_i            
        self.idx_to_elem[b_i] = a

    """
    Main Routines for Heap Structure
    """
    def bubbleDown(self, p=None):   
        
        n = len(self.heap)-1    
        
        if not p:
            p = 0
        
        while (2*p <= n):
            c1 = 2*p
               
            if 2*p + 1 <= n: 
                c2 = 2*p + 1
                # Second child exists
                if self.heap[c1] < self.heap[c2]:                
                    c_val, c = (self.heap[c1], c1)
                else:
                    c_val, c = (self.heap[c2], c2)
            else:
                c_val, c = (self.heap[c1], c1)


            # Swap parent with the child with the smallest key value
            if self.heap[p] >  c_val:
                tmp = self.heap[p]
                self.heap[p] = c_val
                self.heap[c] = tmp
                
                # swap the ids
                self.swap_elements_info(p, c)
                
                p = c
                
            else:
                break                 
        
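    def extractMin(self):
        """Remove and return (key, info) of the root; (None, None) if the heap is empty."""
        if not self.heap:
            return None, None

        val = self.heap[0]
        elem = self.idx_to_elem[0]

        # Drop the root's bookkeeping, then move the last entry to the root and discard the tail.
        self.remove_element_info(0)
        last = len(self.heap) - 1
        if last > 0:
            self.heap[0] = self.heap[last]
            self.move_element_info(last, 0)
        self.heap.pop()

        self.bubbleDown()

        return val, elem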
    
    def bubbleUp(self, c=None):
        """
            Node i >= 1 has children 2*i and 2*i + 1 and parent i // 2;
            the root (index 0) has the single child 1.

            Bubble up starting from index c (default: the newly added key at the end of the heap)
        """
        if c is None:
            c = len(self.heap) - 1
        while c != 0:
            p = c // 2
            if self.heap[p] > self.heap[c]:
                tmp = self.heap[p]
                self.heap[p] = self.heap[c]
                self.heap[c] = tmp                
                self.swap_elements_info(p, c)
                c = p
            else:
                break 
                
    def remove(self, info):
        n = len(self.heap) - 1        
        dst_i = self.elem_idx[info]
        
        print ("remove elem {} dst_i {}".format(info, dst_i))

        if dst_i == n:
            self.remove_element_info(n)
            self.heap.pop()
        else:              
            self.heap[dst_i] = self.heap[n]
            
            self.remove_element_info(dst_i)
            self.move_element_info(n, dst_i) 
            self.heap.pop()
            
            self.bubbleUp(dst_i)
         
        
    def insert(self, elem, info):        
        self.add_element_info(elem, info)            
        self.bubbleUp()

    def insertList(self, elemList, elemListInfo):
        for elem, info in zip(elemList, elemListInfo):
            self.add_element_info(elem, info)            
            self.bubbleUp()

    def get_ordered_list(self):
        """Drain the heap; return the keys in ascending order and their infos."""
        ordered = []
        elements = []
        while self.heap:
            n, elem = self.extractMin()
            ordered.append(n)
            elements.append(elem)

        return ordered, elements

Test Heap with info structure


In [416]:
arr = [1, 10, 9, 14, 7 ,19, 24, 16]

#arr =[1,9,11,33,27,21,19,17, 22]

info = [str(i) for i in range(len(arr))]

h = myHeapArray()
h.insertList(arr, info)

print ("arr",arr)
print ("heap", h.heap)
print ("elem_idx", h.elem_idx)
print ("idx_to_elem", h.idx_to_elem)

h.remove("4")
print ("heap", h.heap)
print ("elem_idx", h.elem_idx)
print ("idx_to_elem", h.idx_to_elem)

#sortedHS, infoList = h.get_ordered_list()      
#print ("sorted list with indices from original array", sortedHS, infoList)


arr [1, 10, 9, 14, 7, 19, 24, 16]
heap [1, 7, 9, 14, 10, 19, 24, 16]
elem_idx {'4': 1, '7': 7, '6': 6, '1': 4, '5': 5, '2': 2, '3': 3, '0': 0}
idx_to_elem {0: '0', 1: '4', 2: '2', 3: '3', 4: '1', 5: '5', 6: '6', 7: '7'}
remove elem 4 dst_i 1
heap [1, 16, 9, 14, 10, 19, 24]
elem_idx {'7': 1, '6': 6, '1': 4, '5': 5, '2': 2, '3': 3, '0': 0}
idx_to_elem {0: '0', 1: '7', 2: '2', 3: '3', 4: '1', 5: '5', 6: '6'}
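The output above shows the bug listed in the TODO: after `remove("4")`, key 16 sits above 9 and 14. Replacing a middle element with the last one can break the heap order in either direction, so that slot has to be sifted up *or* down. A minimal sketch of the fix on a separate, conventionally indexed heap (children of i at 2i+1 and 2i+2, parent at (i-1)//2) rather than a patch to `myHeapArray`; the class `IndexedMinHeap` and its method names are hypothetical.

```python
class IndexedMinHeap:
    """Min-heap of (key, item) pairs with a position map that supports remove(item)."""

    def __init__(self):
        self.heap = []  # list of [key, item]
        self.pos = {}   # item -> index in self.heap

    def _swap(self, i, j):
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.pos[self.heap[i][1]] = i
        self.pos[self.heap[j][1]] = j

    def _sift_up(self, i):
        while i > 0:
            parent = (i - 1) // 2
            if self.heap[parent][0] <= self.heap[i][0]:
                break
            self._swap(i, parent)
            i = parent

    def _sift_down(self, i):
        n = len(self.heap)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and self.heap[child][0] < self.heap[smallest][0]:
                    smallest = child
            if smallest == i:
                break
            self._swap(i, smallest)
            i = smallest

    def insert(self, key, item):
        self.heap.append([key, item])
        self.pos[item] = len(self.heap) - 1
        self._sift_up(len(self.heap) - 1)

    def remove(self, item):
        """Delete item from any position; the moved-in last entry may need to go up or down."""
        i = self.pos.pop(item)
        last = self.heap.pop()
        if i < len(self.heap):  # the removed item was not the last entry
            self.heap[i] = last
            self.pos[last[1]] = i
            self._sift_up(i)
            self._sift_down(i)

    def extract_min(self):
        """Remove and return (key, item) with the smallest key; IndexError if empty."""
        key, item = self.heap[0]
        self.remove(item)
        return key, item

# Same keys and labels as the test cell above; label "4" holds key 7.
h = IndexedMinHeap()
for k, label in zip([1, 10, 9, 14, 7, 19, 24, 16], "01234567"):
    h.insert(k, label)
h.remove("4")
print([h.extract_min()[0] for _ in range(7)])
# → [1, 9, 10, 14, 16, 19, 24]
```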

Dijkstra with Heap


In [12]:
class djikstraHEAP_old(object):
    MAX_WEIGHT = 1000000
    
    def __init__(self, graph, vertices, edges):
        self.graph = graph
        self.vertices = vertices
        self.edges = edges 
        
        self.X = [] # Vertices processed so far
        self.unprocessed_vertices = vertices.copy() # V-X

        self.A = {}  # computed shortest path from source s to key


    def compute_next_min_edge(self):
        minD = djikstra.MAX_WEIGHT
        minE = ""
        minS = ""

        minD, minE = self.heap.extractMin()

        
    
    def reinit(self, s):
        self.X = [s]        
        self.unprocessed_vertices = self.vertices.copy()
        self.unprocessed_vertices.remove(s)
        self.A[s] = 0
        
        self.heap = myHeapArray() # heap over vertices in V-X
        for v in self.unprocessed_vertices:
            src = v[0]
            dst = v[1]
            w = v[2]            
            if src == s:                
                self.heap.insert(w, elem=dst)


    def run(self, s, d):
        self.reinit(s)
        
        n = len(self.vertices)
        v = s
        
        while (n > 0):
            
            w = self.compute_next_min_edge()
            
            if w is None:
                # No more edges between X and V-X: every remaining vertex is unreachable
                #print ("unprocessed", self.unprocessed_vertices)
                for i in self.unprocessed_vertices:
                    self.A[i] = djikstra.MAX_WEIGHT
                break
                
            #print ("pick", w)
            #print ("processed", self.X, self.A)

            self.unprocessed_vertices.remove(w)
            self.X.append(w)
            n -= 1
            
#             if w == d:
#                 break
    
        return self.A   
    
    
def get_edges(graph):
    edges = []
    for s, adj in graph.items():
        for v in adj:
            edges.append([s, v[0], v[1]])
    return edges
        
graph ={
    "1": [["2",1], ["3", 4]],
    "2": [["3", 2], ["4",6]],
    "3": [["4",3]],
    "4": []
}

vertices = ["1", "3", "2", "4"]
edges = get_edges(graph)

d = djikstraHEAP_old(graph, vertices, edges)

print(d.run("1", "4"))


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-12-a4a3a9209e25> in <module>()
     86 d = djikstraHEAP_old(graph, vertices, edges)
     87 
---> 88 print(d.run("1", "4"))

<ipython-input-12-a4a3a9209e25> in run(self, s, d)
     38 
     39     def run(self, s, d):
---> 40         self.reinit(s)
     41 
     42         n = len(self.vertices)

<ipython-input-12-a4a3a9209e25> in reinit(self, s)
     28         self.A[s] = 0
     29 
---> 30         self.heap = myHeapArray() # heap over vertices in V-X
     31         for v in self.unprocessed_vertices:
     32             src = v[0]

NameError: name 'myHeapArray' is not defined

Second implementation with heap (heapq)


In [13]:
import heapq
import itertools

class djikstraHEAP(object):
    MAX_WEIGHT = 1000000  # distance assigned to vertices unreachable from the source
    
    def __init__(self, graph, vertices, edges):
        self.graph = graph
        self.vertices = vertices
        self.edges = edges 
        
        self.X = [] # Vertices processed so far
        self.unprocessed_vertices = vertices.copy() # V-X

        self.A = dict()  # computed shortest path from source s to key

    #### Init Heap Functions
    def init_heap(self):
        
        self.pq = []                         # list of entries arranged in a heap
        self.entry_finder = {}               # mapping of tasks to entries
        self.REMOVED = '<removed-task>'      # placeholder for a removed task
        self.counter = itertools.count()     # unique sequence count

    def add_task(self, task, priority=0):
        'Add a new task or update the priority of an existing task'
        if task in self.entry_finder:
            self.remove_task(task)
        count = next(self.counter)
        entry = [priority, count, task]
        self.entry_finder[task] = entry
        heapq.heappush(self.pq, entry)

    def remove_task(self, task):
        'Mark an existing task as REMOVED.  Raise KeyError if not found.'
        entry = self.entry_finder.pop(task)
        entry[-1] = self.REMOVED

    def pop_task(self):
        'Remove and return (task, priority) with the lowest priority; (None, None) if empty.'
        while self.pq:
            priority, count, task = heapq.heappop(self.pq)
            if task is not self.REMOVED:
                del self.entry_finder[task]
                return task, priority
        return None, None

    #####
    
    def compute_next_min_edge(self):
        # Pop the unprocessed vertex with the smallest tentative distance
        minE, minD = self.pop_task()  # returns (task, priority)

        if minE is None:
            return None

        # Relax every edge leaving the newly processed vertex
        for u, w in self.graph.get(minE, []):
            if u in self.X:
                # don't handle destinations that are already in X
                continue

            new_dist = self.A[minE] + w
            if u not in self.A or new_dist < self.A[u]:
                self.A[u] = new_dist
                # the heap priority must be the path length, not the edge length;
                # add_task replaces any existing entry (decrease-key)
                self.add_task(u, new_dist)

        return minE
        
    
    def reinit(self, s):
        self.X = [s]
        self.unprocessed_vertices = self.vertices.copy()
        self.unprocessed_vertices.remove(s)
        self.A = {s: 0}  # reset distances so repeated runs start clean

        self.init_heap()

        # Seed the heap with the edges leaving the source (keep the shortest parallel edge)
        for src, dst, w in self.edges:
            if src == s and (dst not in self.A or w < self.A[dst]):
                self.add_task(dst, w)
                self.A[dst] = w
        
    def run(self, s):
        self.reinit(s)
        
        n = len(self.vertices)
        v = s
        
        while (n > 0):
            
            w = self.compute_next_min_edge()
            
            if w is None:
                # No more edges between X and V-X: every remaining vertex is unreachable
                #print ("unprocessed", self.unprocessed_vertices)
                for i in self.unprocessed_vertices:
                    self.A[i] = djikstraHEAP.MAX_WEIGHT
                break
                
            #print ("pick", w)
            #print ("processed", self.X, self.A)

            self.unprocessed_vertices.remove(w)
            self.X.append(w)
            n -= 1
            
    
        return self.A   
    
    
def get_edges(graph):
    edges = []
    for s, adj in graph.items():
        for v in adj:
            edges.append([s, v[0], v[1]])
    return edges
        
graph ={
    "1": [["2",1], ["3", 4]],
    "2": [["3", 2], ["4",6]],
    "3": [["4",3]],
    "4": []
}

vertices = ["1", "3", "2", "4"]
edges = get_edges(graph)

d = djikstraHEAP(graph, vertices, edges)

print(d.run("1"))


{'2': 1, '1': 0, '3': 3, '4': 6}

Test Dijkstra with no Heap


In [5]:
#Example
def get_edges(graph):
    edges = []
    for s, adj in graph.items():
        for v in adj:
            edges.append([s, v[0], v[1]])
    return edges
        
graph ={
    "1": [["2",1], ["3", 4]],
    "2": [["3", 2], ["4",6]],
    "3": [["4",3]],
    "4": []
}

vertices = ["1", "3", "2", "4"]
edges = get_edges(graph)

d = djikstra(graph, vertices, edges)

print(d.run("1", "4"))


{'2': 1, '1': 0, '3': 3, '4': 6}

In [16]:
FILE = "dijkstraData.txt"

graph = {}
vertices = set()
edges = []

# Each line: vertex<TAB>neighbor,length<TAB>neighbor,length...
with open(FILE, 'r') as fp:
    for line in fp:
        v = line.strip().split("\t")

        vertices.add(v[0])
        graph.setdefault(v[0], [])

        for u in v[1:]:
            t = u.split(",")

            graph[v[0]].append([t[0], int(t[1])])
            edges.append([v[0], t[0], int(t[1])])
    
print ("Vertex 1 adj:", graph["1"])
print ("First 5 Edges:", edges[:5])


Vertex 1 adj: [['80', 982], ['163', 8164], ['170', 2620], ['145', 648], ['200', 8021], ['173', 2069], ['92', 647], ['26', 4122], ['140', 546], ['11', 1913], ['160', 6461], ['27', 7905], ['40', 9047], ['150', 2183], ['61', 9146], ['159', 7420], ['198', 1724], ['114', 508], ['104', 6647], ['30', 4612], ['99', 2367], ['138', 7896], ['169', 8700], ['49', 2437], ['125', 2909], ['117', 2597], ['55', 6399]]
First 5 Edges: [['1', '80', 982], ['1', '163', 8164], ['1', '170', 2620], ['1', '145', 648], ['1', '200', 8021]]

In [17]:
d = djikstra(graph, vertices, edges)

d.run("1", "7")
print (d.A["7"])

res = []
for k in ["7","37","59","82","99","115","133","165","188","197"]:
#     print (d.A[k])
    res.append(str(d.A[k]))

print (",".join(res))


2599
2599,2610,2947,2052,2367,2399,2029,2442,2505,3068

Dijkstra Implementation with Heap


In [18]:
d = djikstraHEAP(graph, vertices, edges)

d.run("1")
print (d.A["7"])

res = []
for k in ["7","37","59","82","99","115","133","165","188","197"]:
#     print (d.A[k])
    res.append(str(d.A[k]))

print (",".join(res))


2599
2599,2610,2947,2052,2367,2399,2029,2442,2505,3068

In [220]:
# Insertion sort, for comparison with heap sort

arr = [1, 10, 5, 7, 20, 3, 100, 99, 45, 13]

def InsertSort(arr):
    arr = arr.copy()
    
    n = len(arr)
    i=1
    while i < n:
        val = arr[i]
        for j in range(0, i):
            if arr[j] >= val:
                # insert at location j, and shift all by one                
                arr[j+1:i+1] = arr[j:i]                
                arr[j]=val
                break
           
        i += 1
    return arr

sortedArray = InsertSort(arr)
print (sortedArray)


[1, 3, 5, 7, 10, 13, 20, 45, 99, 100]

In [332]:
from datetime import datetime
import numpy as np

N = 10
arr = np.random.randint(0, 10000, size=N)

s = datetime.now()
h = myHeapArray()
h.insertList(arr, list(range(N)))  # info = original index of each value
sortedHS, _ = h.get_ordered_list()
e = datetime.now()
print ("heapSort(ms)", (e-s).total_seconds() * 1000, "ms")

s = datetime.now()
sortedIS = InsertSort(arr)
e = datetime.now()
print ("InsertSort(ms)", (e-s).total_seconds() * 1000, "ms")


arrsort = arr.copy()
arrsort.sort()

assert sum(sortedHS == arrsort) == N, "Don't match! HS {} arr {}, orig {}".format(sortedHS, arrsort, arr)
assert sum(sortedIS == arrsort) == N, "Don't match! IS {} arr {}, orig {}".format(sortedIS, arrsort, arr)


heapSort(ms) 0.201 ms
InsertSort(ms) 0.295 ms
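Timing a 10-element sort with `datetime.now()` is likely dominated by overhead rather than by the algorithms. A sketch of a steadier comparison, assuming a larger random input and using the standard library's `heapq` and `time.perf_counter` instead of `myHeapArray`; `heap_sort` and `insertion_sort` are hypothetical stand-ins, and the measured times will vary by machine.

```python
import heapq
import random
import time

def heap_sort(values):
    """Heap sort via heapq: heapify in O(n), then n pops of O(log n) each."""
    heap = list(values)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

def insertion_sort(values):
    """Classic O(n^2) insertion sort on a copy of the input."""
    arr = list(values)
    for i in range(1, len(arr)):
        val = arr[i]
        j = i - 1
        while j >= 0 and arr[j] > val:
            arr[j + 1] = arr[j]  # shift larger elements one slot right
            j -= 1
        arr[j + 1] = val
    return arr

N = 1000
data = [random.randint(0, 10000) for _ in range(N)]
for name, fn in [("heap_sort", heap_sort), ("insertion_sort", insertion_sort)]:
    start = time.perf_counter()
    result = fn(data)
    elapsed_ms = (time.perf_counter() - start) * 1000
    assert result == sorted(data)
    print(f"{name}: {elapsed_ms:.3f} ms")
```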
