In [ ]:
# Assignment
For this assignment you will calculate and plot the distribution of the path lengths of a graph. First we will generate the graphs:
In [3]:
import networkx as nx
import numpy as np
import matplotlib.mlab as mlab
import matplotlib.pyplot as plt
from collections import deque
%matplotlib inline
#random_graph = nx.erdos_renyi_graph(10, 0.6)
random_graph = nx.erdos_renyi_graph(1200, 0.008)
print(nx.info(random_graph))
Let's write an algorithm that will calculate the shortest paths between all the nodes and then place the lengths of those shortest paths in a sequence (such as a list or array).
The first task will be creating a function that calculate the shortest path lengths from a single node to everyone else, like the following:
In [6]:
def all_shortest_path_lengths_from(G, node):
"""
BSF implementation copied from MIT Open Courseware Lecture 13, Breadth-First Search (BSF):
https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-006-introduction-to-algorithms-fall-2011/lecture-videos/lecture-13-breadth-first-search-bfs
"""
level = {node: 0}
parent = {node: None}
i = 1
frontier = [node]
while frontier:
next_frontier = []
for u in frontier:
for v in G[u]:
if v not in level:
level[v] = i
parent[v] = u
next_frontier.append(v)
frontier = next_frontier
i += 1
return level
def all_shortest_path_lengths_from_deque(G, node):
"""
BSF implementation adapted from Grokking algorithms, Adity Y. Bhargava. Manning Publications, 2016.
Uses the deque collection object to store the search queue.
I added a for-loop to the original implementation in order to properly record the level of each
cluster of neighbors.
"""
search_queue = deque([node])
current = deque()
searched = {node}
level = {node: 0}
parent = {node: None}
i = 1
while search_queue:
current = deque()
for neighbor in search_queue:
for n in G[neighbor]:
if not n in searched:
level[n] = i
parent[n] = neighbor
current.append(n)
searched.add(n)
search_queue = current
node = neighbor
i += 1
return level
In [37]:
p = all_shortest_path_lengths_from_deque(random_graph, 0)
print(p)
You need a queue that stores the nodes to visit at the next hop. You pull one from this queue, say i
, assign the current distance, and then go through each neighbor of i
and see whether we have seen the node before or not. If we have seen it, we can ignore it (we already know the shortest distance). If not, we should put this node to the queue. You may want to have two queues (or lists or sets) for the "current" level and the "next" level of the search.
After getting the single-source shortest path algorithm right, you can apply it to every node in the graph and obtain the list of shortest path length values.
In [78]:
def all_shortest_path_lengths(G):
"""Creates a list of path lengths starting from each node in the graph."""
paths = deque()
nodes = deque()
processed = deque()
for node in G:
nodes = all_shortest_path_lengths_from_deque(G, node)
processed.append(node)
for k, v in nodes.items():
if not k in processed:
paths.append(v)
return paths
Apply your algorithm to the random_graph
.
In [80]:
paths = all_shortest_path_lengths(random_graph)
print(len(paths))
Now that you have a list of the shortest paths for the graph, make a histogram for it. This can be done with matplotlibs histogram function
In [3]:
%matplotlib inline
import matplotlib.pyplot as plt
In [81]:
data = paths #all_shortest_path_lengths(random_graph)
bins = list(range(0, 20, 2))
plt.title("Histogram of shortest path lengths in random graph")
plt.xlabel("Path lengths")
plt.ylabel("Number of paths")
plt.hist(data, bins, histtype="bar", rwidth=0.8)
plt.show()