In [1]:
import networkx as nx
Tab completion is highly useful in an interactive session to explore an object's attributes and methods. Here, we look for ways to open our files from the Facebook data.
In [2]:
nx.read  # press Tab after typing nx.read to list the read_* functions
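Tab completion essentially lists an object's attribute names that match what you have typed so far. Here is a rough plain-Python approximation using `dir()`. The helper `complete` is hypothetical, and IPython's real completer is more sophisticated. The sketch hides underscore names unless the prefix itself starts with `_`, which mimics IPython's default behavior:

```python
import math


def complete(obj, prefix):
    """Return the attribute names of obj that start with prefix,
    roughly what IPython lists when you press Tab after obj.prefix."""
    names = dir(obj)
    # hide private/dunder names unless the user explicitly typed an underscore
    if not prefix.startswith("_"):
        names = [n for n in names if not n.startswith("_")]
    return sorted(n for n in names if n.startswith(prefix))


print(complete(math, "is"))  # the math.is* predicates
```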
In [3]:
cd fbdata
The read_edgelist function looks appropriate here, as each .edges file is an edge list: every line contains a pair of person identifiers. This function returns a graph.
In [4]:
g = nx.read_edgelist('0.edges')
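To see what read_edgelist does with such a file, here is a minimal plain-Python sketch that builds an undirected adjacency dict from "u v" lines. The function name `read_edge_list` and the sample lines are made up for illustration. Like NetworkX's defaults, it assumes whitespace-separated tokens, treats `#` as a comment marker, and keeps node labels as strings. Duplicate or reversed pairs collapse into a single undirected edge:

```python
def read_edge_list(lines):
    """Return (adjacency dict, set of undirected edges) from 'u v' lines."""
    adj = {}
    edges = set()
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue
        u, v = line.split()[:2]  # ignore any extra columns (edge data)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        edges.add(frozenset((u, v)))  # 'a b' and 'b a' are the same edge
    return adj, edges


# made-up sample lines in the same format as the .edges files
sample = """236 186
122 285
24 346
236 122
"""
adj, edges = read_edge_list(sample.splitlines())
print(len(adj), len(edges))  # 6 4
```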
Let's display the number of nodes and edges in the graph.
In [5]:
len(g.nodes()), len(g.edges())
Out[5]:
Let's try to compute the radius of the graph.
In [6]:
nx.radius(g)
The call raises an error because the graph is not connected: some pairs of nodes have no path between them, so their distance, and hence the radius, is infinite. Instead, we can work on a connected component. Tab completion helps us find the right function.
In [7]:
nx.connected  # press Tab after typing nx.connected
In [8]:
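# connected_component_subgraphs() was removed in NetworkX 2.4; build the subgraphs, largest first
sg = [g.subgraph(c).copy() for c in sorted(nx.connected_components(g), key=len, reverse=True)]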
In [9]:
[len(s) for s in sg]
Out[9]:
We take the largest connected component.
In [10]:
sg = sg[0]
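The two steps above (split the graph into connected components, keep the largest) can be sketched in plain Python with a breadth-first search. This is not NetworkX's implementation. The graph is assumed to be an adjacency dict `{node: set(neighbors)}`, and the helper names are hypothetical:

```python
from collections import deque


def connected_components(adj):
    """Yield the node set of each connected component of an undirected graph."""
    seen = set()
    for start in adj:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first search from start
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        yield component


def largest_component_subgraph(adj):
    """Return a copy of the adjacency dict restricted to the largest component."""
    nodes = max(connected_components(adj), key=len)
    return {u: set(adj[u]) for u in nodes}


adj = {1: {2}, 2: {1, 3}, 3: {2}, 4: {5}, 5: {4}, 6: set()}
print(sorted(len(c) for c in connected_components(adj)))  # [1, 2, 3]
print(sorted(largest_component_subgraph(adj)))  # [1, 2, 3]
```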
Now we can compute the radius and diameter of the graph.
In [11]:
nx.radius(sg), nx.diameter(sg)
Out[11]:
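For an unweighted graph, the eccentricity of a node is its largest shortest-path distance to any other node. The radius is the minimum eccentricity and the diameter is the maximum. The following plain-Python sketch computes these with one breadth-first search per node. It also shows why the earlier call failed: on a disconnected graph some node is unreachable, so the eccentricity is infinite. The function names are hypothetical, and the graph is assumed to be an adjacency dict:

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def eccentricity(adj, node):
    dist = bfs_distances(adj, node)
    if len(dist) != len(adj):  # some node is unreachable
        raise ValueError("graph is not connected: eccentricity is infinite")
    return max(dist.values())


def radius(adj):
    return min(eccentricity(adj, v) for v in adj)


def diameter(adj):
    return max(eccentricity(adj, v) for v in adj)


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}  # 1-2-3-4-5
print(radius(path), diameter(path))  # 2 4
```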
Appending ? to any object in IPython displays information about it, such as its signature and docstring.
In [12]:
nx.eccentricity?
The %pdef, %pdoc and %psource magic commands display, respectively, an object's definition (its signature), its docstring, and its source code.
In [13]:
%pdef nx.eccentricity
In [14]:
%pdoc nx.eccentricity
In [15]:
%psource nx.eccentricity
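Outside IPython, roughly the same information is available from the standard `inspect` module. This is a minimal sketch; the helper `describe` is hypothetical. It is applied to `textwrap.dedent` only because that is a pure-Python standard-library function whose source is available:

```python
import inspect
import textwrap


def describe(func):
    """Return (signature, docstring, source) of a Python function,
    similar in spirit to %pdef, %pdoc and %psource."""
    signature = f"{func.__name__}{inspect.signature(func)}"
    docstring = inspect.getdoc(func)
    source = inspect.getsource(func)  # needs the function's source file
    return signature, docstring, source


sig, doc, src = describe(textwrap.dedent)
print(sig)
print(doc.splitlines()[0])
```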
The %timeit magic command measures how long a statement takes to execute by running it many times.
In [16]:
%timeit nx.center(sg)
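%timeit builds on the ideas of the standard `timeit` module. The sketch below times a statement several times and reports the best per-loop duration, which is the figure the `timeit` documentation suggests looking at. It is not how %timeit formats its output. The helper `time_statement` and its default loop counts are assumptions:

```python
import timeit


def time_statement(stmt, setup="pass", number=1000, repeat=5):
    """Return the best time per execution of stmt, in seconds."""
    # repeat() returns one total time per round of `number` executions
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals) / number


t = time_statement("sorted(data)", setup="data = list(range(1000, 0, -1))")
print(f"{t * 1e6:.1f} us per loop")
```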
In [17]:
nx.center(sg)
Out[17]:
Now we write our own, deliberately unoptimized script that computes the center of the graph, that is, the nodes whose eccentricity equals the radius. Here is the code contained in center.py:
import networkx as nx
g = nx.read_edgelist('0.edges')
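# largest connected component (connected_component_subgraphs was removed in NetworkX 2.4)
sg = g.subgraph(max(nx.connected_components(g), key=len)).copy()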
center = [node for node in sg.nodes() if nx.eccentricity(sg, node) == nx.radius(sg)]
print(center)
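We can benchmark the script with %run -t, which reports its running time, and profile it with %run -p, which runs it under the Python profiler, to find the hotspots that should be optimized.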
In [18]:
cd ../..
In [19]:
run -t center.py
In [21]:
run -p center.py
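%run -p relies on Python's own profiling machinery. The same kind of report can be produced directly with `cProfile` and `pstats`, as in this minimal sketch. The toy functions `square` and `sum_of_squares` and the helper `profile_call` are made up for illustration:

```python
import cProfile
import io
import pstats


def square(x):
    return x * x


def sum_of_squares(n):
    return sum(square(i) for i in range(n))  # calls square() n times


def profile_call(func, *args, top=5):
    """Run func(*args) under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
    stats.print_stats(top)  # keep only the `top` most expensive entries
    return result, stream.getvalue()


result, report = profile_call(sum_of_squares, 1000)
print(report)
```

The ncalls column of such a report is what reveals functions that are called far more often than expected.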
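The profile shows that the script calls the same functions over and over: for every node, nx.radius(sg) recomputes the eccentricity of all nodes, and nx.eccentricity(sg, node) runs yet another shortest-path search. This explains why the script is so slow. Let's optimize it by computing these values once and reusing them. Here is the code contained in the file center2.py: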
import networkx as nx
g = nx.read_edgelist('0.edges')
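# largest connected component (connected_component_subgraphs was removed in NetworkX 2.4)
sg = g.subgraph(max(nx.connected_components(g), key=len)).copy()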
# we compute the eccentricity once, for all nodes
ecc = nx.eccentricity(sg)
# we compute the radius once
r = nx.radius(sg)
center = [node for node in sg.nodes() if ecc[node] == r]
print(center)
In [22]:
run -t center2.py
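To see why caching helps so much, the sketch below counts breadth-first searches (one per eccentricity) for both strategies on a hypothetical 5-node path graph. The naive version, like center.py, recomputes the radius for every node, giving n(n+1) searches. The cached version needs only n. center2.py itself still calls nx.radius(sg) separately, which recomputes the eccentricities; NetworkX's radius also accepts a precomputed eccentricity dict through its `e` parameter, which would avoid that. The sketch assumes a connected graph stored as an adjacency dict, and all names are hypothetical:

```python
from collections import deque


def bfs_eccentricity(adj, source, counter):
    """Eccentricity of source via BFS; counts how many searches are run."""
    counter["bfs"] += 1
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values())


def center_naive(adj, counter):
    # like center.py: the radius (n searches) is recomputed for every node
    def radius():
        return min(bfs_eccentricity(adj, v, counter) for v in adj)
    return [v for v in adj if bfs_eccentricity(adj, v, counter) == radius()]


def center_cached(adj, counter):
    # one search per node; the radius is derived from the cached values
    ecc = {v: bfs_eccentricity(adj, v, counter) for v in adj}
    r = min(ecc.values())
    return [v for v in adj if ecc[v] == r]


path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
naive, cached = {"bfs": 0}, {"bfs": 0}
print(center_naive(path, naive), naive["bfs"])  # [3] 30
print(center_cached(path, cached), cached["bfs"])  # [3] 5
```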
NetworkX can draw graphs with the help of Matplotlib. We pass several options to make the figure nicer; the documentation of nx.draw_networkx lists all of them.
In [23]:
nx.draw_networkx(sg, node_size=15, edge_color='y', with_labels=False, alpha=.4, linewidths=0)