In [1]:
#!pip3 install networkx
#!pip3 install matplotlib
#!pip3 install numpy
In [2]:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import re
This group exercise is designed to develop an understanding of basic network measures and to start participants thinking about interesting research questions that can be enabled by network science.
There is only one requirement: the group member with the least coding experience should be responsible for typing the code into a computer. After 40 minutes you should be prepared to give a 3-minute presentation of your work. Remember that these daily exercises are a way for you to get to know each other better and to explore research areas that may be new to you; they are not expected to be fully fleshed-out research projects.
Visit the ICON website (link). You can search the index using the checkboxes under the tabs "network domain," "subdomain," "graph properties," and "size." You can also type in keywords related to the network you would like to find. [Screenshot: the ICON index search page.]
To download a network, click the small yellow downward arrow and follow the link listed under "source". How you import the data into Python using networkx will depend on the file type of the network you download. (Check out the package's documentation for how to import networks from different file types.)
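As a rough illustration of how the reader function changes with the file type, the sketch below writes a small toy graph (made up for this example, not an ICON network) to three common formats in a temporary directory and reads each one back. The file names are placeholders.

```python
import os
import tempfile

import networkx as nx

# Toy graph used only for this illustration (not an ICON network).
toy = nx.Graph()
toy.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

with tempfile.TemporaryDirectory() as tmp:
    # Placeholder file names; a downloaded network will have its own.
    edgelist_path = os.path.join(tmp, "toy.edgelist")
    gml_path = os.path.join(tmp, "toy.gml")
    graphml_path = os.path.join(tmp, "toy.graphml")

    nx.write_edgelist(toy, edgelist_path, data=False)
    nx.write_gml(toy, gml_path)
    nx.write_graphml(toy, graphml_path)

    # Each format has its own reader function.
    loaded = {
        "edgelist": nx.read_edgelist(edgelist_path),
        "gml": nx.read_gml(gml_path),
        "graphml": nx.read_graphml(graphml_path),
    }

for fmt, graph in loaded.items():
    print(fmt, graph.number_of_nodes(), graph.number_of_edges())
```

Whatever the format, it is worth printing the node and edge counts after loading and comparing them with the numbers reported on the network's source page.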
Here's what it looks like to import the Zachary Karate Club from the edgelist provided:
In [3]:
with open('karate_edges_77.txt', 'rb') as file:
    karate_club = nx.read_edgelist(file)  # Read in the edges

groups = {}
with open('karate_groups.txt', 'r') as file:
    for line in file:
        node, group = re.split(r'\t+', line.strip())  # Each line is "<node><TAB><group>"
        groups[node] = int(group)

# Add attributes to the nodes (e.g. group membership)
nx.set_node_attributes(karate_club, name='group', values=groups)
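If the two text files are not at hand, networkx also ships its own copy of the karate club network. The sketch below loads it; note that this built-in copy labels nodes 0 to 33 and stores group membership in a node attribute called `club`, so it is not a drop-in replacement for the files used above.

```python
import networkx as nx

# Built-in copy of Zachary's karate club (nodes are the integers 0..33).
builtin_club = nx.karate_club_graph()

print(builtin_club.number_of_nodes(), builtin_club.number_of_edges())

# Group membership is stored under the 'club' node attribute.
club_labels = set(nx.get_node_attributes(builtin_club, "club").values())
print(club_labels)
```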
For very small networks, it can be helpful to visualize the nodes and edges. Below we have colored the nodes with respect to their group within the karate club.
In [4]:
position = nx.spring_layout(karate_club)
nx.draw_networkx_labels(karate_club, pos=position)

colors = []  # Color the nodes according to their group
for attr in nx.get_node_attributes(karate_club, 'group').values():
    if attr == 1:
        colors.append('blue')
    else:
        colors.append('green')

nx.draw(karate_club, position, node_color=colors)  # Visualize the graph
A natural question to ask about a network is which nodes are the most "important." There are many definitions of network importance, or centrality. Here let's consider one of the most straightforward measures, degree centrality: the number of edges that start or end at a given node.
In [5]:
print([(n, karate_club.degree(n)) for n in karate_club.nodes()])
NetworkX can also return a normalized degree centrality for every node in the network: each node's degree is divided by the maximum possible degree, n - 1, where n is the number of nodes.
In [6]:
degrees = nx.degree_centrality(karate_club)
print(degrees)
From both measures, we can see that nodes 1 and 34 have the highest degree. (These happen to be the two leaders from the two groups within the club.)
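To make the normalization concrete, the sketch below recomputes degree centrality by hand as degree / (n - 1) and compares it with `nx.degree_centrality`. It uses the copy of the karate club bundled with networkx, which is an assumption made so the snippet runs without the data files; that copy numbers nodes from 0, so the two leaders appear as nodes 0 and 33 rather than 1 and 34.

```python
import networkx as nx

club = nx.karate_club_graph()  # built-in copy; nodes are 0..33
n = club.number_of_nodes()

centrality = nx.degree_centrality(club)

# Same quantity computed by hand: degree divided by the largest possible degree.
by_hand = {node: deg / (n - 1) for node, deg in club.degree()}

# The two highest-degree nodes, as (node, degree) pairs.
top_two = sorted(club.degree(), key=lambda pair: pair[1], reverse=True)[:2]
print(top_two)
```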
On large networks, you might want to look at the degree distribution of your network ...
In [7]:
# Enron email data set: http://snap.stanford.edu/data/email-Enron.html.
# (You can search "Email network (Enron corpus)" in ICON.)
with open('email_enron.txt', 'rb') as file:
enron = nx.read_edgelist(file, comments='#') # Read in the edges
In [8]:
print("Enron network contains {0} nodes, and {1} edges.".format(len(enron.nodes()), len(enron.edges())))
In [9]:
degree_sequence = list(dict(enron.degree()).values())
print("Average degree: {0}, Maximum degree: {1}".format(np.mean(degree_sequence), max(degree_sequence)))
plt.hist(degree_sequence, bins=30) # Plots histogram of degree sequence
plt.show()
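When degrees span several orders of magnitude, a linear histogram tends to pile almost every node into the first bin. One common alternative is the complementary cumulative distribution (the fraction of nodes with degree at least k), viewed on log-log axes. The sketch below computes it with numpy on a synthetic Barabási-Albert graph, used here only as a stand-in so the snippet runs without the Enron file.

```python
import networkx as nx
import numpy as np

# Synthetic stand-in for a large network (an assumption; swap in your own graph).
stand_in = nx.barabasi_albert_graph(2000, 3, seed=42)
degree_values = np.array([d for _, d in stand_in.degree()])

# counts[k] = number of nodes with degree exactly k
counts = np.bincount(degree_values)
ks = np.nonzero(counts)[0]                # degrees that actually occur
fractions = counts[ks] / counts.sum()     # P(degree == k)

# Complementary cumulative distribution: P(degree >= k)
ccdf = np.array([(degree_values >= k).mean() for k in ks])

for k, p in list(zip(ks, ccdf))[:5]:
    print(int(k), round(float(p), 3))
```

Passing `ks` and `ccdf` to `plt.loglog` would draw the curve; a roughly straight line over a wide range of k is often read as a sign of a heavy-tailed degree distribution.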
Another feature you might like to know about your network is how assortative or modular it is. Put another way, how likely is it for similar nodes to be connected to each other? This similarity can be measured along any number of node attributes. Here we ask: how much more likely are nodes from the same group within the karate club to be connected to each other than we would expect at random?
In [10]:
assort = nx.attribute_assortativity_coefficient(karate_club, 'group')
print("Assortativity coefficient: {0}".format(assort))
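One way to build intuition for the coefficient is to count directly how many edges join two members of the same group. The sketch below does this on the copy of the karate club bundled with networkx (an assumption made so it runs without the data files; that copy stores membership under the `club` attribute instead of `group`).

```python
import networkx as nx

club = nx.karate_club_graph()  # built-in copy with a 'club' node attribute

# Count edges whose two endpoints belong to the same group.
same_group_edges = sum(
    1 for u, v in club.edges()
    if club.nodes[u]["club"] == club.nodes[v]["club"]
)
same_fraction = same_group_edges / club.number_of_edges()

coefficient = nx.attribute_assortativity_coefficient(club, "club")

print("Fraction of within-group edges: {0:.3f}".format(same_fraction))
print("Assortativity coefficient: {0:.3f}".format(coefficient))
```

The raw fraction does not account for group sizes or degrees; the assortativity coefficient corrects for what would be expected at random, which is why the two numbers differ.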
You can also add edge attributes, either all at once using set_edge_attributes (like we did above for set_node_attributes), or on an edge-by-edge basis as shown below. The shortest path between two nodes using that weight can then be calculated.
In [11]:
# Example borrowed from: https://www.cl.cam.ac.uk/teaching/1314/L109/tutorial.pdf
g = nx.Graph()
g.add_edge('a', 'b', weight=0.1)
g.add_edge('b', 'c', weight=1.5)
g.add_edge('a', 'c', weight=1.0)
g.add_edge('c', 'd', weight=2.2)
In [12]:
print(nx.shortest_path(g, 'b', 'd'))
print(nx.shortest_path(g, 'b', 'd', weight='weight'))
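For comparison, here is a sketch of the "all at once" route: the same four edges are created without weights, and the weights are then attached in one call to `nx.set_edge_attributes`. The graph name `h` and the `weights` dictionary are introduced only for this illustration.

```python
import networkx as nx

h = nx.Graph()
h.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

# Attach all weights in a single call, keyed by edge.
weights = {("a", "b"): 0.1, ("b", "c"): 1.5, ("a", "c"): 1.0, ("c", "d"): 2.2}
nx.set_edge_attributes(h, values=weights, name="weight")

fewest_hops = nx.shortest_path(h, "b", "d")                 # ignores weights
cheapest = nx.shortest_path(h, "b", "d", weight="weight")   # minimizes total weight
cost = nx.shortest_path_length(h, "b", "d", weight="weight")

print(fewest_hops)
print(cheapest, cost)
```

The unweighted path goes b, c, d (two hops, total weight 3.7), while the weighted path detours through a (three hops, total weight 3.3) because the a-b edge is so cheap.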
Lastly, you might want to write your own function on top of these network objects. For example, to measure the average degree of a node's neighbors:
In [13]:
# Example borrowed from: https://www.cl.cam.ac.uk/teaching/1314/L109/tutorial.pdf
def avg_neigh_degree(g):
    data = {}
    for n in g.nodes():
        if g.degree(n):  # Skip isolated nodes to avoid dividing by zero
            data[n] = float(sum(g.degree(i) for i in g[n])) / g.degree(n)
    return data
In [14]:
avg_neigh_degree(g) # Can you confirm that this is returning the correct results?
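One way to check the function is to compare it with networkx's own `nx.average_neighbor_degree`, which computes the same quantity for unweighted graphs. The sketch below repeats the function and the four-edge example graph so that it runs on its own; the name `toy` is introduced only for this illustration.

```python
import networkx as nx


def avg_neigh_degree(g):
    data = {}
    for n in g.nodes():
        if g.degree(n):  # Skip isolated nodes to avoid dividing by zero
            data[n] = float(sum(g.degree(i) for i in g[n])) / g.degree(n)
    return data


# Same four edges as the weighted example (weights do not matter here).
toy = nx.Graph()
toy.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

ours = avg_neigh_degree(toy)
builtin = nx.average_neighbor_degree(toy)

for node in toy.nodes():
    print(node, ours[node], builtin[node])
```

By hand: node d has a single neighbor, c, whose degree is 3, so its average neighbor degree should be 3.0; node a has neighbors b (degree 2) and c (degree 3), giving 2.5.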
Now, run a similar analysis on the network you have chosen from ICON (or a network that a group member has ready). Do not feel confined to the networkx functionality shown above; tap into your group's expertise and academic disciplines to identify other network measures you might be interested in.
In [ ]: