In [ ]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
try:
import networkx as nx
except:
# Install NetworkX
!pip install networkx
A network is made up of two main components: nodes and edges.
The nodes are the individuals, words, or entities that compose a network, and the edges are the links that connect one node to another.
Here is an example of a node and an edge.
Now we can try drawing our own network using the NetworkX package. Let's start off by initializing a Network, and call it G.
In [ ]:
# Initialize Graph object
G = nx.Graph()
Now we can add a node using the .add_node('nodename') method
In [ ]:
G = nx.Graph()
G.add_node('Jin')
G.add_node('Luke')
G.add_node('Eshin')
plt.figure(figsize=(5,3))
nx.draw(G,with_labels=True,node_size=5000,font_size=20,alpha=.5)
Notice there are no connections between the two nodes because we haven't added any edges yet.
To add edges between nodes, for example node1 and node 2, we can use the .add_edge('node1','node2') method on our graph.
In [ ]:
G = nx.Graph()
G.add_edge('Jin','Luke',weight=1)
G.add_edge('Jin','Eshin',weight=1)
G.add_edge('Luke','Eshin',weight=1)
plt.figure(figsize=(5,3))
pos = nx.spring_layout(G) # One way of specifiying a layout for nodes.
nx.draw(G,pos,with_labels=True,node_size=5000,font_size=20,alpha=.5)
Now you can see that we have an edge connecting Jin and Luke.
There are several different types of graphs.
For simplicity, today we will only cover unweighted and undirected graphs.
There are multiple ways to indicate nodes and edges in a graph.
As illustrated above, we can manually add each node and edge. This approach is fine for small graphs, but won't scale well. It is also possible to add nodes is to use a python dictionary.
The key is the node and the values indicate what that key(node) connects to.
In [ ]:
d = {'Jin':['Luke','Eshin','Antonia','Andy','Sophie','Rob','Charlie','Vanessa'],
'Luke':['Eshin','Antonia','Seth','Andy','Sophie','Rob','Charlie','Vanessa'],
'Antonia':['Luke','Jin','Eshin','Seth','Andy'],
'Eshin':['Heidi','Antonia','Jin','Sam','Andy'],
'Sophie':['Rob','Vanessa'],
'Rob':['Sophie','Vanessa']}
G = nx.Graph(d)
plt.figure(figsize=(15,8))
np.random.seed(2) # Just to keep things same
pos = pos = nx.fruchterman_reingold_layout(G) # Another way of specifiying a layout for nodes.
nx.draw(G,with_labels=True,node_size=1500,font_size=20,alpha=.3,width=2)
Centrality measures track the importance of a node derived from its position in the network.
There are four main groups of centrality depending on the type of statistics used:
<img src="Figures/centrality.png",width=300>
Reference:
Social and Economic Networks by Matthew O. Jackson. Princeton University Press.
Picture: Wikipedia article
Other examples: http://www.orgnet.com/sna.html
The degree centrality measures the number of edges of a given node.
In other words, it measures how connected a node is.
In [ ]:
def print_network(val_map,title=None):
print('\x1b[1;31m'+title+'\x1b[0m')
for k, v in val_map.iteritems():
print k.ljust(15), str(round(v,3)).ljust(30)
def plot_network(val_map, title=None):
values = [val_map.get(node, 0.25) for node in G.nodes()]
plt.figure(figsize=(12,5))
np.random.seed(2)
# pos = nx.spring_layout(G)
pos = nx.fruchterman_reingold_layout(G,dim=2)
ec = nx.draw_networkx_edges(G,pos,alpha=.2)
nc = nx.draw_networkx_nodes(G,pos,node_size=1000,with_labels=True,alpha=.5,cmap=plt.get_cmap('jet'),node_color=values)
nx.draw_networkx_labels(G,pos,font_size=18)
# nx.draw(G,pos,node_size=400,with_labels=True,alpha=.5,cmap=plt.get_cmap('jet'),node_color=values)
plt.colorbar(nc)
plt.axis('off')
plt.suptitle(title,fontsize=18)
plt.show()
# Get degree centrality values
d = nx.degree_centrality(G)
title = 'Degree Centrality Map'
# print centrality values
print_network(d,title)
# plot graph with values
plot_network(d,title)
Luke has the highest number of connections and therefore has the highest degree centrality. Jin is next, and then Eshin.
Think of Between Centrality as a bottleneck or a broker of two separate networks.
The measure is the fraction of the total number of shortest paths a node lie on between two other nodes.
Here Eshin has the highest betweeness centrality because he connects the most different people.
In [ ]:
d = nx.betweenness_centrality(G)
title = "Betweenness Centrality"
print_network(d,title)
plot_network(d,title)
In [ ]:
d = nx.closeness_centrality(G)
title = "Closeness Centrality"
print_network(d,title)
plot_network(d,title)
The underlying assumption of the Eigenvector centrality is that a node's importance is determined by how important its neighbors are.
For instance, having more influential friends (betweeness centrality) can be more important than just having more number of friends (degree centrality).
Now we can observe that Luke is back to being more important than Eshin (who had higher betweeness centrality), because Luke is friends with Eshin who also has high centrality.
In [ ]:
d = nx.eigenvector_centrality(G)
title = "Eigenvector Centrality"
print_network(d,title)
plot_network(d,title)
Macromeasures are useful to understand the network as a whole or to compare networks.
We will cover three fundamental measures of the network structure.
One fundamental characteristic is the distribution of degrees, number of edges for each node, which we can easily plot.
From this distribution we can discern attributes such as whether everyone is connected to everybody are there many lonely nodes and such.
In [ ]:
degree_sequence=sorted([d for n,d in G.degree().iteritems()], reverse=True) # degree sequence
G2=nx.complete_graph(5)
degree_sequence2=sorted([d for n,d in G2.degree().iteritems()], reverse=True) # degree sequence
fig, [(ax1,ax2),(ax3,ax4)] = plt.subplots(2,2,figsize=(15,10))
ax1.hist(degree_sequence,bins=range(0,7),rwidth=.8)
ax1.set_title("Degree Histogram for Graph 1",fontsize=16)
ax1.set_ylabel("Count",fontsize=14)
ax1.set_xlabel("Degree",fontsize=14)
ax1.set_xticks([d+0.4 for d in degree_sequence])
ax1.set_xticklabels([d for d in degree_sequence])
ax1.set_ylim((0,6))
nx.draw(G,pos=nx.circular_layout(G),ax=ax2)
ax2.set_title('Network 1: Graph of our class',fontsize=16)
ax3.hist(degree_sequence2,bins=range(0,7),rwidth=.8)
ax3.set_title("Degree Histogram for Complete Graph",fontsize=16)
ax3.set_ylabel("Count",fontsize=14)
ax3.set_xlabel("Degree",fontsize=14)
ax3.set_xticks([d+0.4 for d in degree_sequence2])
ax3.set_xticklabels([d for d in degree_sequence2])
ax3.set_ylim((0,6))
nx.draw(G2,pos=nx.circular_layout(G2),ax=ax4)
ax4.set_title('Network 2: Graph of a Complete Graph',fontsize=16)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
plt.show()
In [ ]:
print 'Network 1 average shortest path: ' + str(round(nx.average_shortest_path_length(G),2))
print 'Network 2 average shortest path: ' + str(nx.average_shortest_path_length(G2))
Another way to look at how tight-knit a network is to look at how clustered it is.
To estimate clustering, we start from cliques, which can be considered as a subnetwork composed of three nodes forming a triangle.
We can first calculate for each node, how many cliques(triangles) it is a member of.
Then we calculate transitivity, which is simply the number of triangles in the network divided by the number of all possible traingles.
In [ ]:
d = nx.triangles(G)
print_network(d,'Number of cliques(triangles)')
print
print 'Transitivity : ' + str(nx.transitivity(G))
plot_network(d,'Transitivity')
A similar approach is to calculate clustering coefficient for each node and for the graph. Clustering is the measure of how likely it is if A has edges to B and C, that B and C related?
For instance, Charlie form a triangle with Luke and Jin, but Sam and Heidi don't form a single triangle.
We can calculate this for each node and then get an average value overall to get the average clustering coefficient.
In [ ]:
d = nx.clustering(G)
print_network(d,'Clustering Coefficients per node')
print
print 'Average clustering coefficient : ' + str(nx.average_clustering(G))
In [ ]:
Your assignment this week is to construct your own social network. It could be based on a sports team, your close friends, or family. It could also reflect more abstract aspects of relationships such as social support, gossip, problem solver, etc.
Calculate two different metrics of the network to provide insight into how different members play different roles.
In [ ]: