Graph Analysis - II

Imports


In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import MinMaxScaler
import networkx as nx
%matplotlib inline

Centrality measures for the nodes


In [ ]:
Gk=nx.karate_club_graph()

We start by computing four centrality measures for every node of the graph: degree, PageRank, eigenvector, and betweenness centrality.


In [ ]:
degree_c = nx.degree_centrality(Gk)
pagerank_c = nx.pagerank(Gk)
eigenvector_c = nx.eigenvector_centrality(Gk)
betweenness_c = nx.betweenness_centrality(Gk)
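
To make the simplest of these measures concrete, here is a minimal sketch that recomputes degree centrality by hand and compares it with the NetworkX result. It relies on the NetworkX convention that degree centrality is the node degree divided by n - 1; the names `manual_degree_c` and `max_diff` are introduced here for illustration only.

```python
import networkx as nx

Gk = nx.karate_club_graph()
n = Gk.number_of_nodes()

# Degree centrality as NetworkX defines it: (number of neighbours) / (n - 1)
manual_degree_c = {node: d / (n - 1) for node, d in Gk.degree()}

# Compare against the library result, node by node
degree_c = nx.degree_centrality(Gk)
max_diff = max(abs(manual_degree_c[v] - degree_c[v]) for v in Gk)
print(max_diff)
```

A difference at the level of floating-point rounding indicates that the two computations agree.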

In [ ]:
# Collect each centrality into a NumPy array, in node-iteration order
nodes = list(Gk.nodes())
deg = np.array([degree_c[v] for v in nodes])
pr = np.array([pagerank_c[v] for v in nodes])
eig = np.array([eigenvector_c[v] for v in nodes])
bw = np.array([betweenness_c[v] for v in nodes])

# One row per node, one column per centrality measure
measures = pd.DataFrame(
    {'eigenvector_c': eig, 'pagerank_c': pr,
     'degree_c': deg, 'betweenness_c': bw},
    index=pd.Index(nodes, name='nodes'),
)

We can plot the pairwise correlations between the different centrality measures. Notice the strong positive correlation between degree centrality and PageRank.


In [ ]:
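# sns.corrplot was removed from seaborn; a heatmap of the correlation matrix replaces it
sns.heatmap(measures.corr(), annot=True, vmin=-1, vmax=1, cmap='coolwarm')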
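
The correlations shown in the plot can also be read off numerically. The following is a minimal, self-contained sketch that rebuilds the table of centralities directly from the dictionaries and prints the Pearson correlation matrix; the variable names `corr` and `deg_pr` are introduced here for illustration, and the exact values may vary slightly across NetworkX versions.

```python
import networkx as nx
import pandas as pd

Gk = nx.karate_club_graph()

# A dict of {column: {node: value}} becomes a DataFrame indexed by node
measures = pd.DataFrame({
    'degree_c': nx.degree_centrality(Gk),
    'pagerank_c': nx.pagerank(Gk),
    'eigenvector_c': nx.eigenvector_centrality(Gk),
    'betweenness_c': nx.betweenness_centrality(Gk),
})

# Pearson correlation between every pair of centrality measures
corr = measures.corr()
deg_pr = corr.loc['degree_c', 'pagerank_c']
print(corr.round(2))
```

`measures.corr()` defaults to Pearson correlation; passing `method='spearman'` would compare node rankings rather than raw values.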

We can also draw a scatterplot for every pair of centrality measures and look for strong trends.


In [ ]:
with sns.axes_style('white'):
    sns.pairplot(measures)

In [ ]:
plt.scatter(deg, pr)
plt.xlabel('degree centrality')
plt.ylabel('PageRank')
plt.show()

In [ ]:
plt.scatter(deg, bw)
plt.xlabel('degree centrality')
plt.ylabel('betweenness centrality')
plt.show()

When plotting the graph, we can choose to represent the centrality of each node as its size.


In [ ]:
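# Draw the graph with node sizes proportional to eigenvector centrality
scaler = MinMaxScaler(feature_range=(50, 800))
# MinMaxScaler expects a 2-D array: reshape to one column, then flatten back
node_size = scaler.fit_transform(eig.reshape(-1, 1)).ravel()
nx.draw(Gk, node_size=node_size, node_color='#6699cc')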
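
The rescaling step is a plain linear map of the centrality values onto a range of marker sizes. As a rough sketch of what that map does, the hypothetical helper `scale_to_range` below reproduces it with NumPy alone; it is an illustration, not part of scikit-learn.

```python
import numpy as np

def scale_to_range(values, low=50.0, high=800.0):
    """Linearly map values so that the minimum goes to low and the maximum to high."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values are equal: fall back to the midpoint of the range
        return np.full_like(values, (low + high) / 2)
    return low + (values - values.min()) * (high - low) / span

sizes = scale_to_range([0.1, 0.2, 0.4])
print(sizes)
```

The lower bound of 50 keeps the least central nodes visible instead of shrinking them to a point.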

Let's see how the above applies to directed graphs.
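
One point worth keeping in mind for directed graphs: `nx.degree_centrality` counts incoming and outgoing edges together, while NetworkX also provides `nx.in_degree_centrality` and `nx.out_degree_centrality` to separate the two. A minimal sketch on a hypothetical three-node digraph (not the dataset used in this notebook):

```python
import networkx as nx

# Hypothetical toy digraph: a -> b, a -> c, b -> c
D = nx.DiGraph([('a', 'b'), ('a', 'c'), ('b', 'c')])

in_c = nx.in_degree_centrality(D)    # incoming edges / (n - 1)
out_c = nx.out_degree_centrality(D)  # outgoing edges / (n - 1)
total_c = nx.degree_centrality(D)    # (incoming + outgoing) / (n - 1)

for node in D:
    print(node, in_c[node], out_c[node], total_c[node])
```

Here every node has the same total degree centrality, even though `a` only sends edges and `c` only receives them.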


In [ ]:
G = nx.read_gml('celegansneural.gml')

In [ ]:
print(G.number_of_nodes(), G.number_of_edges())

In [ ]:
print(nx.is_strongly_connected(G))

If the graph is not strongly connected, we can keep its largest strongly connected component.


In [ ]:
# strongly_connected_component_subgraphs was removed from NetworkX;
# take the largest node set and build the induced subgraph instead
largest_scc = max(nx.strongly_connected_components(G), key=len)
Gmax = G.subgraph(largest_scc).copy()
print(Gmax.number_of_nodes())
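
To see what extracting the largest strongly connected component does, here is a minimal sketch on a hypothetical toy digraph: a directed 3-cycle with one extra node hanging off it. The tail node can be reached from the cycle but cannot reach it back, so it forms a component of its own.

```python
import networkx as nx

# Hypothetical toy digraph: a 3-cycle (1 -> 2 -> 3 -> 1) plus a tail 3 -> 4
D = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])

# Each component is a set of nodes that can all reach one another
components = list(nx.strongly_connected_components(D))
largest = max(components, key=len)

# Induced subgraph on the largest component, copied so it can be modified
D_max = D.subgraph(largest).copy()
print(sorted(D_max.nodes()), nx.is_strongly_connected(D_max))
```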

In [ ]:
Gmax = nx.DiGraph(Gmax)
degree_c = nx.degree_centrality(Gmax)
pagerank_c = nx.pagerank(Gmax)
eigenvector_c = nx.eigenvector_centrality(Gmax)
betweenness_c = nx.betweenness_centrality(Gmax)

In [ ]:
# Collect each centrality into a NumPy array, in node-iteration order
nodes = list(Gmax.nodes())
deg = np.array([degree_c[v] for v in nodes])
pr = np.array([pagerank_c[v] for v in nodes])
eig = np.array([eigenvector_c[v] for v in nodes])
bw = np.array([betweenness_c[v] for v in nodes])

# One row per node, one column per centrality measure
measures = pd.DataFrame(
    {'eigenvector_c': eig, 'pagerank_c': pr,
     'degree_c': deg, 'betweenness_c': bw},
    index=pd.Index(nodes, name='nodes'),
)

In [ ]:
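# sns.corrplot was removed from seaborn; a heatmap of the correlation matrix replaces it
sns.heatmap(measures.corr(), annot=True, vmin=-1, vmax=1, cmap='coolwarm')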

In [ ]:
with sns.axes_style('white'):
    sns.pairplot(measures)

In [20]:
# Code for setting the style of the notebook
from IPython.display import HTML
def css_styling():
    # Read the stylesheet and close the file handle when done
    with open("../theme/custom.css", "r") as f:
        styles = f.read()
    return HTML(styles)
css_styling()

