Dense Subgroups in Networks, Communities and Motif counting

At this time there is no rigorous and universally accepted definition of community in field of network analysis. However, one could formulate some properties that all communities are desired to exhibit, for example:

  • Low overlapping
  • Density
  • Low distance (diameter)
  • Connectivity

In [ ]:
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster import hierarchy
from scipy.spatial.distance import pdist, squareform
import itertools
from itertools import starmap
# plt.xkcd()
%matplotlib inline

Cliques

Taking into account the requirements above, one of the suitable extreme cases is a clique (complete subgraph).

However, if cliques with 3 nodes can be found in real networks with a very frequent rate, cliques of bigger size are rare. Besides, it does not worth to cross off from the list of potential communities a structure that lack of only one edge between vertices to be a clique. So... <br> Pros:

  • Dense
  • Low overlapping (as size increases)

Cons:

  • Very strict

k-clique, k-club, k-clan, k-plex

The are several relaxations of the clique concept.

Given a graph $G = (V,E)$ k-clique is a maximal subset $S \subseteq V$ s.t.

$$\forall\ u, v\ \in S,\ d_G(u,v) <= k\text{,}$$

where $d_G(\cdot, \cdot)$ denotes the length of the shortest path between nodes in graph $G$.

What about overlapping and cohesion here? Non-membership fail, overlapping


In [ ]:
G = nx.cycle_graph(5)
G.add_node(5)
G.add_edges_from([(4,5), (3, 5)])
nx.draw_spectral( G )

In [ ]:
# Complete function that exhastively searches for k-cliques in the graph G
# Use itertools.combinations to get combinations of nodes for subgraphs and node pairs
def FindKCliques(G, k) :
    D = nx.floyd_warshall_numpy( G )
    n = G.number_of_nodes( )
    kCliques = [ ]
    # Iterate over sizes
    for grSize in xrange( n, 1, -1 ) :
        # Iterate over subgraphs
        for subV in itertools.combinations( G, grSize ) :
            if any( [ U.issuperset( subV ) for U in kCliques ] ) : continue
            # Not included in maximal and all distances are <= k
            if np.all( D[ np.ix_( subV, subV ) ] <= k ) :
                kCliques.append( set( subV ) )
    return kCliques

In [ ]:
G = nx.barabasi_albert_graph( 20, 2 )

In [ ]:
# %lprun -f FindKCliques FindKCliques(G, 1)

Let's move to k-clans: k-clan is a n-clique $S$ s.t. diameter of induced subgraph $G[S] <= k$

What it gives us? More cohesion, non-membership fail fixed


As an analogy to k-cores and another relaxation of cliques k-plex comes on the scene.

k-plex is a subset of vertices $S$ if the minimum degree in the induced subgraph $d_\min(G[S]) >= |S| - k$. <br> In English: every node in k-plex is connected to every other node, except $k$ of them.

k-pleces are harder to compute, however they posses some good properties. If $G_S$ is k-plex then:

  • Every subgraph of $G_S$ is a k-plex
  • If $k < \frac{n+2}{2} \rightarrow \text{diameter}(G_S) \leq 2$
  • $\mathfrak{K}(G) > n - 2k +2$ <br> where $\mathfrak{K}(G)$ is the minimum number of vertices whose removal results in a disconnected or trivial graph

K-Cores

Generate some graph and draw its core decomposition


In [ ]:
# G = nx.barabasi_albert_graph( 40, 3 )
G = nx.davis_southern_women_graph( )
nx.draw_circular( G, node_size = 20 )

In [ ]:
n_core = nx.core_number( G )
n_core

In [ ]:
pos = nx.spring_layout( G )
nx.draw( G, pos = pos,
    node_list = n_core.keys( ),
    node_color = n_core.values( ),
    cmap = plt.get_cmap( "Blues" ) )

Motifs

Generate directed scale free graph nx.scale_free_graph() and set some motif template. Write a function that check the presence of the motif in the graph.


In [ ]:
params = np.array([1.,5.,1.])
params /= np.sum(params)
G = nx.scale_free_graph(20, alpha=params[0], beta=params[1], gamma=params[2])
plt.figure(figsize=(10,5))
plt.subplot(121)
nx.draw_spring(G)

motif = nx.DiGraph([(1,2), (1,3), (2,3)])
plt.subplot(122)
nx.draw(motif)

In [ ]:
## Continue your code here
result = list( )
for S in itertools.combinations( G, motif.number_of_nodes( ) ) :
    if nx.is_isomorphic( motif, G.subgraph( S ) ) :
        result.append( S )

In [ ]:
result

Hierarichal Clustering

Take some toy-graph, compute any kind of similarity matrix and perform hierarichal clustering.

Helpful functions: pdist(A,'cosine'), hierarchy.average(), hierarchy.dendrogram()


In [ ]:
## Continue your code here
G = nx.erdos_renyi_graph( 100, .65 )
G = nx.karate_club_graph( )

In [ ]:
A = nx.to_numpy_matrix( G )
A = np.minimum( A, A+A.T )
plt.imshow( A )

In [ ]:
D = squareform( pdist( A, metric =  "cosine" ) )

In [ ]:
plt.figure( figsize = (16, 9))
hier = hierarchy.dendrogram( hierarchy.average( D ) )

In [ ]: