Structural Similarity

During this seminar we will:

  1. Consider some node similarity measures, particularly Euclidean Distance, Correlation Coefficient and Cosine Distance
  2. Take a look at Cuthill-McKee node reordering procedure
  3. Calculate Assortative mixing coefficient for some Game Of Thrones network

In [ ]:
import numpy as np
import pandas as pd
import scipy.spatial as spt
import matplotlib.pyplot as plt
# plt.xkcd()
import networkx as nx
%matplotlib inline

Task 1 - Similarities Calculation

  1. Calculate Euclidean Distance, Correlation Coefficient and Cosine Distance for some toy-network (Zachary again?) and for Les Miserables dataset
  2. Visualize them

HINT: For correlation coeficient you can use np.corrcoef(), for the distances you may implement them on your own or use scipy.spatial.distance.pdist()


In [ ]:
def pair_plot( G, order = None ) :
## Get the adjacnecy matrix 
    A = nx.to_numpy_matrix( G )
    if order is not None :
        A = A[ np.ix_( order, order ) ]
#     plt.figure( figsize = (12,12) )
    plt.subplot( 221 )
    plt.imshow( A, cmap='Greys' )
    plt.title( 'Adjacency matrix' )
## Correlation between 
    plt.subplot( 222 )
    plt.imshow( np.corrcoef( A ), cmap = "Greys" )
    plt.title( 'Correlation metric' )
## Euclidean mteric
    plt.subplot( 223 )
    D = spt.distance.squareform( spt.distance.pdist( A, metric = 'Euclidean' ) )
    plt.imshow( D, cmap="Greys" )
    plt.title( 'Euclidean distance' )
## Cosine of the angle between djacency columns
    plt.subplot( 224 )
    D = spt.distance.squareform( spt.distance.pdist( A, metric = 'Cosine' ) )
    plt.imshow( D, cmap="Greys" )
    plt.title( 'Cosine metric' )

In [ ]:
## Load Zachary's karate club and the coappearacnes of les Miserables characters.
G_0 = nx.karate_club_graph()
pair_plot( G_0 )

In [ ]:
G_1 = nx.read_gml( './data/lesmis/lesmis.gml' )
pair_plot( G_1 )

Task 2 - Node Reordering

Without special preprocess procidures graph adjacency matrix can look very noisy and hide network's structure (just look at the matrices above). Offcourse usually you don't know the structure itself (communities, groups of closelly connected nodes, etc.) unless it is given, however there are some procedures of node reordering that provides a better view of the network's adjacency matrix.

Recerse Cuthill-McKee finds permutation of the nodes that minimizes the bandwidth of the matrix, which is calculated as: $$ \theta = \max_{a_{ij} > 0}|i-j|$$ Unformally, this algorithm puts some mass on the diagonal of adjacency matrix.

Run this reordering with nx.utils.reverse_cuthill_mckee_ordering(G) and compare with the results above


In [ ]:
order = [ i for i in nx.utils.reverse_cuthill_mckee_ordering( G_0 ) ]
pair_plot( G_0, order )

In [ ]:
order = [ i for i in nx.utils.reverse_cuthill_mckee_ordering( G_1 ) ]
pair_plot( G_1, order )

Task 3 - Assortative Mixing

For this task you should download some data, convert it to network and calculate assortative mixing coefficient. Particularly, download characters and events datasets.

The first dataset provides information on characters of the Game Of Thrones universe. The second one -- describes some events that have occured with them during the story. We are interested about kill events since they can be considered as binary relations and consequently -- graphs. The attribute wrt which we are going to compute assortative mixing is called "Team".

We will explore datasets with pandas module. The list of usefull functions:

  • read_csv()
  • characters.head()
  • dropna
  • set_index('characterID')['Team'].to_dict()
  • events[events['event'] == 'killed']

In [ ]:
# Put your code here
#
#
events = pd.read_csv( './data/events.csv' )

characters = pd.read_csv( './data/characters.csv' )
characters.set_index('characterID')

## Select the events with specific attribute: one kills another
kills = pd.DataFrame( events[ events['event'] == 'killed' ], columns = ['characterID', 'withID'])
kills = kills.dropna()

In [ ]:
## Create a graph of killer-victim relations
G = nx.DiGraph( )
G.add_edges_from( ( u, v ) for u,v in kills.itertuples( index = False ) )

In [ ]:
## Deduce the colouring of the graph
allegiance = characters['Team'].to_dict()
for k in allegiance.keys():
    if k not in G :
        del allegiance[ k ]

In [ ]:
G.nodes()
# .set_attribute()

In [ ]:
nx.draw( G, node_size = 2 )

In [ ]: