During this seminar we will:
In [ ]:
import numpy as np
import pandas as pd
import scipy.spatial as spt
import matplotlib.pyplot as plt
# plt.xkcd()
import networkx as nx
%matplotlib inline
HINT: For correlation coeficient you can use np.corrcoef(), for the distances you may implement them on your own or use scipy.spatial.distance.pdist()
In [ ]:
def pair_plot( G, order = None ) :
## Get the adjacnecy matrix
A = nx.to_numpy_matrix( G )
if order is not None :
A = A[ np.ix_( order, order ) ]
# plt.figure( figsize = (12,12) )
plt.subplot( 221 )
plt.imshow( A, cmap='Greys' )
plt.title( 'Adjacency matrix' )
## Correlation between
plt.subplot( 222 )
plt.imshow( np.corrcoef( A ), cmap = "Greys" )
plt.title( 'Correlation metric' )
## Euclidean mteric
plt.subplot( 223 )
D = spt.distance.squareform( spt.distance.pdist( A, metric = 'Euclidean' ) )
plt.imshow( D, cmap="Greys" )
plt.title( 'Euclidean distance' )
## Cosine of the angle between djacency columns
plt.subplot( 224 )
D = spt.distance.squareform( spt.distance.pdist( A, metric = 'Cosine' ) )
plt.imshow( D, cmap="Greys" )
plt.title( 'Cosine metric' )
In [ ]:
## Load Zachary's karate club and the coappearacnes of les Miserables characters.
G_0 = nx.karate_club_graph()
pair_plot( G_0 )
In [ ]:
G_1 = nx.read_gml( './data/lesmis/lesmis.gml' )
pair_plot( G_1 )
Without special preprocess procidures graph adjacency matrix can look very noisy and hide network's structure (just look at the matrices above). Offcourse usually you don't know the structure itself (communities, groups of closelly connected nodes, etc.) unless it is given, however there are some procedures of node reordering that provides a better view of the network's adjacency matrix.
Recerse Cuthill-McKee finds permutation of the nodes that minimizes the bandwidth of the matrix, which is calculated as: $$ \theta = \max_{a_{ij} > 0}|i-j|$$ Unformally, this algorithm puts some mass on the diagonal of adjacency matrix.
Run this reordering with nx.utils.reverse_cuthill_mckee_ordering(G) and compare with the results above
In [ ]:
order = [ i for i in nx.utils.reverse_cuthill_mckee_ordering( G_0 ) ]
pair_plot( G_0, order )
In [ ]:
order = [ i for i in nx.utils.reverse_cuthill_mckee_ordering( G_1 ) ]
pair_plot( G_1, order )
For this task you should download some data, convert it to network and calculate assortative mixing coefficient. Particularly, download characters and events datasets.
The first dataset provides information on characters of the Game Of Thrones universe. The second one -- describes some events that have occured with them during the story. We are interested about kill events since they can be considered as binary relations and consequently -- graphs. The attribute wrt which we are going to compute assortative mixing is called "Team".
We will explore datasets with pandas module. The list of usefull functions:
In [ ]:
# Put your code here
#
#
events = pd.read_csv( './data/events.csv' )
characters = pd.read_csv( './data/characters.csv' )
characters.set_index('characterID')
## Select the events with specific attribute: one kills another
kills = pd.DataFrame( events[ events['event'] == 'killed' ], columns = ['characterID', 'withID'])
kills = kills.dropna()
In [ ]:
## Create a graph of killer-victim relations
G = nx.DiGraph( )
G.add_edges_from( ( u, v ) for u,v in kills.itertuples( index = False ) )
In [ ]:
## Deduce the colouring of the graph
allegiance = characters['Team'].to_dict()
for k in allegiance.keys():
if k not in G :
del allegiance[ k ]
In [ ]:
G.nodes()
# .set_attribute()
In [ ]:
nx.draw( G, node_size = 2 )
In [ ]: