Complex Network Profiles: Exploratory Network Analysis with NetworKit

Network analysis is largely descriptive statistics of network data. Network analysis software packages present themselves to the user as a collection of graph algorithms. However, such a collection is a rather low-level tool if the target users are scientists and data analysts from various fields. Making the most of NetworKit as a library requires writing some amount of custom code and some expertise in selecting algorithms and their parameters.

This is one reason why we also provide an interface that makes exploratory analysis of large networks easy and fast even for non-expert users, and provides an extensive overview. The underlying module assembles many algorithms into one program, automates analysis tasks and produces a graphical report that can be displayed in the Jupyter Notebook or exported to an HTML or LaTeX report document. Such a network profile gives a statistical overview of the properties of the network.

The profile consists of the following parts. First, global properties such as size and density are reported. The report then focuses on a variety of node centrality measures, showing an overview of their distributions in the network. Detailed views for the centrality measures follow: their distributions are plotted in histograms and characterized with standard statistics, and network-specific measures such as centralization and assortativity are shown.

We propose that correlations between centralities are per se interesting empirical features of a network. For instance, betweenness may or may not be positively correlated with node degree. The prevalence of low-degree, high-betweenness nodes may influence the resilience of a transport network, as only a few links then need to be severed in order to significantly disrupt transport processes following shortest paths. For the purpose of studying such aspects, the report displays a matrix of Spearman's correlation coefficients, showing how node ranks derived from the centrality measures correlate with each other. Furthermore, scatter plots for each pair of centrality measures are shown, suggesting the type of correlation.

The report continues with different ways of partitioning the network, showing histograms and pie charts for the size distributions of connected components, modularity-based communities and k-shells, respectively.

Absent on purpose is a node-edge diagram of the graph, since graph drawing (apart from being computationally expensive) is usually not the preferred method to explore large complex networks. Rather, we consider networks first of all to be statistical data sets whose properties should be determined via graph algorithms and the results summarized via statistical graphics. The default configuration of the module is such that even networks with hundreds of millions of edges can be characterized in minutes on a parallel workstation. Furthermore, it can be configured by the user depending on the desired choice of analytics and level of detail, so that custom reports can be generated.
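
To make the rank-correlation matrix mentioned above concrete, the following is a minimal sketch of Spearman's coefficient for two score vectors: both are converted to ranks (ties receive the mean of their positions) and the Pearson correlation of the ranks is taken. The helper names and the toy scores are hypothetical and are not part of the profiling module.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical centrality scores of five nodes.
degree = [1, 2, 2, 3, 5]
betweenness = [0.0, 0.1, 0.4, 0.3, 0.9]
rho = spearman(degree, betweenness)
print(f"Spearman correlation: {rho:.3f}")
```

A value close to 1 means the two measures rank the nodes almost identically, while a value near 0 means the rankings are unrelated.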

Generating a Network Profile

This notebook shows how the profiling module can currently be used.

First, change to the repository root, enable inline matplotlib plots and import networkit.



In [1]:

    
cd ../../









    



/Users/Henning/Documents/workspace/NetworKit



In [2]:

    
%matplotlib inline



In [3]:

    
from networkit import *

Read the graph of which you want the profile to be generated.



In [4]:

    
G = readGraph("input/MIT8.edgelist", Format.EdgeListTabZero)
G.isDirected()









    Out[4]:





False

With setVerbose you can control how much progress information is printed, e.g. which kernel is currently running and how long it takes. Check the docstring for the parameters.



In [5]:

    
profiling.Profile.setVerbose(True, level=0)
profiling.Profile.getVerbose()









    Out[5]:





(True, 0, '')

You can control the set of analytics measures appearing in the profile with the preset argument. Currently, the presets minimal, default and complete are available.



In [6]:

    
pf = profiling.Profile.create(G, preset="default")









    



Diameter: 0.03 s

Connected Components: 0.01 s

Centrality.Degree: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.01 s
    Centralization: 0.00 s
Centrality.CoreDecomposition: 0.01 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.ClusteringCoefficient: 0.14 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.PageRank: 0.15 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.Katz: 0.02 s
    Sort: 0.00 s
    Rank: 0.01 s
    Assortativity: 0.00 s
    Centralization: Centrality.centralization not properly defined for Centrality.Katz. 0.01 s
Centrality.Betweenness: 0.21 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: Centrality.centralization not properly defined for Centrality.Betweenness. 0.00 s
Partition.Communities: 0.23 s
    Sort: 0.00 s
    Rank: 0.00 s
Partition.ConnectedComponents: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
Partition.CoreDecomposition: 0.02 s
    Sort: 0.00 s
    Rank: 0.00 s

........................

total time (measures + stats + correlations): 2.45 s
total speed: 102557.7 edges/s
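
As a quick consistency check of the summary lines above, the reported speed can be related to the total time. The sketch assumes that 251252, a value shown in the global properties of the profile, is the edge count of the graph; dividing it by the reported speed should then reproduce the total time up to rounding.

```python
edges = 251252      # assumed edge count, taken from the global properties of the profile
speed = 102557.7    # reported total speed in edges per second

# Time implied by the reported speed.
elapsed = edges / speed
print(f"implied total time: {elapsed:.2f} s")
```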

The statistical data and the results of the kernels can be visualized in the IPython notebook environment with the show method. Currently, light is the only available style, but you can specify any color you want.

Without a Config object, only the Global properties output will be produced. Depending on the number of kernels and correlation measures, it may take a while until the output is produced.



In [7]:

    
pf.show()









    



....................................





    





	
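Global properties: name MIT8, nodes 6440, edges 251252, density 0.0121181, directed False, weighted False, self-loops 0, diameter range (8, 8), connected components 18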
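
The values 6440, 251252 and 0.0121181 in the block above are consistent with the node count, edge count and density of an undirected graph without self-loops. A minimal sketch of that relationship follows; the interpretation of the three numbers is an assumption, since the export does not carry their labels.

```python
n = 6440      # assumed to be the number of nodes
m = 251252    # assumed to be the number of edges

# Density of an undirected simple graph: realized edges divided by
# the number of possible node pairs, n * (n - 1) / 2.
density = 2 * m / (n * (n - 1))
print(f"density: {density:.7f}")
```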
	
	
		
			

		
		
			

		
	
	
		
	
DegreeCentrality: Node centrality index which ranks nodes by their degree.
    Properties: $n = 6440$, $f_{BC} = 1.00016$, $r = 0.12006$, $C = 0.099037$
    Location: $x_{(1)} = 1$, $x_{(n)} = 708$, $\widetilde{x}_{O_{min}} = 1$, $\widetilde{x}_{N_{min}} = 1$, $\widetilde{x}_{0.25} = 19$, $\widetilde{x}_{Med} = 56$, $\widetilde{x}_{0.75} = 112$, $\widetilde{x}_{N_{max}} = 251$, $\widetilde{x}_{O_{max}} = 391$, $\overline{x} = 78.0286$, $\overline{x}_{IQ} = 58.9543$, $\overline{x}_{R} = 354.5$, $\overline{x}_{H^{-1}} = 10.3075$, $\overline{x}_{H^{2}} = 111.034$, $\overline{x}_{H^{3}} = 142.569$
    Dispersion: $s^{2}_{x} = 6241.03$, $s_{x} = 79.0002$, $v_{x} = 1.01245$, $R = 707$, $R_{IQ} = 93$
    Shape: $S_{YP} = 0.836526$, $S_{M} = 1.95132$, $\gamma = 6.25905$
    Rank: $s^{2}_{rk_{x}} = 3.4561e+06$, $s_{rk_{x}} = 1859.06$, $v_{rk_{x}} = 0.577258$
    Binning: $k_{PDF} = 20$, $k_{CDF} = 166$, $x_{D_{PDF}} = 18.675$
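
Several of the statistics in the degree block above can be derived from one another. The sketch below recomputes them from the reported location values; reading $f_{BC}$ as Bessel's correction factor, $\overline{x}_{R}$ as the mid-range, $v_{x}$ as the coefficient of variation and $S_{YP}$ as the Yule-Pearson skewness is an assumption that the reported numbers happen to support, not something the report states.

```python
# Values copied from the degree-centrality block of the report.
n = 6440
x_min, x_max = 1, 708
q1, median, q3 = 19, 56, 112
mean = 78.0286
std = 79.0002

bessel = n / (n - 1)                   # compare with f_BC = 1.00016
value_range = x_max - x_min            # compare with R = 707
mid_range = (x_min + x_max) / 2        # compare with mean-type value 354.5
iqr = q3 - q1                          # compare with R_IQ = 93
variance = std ** 2                    # compare with s^2_x = 6241.03
coeff_var = std / mean                 # compare with v_x = 1.01245
skew_yp = 3 * (mean - median) / std    # compare with S_YP = 0.836526

for label, value in [("f_BC", bessel), ("R", value_range), ("x_R", mid_range),
                     ("R_IQ", iqr), ("s^2", variance), ("v", coeff_var),
                     ("S_YP", skew_yp)]:
    print(f"{label:>5} = {value:.6g}")
```

Because the inputs are rounded report values, the recomputed numbers agree with the report only up to rounding.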
					
				
			
		
	
	


	
CoreDecomposition: k-cores result from successively peeling away nodes of degree k. It also categorizes nodes according to the highest-order core in which they are contained, assigning a core number to each node.
    Properties: $n = 6440$, $f_{BC} = 1.00016$, $r = 0.356373$, $C = 0.00489226$
    Location: $x_{(1)} = 1$, $x_{(n)} = 72$, $\widetilde{x}_{O_{min}} = 1$, $\widetilde{x}_{N_{min}} = 1$, $\widetilde{x}_{0.25} = 18$, $\widetilde{x}_{Med} = 44$, $\widetilde{x}_{0.75} = 65.5$, $\widetilde{x}_{N_{max}} = 72$, $\widetilde{x}_{O_{max}} = 72$, $\overline{x} = 40.6978$, $\overline{x}_{IQ} = 42.841$, $\overline{x}_{R} = 36.5$, $\overline{x}_{H^{-1}} = 9.57722$, $\overline{x}_{H^{2}} = 47.6448$, $\overline{x}_{H^{3}} = 51.7998$
    Dispersion: $s^{2}_{x} = 613.809$, $s_{x} = 24.7752$, $v_{x} = 0.608759$, $R = 71$, $R_{IQ} = 47.5$
    Shape: $S_{YP} = -0.399857$, $S_{M} = -0.220208$, $\gamma = -1.37969$
    Rank: $s^{2}_{rk_{x}} = 3.45024e+06$, $s_{rk_{x}} = 1857.48$, $v_{rk_{x}} = 0.576768$
    Binning: $k_{PDF} = 20$, $k_{CDF} = 72$, $x_{D_{PDF}} = 70.225$
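
The peeling procedure described for the core decomposition above can be sketched in a few lines. This is a simple quadratic-time illustration on a hypothetical toy graph, not the algorithm NetworKit uses: the node of minimum remaining degree is removed repeatedly, and the largest minimum degree seen so far becomes its core number.

```python
def core_numbers(adj):
    """Return the core number of every node by repeated minimum-degree peeling."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel the node with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(core_numbers(toy))
```

The triangle nodes end up in the 2-core, while the pendant node only belongs to the 1-core.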
					
				
			
		
	
	


	
LocalClusteringCoefficient: The local clustering coefficient of a node is the fraction of edges that exist between neighbors of that node (or, equivalently, closed triangles centered on that node).
    Properties: $n = 6440$, $f_{BC} = 1.00016$, $r = 0.160824$, $C = 1$
    Location: $x_{(1)} = 0$, $x_{(n)} = 1$, $\widetilde{x}_{O_{min}} = 0$, $\widetilde{x}_{N_{min}} = 0$, $\widetilde{x}_{0.25} = 0.158209$, $\widetilde{x}_{Med} = 0.224324$, $\widetilde{x}_{0.75} = 0.333333$, $\widetilde{x}_{N_{max}} = 0.593846$, $\widetilde{x}_{O_{max}} = 0.857143$, $\overline{x} = 0.271219$, $\overline{x}_{IQ} = 0.230469$, $\overline{x}_{R} = 0.5$, $\overline{x}_{H^{-1}} = nan$, $\overline{x}_{H^{2}} = 0.332504$, $\overline{x}_{H^{3}} = 0.395906$
    Dispersion: $s^{2}_{x} = 0.037005$, $s_{x} = 0.192367$, $v_{x} = 0.709268$, $R = 1$, $R_{IQ} = 0.175124$
    Shape: $S_{YP} = 0.731328$, $S_{M} = 1.68566$, $\gamma = 3.57816$
    Rank: $s^{2}_{rk_{x}} = 3.45571e+06$, $s_{rk_{x}} = 1858.95$, $v_{rk_{x}} = 0.577225$
    Binning: $k_{PDF} = 20$, $k_{CDF} = 229$, $x_{D_{PDF}} = 0.175$
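
The definition of the local clustering coefficient given above translates directly into code. The following sketch counts the edges among the neighbors of a node and divides by the number of neighbor pairs; the toy graph is hypothetical, and returning 0 for nodes with fewer than two neighbors is a convention chosen here, not necessarily the one used by the report.

```python
from itertools import combinations


def local_clustering(adj, v):
    """Fraction of neighbor pairs of v that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0  # convention for nodes without any neighbor pair
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to a.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
for node in sorted(toy):
    print(node, round(local_clustering(toy, node), 3))
```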
					
				
			
		
	
	


	
PageRank: PageRank assigns relative importance to nodes according to their connections, incorporating the idea that edges to high-scoring nodes contribute more.
    Properties: $n = 6440$, $f_{BC} = 1.00016$, $r = 0.065109$, $C = 0.00104193$
    Location: $x_{(1)} = 2.46052e-05$, $x_{(n)} = 0.00119705$, $\widetilde{x}_{O_{min}} = 2.46052e-05$, $\widetilde{x}_{N_{min}} = 2.46052e-05$, $\widetilde{x}_{0.25} = 6.59239e-05$, $\widetilde{x}_{Med} = 0.000124977$, $\widetilde{x}_{0.75} = 0.000210665$, $\widetilde{x}_{N_{max}} = 0.000427226$, $\widetilde{x}_{O_{max}} = 0.000638132$, $\overline{x} = 0.00015528$, $\overline{x}_{IQ} = 0.000128552$, $\overline{x}_{R} = 0.000610826$, $\overline{x}_{H^{-1}} = 8.55212e-05$, $\overline{x}_{H^{2}} = 0.000197401$, $\overline{x}_{H^{3}} = 0.000241961$
    Dispersion: $s^{2}_{x} = 1.48575e-08$, $s_{x} = 0.000121892$, $v_{x} = 0.784981$, $R = 0.00117244$, $R_{IQ} = 0.000144741$
    Shape: $S_{YP} = 0.745808$, $S_{M} = 1.93346$, $\gamma = 6.6625$
    Rank: $s^{2}_{rk_{x}} = 3.45667e+06$, $s_{rk_{x}} = 1859.21$, $v_{rk_{x}} = 0.577305$
    Binning: $k_{PDF} = 20$, $k_{CDF} = 167$, $x_{D_{PDF}} = 5.39162e-05$
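
The idea behind PageRank stated above can be illustrated with a plain power iteration. This is a minimal sketch on a hypothetical star graph with an assumed damping factor of 0.85; it is not the NetworKit implementation. Scores are normalized to sum to 1, which would also explain why the reported mean 0.00015528 is approximately 1/6440.

```python
def pagerank(adj, damping=0.85, iterations=100):
    """Power iteration for PageRank on an adjacency dict (node -> neighbors)."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for u in adj[v]:
                    new[u] += share
            else:
                # A node without neighbors spreads its score evenly.
                for u in nodes:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank


# Hypothetical star graph: one hub connected to three leaves.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
scores = pagerank(star)
print({v: round(s, 4) for v, s in scores.items()})
```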
					
				
			
		
	
	


	
		
KatzCentrality
    Properties: $n = 6440$, $f_{BC} = 1.00016$, $r = 0.255121$, $C = nan$
    Location: $x_{(1)} = 0.00250583$, $x_{(n)} = 0.078993$, $\widetilde{x}_{O_{min}} = 0.00250583$, $\widetilde{x}_{N_{min}} = 0.00250583$, $\widetilde{x}_{0.25} = 0.00349336$, $\widetilde{x}_{Med} = 0.00623257$, $\widetilde{x}_{0.75} = 0.0118029$, $\widetilde{x}_{N_{max}} = 0.0242274$, $\widetilde{x}_{O_{max}} = 0.0367164$, $\overline{x} = 0.00924866$, $\overline{x}_{IQ} = 0.00667402$, $\overline{x}_{R} = 0.0407494$, $\overline{x}_{H^{-1}} = 0.00535364$, $\overline{x}_{H^{2}} = 0.0124611$, $\overline{x}_{H^{3}} = 0.0159483$
    Dispersion: $s^{2}_{x} = 6.97526e-05$, $s_{x} = 0.0083518$, $v_{x} = 0.903028$, $R = 0.0764872$, $R_{IQ} = 0.00830957$
    Shape: $S_{YP} = 1.08339$, $S_{M} = 2.28341$, $\gamma = 7.2571$
    Rank: $s^{2}_{rk_{x}} = 3.45667e+06$, $s_{rk_{x}} = 1859.21$, $v_{rk_{x}} = 0.577305$
    Binning: $k_{PDF} = 20$, $k_{CDF} = 166$, $x_{D_{PDF}} = 0.004418$
					
				
			
		
	
	


	
ApproxBetweenness2: Betweenness centrality is the fraction of shortest paths between any pair of nodes that passes through a node.
    Properties: $n = 6440$, $f_{BC} = 1.00016$, $r = 0.00584382$, $C = nan$
    Location: $x_{(1)} = 0$, $x_{(n)} = 0.044909$, $\widetilde{x}_{O_{min}} = 0$, $\widetilde{x}_{N_{min}} = 0$, $\widetilde{x}_{0.25} = 3.96385e-06$, $\widetilde{x}_{Med} = 4.59084e-05$, $\widetilde{x}_{0.75} = 0.000227037$, $\widetilde{x}_{N_{max}} = 0.000559827$, $\widetilde{x}_{O_{max}} = 0.000896126$, $\overline{x} = 0.000300116$, $\overline{x}_{IQ} = 6.71598e-05$, $\overline{x}_{R} = 0.0224545$, $\overline{x}_{H^{-1}} = nan$, $\overline{x}_{H^{2}} = 0.00119292$, $\overline{x}_{H^{3}} = 0.00300996$
    Dispersion: $s^{2}_{x} = 1.3332e-06$, $s_{x} = 0.00115464$, $v_{x} = 3.84732$, $R = 0.044909$, $R_{IQ} = 0.000223073$
    Shape: $S_{YP} = 0.660484$, $S_{M} = 16.9176$, $\gamma = 460.479$
    Rank: $s^{2}_{rk_{x}} = 3.44829e+06$, $s_{rk_{x}} = 1856.96$, $v_{rk_{x}} = 0.576606$
    Binning: $k_{PDF} = 20$, $k_{CDF} = 58$, $x_{D_{PDF}} = 0.00112273$
					
				
			
		
	
	


		
			
Correlations between centrality measures (matrix and scatter plots not preserved): Degree, k-Core Decomposition, Local Clustering Coefficient, PageRank, Katz Centrality, Betweenness


			

		
	
	
		
	
PLM: Community detection is the task of identifying groups of nodes in the network which are significantly more densely connected among each other than to the rest of nodes.
    Properties: $n = 30$, $f_{BC} = 1.03448$, $r = nan$, $C = nan$
    Location: $x_{(1)} = 2$, $x_{(n)} = 1161$, $\widetilde{x}_{O_{min}} = 2$, $\widetilde{x}_{N_{min}} = 2$, $\widetilde{x}_{0.25} = 2$, $\widetilde{x}_{Med} = 3$, $\widetilde{x}_{0.75} = 276$, $\widetilde{x}_{N_{max}} = 637$, $\widetilde{x}_{O_{max}} = 980$, $\overline{x} = 214.667$, $\overline{x}_{IQ} = 47.5625$, $\overline{x}_{R} = 581.5$, $\overline{x}_{H^{-1}} = 3.5753$, $\overline{x}_{H^{2}} = 417.422$, $\overline{x}_{H^{3}} = 544.464$
    Dispersion: $s^{2}_{x} = 132579$, $s_{x} = 364.114$, $v_{x} = 1.69618$, $R = 1159$, $R_{IQ} = 274$
    Shape: $S_{YP} = 1.74396$, $S_{M} = 1.42882$, $\gamma = 0.469315$
    Rank: $s^{2}_{rk_{x}} = 69.5862$, $s_{rk_{x}} = 8.34183$, $v_{rk_{x}} = 0.538183$
    Binning: $k_{PDF} = 5$, $k_{CDF} = 13$, $x_{D_{PDF}} = 117.9$
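
The values with subscripts $N_{max}$ and $O_{max}$ in the community-size block above are consistent with box-plot whiskers: the largest observations that do not exceed the quartile plus 1.5 and 3 interquartile ranges, respectively. This reading is an assumption inferred from the numbers, not stated in the report. A minimal check with the reported community-size statistics:

```python
# Values copied from the community-size block of the report.
q1, q3 = 2, 276
reported_n_max = 637    # value with subscript N_max
reported_o_max = 980    # value with subscript O_max
largest = 1161          # x_(n), the largest community

iqr = q3 - q1                    # 274, matches R_IQ
inner_upper = q3 + 1.5 * iqr     # 687.0
outer_upper = q3 + 3.0 * iqr     # 1098.0

print("inner fence:", inner_upper, "reported whisker:", reported_n_max)
print("outer fence:", outer_upper, "reported whisker:", reported_o_max)
print("largest community beyond outer fence:", largest > outer_upper)
```

Under this reading, the largest community with 1161 nodes would count as an extreme outlier of the size distribution.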
					
				
			
		
	
	


	
ConnectedComponents: All nodes in a connected component are reachable from each other.
    Properties: $n = 18$, $f_{BC} = 1.05882$, $r = nan$, $C = nan$
    Location: $x_{(1)} = 2$, $x_{(n)} = 6402$, $\widetilde{x}_{O_{min}} = 2$, $\widetilde{x}_{N_{min}} = 2$, $\widetilde{x}_{0.25} = 2$, $\widetilde{x}_{Med} = 2$, $\widetilde{x}_{0.75} = 2$, $\widetilde{x}_{N_{max}} = 2$, $\widetilde{x}_{O_{max}} = 2$, $\overline{x} = 357.778$, $\overline{x}_{IQ} = 2$, $\overline{x}_{R} = 3202$, $\overline{x}_{H^{-1}} = 2.27364$, $\overline{x}_{H^{2}} = 1508.97$, $\overline{x}_{H^{3}} = 2442.82$
    Dispersion: $s^{2}_{x} = 2.27539e+06$, $s_{x} = 1508.44$, $v_{x} = 4.21613$, $R = 6400$, $R_{IQ} = 0$
    Shape: $S_{YP} = 0.707575$, $S_{M} = 3.56172$, $\gamma = 11.3241$
    Rank: $s^{2}_{rk_{x}} = 15.0882$, $s_{rk_{x}} = 3.88436$, $v_{rk_{x}} = 0.40888$
    Binning: $k_{PDF} = 5$, $k_{CDF} = 2$, $x_{D_{PDF}} = 642$
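
The component sizes summarized in the block above are the kind of data a simple graph traversal produces. The following sketch collects component sizes with a breadth-first search on a hypothetical toy graph; it only illustrates the definition and is not the NetworKit implementation.

```python
from collections import deque


def component_sizes(adj):
    """Return the sizes of all connected components, largest first."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node covers one component.
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            v = queue.popleft()
            size += 1
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical toy graph with three components.
toy = {1: [2, 3], 2: [1], 3: [1], 4: [5], 5: [4], 6: []}
print(component_sizes(toy))
```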
					
				
			
		
	
	


	
CoreDecomposition: k-cores result from successively peeling away nodes of degree k. It also categorizes nodes according to the highest-order core in which they are contained, assigning a core number to each node.
    Properties: $n = 72$, $f_{BC} = 1.01408$, $r = nan$, $C = nan$
    Location: $x_{(1)} = 38$, $x_{(n)} = 725$, $\widetilde{x}_{O_{min}} = 38$, $\widetilde{x}_{N_{min}} = 38$, $\widetilde{x}_{0.25} = 55$, $\widetilde{x}_{Med} = 66$, $\widetilde{x}_{0.75} = 89.5$, $\widetilde{x}_{N_{max}} = 132$, $\widetilde{x}_{O_{max}} = 176$, $\overline{x} = 89.4444$, $\overline{x}_{IQ} = 67.8333$, $\overline{x}_{R} = 381.5$, $\overline{x}_{H^{-1}} = 67.3068$, $\overline{x}_{H^{2}} = 127.729$, $\overline{x}_{H^{3}} = 190.366$
    Dispersion: $s^{2}_{x} = 8431.55$, $s_{x} = 91.8235$, $v_{x} = 1.0266$, $R = 687$, $R_{IQ} = 34.5$
    Shape: $S_{YP} = 0.765963$, $S_{M} = 5.1046$, $\gamma = 30.23$
    Rank: $s^{2}_{rk_{x}} = 437.775$, $s_{rk_{x}} = 20.9231$, $v_{rk_{x}} = 0.573235$
    Binning: $k_{PDF} = 8$, $k_{CDF} = 28$, $x_{D_{PDF}} = 80.9375$

The above report can also be saved to disk directly with the output method. Two options are available:

"HTML": Generates a single HTML file with embedded SVG graphics. The layout matches the report above.
"LaTeX": Creates a new folder named after the graph, saves the plots as PDF and creates one .tex file containing all plots and statistical data. The layout is similar to the report above.



In [8]:

    
pf.output(outputType="HTML", directory="output")









    



....................................

Batch Processing

You can conveniently generate profiles for entire batches of graph files.



In [9]:

    
mv input/lesmis.graph input/lesmis.walk.graph



In [10]:

    
mv input/power.graph input/power.walk.graph

The previous two commands merely renamed two graph files in the repository's input directory so that they match a certain pattern. The profiling module provides a convenient function to process a batch of graph files within a given input directory.



In [11]:

    
profiling.walk("input/","input/reports/", Format.METIS, filePattern="*.walk.graph", preset="complete")









    



skipping input//.DS_Store as it does not match filePattern
skipping input//4elt.graph as it does not match filePattern
skipping input//airfoil1-10p.png as it does not match filePattern
skipping input//airfoil1.gi as it does not match filePattern
skipping input//airfoil1.graph as it does not match filePattern
skipping input//arxiv-qfin-author.dgs as it does not match filePattern
skipping input//astro-ph.graph as it does not match filePattern
skipping input//audikw1.graph as it does not match filePattern
skipping input//audikw1.graph.bz2 as it does not match filePattern
skipping input//bacteriorhodopsin.graph as it does not match filePattern
skipping input//caidaRouterLevel.graph as it does not match filePattern
skipping input//celegans_metabolic.graph as it does not match filePattern
skipping input//cm_NaechsteNachbarn.graphml as it does not match filePattern
skipping input//coAuthorsDBLP.graph as it does not match filePattern
skipping input//comments.edgelist as it does not match filePattern
skipping input//cond-mat-2005.graph as it does not match filePattern
skipping input//coPapersCiteseer.graph as it does not match filePattern
skipping input//dna-subgraph.gml as it does not match filePattern
skipping input//dna-supergraph.gml as it does not match filePattern
skipping input//dynamicTest.gexf as it does not match filePattern
skipping input//dynamicTest2.gexf as it does not match filePattern
skipping input//dynamicTest3.gexf as it does not match filePattern
skipping input//email-Enron.txt as it does not match filePattern
skipping input//email-Enron.txt.gz as it does not match filePattern
skipping input//eu-2005.graph as it does not match filePattern
skipping input//europe.osm.graph as it does not match filePattern
skipping input//example.dgs as it does not match filePattern
skipping input//example.edgelist as it does not match filePattern
skipping input//example.graph as it does not match filePattern
skipping input//example2.dgs as it does not match filePattern
skipping input//fe_4elt2.graph as it does not match filePattern
skipping input//foodweb-baydry.konect as it does not match filePattern
skipping input//guest.gml as it does not match filePattern
skipping input//hamming6-4.edgelist as it does not match filePattern
skipping input//hep-th.graph as it does not match filePattern
skipping input//host.gml as it does not match filePattern
skipping input//in-2004.graph as it does not match filePattern
skipping input//jazz.graph as it does not match filePattern
skipping input//jazz1.graph as it does not match filePattern
skipping input//jazz2.graph as it does not match filePattern
skipping input//jazz2_directed.gml as it does not match filePattern
skipping input//jazz2_undirected.gml as it does not match filePattern
skipping input//jazz2double.graph as it does not match filePattern
skipping input//johnson8-4-4.edgelist as it does not match filePattern
skipping input//karate.graph as it does not match filePattern
skipping input//keller4.edgelist as it does not match filePattern
skipping input//kkt_power.graph as it does not match filePattern
skipping input//kkt_power.graph.bz2 as it does not match filePattern

[ input//lesmis.walk.graph ]
Diameter: Diameter raised exception
Connected Components: 0.00 s

Centrality.Degree: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.CoreDecomposition: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.ClusteringCoefficient: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.PageRank: 0.01 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.Katz: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: Centrality.centralization not properly defined for Centrality.Katz. 0.00 s
Centrality.Betweenness: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: Centrality.centralization not properly defined for Centrality.Betweenness. 0.00 s
Centrality.Closeness: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Partition.Communities: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
Partition.ConnectedComponents: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
Partition.CoreDecomposition: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s

...............................

total time (measures + stats + correlations): 1.88 s
total speed: 135.4 edges/s

............................................

skipping input//looptest1.gml as it does not match filePattern
skipping input//looptest2.gml as it does not match filePattern
skipping input//mi_neu_Kanten30k.graphml as it does not match filePattern
skipping input//MIT8.edgelist as it does not match filePattern
skipping input//ns894786.mps.gz.variable.graph as it does not match filePattern
skipping input//ns894786.mps.gz.variable.graph.bz2 as it does not match filePattern
skipping input//out.ca-cit-HepTh as it does not match filePattern
skipping input//PGPgiantcompo.graph as it does not match filePattern
skipping input//polblogs.graph as it does not match filePattern
skipping input//power.gt as it does not match filePattern

[ input//power.walk.graph ]
Diameter: 0.00 s

Connected Components: 0.00 s

Centrality.Degree: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.CoreDecomposition: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.ClusteringCoefficient: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.PageRank: 0.01 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Centrality.Katz: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: Centrality.centralization not properly defined for Centrality.Katz. 0.00 s
Centrality.Betweenness: 0.02 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: Centrality.centralization not properly defined for Centrality.Betweenness. 0.00 s
Centrality.Closeness: 0.01 s
    Sort: 0.00 s
    Rank: 0.00 s
    Assortativity: 0.00 s
    Centralization: 0.00 s
Partition.Communities: 0.02 s
    Sort: 0.00 s
    Rank: 0.00 s
Partition.ConnectedComponents: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s
Partition.CoreDecomposition: 0.00 s
    Sort: 0.00 s
    Rank: 0.00 s

...............................

total time (measures + stats + correlations): 1.49 s
total speed: 4420.8 edges/s

............................................

skipping input//qfin-all-author.dgs as it does not match filePattern
skipping input//qfin-all-author.dgs.email as it does not match filePattern
skipping input//spaceseparated.edgelist as it does not match filePattern
skipping input//spaceseparated_weighted.edgelist as it does not match filePattern
skipping input//staticTest.gexf as it does not match filePattern
skipping input//test.csv as it does not match filePattern
skipping input//tiny_01.graph as it does not match filePattern
skipping input//tiny_02.graph as it does not match filePattern
skipping input//tiny_03.graph as it does not match filePattern
skipping input//tiny_04.graph as it does not match filePattern
skipping input//wiki-Vote.txt as it does not match filePattern
skipping input//wing.graph as it does not match filePattern
Done

The reports are named after the input files:



In [12]:

    
ls input/reports









    



lesmis.html  power.html
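
The skip messages and the report names above suggest shell-style pattern matching on file names and a report name taken from the part of the file name before the first dot. Both points are assumptions inferred from the log, not confirmed behavior of profiling.walk. Under these assumptions, the selection can be sketched with the standard library:

```python
from fnmatch import fnmatchcase

# A few file names taken from the log above.
names = [
    "karate.graph",
    "lesmis.walk.graph",
    "MIT8.edgelist",
    "power.gt",
    "power.walk.graph",
]
pattern = "*.walk.graph"

# Keep only the files whose name matches the shell-style pattern.
selected = [name for name in names if fnmatchcase(name, pattern)]

# Assumed naming rule: text before the first dot plus the .html suffix.
reports = [name.split(".", 1)[0] + ".html" for name in selected]

print(selected)
print(reports)
```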

The following commands undo the changes and remove the reports.



In [13]:

    
mv input/power.walk.graph input/power.graph



In [14]:

    
mv input/lesmis.walk.graph input/lesmis.graph



In [15]:

    
rm -r input/reports



In [16]:

    
rm output/MIT8.html