Network analysis is largely descriptive statistics of network data. Network analysis software packages typically present themselves to the user as a collection of graph algorithms. However, such a collection is a rather low-level tool if the target users are scientists and data analysts from various fields. Making the most of NetworKit as a library requires writing a certain amount of custom code, as well as some expertise in selecting algorithms and their parameters.
This is one reason why we also provide an interface that makes exploratory analysis of large networks easy and fast, even for non-expert users, and gives an extensive overview. The underlying module assembles many algorithms into one program, automates analysis tasks, and produces a graphical report that can be displayed in the Jupyter Notebook or exported to an HTML or LaTeX report document. Such a network profile gives a statistical overview of the properties of the network. It consists of the following parts. First, global properties such as size and density are reported. The report then focuses on a variety of node centrality measures, giving an overview of their distributions in the network. Detailed views for the centrality measures follow: their distributions are plotted in histograms and characterized with standard statistics, and network-specific measures such as centralization and assortativity are shown.

We propose that correlations between centralities are per se interesting empirical features of a network. For instance, betweenness may or may not be positively correlated with node degree. The prevalence of low-degree, high-betweenness nodes may influence the resilience of a transport network, since only a few links then need to be severed to significantly disrupt transport processes that follow shortest paths. To study such aspects, the report displays a matrix of Spearman's rank correlation coefficients, showing how the node rankings derived from the different centrality measures correlate with each other. Furthermore, scatter plots for each pair of centrality measures are shown, suggesting the type of correlation (see …).

The report continues with different ways of partitioning the network, showing histograms and pie charts for the size distributions of connected components, modularity-based communities, and k-shells, respectively.
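To make the idea of correlated centralities concrete, here is a minimal pure-Python sketch. It is not NetworKit code (NetworKit computes centralities with its own parallel kernels); it only illustrates the statistics involved. It builds a hypothetical toy graph of two 4-cliques joined through a single bridge node, computes degree and unnormalized betweenness (Brandes' algorithm), and then Spearman's rank correlation between the two. Assigning average ranks to tied values is an assumption of this sketch; the profiling module may treat ties differently.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness of an undirected, unweighted graph (Brandes).

    adj maps each node to a list of its neighbours.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma, dist = {v: 0 for v in adj}, {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of non-increasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was counted once from each endpoint.
    return {v: c / 2 for v, c in bc.items()}


def ranks(values):
    """Ranks starting at 1; tied values get the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Hypothetical toy graph: cliques {0,1,2,3} and {5,6,7,8}, bridged by node 4.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8),
         (3, 4), (4, 5)]
adj = {v: [] for v in range(9)}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

nodes = sorted(adj)
degree = [len(adj[v]) for v in nodes]
bc = betweenness(adj)
btw = [bc[v] for v in nodes]
rho = spearman(degree, btw)
print("node 4: degree", degree[4], "betweenness", btw[4])
print("Spearman rho(degree, betweenness) =", round(rho, 3))
```

The bridge node 4 has the smallest degree (2) but the largest betweenness (16), which pulls the rank correlation down to 1/7 ≈ 0.143. It is exactly the kind of low-degree, high-betweenness node discussed above: severing either of its two links separates the cliques.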
A node-edge diagram of the graph is deliberately absent, since graph drawing (apart from being computationally expensive) is usually not the preferred method for exploring large complex networks. Rather, we consider networks first and foremost to be statistical data sets whose properties should be determined via graph algorithms, with the results summarized in statistical graphics. The default configuration of the module is chosen such that even networks with hundreds of millions of edges can be characterized within minutes on a parallel workstation. Furthermore, the user can configure the choice of analytics and the level of detail, so that custom reports can be generated.
This notebook shows how the profiling module can currently be used.
First, change to the repository's root directory, enable inline plotting with matplotlib, and import NetworKit.
In [1]:
cd ../../
In [2]:
%matplotlib inline
In [3]:
from networkit import *
Read the graph for which you want to generate a profile.
In [4]:
G = readGraph("input/MIT8.edgelist", Format.EdgeListTabZero)
G.isDirected()
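Format.EdgeListTabZero denotes a plain edge list with one edge per line, the two endpoint ids separated by a tab and node ids starting at 0. As an illustration of that file layout (not NetworKit's actual reader), here is a minimal pure-Python sketch; the function name is hypothetical, and skipping blank lines and lines starting with "#" is an assumption of this sketch.

```python
import io


def read_edge_list_tab_zero(lines):
    """Parse a tab-separated, zero-indexed edge list into (n, edges).

    Hypothetical helper for illustration only; it is not NetworKit's reader.
    Blank lines and '#' comment lines are skipped (an assumption).
    """
    edges = []
    max_id = -1
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The first two tab-separated fields are the endpoint ids.
        u, v = (int(tok) for tok in line.split("\t")[:2])
        edges.append((u, v))
        max_id = max(max_id, u, v)
    # Ids start at 0, so the node count is the largest id plus one.
    return max_id + 1, edges


sample = io.StringIO("# toy graph\n0\t1\n1\t2\n2\t0\n2\t3\n")
n, edges = read_edge_list_tab_zero(sample)
print(n, len(edges))  # 4 4
```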
With setVerbose you can control how much informational output is printed, e.g. which kernel is currently running and how long it takes. Check the docstring for the parameters.
In [5]:
profiling.Profile.setVerbose(True, level=0)
profiling.Profile.getVerbose()
You can control the set of analytics measures appearing in the profile with the preset argument. Currently, the presets "minimal", "default", and "complete" are available.
In [6]:
pf = profiling.Profile.create(G, preset="default")
The statistical data and the results of the kernels can be visualized in the IPython notebook environment with the show method. Currently the only available style is "light", but you can specify any color you want.
Without a Config object, only the global properties are reported. Depending on the number of kernels and correlation measures, it may take a while until the output is produced.
In [7]:
pf.show()
The above report can also be saved to disk directly with the output method. Two output types are available, HTML and LaTeX. The LaTeX output is a .tex file with all plots and statistical data included; its layout is similar to the above.
In [8]:
pf.output(outputType="HTML", directory="output")
In [9]:
mv input/lesmis.graph input/lesmis.walk.graph
In [10]:
mv input/power.graph input/power.walk.graph
The previous two commands merely renamed two graph files in the repository's input directory so that they match a certain file name pattern. The profiling module provides a convenience function, walk, that processes a batch of graph files within a given input directory.
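Assuming that walk selects the files to profile by matching filePattern against file names with shell-style wildcards (an assumption; check the docstring of profiling.walk), the selection step can be sketched with Python's standard fnmatch module. The directory listing below is hypothetical apart from MIT8.edgelist and the two renamed files.

```python
import fnmatch


def select_files(file_names, pattern):
    """Return the names matching a shell-style wildcard pattern, sorted.

    Illustrates the assumed selection step only; not NetworKit code.
    """
    return sorted(fnmatch.filter(file_names, pattern))


# Hypothetical listing of the input directory after the two mv commands.
listing = ["MIT8.edgelist", "lesmis.walk.graph", "power.walk.graph", "example.graph"]
selected = select_files(listing, "*.walk.graph")
print(selected)  # ['lesmis.walk.graph', 'power.walk.graph']
```

Renaming the files is what lets a single pattern pick out exactly the graphs to profile while leaving the rest of the directory untouched.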
In [11]:
profiling.walk("input/","input/reports/", Format.METIS, filePattern="*.walk.graph", preset="complete")
The reports are named after the input files:
In [12]:
ls input/reports
The following commands undo the renaming and remove the generated reports.
In [13]:
mv input/power.walk.graph input/power.graph
In [14]:
mv input/lesmis.walk.graph input/lesmis.graph
In [15]:
rm -r input/reports
In [16]:
rm output/MIT8.html