Tutorial "Algorithmic Methods for Network Analysis with NetworKit" (Part 4)

Determining Important Nodes (cont'd)

Betweenness Centrality

If you interpret the Facebook graph as web link graph in the previous Q&A session, the obvious ranking choice is the PageRank. Note that today it is only one of many aspects modern web search engines consider to rank web pages. However, we were looking for the eigenvector centrality as MIT8 is a social network (both and possibly others can be applicable, though).

In applications where the flow of goods, vehicles, information, etc. via shortest paths in a network plays a major role, betweenness centrality is an interesting centrality index. It is also widely used for social networks. Its drawback is its rather high running time, which makes its use problematic for really large networks. But in many applications we do not need to consider the exact betweenness values nor an exact ranking. An approximation is often good enough.

Q&A Session #7

  1. In the PGP network, compute the 15 nodes with the highest (exact) betweenness values and order them accordingly in a ranking. Answer:

  2. Perform the same as in 1) with one difference: Instead of using the algorithm for computing exact betweenness values, use the RK approximation algorithm. Use also values different from the default ones for the parameters $\delta$ and $\epsilon$. What effects do you see in comparison to the ranking based on exact values? What about running time (you can use %time preceding a call to get its CPU time)? And how do the parameter settings affect these effects? Answer:


In [2]:
from networkit import *



In [3]:
%matplotlib inline

In [4]:
cd ~/Documents/workspace/NetworKit


/Users/Henning/Documents/workspace/NetworKit

In [5]:
G = readGraph("input/PGPgiantcompo.graph", Format.METIS)

# Code for 7-1)
# exact computation

In [6]:
# Code for 7-2)
# approximate computation

Community Detection

This section demonstrates the community detection capabilities of NetworKit. Community detection is concerned with identifying groups of nodes which are significantly more densely connected to each other than to the rest of the network.

Code for community detection is contained in the community module. The module provides a top-level function to quickly perform community detection with a suitable algorithm and print some stats about the result.


In [7]:
community.detectCommunities(G)


PLM(balanced,pc,turbo) detected communities in 0.022229909896850586 [s]
solution properties:
-------------------  ----------
# communities        100
min community size     6
max community size   692
avg. community size  106.8
modularity             0.882296
-------------------  ----------
Out[7]:
<_NetworKit.Partition at 0x10e0ffa50>

The default setting uses the parallel Louvain method (PLM) as underlying algorithm. The function prints some statistics and returns the partition object representing the communities in the network as an assignment of node to community label. PLM yields a high-quality solution at reasonably fast running times. Let us now apply other algorithms. To this end, one specifies the algorithm directly in the call.


In [8]:
community.detectCommunities(G, algo=community.PLP(G))


PLP detected communities in 0.010107040405273438 [s]
solution properties:
-------------------  ----------
# communities        953
min community size     2
max community size   361
avg. community size   11.2067
modularity             0.797423
-------------------  ----------
Out[8]:
<_NetworKit.Partition at 0x10e12ef30>

The visualization module, which is based on external code for graph drawing, provides a function which visualizes the community graph for a community detection solution: Communities are contracted into single nodes whose size corresponds to the community size. For problems with hundreds or thousands of communities, this may take a while.


In [9]:
communities = _
viztasks.drawCommunityGraph(G, communities)


Q&A Session 8

  1. Run PLMR as well. What are the main differences between the three algorithms PLM, PLMR, and PLP in terms of the solutions they compute and the time they need for this computation? Answer:

  2. Visualize the three results. Can you see aspects of your answer to 1) in the figure as well? Do the figures lead to other insights? Answer:


In [10]:
# Code for 8-1) and 8-2)

Further Applications of Clustering/Community Detection

Subgraphs that are internally dense and externally sparse, i.e. communities or clusters, are not only useful in social or other complex networks. Outside network analysis, there are numerous applications as well. As an example, consider image segmentation. This fundamental step in image processing refers to the identification and separation of image regions. Typically, this is done to separate different objects in the image from each other and to simplify the image before applying other image processing tools.

Our student Marcel Radermacher has tested our community detection techniques for the purpose of image segmentation. If you still have time, I encourage you to take a look at his results: http://nbviewer.ipython.org/urls/networkit.iti.kit.edu/data/uploads/projects/graph-based-segmentation.ipynb.

On the website the code cannot be used interactively. If you are interested, you can pull it from https://algohub.iti.kit.edu/parco/NetworKit/NetworKit-ImageSegmentation, though.

Concluding Remarks and Further Reading

NetworKit is steadily growing and already offers more than can be treated within this tutorial. Take a look at the project website and our papers for more information. We would be very happy if you decided to use NetworKit on a regular basis, be it as a user or as a developer. Since NetworKit is an open-source project, feedback and contributions are welcome (and have already happened in the past). The email list networkit@ira.uni-karlsruhe.de is the place for general discussion and questions. Also feel free to contact the authors with questions on how NetworKit can be applied in your research.

-- Henning Meyerhenke

Acknowledgements

This tutorial is partially based on the NetworKit user guide, which has been written to a large extent by Christian L. Staudt. Christian also contributed several ideas, questions and answers (including code) to this tutorial. Thank you very much!

Many other people have contributed to NetworKit and its documentation since its first release as Python/C++ hybrid in November 2013 (version 2.0). NetworKit would not be what it is today without their help! A full list of contributors can be found on the project website: https://networkit.iti.kit.edu/credits/.


In [ ]: