Summer School on Algorithm Engineering

Tutorial "Algorithmic Methods for Network Analysis -- NetworKit" (Part 2)

Basic Network Overview in NetworKit

Such basic features of a network as those computed in Q&A Session #1 can also be displayed easily with one of NetworKit's convenience functions:


In [10]:
from networkit import *
G = readGraph("../../input/PGPgiantcompo.graph")
properties.overview(G)


Value 0 in data. Throwing out 0 values
Calculating best minimal value for power law fit
Value 0 in data. Throwing out 0 values
Calculating best minimal value for power law fit
<_NetworKit.PLM object at 0x11321cbb0>

Network Properties: PGPgiantcompo
==================
Basic Properties
-----------------------------  --------
nodes (n)                      10680
edges (m)                      24316
directed?                      False
weighted?                      False
isolated nodes                 0
self-loops                     0
density                        0.000426
clustering coefficient         0.444671
degeneracy (max. core number)  31
-----------------------------  --------
Node Degree Properties
-------------------------  --------------
min. degree                1
max. degree                205
avg. degree                4.553558
degree power law fit?      True, 2.101144
degree power law exponent  1.6997
degree assortativity       0.2382
-------------------------  --------------
Path Structure
-------------------------  --------
connected components       1
size of largest component  10680
estimated diameter range   (24, 24)
-------------------------  --------
Community Structure
-------------------------------------------  -----------  --------
modularity-driven community detection (PLM)
                                             communities  95
                                             modularity   0.881390
-------------------------------------------  -----------  --------
Degree Distribution
-------------------
0-   :  ▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇▇9394.00
9-   :  ▇▇▇▇ 781.00
18-  :  ▇ 240.00
27-  :  | 101.00
36-  :  |  91.00
45-  :  |  28.00
54-  :  |  17.00
63-  :  |  12.00
72-  :  |   5.00
81-  :  |   3.00
90-  :  |   2.00
99-  :  |   1.00
108- :  |   2.00
117- :  |   0.00
126- :  |   1.00
135- :  |   0.00
144- :  |   0.00
153- :  |   0.00
162- :  |   1.00
171- :  |   0.00
180- :  |   0.00
189- :  |   0.00
198- :  |   1.00
207- :  |   0.00
216- :  |   0.00

Now we know how to access the most fundamental features of a network. Some properties that could be regarded as fundamental are missing because their calculation would take too long for larger graphs.

Eulerian Cycles

Before we look at different network types, let us reflect on the distant roots of network analysis. Graph theory is one of its ancestors and has been around for several centuries now. As an example, think of Euler's problem from 1736 of finding a 'Eulerian tour/cycle' over the Pregel bridges in Königsberg.

Q&A session #2

  1. Do you remember how the problem of deciding wheter a graph has a Eulerian cycle can be solved for arbitrary undirected graphs? If not, use the web to find helpful characterizations of graphs with Eulerian cycles. Write it down here. Answer:

  2. Then enter code below to decide whether a graph has such a cycle or not. Insert code in next cell

  3. Test it on the PGP graph! What is the result? Does the result meet your expectations? Answer:


In [65]:
# 2-2) and 2-3) Decide whether graph is Eulerian or not

Differences between network types

As indicated by the previous task, graph theory and graph algorithms have been research foci for quite some time. In comparison, complex networks and their analysis have become a focus of investigation only recently. The reason for our interest in complex networks is their similarity to many real-world phenomena such as social interactions, web graphs, food webs, protein interactions and so forth.

But what makes a network complex or not complex?

Well, complex networks have 'non-trivial' topological features. Let us explore this statement in more detail with the help of data. To this end, we look at a social network, a technical mesh and an Erdös-Renyi random graph.


In [64]:
# Load/generate 3 graphs of different types
mit8 = readGraph("../../input-tutorial/MIT8.edgelist", Format.EdgeListTabZero)
airf1 = readGraph("../../input/airfoil1.graph")
gen = generators.ErdosRenyiGenerator(1000, 0.01)
er1000 = gen.generate()

Some context on these networks is given below, first for MIT8. It stems from a larger collection of Facebook networks from the early days of the online social network. MI8 models

"Facebook friendships at 100 US universities at some time in 2005, as well as a number of node attributes such as dorm, gender, graduation year, and academic major. The data was apparently provided directly by Facebook. (...) It does not include the names of individual or even of any of the node attributes (they have been given integer ids)." http://sociograph.blogspot.de/2011/03/facebook100-data-and-parser-for-it.html

The airfoil1 graph is a mesh that stems from a two-dimensional numerical simulation where one is interested in the airflow around an airplane wing. Meshes usually have a rather nice structure. Two-dimensional meshes are even planar in most (if not all reasonable) cases.


In [37]:
from IPython.core.display import Image
Image('../../input-tutorial/airfoil1-10p.png')


Out[37]:

The third network is a random graph generated according to the Erdös-Renyi $G(n, p)$ model. This model has been analyzed theoretically over the last 50 years or so. As we will see, however, it deviates dramatically from real networks in important aspects.

Giant Connected Component

Some types of realistic networks such as social ones usually have more than one connected component. However, even if there is more than one connected component, there is usually only one big one. Let us take a closer look at these giant components in particular, the differences of the three networks and other interesting properties in general.

Q&A Session #3

  1. Print the NetworKit overview for each of the three graphs! Answer:

  2. For those graphs with more than one connected component, extract the largest connected component $C$ and continue working with $C$. Enter the code for this in the cell below this one.

  3. What are the most striking topological differences between the three graphs in terms of the analytics kernels presented in the lecture? Answer:

  4. Pick an arbitrary pair of vertices $(u, v)$ in each network. Since we work with connected graphs, the two nodes are connected. What is the smallest number of hops a virus needs to make in the network to reach $v$ if it starts at $u$? What if $u$ and $v$ are chosen as worst case pair? Answer:

  5. What do you think: Why are complex networks more difficult to work with, what makes them 'complex'? Answer:

  6. Which parts of your answer to (5) concern theoretical analysis, which concern computational aspects? Answer:


In [66]:
# Code for Q&A Session #3
# 3-2) extract largest connected component

## some code fragments to help you...
# part = conncomp.getPartition()
# lc = part.getMembers(largestPart)
# sg = graph.Subgraph()
# mit8lc = sg.fromNodes(mit8,lc)

When you answered all questions, proceed with Tutorial #3.