Solutions to

Tutorial "Algorithmic Methods for Network Analysis with NetworKit" (Part 2)

Basic Network Overview in NetworKit

Such basic features of a network as those computed in Q&A Session #1 can also be displayed easily with one of NetworKit's convenience functions (before we can show them we have to import the Python module networkit again and change to the correct folder):


In [6]:
from networkit import *

In [7]:
cd ~/Documents/workspace/NetworKit


/Users/Henning/Documents/workspace/NetworKit

In [8]:
%matplotlib inline

In [9]:
G = readGraph("input/PGPgiantcompo.graph", Format.METIS)
pf = profiling.Profile.create(G, preset="minimal")
pf.show()


PGPgiantcompo
10680
24316
0.000426403
False
False
0
N/A
N/A
1
Node centrality index which ranks nodes by their degree.
DegreeCentrality
Properties
$ n = $ 10680
$ f_{BC} = $ 1.00009
$ r = $ 0.239188
$ C = $ 0.0187782
Location
$ x_{(1)} = $ 1
$ x_{(n)} = $ 205
$ \widetilde{x}_{O_{min}} = $ 1
$ \widetilde{x}_{N_{min}} = $ 1
$ \widetilde{x}_{0.25} = $ 1
$ \widetilde{x}_{Med} = $ 2
$ \widetilde{x}_{0.75} = $ 4
$ \widetilde{x}_{N_{max}} = $ 8
$ \widetilde{x}_{O_{max}} = $ 13
$ \overline{x} = $ 4.55356
$ \overline{x}_{IQ} = $ 2.15281
$ \overline{x}_{R} = $ 103
$ \overline{x}_{H^{-1}} = $ 1.75165
$ \overline{x}_{H^{2}} = $ 9.27234
$ \overline{x}_{H^{3}} = $ 16.4642
Dispersion
$ s^{2}_{x} = $ 65.2474
$ s_{x} = $ 8.07759
$ v_{x} = $ 1.77391
$ R = $ 204
$ R_{IQ} = $ 3
Shape
$ S_{YP} = $ 0.948386
$ S_{M} = $ 6.59782
$ \gamma = $ 81.4638
Rank
$ s^{2}_{rk_{x}} = $ 8.83549e+06
$ s_{rk_{x}} = $ 2972.46
$ v_{rk_{x}} = $ 0.556588
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 83
$ x_{D_{PDF}} = $ 6.1
All nodes in a connected component are reachable from each other.
ConnectedComponents
Properties
$ n = $ 1
$ f_{BC} = $ nan
$ r = $ nan
$ C = $ nan
Location
$ x_{(1)} = $ 10680
$ x_{(n)} = $ 10680
$ \widetilde{x}_{O_{min}} = $ 10680
$ \widetilde{x}_{N_{min}} = $ 10680
$ \widetilde{x}_{0.25} = $ 10680
$ \widetilde{x}_{Med} = $ 10680
$ \widetilde{x}_{0.75} = $ 10680
$ \widetilde{x}_{N_{max}} = $ 10680
$ \widetilde{x}_{O_{max}} = $ 10680
$ \overline{x} = $ 10680
$ \overline{x}_{IQ} = $ 10680
$ \overline{x}_{R} = $ 10680
$ \overline{x}_{H^{-1}} = $ 10680
$ \overline{x}_{H^{2}} = $ 10680
$ \overline{x}_{H^{3}} = $ 10680
Dispersion
$ s^{2}_{x} = $ nan
$ s_{x} = $ nan
$ v_{x} = $ nan
$ R = $ 0
$ R_{IQ} = $ 0
Shape
$ S_{YP} = $ nan
$ S_{M} = $ nan
$ \gamma = $ nan
Rank
$ s^{2}_{rk_{x}} = $ nan
$ s_{rk_{x}} = $ nan
$ v_{rk_{x}} = $ nan
Binning
$ k_{PDF} = $ 1
$ k_{CDF} = $ 1
$ x_{D_{PDF}} = $ 10680

Now we know how to access the most fundamental features of a network. Some properties that could be regarded as fundamental are missing because their calculation would take too long for larger graphs.

Eulerian Cycles

Before we look at different network types, let us reflect on the distant roots of network analysis. Graph theory is one of its ancestors and has been around for several centuries now. As an example, think of Euler's problem from 1736 of finding a 'Eulerian tour/cycle' over the Pregel bridges in Königsberg.

Q&A session #2

  1. Do you remember how the problem of deciding wheter a graph has a Eulerian cycle can be solved for arbitrary undirected graphs? If not, use the web to find helpful characterizations of graphs with Eulerian cycles. Write it down here. Answer:

  2. Then enter code below to decide whether a graph has such a cycle or not. Insert code in next cell

  3. Test it on the PGP graph! What is the result? Does the result meet your expectations? Answer:


In [10]:
# 2-1) An undirected graph is Eulerian iff all its vertices have even degree

# 2-2) and 2-3) Decide whether graph is Eulerian or not
eulerian = True
for v in G.nodes():
    if (G.degree(v) % 2 != 0):
        eulerian = False
        break

if (eulerian == True):
    print("2-3) Graph is Eulerian")
else:
    print("2-3) Graph is NOT Eulerian")


2-3) Graph is NOT Eulerian

Differences between network types

As indicated by the previous task, graph theory and graph algorithms have been research foci for quite some time. In comparison, complex networks and their analysis have become a focus of investigation only recently. The reason for our interest in complex networks is their similarity to many real-world phenomena such as social interactions, web graphs, food webs, protein interactions and so forth.

But what makes a network complex or not complex?

Well, complex networks have 'non-trivial' topological features. Let us explore this statement in more detail with the help of data. To this end, we look at a social network, a technical mesh and an Erdös-Renyi random graph.


In [11]:
# Load/generate 3 graphs of different types
mit8 = readGraph("input/MIT8.edgelist", Format.EdgeListTabZero)
airf1 = readGraph("input/airfoil1.graph", Format.METIS)
gen = generators.ErdosRenyiGenerator(1000, 0.01)
er1000 = gen.generate()

Some context on these networks is given below, first for MIT8. It stems from a larger collection of Facebook networks from the early days of the online social network. MI8 models

"Facebook friendships at 100 US universities at some time in 2005, as well as a number of node attributes such as dorm, gender, graduation year, and academic major. The data was apparently provided directly by Facebook. (...) It does not include the names of individual or even of any of the node attributes (they have been given integer ids)." http://sociograph.blogspot.de/2011/03/facebook100-data-and-parser-for-it.html

The airfoil1 graph is a mesh that stems from a two-dimensional numerical simulation where one is interested in the airflow around an airplane wing. Meshes are usually easy to visualize, two-dimensional meshes are even planar in most (if not all reasonable) cases. The picture below illustrates this. The colors signify different vertex blocks, which we can ignore here.


In [12]:
from IPython.core.display import Image
Image('input/airfoil1-10p.png')


Out[12]:

The third network is a random graph generated according to the Erdös-Renyi $G(n, p)$ model. This model has been analyzed theoretically over the last 50 years or so. As we will see, however, it deviates dramatically from real networks in important aspects.

Giant Connected Component

Some types of realistic networks such as social ones usually have more than one connected component. However, even if there is more than one connected component, there is usually only one big one. Let us take a closer look at these giant components in particular, the differences of the three networks and other interesting properties in general.

Q&A Session #3

  1. Print the NetworKit overview for each of the three graphs! Answer:

  2. For those graphs with more than one connected component, extract the largest connected component $C$ and continue working with $C$. Enter the code for this in the cell below this one.

  3. What are the most striking topological differences between the three graphs in terms of the analytics kernels presented in the lecture? Answer:

  4. Pick an arbitrary pair of vertices $(u, v)$ in each network. Since we work with connected graphs, the two nodes are connected. What is the smallest number of hops a virus needs to make in the network to reach $v$ if it starts at $u$? What if $u$ and $v$ are chosen as worst case pair? Answer:

  5. What do you think: Why are complex networks more difficult to work with, what makes them 'complex'? Answer:

  6. Which parts of your answer to (5) concern theoretical analysis, which concern computational aspects? Answer:


In [16]:
# Code for Q&A Session #3
# 3-1)
pf = profiling.Profile.create(airf1, preset="minimal")
pf.show()

pf = profiling.Profile.create(er1000, preset="minimal")
pf.show()

pf = profiling.Profile.create(mit8, preset="minimal")
pf.show()

# 3-2) extract largest connected component
mit8lc = workflows.extractLargestComponent(mit8)
pf = profiling.Profile.create(mit8lc, preset="minimal")
pf.show()


# for comparison:

# 3-3) Look at diameter, clustering coefficients, k-core decomposition, degree distribution

# 3-4)
for g in {mit8lc, airf1, er1000}:
    u = g.randomNode()
    v = g.randomNode()
    if g.isWeighted():
        print("weighted")
        dijkstra = graph.Dijkstra(g, u)
        dijkstra.run()
        dists = dijkstra.getDistances()
    else:
        print("unweighted")
        bfs = graph.BFS(g, u)
        bfs.run()
        dists = bfs.getDistances()

    print("distance from ", u, " to ", v, ": ", dists[v])
    print("Diameter of graph ", g.toString(), ": ", distance.Diameter(g))
    
# 3-5)
# Low diameter: $k$-hop exploration visits large parts of the graph even for small $k$, cache inefficiency
# Degree distribution: load imbalance in parallel computations; may require more flexible data structure
# Visualization: Traditional visualization tools often produce 'hairballs'


airfoil1
4253
12289
0.00135912
False
False
0
N/A
N/A
1
Node centrality index which ranks nodes by their degree.
DegreeCentrality
Properties
$ n = $ 4253
$ f_{BC} = $ 1.00024
$ r = $ 0.321839
$ C = $ 0.000758562
Location
$ x_{(1)} = $ 3
$ x_{(n)} = $ 9
$ \widetilde{x}_{O_{min}} = $ 6
$ \widetilde{x}_{N_{min}} = $ 6
$ \widetilde{x}_{0.25} = $ 6
$ \widetilde{x}_{Med} = $ 6
$ \widetilde{x}_{0.75} = $ 6
$ \widetilde{x}_{N_{max}} = $ 6
$ \widetilde{x}_{O_{max}} = $ 6
$ \overline{x} = $ 5.77898
$ \overline{x}_{IQ} = $ 6
$ \overline{x}_{R} = $ 6
$ \overline{x}_{H^{-1}} = $ 5.66886
$ \overline{x}_{H^{2}} = $ 5.82203
$ \overline{x}_{H^{3}} = $ 5.85897
Dispersion
$ s^{2}_{x} = $ 0.499586
$ s_{x} = $ 0.706814
$ v_{x} = $ 0.122308
$ R = $ 6
$ R_{IQ} = $ 0
Shape
$ S_{YP} = $ -0.938099
$ S_{M} = $ -1.51124
$ \gamma = $ 2.49863
Rank
$ s^{2}_{rk_{x}} = $ 764092
$ s_{rk_{x}} = $ 874.124
$ v_{rk_{x}} = $ 0.410965
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 7
$ x_{D_{PDF}} = $ 5.85
All nodes in a connected component are reachable from each other.
ConnectedComponents
Properties
$ n = $ 1
$ f_{BC} = $ nan
$ r = $ nan
$ C = $ nan
Location
$ x_{(1)} = $ 4253
$ x_{(n)} = $ 4253
$ \widetilde{x}_{O_{min}} = $ 4253
$ \widetilde{x}_{N_{min}} = $ 4253
$ \widetilde{x}_{0.25} = $ 4253
$ \widetilde{x}_{Med} = $ 4253
$ \widetilde{x}_{0.75} = $ 4253
$ \widetilde{x}_{N_{max}} = $ 4253
$ \widetilde{x}_{O_{max}} = $ 4253
$ \overline{x} = $ 4253
$ \overline{x}_{IQ} = $ 4253
$ \overline{x}_{R} = $ 4253
$ \overline{x}_{H^{-1}} = $ 4253
$ \overline{x}_{H^{2}} = $ 4253
$ \overline{x}_{H^{3}} = $ 4253
Dispersion
$ s^{2}_{x} = $ nan
$ s_{x} = $ nan
$ v_{x} = $ nan
$ R = $ 0
$ R_{IQ} = $ 0
Shape
$ S_{YP} = $ nan
$ S_{M} = $ nan
$ \gamma = $ nan
Rank
$ s^{2}_{rk_{x}} = $ nan
$ s_{rk_{x}} = $ nan
$ v_{rk_{x}} = $ nan
Binning
$ k_{PDF} = $ 1
$ k_{CDF} = $ 1
$ x_{D_{PDF}} = $ 4253
G#65
1000
4993
0.009996
False
False
0
N/A
N/A
1
Node centrality index which ranks nodes by their degree.
DegreeCentrality
Properties
$ n = $ 1000
$ f_{BC} = $ 1.001
$ r = $ 0.0166332
$ C = $ 0.0101252
Location
$ x_{(1)} = $ 2
$ x_{(n)} = $ 20
$ \widetilde{x}_{O_{min}} = $ 2
$ \widetilde{x}_{N_{min}} = $ 2
$ \widetilde{x}_{0.25} = $ 8
$ \widetilde{x}_{Med} = $ 10
$ \widetilde{x}_{0.75} = $ 12
$ \widetilde{x}_{N_{max}} = $ 18
$ \widetilde{x}_{O_{max}} = $ 20
$ \overline{x} = $ 9.986
$ \overline{x}_{IQ} = $ 9.822
$ \overline{x}_{R} = $ 11
$ \overline{x}_{H^{-1}} = $ 8.92911
$ \overline{x}_{H^{2}} = $ 10.4564
$ \overline{x}_{H^{3}} = $ 10.8958
Dispersion
$ s^{2}_{x} = $ 9.62543
$ s_{x} = $ 3.10249
$ v_{x} = $ 0.310684
$ R = $ 18
$ R_{IQ} = $ 4
Shape
$ S_{YP} = $ -0.0135375
$ S_{M} = $ 0.323073
$ \gamma = $ -0.141412
Rank
$ s^{2}_{rk_{x}} = $ 82586.9
$ s_{rk_{x}} = $ 287.379
$ v_{rk_{x}} = $ 0.574184
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 19
$ x_{D_{PDF}} = $ 9.65
All nodes in a connected component are reachable from each other.
ConnectedComponents
Properties
$ n = $ 1
$ f_{BC} = $ nan
$ r = $ nan
$ C = $ nan
Location
$ x_{(1)} = $ 1000
$ x_{(n)} = $ 1000
$ \widetilde{x}_{O_{min}} = $ 1000
$ \widetilde{x}_{N_{min}} = $ 1000
$ \widetilde{x}_{0.25} = $ 1000
$ \widetilde{x}_{Med} = $ 1000
$ \widetilde{x}_{0.75} = $ 1000
$ \widetilde{x}_{N_{max}} = $ 1000
$ \widetilde{x}_{O_{max}} = $ 1000
$ \overline{x} = $ 1000
$ \overline{x}_{IQ} = $ 1000
$ \overline{x}_{R} = $ 1000
$ \overline{x}_{H^{-1}} = $ 1000
$ \overline{x}_{H^{2}} = $ 1000
$ \overline{x}_{H^{3}} = $ 1000
Dispersion
$ s^{2}_{x} = $ nan
$ s_{x} = $ nan
$ v_{x} = $ nan
$ R = $ 0
$ R_{IQ} = $ 0
Shape
$ S_{YP} = $ nan
$ S_{M} = $ nan
$ \gamma = $ nan
Rank
$ s^{2}_{rk_{x}} = $ nan
$ s_{rk_{x}} = $ nan
$ v_{rk_{x}} = $ nan
Binning
$ k_{PDF} = $ 1
$ k_{CDF} = $ 1
$ x_{D_{PDF}} = $ 1000
MIT8
6440
251252
0.0121181
False
False
0
N/A
N/A
18
Node centrality index which ranks nodes by their degree.
DegreeCentrality
Properties
$ n = $ 6440
$ f_{BC} = $ 1.00016
$ r = $ 0.12006
$ C = $ 0.099037
Location
$ x_{(1)} = $ 1
$ x_{(n)} = $ 708
$ \widetilde{x}_{O_{min}} = $ 1
$ \widetilde{x}_{N_{min}} = $ 1
$ \widetilde{x}_{0.25} = $ 19
$ \widetilde{x}_{Med} = $ 56
$ \widetilde{x}_{0.75} = $ 112
$ \widetilde{x}_{N_{max}} = $ 251
$ \widetilde{x}_{O_{max}} = $ 391
$ \overline{x} = $ 78.0286
$ \overline{x}_{IQ} = $ 58.9543
$ \overline{x}_{R} = $ 354.5
$ \overline{x}_{H^{-1}} = $ 10.3075
$ \overline{x}_{H^{2}} = $ 111.034
$ \overline{x}_{H^{3}} = $ 142.569
Dispersion
$ s^{2}_{x} = $ 6241.03
$ s_{x} = $ 79.0002
$ v_{x} = $ 1.01245
$ R = $ 707
$ R_{IQ} = $ 93
Shape
$ S_{YP} = $ 0.836526
$ S_{M} = $ 1.95132
$ \gamma = $ 6.25905
Rank
$ s^{2}_{rk_{x}} = $ 3.4561e+06
$ s_{rk_{x}} = $ 1859.06
$ v_{rk_{x}} = $ 0.577258
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 166
$ x_{D_{PDF}} = $ 18.675
All nodes in a connected component are reachable from each other.
ConnectedComponents
Properties
$ n = $ 18
$ f_{BC} = $ 1.05882
$ r = $ nan
$ C = $ nan
Location
$ x_{(1)} = $ 2
$ x_{(n)} = $ 6402
$ \widetilde{x}_{O_{min}} = $ 2
$ \widetilde{x}_{N_{min}} = $ 2
$ \widetilde{x}_{0.25} = $ 2
$ \widetilde{x}_{Med} = $ 2
$ \widetilde{x}_{0.75} = $ 2
$ \widetilde{x}_{N_{max}} = $ 2
$ \widetilde{x}_{O_{max}} = $ 2
$ \overline{x} = $ 357.778
$ \overline{x}_{IQ} = $ 2
$ \overline{x}_{R} = $ 3202
$ \overline{x}_{H^{-1}} = $ 2.27364
$ \overline{x}_{H^{2}} = $ 1508.97
$ \overline{x}_{H^{3}} = $ 2442.82
Dispersion
$ s^{2}_{x} = $ 2.27539e+06
$ s_{x} = $ 1508.44
$ v_{x} = $ 4.21613
$ R = $ 6400
$ R_{IQ} = $ 0
Shape
$ S_{YP} = $ 0.707575
$ S_{M} = $ 3.56172
$ \gamma = $ 11.3241
Rank
$ s^{2}_{rk_{x}} = $ 15.0882
$ s_{rk_{x}} = $ 3.88436
$ v_{rk_{x}} = $ 0.40888
Binning
$ k_{PDF} = $ 5
$ k_{CDF} = $ 2
$ x_{D_{PDF}} = $ 642
G#75
6402
251230
0.0122613
False
False
0
N/A
N/A
1
Node centrality index which ranks nodes by their degree.
DegreeCentrality
Properties
$ n = $ 6440
$ f_{BC} = $ 1.00016
$ r = $ 0.119904
$ C = $ 0.0995672
Location
$ x_{(1)} = $ 0
$ x_{(n)} = $ 708
$ \widetilde{x}_{O_{min}} = $ 0
$ \widetilde{x}_{N_{min}} = $ 0
$ \widetilde{x}_{0.25} = $ 19
$ \widetilde{x}_{Med} = $ 56
$ \widetilde{x}_{0.75} = $ 112
$ \widetilde{x}_{N_{max}} = $ 251
$ \widetilde{x}_{O_{max}} = $ 391
$ \overline{x} = $ 78.0217
$ \overline{x}_{IQ} = $ 58.9543
$ \overline{x}_{R} = $ 354
$ \overline{x}_{H^{-1}} = $ nan
$ \overline{x}_{H^{2}} = $ 111.034
$ \overline{x}_{H^{3}} = $ 142.569
Dispersion
$ s^{2}_{x} = $ 6242.08
$ s_{x} = $ 79.0069
$ v_{x} = $ 1.01263
$ R = $ 708
$ R_{IQ} = $ 93
Shape
$ S_{YP} = $ 0.836196
$ S_{M} = $ 1.95083
$ \gamma = $ 6.25691
Rank
$ s^{2}_{rk_{x}} = $ 3.45621e+06
$ s_{rk_{x}} = $ 1859.09
$ v_{rk_{x}} = $ 0.577267
Binning
$ k_{PDF} = $ 20
$ k_{CDF} = $ 169
$ x_{D_{PDF}} = $ 17.7
All nodes in a connected component are reachable from each other.
ConnectedComponents
Properties
$ n = $ 1
$ f_{BC} = $ nan
$ r = $ nan
$ C = $ nan
Location
$ x_{(1)} = $ 6402
$ x_{(n)} = $ 6402
$ \widetilde{x}_{O_{min}} = $ 6402
$ \widetilde{x}_{N_{min}} = $ 6402
$ \widetilde{x}_{0.25} = $ 6402
$ \widetilde{x}_{Med} = $ 6402
$ \widetilde{x}_{0.75} = $ 6402
$ \widetilde{x}_{N_{max}} = $ 6402
$ \widetilde{x}_{O_{max}} = $ 6402
$ \overline{x} = $ 6402
$ \overline{x}_{IQ} = $ 6402
$ \overline{x}_{R} = $ 6402
$ \overline{x}_{H^{-1}} = $ 6402
$ \overline{x}_{H^{2}} = $ 6402
$ \overline{x}_{H^{3}} = $ 6402
Dispersion
$ s^{2}_{x} = $ nan
$ s_{x} = $ nan
$ v_{x} = $ nan
$ R = $ 0
$ R_{IQ} = $ 0
Shape
$ S_{YP} = $ nan
$ S_{M} = $ nan
$ \gamma = $ nan
Rank
$ s^{2}_{rk_{x}} = $ nan
$ s_{rk_{x}} = $ nan
$ v_{rk_{x}} = $ nan
Binning
$ k_{PDF} = $ 1
$ k_{CDF} = $ 1
$ x_{D_{PDF}} = $ 6402
unweighted
distance from  233  to  471 :  3.0
Diameter of graph  b'Graph(name=G#65, n=1000, m=4993)' :  <_NetworKit.Diameter object at 0x10fd6d3c8>
unweighted
distance from  995  to  1403 :  21.0
Diameter of graph  b'Graph(name=airfoil1, n=4253, m=12289)' :  <_NetworKit.Diameter object at 0x10fd6da20>
unweighted
distance from  4481  to  4355 :  2.0
Diameter of graph  b'Graph(name=G#75, n=6402, m=251230)' :  <_NetworKit.Diameter object at 0x10fd6da58>

When you answered all questions, proceed with Tutorial #3.