# Tutorial "Algorithmic Methods for Network Analysis with NetworKit" (Part 2)

### Basic Network Overview in NetworKit

Such basic features of a network as those computed in Q&A Session #1 can also be displayed easily with one of NetworKit's convenience functions (before we can show them we have to import the Python module networkit again and change to the correct folder):



In :

from networkit import *




In :

cd ~/Documents/workspace/NetworKit




/Users/Henning/Documents/workspace/NetworKit




In :

%matplotlib inline




In :

pf = profiling.Profile.create(G, preset="minimal")
pf.show()




PGPgiantcompo
10680
24316
0.000426403
False
False
0
N/A
N/A
1

Node centrality index which ranks nodes by their degree.
DegreeCentrality

Properties

$n =$

10680

$f_{BC} =$

1.00009

$r =$

0.239188

$C =$

0.0187782

Location

$x_{(1)} =$

1

$x_{(n)} =$

205

$\widetilde{x}_{O_{min}} =$

1

$\widetilde{x}_{N_{min}} =$

1

$\widetilde{x}_{0.25} =$

1

$\widetilde{x}_{Med} =$

2

$\widetilde{x}_{0.75} =$

4

$\widetilde{x}_{N_{max}} =$

8

$\widetilde{x}_{O_{max}} =$

13

$\overline{x} =$

4.55356

$\overline{x}_{IQ} =$

2.15281

$\overline{x}_{R} =$

103

$\overline{x}_{H^{-1}} =$

1.75165

$\overline{x}_{H^{2}} =$

9.27234

$\overline{x}_{H^{3}} =$

16.4642

Dispersion

$s^{2}_{x} =$

65.2474

$s_{x} =$

8.07759

$v_{x} =$

1.77391

$R =$

204

$R_{IQ} =$

3

Shape

$S_{YP} =$

0.948386

$S_{M} =$

6.59782

$\gamma =$

81.4638

Rank

$s^{2}_{rk_{x}} =$

8.83549e+06

$s_{rk_{x}} =$

2972.46

$v_{rk_{x}} =$

0.556588

Binning

$k_{PDF} =$

20

$k_{CDF} =$

83

$x_{D_{PDF}} =$

6.1

All nodes in a connected component are reachable from each other.
ConnectedComponents

Properties

$n =$

1

$f_{BC} =$

nan

$r =$

nan

$C =$

nan

Location

$x_{(1)} =$

10680

$x_{(n)} =$

10680

$\widetilde{x}_{O_{min}} =$

10680

$\widetilde{x}_{N_{min}} =$

10680

$\widetilde{x}_{0.25} =$

10680

$\widetilde{x}_{Med} =$

10680

$\widetilde{x}_{0.75} =$

10680

$\widetilde{x}_{N_{max}} =$

10680

$\widetilde{x}_{O_{max}} =$

10680

$\overline{x} =$

10680

$\overline{x}_{IQ} =$

10680

$\overline{x}_{R} =$

10680

$\overline{x}_{H^{-1}} =$

10680

$\overline{x}_{H^{2}} =$

10680

$\overline{x}_{H^{3}} =$

10680

Dispersion

$s^{2}_{x} =$

nan

$s_{x} =$

nan

$v_{x} =$

nan

$R =$

0

$R_{IQ} =$

0

Shape

$S_{YP} =$

nan

$S_{M} =$

nan

$\gamma =$

nan

Rank

$s^{2}_{rk_{x}} =$

nan

$s_{rk_{x}} =$

nan

$v_{rk_{x}} =$

nan

Binning

$k_{PDF} =$

1

$k_{CDF} =$

1

$x_{D_{PDF}} =$

10680

NetworKit_pageEmbed("NetworKit_Page_e9eb0f087cc364f8_0");



Now we know how to access the most fundamental features of a network. Some properties that could be regarded as fundamental are missing because their calculation would take too long for larger graphs.

## Eulerian Cycles

Before we look at different network types, let us reflect on the distant roots of network analysis. Graph theory is one of its ancestors and has been around for several centuries now. As an example, think of Euler's problem from 1736 of finding a 'Eulerian tour/cycle' over the Pregel bridges in Königsberg.

### Q&A session #2

1. Do you remember how the problem of deciding wheter a graph has a Eulerian cycle can be solved for arbitrary undirected graphs? If not, use the web to find helpful characterizations of graphs with Eulerian cycles. Write it down here. Answer:

2. Then enter code below to decide whether a graph has such a cycle or not. Insert code in next cell

3. Test it on the PGP graph! What is the result? Does the result meet your expectations? Answer:



In :

# 2-1) An undirected graph is Eulerian iff all its vertices have even degree

# 2-2) and 2-3) Decide whether graph is Eulerian or not
eulerian = True
for v in G.nodes():
if (G.degree(v) % 2 != 0):
eulerian = False
break

if (eulerian == True):
print("2-3) Graph is Eulerian")
else:
print("2-3) Graph is NOT Eulerian")




2-3) Graph is NOT Eulerian



## Differences between network types

As indicated by the previous task, graph theory and graph algorithms have been research foci for quite some time. In comparison, complex networks and their analysis have become a focus of investigation only recently. The reason for our interest in complex networks is their similarity to many real-world phenomena such as social interactions, web graphs, food webs, protein interactions and so forth.

But what makes a network complex or not complex?

Well, complex networks have 'non-trivial' topological features. Let us explore this statement in more detail with the help of data. To this end, we look at a social network, a technical mesh and an Erdös-Renyi random graph.



In :

# Load/generate 3 graphs of different types
gen = generators.ErdosRenyiGenerator(1000, 0.01)
er1000 = gen.generate()



Some context on these networks is given below, first for MIT8. It stems from a larger collection of Facebook networks from the early days of the online social network. MI8 models

"Facebook friendships at 100 US universities at some time in 2005, as well as a number of node attributes such as dorm, gender, graduation year, and academic major. The data was apparently provided directly by Facebook. (...) It does not include the names of individual or even of any of the node attributes (they have been given integer ids)." http://sociograph.blogspot.de/2011/03/facebook100-data-and-parser-for-it.html

The airfoil1 graph is a mesh that stems from a two-dimensional numerical simulation where one is interested in the airflow around an airplane wing. Meshes are usually easy to visualize, two-dimensional meshes are even planar in most (if not all reasonable) cases. The picture below illustrates this. The colors signify different vertex blocks, which we can ignore here.



In :

from IPython.core.display import Image
Image('input/airfoil1-10p.png')




Out:



The third network is a random graph generated according to the Erdös-Renyi $G(n, p)$ model. This model has been analyzed theoretically over the last 50 years or so. As we will see, however, it deviates dramatically from real networks in important aspects.

### Giant Connected Component

Some types of realistic networks such as social ones usually have more than one connected component. However, even if there is more than one connected component, there is usually only one big one. Let us take a closer look at these giant components in particular, the differences of the three networks and other interesting properties in general.

### Q&A Session #3

1. Print the NetworKit overview for each of the three graphs! Answer:

2. For those graphs with more than one connected component, extract the largest connected component $C$ and continue working with $C$. Enter the code for this in the cell below this one.

3. What are the most striking topological differences between the three graphs in terms of the analytics kernels presented in the lecture? Answer:

4. Pick an arbitrary pair of vertices $(u, v)$ in each network. Since we work with connected graphs, the two nodes are connected. What is the smallest number of hops a virus needs to make in the network to reach $v$ if it starts at $u$? What if $u$ and $v$ are chosen as worst case pair? Answer:

5. What do you think: Why are complex networks more difficult to work with, what makes them 'complex'? Answer:



In :

# Code for Q&A Session #3
# 3-1)
pf = profiling.Profile.create(airf1, preset="minimal")
pf.show()

pf = profiling.Profile.create(er1000, preset="minimal")
pf.show()

pf = profiling.Profile.create(mit8, preset="minimal")
pf.show()

# 3-2) extract largest connected component
mit8lc = workflows.extractLargestComponent(mit8)
pf = profiling.Profile.create(mit8lc, preset="minimal")
pf.show()

# for comparison:

# 3-3) Look at diameter, clustering coefficients, k-core decomposition, degree distribution

# 3-4)
for g in {mit8lc, airf1, er1000}:
u = g.randomNode()
v = g.randomNode()
if g.isWeighted():
print("weighted")
dijkstra = graph.Dijkstra(g, u)
dijkstra.run()
dists = dijkstra.getDistances()
else:
print("unweighted")
bfs = graph.BFS(g, u)
bfs.run()
dists = bfs.getDistances()

print("distance from ", u, " to ", v, ": ", dists[v])
print("Diameter of graph ", g.toString(), ": ", distance.Diameter(g))

# 3-5)
# Low diameter: $k$-hop exploration visits large parts of the graph even for small $k$, cache inefficiency
# Degree distribution: load imbalance in parallel computations; may require more flexible data structure
# Visualization: Traditional visualization tools often produce 'hairballs'




airfoil1
4253
12289
0.00135912
False
False
0
N/A
N/A
1

Node centrality index which ranks nodes by their degree.
DegreeCentrality