Tutorial "Algorithmic Methods for Network Analysis with NetworKit" (Part 2)



In [1]:

    
from networkit import *



In [2]:

    
%matplotlib inline



In [3]:

    
cd ~/Documents/workspace/NetworKit









    



/Users/Henning/Documents/workspace/NetworKit



In [4]:

    
G = readGraph("input/PGPgiantcompo.graph", Format.METIS)

Eulerian Cycles

Before we look at different network types, let us reflect on the distant roots of network analysis. Graph theory is one of its ancestors and has been around for several centuries now. As an example, think of Euler's problem from 1736 of finding a 'Eulerian tour/cycle' over the Pregel bridges in Königsberg.

Q&A session #2

Do you remember how the problem of deciding wheter a graph has a Eulerian cycle can be solved for arbitrary undirected graphs? If not, use the web to find helpful characterizations of graphs with Eulerian cycles. Write it down here. Answer:
Then enter code below to decide whether a graph has such a cycle or not. Insert code in next cell
Test it on the PGP graph! What is the result? Does the result meet your expectations? Answer:



In [5]:

    
# 2-2) and 2-3) Decide whether graph is Eulerian or not

Differences between network types

As indicated by the previous task, graph theory and graph algorithms have been research foci for quite some time. In comparison, complex networks and their analysis have become a focus of investigation only recently. The reason for our interest in complex networks is their similarity to many real-world phenomena such as social interactions, web graphs, food webs, protein interactions and so forth.

But what makes a network complex or not complex?

Well, complex networks have 'non-trivial' topological features. Let us explore this statement in more detail with the help of data. To this end, we look at a social network, a technical mesh and an Erdös-Renyi random graph.



In [6]:

    
# Load/generate 3 graphs of different types
mit8 = readGraph("input/MIT8.edgelist", Format.EdgeListTabZero)
airf1 = readGraph("input/airfoil1.graph", Format.METIS)
gen = generators.ErdosRenyiGenerator(1000, 0.01)
er1000 = gen.generate()

Some context on these networks is given below, first for MIT8. It stems from a larger collection of Facebook networks from the early days of the online social network. MI8 models

"Facebook friendships at 100 US universities at some time in 2005, as well as a number of node attributes such as dorm, gender, graduation year, and academic major. The data was apparently provided directly by Facebook. (...) It does not include the names of individual or even of any of the node attributes (they have been given integer ids)." http://sociograph.blogspot.de/2011/03/facebook100-data-and-parser-for-it.html

The airfoil1 graph is a mesh that stems from a two-dimensional numerical simulation where one is interested in the airflow around an airplane wing. Meshes are usually easy to visualize, two-dimensional meshes are even planar in most (if not all reasonable) cases. The picture below illustrates this. The colors signify different vertex blocks, which we can ignore here.



In [7]:

    
from IPython.core.display import Image
Image('input/airfoil1-10p.png')









    Out[7]:

The third network is a random graph generated according to the Erdös-Renyi $G(n, p)$ model. This model has been analyzed theoretically over the last 50 years or so. As we will see, however, it deviates dramatically from real networks in important aspects.

Giant Connected Component

Some types of realistic networks such as social ones usually have more than one connected component. However, even if there is more than one connected component, there is usually only one big one. Let us take a closer look at these giant components in particular, the differences of the three networks and other interesting properties in general.

Q&A Session #3

Print the NetworKit overview for each of the three graphs! Answer:
For those graphs with more than one connected component, extract the largest connected component $C$ and continue working with $C$. Enter the code for this in the cell below this one.
What are the most striking topological differences between the three graphs in terms of the analytics kernels presented in the lecture? Answer:
Pick an arbitrary pair of vertices $(u, v)$ in each network. Since we work with connected graphs, the two nodes are connected. What is the smallest number of hops a virus needs to make in the network to reach $v$ if it starts at $u$? What if $u$ and $v$ are chosen as worst case pair? Answer:
What do you think: Why are complex networks more difficult to work with, what makes them 'complex'? Answer:
Which parts of your answer to (5) concern theoretical analysis, which concern computational aspects? Answer:



In [8]:

    
# Code for Q&A Session #3
# 3-2) extract largest connected component

When you answered all questions, proceed with Tutorial #3.