Centrality

This evaluates the Eigenvector Centrality and PageRank implemented in Python against C++-native EVZ and PageRank. The Python implementation uses SciPy (and thus ARPACK) to compute the eigenvectors, while the C++ method implements a power iteration method itself.


In [1]:
cd ../../


/amd.home/home/maxv/workspace/hiwi/NetworKit

In [2]:
import networkit


Update to Python >=3.4 recommended - support for < 3.4 may be discontinued in the future
 WARNING: module 'sklearn' not found, supervised link prediction won't be available 

In [3]:
G = networkit.graphio.readGraph("input/celegans_metabolic.graph", networkit.Format.METIS)

First, we just compute the Python EVZ and display a sample. The "scores()" method returns a list of centrality scores in order of the vertices. Thus, what you see below are the (normalized, see the respective argument) centrality scores for G.nodes()[0], G.nodes()[1], ...


In [4]:
evzSciPy = networkit.centrality.SciPyEVZ(G, normalized=True)
evzSciPy.run()
evzSciPy.scores()[:10]


Out[4]:
[0.038890947542671417,
 0.025204387816348928,
 0.030099862584687796,
 0.023008066197884643,
 0.015014306929265246,
 0.047450338825435673,
 0.03877379800819393,
 0.00018201165419890557,
 0.0043027361546468185,
 0.019820501996514393]

We now take a look at the 10 most central vertices according to the four heuristics. Here, the centrality algorithms offer the ranking() method that returns a list of (vertex, centrality) ordered by centrality.


In [5]:
evzSciPy.ranking()[:10]


Out[5]:
[(185, 0.37998920394223168),
 (146, 0.25589844844192655),
 (407, 0.25299992743917177),
 (144, 0.21952034822779035),
 (204, 0.18014173124747021),
 (230, 0.17037022146635697),
 (227, 0.15343976623211927),
 (226, 0.15343976623211925),
 (152, 0.15256017345576342),
 (425, 0.13620265458341316)]

Compute the EVZ using the C++ backend and also display the 10 most important vertices, just as above. This should hopefully look similar...

Please note: The normalization argument may not be passed as a named argument to the C++-backed centrality measures. This is due to some limitation in the C++ wrapping code.


In [6]:
evz = networkit.centrality.EigenvectorCentrality(G, True)
evz.run()
evz.ranking()[:10]


Out[6]:
[(185, 0.4555629557898871),
 (146, 0.280011306168103),
 (407, 0.26733105380714245),
 (144, 0.2382197258062029),
 (204, 0.17660126272043924),
 (227, 0.16982894291110087),
 (226, 0.16982894291110087),
 (230, 0.16960898722498557),
 (152, 0.14877108011932902),
 (425, 0.1398532276524453)]

Now, let's take a look at the PageRank. First, compute the PageRank using the C++ backend and display the 10 most important vertices. The second argument to the algorithm is the dampening factor, i.e. the probability that a random walk just stops at a vertex and instead teleports to some other vertex.


In [7]:
pageRank = networkit.centrality.PageRank(G, 0.95, True)
pageRank.run()
pageRank.ranking()[:10]


Out[7]:
[(185, 0.08967935808573076),
 (146, 0.036293663539769275),
 (144, 0.031391049006808995),
 (407, 0.02996667806766083),
 (227, 0.021190293055750178),
 (226, 0.021190293055750178),
 (425, 0.019907206754720692),
 (152, 0.017614662527267698),
 (230, 0.016888616220117537),
 (228, 0.01624170578197208)]

Same in Python...


In [8]:
SciPyPageRank = networkit.centrality.SciPyPageRank(G, 0.95, normalized=True)
SciPyPageRank.run()
SciPyPageRank.ranking()[:10]


Out[8]:
[(185, 0.054799391769678088),
 (146, 0.021805099201351385),
 (351, 0.020445516424587442),
 (407, 0.020286826697537379),
 (144, 0.017857147931302247),
 (227, 0.017629500247607024),
 (226, 0.017360862294165981),
 (154, 0.016264063508127456),
 (152, 0.013996872812656708),
 (425, 0.012831154843169931)]

If everything went well, these should look similar, too.

Finally, we take a look at the relative differences between the computed centralities for the vertices:


In [9]:
differences = [(max(x[0], x[1]) / min(x[0], x[1])) - 1 for x in zip(evz.scores(), evzSciPy.scores())]
print("Average relative difference: {}".format(sum(differences) / len(differences)))
print("Maximum relative difference: {}".format(max(differences)))


Average relative difference: 0.22655120802802436
Maximum relative difference: 4.118201311862968

In [ ]: