Centrality

This evaluates the Eigenvector Centrality and PageRank implemented in Python against C++-native EVZ and PageRank. The Python implementation uses SciPy (and thus ARPACK) to compute the eigenvectors, while the C++ method implements a power iteration method itself.

Please note: This notebook requires the pandas package. If you do not have it installed, you can use the "Centrality" notebook instead - but this one will look much nicer, so install pandas. ;)


In [1]:
cd ../../


/amd.home/home/maxv/workspace/hiwi/NetworKit

In [2]:
import networkit
import pandas as pd
import random as rd


Update to Python >=3.4 recommended - support for < 3.4 may be discontinued in the future
 WARNING: module 'sklearn' not found, supervised link prediction won't be available 

In [3]:
G = networkit.graphio.readGraph("input/celegans_metabolic.graph", networkit.Format.METIS)

First, we just compute the Python EVZ and display a sample. The "scores()" method returns a list of centrality scores in order of the vertices. Thus, what you see below are the (normalized, see the respective argument) centrality scores for G.nodes()[0], G.nodes()[1], ...


In [4]:
evzSciPy = networkit.centrality.SciPyEVZ(G, normalized=True)
evzSciPy.run()
scoresTableEVZ = pd.DataFrame({"Python EVZ": evzSciPy.scores()[:10]})
scoresTableEVZ


Out[4]:
Python EVZ
0 0.038891
1 0.025204
2 0.030100
3 0.023008
4 0.015014
5 0.047450
6 0.038774
7 0.000182
8 0.004303
9 0.019821

We now take a look at the 10 most central vertices according to the four heuristics. Here, the centrality algorithms offer the ranking() method that returns a list of (vertex, centrality) ordered by centrality. We first compute the remaining values...


In [5]:
evz = networkit.centrality.EigenvectorCentrality(G, True)
evz.run()

pageRank = networkit.centrality.PageRank(G, 0.95)
pageRank.run()

pageRankSciPy = networkit.centrality.SciPyPageRank(G, 0.95, normalized=True)
pageRankSciPy.run()


Out[5]:
<networkit.centrality.SciPyPageRank at 0x7fe677a444d0>

... then display it. What you will see is a list of the 10 most important vertices and their respective centralities according to the C++ / Python version of eigenvector centrality:


In [6]:
rankTableEVZ = pd.DataFrame({"Python EVZ": evzSciPy.ranking()[:10], "C++ EVZ": evz.ranking()[:10]})
rankTableEVZ


Out[6]:
C++ EVZ Python EVZ
0 (185, 0.45556295578988704) (185, 0.379989203942)
1 (146, 0.28001130616810294) (146, 0.255898448442)
2 (407, 0.2673310538071424) (407, 0.252999927439)
3 (144, 0.23821972580620288) (144, 0.219520348228)
4 (204, 0.17660126272043922) (204, 0.180141731247)
5 (227, 0.16982894291110084) (230, 0.170370221466)
6 (226, 0.16982894291110084) (227, 0.153439766232)
7 (230, 0.16960898722498555) (226, 0.153439766232)
8 (152, 0.14877108011932902) (152, 0.152560173456)
9 (425, 0.1398532276524453) (425, 0.136202654583)

If everything went well, this should look at least similar. Now we do the same for the PageRank instead of EVZ:


In [7]:
rankTablePR = pd.DataFrame({"Python PageRank": pageRankSciPy.ranking()[:10], "C++ PageRank": pageRank.ranking()[:10]})
rankTablePR


Out[7]:
C++ PageRank Python PageRank
0 (185, 0.05754868914312292) (185, 0.0547993917697)
1 (146, 0.028745245322451197) (146, 0.0218050992014)
2 (407, 0.025288709544780706) (351, 0.0204455164246)
3 (144, 0.024359216733545457) (407, 0.0202868266975)
4 (226, 0.01741634101618819) (144, 0.0178571479313)
5 (227, 0.01741634101618819) (227, 0.0176295002476)
6 (152, 0.015082439274723199) (226, 0.0173608622942)
7 (425, 0.014952542568116086) (154, 0.0162640635081)
8 (204, 0.014704749200614788) (152, 0.0139968728127)
9 (230, 0.01423493042741351) (425, 0.0128311548432)

If everything went well, these should look similar, too.

To make sure that not just the top scoring vertices are comparable in both implementations, here are 10 randomly selected vertices in comparison:


In [8]:
vertices = rd.sample(G.nodes(), 10)
randTableEVZ = pd.DataFrame({"Python EVZ": evzSciPy.scores(), "C++ EVZ": evz.scores()})
randTableEVZ.loc[vertices]


Out[8]:
C++ EVZ Python EVZ
227 0.169829 0.153440
19 0.000706 0.000195
358 0.013178 0.016253
424 0.018395 0.022353
241 0.001555 0.001056
181 0.016049 0.019338
170 0.018214 0.017727
101 0.020514 0.021625
389 0.034151 0.040678
184 0.038353 0.048094

Finally, we take a look at the relative differences between the computed centralities for the vertices:


In [9]:
differences = [(max(x[0], x[1]) / min(x[0], x[1])) - 1 for x in zip(evz.scores(), evzSciPy.scores())]
print("Average relative difference: {}".format(sum(differences) / len(differences)))
print("Maximum relative difference: {}".format(max(differences)))


Average relative difference: 0.22655120802802445
Maximum relative difference: 4.1182013118629675

In [ ]: