This notebook evaluates the eigenvector centrality (EVZ) and PageRank implementations written in Python against the native C++ EVZ and PageRank. The Python implementations use SciPy (and thus ARPACK) to compute the eigenvectors, while the C++ implementations use a power iteration of their own.
Please note: This notebook requires the pandas package. If you do not have it installed, you can use the "Centrality" notebook instead - but this one will look much nicer, so install pandas. ;)
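For reference, the power iteration used on the C++ side boils down to repeatedly multiplying a vector with the adjacency matrix and renormalizing until it stabilizes. The following standalone sketch illustrates the idea on a toy matrix; it is not NetworKit's actual implementation, and the tolerance, the 2-norm normalization and the uniform start vector are assumptions made here purely for illustration.
In [ ]:
import numpy as np

def power_iteration(A, tol=1e-9, max_iter=1000):
    # Approximate the dominant eigenvector of a non-negative matrix A.
    # Illustrative sketch only, not NetworKit's C++ implementation.
    x = np.ones(A.shape[0]) / A.shape[0]   # uniform start vector (assumption)
    for _ in range(max_iter):
        y = A @ x                          # multiply by the matrix
        y /= np.linalg.norm(y)             # renormalize (2-norm assumed here)
        if np.linalg.norm(y - x) < tol:    # stop once the vector stabilizes
            return y
        x = y
    return x

# Toy example: adjacency matrix of a triangle (0-1-2) with a pendant vertex 3.
A = np.array([[0., 1., 1., 0.],
              [1., 0., 1., 0.],
              [1., 1., 0., 1.],
              [0., 0., 1., 0.]])
print(power_iteration(A))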
In [1]:
cd ../../
In [2]:
import networkit
import pandas as pd
import random as rd
In [3]:
G = networkit.graphio.readGraph("input/celegans_metabolic.graph", networkit.Format.METIS)
First, we just compute the Python EVZ and display a sample. The scores() method returns a list of centrality scores in vertex order. Thus, what you see below are the (normalized, see the respective constructor argument) centrality scores for G.nodes()[0], G.nodes()[1], ...
In [4]:
evzSciPy = networkit.centrality.SciPyEVZ(G, normalized=True)
evzSciPy.run()
scoresTableEVZ = pd.DataFrame({"Python EVZ": evzSciPy.scores()[:10]})
scoresTableEVZ
Out[4]:
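Since scores() only returns the values in node order, you can attach the node IDs explicitly by zipping the scores with G.nodes(). This small sketch simply restates the ordering described above:
In [ ]:
# Pair every node ID with its Python EVZ score; the first ten pairs
# correspond to the rows of the table above.
list(zip(G.nodes(), evzSciPy.scores()))[:10]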
We now take a look at the 10 most central vertices according to the four algorithms (Python and C++ EVZ, Python and C++ PageRank). Here, the centrality algorithms offer the ranking() method, which returns a list of (vertex, centrality) pairs ordered by decreasing centrality. We first compute the remaining values...
In [5]:
evz = networkit.centrality.EigenvectorCentrality(G, True)  # native C++ EVZ
evz.run()
pageRank = networkit.centrality.PageRank(G, 0.95)  # native C++ PageRank, damping factor 0.95
pageRank.run()
pageRankSciPy = networkit.centrality.SciPyPageRank(G, 0.95, normalized=True)  # SciPy-based PageRank
pageRankSciPy.run()
Out[5]:
... then display them. What you will see is a list of the 10 most important vertices and their respective centralities according to the C++ and Python versions of eigenvector centrality:
In [6]:
rankTableEVZ = pd.DataFrame({"Python EVZ": evzSciPy.ranking()[:10], "C++ EVZ": evz.ranking()[:10]})
rankTableEVZ
Out[6]:
If everything went well, the two columns should look at least similar. Now we do the same for PageRank instead of EVZ:
In [7]:
rankTablePR = pd.DataFrame({"Python PageRank": pageRankSciPy.ranking()[:10], "C++ PageRank": pageRank.ranking()[:10]})
rankTablePR
Out[7]:
If everything went well, these should look similar, too.
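To put a rough number on the visual comparison, one can count how many node IDs the two top-10 lists have in common. This is only a sketch: it ignores the order within the top 10 and simply compares the sets of vertices taken from the (vertex, centrality) pairs that ranking() returns.
In [ ]:
# Extract the node IDs from the (vertex, centrality) pairs and intersect them.
topEvzPython = {node for node, _ in evzSciPy.ranking()[:10]}
topEvzCpp = {node for node, _ in evz.ranking()[:10]}
topPrPython = {node for node, _ in pageRankSciPy.ranking()[:10]}
topPrCpp = {node for node, _ in pageRank.ranking()[:10]}
print("Shared top-10 vertices (EVZ): {}".format(len(topEvzPython & topEvzCpp)))
print("Shared top-10 vertices (PageRank): {}".format(len(topPrPython & topPrCpp)))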
To make sure that not only the top-scoring vertices agree between the two implementations, here is a comparison for 10 randomly selected vertices:
In [8]:
vertices = rd.sample(G.nodes(), 10)  # draw 10 random node IDs
randTableEVZ = pd.DataFrame({"Python EVZ": evzSciPy.scores(), "C++ EVZ": evz.scores()})
randTableEVZ.loc[vertices]  # show only the sampled vertices
Out[8]:
Finally, we take a look at the relative differences between the EVZ scores computed by the two implementations:
In [9]:
# Relative difference per vertex: (larger score / smaller score) - 1
differences = [(max(x[0], x[1]) / min(x[0], x[1])) - 1 for x in zip(evz.scores(), evzSciPy.scores())]
print("Average relative difference: {}".format(sum(differences) / len(differences)))
print("Maximum relative difference: {}".format(max(differences)))
In [ ]: