# Centralities

In this section, I'm going to learn how centrality works and try to interpret the results on a small real dataset. I'm using the Facebook dataset from SNAP: https://snap.stanford.edu/data/egonets-Facebook.html. The data is included in this repository for easier access, and is in edge-list format (source, target).

I'm going to use NetworkX, igraph, and graph-tool to compute all the centralities.
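The SNAP file is a plain whitespace-separated edge list, one node pair per line. A minimal Python 3 parsing sketch; the `sample` string below stands in for the file contents, and the filename `facebook_combined.txt` is an assumption about the SNAP download:

```python
# Minimal edge-list parsing sketch; `sample` stands in for the contents
# of the SNAP file (assumed to be facebook_combined.txt).
sample = "0 1\n0 2\n1 2\n"
edges = [tuple(map(int, line.split())) for line in sample.splitlines()]
print(edges)  # [(0, 1), (0, 2), (1, 2)]
```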

``````

In [4]:

%matplotlib inline
import networkx as nx
import matplotlib.pyplot as plt
import operator
import timeit

``````
``````

In [4]:

## load the graph from the included edge list
## (filename assumed to match the SNAP download)
g_fb = nx.read_edgelist('facebook_combined.txt', nodetype=int)
``````
``````

In [6]:

print nx.info(g_fb)

``````
``````

Name:
Type: Graph
Number of nodes: 4039
Number of edges: 88234
Average degree:  43.6910

``````
``````

In [7]:

print nx.is_directed(g_fb)

``````
``````

False

``````

Now let's find the celebrities. The most basic centrality is degree centrality: the number of edges incident to a node (for a directed graph, the sum of in-degree and out-degree), normalized by the maximum possible degree, n - 1.

``````

In [12]:

dg_centrality = nx.degree_centrality(g_fb)
sorted_dg_centrality = sorted(dg_centrality.items(), key=operator.itemgetter(1), reverse=True)
sorted_dg_centrality[:10]

``````
``````

Out[12]:

[(107, 0.258791480931154),
(1684, 0.1961367013372957),
(1912, 0.18697374938088163),
(3437, 0.13546310054482416),
(0, 0.08593363051015354),
(2543, 0.07280832095096582),
(2347, 0.07206537890044576),
(1888, 0.0629024269440317),
(1800, 0.06067360079247152),
(1663, 0.058197127290737984)]

``````

We can see that node 107 has the highest degree centrality, which means it has the most connections. We can verify this by getting the degree of node 107 to see how many friends it has.

``````

In [13]:

nx.degree(g_fb, [107])

``````
``````

Out[13]:

{107: 1045}

``````

Node 107 has 1045 friends. Note that NetworkX normalizes by n - 1, so dividing by the total number of nodes, as below, gives a value that is close to, but not exactly, the degree centrality reported above.

``````

In [14]:

float(nx.degree(g_fb, [107]).values()[0]) / g_fb.number_of_nodes()

``````
``````

Out[14]:

0.25872740777420156

``````
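To make the n - 1 normalization concrete, here is a from-scratch Python 3 sketch on a hypothetical four-node graph, no library needed:

```python
# Degree centrality by hand on a hypothetical 4-node graph:
# degree / (n - 1), the same normalization NetworkX uses.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
n = 4
degree = {v: 0 for v in range(n)}
for u, v in edges:
    degree[u] += 1
    degree[v] += 1
centrality = {v: d / (n - 1) for v, d in degree.items()}
print(centrality)  # node 0 is connected to everyone: 3/3 = 1.0
```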

Degree centrality may be the easiest number to calculate, but it only counts direct connections, which in a real social network may not be very informative: you might have a million followers, but if the majority of them are bots, the number isn't telling you anything new.

Now let's try betweenness centrality, which counts the shortest paths passing through each node. If many shortest paths go through you, you can be considered a bridge for your entire network.

Nodes with high betweenness are important in communication and information diffusion.
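As a toy illustration of the idea (not the algorithm NetworkX uses), we can count shortest paths by hand on a five-node path graph 0-1-2-3-4, where every pair has a unique shortest path:

```python
from itertools import combinations

# Toy betweenness on a 5-node path graph: every pair (s, t) has a
# unique shortest path (the nodes between them), so a node's raw
# betweenness is just the number of pairs whose path crosses it.
n = 5
betweenness = {v: 0 for v in range(n)}
for s, t in combinations(range(n), 2):
    for v in range(s + 1, t):        # interior nodes of the unique path
        betweenness[v] += 1
print(betweenness)  # the middle node carries the most shortest paths
```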

We will use multiprocessing so we can parallelize the computation and distribute the load.

``````

In [15]:

from multiprocessing import Pool
import itertools

``````
``````

In [16]:

def partitions(nodes, n):
    "Partitions the nodes into subsets of size n"
    nodes_iter = iter(nodes)
    while True:
        partition = tuple(itertools.islice(nodes_iter, n))
        if not partition:
            return
        yield partition

``````
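To see what `partitions` produces, here is a standalone Python 3 restatement of it applied to a small range (same behavior, no new logic):

```python
from itertools import islice

# Standalone restatement of partitions(): slice an iterable into
# consecutive tuples of size n (the last one may be shorter).
def partitions(nodes, n):
    nodes_iter = iter(nodes)
    while True:
        partition = tuple(islice(nodes_iter, n))
        if not partition:
            return
        yield partition

print(list(partitions(range(7), 3)))  # [(0, 1, 2), (3, 4, 5), (6,)]
```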
``````

In [17]:

def btwn_pool(G_tuple):
    return nx.betweenness_centrality_source(*G_tuple)

``````
``````

In [18]:

def between_parallel(G, processes=None):
    p = Pool(processes=processes)
    part_generator = 4*len(p._pool)
    node_partitions = list(partitions(G.nodes(), int(len(G)/part_generator)))
    num_partitions = len(node_partitions)

    bet_map = p.map(btwn_pool,
                    zip([G]*num_partitions,
                        [True]*num_partitions,
                        [None]*num_partitions,
                        node_partitions))

    bt_c = bet_map[0]
    for bt in bet_map[1:]:
        for n in bt:
            bt_c[n] += bt[n]
    return bt_c

``````

Let's try it with multiple processes.

``````

In [20]:

start = timeit.default_timer()
bt = between_parallel(g_fb)
stop = timeit.default_timer()
top = 10

max_nodes = sorted(bt.iteritems(), key=lambda v: -v[1])[:top]
bt_values = [5]*len(g_fb.nodes())
bt_colors = [0]*len(g_fb.nodes())
for max_key, max_val in max_nodes:
    bt_values[max_key] = 150
    bt_colors[max_key] = 2

print 'It takes {} seconds to finish'.format(stop - start)
print max_nodes

``````
``````

It takes 152.425301075 seconds to finish
[(107, 0.4805180785560146), (1684, 0.33779744973019543), (3437, 0.23611535735892838), (1912, 0.22929533958687434), (1085, 0.14901509211665437), (0, 0.1463059214744287), (698, 0.11533045020561006), (567, 0.09631033121856328), (58, 0.08436020590796522), (428, 0.06430906239323834)]

``````

Now let's try with just one process.

``````

In [22]:

start = timeit.default_timer()
bt = nx.betweenness_centrality(g_fb)
stop = timeit.default_timer()
top = 10

max_nodes = sorted(bt.iteritems(), key=lambda v: -v[1])[:top]
bt_values = [5]*len(g_fb.nodes())
bt_colors = [0]*len(g_fb.nodes())
for max_key, max_val in max_nodes:
    bt_values[max_key] = 150
    bt_colors[max_key] = 2

print 'It takes {} seconds to finish'.format(stop - start)
print max_nodes

``````
``````

It takes 283.723123074 seconds to finish
[(107, 0.48051807855601475), (1684, 0.3377974497302), (3437, 0.23611535735892797), (1912, 0.2292953395868785), (1085, 0.14901509211665198), (0, 0.14630592147442847), (698, 0.11533045020560787), (567, 0.09631033121856226), (58, 0.08436020590796657), (428, 0.0643090623932386)]

``````

# Page rank

We're going to try the PageRank algorithm. This is essentially Google's PageRank, which uses incoming links to determine a node's "popularity".
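The underlying idea can be sketched as a plain power iteration with a damping factor on a small hypothetical directed graph; NetworkX's implementation is more careful about convergence and dangling nodes:

```python
# Hand-rolled PageRank power iteration on a hypothetical 4-node
# directed graph; the damping factor and iteration count are arbitrary.
links = {0: [1], 1: [2], 2: [0], 3: [2]}  # 3 links to 2, nobody links to 3
n = len(links)
d = 0.85
pr = {v: 1.0 / n for v in links}
for _ in range(100):
    new = {v: (1 - d) / n for v in links}
    for v, outs in links.items():
        for w in outs:
            new[w] += d * pr[v] / len(outs)
    pr = new
print(pr)  # node 2 ranks highest, node 3 lowest
```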

``````

In [24]:

g_fb_pr = nx.pagerank(g_fb)

``````
``````

In [25]:

top = 10
max_pagerank = sorted(g_fb_pr.iteritems(), key = lambda v: -v[1])[:top]
max_pagerank

``````
``````

Out[25]:

[(3437, 0.007614586844749602),
(107, 0.006936420955866113),
(1684, 0.006367162138306825),
(0, 0.006289602618466542),
(1912, 0.0038769716008844957),
(348, 0.002348096972780577),
(686, 0.0022193592598000193),
(3980, 0.0021703235790099928),
(414, 0.0018002990470702264),
(698, 0.0013171153138368812)]

``````

We can see that the ranking has changed: by PageRank, node 3437 is more popular than node 107.

# Who is a "Gray Cardinal"

There's another metric for measuring the most influential nodes: eigenvector centrality. Put simply, if you're well connected to a lot of important people, then you're important, and most influential, as well.
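The definition can be sketched as a plain power iteration on a small hypothetical undirected graph: each node's score is repeatedly replaced by the sum of its neighbors' scores, then rescaled:

```python
# Power-iteration sketch of eigenvector centrality on a hypothetical
# undirected graph: a node's score is proportional to the sum of its
# neighbors' scores; rescale by the max each round.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
x = {v: 1.0 for v in adj}
for _ in range(100):
    new = {v: sum(x[u] for u in adj[v]) for v in adj}
    norm = max(new.values())
    x = {v: s / norm for v, s in new.items()}
print(x)  # hub node 0 scores highest; leaf node 3 lowest
```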

``````

In [26]:

g_fb_eg = nx.eigenvector_centrality(g_fb)

``````
``````

In [27]:

top = 10
max_eg = sorted(g_fb_eg.iteritems(), key = lambda v: -v[1])[:top]
max_eg

``````
``````

Out[27]:

[(1912, 0.09540688873596533),
(2266, 0.08698328226321961),
(2206, 0.08605240174265634),
(2233, 0.08517341350597848),
(2464, 0.0842787836468596),
(2142, 0.08419312450068117),
(2218, 0.08415574433673877),
(2078, 0.08413617905810125),
(2123, 0.08367142125897375),
(1993, 0.08353243711860492)]

``````

Now we get quite a different result. Node 1912 is connected to more important people across the entire network, which makes it more influential than the rest.

# iGraph with SNAP Facebook Dataset

NetworkX is easy to install and great to start with. However, since it's written in pure Python, it's quite slow. I'm going to try igraph, whose core is written in C. I'm hoping it yields the same results, but faster.

``````

In [5]:

from igraph import *
import timeit

``````
``````

In [6]:

## load the edge list with igraph
## (filename assumed to match the SNAP download)
igraph_fb = Graph.Read_Edgelist('facebook_combined.txt', directed=False)
``````
``````

In [7]:

print igraph_fb.summary()

``````
``````

IGRAPH U--- 4039 88234 --

``````

# Betweenness

``````

In [8]:

def betweenness_centralization(G):
    vnum = G.vcount()
    if vnum < 3:
        raise ValueError("graph must have at least three vertices")
    denom = (vnum-1)*(vnum-2)

    temparr = [2*i/denom for i in G.betweenness()]
    return temparr

``````
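The 2/((n-1)(n-2)) factor is the undirected normalization: igraph returns raw shortest-path counts, and a node can lie on at most (n-1)(n-2)/2 pairs' shortest paths. A quick check on a star graph, whose center reaches that maximum:

```python
# igraph betweenness is an unnormalized pair count; an undirected node
# can lie on at most (n-1)(n-2)/2 shortest paths. The star-graph
# center reaches that bound, so 2 * raw / ((n-1)(n-2)) gives 1.0.
n = 6
raw_center = (n - 1) * (n - 2) / 2   # every pair of leaves routes via the center
denom = (n - 1) * (n - 2)
normalized = 2 * raw_center / denom
print(normalized)  # 1.0
```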
``````

In [9]:

start = timeit.default_timer()
igraph_betweenness = betweenness_centralization(igraph_fb)
stop = timeit.default_timer()
print 'It takes {} seconds to finish'.format(stop - start)

``````
``````

It takes 3.41120696068 seconds to finish

``````
``````

In [10]:

igraph_betweenness.sort(reverse=True)
print igraph_betweenness[:10]

``````
``````

[0.4805180785560146, 0.3377974497302, 0.23611535735892794, 0.22929533958687848, 0.14901509211665237, 0.14630592147442847, 0.11533045020560788, 0.09631033121856228, 0.08436020590796657, 0.0643090623932386]

``````

# Closeness

``````

In [11]:

start = timeit.default_timer()
igraph_closeness = igraph_fb.closeness()
stop = timeit.default_timer()
print 'It takes {} seconds to finish'.format(stop - start)

``````
``````

It takes 3.21295285225 seconds to finish

``````
``````

In [12]:

igraph_closeness.sort(reverse=True)
print igraph_closeness[:10]

``````
``````

[0.45969945355191255, 0.3974018305284913, 0.3948371956585509, 0.3939127889961955, 0.39360561458231796, 0.37049270575282134, 0.36991572004397216, 0.3698479575013739, 0.3695433330282786, 0.36655773420479304]

``````

# Eigenvector Centrality

``````

In [13]:

start = timeit.default_timer()
igraph_eg = igraph_fb.evcent()
stop = timeit.default_timer()
print 'It takes {} seconds to finish'.format(stop - start)

``````
``````

It takes 0.0219130516052 seconds to finish

``````
``````

In [14]:

igraph_eg.sort(reverse=True)
print igraph_eg[:10]

``````
``````

[1.0, 0.9117190175727714, 0.901962632218424, 0.8927488203018591, 0.8833723691600294, 0.8824740496738581, 0.8820826938160473, 0.8818776189568721, 0.8770062766705519, 0.8755494858467869]

``````

# PageRank

``````

In [15]:

start = timeit.default_timer()
igraph_pr = igraph_fb.pagerank()
stop = timeit.default_timer()
print 'It takes {} seconds to finish'.format(stop - start)

``````
``````

It takes 0.0559718608856 seconds to finish

``````
``````

In [16]:

igraph_pr.sort(reverse=True)
print igraph_pr[:10]

``````
``````

[0.0075745665246258935, 0.006888375869731625, 0.0063084887921967805, 0.0062246948047383005, 0.0038165503710367713, 0.002317366308283276, 0.002216791818397391, 0.002156551114912813, 0.0017822888082657136, 0.00129416751155367]

``````

We can see that igraph yields results similar to NetworkX's, but a lot more quickly on the same machine.

# Graph_tool with SNAP Facebook Dataset

I'm going to try another library, which is supposed to be even faster than NetworkX and igraph. graph-tool is also C-based, and it can be built with OpenMP support so that many of its algorithms run in parallel.

``````

In [1]:

import sys

from graph_tool.all import *
import timeit

``````
``````

In [2]:

show_config()

``````
``````

version: 2.2.44 (commit 178add3a, Thu Jul 2 01:44:54 2015 +0200)
gcc version: 4.2.1
compilation flags:  -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I/usr/local/include -I/usr/local/lib/python2.7/site-packages/numpy/core/include -I/usr/local/lib/python2.7/site-packages/scipy -I/usr/local/Cellar/google-sparsehash/2.0.2/include  -DSPARSEHASH_PREFIX=sparsehash -Wall -Wextra -ftemplate-backtrace-limit=0  -DNDEBUG -std=gnu++11 -ftemplate-depth-250 -Wno-deprecated -Wno-unknown-pragmas -fvisibility=default -fvisibility-inlines-hidden -Wno-unknown-pragmas -I/usr/local/Cellar/google-sparsehash/2.0.2/include  -DSPARSEHASH_PREFIX=sparsehash  -undefined dynamic_lookup
install prefix: /usr/local/Cellar/graph-tool/2.2.44
python dir: /Library/Python/2.7/site-packages
graph filtering: True
openmp: False
uname: Darwin Noppanits-MacBook-Air.local 14.5.0 Darwin Kernel Version 14.5.0: Wed Jul 29 02:26:53 PDT 2015; root:xnu-2782.40.9~1/RELEASE_X86_64 x86_64

``````
``````

In [3]:

graph_tool_fb = Graph(directed=False)

``````
``````

In [4]:

## read the edge list and add it to the graph
## (filename assumed to match the SNAP download;
## add_edge_list creates vertices as needed)
edge_list = []
with open('facebook_combined.txt') as f:
    for line in f:
        source, target = line.split()
        edge_list.append((int(source), int(target)))
graph_tool_fb.add_edge_list(edge_list)

``````
``````

In [5]:

print graph_tool_fb.num_vertices()
print graph_tool_fb.num_edges()

``````
``````

4039
88234

``````

## Betweenness

``````

In [6]:

start = timeit.default_timer()
vertext_betweenness, edge_betweenness = betweenness(graph_tool_fb)
stop = timeit.default_timer()
print 'It takes {} seconds to finish'.format(stop - start)

``````
``````

It takes 36.7536098957 seconds to finish

``````
``````

In [7]:

vertext_betweenness.a[107]

``````
``````

Out[7]:

0.48051807855601453

``````

## Closeness

``````

In [8]:

start = timeit.default_timer()
v_closeness = closeness(graph_tool_fb)
stop = timeit.default_timer()
print 'It takes {} seconds to finish'.format(stop - start)

``````
``````

It takes 33.3668859005 seconds to finish

``````
``````

In [9]:

v_closeness.a[107]

``````
``````

Out[9]:

0.45969945355191255

``````

## Eigenvector

``````

In [11]:

start = timeit.default_timer()
eig_value, v_eigen = eigenvector(graph_tool_fb)
stop = timeit.default_timer()
print 'It takes {} seconds to finish'.format(stop - start)

``````
``````

It takes 0.131196975708 seconds to finish

``````

## Page Rank

``````

In [13]:

start = timeit.default_timer()
v_pr = pagerank(graph_tool_fb)
stop = timeit.default_timer()
print 'It takes {} seconds to finish'.format(stop - start)

``````
``````

It takes 0.163510084152 seconds to finish

``````

# Information diffusion modelling

I'm going to use an information diffusion model to simulate how information travels through the graph.

``````

In [14]:

%matplotlib inline
import random as r
import networkx as nx
import matplotlib.pyplot as plot

``````
``````

``````
``````

In [15]:

class Person(object):
    def __init__(self, id):
        self.id = id
        self.i = r.random()
        self.a = self.i
        # we value initial opinion and subsequent information equally
        self.alpha = 0.8

    def __str__(self):
        return str(self.id)

    def step(self):
        # loop through the neighbors and aggregate their preferences
        neighbors = g[self]
        # all nodes in the list of neighbors are equally weighted, including self
        w = 1/float(len(neighbors) + 1)
        s = w * self.a
        for node in neighbors:
            s += w * node.a

        # update my beliefs = initial belief plus sum of all influences
        self.a = (1 - self.alpha) * self.i + self.alpha * s

``````
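The update in `step` is a damped averaging: new attitude = (1 - alpha) * initial opinion + alpha * (equal-weight mean over self and neighbors). A standalone Python 3 check for two mutually connected agents (the opinions 0 and 1 and alpha = 0.8 are arbitrary choices) shows the rule settles at a compromise:

```python
# Standalone version of the update in Person.step for two mutually
# connected agents; alpha and the initial opinions are arbitrary.
alpha = 0.8
i0, i1 = 0.0, 1.0         # fixed initial opinions
a0, a1 = i0, i1
for _ in range(200):
    mean = (a0 + a1) / 2  # equal weights over self + one neighbor
    a0 = (1 - alpha) * i0 + alpha * mean
    a1 = (1 - alpha) * i1 + alpha * mean
print(a0, a1)  # settles near 0.4 and 0.6
```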
``````

In [16]:

density = 0.9
g = nx.Graph()

## create a network of Person objects
for i in range(10):
    p = Person(i)
    g.add_node(p)

## this will be a simple random graph, every pair of nodes has an
## equal probability of connection
for x in g.nodes():
    for y in g.nodes():
        if r.random() <= density and x != y:
            g.add_edge(x, y)

## draw the resulting graph and color the nodes by their value
col = [n.a for n in g.nodes()]
pos = nx.spring_layout(g)
nx.draw_networkx(g, pos=pos, node_color=col)

``````
``````

``````
``````

In [17]:

## repeat for 30 time periods
for i in range(30):
    ## iterate through all nodes in the network and tell them to make a step
    for node in g.nodes():
        node.step()

    ## collect new attitude data, print it to the terminal and plot it.
    col = [n.a for n in g.nodes()]
    print col
    plot.plot(col)

``````
``````

[0.45050753135154914, 0.5888690787037506, 0.4423961312885849, 0.5300498998080747, 0.6988731674943652, 0.6795418820824117, 0.6785718311317842, 0.568814202159901, 0.5706430294584026, 0.49632118526941715]
[0.4855296157010143, 0.5856804164323358, 0.47093717928779294, 0.5244620739653998, 0.6496396099696642, 0.6411899282308859, 0.6484393994615059, 0.5720352467769754, 0.5728158036583522, 0.514839066871444]
[0.4852388874069484, 0.5800425201917674, 0.46682024795791915, 0.5160964412252317, 0.6378977198782583, 0.6329660942199506, 0.6430044848703405, 0.5710302747024006, 0.5712191949768373, 0.5141570563368826]
[0.4817926221791012, 0.5759778764271721, 0.46267707497376387, 0.5119459265022431, 0.6338082132037018, 0.6296889831451993, 0.6402899659669313, 0.5689477848037309, 0.568903307620578, 0.5118553740331476]
[0.47917210885991474, 0.5733724448077611, 0.4600881540183551, 0.509553410753053, 0.6315479888743104, 0.6276208808481694, 0.6383507073752878, 0.5671342971746759, 0.5670689867837366, 0.5100571093106294]
[0.47738726861490655, 0.5716494776372223, 0.4584098871501169, 0.5079763369938789, 0.6300398411813045, 0.6261814021492542, 0.6369663172075363, 0.5658001804905268, 0.5657674458613378, 0.5087969382812318]
[0.47616720799791845, 0.5704749917622336, 0.45727213655018917, 0.5068916613716337, 0.6289959891374665, 0.6251763111703803, 0.6359968245014774, 0.5648619722374798, 0.5648629462562351, 0.507924323127787]
[0.4753256958472638, 0.5696643249358213, 0.45648750404727034, 0.5061396670832817, 0.6282711588427362, 0.6244775014439462, 0.6353228787405164, 0.564209549012505, 0.5642360059552106, 0.5073198878271796]
[0.4747430265725202, 0.569102723893816, 0.45594394476769473, 0.5056179796082445, 0.6277681965172944, 0.6239925673757833, 0.6348553075540804, 0.5637569372186323, 0.5638013941480882, 0.5069009002741897]
[0.47433910911214694, 0.5687133388005052, 0.45556705072545256, 0.5052561432242426, 0.6274193474962474, 0.6236562377410398, 0.6345310482695293, 0.5634430652325614, 0.5635000447609191, 0.5066103790087721]
[0.4740590268561978, 0.5684433196406642, 0.4553056879672762, 0.5050052110408993, 0.6271774256084398, 0.6234230028728354, 0.634306189046178, 0.5632254120963035, 0.5632910785418146, 0.5064089190652907]
[0.4738648033004549, 0.5682560722155623, 0.45512444230147703, 0.5048311982196966, 0.6270096622168863, 0.6232612646097078, 0.6341502596739543, 0.5630744803151793, 0.5631461705667189, 0.5062692157825318]
[0.4737301176939477, 0.5681262239966826, 0.4549987558804413, 0.5047105276775358, 0.6268933256171756, 0.6231491063314711, 0.6340421296775939, 0.5629698159902224, 0.5630456833506607, 0.5061723376902101]
[0.47363671908283966, 0.5680361798861583, 0.4549115977773571, 0.5046268479381407, 0.6268126513137963, 0.6230713295275687, 0.6339671463105065, 0.5628972359187272, 0.56297599988634, 0.5061051569804691]
[0.47357195123592, 0.5679737382403129, 0.45485115745041627, 0.5045688197186189, 0.6267567072365211, 0.6230173947394265, 0.6339151486461881, 0.5628469048340012, 0.5629276774650982, 0.506058570108148]
[0.47352703757816206, 0.5679304377018448, 0.454809244738982, 0.5045285797000933, 0.6267179124793222, 0.6229799933356248, 0.633879090552071, 0.5628120024393705, 0.5628941679894082, 0.5060262641593287]
[0.47349589193029284, 0.5679004106815524, 0.4547801801158474, 0.5045006750162676, 0.626691010020662, 0.6229540571061962, 0.6338540858487557, 0.5627877991627975, 0.5628709306408046, 0.5060038614037401]
[0.473474293792463, 0.5678795882628218, 0.4547600250772673, 0.5044813243446339, 0.6266723543481089, 0.6229360714711223, 0.6338367461860795, 0.5627710152543307, 0.5628548165609253, 0.505988326077442]
[0.47345931643322287, 0.5678651488306864, 0.4547460484439353, 0.5044679055044214, 0.6266594174592056, 0.6229235992237557, 0.6338247218921751, 0.5627593763515408, 0.5628436421548206, 0.5059775530115085]
[0.4734489302929201, 0.5678551357191866, 0.45473635626306397, 0.5044586001285595, 0.6266504462953916, 0.6229149502684235, 0.6338163835728791, 0.5627513052840428, 0.5628358931953572, 0.5059700823638134]
[0.47344172796114287, 0.5678481920664878, 0.45472963516164944, 0.5044521472594167, 0.6266442251876138, 0.6229089525980652, 0.6338106013149579, 0.5627457083534964, 0.5628305196325749, 0.5059649017980091]
[0.47343673346063375, 0.5678433769485494, 0.45472497437324444, 0.5044476724783019, 0.6266399111228184, 0.6229047934772991, 0.6338065915732485, 0.5627418271281754, 0.56282679330298, 0.5059613093033355]
[0.47343326999406377, 0.5678400378759074, 0.4547217423213503, 0.5044445694142877, 0.6266369195085074, 0.6229019093098619, 0.6338038109936212, 0.5627391356690145, 0.5628242092571969, 0.5059588180661874]
[0.4734308682322342, 0.5678377223757991, 0.45471950103542663, 0.5044424175757889, 0.6266348449554792, 0.6228999092665177, 0.6338018827838697, 0.5627372692603893, 0.5628224173350278, 0.5059570905027274]
[0.4734292027159746, 0.5678361166782517, 0.45471794680215405, 0.5044409253703741, 0.6266334063441409, 0.6228985223243704, 0.6338005456551616, 0.5627359749880457, 0.5628211747157675, 0.5059558925134029]
[0.4734280477536563, 0.567835003197581, 0.4547168690096958, 0.5044398905915087, 0.626632408730412, 0.6228975605409541, 0.6337996184151761, 0.5627350774670652, 0.5628203130139298, 0.5059550617604639]
[0.4734272468380549, 0.5678342310476787, 0.45471612160800046, 0.5044391730178491, 0.626631716929141, 0.6228968935864301, 0.6337989754150027, 0.5627344550757155, 0.5628197154615813, 0.5059544856698164]
[0.47342669143828964, 0.5678336955956613, 0.45471560331777666, 0.5044386754120571, 0.6266312371953676, 0.6228964310827836, 0.6337985295226456, 0.5627340234746718, 0.5628193010852929, 0.5059540861762744]
[0.4734263062929649, 0.567833324283223, 0.4547152439062878, 0.5044383303443126, 0.6266309045210825, 0.6228961103568342, 0.6337982203158951, 0.5627337241783191, 0.5628190137335509, 0.5059538091450627]
[0.4734260392116112, 0.5678330667943682, 0.45471499467021825, 0.5044380910549993, 0.6266306738260885, 0.6228958879474856, 0.6337980058946049, 0.5627335166294539, 0.5628188144677484, 0.5059536170360944]

``````
``````

In [18]:

class Influencer(Person):
    def __init__(self, id):
        self.id = id
        self.i = r.random()
        self.a = 1  ## opinion is strong and immovable

    def step(self):
        pass

``````
``````

In [19]:

influencers = 2
connections = 4
## add the influencers to the network and connect each to a few other nodes
for i in range(influencers):
    inf = Influencer("Inf" + str(i))
    g.add_node(inf)
    for x in range(connections):
        ## the original connection line was lost; linking each
        ## influencer to a random node is one plausible reconstruction
        g.add_edge(inf, r.choice(g.nodes()))

## repeat for 30 time periods
for i in range(30):
    ## iterate through all nodes in the network and tell them to make a step
    for node in g.nodes():
        node.step()

    ## collect new attitude data, print it to the terminal and plot it.
    col = [n.a for n in g.nodes()]
    #print col
    plot.plot(col)

``````
``````

``````

``````

In [20]:

import copy
import networkx as nx
import random

``````
``````

In [28]:

def independent_cascade(G, seeds, steps=0):
    """Return the active nodes of each diffusion step by the independent
    cascade model.

    Parameters
    ----------
    G : graph
        A NetworkX graph
    seeds : list of nodes
        The seed nodes for diffusion
    steps : integer
        The number of steps to diffuse. If steps <= 0, the diffusion runs
        until no more nodes can be activated. If steps > 0, the diffusion
        runs for at most "steps" rounds

    Returns
    -------
    layer_i_nodes : list of list of activated nodes
        layer_i_nodes[0]: the seeds
        layer_i_nodes[k]: the nodes activated at the kth diffusion step

    Notes
    -----
    When node v in G becomes active, it has a *single* chance of activating
    each currently inactive neighbor w with probability p_{vw}

    Examples
    --------
    >>> DG = nx.DiGraph()
    >>> DG.add_edges_from([(1, 2), (1, 3), (1, 5), (2, 1), (3, 2), (4, 2), (4, 3), \
    ...                    (4, 6), (5, 3), (5, 4), (5, 6), (6, 4), (6, 5)], act_prob=0.2)
    >>> H = independent_cascade(DG, [6])

    References
    ----------
    [1] David Kempe, Jon Kleinberg, and Eva Tardos.
        Influential nodes in a diffusion model for social networks.
        In Automata, Languages and Programming, 2005.
    """
    if type(G) == nx.MultiGraph or type(G) == nx.MultiDiGraph:
        raise Exception(
            "independent_cascade() is not defined for graphs with multiedges.")

    # make sure the seeds are in the graph
    for s in seeds:
        if s not in G.nodes():
            raise Exception("seed", s, "is not in graph")

    # change to directed graph
    if not G.is_directed():
        DG = G.to_directed()
    else:
        DG = copy.deepcopy(G)

    # init activation probabilities
    for e in DG.edges():
        if 'act_prob' not in DG[e[0]][e[1]]:
            DG[e[0]][e[1]]['act_prob'] = 0.1
        elif DG[e[0]][e[1]]['act_prob'] > 1:
            raise Exception("edge activation probability:",
                            DG[e[0]][e[1]]['act_prob'], "cannot be larger than 1")

    # perform diffusion
    A = copy.deepcopy(seeds)  # prevent side effect
    if steps <= 0:
        # perform diffusion until no more nodes can be activated
        return _diffuse_all(DG, A)
    # perform diffusion for at most "steps" rounds
    return _diffuse_k_rounds(DG, A, steps)

``````
``````

In [29]:

def _diffuse_all(G, A):
    tried_edges = set()
    layer_i_nodes = []
    layer_i_nodes.append([i for i in A])  # prevent side effect
    while True:
        len_old = len(A)
        (A, activated_nodes_of_this_round, cur_tried_edges) = \
            _diffuse_one_round(G, A, tried_edges)
        layer_i_nodes.append(activated_nodes_of_this_round)
        tried_edges = tried_edges.union(cur_tried_edges)
        if len(A) == len_old:
            break
    return layer_i_nodes

``````
``````

In [30]:

def _diffuse_k_rounds(G, A, steps):
    tried_edges = set()
    layer_i_nodes = []
    layer_i_nodes.append([i for i in A])
    while steps > 0 and len(A) < len(G):
        len_old = len(A)
        (A, activated_nodes_of_this_round, cur_tried_edges) = \
            _diffuse_one_round(G, A, tried_edges)
        layer_i_nodes.append(activated_nodes_of_this_round)
        tried_edges = tried_edges.union(cur_tried_edges)
        if len(A) == len_old:
            break
        steps -= 1
    return layer_i_nodes

``````
``````

In [31]:

def _diffuse_one_round(G, A, tried_edges):
    activated_nodes_of_this_round = set()
    cur_tried_edges = set()
    for s in A:
        for nb in G.successors(s):
            if nb in A or (s, nb) in tried_edges or (s, nb) in cur_tried_edges:
                continue
            if _prop_success(G, s, nb):
                activated_nodes_of_this_round.add(nb)
            cur_tried_edges.add((s, nb))
    activated_nodes_of_this_round = list(activated_nodes_of_this_round)
    A.extend(activated_nodes_of_this_round)
    return A, activated_nodes_of_this_round, cur_tried_edges

def _prop_success(G, src, dest):
    return random.random() <= G[src][dest]['act_prob']

``````
``````

In [32]:

run_times = 10

## build the example graph from the docstring
## (the edge list here is reconstructed from the docstring example)
G = nx.DiGraph()
G.add_edges_from([(1, 2), (1, 3), (1, 5), (2, 1), (3, 2), (4, 2), (4, 3),
                  (4, 6), (5, 3), (5, 4), (5, 6), (6, 4), (6, 5)], act_prob=0.2)
nx.draw_networkx(G)

``````
``````

``````
``````

In [33]:

## run one cascade from seed 1 for a single step
## (the original input cell was lost; this call is a reconstruction)
A = independent_cascade(G, [1], steps=1)
A
``````
``````

Out[33]:

[[1], []]

``````
``````

In [45]:

n_A = 0.0
for i in range(run_times):
    ## re-run the cascade each time (reconstructed; the outputs below
    ## show a fresh cascade per iteration)
    A = independent_cascade(G, [1], steps=1)
    print A
    for layer in A:
        n_A += len(layer)
n_A / run_times
#assert_almost_equal(n_A / run_times, 1.7, places=1)

``````
``````

[[1], []]
[[1], [2]]
[[1], [2]]
[[1], []]
[[1], [2, 3]]
[[1], [2, 3]]
[[1], []]
[[1], []]
[[1], []]
[[1], []]

Out[45]:

1.6

``````
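The 1.6 average above matches a back-of-envelope calculation: seed 1 has three successors (2, 3, and 5), each activated independently with act_prob = 0.2, so the expected active-set size after one step is 1 + 3 * 0.2 = 1.6. A seeded Monte Carlo sketch in Python 3:

```python
import random

# Seed 1's successors in the example graph are {2, 3, 5}; with
# act_prob = 0.2 the expected one-step active-set size is
# 1 + 3 * 0.2 = 1.6. Seeded Monte Carlo estimate:
random.seed(0)
p = 0.2
successors = [2, 3, 5]
runs = 100000
total = 0
for _ in range(runs):
    total += 1 + sum(1 for _ in successors if random.random() <= p)
estimate = total / runs
print(estimate)  # close to 1.6
```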

# Conclusion

Both igraph and graph-tool yield similar performance, so for the real data I'm going to use graph-tool, as its documentation is more comprehensive.