Structural Analysis and Visualization of Networks

Home Assignment #2: Network models

Student: *Nazarov Ivan*


General Information

Due Date: 18.02.2015 23:59 <br > Late submission policy: -0.2 points per day <br >

Please send your reports to mailto:leonid.e.zhukov@gmail.com and mailto:shestakoffandrey@gmail.com with message subject of the following structure:<br > [HSE Networks 2015] Nazarov Ivan HA2

Support your computations with figures and comments. <br > If you are using IPython Notebook you may use this file as a starting point of your report.<br >

<hr >

Problem statement

Task 1

Consider the Barabasi-Albert dynamical growth model. The two main ingredients of this model are network growth and preferential attachment. Implement two restricted B&A-based models:

Model A
Lack of preferential attachment: the network keeps growing, but at each time step the new node forms its edges uniformly at random.

Model B
Lack of growth: the total number of nodes is fixed; at each time step a node is chosen at random and forms edges with preferential attachment.

Task 2

Consider the following "Vertex copying model" of growing network.

At every time step a random vertex from already existing vertices is selected and duplicated together with all edges, such that every edge of the vertex

  • is copied with probability $q$
  • is rewired to any other randomly selected vertex with probability $1-q$

Starting state is defined by some small number of randomly connected vertices.

The model can generate both directed and undirected networks.

Objectives

  1. Generate networks according to the models above ($N > 1000$ nodes)
  2. Compute the CDF/PDF, describe the distribution and compute/describe its properties.
  3. Illustrate the following dependencies:
    • average path length vs. the number of nodes
    • average clustering coefficient vs. the number of nodes
    • average node degree vs. the node's "age"
  4. Is scale-free property conserved in these models?

Analyse the results with respect to various parameter settings.



Solution

Preamble

We shall include all the necessary modules here, before any analysis.


In [ ]:
## As usual, attach the numpy and the networkx modules
import networkx as nx
import numpy as np
import numpy.random as rnd
import numpy.matlib as ml
from scipy.stats import chisquare

import matplotlib.pyplot as plt
%matplotlib inline

## Use direct HTML output capabilities of iPython
from IPython.display import HTML

## Ignore those pesky warnings
import warnings
warnings.filterwarnings( 'ignore' )

The original model

First and foremost, but solely for reference purposes, let's state the Barabasi-Albert dynamical growth model in its original form (Barabasi, Albert (1999)).

The evolution of the network is represented by a sequence of graphs ${(G_n)}_{n\geq0}$ with $G_n = \big(V_n, E_n\big)$. The process starts with a graph $G_0$ of $m_0$ nodes, where the initial number of vertices is usually small. At every moment $n$ the network $G_n$ undergoes evolution, which can be divided into two phases

  • a new vertex $w'\notin V_n$ is introduced into the network: $V_{n+1} = V_n \cup \{w'\}$;
  • the vertex forms $m$ ($\leq m_0$) links to pre-existing nodes chosen at random without replacement according to some distribution $D_n$ over the nodes $V_n$

In the Barabasi-Albert growth model $D_n$ is the ``preferential connection'' distribution: for every $v\in V_n$

$$\mathbb{P}\big(w'\leadsto v\big) = \frac{\delta_n(v)}{\sum_{u\in V_n}\delta_n(u) }$$

where $\delta_n(v)$ is the degree of vertex $v$ in graph $G_n$.


In [ ]:
## The original attachment distribution of Barabasi-Albert
def ba_valency( G, nbunch = None ) :
## Get the distribution of the valency of the existing nodes base on the observed
##  connectivity and degrees.
    deg = G.degree( nbunch )
## For some reason the call to $order( G )$ takes exceptonally long time to return
##  But still, we need the numebr of edges here.
    deg_sum = float( sum( deg.values( ) ) )
## The vector of valencies is in the same order as the thr nbunch (or nx.nodes(.))
    nbunch = nx.nodes( G ) if nbunch is None else nbunch
    return np.fromiter( ( deg[ n ] / deg_sum for n in nbunch ),
        count = len( nbunch ), dtype = np.float )
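
As a quick illustration (not part of the simulations, and all names here are local to this sketch), a single attachment draw according to this distribution could look as follows.


In [ ]:
## A minimal usage sketch: draw two attachment targets from a small star
##  graph according to the valency distribution above.
G_demo = nx.star_graph( 5 )
theta = ba_valency( G_demo )
print( rnd.choice( G_demo.number_of_nodes( ), size = 2, p = theta, replace = False ) )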

Analysis of Barabasi-Albert model

Motivation

In their exposition Barabasi and Albert never state that the initial network $G_0$ is connected, although it should be noted that if $m = m_0$, the very first attachment step produces a connected star graph. Nevertheless, the model is stated as follows (see p.551 c.1 L 30-35 of Barabasi, Albert (1999)):

To incorporate the growing character of the network, starting with a small number ($m_0$) of vertices, at every time step we add a new vertex with $m$ ($\leq m_0$) edges that link the new vertex to $m$ different vertices already present in the system.

Furthermore (see p.551 c.1 L 40-42 of Barabasi, Albert (1999)):

After $t$ time steps, the model leads to a random network with $t + m_0$ vertices and $mt$ edges.

Note that had the initial network $G_0$ been connected, at least minimally, it would have had at least $mt + (m_0-1)$ edges by the end of step $t$. The $m_0-1$ term comes from the fact that a spanning tree is the minimal connected graph.

Now, the case $m < m_0$ does not seem unnatural, as it models the situation when edge formation is dominated by vertex discovery rather than preferential attachment.

In light of this observation the previous definition of the attachment probability distribution suffers from the following drawback: isolated vertices remain almost surely isolated.

In my opinion, this does not seem to be what the authors had in mind when they devised their model and attempted to check whether both evolutionary phases were indeed necessary for the development of the power law scaling (see p.551 c.2 L 8-9 of Barabasi, Albert (1999)):

In model B, we start with $N$ vertices and no edges. At each time step, we randomly select a vertex and connect it with probability $\Pi(k_i) = \frac{k_i}{\sum_j k_j}$ to vertex $i$ in the system.

This definition of attachment probabilities would pose no problem for the Barabasi-Albert model and its type-B restriction, even in the case $m < m_0$, if the evolution began with a connected graph $G_0$.

References:

  • Barabási, A. L., & Albert, R. (1999). "Emergence of scaling in random networks." Science, 286(5439), 509-512.

Bayesian preferential attachment

Nevertheless it is easy to patch this "vagueness" in the model. Let's imagine for a moment that for some observed graph $G = (V,E)$ we are faced with the problem of estimating the probabilities ${(\theta_v)}_{v\in W}$ of getting attached to a node in some $W\subseteq V$. Then the vertex degrees of the nodes in $W$ can be regarded as the evidence of attachment, whence the likelihood function should be given by:

$$L\Big(G; {(\theta_v)}_{v\in W}\Big) = \prod_{v\in W} \theta_v^{\delta(v)}$$

Brief theory

For this parameter estimation task, let's employ the Bayesian approach instead of the common MLE, since the latter tends to overfit the model because it fails to incorporate the uncertainty of the parameters.

Consider a prior $\pi(\cdot)$ on the space of possible probability vectors ${(\theta_v)}_{v\in W}$ -- the $(n-1)$-dimensional simplex $S^\circ_{n-1}\subseteq {[0,1]}^n$ where $n=\big\lvert W\big\rvert$. Using Bayes' theorem, the posterior distribution of ${(\theta_v)}_{v\in W}$ given the observed data is

$$p\Big(\Big.{(\theta_v)}_{v\in W}\,\big.\big\rvert\,G\Big.\Big) \propto p\Big(\Big.G\,\big.\big\rvert\,{(\theta_v)}_{v\in W}\Big.\Big) \pi\Big({(\theta_v)}_{v\in W}\Big) = L\Big(G; {(\theta_v)}_{v\in W}\Big) \pi\Big({(\theta_v)}_{v\in W}\Big)$$

In general it is convenient to pick a prior $\pi(\cdot)$ from an "eigen-family" of this prior-to-posterior transformation. Such prior families are known as conjugate prior families. For this particular kind of likelihood such a family is the Dirichlet distribution, given by

$$\text{Dir}\big({(\alpha_v)}_{v\in W}\big) = \frac{\Gamma\big( \sum_{v\in W} \alpha_v \big)}{ \prod_{v\in W} \Gamma\big(\alpha_v \big)} \prod_{v\in W} \theta_v^{\alpha_v-1}$$

It is easy to demonstrate that the posterior is $\text{Dir}\big({(\alpha_v)}_{v\in W} + {(\delta(v))}_{v\in W}\big)$. This implies that the posterior probability of a random vertex being connected to a node $v\in W$ is given by
$$p\Big(\Big. v\,\big.\big\rvert\,G\Big.\Big) = \int p\Big(\Big. v\,\big.\big\rvert\,{(\theta_v)}_{v\in W}\Big.\Big) p\Big(\Big. {(\theta_v)}_{v\in W}\, \big.\big\rvert\,G\Big.\Big) d\mathbf{\theta} = \ldots $$

which in our case resolves into
$$\ldots = \int \theta_v \text{Dir}\big({(\alpha_v)}_{v\in W} + {(\delta(v))}_{v\in W}\big) d\mathbf{\theta} = \frac{\alpha_v + \delta(v)}{\sum_{u\in W} \alpha_u + \delta(u)}$$

Final probability vector

Since there is no reason to assume any specific prior distribution of the values of the attachment probability vectors, let's use an uninformative prior, which assigns equal probability to every vector in the simplex. In the case of a Dirichlet family, this sets the hyperparameter ${(\alpha_v)}_{v\in W}$ of the prior to a vector of ones: $\alpha_v=1$ for each $v\in W$.

Therefore the attachment probability distribution in $G_n$ is given by

$$p\Big(\Big. w'\leadsto v\,\big.\big\rvert\,G_n\Big.\Big) = \frac{\delta_n(v)+1}{\big\lvert V_n\big\rvert + \sum_{u\in V_n}\delta_n(u)}$$

Listed below is the procedure which computes the required distribution for a given graph $G$.


In [ ]:
## Bayesian valency distribution. A funny thing is that this correction
##  turns out to be equivalent to adding a loop at every vertex (see below).
def ba_bayes_valency( G, nbunch = None ) :
## Get the degree of nodes
    deg = G.degree( nbunch )
## Get the adjusted sum of vertex degrees
    nbunch = nx.nodes( G ) if nbunch is None else nbunch
    deg_sum = float( sum( deg.values( ) ) + len( nbunch ) )
## Compute the posterior distribution of vertex valency
    return np.fromiter( ( ( deg[ n ] + 1.0 ) / deg_sum for n in nbunch ),
        count = len( nbunch ), dtype = np.float )

Bayesian smoothing reduces the zero-degree penalty on the isolated vertices, thus enabling vertex discovery during the early stages of evolution. A notable fact is that for an uninformative prior the posterior attachment distribution given $G$ is equivalent to the original attachment distribution on the same graph with an extra loop at every vertex (each loop counting one towards the degree).
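
A quick sanity check illustrates the point (the graph and names below are purely illustrative): on a graph with an isolated vertex the original valency assigns it zero attachment probability, while the Bayesian valency keeps it discoverable.


In [ ]:
## Illustration only: compare the two valency distributions on a graph
##  with one isolated vertex.
G_demo = nx.path_graph( 4 ) ; G_demo.add_node( 4 )
print( ba_valency( G_demo )[ -1 ] )         ## 0.0 -- almost surely stays isolated
print( ba_bayes_valency( G_demo )[ -1 ] )   ## strictly positive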

The long run distribution

Using the heuristic "mean field" approach it is possible to derive an approximation to the degree distribution in the long run.

Suppose vertices are born at intervals $\frac{1}{n}$, each with $m$ edges. Then at time $t\in [0,1]$ there are $nt$ vertices and the sum of all node degrees is $2mnt$. Over the span of an infinitesimal interval $[t, t+dt]$ an extra $n\,dt$ vertices are born, each with $m$ edges.

Suppose a vertex $v$ is born at time $s$ with $m$ edges and at time $t>s$ this node has degree $k_s(t)$. During $[t, t+dt]$ an extra $n\,dt$ vertices arrive, each bringing $m$ edges, and each edge lands on $v$ with probability $\frac{k_s(t)+1}{2mnt + nt}$. On average, $v$ thus gains $\frac{m}{2m+1} \frac{k_s(t)+1}{t}dt$ edges, which is by how much its degree increases:

$$ \frac{k_s(t+dt)-k_s(t)}{dt} = \frac{m}{2m+1} \frac{k_s(t)+1}{t}$$

Note that the degree depends on the time the vertex was born and not on the vertex itself.

In the continuous time limit as $n\to \infty$ the following differential equation approximates the dynamics of degree of the vertex $v$ :

$$\dot{k}_s(t) = \frac{m}{2m+1} \frac{k_s(t)+1}{t}$$

with initial value at time of birth being $m$ : $k_s(t=s) = m$.

The solution to this ordinary differential equation for these initial conditions is given by
$$\delta_s(t) = (m+1) \Big(\frac{t}{s}\Big)^\frac{m}{2m+1}$$ Put $\alpha = \frac{m}{2m+1}$ and note that $\frac{1}{\alpha} = 2 + \frac{1}{m}$.

In the continuous time limit the probability that at time $t$ a vertex born at $s$ has degree less than $k$ is given by $$F(k) = \mathbb{P}(\delta_s(t)\leq k) = \mathbb{P}\Bigg(t \Big(\frac{m+1}{k+1}\Big)^\frac{1}{\alpha} \leq s\Bigg) = \frac{t - t \Big(\frac{m+1}{k+1}\Big)^\frac{1}{\alpha}}{t} = 1 - \Big(\frac{m+1}{k+1}\Big)^\frac{1}{\alpha}$$

The density is then $$\frac{d}{d k} F(k) = \frac{1}{\alpha} \Big( k+1 \Big)^{-\frac{1}{\alpha}-1} \Big( m+1 \Big)^\frac{1}{\alpha} = \Big( k+1 \Big)^{-3-\frac{1}{m}} \Big(2+\frac{1}{m}\Big) \Big( m+1 \Big)^{2+\frac{1}{m}}$$

This means that the limiting degree distribution of the Barabasi-Albert model with the Bayesian valency behaves approximately like a power law with an exponent close to that of the original model.

For a more formally rigorous treatment of the original model see Bollobas, Riordan et al., (2001).

References:

  • Bollobás, B., Riordan, O., Spencer, J., & Tusnády, G. (2001). "The degree sequence of a scale‐free random graph process." Random Structures & Algorithms, 18(3), 279-290.

Model evolution

Complete model

As has been mentioned above, the basic growth step in the complete Barabasi-Albert model consists of two phases:

  1. growth;
  2. attachment according to valency distribution.

In [ ]:
def barabasi_albert( n, m = 10, callback = None, avclu = False, avlen = False ) :
## Make sure the callback is a proper function
	callback_fn = callback if callable( callback ) else lambda G, _, __: None
## Barabasi-Albert evolution always begins with the isolated graph
	G = nx.empty_graph( m )
## To speed up the computations keep track of the node degree locally,
##  but instead of using a dictionary, store them in a linear array.
	degree = np.zeros( n + m, dtype = np.int )
	degree_half_sum = 0
## Local adjacency matrix : should be made sparse
	adjacency = np.zeros( ( n + m, n + m ) )
## Shortest path matrix for the modified Floyd Warshall
	pi = 1 - ml.identity( n + m, np.float )
	pi[ pi == 1 ] = np.inf
	npmin = np.minimum
## Local tracking of the number of triangles for the clustering coefficient
	triangles = np.zeros( n + m, dtype = np.int )
## Note that the graph itself is used just to keep track of the edges.
	result = list( )
## Create an array of average path lengths
	avg_path_ln = np.full( n + m, np.inf, dtype = np.float )
## Setup the average clustering metric storage
	avg_tri_clu = np.full( n + m, np.inf, dtype = np.float )
## Age-degree: the dependence of the average node degree on the age of a vertex
	avg_age_deg = np.zeros( n + 1, dtype = np.float )
## Identify vertices by consecutive integers.
	t = G.number_of_nodes( )
## In the original Barabasi-Albert model the first new vertex
##  is attached to all m vertices, thus making the graph connected.
	U = range( m )
	while t < n + m :
## Add pending edges
		G.add_edges_from( ( t, u ) for u in U )    
## maintain a local copy of the adjacency matrix
		adjacency[ t, U ] = adjacency[ U, t ] = 1
## Record the vertex degree
		degree[ U ] += 1
		degree[ t ] += len( U )
		degree_half_sum += len( U )
## Create a new vertex using stack allocation -- on top of others.
		t = G.number_of_nodes( )
## Run the Floyd-Warshall algorithm
		if avlen :
## Add the shortest paths from the affected vertices to the new node
			pi[ U, t-1 ] = pi[ t-1, U ] = 1
## Compute the shortest path from the existing nodes to the new one
			for u in U :
				pi[ :t, t-1 ] = npmin( pi[ :t, t-1 ], pi[ :t, u ] + pi[ u, t-1 ] )
## Restore the distance symmetry due to bidirectionality
			pi[ t-1, :t ] = pi[ :t, t-1 ].T
## Compare the shortest paths through the new vertex with the current paths
##	At this point the matrix pi has the following property:
##	 * for nodes $i,j\in V$ the value $\pi_{ij}$ is the shortest path
##	   through nodes other than $\omega$;
##	 * for any $x\in V$ the element $\pi_{x\omega}$ (and $\pi_{\omega x}$) is
##	   the length of the shortest path from $x$ to the new node $\omega$ with
##	   intermediate vertices exclusively in $V$.
			pi[ :t, :t ] = npmin( pi[ :t, :t ], pi[ :t, t-1 ] + pi[ t-1, :t ] )
## Numerically confirm the correctness shown elsewhere in this notebook
##			assert( np.all( nx.floyd_warshall_numpy( G ) == pi[ :t, :t ] ) )
## Compute the average path length
##             avg_path_ln[ t-1 ] = nx.average_shortest_path_length( G )
			spm = pi[ :t, :t ]
			avg_path_ln[ t-1 ] = np.mean( np.where( np.isfinite( spm ),
				spm, 0 ), dtype = np.float64 ) * t / ( t - 1.0 )
## Compute the average clustering coefficient
		if avclu :
			tri_num = 0
			for u in U :
## Count the number of edges from u to U, since they are responsible for the new triangles containing t
				X = np.sum( adjacency[ u, U ] )
## Update the number of the current endpoint
				triangles[ u ] += X
				tri_num += X
## Every new triangle could have been created only via the edges incident on t
##  The loop above counts each triangle containing t twice (once for each of
##  its two edges incident on t)
			triangles[ t-1 ] += tri_num // 2
## Compute the clustering coefficient of each node
			max_clust = degree[ :t ] * ( degree[ :t ] - 1 )
			clust = np.where( max_clust != 0, ( 2.0 * triangles[ :t ] ) / max_clust, 0 )
## And the average
			avg_tri_clu[ t-1 ] = np.mean( clust, dtype = np.float64 )
# 			cc = nx.average_clustering( G )
# 			avg_tri_clu[ t-1 ] = np.abs( ( np.mean( clust, dtype = np.float64 ) - cc ) / ( cc + 1e-12 ) )
## Compute the average node degree of created vertices of a given age.
## * A[0:t-m0] -- current mean degrees of vertices of age 0:t-m0 (right ends not included)
## * D[0:m0] -- the current degrees of the oldest (initial) m0 vertices
## * D[m0:t] -- the degrees of nodes created at iterations 0:t-m0 respectively
		avg_age_deg[ :t-m ] += ( degree[ t-1 :m-1 :-1 ] - avg_age_deg[ :t-m ]
			) / np.arange( t-m, 0, -1, dtype = np.float )
## Now incorporate the degree of the initial $m$ nodes, which are the oldest ones.
		avg_age_deg[ t-m ] = np.mean( degree[ :m ], dtype = np.float64 )
## The evolution step is complete, call the required function
		result.append( callback_fn( G, degree[ :t ], pi[ :t, :t ] ) )
## The initial nodes contribute to the average node degree of the next generation of vertices
## The nodes which the new vertex is attached to are picked
##  according to the preferential attachment distribution.
		probabilities = degree[ :t ] / ( 2.0 * degree_half_sum )
		U = rnd.choice( t, size = m, p = probabilities, replace = False )
#     assert( np.all( nx.floyd_warshall_numpy( G ) == pi ) )
	return G, result, avg_age_deg, avg_tri_clu, avg_path_ln # , pi
## https://networkx.github.io/documentation/latest/reference/classes.graph.html

In [ ]:
## %lprun -f barabasi_albert barabasi_albert( 1000, m = 2, avlen = True, avclu = True )
# G1, R1, A1, C1, L1 = barabasi_albert( 1000, m = 10, avlen = True, avclu = True )
# plt.loglog( C1 )

Model A

In model A, the first restricted version of the evolution, the network grows with time but the newly added nodes are attached to the existing ones uniformly at random. The code below implements the basic steps of both the Bayesian and the first restricted model.


In [ ]:
## This evolution step is similar to the original BA step, except that it
##  relaxes the requirement that m_0 be equal to m, and it implements
##  the egalitarian (uniform) attachment and the Bayesian valency.
def barabasi_albert_model_a( n, m0 = 10, m = 5, bayes = False, callback = None, avclu = False, avlen = False ) :
	callback_fn = callback if callable( callback ) else lambda G, _, __: None
	result = list( )
	G = nx.empty_graph( m0 )
## Local degree and shortest path tracking
	degree = np.zeros( n + m0, dtype = np.int )
	degree_half_sum = 0
	adjacency = np.zeros( ( n + m0, n + m0 ) )
	pi = 1 - ml.identity( n + m0, np.float )
	pi[ pi == 1 ] = np.inf
	npmin = np.minimum
	triangles = np.zeros( n + m0, dtype = np.int )
## Average path length and clustering
	avg_path_ln = np.full( n + m0, np.inf, dtype = np.float )
	avg_tri_clu = np.full( n + m0, np.inf, dtype = np.float )
	avg_age_deg = np.zeros( n + 1, dtype = np.float )
## Begin the simulation
	t = G.number_of_nodes( )
	while t < n + m0 :
## Select either the bayesian valency distribution, or uniform attachment
		theta = None if not bayes else ( degree[ :t ] + 1.0 ) / ( 2.0 * degree_half_sum + t )
		U = rnd.choice( t, size = m, p = theta, replace = False )
		G.add_edges_from( ( t, u ) for u in U )
		adjacency[ t, U ] = adjacency[ U, t ] = 1
## Accumulate degree information
		degree[ U ] += 1
		degree[ t ] += len( U )
		degree_half_sum += len( U )
## Preallocate the next vertex
		t = G.number_of_nodes( )
## Compute the shortest paths
		if avlen :
			pi[ U, t-1 ] = pi[ t-1, U ] = 1
			for u in U :
				pi[ :t, t-1 ] = npmin( pi[ :t, t-1 ], pi[ :t, u ] + pi[ u, t-1 ] )
			pi[ t-1, :t ] = pi[ :t, t-1 ].T
			pi[ :t, :t ] = npmin( pi[ :t, :t ], pi[ :t, t-1 ] + pi[ t-1, :t ] )
			spm = pi[ :t, :t ]
			avg_path_ln[ t-1 ] = np.mean( np.where( np.isfinite( spm ),
				spm, 0 ), dtype = np.float64 ) * t / ( t - 1.0 )
## Get the average clustering coefficient
		if avclu :
			tri_num = 0
			for u in U :
				X = np.sum( adjacency[ u, U ] )
				triangles[ u ] += X
				tri_num += X
			triangles[ t-1 ] += tri_num // 2
			max_clust = degree[ :t ] * ( degree[ :t ] - 1 )
			clust = np.where( max_clust != 0, ( 2.0 * triangles[ :t ] ) / max_clust, 0 )
			avg_tri_clu[ t-1 ] = np.mean( clust, dtype = np.float64 )
## Finalize this step's statistics
		avg_age_deg[ :t-m0 ] += ( degree[ t-1 :m0-1 :-1 ] - avg_age_deg[ :t-m0 ]
			) / np.arange( t-m0, 0, -1, dtype = np.float )
		avg_age_deg[ t-m0 ] = np.mean( degree[ :m0 ], dtype = np.float64 )
		result.append( callback_fn( G, degree[ :t ], pi[ :t, :t ] ) )
	return G, result, avg_age_deg, avg_tri_clu, avg_path_ln # , pi

In [ ]:
## %lprun -f barabasi_albert_model_a barabasi_albert_model_a( 300, m = 1, avlen = True, avclu = False )
# G1, R1, A1, C1, L1 = barabasi_albert_model_a( 1000, m0 = 10, m=10, avlen = True, avclu = True )
# plt.loglog( C1 )

Model B

Model B retains the preferential attachment phase, but foregoes the growth step. Instead, a random existing node is picked and is linked with other nodes, preferentially choosing vertices with high degree. Initially this model begins its evolution from a set of isolated nodes.

For this model to work as intended, the only option is to use the Bayesian valency distribution, since, as has already been stated, the originally proposed probabilities handle isolated nodes poorly.

A couple of nuances

Let $w'\in V$ be a randomly picked vertex. Since multiple connections and loops are forbidden in the model, the preferential attachment of a vertex $w'$ must use conditional node valency distribution. If a subset $S\subseteq V$, $w'\in S$, is prohibited, then for any vertex $v\in V\setminus S$

$$p\Big(\Big. w'\leadsto v\,\big.\big\rvert\, w'\not\!\leadsto S,\,G \Big.\Big) = \frac{p\Big(\Big. w'\leadsto v\,\big.\big\rvert G \Big.\Big)}{p\Big(\Big. w'\not\!\leadsto S\,\big.\big\rvert G \Big.\Big)} = \frac{\delta_n(v)+1}{\big\lvert V\setminus S\big\rvert + \sum_{u\in V\setminus S}\delta_n(u)} $$

In particular, the prohibited set for a vertex $w'$ is itself and the set of its immediate neighbours:

$$S = \big\{w'\big\}\cup N_G(w')$$

Another subtle point is the connection strategy when $\big\lvert N_G(w') \big\rvert\geq n-1-m$. If $\big\lvert N_G(w') \big\rvert < n-1$, the model simply connects $w'$ to all of the remaining vertices. If however $w'$ is saturated, the attachment step for this node is skipped, as the vertex is just unable to form new connections.


In [ ]:
## Another relaxation of the complete Barabasi-Albert model, which eliminates
##  the growth phase. This restriction uses the Bayesian valency distribution
##  when performing preferential attachment.
def barabasi_albert_model_b( n, m = 5, callback = None, avclu = False, avlen = False ) :
	callback_fn = callback if callable( callback ) else lambda G, _, __: None
	G = nx.empty_graph( n )
## Local node degree cache
	degree = np.zeros( n, dtype = np.int64 )
	degree_sum = 0
	adjacency = np.zeros( ( n, n ), dtype = np.int )
## Shortest path matrix for the modified Floyd Warshall
	pi = 1 - ml.identity( n, np.float )
	pi[ pi == 1 ] = np.inf
	npmin = np.minimum
	triangles = np.zeros( n, dtype = np.int )
## Average path length and clustering
	avg_tri_clu = np.full( n, np.inf, dtype = np.float )
	avg_path_ln = np.full( n, np.inf, dtype = np.float )
	avg_age_deg = np.zeros( n, dtype = np.float )
## Begin the simulation
	t = 0
	result = list( )
	while t < n :
## "Pick an existing one at random" (cf. p.551 c.3 L 9 of [Barabasi, Albert; 1999])
##  ... and get its neighbours.
		v = rnd.randint( n, size = 1 )[ 0 ]
		S = nx.neighbors( G, v )
## If it has more neighbours than n-1-m, connect it to the remaining nodes
##  and continue. If the node is saturated, ignore it.
		m_prime = min( m, n - 1 - len( S ) )
		if m_prime > 0 :
## Get the correction terms due to conditioning (see the formula).
			c_term = degree[ v ] + 1.0 + np.sum( degree[ S ] ) + len( S )
## Compute the conditional valency of each vertex in the current configuration
			theta = ( degree + 1.0 ) / ( 0.0 + degree_sum + n - c_term )
## Zero out the prohibited vertices, so that they'll never be picked.
			theta[ v ] = 0 ; theta[ S ] = 0
## Pick $m'$ other nodes to preferentially link the chosen one to.
			U = rnd.choice( n, size = m_prime, p = theta, replace = False )
			G.add_edges_from( ( v, u ) for u in U )
			adjacency[ v, U ] = adjacency[ U, v ] = 1
			degree[ U ] += 1
			degree[ v ] += m_prime
			degree_sum += m_prime * 2 ;
## Compute the shortest paths
		if avlen :
			if m_prime > 0 :
				if False :
## Incrementally update the shortest path matrix by every new bidirectional edge
					for u in U :
## Compute the path through this edge
						path = pi[ :, u ] + 1.0 + pi[ v, : ]
## Choose the optimal edge direction and whether the new path is the shortest
						pi = npmin( pi, npmin( path, path.T ) )
				else :
					pi[ U, v ] = pi[ v, U ] = 1
## Construct the shortest paths through each affected vertex u in U with
##  intermediate nodes other than v. Before the addition of edges
##  the shortest path from any node to v could not have included
##  the vertex u in U as the last intermediate node, since there was no
##  edge between u and v prior to this generation. If it included u
##  as anything farther from v than one hop, then the shortest path has
##  to be updated. Also, any prior path s~u through v joined with the new
##  edge u-v would create a cycle and be longer than the existing s~v
##  (s~v is shorter than s~v~u~v).
## Therefore this step correctly computes the lengths of the shortest
##  paths to v
					for u in U :
						pi[ :, v ] = npmin( pi[ :, v ], pi[ :, u ] + pi[ u, v ] )
					pi[ v, : ] = pi[ :, v ].T
## See if paths x~v~y are shorter than x~y without v
					pi = npmin( pi, pi[ :, v ] + pi[ v, : ] )
# 				assert( np.all( nx.floyd_warshall_numpy( G ) == pi ) )
			avg_path_ln[ t ] = np.mean( np.where( np.isfinite( pi ), pi, 0 ),
				dtype = np.float64 ) * n / ( n - 1.0 ) if m_prime > 0 else avg_path_ln[ t-1 ]
## Get the average clustering coefficient
		if avclu :
			tri_num = 0
			for u in G[ v ] :
				X = np.sum( adjacency[ u, U ] )
				triangles[ u ] += X
				if u in U :
					tri_num += X
				else :
					triangles[ v ] += X
## it is necessary to inform the vertices adjacent to a preexisting neighbour
## that the edge (v,u) induced a new triangle through a newly added edge.
					triangles[ U ] += adjacency[ U, u ]
			triangles[ v ] += tri_num // 2
			max_clust = degree * ( degree - 1 )
			clust = np.where( max_clust != 0, ( 2.0 * triangles ) / max_clust, 0 )
			avg_tri_clu[ t ] = np.mean( clust, dtype = np.float64 ) if m_prime > 0 else avg_tri_clu[ t-1 ]
## In this model all nodes are the same age, which means that the degree-age
##  relationship is just the average degree with respect to the generation.
		avg_age_deg[ t ] = np.mean( degree ) if m_prime > 0 else avg_age_deg[ t-1 ]
		result.append( callback_fn( G, degree, pi ) )
		t += 1
	return G, result, avg_age_deg, avg_tri_clu, avg_path_ln, pi

In [ ]:
## %lprun -f barabasi_albert_model_b barabasi_albert_model_b( 400, m = 30, avlen = True, avclu = False )
# G1, R1, A1, C1, L1, pi = barabasi_albert_model_b( 1000, m = 10, avlen = True, avclu = False )
# plt.plot( C1 )

Vertex copying model

This model is another version of a growing network that shares features similar to preferential attachment.

  • the evolution begins with a small collection of randomly connected vertices $G_0$;
  • transition from $G_n = (V_n, E_n)$ to $G_{n+1} = (V_{n+1},E_{n+1})$ :
    1. a "role model" vertex $w$ is chosen uniformly at random from the existing vertices $V_n$;
    2. a new node $w'$ is created;
    3. every edge $(u,w)\in E_n$ is either kept with probability $q$, in which case an edge $(u,w')$ is created, or is abandoned with probability $1-q$; let $S$ be the set of endpoints of kept edges;
    4. for each endpoint $u$ of an abandoned edge a random node $u'\in V_n\setminus \big(S \cup \{u\}\big)$ is picked and an edge $(u',w')$ is added to the graph;

In [ ]:
def vertex_copying_model( G, n, q = 0.5, callback = None, avclu = False, avlen = False ) :
	callback_fn = callback if callable( callback ) else lambda G, _, __: None
	result = list( )
## Initialize the local degree accumulator
	m0 = G.number_of_nodes( )
	degree = np.zeros( n + m0, dtype = np.int )
	adjacency = ml.zeros( ( n + m0, n + m0 ) )
	pi = 1 - ml.identity( n + m0, np.float )
	pi[ pi == 1 ] = np.inf
	npmin = np.minimum
	triangles = np.zeros( n + m0, dtype = np.int )
## Average path length and clustering
	avg_tri_clu = np.zeros( n + m0, dtype = np.float )
	avg_path_ln = np.zeros( n + m0, dtype = np.float )
	avg_age_deg = np.zeros( n + 1, dtype = np.float )
## Preinitialize the local shortest path matrix and degree tracker
	deg = G.degree( )
	degree[ :m0 ] = [ deg.get( k, 0 ) for k in xrange( m0 ) ]
	pi[ :m0, :m0 ] = nx.floyd_warshall_numpy( G )
	avg_path_ln[ :m0 ] = np.mean( pi[ :m0, :m0 ], dtype = np.float64 ) * m0 / ( m0 - 1.0 )
	adjacency[ :m0, :m0 ] = nx.to_numpy_matrix( G )
	tri = nx.triangles( G )
	triangles[ :m0 ] = [ tri.get( k, 0 ) for k in xrange( m0 ) ]
## Begin the simulation
	t = G.number_of_nodes( )
	while t < n + m0 :
## Duplicate a random vertex
		neighbors = G[ rnd.randint( t, size = 1 )[ 0 ] ]
## For each neighbour, determine whether it is kept or abandoned
		survives = rnd.random_sample( size = len( neighbors ) ) < q
## Compile a list of surviving nodes
		survivors = set( u for m, u in zip( survives, neighbors ) if m )
## Construct the pool of possible target nodes
		targets = set( survivors )
		for s, w in zip( survives, neighbors ) :
			if not s :
## This neighbour should be abandoned. Pick another one at random, 
				w0 = w
##  ... bearing in mind that the new vertex must be other than any
##  of the survivors or the original neighbouring node.
				while w in survivors or w == w0 :
					w = rnd.randint( t, size = 1 )[ 0 ]
				targets.add( w )
## Add edges and keep track of the vertex degrees
		U = list( targets )
		G.add_edges_from( ( t, u ) for u in U )
		adjacency[ t, U ] = adjacency[ U, t ] = 1
		degree[ U ] += 1
		degree[ t ] += len( U )
## Preallocate the next vertex
		t = G.number_of_nodes( )
## Compute the shortest paths
		if avlen :
			pi[ U, t-1 ] = pi[ t-1, U ] = 1
			for u in U :
				pi[ :t, t-1 ] = npmin( pi[ :t, t-1 ], pi[ :t, u ] + pi[ u, t-1 ] )
			pi[ t-1, :t ] = pi[ :t, t-1 ].T
			pi[ :t, :t ] = npmin( pi[ :t, :t ], pi[ :t, t-1 ] + pi[ t-1, :t ] )
			spm = pi[ :t, :t ]
			avg_path_ln[ t-1 ] = np.mean( np.where( np.isfinite( spm ),
				spm, 0 ), dtype = np.float64 ) * t / ( t - 1.0 )
# 			assert( np.all( nx.floyd_warshall_numpy( G ) == pi[:t,:t] ) )
## Get the average clustering coefficient
		if avclu :
			tri_num = 0
			for u in U :
				X = np.sum( adjacency[ u, U ] )
				triangles[ u ] += X
				tri_num += X
			triangles[ t-1 ] += tri_num // 2
			max_clust = degree[ :t ] * ( degree[ :t ] - 1 )
			clust = np.where( max_clust != 0, ( 2.0 * triangles[ :t ] ) / max_clust, 0 )
			avg_tri_clu[ t-1 ] = np.mean( clust, dtype = np.float64 )
## Finalize this step's statistics
		avg_age_deg[ :t-m0 ] += ( degree[ t-1 :m0-1 :-1 ] - avg_age_deg[ :t-m0 ]
			) / np.arange( t-m0, 0, -1, dtype = np.float )
		avg_age_deg[ t-m0 ] = np.mean( degree[ :m0 ] )
		result.append( callback_fn( G, degree[ :t ], pi[ :t, :t ] ) )
	return G, result, avg_age_deg, avg_tri_clu, avg_path_ln # , pi

In [ ]:
# G0 = nx.fast_gnp_random_graph( 20, .5 )
## %lprun -f vertex_copying_model vertex_copying_model( G0.copy(), 100, q = .1, avlen = True, avclu = True )
# G1, R1, A1, C1, L1 = vertex_copying_model( G0.copy( ), 1000, q = .95, avlen = True, avclu = True )
# plt.loglog( C1 )

Simulation study

Analysis toolkit

A useful tool for exploring the tail behaviour of a sample is the Mean Excess plot, defined as

$$M(u) = \mathbb{E}\Big(\Big. X-u\,\big.\big\rvert\,X\geq u \Big.\Big)$$

of which the empirical counterpart is
$$\hat{M}(u) = {\Big(\sum_{i=1}^n 1_{x_i\geq u}\Big)^{-1}}\sum_{i=1}^n (x_i-u) 1_{x_i\geq u}$$ The key properties of $M(u)$ are

  • it steadily increases for power-law tails, and the steeper the slope, the smaller the exponent;
  • it levels off for exponential tails (heuristically, the case $\alpha\to \infty$ resembles an exponential tail);
  • it decays towards zero for the tail of a compactly supported distribution.

When dealing with empirical mean excesses one looks for the trend at large thresholds to discern the behaviour, necessarily bearing in mind that in that region the variance of $\hat{M}(u)$ grows.

In [ ]:
from scipy.stats import rankdata
def mean_excess( data ) :
    data = np.array( sorted( data, reverse = True ) )
    ranks = rankdata( data, method = 'max' )
    excesses = np.array( np.unique( len( data ) - ranks ), dtype = np.int )
    thresholds = data[ excesses ]
    mean_excess = np.cumsum( data )[ excesses ] / ( excesses + 0.0 ) - thresholds
    return thresholds, mean_excess
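
To see these signatures in action, here is a small synthetic check (illustration only; the distributions, names and sample sizes are arbitrary choices): the exponential sample should level off, while the shifted Pareto sample should trend upwards.


In [ ]:
## Mean excess of synthetic samples: exponential (flat) vs. Pareto (growing).
u_e, m_e = mean_excess( rnd.exponential( size = 10000 ) )
u_p, m_p = mean_excess( rnd.pareto( 2.0, size = 10000 ) + 1.0 )
plt.plot( u_e, m_e, "-k", label = "exponential" )
plt.plot( u_p, m_p, "-r", label = "Pareto" )
plt.legend( loc = "upper left" ) ; plt.show( )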

Since the degree distribution, at least in the tails, is expected to be a power law, define a procedure for estimating the exponent in: $$1-F(x) = \Big( \frac{x}{x_0} \Big)^{1-\alpha}$$ where $x_0$ is the tail threshold.


In [ ]:
## ML estimator of the power law exponent in the "tail" ( x >= u ):
##  x \sim C x^{-\alpha} 1_{[u,+∞)}(x).
def mle_alpha( data, threshold ) :
    tail = np.array( [ v for v in data if v >= threshold ] )
    sum_log = np.sum( np.log( tail ) ) / ( len( tail ) + 0.0 )
    alpha = 1.0 + 1.0 / ( sum_log - np.log( threshold ) )
    return alpha
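
A sanity check on synthetic data (illustration only): inverting $1-F(x) = x^{1-\alpha}$ gives $X = U^{\frac{1}{1-\alpha}}$ for uniform $U$, so the estimator should recover the exponent it was fed.


In [ ]:
## The estimator above should recover alpha on a synthetic Pareto sample.
alpha_true = 3.0
sample = rnd.random_sample( 100000 ) ** ( 1.0 / ( 1.0 - alpha_true ) )
print( mle_alpha( sample, 1.0 ) )   ## should be close to 3.0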

It is always useful to have a handy linear regression tool at one's disposal (even if it is not used).


In [ ]:
## Construct a regression model
import numpy.linalg as la
def lm_model( X, Y, intercept = True ) :
    T = np.array( Y, dtype = float )
    M = np.array( X, dtype = float )
    if intercept is True :
        M = np.vstack( [ np.ones( len( Y ) ), M ] ).T
## Remove infinities and not-a-numbers
    common = np.all( np.isfinite( M ), axis = 1 ) & np.isfinite( T )
    return (M[ common, : ],T[ common ], intercept)

## Define the OLS regression routine:
def lm_fit( model ) :
	M, T, intercept = model
	MMinv = la.inv( ## implement (X'X)^{-1} (X'Y)
		np.dot( M.T, M ) ) 
	coef = np.dot( MMinv,
		np.dot( M.T, T ) )
## Estimate the residual standard deviation
	resid = T - np.dot(M, coef)
	dof = len( T ) - len( coef )
	RSS = np.dot( resid.T, resid )
	return (coef, RSS, dof, MMinv )

Node degree dynamics with age

It might be insightful to inspect the dependence of the node degree on its age. If $A_{it}$ is the average degree of a node of age $i$ by the end of generation $t\geq i$, then

$$ A_{it} = \frac{ \sum_{j=i}^t d_{j-i,j} }{t-i+1} $$

where $d_{kt}$ is the degree of the node born at $k$ by the end of generation $t\geq k$, and $t-i+1$ is the number of observed nodes of age $i$. For convenience set $A_{it} = 0$ for all $i>t$. In the original Barabasi-Albert model and in its type-A restriction $A_{0t} = m$, while in model B it is no more than $m$, and in the vertex copying model it is no more than the degree of some existing vertex.

In order to compute the average of a sequence of values arriving in real time, while still consuming $O(1)$ memory, it is necessary to use an iterative formula for the mean. For a sequence $\big(x_t\big)_{t\geq 1}$ the mean can be computed iteratively:

$$\bar{x}_t = \bar{x}_{t-1} + \frac{1}{t}\big(x_t - \bar{x}_{t-1}\big)$$

which immediately follows from associativity of finite sums: $\sum_{k=1}^t x_k = \sum_{k=1}^{t-1} x_k + x_t$.

In the case at hand, the iterative formula translates into $$ A_{it} = A_{i,t-1} + \frac{ d_{t-i,t} - A_{i,t-1} }{t-i+1} $$
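
The update can be verified directly (a tiny self-check, nothing model-specific):


In [ ]:
## The one-pass running mean must agree with the batch mean.
xs = rnd.random_sample( 1000 )
mean = 0.0
for t, x in enumerate( xs, 1 ) :
    mean += ( x - mean ) / t
print( np.allclose( mean, np.mean( xs ) ) )   ## True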

Let me elaborate a little on how the simulation procedures keep track of the node degrees and of the average degree with respect to age.

The nodes are created one by one, and each is identified by the number of nodes before its birth. This way it is easy to select the nodes born at a given iteration and therefore the nodes of a certain age. For instance, if there are $t$ nodes in the graph $G$, then the next node is created with index $t$, making all the vertices into a sequence $0,1,\ldots, t-1,t$ of $t+1$ elements. At the same time, the node of age $t\geq\tau\geq0$ can be pinpointed by the index $t-\tau$.

The oldest nodes, those which existed before the first generation, are assigned age $t+1$ (the pre-simulation era).

The degree array has the following structure at any particular generation $t$:

  • the first $m_0$ integers (indices $0, \ldots, m_0-1$) hold the degrees of the ancestral vertices;
  • the next $t-m_0$ values (indices $m_0, \ldots, t-1$) contain the degrees of the vertices created during the simulation;
  • the youngest vertex is located at index $t-1$ in the degree array;

The avergage degree array has similar structure :

  • the first $t-m_0$ values (indices $0$ through $t-m_0-1$) contain the values for the created nodes;
  • the last, $(t-m_0+1)$-st value (exactly at index $t-m_0$) is the combined average degree of the initial $m_0$ nodes.

The degree accounting and the average degree per age tracking are a bit contrived, but this is the price paid for speed.
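
To make the convention concrete, here is a toy illustration (the degree values and names are hypothetical) of the reversed slice used by the simulators when updating the age averages.


In [ ]:
## Toy check of the reversed-slice convention: with t = 5 nodes and m0 = 2
##  ancestors, the slice lists the created nodes from youngest (age 0)
##  to the oldest created node (age t - m0 - 1 = 2).
t_demo, m0_demo = 5, 2
degree_demo = np.array( [ 7, 6, 3, 4, 5 ] )   ## indices 0, 1 are the ancestors
print( degree_demo[ t_demo - 1 : m0_demo - 1 : -1 ] )   ## [5 4 3] -- ages 0, 1, 2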

Computing shortest paths

The average path length is computed according to the formula:

$$ \bar{l}_n = \frac{1}{n(n-1)}\sum_{x\neq y} d_{xy}$$

where $n=|V|$ and $d_{xy}$ is the length of the shortest path between nodes $x$ and $y$ in the network $G_n$. Note that a simulation of model B is the more likely to end up with a disconnected graph, the lower the parameter $m$ is. To handle this, the average path length is computed with the convention that $d_{xy}=0$ whenever $y$ is unreachable from $x$.

Since the graph is undirected and path lengths are measured in the number of hops between a pair of vertices, the first thing that comes to mind is the breadth-first search algorithm with complexity $O\big(\lvert V\rvert + \lvert E\rvert \big)$.


In [ ]:
from collections import deque
def bfs_shortest_path( G, s = 0 ) :
## Initialize the array of shortest paths
    pi = np.full( G.number_of_nodes( ), np.inf, np.float )
## Cache deque operations for faster access (Python won't have to make lookups)
    deq = deque.popleft ; enq = deque.append
## Initialize with the source vertex
    Q = deque( [ s ] ) ; pi[ s ] = 0
## While the (de)queue is not empty
    while Q :
## get the first vertex to have been added
        v = deq( Q )
## For each neighbour...
        for n in G[ v ] :
## Check if it has not been visited before
            if pi[ n ] != np.inf : continue
## Add it to the queue and update its distance from the source
            enq( Q, n ) ; pi[ n ] = pi[ v ] + 1
    return pi
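
A quick cross-check against networkx (illustration only; the graph and the source vertex are arbitrary):


In [ ]:
## Verify the BFS distances against the networkx implementation.
G_demo = nx.fast_gnp_random_graph( 50, 0.1 )
d_nx = nx.single_source_shortest_path_length( G_demo, 0 )
d_us = bfs_shortest_path( G_demo, 0 )
print( all( d_us[ k ] == v for k, v in d_nx.items( ) ) )   ## True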

However, in order to compute the length of a shortest path between each pair $u,v\in V$, it is necessary to run the BFS starting at each vertex $s\in V$. Thus the complexity of the algorithm becomes $O\big(\lvert V\rvert^2 + \lvert V\rvert \lvert E\rvert \big)$. The overall complexity of the simulation is then $O\Big(\lvert V\rvert^3 + \lvert V\rvert^2 \lvert E\rvert \Big)$, without the average clustering computations, which are approximately $O(\lvert V\rvert^3)$.


In [ ]:
def avg_shortest_path( G ) :
    n = G.number_of_nodes( )
    asl = 0.0
    for s in G :
        pi = bfs_shortest_path( G, s )
## Per the convention above, unreachable pairs contribute zero length
        asl += np.sum( pi[ np.isfinite( pi ) ] )
    return asl / ( n * ( n - 1.0 ) )

The main drawback of this approach is that in a growing network with dynamically added edges, the shortest paths between unaffected nodes have to be recomputed from scratch at every generation. That is why a modified Floyd-Warshall algorithm has been used for this purpose.

In its original form FW has complexity $O\Big(\lvert V\rvert^3 \Big)$, which would have made the overall complexity of the simulation $O\Big(\lvert V\rvert^4\Big)$. When the growing nature of the Barabasi-Albert model is taken into account, the overall complexity of the simulation, without the clustering, becomes $O\big(\lvert E\rvert \lvert V\rvert^2\big)$, which, though it looks larger, still runs faster in practice.

Incremental modification of Floyd-Warshall

Suppose the matrix $d_{ij}$ contains the shortest paths between nodes in the graph $G = (V,E)$. Furthermore, a graph $G' = (V', E')$ is created from $G$ by a pure growth process: a new vertex $\omega\notin V$ is created with edges $E_\omega$ to nodes in $V$. Let $N_\omega = \big\{\big.u\in V\big.\big\rvert \{u,\omega\}\in E_\omega \big.\big\}$ and let $\sqsubseteq$ be some total order on the finite set $N_\omega$.

After the introduction of a new vertex the graph $G'$ is given by $V'=V\cup\{\omega\}$ and $E' = E\cup E_\omega$. The matrix $d_{ij}$ however is yet to be updated to the new connectivity structure of $G'$. Assume that no negative weight cycles have been created by this operation.

Before accounting for any new edges, the matrix is updated by setting $d_{x\omega} = d_{\omega x} = +\infty$ for any $x\neq\omega$ and $d_{\omega\omega} = 0$. At this point the matrix $d_{ij}$ has the following property: for nodes $i,j\in V$ the value $d_{ij}$ is the length of the shortest path through nodes other than $\omega$.

The incorporation of $\omega$ into the shortest paths of $G'$ begins with the computation of the shortest path from every other node $x\in V$ to $\omega$. The idea mirrors the Floyd-Warshall algorithm:

  1. initially the column $d_{i\omega}$ represents shortest path lengths from any $i\in V$ to $\omega$ through intermediate vertices in $V$ excluding $N_\omega$;
  2. iterate through the vertices $u\in N_\omega$ in ascending order of $\sqsubseteq$ :
    1. at the iteration for $u$ the column $d_{i\omega}$ contains the lengths of the shortest paths from every $i\in V$ to $\omega$ which pass only through vertices $n\in N_\omega$ with $n\sqsubset u$ (besides vertices outside $N_\omega$);
    2. if the true shortest path from $i$ to $\omega$ does not contain the node $u$, then the shortest path $i\to\omega$ is carried over to the next iteration: we have $d_{i\omega} \leq d_{iu} + d_{u\omega}$;
    3. otherwise, $u\in \pi^*_{i\omega}$, where $\pi^*_{i\omega}$ is a shortest path from $i$ to $\omega$. It is easy to verify that the subpaths of $\pi^*_{i\omega}$ from $i$ to $u$ ($\pi_{iu}$) and from $u$ to $\omega$ ($\pi_{u\omega}$) are themselves shortest. In this case $d_{i\omega} \geq d_{iu} + d_{u\omega}$, since the length $d_{i\omega}$ cannot correspond to a path shorter than $\pi^*_{i\omega}$, for that would contradict the minimality of $\pi^*_{i\omega}$.

The row $d_{\omega i}$ is just the transpose of $d_{i \omega}$. After this phase the matrix $d_{ij}$ for either $i=\omega$ or $j=\omega$ obeys the following restriction:

  • for any $x\in V$ the element $d_{x\omega}$ (and $d_{\omega x}$) is the length of the shortest path from $x$ to the new node $\omega$ with intermediate vertices exclusively in $V$.

Note that $d_{ij}$ for $i,j\in V$ still corresponds to shortest paths with $\omega\notin \pi^*_{ij}$ -- possibly suboptimal paths.

The final step of the algorithm is the computation of the paths passing through $\omega$.

Consider any $x,y\in V'$. If $\omega\notin \pi^*_{xy}$, i.e. $\omega$ is not an intermediate vertex on the shortest path between $x$ and $y$, then the fact that $d_{xy}$ is the shortest path with intermediate vertices other than $\omega$ implies that any path through $\omega$ cannot be shorter than $d_{xy}$. Therefore $d_{xy}\leq d_{x\omega}+d_{\omega y}$ and nothing gets updated.

Note that a shortest path cannot contain a cycle of positive length, for otherwise it would not be shortest, and there are no edges with non-positive lengths.

Now $\omega\in \pi^*_{xy}$ implies that the $x\sim\omega$ and $\omega\sim y$ subpaths of $\pi^*_{xy}$ have intermediate vertices entirely in $V\setminus \{\omega\}$ due to the no-cycle property, whence each subpath must be shortest. Therefore joining these paths cannot yield anything other than a shortest path ($\pi^*_{xy} = \pi^*_{x\omega} \cup \pi^*_{\omega y}$). The fact that $\omega \notin \pi^*_{x\omega}$ implies that its length is given by $d_{x\omega}$ (similarly for the $\omega\sim y$ shortest path).
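
The two phases translate into the following standalone sketch, which mirrors the update embedded in the simulation routines. Assumptions (all names are local to this illustration): `pi` is the current $t\times t$ shortest path matrix, the new vertex $\omega$ receives index $t$, `U` lists its neighbours, and nodes are labelled $0,\ldots,t$ in the default order.


In [ ]:
## A standalone sketch of the incremental Floyd-Warshall update.
def incremental_fw( pi, U ) :
    t = pi.shape[ 0 ]
    d = np.full( ( t + 1, t + 1 ), np.inf )
    d[ :t, :t ] = pi ; d[ t, t ] = 0
## The new edges have unit length
    d[ U, t ] = d[ t, U ] = 1
## Phase 1: relax the paths x ~> omega over the new neighbours u in U
    for u in U :
        d[ :t, t ] = np.minimum( d[ :t, t ], d[ :t, u ] + d[ u, t ] )
    d[ t, :t ] = d[ :t, t ]
## Phase 2: check whether paths x ~> omega ~> y beat the omega-free paths
    d[ :t, :t ] = np.minimum( d[ :t, :t ], d[ :t, t: ] + d[ t:, :t ] )
    return d

## Quick check against the batch algorithm on a random graph
G_demo = nx.fast_gnp_random_graph( 30, 0.2 )
D = np.asarray( nx.floyd_warshall_numpy( G_demo ) )
U_demo = [ 0, 5, 7 ]
G_demo.add_edges_from( ( 30, u ) for u in U_demo )
print( np.all( incremental_fw( D, U_demo ) == nx.floyd_warshall_numpy( G_demo ) ) )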

Computation of the clustering coefficient

The average clustering coefficient of a graph $G=(V,E)$ is defined by the following formula :

$$\bar{c} = \frac{1}{n}\sum_{x\in V}c_x$$

where $n=|V|$ and $c_x$ is the local clustering coefficient of vertex $x\in V$ defined below.

The local (triangular) clustering coefficient of a node $x\in V$ is defined as the ratio of the number of unique edge triangles containing $x$ to the number of triangles $x$ would belong to were its neighbourhood complete, namely $\binom{\delta_x}{2}$, where $\delta_x$ is the degree of $x$ in $G$.

The expression for $c_x$ is $$c_x = \frac{1}{\delta_x (\delta_x-1)} \sum_{u\neq x} \sum_{v\neq x,u} 1_{xu} 1_{uv} 1_{vx} = \frac{1}{\delta_x (\delta_x-1)} \#_{x}$$

where $1_{ij}$ is the indicator equal to $1$ if the edge (undirected) $(i,j)\in E$ and $0$ otherwise.

Suppose the graph $G$ is altered by adding a bunch of edges $E_+$ together with a new vertex $\omega\notin V$. Then each fresh edge $(u,v)\in E_+$ can only create new triangles, and thus it affects $\#_u$ and $\#_v$ by adding the following number of triangles:

$$\nabla_{uv} = \big\lvert N_u \cap N_v \big\rvert$$

i.e. the number of vertices adjacent to both endpoints of the edge. Note that it has been implicitly assumed that there are no loops in $G$.

If all newly created edges are cast from a single common node $\omega$ then the computation of the number of triangles can be simplified: for each $u\in U$

$$\nabla_{u\omega} = \big\lvert N_u \cap U \big\rvert$$

In practice, when each newly created edge $(u,\omega)$, $u\in U$, is accounted for, the number of triangles of $u$ is increased by $\nabla_{u\omega}$ exactly once (when the edge $(u,\omega)$ is considered), whereas $\#_\omega' = \frac{1}{2}\sum_{u\in U}\nabla_{u\omega}$, because prior to the addition of edges the vertex $\omega$ was isolated and thus was not in any triangle.

The $\frac{1}{2}$ factor is justified by the following argument. Note that there is a triangle of the form $\Delta_{u\omega v}$ if and only if $u,v\in U$ and $(u,v)\in E$. Thus any triangle containing $\omega$ is counted twice.

If the extra edges all emanate from an already existing node $\omega\in V$ (no multiple edges or loops are allowed), then every pre-existing edge $(\omega, u)$ with $u\in N_\omega \setminus U$ creates $\nabla_{u\omega} = \lvert U\cap N_u \rvert$ extra triangles for both $u$ and $\omega$, and adds an extra triangle for each $v\in U$ adjacent to $u$, i.e. such that $(u,v)\in E$.
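
A minimal sketch of the per-edge increment $\nabla_{uv}$, computed straight off a dense adjacency matrix like the one the simulators maintain (the function name and toy matrix are illustrative):


In [ ]:
## Number of new triangles created by adding the single edge (u, v):
##  the count of common neighbours of u and v.
def new_triangles( adjacency, u, v ) :
    return int( np.sum( np.multiply( adjacency[ u ], adjacency[ v ] ) ) )

## e.g. closing the path 0-1-2 into a triangle creates exactly one
A_demo = np.array( [ [ 0, 1, 0 ], [ 1, 0, 1 ], [ 0, 1, 0 ] ] )
print( new_triangles( A_demo, 0, 2 ) )   ## 1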

Other metrics

Besides the average clustering and the shortest path length, we also need an estimate of the power law exponent.


In [ ]:
def analyse( G, degree = None, pi = None ) :
    deg = G.degree( ).values( ) if degree is None else degree
    alpha = mle_alpha( deg, np.median( deg ) )
    return ( alpha, )

Visualisation

Define a couple of service functions to help visualize the simulation results. First come the mean excess and the power law exponent plots.


In [ ]:
def MEPPLEP( RR, labels = ['B-A', 'BB-A'], T = [ "-k", "-r" ] ) :
    plt.figure(1, figsize = (16,6))
## Mean excess plot
    plt.subplot(121)
    for r, l, t in zip( RR, labels, T ) :
        u, m = mean_excess( nx.degree( r[ 0 ] ).values( ) )
        plt.plot( u, m, t, label = l )
    plt.legend( loc = 'upper left' )
    plt.title( "Mean excess plot" )
    plt.ylabel( "Mean Excess" )
    plt.xlabel( "Threshold" )
## The MLE of alpha plot
    plt.subplot(122)
    for r, l, t in zip( RR, labels, T ) :
        plt.plot( [ a[ 0 ] for a in r[ 1 ] ], t, label = l )
    plt.legend( loc = 'upper right' )
    plt.title( "the Power Law exponent" )
    plt.ylabel( "Alpha" )
    plt.xlabel( "Generation" )
    ## Commit to the device
    plt.show( )

The node degree distribution plot:


In [ ]:
def NDFP( RR, labels = ['B-A', 'BB-A'], T = [ "+k", "xr" ], linlog = False, lin = False, fig = True ) :
    if fig : plt.figure(1, figsize = ( 8, 6 ),  )
#     mdl_fit = lambda r, c : np.exp( c[ 0 ] + c[ 1 ] * np.log( r ) )
    for r, l, t in zip( RR, labels, T ) :
## Draw the degree distribution
        v, f = np.unique( nx.degree( r[ 0 ] ).values( ), return_counts = True )
        if linlog :
            plt.plot( v, np.log( f ), t, label = l )
        elif lin :
            plt.plot( v, f, t, label = l )
        else :
            plt.loglog( v, f, t, label = l )
    plt.legend( loc = 'upper right' )
    plt.title( "Node degree frequency" )
    if linlog :
        plt.xlabel( "Degree" )
    else :
        plt.xlabel( "Degree (log)" )
    if lin :
        plt.ylabel( "Frequency" )
    else :
        plt.ylabel( "Frequency (log)" )
## Commit to the device
    if fig : plt.show( )

Below, a routine to plot the average shortest path length and the dependence of the mean degree on the age of a node is defined.


In [ ]:
def AVDPAPLP( RR, labels = ['B-A', 'BB-A'], T = [ "-k", "-r" ] ) :
    plt.figure( 1, figsize = ( 16, 6 ) )
## Draw the degree-age relationship
    plt.subplot(121)
    for r, l, t in zip( RR, labels, T ) :
        plt.plot( r[ 2 ], t, label = l )
    plt.legend( loc = 'upper left' )
    plt.title( "Average vertex degree" )
    plt.xlabel( "Age" )
    plt.ylabel( "Avg. degree" )
## Plot the average path length
    plt.subplot(122)
    for r, l, t in zip( RR, labels, T ) :
        plt.plot( r[ 4 ], t, label = l )
    plt.legend( loc = 'lower right' )
    plt.title( "Average path length" )
    plt.xlabel( "Number of vertices" )
    plt.ylabel( "Length" )
## Commit to the device
    plt.show( )

It is also necessary to depict the dynamics of the clustering coefficient.


In [ ]:
def ACCP( RR, labels = ['B-A', 'BB-A'], T = [ "-k", "-r" ], lin = False, fig = True ) :
    if fig : plt.figure( 1, figsize = ( 8, 6 ) )
    for r, l, t in zip( RR, labels, T ) :
        if lin :
            plt.plot( r[ 3 ], t, label = l )
        else :
            try :
                plt.loglog( r[ 3 ], t, label = l )
            except :
                pass
    plt.legend( loc = 'upper right' )
    plt.title( "Average clustering coefficient" )
    if lin :
        plt.xlabel( "Number of vertices" )
        plt.ylabel( "Coefficient" )
    else :
        plt.xlabel( "Number of vertices (log)" )
        plt.ylabel( "Coefficient (log)" )
    if fig : plt.show( )

The analysis

Let's check the degree distribution and its dependence on the age of the node for the following models:

  • The original Barabasi-Albert model ("B-A");
  • The Barabasi-Albert model with the Bayesian valency distribution ("BB-A");
  • Type-A restriction: Barabasi-Albert model with uniform attachment ("mdlA");
  • Type-B restriction: Barabasi-Albert model with stunted growth and the Bayesian valency distribution ("mdlB");
  • The vertex-copying model ("VCM").

In [ ]:
nGENR = 2000

Compare the distributions, dependence of degree on age and the heaviness of the tail.


In [ ]:
OBA_10 = barabasi_albert( nGENR, m = 10, callback = analyse, avlen = True, avclu = True )
BBA_10 = barabasi_albert_model_a( nGENR, m0 = 10, m = 10, bayes = True, callback = analyse, avlen = True, avclu = True )

For moderately large values of $m$, both the Bayesian valency and the preferential attachment should be asymptotically identical.


In [ ]:
MEPPLEP( [OBA_10, BBA_10] )

First of all, for these settings ($\log_{10} n\approx 3.3$ with $m=10$) the Bayesian and the preferential attachment distributions produce networks with very close degree distributions, at least judging by the Mean Excess plots and the dynamics of the power law exponent. Indeed, the ME plot (red and black) exhibits an upward trend, suggesting a Pareto-like (heavy tailed) distribution, and the ML estimates of $\alpha$ do seem to converge to one another.


In [ ]:
NDFP( [OBA_10, BBA_10], T = [ '^k', "vr" ] )

The plot of the distribution of vertex degrees indicates a great deal of similarity between the Bayesian and the original Barabasi-Albert models.

Let's have a look at the average path length and the dependence of the node degree on the node's age.


In [ ]:
AVDPAPLP( [OBA_10, BBA_10] )

Clearly, picking neighbours according to the Bayesian valency or the preferential attachment affects neither the average path length nor the average degree dynamics.

The average path length appears to become saturated as the number of nodes in the network increases.


In [ ]:
ACCP( [OBA_10, BBA_10] )

The dynamics of the average clustering coefficient of both models are almost identical. For a large number of nodes the trend appears to be a straight line, which suggests that $\bar{c}\propto n^{-\gamma}$.


In [ ]:
n = nx.number_of_nodes( OBA_10[ 0 ] )
coef, rss, dof, xx = lm_fit( lm_model(
    X = np.log( np.arange( n ) ),
    Y = np.log( OBA_10[ 3 ] ) ) )
HTML( "The LS estimate of $\gamma$ is %.3f (%.3f)" % ( coef[ 1 ], np.sqrt( rss / dof * xx[ 1, 1 ] ) ) )

In [ ]:
HTML( "The $\chi^2$ test statistic is %.3f with the p-value %.3g" % chisquare(
    nx.degree( OBA_10[ 0 ] ).values( ), nx.degree( BBA_10[ 0 ] ).values( ) ) )

However, despite this empirical evidence, a more rigorous $\chi^2$-test rejects the null hypothesis that the distributions are alike. Indeed, the Bayesian valency distribution seems to have a slower-growing tail than the preferential attachment, precisely due to the Bayesian smoothing performed for isolated vertices.

Lower $m$

For a lower $m$ the models should differ more, at least according to the analysis of the asymptotic distribution.


In [ ]:
OBA_1 = barabasi_albert( nGENR, m = 1, callback = analyse, avlen = True, avclu = True )
BBA_1 = barabasi_albert_model_a( nGENR, m0 = 1, m = 1, bayes = True, callback = analyse, avlen = True, avclu = True )

In [ ]:
MEPPLEP( [OBA_1, BBA_1] )

The plot of $\hat{\alpha}_n$ shows that for $m=1$ the models have slightly different asymptotic tails. This agrees with the mean field approach used earlier, which suggested that the asymptotic distribution of the node degree in the case of the Bayesian valency attachment would be $\propto k^{-\alpha}$ with $\alpha \approx 3 + \frac{1}{m}$, whereas in the original model $\alpha \approx 3$. Indeed, the distinction between the models is noticeable on the power law exponent plot.

The Bayesian smoothing spreads the probability less extremely, which leads to a slightly lighter tail than the preferential attachment.


In [ ]:
AVDPAPLP( [OBA_1, BBA_1] )

Interestingly, in this extreme case the Bayesian valency attachment seems to keep the vertices on average more separated than the preferential attachment. Once again this is a manifestation of the diffusion of the probability mass away from the nodes with dense connectivity in favour of the nodes with extremely poor connectivity.

In the special case of $m=1$ the generated network is necessarily acyclic. Indeed, when an edge is created it is always attached to a completely new node, which would have been isolated had the edge not been created. Thus the addition of a new node with exactly one random edge guarantees that there will be no cycles in the network, never mind triangles. This can be confirmed directly on the generated network, as the quick check below shows.
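
In [ ]:
## Quick confirmation: for m = 1 the grown network must be a tree, i.e.
##  an empty cycle basis and |E| = |V| - 1.
print( len( nx.cycle_basis( OBA_1[ 0 ] ) ) == 0 )
print( OBA_1[ 0 ].number_of_edges( ) == OBA_1[ 0 ].number_of_nodes( ) - 1 )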


In [ ]:
HTML( "The $\chi^2$ test statistic is %.3f with the p-value %.3g" % chisquare(
    nx.degree( OBA_1[ 0 ] ).values( ), nx.degree( BBA_1[ 0 ] ).values( ) ) )

This empirical evidence suggests that the effect of a smaller $m$ on the scale-free property of the network is insignificant. However, decreasing $m$ increases the average path length and decreases the average degree.

$m=100$

In [ ]:
OBA_100 = barabasi_albert( nGENR, m = 100, callback = analyse, avlen = True, avclu = True )
BBA_100 = barabasi_albert_model_a( nGENR, m0 = 100, m = 100, bayes = True, callback = analyse, avlen = True, avclu = True )

In [ ]:
MEPPLEP( [OBA_100, BBA_100] )

Again the plots look very much alike. What draws attention is that, although it is known theoretically that the asymptotic degree distribution is a power law, $\mathbb{P}(\delta \geq k)\propto k^{1-\alpha}$ with $\alpha>1$, the Mean Excess plot seems to suggest a distribution with a lighter tail.

Note that somewhere around the $m$-th generation (with $m_0 + m$ nodes) there seems to be a qualitative change in the network, at least according to the empirical evidence suggested by the $\hat{\alpha}_n$ plot.


In [ ]:
NDFP( [OBA_100, BBA_100], T = [ '^k', "vr" ] )

Usually more reliable estimates of the power law exponent are produced by the following regression model:
$$ \ln \sum_{y>v} f_y \sim (1-\alpha) \ln v + c$$ suggested by the fact that $\mathbb{P}(\delta > k) \propto k^{1-\alpha}$.


In [ ]:
## Estimate the complementary cdf model
v, f = np.unique( nx.degree( OBA_100[ 0 ] ).values( ), return_counts = True )
cf = np.sum( f ) - np.cumsum( f )
## Drop the last point, where the complementary count is necessarily zero
coef, rss, dof, xx = lm_fit( lm_model( X = np.log( v[ : -1 ] ), Y = np.log( cf[ : -1 ] ) ) )
HTML( "The LS estimate of $\\alpha$ is %.3f (%.3f)" % ( 1 - coef[ 1 ], np.sqrt( rss / dof * xx[ 1, 1 ] ) ) )

In [ ]:
AVDPAPLP( [OBA_100, BBA_100] )

Empirically the average clustering coefficient seems to obey $\ln \bar{c}_n \sim \gamma \ln n + c$.


In [ ]:
n = nx.number_of_nodes( OBA_100[ 0 ] )
## Shift the index by one to avoid log( 0 ) at the first observation
coef, rss, dof, xx = lm_fit( lm_model( X = np.log( np.arange( n ) + 1 ), Y = np.log( OBA_100[ 3 ] ) ) )
HTML( "Preferential B-A: The LS estimate of $\\gamma$ is %.3f (%.3f)" % ( coef[ 1 ], np.sqrt( rss / dof * xx[ 1, 1 ] ) ) )

In [ ]:
n = nx.number_of_nodes( BBA_100[ 0 ] )
## Shift the index by one to avoid log( 0 ) at the first observation
coef, rss, dof, xx = lm_fit( lm_model( X = np.log( np.arange( n ) + 1 ), Y = np.log( BBA_100[ 3 ] ) ) )
HTML( "Bayesian B-A: The LS estimate of $\\gamma$ is %.3f (%.3f)" % ( coef[ 1 ], np.sqrt( rss / dof * xx[ 1, 1 ] ) ) )

In [ ]:
ACCP( [OBA_100, BBA_100] )

In [ ]:
HTML( "The $\\chi^2$ test statistic is %.3f with the p-value %.3g" % chisquare(
    nx.degree( OBA_100[ 0 ] ).values( ), nx.degree( BBA_100[ 0 ] ).values( ) ) )

The main conclusion is that both the prior theoretical discussion and the analysis of the simulation evidence support the claim that the Bayesian valency and the preferential attachment models are asymptotically equivalent when $m>1$.

Comparing restricted models

Let's compare the dynamics of the restricted models.

Case: $m=10$

In [ ]:
MA_10 = barabasi_albert_model_a( nGENR, m0 = 10, m = 10, bayes = False, callback = analyse, avlen = True, avclu = True )
MB_10 = barabasi_albert_model_b( nGENR+10, m = 10, callback = analyse, avlen = True, avclu = True )

In [ ]:
MEPPLEP( [MA_10, MB_10], labels = [ 'mdlA', 'mdlB' ] )

The mean excess plots suggest that the distribution of the node degrees follows a light-tailed law, such as the geometric (exponential) law.


In [ ]:
NDFP( [MA_10, MB_10], labels = [ 'mdlA', 'mdlB' ], T = [ 'ok', "xr" ], linlog = True )

The node degree frequency plotted on a log-lin scale hints at an exponential degree distribution. Indeed, the geometric distribution has a tail of the form
$$\mathbb{P}(\delta > k) \propto e^{-\lambda k}$$


In [ ]:
## Estimate the complementary cdf model
v, f = np.unique( nx.degree( MA_10[ 0 ] ).values( ), return_counts = True )
cf = np.sum( f ) - np.cumsum( f )
## Drop the last point, where the complementary count is necessarily zero
coef, rss, dof, xx = lm_fit( lm_model( X = v[ : -1 ], Y = np.log( cf[ : -1 ] ) ) )
HTML( "Model A: the LS estimate of $\\lambda$ is %.3f (%.3f)" % ( - coef[ 1 ], np.sqrt( rss / dof * xx[ 1, 1 ] ) ) )

In [ ]:
## Estimate the complementary cdf model
v, f = np.unique( nx.degree( MB_10[ 0 ] ).values( ), return_counts = True )
cf = np.sum( f ) - np.cumsum( f )
## Drop the last point, where the complementary count is necessarily zero
coef, rss, dof, xx = lm_fit( lm_model( X = v[ : -1 ], Y = np.log( cf[ : -1 ] ) ) )
HTML( "Model B: the LS estimate of $\\lambda$ is %.3f (%.3f)" % ( - coef[ 1 ], np.sqrt( rss / dof * xx[ 1, 1 ] ) ) )

In [ ]:
AVDPAPLP( [MA_10, MB_10], labels = [ 'mdlA', 'mdlB' ] )

The average path length dynamics differ between the restrictions: in model B the graph is initially disconnected, whereas in the type-A model it is connected. Since the plotted average path length omits paths of infinite length (between mutually unreachable vertices), the plots in effect show the average distance only over pairs of connected nodes.
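
One way to make this restriction explicit is to average only over the largest connected component; a sketch, assuming the final model-B graph sits in MB_10[ 0 ]:


In [ ]:
## Sketch: average shortest-path length over the largest connected component
G = MB_10[ 0 ]
H = G.subgraph( max( nx.connected_components( G ), key = len ) )
print nx.average_shortest_path_length( H )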


In [ ]:
ACCP( [MA_10, MB_10], labels = [ 'mdlA', 'mdlB' ], lin = True )

The restricted models seem to exhibit different clustering, which is due to the fact that model B is of fixed size while model A keeps growing.

The mean clustering coefficient in the type-B restriction increases with each generation, since the network has a fixed size and can thus become saturated with edges in a finite number of generations ($\approx n^2$).

In contrast, since the graph in model A keeps on growing, it can never be saturated: its density is $\frac{2 m n}{n(n-1)} = \frac{2m}{n-1} \to 0$. Thus, at least heuristically, the chances of a randomly selected node being a vertex of a triangle decay to zero.
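
The densities of the final graphs illustrate this divergence directly; a sketch using nx.density, which computes $\frac{2|E|}{n(n-1)}$:


In [ ]:
## Sketch: the growing model A stays sparse, the fixed-size model B densifies
for name, G in [ ( "Model A", MA_10[ 0 ] ), ( "Model B", MB_10[ 0 ] ) ]:
    print "%s: density %.4f" % ( name, nx.density( G ) )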


In [ ]:
n = nx.number_of_nodes( MA_10[ 0 ] )
## Shift the index by one to avoid log( 0 ) at the first observation
coef, rss, dof, xx = lm_fit( lm_model( X = np.log( np.arange( n ) + 1 ), Y = np.log( MA_10[ 3 ] ) ) )
HTML( "Model A: The LS estimate of $\\gamma$ is %.3f (%.3f)" % ( coef[ 1 ], np.sqrt( rss / dof * xx[ 1, 1 ] ) ) )

In [ ]:
n = nx.number_of_nodes( MB_10[ 0 ] )
## Shift the index by one to avoid log( 0 ) at the first observation
coef, rss, dof, xx = lm_fit( lm_model( X = np.log( np.arange( n ) + 1 ), Y = np.log( MB_10[ 3 ] ) ) )
HTML( "Model B: The LS estimate of $\\gamma$ is %.3f (%.3f)" % ( coef[ 1 ], np.sqrt( rss / dof * xx[ 1, 1 ] ) ) )

Case: $m=100$

Let's study the sample evolution of the restricted Barabasi-Albert models for a larger $m = 100$.


In [ ]:
MA_100 = barabasi_albert_model_a( nGENR, m0 = 100, m = 100, bayes = False, callback = analyse, avlen = True, avclu = True )
MB_100 = barabasi_albert_model_b( nGENR + 100, m = 100, callback = analyse, avlen = True, avclu = True )

In [ ]:
MEPPLEP( [MA_100, MB_100], labels = [ 'mdlA', 'mdlB' ] )

Both restrictions seem to give up the scale-free property of the original model: the mean excess plots suggest light-tailed, or even bounded, distributions.


In [ ]:
NDFP( [MB_100, MA_100], labels = [ 'mdlB', 'mdlA' ], T = [ "xr", 'ok' ], linlog = True )

The degree frequency histogram suggests that the restricted models do not preserve the scale-free property.


In [ ]:
AVDPAPLP( [MA_100, MB_100], labels = [ 'mdlA', 'mdlB' ] )

The "rich-get-richer" empirical law does not hold in this case, at least judging by the dependence of the average degree on the age of a node. The type-B restriction, in particular, has all of its vertices present from the very start, so every node is of the same age and no early-mover advantage can accumulate.


In [ ]:
ACCP( [MA_100, MB_100], labels = [ 'mdlA', 'mdlB' ], lin = True )

The edge creation process in a graph of fixed order produces an increasingly densely connected network with each generation: when selected as the source of newly cast random edges, a vertex already has some pre-existing connectivity, which makes it more likely for a new edge to complete a triangle. That is why the clustering coefficient plots are so different.
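
A sketch quantifying this difference on the final $m=100$ graphs via the global transitivity, i.e. the fraction of connected triples that close into triangles:


In [ ]:
## Sketch: global transitivity of the final restricted graphs
for name, G in [ ( "Model A", MA_100[ 0 ] ), ( "Model B", MB_100[ 0 ] ) ]:
    print "%s: transitivity %.4f" % ( name, nx.transitivity( G ) )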

To sum up, both network growth and preferential (or Bayesian) attachment seem to be necessary for the generated graph to exhibit scale-free properties.


Vertex copying model (VCM)

Now let's examine the properties of the Vertex Copying Model.


In [ ]:
nGENR = 1000

In [ ]:
probs = [ 0.05, 0.25, 0.50, 0.75, 0.95]

Run the simulation


In [ ]:
G0 = nx.fast_gnp_random_graph( int( 0.01 * nGENR ), .1 )
VCM_10_10  = [ vertex_copying_model( G0.copy( ), nGENR, prob, callback = analyse, avclu = True, avlen = True ) for prob in probs ]

In [ ]:
G0 = nx.fast_gnp_random_graph( int( 0.01 * nGENR ), .5 )
VCM_10_50  = [ vertex_copying_model( G0.copy( ), nGENR, prob, callback = analyse, avclu = True, avlen = True ) for prob in probs ]

In [ ]:
G0 = nx.fast_gnp_random_graph( int( 0.01 * nGENR ), .9 )
VCM_10_90  = [ vertex_copying_model( G0.copy( ), nGENR, prob, callback = analyse, avclu = True, avlen = True ) for prob in probs ]

In [ ]:
G0 = nx.fast_gnp_random_graph( int( 0.10 * nGENR ), .1 )
VCM_100_10 = [ vertex_copying_model( G0.copy( ), nGENR, prob, callback = analyse, avclu = True, avlen = True ) for prob in probs ]

In [ ]:
G0 = nx.fast_gnp_random_graph( int( 0.10 * nGENR ), .5 )
VCM_100_50 = [ vertex_copying_model( G0.copy( ), nGENR, prob, callback = analyse, avclu = True, avlen = True ) for prob in probs ]

In [ ]:
G0 = nx.fast_gnp_random_graph( int( 0.10 * nGENR ), .9 )
VCM_100_90 = [ vertex_copying_model( G0.copy( ), nGENR, prob, callback = analyse, avclu = True, avlen = True ) for prob in probs ]

The mean excess plots presented below suggest that the degree distribution is not a power law and in fact has light tails. This hints at the possibility that the vertex copying model does not generate a scale-free network.

There is an obvious dependency on the probability of copying an edge: the higher the $q$ the more likely it is for a new vertex to mimic the connectivity of an existing node.
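
To make the mechanism concrete, here is a minimal sketch of a single copying step (a hypothetical vcm_step, not the vertex_copying_model routine used above, and assuming integer node labels): a random parent is duplicated, and each of its edges is either kept with probability $q$ or rewired to a uniformly chosen other vertex.


In [ ]:
## Hypothetical single step of the (undirected) vertex copying model
def vcm_step( G, q ):
    parent = rnd.choice( G.nodes( ) )
    child = max( G.nodes( ) ) + 1
    G.add_node( child )
    for u in G.neighbors( parent ):
        if rnd.uniform( ) < q:
            ## Copy the edge: link the child to the parent's neighbour
            G.add_edge( child, u )
        else:
            ## Rewire: link the child to any other randomly chosen vertex
            G.add_edge( child, rnd.choice( [ v for v in G.nodes( ) if v != child ] ) )
    return G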


In [ ]:
MEPPLEP(VCM_10_10, labels = ["m0 10 p %.2f"%P for P in probs], T=["-" for p in probs ] )
MEPPLEP(VCM_100_10, labels = ["m0 100 p %.2f"%P for P in probs], T=["-" for p in probs ] )
MEPPLEP(VCM_10_50, labels = ["m0 10 p %.2f"%P for P in probs], T=["-" for p in probs ] )
MEPPLEP(VCM_100_50, labels = ["m0 100 p %.2f"%P for P in probs], T=["-" for p in probs ] )
MEPPLEP(VCM_10_90, labels = ["m0 10 p %.2f"%P for P in probs], T=["-" for p in probs ] )
MEPPLEP(VCM_100_90, labels = ["m0 100 p %.2f"%P for P in probs], T=["-" for p in probs ] )

The node degree frequencies also seem to confirm that the resulting networks are not scale-free.


In [ ]:
plt.figure(1, figsize = ( 16, 6 ) )
plt.subplot(121)
NDFP(VCM_10_10, labels = ["m0 10 p %.2f"%P for P in probs], T=[".-" for p in probs ], linlog = True, lin = True, fig = False )
plt.subplot(122)
NDFP(VCM_100_10, labels = ["m0 100 p %.2f"%P for P in probs], T=[".-" for p in probs ], linlog = True, lin = True, fig = False )
plt.show()
plt.figure(2, figsize = ( 16, 6 ) )
plt.subplot(121)
NDFP(VCM_10_50, labels = ["m0 10 p %.2f"%P for P in probs], T=[".-" for p in probs ], linlog = True, lin = True, fig = False )
plt.subplot(122)
NDFP(VCM_100_50, labels = ["m0 100 p %.2f"%P for P in probs], T=[".-" for p in probs ], linlog = True, lin = True, fig = False )
plt.show()
plt.figure(3, figsize = ( 16, 6 ) )
plt.subplot(121)
NDFP(VCM_10_90, labels = ["m0 10 p %.2f"%P for P in probs], T=[".-" for p in probs ], linlog = True, lin = True, fig = False )
plt.subplot(122)
NDFP(VCM_100_90, labels = ["m0 100 p %.2f"%P for P in probs], T=[".-" for p in probs ], linlog = True, lin = True, fig = False )
plt.show()

In [ ]:
AVDPAPLP(VCM_10_10, labels = ["0.1 m0 10 p %.2f"%P for P in probs], T=["-" for p in probs ] )
AVDPAPLP(VCM_100_10, labels = ["0.1 m0 100 p %.2f"%P for P in probs], T=["-" for p in probs ] )
AVDPAPLP(VCM_10_50, labels = ["0.5 m0 10 p %.2f"%P for P in probs], T=["-" for p in probs ] )
AVDPAPLP(VCM_100_50, labels = ["0.5 m0 100 p %.2f"%P for P in probs], T=["-" for p in probs ] )
AVDPAPLP(VCM_10_90, labels = ["0.9 m0 10 p %.2f"%P for P in probs], T=["-" for p in probs ] )
AVDPAPLP(VCM_100_90, labels = ["0.9 m0 100 p %.2f"%P for P in probs], T=["-" for p in probs ] )

The connectivity of the initial graph affects the clustering coefficient.


In [ ]:
plt.figure(1, figsize = ( 16, 6 ) )
plt.subplot(121)
ACCP(VCM_10_10, labels = ["0.1 m0 10 p %.2f"%P for P in probs], T=["-" for p in probs ], lin = True, fig = False )
plt.subplot(122)
ACCP(VCM_100_10, labels = ["0.1 m0 100 p %.2f"%P for P in probs], T=["-" for p in probs ], lin = True, fig = False )
plt.show()
plt.figure(2, figsize = ( 16, 6 ) )
plt.subplot(121)
ACCP(VCM_10_50, labels = ["0.5 m0 10 p %.2f"%P for P in probs], T=["-" for p in probs ], lin = True, fig = False )
plt.subplot(122)
ACCP(VCM_100_50, labels = ["0.5 m0 100 p %.2f"%P for P in probs], T=["-" for p in probs ], lin = True, fig = False )
plt.show()
plt.figure(3, figsize = ( 16, 6 ) )
plt.subplot(121)
ACCP(VCM_10_90, labels = ["0.9 m0 10 p %.2f"%P for P in probs], T=["-" for p in probs ], lin = True, fig = False )
plt.subplot(122)
ACCP(VCM_100_90, labels = ["0.9 m0 100 p %.2f"%P for P in probs], T=["-" for p in probs ], lin = True, fig = False )
plt.show()

The conclusion:

  • the connectivity of the initial graph, measured by the number of edges, is very important for the vertex copying model;
  • the probability of edge preservation $q$ shifts the degree probability density, which concentrates around approximately $q n$;
  • accordingly, the higher the copy probability $q$, the higher the average degree, the lower the average path length and the higher the clustering coefficient.
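
As a closing check on the last point, a sketch tabulating the mean degree of the final graphs against $q$ (here for the runs started from the denser initial graph with $0.10\,n$ nodes and $p = 0.5$, assuming each result tuple stores the generated graph in its first element):


In [ ]:
## Sketch: mean degree of the final VCM graphs as a function of q
for P, res in zip( probs, VCM_100_50 ):
    print "q = %.2f: mean degree %.2f" % ( P, np.mean( nx.degree( res[ 0 ] ).values( ) ) )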