For this assignment we will work on a network representing the collaboration between scientists in the field of General Relativity and Quantum Cosmology. The network comes from SNAP and is described as follows:
Arxiv GR-QC (General Relativity and Quantum Cosmology) collaboration network is from the e-print arXiv and covers scientific collaborations between authors papers submitted to General Relativity and Quantum Cosmology category. If an author i co-authored a paper with author j, the graph contains a undirected edge from i to j. If the paper is co-authored by k authors this generates a completely connected (sub)graph on k nodes. The data covers papers in the period from January 1993 to April 2003 (124 months). It begins within a few months of the inception of the arXiv.
In [ ]:
%matplotlib inline
import os
import random
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np
import warnings
warnings.filterwarnings('ignore')
In [ ]:
G = nx.read_edgelist(os.path.join('..', 'data', 'arxiv_collaboration_network.txt'))
In [ ]:
print('My network has {} nodes.'.format(len(G.nodes())))
print('My network has {} edges.'.format(G.size()))
1.1 In this part of the assignment, you have to create an Erdős–Rényi and Barabási-Albert graph using NetworkX, and compare them to the collaboration network. Try to simulate the original network as best as you can. When choosing parameters for the networks, take into account the number of vertices and edges of the original network. The number of vertices should be exactly the same. Comment on your choice of parameters.
In [ ]:
# Your code here.
Your answer here.
In [ ]:
print('My Erdős–Rényi network has {} nodes.'.format(len(er.nodes())))
print('My Erdős–Rényi network has {} edges.'.format(er.size()))
print('My Barabási-Albert network has {} nodes.'.format(len(ba.nodes())))
print('My Barabási-Albert network has {} edges.'.format(ba.size()))
1.2 Check the size of the largest connected component in each graph and compare them to the original network. In the Erdős–Rényi model, what should the probability of creating each edge be in order to have the same expected size of the largest component? Justify. Generate a graph with this parameter to check if you indeed get a similar value.
In [ ]:
giant_G = max(nx.connected_component_subgraphs(G), key=len)
giant_er = max(nx.connected_component_subgraphs(er), key=len)
giant_ba = max(nx.connected_component_subgraphs(ba), key=len)
print(len(giant_G.nodes()))
print(len(giant_er.nodes()))
print(len(giant_ba.nodes()))
Your answer here.
In [ ]:
# Your code here.
print('The parameter p for an Erdős–Rényi network with the same expected size of the giant component is {}.'.format(p_giant))
# Your code here.
print('The size of the component in a randomly generated network with this parameter is {}.'.format(len(er_giant.nodes())))
1.3 Look at the clustering coefficient of the original network. Is there a network model we talked about that could have a clustering coefficient that is close? Explain.
In [ ]:
nx.average_clustering(G)
Your answer here.
In this part of the assignment, you will have to create a random network from a predefined degree distribution. There are several network models which can create a random network with the exact same degree distribution as the original, or with the same expected distribution as the original. Refer to section 4.8 of the Barabási book for more information.
One of the most famous ones is the configuration model. The model for a graph with $L$ edges in total is constructed in the following steps:
Reminder: A stub is a half-link, representing the half of an edge. It contains one node and can be paired up with another stub to create an edge (between the two corresponding nodes).
2.1 However, this model allows for the creation of multi-links (multiple edges between the same pair of vertices) and self-loops, thus leading to a non-simple graph. In this assignment, you will implement a greedy configuration model, to avoid these problems.
The algorithm goes as follows:
Hints:
nx.empty_graph()
to create an empty graph.G.add_edge(node1,node2,weight = 1)
to add an edge to a weighted graph.G.edge[node1][node2]['weight'] += 1
to increment the weight of an edge by one.
In [ ]:
def greedy_configuration(degree_distribution):
# Your code here.
return G
In [ ]:
degree_distribution=sorted(nx.degree(G).values(),reverse=True) # degree distribution sorted from highest to lowest
gc = greedy_configuration(degree_distribution)
2.2 Verify that the networks have the same number of nodes. Plot the difference between the weighted degree distributions to verify that they are identical. If not, why?
In [ ]:
degree_sequence_gc=sorted(nx.degree(gc, weight = 'weight').values(),reverse=True) #weighted degree distribution
# Your code here.
Your answer here.
2.3 Should these two networks have the same adjacency matrices? Justify.
Your answer here.
2.4 Draw both the generated and original networks. Are they similar? If not, why? Try to explain.
In [ ]:
# Your code here.
Your answer here.
2.5 Do you expect the properties studied in the first part of the assignment to be close to the original graph? Justify.
Your answer here.