Network analysis with networkx

  • What is network analysis?

  • What is networkx?

A simple example

Consider the social network shown in the figure below.


In [3]:
# load image

Let's create a graph in networkx to represent that social network.


In [4]:
import networkx as nx
G = nx.Graph() # create a Graph object

people = ["Michael", "Jane", "John", "Peter", "Angie"]

for person in people:
    G.add_node(person)

connections = [("Michael", "Jane"), ("Angie", "Michael"),
              ("John", "Peter"), ("Jane", "John"), ("Angie", "Peter"),
              ("Angie", "John"), ("Jane", "Angie")]

for person_a, person_b in connections:
    G.add_edge(person_a, person_b)
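The same graph can also be built without explicit loops. The sketch below uses the bulk methods `add_nodes_from` and `add_edges_from`; the variable name `G2` is chosen only for illustration.

```python
import networkx as nx

people = ["Michael", "Jane", "John", "Peter", "Angie"]
connections = [("Michael", "Jane"), ("Angie", "Michael"),
               ("John", "Peter"), ("Jane", "John"), ("Angie", "Peter"),
               ("Angie", "John"), ("Jane", "Angie")]

G2 = nx.Graph()
G2.add_nodes_from(people)       # add all nodes in one call
G2.add_edges_from(connections)  # add all edges in one call

print(G2.number_of_nodes(), G2.number_of_edges())  # prints: 5 7
```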

Let's draw the graph.


In [5]:
nx.draw(G, with_labels = True, font_size = 24, node_size = 0)


Loading a graph from a file

Imagine a case where we want to load a much bigger graph than that. It would be a lot more convenient to have the graph stored in a file and load it whenever we need it.

We have stored a larger social network in file "social_network_1.txt". Let's load it and see what it looks like.


In [21]:
%%bash
filename="social_network_1.txt"
head -15 "$filename"
echo Total lines: $(wc -l < "$filename")


0 65
0 71
0 37
0 73
0 61
1 31
2 92
2 76
3 11
3 91
3 89
3 27
3 92
3 62
4 64
Total lines: 134

In [9]:
g = nx.read_edgelist("social_network_1.txt") # the file has no weight column
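Node labels read from a text file are strings by default (for example `'0'`, not `0`). A minimal sketch of reading integer labels instead, assuming a whitespace-separated edge list like the one shown above. The temporary file is a stand-in so the example runs on its own, and `read_edgelist` is used because the lines carry no weight column.

```python
import os
import tempfile

import networkx as nx

# Write a tiny stand-in edge list (a few lines in the same format as above).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("0 65\n0 71\n0 37\n1 31\n")
    path = f.name

g_str = nx.read_edgelist(path)                # node labels are strings
g_int = nx.read_edgelist(path, nodetype=int)  # node labels converted to int
os.remove(path)

print(sorted(g_int.nodes()))  # prints: [0, 1, 31, 37, 65, 71]
```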

In [17]:
fig, ax = plt.subplots(figsize = (8, 8))
nx.draw(g, node_color = "black", node_size = 15, ax = ax) # draw on the axes created above



In [32]:
# More complex example: edge attributes, node attributes

Drawing graphs

Idea: demonstrate drawing examples that will be used later.

  • Customized Nodes
  • Customized Edges
  • Customized Layout

Let's use the graph from the previous example and see how we can customize its plot.
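One possible way to customize nodes, edges and layout is sketched below on the small social network from the first example. The specific choices (circular layout, node size proportional to degree, dashed gray edges) are illustrative assumptions, not settings prescribed by this notebook.

```python
import matplotlib.pyplot as plt
import networkx as nx

# Rebuild the small social network from the first example.
G = nx.Graph()
G.add_edges_from([("Michael", "Jane"), ("Angie", "Michael"),
                  ("John", "Peter"), ("Jane", "John"), ("Angie", "Peter"),
                  ("Angie", "John"), ("Jane", "Angie")])

# Customized layout: compute positions once and reuse them for every layer.
pos = nx.circular_layout(G)

# Customized nodes: size proportional to the number of connections.
node_sizes = [300 * G.degree(n) for n in G.nodes()]

fig, ax = plt.subplots(figsize=(6, 6))
nx.draw_networkx_nodes(G, pos, node_size=node_sizes,
                       node_color="lightblue", ax=ax)
# Customized edges: width, color and line style.
nx.draw_networkx_edges(G, pos, width=2, edge_color="gray",
                       style="dashed", ax=ax)
nx.draw_networkx_labels(G, pos, font_size=12, ax=ax)
ax.set_axis_off()
```

Drawing nodes, edges and labels in separate calls makes it possible to style each layer independently, as long as all calls share the same `pos`.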

Simple operations on graphs

Example: a friendship network. Edge weights indicate the number of messages exchanged. Each node is associated with the name of a workplace/department and a position (clerk / manager).
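A minimal sketch of how such a network could be represented, using node attributes for department and position and an edge attribute `weight` for the message count. All names and numbers are invented for illustration.

```python
import networkx as nx

F = nx.Graph()

# Node attributes: department and position (values are invented).
F.add_node("Ann", department="Sales", position="manager")
F.add_node("Bob", department="Sales", position="clerk")
F.add_node("Cara", department="IT", position="clerk")
F.add_node("Dan", department="IT", position="manager")

# Edge attribute: weight = number of messages exchanged (invented).
F.add_edge("Ann", "Bob", weight=12)
F.add_edge("Ann", "Dan", weight=3)
F.add_edge("Bob", "Cara", weight=7)
F.add_edge("Cara", "Dan", weight=5)

print(F.nodes["Ann"]["position"])  # attribute lookup for one node
print(F["Ann"]["Bob"]["weight"])   # attribute lookup for one edge
```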


In [23]:
g = nx.read_weighted_edgelist("social_network_1.txt") # use other network here

  • How many nodes? How many edges?

In [25]:
print("Number of nodes: ", len(g))
print("Number of nodes: ", g.number_of_nodes())


Number of nodes:  92
Number of nodes:  92

In [26]:
print("Number of nodes: ", len(g.nodes()))


Number of nodes:  92

In [27]:
print("Nodes: ", g.nodes())


Nodes:  ['96', '87', '93', '34', '84', '41', '50', '4', '56', '22', '52', '18', '95', '16', '68', '83', '78', '40', '94', '46', '7', '73', '72', '42', '77', '76', '69', '27', '54', '97', '89', '48', '11', '9', '14', '98', '1', '79', '43', '51', '90', '71', '57', '65', '29', '47', '3', '86', '53', '13', '32', '64', '59', '26', '45', '0', '10', '37', '82', '35', '91', '2', '99', '58', '66', '70', '38', '81', '15', '55', '63', '33', '60', '39', '24', '36', '62', '19', '28', '23', '75', '67', '31', '92', '61', '44', '49', '25', '5', '6', '74', '80']

In [28]:
print("Number of edges: ", g.number_of_edges())


Number of edges:  134

In [29]:
print("Number of edges: ", len(g.edges()))


Number of edges:  134

In [31]:
print("Edges: ", g.edges())


Edges:  [('96', '50'), ('87', '46'), ('93', '10'), ('93', '33'), ('34', '35'), ('34', '70'), ('84', '60'), ('41', '64'), ('41', '18'), ('41', '19'), ('50', '95'), ('50', '54'), ('50', '38'), ('50', '61'), ('50', '83'), ('50', '19'), ('4', '64'), ('4', '53'), ('4', '7'), ('56', '58'), ('56', '71'), ('56', '70'), ('22', '49'), ('22', '29'), ('22', '39'), ('22', '9'), ('52', '45'), ('52', '44'), ('18', '78'), ('95', '68'), ('16', '92'), ('16', '40'), ('16', '19'), ('68', '10'), ('68', '92'), ('68', '25'), ('83', '15'), ('78', '24'), ('78', '77'), ('40', '77'), ('40', '44'), ('40', '51'), ('94', '24'), ('94', '81'), ('94', '71'), ('46', '99'), ('46', '54'), ('7', '32'), ('73', '90'), ('73', '0'), ('72', '11'), ('42', '31'), ('42', '36'), ('42', '86'), ('42', '5'), ('77', '31'), ('77', '27'), ('76', '2'), ('76', '70'), ('69', '86'), ('69', '44'), ('27', '33'), ('27', '13'), ('27', '47'), ('27', '3'), ('97', '47'), ('97', '57'), ('89', '26'), ('89', '3'), ('48', '25'), ('48', '74'), ('11', '53'), ('11', '3'), ('11', '24'), ('11', '6'), ('11', '57'), ('14', '47'), ('14', '36'), ('98', '62'), ('98', '35'), ('1', '31'), ('79', '39'), ('79', '92'), ('43', '37'), ('43', '86'), ('90', '92'), ('71', '0'), ('57', '36'), ('65', '0'), ('65', '74'), ('29', '13'), ('29', '36'), ('47', '5'), ('3', '91'), ('3', '92'), ('3', '62'), ('86', '62'), ('53', '70'), ('13', '32'), ('32', '33'), ('32', '45'), ('59', '99'), ('59', '19'), ('26', '99'), ('26', '45'), ('26', '60'), ('45', '15'), ('45', '74'), ('0', '37'), ('0', '61'), ('10', '49'), ('10', '82'), ('37', '66'), ('2', '92'), ('58', '28'), ('58', '63'), ('66', '23'), ('66', '74'), ('81', '28'), ('81', '44'), ('15', '67'), ('55', '63'), ('55', '74'), ('55', '67'), ('63', '25'), ('60', '61'), ('60', '44'), ('39', '75'), ('19', '80'), ('28', '92'), ('28', '67'), ('31', '6'), ('44', '25'), ('5', '74')]
  • How many nodes with attribute X = x?

  • What is the most common value for attribute X?

  • ...
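The attribute questions above could be answered along these lines. This is a sketch on a tiny invented graph, with `department` standing in for attribute X.

```python
from collections import Counter

import networkx as nx

# Tiny invented network; "department" plays the role of attribute X.
F = nx.Graph()
F.add_nodes_from([
    ("Ann", {"department": "Sales"}),
    ("Bob", {"department": "Sales"}),
    ("Cara", {"department": "IT"}),
])

# How many nodes have department == "Sales"?
n_sales = sum(1 for _, dept in F.nodes(data="department") if dept == "Sales")

# What is the most common department?
counts = Counter(nx.get_node_attributes(F, "department").values())
most_common_value, freq = counts.most_common(1)[0]

print(n_sales, most_common_value, freq)  # prints: 2 Sales 2
```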

  • What is the total number of messages exchanged in the network?
  • What is the total & average number of messages between people with attribute X?
  • ...
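For the message counts, one option is to sum the `weight` attribute over edges. The sketch below assumes weights store the number of messages and uses invented data; "between people with attribute X" is interpreted here as edges whose two endpoints share the same attribute value.

```python
import networkx as nx

F = nx.Graph()
F.add_nodes_from([
    ("Ann", {"department": "Sales"}),
    ("Bob", {"department": "Sales"}),
    ("Cara", {"department": "IT"}),
    ("Dan", {"department": "Sales"}),
])
F.add_weighted_edges_from([
    ("Ann", "Bob", 12), ("Bob", "Cara", 7),
    ("Ann", "Cara", 5), ("Bob", "Dan", 4),
])

# Total number of messages: sum of all edge weights.
total = F.size(weight="weight")

# Messages on edges whose endpoints are both in Sales.
sales = {n for n, dept in F.nodes(data="department") if dept == "Sales"}
within = [w for u, v, w in F.edges(data="weight") if u in sales and v in sales]
total_sales = sum(within)
avg_sales = total_sales / len(within)

print(total, total_sales, avg_sales)
```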

  • What is the most active node (in terms of people / messages)?

  • ...
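"Most active" can be read in two ways: the node with the most neighbors, or the node with the most messages. A sketch of both on invented data, assuming messages are stored in the `weight` edge attribute:

```python
import networkx as nx

F = nx.Graph()
F.add_weighted_edges_from([
    ("Ann", "Bob", 12), ("Bob", "Cara", 7), ("Ann", "Cara", 5),
    ("Cara", "Dan", 4), ("Cara", "Eve", 1),
])

# Most active in terms of people: highest (unweighted) degree.
by_people = max(F.degree(), key=lambda pair: pair[1])

# Most active in terms of messages: highest weighted degree.
by_messages = max(F.degree(weight="weight"), key=lambda pair: pair[1])

print(by_people)    # prints: ('Cara', 4)
print(by_messages)  # prints: ('Bob', 19)
```

The two answers differ in this example, which is why it helps to state which notion of activity is meant.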

More complex graphs

  • Directed graphs -- example
  • Multi-graphs -- example
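A brief sketch of both graph types, with invented names: `nx.DiGraph` stores edge direction, and `nx.MultiGraph` allows several parallel edges between the same pair of nodes.

```python
import networkx as nx

# Directed graph: an edge u -> v does not imply v -> u.
D = nx.DiGraph()
D.add_edge("Ann", "Bob")   # e.g. Ann sent a message to Bob
D.add_edge("Bob", "Cara")
print(D.has_edge("Ann", "Bob"), D.has_edge("Bob", "Ann"))      # prints: True False
print(D.in_degree("Bob"), D.out_degree("Bob"))                 # prints: 1 1
print(list(D.successors("Bob")), list(D.predecessors("Bob")))  # prints: ['Cara'] ['Ann']

# Multigraph: parallel edges between the same two nodes are kept separately.
M = nx.MultiGraph()
M.add_edge("Ann", "Bob", channel="email")
M.add_edge("Ann", "Bob", channel="chat")
print(M.number_of_edges("Ann", "Bob"))                         # prints: 2
```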

More advanced - Traversal, Distance, Centrality

  • Path lengths
  • Radius & Diameter
  • Traversals
  • Random Walk & Cascade setting
  • Centrality measures
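Several of these topics can be sketched on the small social network from the first example. The helper `random_walk` below is a hypothetical illustration, not a networkx function, and degree centrality is just one of several centrality measures the library offers.

```python
import random

import networkx as nx

G = nx.Graph([("Michael", "Jane"), ("Angie", "Michael"),
              ("John", "Peter"), ("Jane", "John"), ("Angie", "Peter"),
              ("Angie", "John"), ("Jane", "Angie")])

# Path length: number of edges on a shortest path between two people.
dist = nx.shortest_path_length(G, "Michael", "Peter")

# Radius and diameter: smallest and largest eccentricity in the graph.
radius, diameter = nx.radius(G), nx.diameter(G)

# Traversal: breadth-first visiting order starting from Michael.
bfs_order = ["Michael"] + [v for _, v in nx.bfs_edges(G, "Michael")]

# Centrality: fraction of the other nodes each node is connected to.
centrality = nx.degree_centrality(G)
most_central = max(centrality, key=centrality.get)

def random_walk(graph, start, steps, rng):
    """Hypothetical helper: move to a uniformly random neighbor each step."""
    walk = [start]
    for _ in range(steps):
        walk.append(rng.choice(list(graph.neighbors(walk[-1]))))
    return walk

walk = random_walk(G, "Michael", 5, random.Random(0))

print(dist, radius, diameter, most_central)  # prints: 2 1 2 Angie
```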

Setup

Run these cells before other code cells.


In [16]:
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt

In [2]:
%matplotlib inline

In [42]:
nx.read_weighted_edgelist?

In [51]:
g = nx.fast_gnp_random_graph(100, 0.03)
nx.write_edgelist(g, "social_network_1.txt", data = False)
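A likely reason the loaded graph has 92 nodes although 100 were generated: an edge list stores only edges, so nodes without any edge are not written and do not reappear on loading. The sketch below demonstrates this on a tiny graph, using a temporary file as a stand-in.

```python
import os
import tempfile

import networkx as nx

H = nx.Graph()
H.add_edge(0, 1)
H.add_node(2)  # isolated node: it has no edge, so no line mentions it

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.txt")
    nx.write_edgelist(H, path, data=False)
    H_loaded = nx.read_edgelist(path, nodetype=int)

print(H.number_of_nodes(), H_loaded.number_of_nodes())  # prints: 3 2
```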
