Graph Theory Introduction



In [1]:

    
import networkx as nx
import matplotlib.pyplot as plt
from scipy import sparse
from IPython.display import Image, display
%matplotlib inline

G = nx.house_graph()
pos = nx.spring_layout(G)
nx.draw(G, pos=pos)

Definition

Mathematical definition:

A graph is is an ordered pair $G = (V,E)$ of a set $V$ of vertices or nodes and a set $E \subseteq [V]^2$ of edges or lines. The elements of $E$ are therefore 2-element subsets of $V$, representing a relation between the two elemnts of $V$.

Terminology:
- graph = network = web
- vertex = node = point = site = junction: $i = v_i$
- edge = line = link = arc = tie: $(i,j) = e$

Examples:

graph	nodes	edges
internet	computers	network connections
power grid	power plants, consumers	power cables
transportation network	locations	routes
social network	individuals	relationship
citation network	scientific paper	citation
neural network	neuron	synapses
food web	functional groups	feeding relationships



In [2]:

    
display(Image(filename='pictures/air_net_light.jpg'))
display(Image(filename='pictures/europ_grid.jpg'))
display(Image(filename='pictures/ice_net.jpg'))
display(Image(filename='pictures/leaf_net.jpg'))
display(Image(filename='pictures/mold_net.png'))
display(Image(filename='pictures/mouse_brain.jpg'))

Basic graph types

directed $\leftrightarrow$ undirected
weighted $\leftrightarrow$ unweighted
multigraph $\leftrightarrow$ simple graph

Representations



In [4]:

    
nx.draw_networkx(G, pos=pos, node_color='black', font_color='white')
dump = nx.draw_networkx_edge_labels(G, pos=pos, edge_labels=dict(zip(G.edges(), G.edges())))

Adjacency matrix

$$A_{ij} := \left\{ \begin{array}{ll} 1 & : \text{node }i \text{ and } j \text{ are connected}\\ 0 & : \text{else} \end{array} \right. \quad i, j \in V$$



In [5]:

    
A = nx.adjacency_matrix(G)
print(A)









    



[[ 0.  1.  1.  0.  0.]
 [ 1.  0.  0.  1.  0.]
 [ 1.  0.  0.  1.  1.]
 [ 0.  1.  1.  0.  1.]
 [ 0.  0.  1.  1.  0.]]

Incidence matrix

$$B_{ei} := \left\{ \begin{array}{ll} 1 & : \text{edge } e \text{ connects to node } j\\\ 0 & : \text{else} \end{array} \right. \quad e \in E, i \in V$$



In [7]:

    
B = nx.incidence_matrix(G).astype(int)
print(B)









    



[[1 1 0 0 0 0]
 [1 0 1 0 0 0]
 [0 1 0 1 1 0]
 [0 0 1 1 0 1]
 [0 0 0 0 1 1]]

Edgelist



In [8]:

    
edgelist = G.edges()
print(edgelist)









    



[(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4)]

Basic graph properties

Order (number of nodes)

$$N := |V|$$



In [11]:

    
N = G.order() #G.number_of_nodes()
print(N)

Size (number of edges)

$$M := |E|$$



In [12]:

    
M = G.size() #G.number_of_edges()
print(M)

Degree of a node

$$d_i := \sum_{(i,j)\in E} 1$$

$\rightarrow$ number of edges connected to a node



In [ ]:

    
degrees = G.degree()
print degrees

Clustering coeficient of a node

$$c_u=\frac{\text{Triangles containing }u}{d_u(d_u-1)/2}$$



In [38]:

    
nx.clustering(G)









    Out[38]:





{0: 0.0, 1: 0.0, 2: 0.3333333333333333, 3: 0.3333333333333333, 4: 1.0}



In [39]:

    
nx.draw(G)

Degree distribution

$$P(d) := \frac{d_i}{N} \\ N_d : \text{number of nodes with degree } d$$



In [13]:

    
N_d = np.array(nx.degree_histogram(G), dtype=float)
print(N_d / N)









    



[ 0.   0.   0.6  0.4]

Shortest paths

A path from node $v_1$ to node $v_n$ is a sequence of pairwise connected nodes $v_i \in V$, so that $(v_i, v_{i+1}) \in E$: $$p(v_1,v_n) = \{v_1, v_2, \dots, v_n\}$$

The length of a path in an unweighted graph is the number of edges it contains: $$l_{p(v_i, v_j)} = |p(v_i, v_j)| - 1$$

distance between two nodes $u$ and $v$ is: $$d(u,v)=\min_{\text{paths p between u and v}} l_p$$



In [26]:

    
p = nx.shortest_path(G, source=0, target=3)
print(p)









    



[0, 1, 3]



In [27]:

    
print(nx.shortest_path_length(G, source=0, target=3))

Centralities

measure of relative importance for nodes or edges

closeness centrality

$$C_C(v_i) = \frac{N-1}{\sum_{j=0}^N d(v_i,v_j)}$$

$\rightarrow$ information on how close a node is to all other nodes



In [19]:

    
CC = nx.closeness_centrality(G)
print(CC)









    



{0: 0.6666666666666666, 1: 0.6666666666666666, 2: 0.8, 3: 0.8, 4: 0.6666666666666666}



In [20]:

    
nx.draw(G)

betweenness centrality

$$C_B(v_i) = \sum_{k,l \in V}_{k \neq l \neq i} \frac{\sigma_{kl}(v_i)}{\sigma_{kl}} \\ \sigma_{kl} : \text{ number of shortest paths } p(v_k, v_l) \\ \sigma_{kl}(v_i) : \text{ those passing through } v_i $$

$\rightarrow$ information on how often a node is "in-between" all other nodes



In [21]:

    
BC = nx.betweenness_centrality(G)
print(BC)









    



{0: 0.08333333333333333, 1: 0.08333333333333333, 2: 0.25, 3: 0.25, 4: 0.0}

Global graph properties

Diameter:

$$\max_{u,v\in V} d(u,v)$$



In [34]:

    
nx.diameter(G)









    Out[34]:





2

Radius:

$$\min_{v} \max_{u\neq v} d_{u,v}$$



In [35]:

    
nx.radius(G)









    Out[35]:





2

Random graphs

There exists many models for producing random graphs, each achieving certain criteria. Here we list two.

A. Erdös-Renyí model:

Start with $n$ nodes and no edges.
For each vertex pair $u,v \in V$, add an edge betwen them with a fixed probability $p$.

properties:

Average number of edges: $$\frac{N(N-1)}{2}$$
Degree distribution: binomial

B. Barabási–Albert model:

Start with $m$ nodes and no edges.
Add in sequence, $n-m$ nodes. While adding a node $v$, create an edge with pre-existing node $u$ with probability $$\frac{d_u}{|V|}$$

properties:

Degree distribution: power law
Average path length: $<d(u,v)> \approx |V|$