Graph Theory Introduction


In [1]:
import networkx as nx
import matplotlib.pyplot as plt
from scipy import sparse
from IPython.display import Image, display
%matplotlib inline

G = nx.house_graph()
pos = nx.spring_layout(G)
nx.draw(G, pos=pos)


Definition

  • Mathematical definition:

    A graph is is an ordered pair $G = (V,E)$ of a set $V$ of vertices or nodes and a set $E \subseteq [V]^2$ of edges or lines. The elements of $E$ are therefore 2-element subsets of $V$, representing a relation between the two elemnts of $V$.

  • Terminology:
    • graph = network = web
    • vertex = node = point = site = junction: $i = v_i$
    • edge = line = link = arc = tie: $(i,j) = e$
  • Examples:

    graph nodes edges
    internet computers network connections
    power grid power plants, consumers power cables
    transportation network locations routes
    social network individuals relationship
    citation network scientific paper citation
    neural network neuron synapses
    food web functional groups feeding relationships

In [2]:
display(Image(filename='pictures/air_net_light.jpg'))
display(Image(filename='pictures/europ_grid.jpg'))
display(Image(filename='pictures/ice_net.jpg'))
display(Image(filename='pictures/leaf_net.jpg'))
display(Image(filename='pictures/mold_net.png'))
display(Image(filename='pictures/mouse_brain.jpg'))


Basic graph types

  • directed $\leftrightarrow$ undirected
  • weighted $\leftrightarrow$ unweighted
  • multigraph $\leftrightarrow$ simple graph

Representations


In [4]:
nx.draw_networkx(G, pos=pos, node_color='black', font_color='white')
dump = nx.draw_networkx_edge_labels(G, pos=pos, edge_labels=dict(zip(G.edges(), G.edges())))


Adjacency matrix

$$A_{ij} := \left\{ \begin{array}{ll} 1 & : \text{node }i \text{ and } j \text{ are connected}\\ 0 & : \text{else} \end{array} \right. \quad i, j \in V$$

In [5]:
A = nx.adjacency_matrix(G)
print(A)


[[ 0.  1.  1.  0.  0.]
 [ 1.  0.  0.  1.  0.]
 [ 1.  0.  0.  1.  1.]
 [ 0.  1.  1.  0.  1.]
 [ 0.  0.  1.  1.  0.]]

Incidence matrix

$$B_{ei} := \left\{ \begin{array}{ll} 1 & : \text{edge } e \text{ connects to node } j\\\ 0 & : \text{else} \end{array} \right. \quad e \in E, i \in V$$


In [7]:
B = nx.incidence_matrix(G).astype(int)
print(B)


[[1 1 0 0 0 0]
 [1 0 1 0 0 0]
 [0 1 0 1 1 0]
 [0 0 1 1 0 1]
 [0 0 0 0 1 1]]

Edgelist


In [8]:
edgelist = G.edges()
print(edgelist)


[(0, 1), (0, 2), (1, 3), (2, 3), (2, 4), (3, 4)]

Basic graph properties

Order (number of nodes)

$$N := |V|$$


In [11]:
N = G.order() #G.number_of_nodes()
print(N)


5

Size (number of edges)

$$M := |E|$$

In [12]:
M = G.size() #G.number_of_edges()
print(M)


6

Degree of a node

$$d_i := \sum_{(i,j)\in E} 1$$

$\rightarrow$ number of edges connected to a node


In [ ]:
degrees = G.degree()
print degrees

Clustering coeficient of a node

$$c_u=\frac{\text{Triangles containing }u}{d_u(d_u-1)/2}$$

In [38]:
nx.clustering(G)


Out[38]:
{0: 0.0, 1: 0.0, 2: 0.3333333333333333, 3: 0.3333333333333333, 4: 1.0}

In [39]:
nx.draw(G)


Degree distribution

$$P(d) := \frac{d_i}{N} \\ N_d : \text{number of nodes with degree } d$$

In [13]:
N_d = np.array(nx.degree_histogram(G), dtype=float)
print(N_d / N)


[ 0.   0.   0.6  0.4]

Shortest paths

A path from node $v_1$ to node $v_n$ is a sequence of pairwise connected nodes $v_i \in V$, so that $(v_i, v_{i+1}) \in E$: $$p(v_1,v_n) = \{v_1, v_2, \dots, v_n\}$$

The length of a path in an unweighted graph is the number of edges it contains: $$l_{p(v_i, v_j)} = |p(v_i, v_j)| - 1$$

distance between two nodes $u$ and $v$ is: $$d(u,v)=\min_{\text{paths p between u and v}} l_p$$


In [26]:
p = nx.shortest_path(G, source=0, target=3)
print(p)


[0, 1, 3]

In [27]:
print(nx.shortest_path_length(G, source=0, target=3))


2

Centralities

  • measure of relative importance for nodes or edges

closeness centrality

$$C_C(v_i) = \frac{N-1}{\sum_{j=0}^N d(v_i,v_j)}$$

$\rightarrow$ information on how close a node is to all other nodes


In [19]:
CC = nx.closeness_centrality(G)
print(CC)


{0: 0.6666666666666666, 1: 0.6666666666666666, 2: 0.8, 3: 0.8, 4: 0.6666666666666666}

In [20]:
nx.draw(G)


betweenness centrality

$$C_B(v_i) = \sum_{k,l \in V}_{k \neq l \neq i} \frac{\sigma_{kl}(v_i)}{\sigma_{kl}} \\ \sigma_{kl} : \text{ number of shortest paths } p(v_k, v_l) \\ \sigma_{kl}(v_i) : \text{ those passing through } v_i $$

$\rightarrow$ information on how often a node is "in-between" all other nodes


In [21]:
BC = nx.betweenness_centrality(G)
print(BC)


{0: 0.08333333333333333, 1: 0.08333333333333333, 2: 0.25, 3: 0.25, 4: 0.0}

Global graph properties

Diameter:

$$\max_{u,v\in V} d(u,v)$$

In [34]:
nx.diameter(G)


Out[34]:
2

Radius:

$$\min_{v} \max_{u\neq v} d_{u,v}$$

In [35]:
nx.radius(G)


Out[35]:
2

Random graphs

There exists many models for producing random graphs, each achieving certain criteria. Here we list two.

A. Erdös-Renyí model:

  1. Start with $n$ nodes and no edges.
  2. For each vertex pair $u,v \in V$, add an edge betwen them with a fixed probability $p$.

properties:

  1. Average number of edges: $$\frac{N(N-1)}{2}$$
  2. Degree distribution: binomial

B. Barabási–Albert model:

  1. Start with $m$ nodes and no edges.
  2. Add in sequence, $n-m$ nodes. While adding a node $v$, create an edge with pre-existing node $u$ with probability $$\frac{d_u}{|V|}$$

properties:

  1. Degree distribution: power law
  2. Average path length: $<d(u,v)> \approx |V|$