DH3501: Advanced Social Networks

Class 6: Formal Definitions and the Jackson Text

Western University
Department of Modern Languages and Literatures
Digital Humanities – DH 3501

Instructor: David Brown
E-mail: dbrow52@uwo.ca
Office: AHB 1R14

So, what do you think of the Jackson text?

  • The Jackson text may seem a bit mathy or formal...but if you are going to study networks, you have to get used to a bit of math.

  • The most important thing is to take your time and remain calm when you read it:

Formal representation of networks

Graph theory uses a variety of symbols, including set-builder notation:

  • $\{i | i \in g\}$

Set of all i such that i is an element in g...

  • A set is an unordered collection of unique elements.
  • Python implements a set data structure.

How is a set different from a Python list???


In [1]:
set([1,2,3,4])


Out[1]:
{1, 2, 3, 4}

Some common symbols we will see in this class:

Set:

  • $\{1,...,n\}$

Matrix:

  • $g = \begin{bmatrix}0 & 1\\1 & 0\end{bmatrix}$

Membership:

  • $ij \in g$

Owns, has member:

  • $g \ni ij$

Set such that:

  • $\{x | x \in \mathbb{R} \}$
  • or
  • $\{x : x \in \mathbb{R} \}$

Subset, subset or equal set:

  • $g' \subset g$
  • $g' \subseteq g$

Superset, superset or equal set:

  • $g \supset g'$
  • $g \supseteq g'$

Union:

  • $a \cup b$

Intersect:

  • $a \cap b$

Anyone remember what the difference between a set union and a set intersect is?

Summation

Stats class anyone?

$\sum\limits_{i=1}^ni=1 + 2 + 3 + \dots + n$

  • Summation is the operation of adding a sequence of numbers.

Anyone freaking yet?

IT'S OK! We've still got Python to help us. Let's put all of this in terms we understand.

Coding challenge: Translation to Python

All of the above statements can be expressed in Python. Translate them!

For example:

# 1
N = set(range(1, 11))

Translate the following:

  1. N = $\{1,\dots,10\}$
  2. $5 \in N$
  3. $\{4 | 4 \in N \}$
  4. $\{1, 3, 4\} \subset N$
  5. $a = \{1, 2, 3\}$
  6. $b = \{3, 4, 5\}$
  7. $a \cap b$
  8. $a \cup b$

Hint: Ever looked at the Python docs for set?


In [2]:
# Translate here. Add as many new cells as you like.

Everyone knows how to do a summation in Python right?


In [3]:
sum(range(5))


Out[3]:
10

What if we need to apply an operation to each element in the summation first?

$\sum\limits_{i=1}^ni^2=1^2 + 2^2 + 3^2 + \dots + n^2$

Implement a one liner in Python that performs the above summation with n = 4.


In [4]:
# One liner here :)
sum(map(lambda x: x * x, range(5)))


Out[4]:
30

Ok down to business: Formal representations of graphs a la Jackson

pp. 41-44

Jackson defines a graph as a set $N = \{1,...,n\}$ of n nodes involved in a network of relationships defined as a real valued $n x n$ matrix $g$, where $g_{ij}$ represents the relation between $i$ and $j$.

  • $graph = (N,g)$
  • $N = \{1, 2, 3\}$
  • $g = \begin{bmatrix}0 & 1 & 0\\1 & 0 & 1\\0 & 1 & 0\end{bmatrix}$

A network is directed if if it is possible that:

  • $g_{ij} \ne g_{ji}$

Edges can also be represented as a set of tuples:

  • $g = \{(1,2), (2,3)\}$

A subset of edges:

  • $g' \subset g = \{ij | ij \in g' \} \subset \{ij | ij \in g\}$

The network created by deleting all edges except those that are between nodes in the set $S$ where $S \subset N$:

  • $[g|s]_{ij} = \begin{cases} 1\ \text{if}\ i \in S, j \in S, g_{ij} = 1,\ \text{else}\ \\ 0 \end{cases}$

The set of all undirected and unweighted networks on $N$:

  • $G(N)$

The set of nodes that have at least one link in the network $g$:

  • $N(g) = \{i|\exists j\ \text{s.t.}\ ij \in g,\ \text{or}\ ji \in g\}$

Can you translate that to set builder speak?

Isomorphism

Isomorphism is a mapping (bijection) that preserves edge structure.

Jackson p. 43 one line at a time:

  • $(N, g)$ and $(N', g')$ are isomorphic if there exists a bijection

  • $f : N \to N'$ such that

  • $ij \in g$ if and only if

  • $f(i)f(j) \in g'$

Paths and Cycles

Page 44-45

Definitions:

  • A walk is a sequence of links connecting a sequence of nodes.

  • A cycle is a walk that starts and ends at the same node, with all nodes appearing once except the starting node which also appears as the ending node.

  • A path is a walk where a node appears at most once in the sequence.

  • A geodesic between two nodes is a shortest path between them.

Formally...

A path in a network $g \in G(N)$ betgween nodes $i$ and $j$ is a sequence of links

  • $i_1,i_2,\dots,i_{K-1},i_K$ such that

  • $i_ki_{k+1} \in g$ for each

  • $k \in \{1,\dots,K-1\}$, with

  • $i_1 = i$ and $i_K = j$, and such that each node in the sequence

  • $i_1,\dots,i_K$ is distinct.

A walk in a network $g \in G(N)$ betgween nodes $i$ and $j$ is a sequence of links

  • $i_1,i_2,\dots,i_{K-1},i_K$ such that

  • $i_kI_{k+1} \in g$ for each

  • $k \in \{1,\dots,K-1\}$, with

  • $i_1 = i$ and $i_K = j$

WAIT! WHAT'S THE DIFFERENCE BETWEEN A PATH AND A WALK AGAIN?

Components and connectivity

Remember that a network is connected (path connected) if every two nodes in the network are connected by some path in the network.

  • $(N,g)$ is connected if for each $i \in N$ and $j \in N$ there exists a path in $(N,g)$ between $i$ and $j$

A component of a network $(N,g)$, is a nonempty subnetwork $(N',g')$ such that $\emptyset \ne N' \subset N, g' \subset g, $ and

  • $(N',g')$ is connected, and
  • if $i \in N'$ and $ij \in g,$ then $j \in N'$ and $ij \in g'$

Set of components of a network $(N,g)$ is denoted $C(N,g)$

Node neighborhoods

pp. 50-51

The neighborhood of a node i is the set of nodes that i is linked to.

  • $N_i(g)=\{j: g_{ij} = 1\}$

Given some set of nodes $S$, the neighborhood of $S$ is the union of the neighborhoods of its members:

  • $N_S(g) = \bigcup\limits_{i \in S} N_i(g) = \{j: \exists i \in S, g_{ij}=1\}$

We can also extend the neighborhood of a node, remember we tried to do this in class 4...

  • $N_i^k(g) = N_i(g) \cup \bigg( \bigcup\limits_{j \in N_i(g)} N_j^{k-1}(g) \bigg)$

Degree

pp. 51

The degree of node is the number of links that are connected to it.

A node $i$'s degree in a network $g$, denoted $d_i(g)$ is:

  • $d_i(g) = \# \{j: g_{ji} = 1 \} = \# N_i(g)$

Phew, that was exhausting

I know, let's play with some graphs!

Degree Distribution

Degree distribution is one of the fundamental ways to describe a graph. It describes the relative frequencies of the nodes that have different degrees in the graph.

  • Two famous degree distributions, resulting from two kinds of networks:
    • Scale free network
    • Random network
  • Lot's more on this later this term...

Look at the following examples. Is their anything about the structure of the two graphs that strikes you as different?


In [5]:
# Config environment for code examples.
%matplotlib inline
import networkx as nx
import matplotlib as plt
plt.rcParams['figure.figsize'] = 17, 12

In [6]:
g = nx.scale_free_graph(40).to_undirected()
nx.draw_networkx(g)



In [7]:
g = nx.gnp_random_graph(40, 0.1)
nx.draw_networkx(g)


Measures of centrality

pp. 61-65

Degree centrality - a normalized measure of a node's connectivity:

  • $\frac{d_i(g)}{n - 1}$

Closeness centrality - a the inverse of the average distance between $i$ and any other node:

  • $\frac{n-1}{\sum\limits_{j \ne i}\ell(i,j)}$ , where
  • $\ell(i,j)$ is the number of links in the shortest path between $i$ and $j$

Betweenness centrality - the proportion of how many shortest paths a node lies on averaged across all pairs of nodes in a network:

  • $Ce_i^B(g)= \sum\limits_{k \ne j: i \notin \{k,j\}} \frac{P_i(kj)/P(kj)}{(n-1)(n-2)/2}$

Coding challenge: Plotting degree distribution

Use NetworkX to generate a scale free graph and a gnp random graph, then plot the degree distribution of each graph...

HINTS:

  • Degree distribution plots the probability $P(k)$ that a node has the degree $k$
  • You calculate $P(k)$ by dividing the number of nodes in the network with degree $k$ by the total number of nodes $n$ in the graph.
    • $P(k) = \frac{\# nodes\ with\ degree\ k} {\# nodes}$
  • On the x axis of the graph you plot the degree, $k$
  • On the y-axis of the graph you plot the probability $P(k)$
  • Some useful functions:
nx.degree(graph)
plt.scatter(xvals, yvals)
plt.yscale("log")
plt.xscale("log")

In [8]:
def degree_distribution(g):
    """
    :param g: networkx.Graph
    :returns ?: whatever you need to plot the degree distribution.
    """
    pass

# Here you can either script or write a function to plot the degree distributions.
# Use as many cells as you need.