$\newcommand{\G}{\mathcal{G}}$ $\newcommand{\V}{\mathcal{V}}$ $\newcommand{\E}{\mathcal{E}}$ $\newcommand{\R}{\mathbb{R}}$
This notebook shows how to apply our graph ConvNet (paper & code), or any other, to your structured or unstructured data. For this example, we assume that we have $n$ samples $x_i \in \R^{d_x}$ arranged in a data matrix $$X = [x_1, ..., x_n]^T \in \R^{n \times d_x}.$$ Each sample $x_i$ is associated with a vector $y_i \in \R^{d_y}$ for a regression task or a label $y_i \in \{0,\ldots,C-1\}$ for a classification task with $C$ classes.
From there, we'll structure our data with a graph $\G = (\V, \E, A)$ where $\V$ is the set of $d_x = |\V|$ vertices, $\E$ is the set of edges and $A \in \R^{d_x \times d_x}$ is the adjacency matrix. That matrix represents the weight of each edge, i.e. $A_{i,j}$ is the weight of the edge connecting $v_i \in \V$ to $v_j \in \V$. The weights of that feature graph thus represent pairwise relationships between features $i$ and $j$. We call that regime signal classification / regression, as the samples $x_i$ to be classified or regressed are graph signals.
Other modelling possibilities include building a graph between samples rather than features, i.e. an adjacency matrix $A \in \R^{n \times n}$, in which case the task becomes node classification / regression. For this demo, let's generate a synthetic dataset: a random data matrix whose features are structured in communities, and a noisy non-linear target.
In [ ]:
from lib import models, graph, coarsening, utils
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
d = 100 # Dimensionality.
n = 10000 # Number of samples.
c = 5 # Number of feature communities.
# Data matrix, structured in communities (feature-wise).
X = np.random.normal(0, 1, (n, d)).astype(np.float32)
X += np.linspace(0, 1, c).repeat(d // c)
# Noisy non-linear target.
w = np.random.normal(0, .02, d)
t = X.dot(w) + np.random.normal(0, .001, n)
t = np.tanh(t)
plt.figure(figsize=(15, 5))
plt.plot(t, '.')
# Classification.
y = np.ones(t.shape, dtype=np.uint8)
y[t > t.mean() + 0.4 * t.std()] = 0
y[t < t.mean() - 0.4 * t.std()] = 2
print('Class imbalance: ', np.unique(y, return_counts=True)[1])
Then split this dataset into training, validation and testing sets.
In [ ]:
n_train = n // 2
n_val = n // 10
X_train = X[:n_train]
X_val = X[n_train:n_train+n_val]
X_test = X[n_train+n_val:]
y_train = y[:n_train]
y_val = y[n_train:n_train+n_val]
y_test = y[n_train+n_val:]
The second thing we need is a graph between features, i.e. an adjacency matrix $A \in \mathbb{R}^{d_x \times d_x}$. Structuring data with graphs is very flexible: it can accommodate both structured and unstructured data.
There are many ways, supervised or unsupervised, to construct a graph given some data. And the better the graph, the better the performance! For this example we'll define the adjacency matrix as a simple similarity measure between features. The main choices to make when constructing such a graph are the distance metric (here Euclidean), the connectivity (here $k$ nearest neighbours with $k=10$), and the kernel that turns distances into edge weights (graph.adjacency applies a Gaussian kernel to the distances).
In [ ]:
dist, idx = graph.distance_scipy_spatial(X_train.T, k=10, metric='euclidean')
A = graph.adjacency(dist, idx).astype(np.float32)
assert A.shape == (d, d)
print('d = |V| = {}, k|V| < |E| = {}'.format(d, A.nnz))
plt.spy(A, markersize=2, color='black');
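Under the hood, graph.distance_scipy_spatial computes each feature's $k$ nearest neighbours, and graph.adjacency turns those distances into edge weights. A minimal sketch of that kind of construction, assuming a Gaussian kernel and max-symmetrization (the library's exact scaling may differ):
In [ ]:
# Hypothetical re-implementation sketch, not the library's exact code.
import scipy.sparse
import scipy.spatial

def knn_adjacency(Z, k=10):
    # Z is (d, n): one row per feature, described by n samples.
    d = Z.shape[0]
    dist = scipy.spatial.distance.squareform(
        scipy.spatial.distance.pdist(Z, 'euclidean'))
    idx = np.argsort(dist)[:, 1:k+1]    # k nearest neighbours (column 0 is self)
    dist = np.sort(dist)[:, 1:k+1]      # the corresponding distances
    sigma2 = np.mean(dist[:, -1])**2    # kernel width from the k-th neighbour
    W = np.exp(-dist**2 / sigma2)       # Gaussian kernel: closer means heavier
    rows = np.arange(d).repeat(k)
    A = scipy.sparse.coo_matrix((W.ravel(), (rows, idx.ravel())), shape=(d, d))
    return A.maximum(A.transpose()).tocsr()  # symmetrize: keep the larger weight

A_sketch = knn_adjacency(X_train.T, k=10)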
To be able to pool graph signals, we first need to coarsen the graph, i.e. to find which vertices to group together. At the end we'll have multiple graphs, like a pyramid, each at one level of resolution. The finest graph is where the input data lies; the coarsest graph is where the data at the output of the graph convolutional layers lies. That data, of reduced spatial dimensionality, can then be fed to a fully connected layer.
The parameter here is the number of times to coarsen the graph. Each coarsening approximately halves the size of the graph. Thus if you want a pooling of size 4 in the first layer followed by a pooling of size 2 in the second, you'll need to coarsen $\log_2(4 \cdot 2) = 3$ times.
After coarsening we rearrange the vertices (and add fake vertices) such that pooling a graph signal is analogous to pooling a 1D signal. See the paper for details.
In [ ]:
graphs, perm = coarsening.coarsen(A, levels=3, self_connections=False)
X_train = coarsening.perm_data(X_train, perm)
X_val = coarsening.perm_data(X_val, perm)
X_test = coarsening.perm_data(X_test, perm)
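A quick sanity check of the 1D-pooling analogy: after perm_data, vertices that get merged are consecutive, so pooling a permuted graph signal is just a strided reshape. A toy illustration (the model performs this internally):
In [ ]:
# Toy illustration: pooling the permuted graph signal as a 1D signal.
x = X_train[0]                           # one permuted graph signal
p = 4                                    # first-layer pooling size
assert x.size % p == 0                   # guaranteed by the fake-vertex padding
x_pooled = x.reshape(-1, p).max(axis=1)  # max-pool groups of p consecutive vertices
print('{} vertices -> {} vertices'.format(x.size, x_pooled.size))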
We finally need to compute the graph Laplacian $L$ for each of our graphs (the original and the coarsened versions), defined by their adjacency matrices $A$. The sole parameter here is the type of Laplacian, e.g. the combinatorial Laplacian $L = D - A$, the normalized Laplacian $L = I - D^{-1/2} A D^{-1/2}$ or the random walk Laplacian $L = I - D^{-1} A$, where $D$ is the diagonal degree matrix.
In [ ]:
L = [graph.laplacian(A, normalized=True) for A in graphs]
graph.plot_spectrum(L)
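For reference, a minimal sketch of the normalized Laplacian computation used above (graph.laplacian additionally guards against isolated vertices):
In [ ]:
# Minimal sketch of L = I - D^{-1/2} A D^{-1/2}; see graph.laplacian for the real one.
import scipy.sparse

def normalized_laplacian(A):
    d = np.asarray(A.sum(axis=1)).squeeze()         # vertex degrees
    d_inv_sqrt = 1 / np.sqrt(np.maximum(d, 1e-12))  # avoid division by zero degrees
    D = scipy.sparse.diags(d_inv_sqrt)
    I = scipy.sparse.identity(A.shape[0], dtype=A.dtype)
    return I - D.dot(A).dot(D)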
Here we apply the graph convolutional neural network to signals lying on graphs. After designing the architecture and setting the hyper-parameters, the model takes as inputs the data matrix $X$, the target $y$ and a list of graph Laplacians $L$, one per coarsening level.
The data, architecture and hyper-parameters below are absolutely not engineered to showcase performance. Their sole purpose is to illustrate usage and functionality.
In [ ]:
params = dict()
params['dir_name'] = 'demo'
params['num_epochs'] = 40
params['batch_size'] = 100
params['eval_frequency'] = 200
# Building blocks.
params['filter'] = 'chebyshev5' # Chebyshev polynomial spectral filters.
params['brelu'] = 'b1relu' # Bias and ReLU (one bias per filter).
params['pool'] = 'apool1' # Average pooling ('mpool1' would be max pooling).
# Number of classes.
C = y.max() + 1
assert C == np.unique(y).size
# Architecture.
params['F'] = [32, 64] # Number of graph convolutional filters.
params['K'] = [20, 20] # Polynomial orders.
params['p'] = [4, 2] # Pooling sizes.
params['M'] = [512, C] # Output dimensionality of fully connected layers.
# Optimization.
params['regularization'] = 5e-4 # L2 regularization weight.
params['dropout'] = 1 # Keep probability: 1 disables dropout.
params['learning_rate'] = 1e-3
params['decay_rate'] = 0.95
params['momentum'] = 0.9
params['decay_steps'] = n_train / params['batch_size'] # One decay per epoch.
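For intuition, the 'chebyshev5' filters evaluate a degree-$K$ polynomial of the rescaled Laplacian $\tilde{L} = 2L/\lambda_{max} - I$ via the Chebyshev recurrence $T_k(x) = 2x\,T_{k-1}(x) - T_{k-2}(x)$, so no eigendecomposition is needed. A numpy sketch for a single signal and a single filter, assuming $\lambda_{max} = 2$ as for the normalized Laplacian (the model's TensorFlow implementation batches this over samples and feature maps):
In [ ]:
# Sketch of Chebyshev filtering for one signal; the model does this in TensorFlow.
import scipy.sparse

def chebyshev_filter(L, x, theta, lmax=2.0):
    # theta: (K,) filter coefficients, learned in the real model.
    L_tilde = (2.0 / lmax) * L - scipy.sparse.identity(L.shape[0], dtype=x.dtype)
    x0, x1 = x, L_tilde.dot(x)           # T_0(L~) x = x and T_1(L~) x = L~ x
    y = theta[0] * x0
    if len(theta) > 1:
        y = y + theta[1] * x1
    for k in range(2, len(theta)):
        x2 = 2 * L_tilde.dot(x1) - x0    # recurrence: T_k = 2 L~ T_{k-1} - T_{k-2}
        y = y + theta[k] * x2
        x0, x1 = x1, x2
    return y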
In [ ]:
model = models.cgcnn(L, **params)
accuracy, loss, t_step = model.fit(X_train, y_train, X_val, y_val)
We often want to monitor:
1. the convergence, i.e. the evolution of the validation accuracy and the training loss;
2. the training speed, i.e. the time per step;
3. the generalization, i.e. the accuracy on the held-out test set.
The model_perf class in utils.py can be used to compactly evaluate multiple models.
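For instance, a hedged sketch of how it might be used, assuming model_perf.test takes the model, a name, its parameters and the three data splits (check utils.py for the exact signature):
In [ ]:
# Assumed usage of utils.model_perf; verify the signature in utils.py.
model_perf = utils.model_perf()
model_perf.test(models.cgcnn(L, **params), 'cgconv_demo', params,
                X_train, y_train, X_val, y_val, X_test, y_test)
model_perf.show()  # prints a summary across all tested models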
In [ ]:
fig, ax1 = plt.subplots(figsize=(15, 5))
ax1.plot(accuracy, 'b.-')
ax1.set_ylabel('validation accuracy', color='b')
ax2 = ax1.twinx()
ax2.plot(loss, 'g.-')
ax2.set_ylabel('training loss', color='g')
plt.show()
In [ ]:
print('Time per step: {:.2f} ms'.format(t_step*1000))
In [ ]:
res = model.evaluate(X_test, y_test)
print(res[0])  # The first element is a human-readable summary of the test metrics.