In [ ]:
%matplotlib inline
In [ ]:
import numpy as np
from scipy import sparse, spatial
import pandas as pd
import matplotlib.pyplot as plt
from pygsp import graphs, filters, plotting
In [ ]:
plt.rcParams['figure.figsize'] = (17, 5)
plotting.BACKEND = 'matplotlib'
If you get an import error about the PyGSP, install it with pip install pygsp or conda install -c conda-forge pygsp.
For this assignment, we'll again use the Free Music Archive dataset, as in the third assignment. This time, I'm giving you all the data. Let's load the musical genres.
In [ ]:
# Ground truth genres.
tracks = pd.read_csv('../data/fma_tracks.csv', index_col=0, squeeze=True)
# Complete missing genres.
tracks[:10] = [21, 21, 27, 12, 31, 89, 36, 25, 21, 12]
# Convert to top-level genres.
tracks = tracks.apply(lambda gid: 21 if gid in [21, 83, 100, 539, 542, 811] else 12)
assert set(tracks.unique()) == {12, 21}
Given a data matrix $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_N]^\intercal \in \mathbb{R}^{N \times d}$, where each of the $N=2000$ rows is a sample $\mathbf{x} \in \mathbb{R}^d$ for which we have $d$ features, we constructed in the last assignment a similarity graph defined as $$\mathbf{W}[i,j] = \exp(-d^2(\mathbf{x}_i, \mathbf{x}_j) / \sigma^2),$$ where $d(\cdot, \cdot)$ is a distance between samples.
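As a refresher, here is a minimal sketch of how such a Gaussian-kernel weight matrix could be built from a feature matrix; X_demo and sigma below are placeholders for illustration, not the assignment data.
In [ ]:
# Sketch only: Gaussian-kernel weights from a hypothetical feature matrix.
X_demo = np.random.RandomState(0).normal(size=(100, 5))  # placeholder features
sigma = 1.0                                               # placeholder kernel width
distances = spatial.distance.squareform(spatial.distance.pdist(X_demo))
W_demo = np.exp(-distances**2 / sigma**2)
np.fill_diagonal(W_demo, 0)  # no self-loops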
We construct below a PyGSP graph object with the edge weights from the last assignment.
In [ ]:
weights = sparse.load_npz('../data/fma_graph.npz')
G = graphs.Graph(weights, gtype='FMA genres')
del weights
With the PyGSP, compute the normalized graph Laplacian, defined as $$\mathbf{L} = \mathbf{I} - \mathbf{D}^{-1/2} \mathbf{W} \mathbf{D}^{-1/2},$$ where $\mathbf{D} \in \mathbb{R}^{N \times N}$ is the diagonal degree matrix and $\mathbf{I}$ is the identity matrix.
Hint: G.compute_laplacian().
In [ ]:
G.compute_laplacian('normalized')
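As an optional sanity check, here is a small sketch that builds the same normalized Laplacian directly from the weight matrix; it assumes there are no isolated vertices (otherwise the degrees would be zero).
In [ ]:
# Sketch: L = I - D^{-1/2} W D^{-1/2}, computed directly from the weights.
degrees = np.asarray(G.W.sum(axis=1)).squeeze()
D_inv_sqrt = sparse.diags(1 / np.sqrt(degrees))
L_manual = sparse.identity(G.N) - D_inv_sqrt @ G.W @ D_inv_sqrt
np.testing.assert_allclose(L_manual.toarray(), G.L.toarray(), atol=1e-8)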
In [ ]:
G.compute_fourier_basis(recompute=True)
The vector of eigenvalues is then available at G.e, and the Fourier basis, i.e., the eigenvectors, at G.U.
In [ ]:
plt.plot(G.e);
From the above plot, can you infer if the Fourier basis was computed from a combinatorial or normalized Laplacian?
Your answer here. The eigenvalues of a normalized Laplacian are upper-bounded by 2. So if the largest eigenvalue is larger than 2, we know for sure it was computed from a combinatorial Laplacian. If it is smaller than 2, it is probably a normalized Laplacian, but we cannot be sure, since a combinatorial Laplacian can also have all its eigenvalues below 2 (e.g., if the degrees are small).
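A quick check of that reasoning (a small sketch, not part of the requested answer) is to print the largest eigenvalue.
In [ ]:
# The largest eigenvalue of a normalized Laplacian is at most 2.
print('largest eigenvalue: {:.4f}'.format(G.e.max()))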
In [ ]:
# Embed the graph in 2D with the second and third eigenvectors (Laplacian eigenmaps).
G.set_coordinates(G.U[:, 1:3])
G.plot(vertex_size=50)
We can also visualize signals, like the genres.
In [ ]:
G.plot_signal(tracks, vertex_size=20)
Or the eigenvectors / Fourier modes.
Hint: G.plot_signal().
In [ ]:
G.plot_signal(G.U[:, 5], vertex_size=50)
We define below the low-pass filter $$\hat{g}(x) = \exp \left( \frac{-\tau x}{\lambda_{\text{max}}} \right).$$ Its frequency response is depicted below.
The vertical bars in the plot represent the eigenvalues of the graph Laplacian. While the filter is continuous, it is only evaluated at those eigenvalues when filtering a graph signal.
In [ ]:
f = filters.Heat(G, tau=10)
f.plot()
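To connect the plot to the formula, the sketch below checks that the evaluated kernel matches $\exp(-\tau x / \lambda_{\text{max}})$ at the graph's eigenvalues; it assumes the Heat filter's default (non-normalized) definition.
In [ ]:
# Sketch: the Heat kernel evaluated at the eigenvalues should match the formula.
tau = 10
response = np.exp(-tau * G.e / G.lmax)
np.testing.assert_allclose(f.evaluate(G.e).squeeze(), response, atol=1e-10)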
To observe what our kernel looks like in the node domain, we would like to localize it on node $i$. Given the Fourier basis $\mathbf{U}$, the filter kernel $\hat{g}(\lambda)$, and the diagonal matrix of eigenvalues $\mathbf\Lambda$, what is the expression of the localized kernel $T_i g \in \mathbb{R}^N$?
Your answer here. The signal is given by $$T_i g = \mathbf{U} \hat{g}(\mathbf{\Lambda}) \mathbf{U}^\intercal \delta_i.$$
Now localize it at node $i=300$ and observe the result.
Hints: f.evaluate(), f.filter().
In [ ]:
NODE = 300
# Localized kernel: U g(Lambda) U^T delta_NODE (U^T delta_NODE is the NODE-th row of U).
s = G.U @ np.diag(f.evaluate(G.e).squeeze()) @ G.U.T[:, NODE]
# Alternative:
# s = np.zeros(G.N)
# s[NODE] = 1
# s = G.U @ np.diag(f.evaluate(G.e).squeeze()) @ G.U.T @ s
# Alternative:
# s = np.zeros(G.N)
# s[NODE] = 1
# s = f.filter(s)
# Alternative:
# s = f.localize(NODE)
G.plot_signal(s, vertex_size=50, highlight=NODE)
Intuitively, should the plotted graph signal be smooth?
Your answer here. Yes: the filter is low-pass, so it attenuates the high-frequency content of the delta, and the energy of the resulting signal is concentrated in the low frequencies, which correspond to signals that vary smoothly over the graph.
Confirm your intuition by looking at the signal s in the Fourier domain. Compare with the graph Fourier transform (GFT) of a delta signal
$$\delta_i[j] =
\begin{cases}
1 & \text{if } j = i, \\
0 & \text{otherwise.}
\end{cases}$$
Hints: G.gft().
In [ ]:
# Compare the spectrum of the delta (energy spread across all frequencies)
# with that of the filtered signal (energy concentrated in the low frequencies).
s0 = np.zeros(G.N)
s0[NODE] = 1
plt.plot(G.e, G.gft(s0));
plt.plot(G.e, G.gft(s));
In this problem, we'll consider having the labels for some percentage of our $N=2000$ tracks, but missing the rest. Those of you who have done some Machine Learning (ML) will surely recognize a supervised learning problem here and will say: yeah, let's train a classifier on the training data and predict the missing labels! While they would not be wrong in doing so, they would not use the unlabeled data at all for training. The setup where the test cases are known a priori is called transductive learning. So let's exploit the structure of the data domain with a similarity graph!
Define $\mathbf{y} \in \mathbb{R}^N$ such that $$\mathbf{y}[i] = \begin{cases} 1 &\text{if the genre of track } i \text{ is Rock}, \\ -1 &\text{if the genre of track } i \text{ is Hip-Hop}, \\ 0 &\text{if the genre of track } i \text{ is unknown}, \\ \end{cases} $$
and the diagonal masking matrix $\mathbf{M} \in \mathbb{R}^{N \times N}$, which indicates the observations, such that
$$\mathbf{M}[i,i] = \begin{cases} 1 &\text{if the genre of track } i \text{ is known}, \\ 0 &\text{if the genre of track } i \text{ is unknown}. \\ \end{cases} $$
In [ ]:
# Ground truth.
x = np.ones(G.N)
x[tracks == 21] = -1
def prepare_observations(p):
    """Prepare observations, where p is the percentage of values to keep."""
    rs = np.random.RandomState(42)
    M = np.diag(rs.uniform(size=G.N) < p)
    return M.dot(x)
# Play with the percentage of observed values.
y = prepare_observations(p=0.1)
plt.hist(y);
The problem we then want to solve is $$\mathbf{x}^* = \operatorname*{arg\,min}_{\mathbf{x} \in \mathbb{R}^N} \|\mathbf{y} - \mathbf{Mx}\|_2^2 + \alpha \mathbf{x}^\intercal \mathbf{L} \mathbf{x},$$ where $\alpha$ is a hyper-parameter which controls the trade-off between the data fidelity term and the smoothness prior.
What is the solution of this problem?
Your answer here. The optimal solution (obtained by setting the derivative to zero) is given by: $$\begin{align} 2 \mathbf{M}^\intercal (\mathbf{Mx}^\ast - \mathbf{y}) + \alpha (\mathbf{L x}^\ast + \mathbf{L}^\intercal \mathbf{x}^\ast) &= 0 \\ \mathbf{Mx}^\ast - \mathbf{My} + \alpha \mathbf{L x}^\ast &= 0 \\ \mathbf{x}^\ast &= (\mathbf{M} + \alpha \mathbf{L})^{-1} \mathbf{y} \end{align}$$ where the second line uses $\mathbf{L}^\intercal = \mathbf{L}$, $\mathbf{M}^\intercal = \mathbf{M}$, and $\mathbf{M}^2 = \mathbf{M}$. Note that $\mathbf{My} = \mathbf{y}$ by definition of $\mathbf{y}$ and $\mathbf{M}$.
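To convince ourselves the closed form is correct, here is a tiny self-contained check on a hypothetical 4-node path graph (the matrices below are made up for illustration): the solution given by the formula should zero the gradient of the objective.
In [ ]:
# Hypothetical toy example: combinatorial Laplacian of a 4-node path graph.
L_toy = np.array([[ 1., -1.,  0.,  0.],
                  [-1.,  2., -1.,  0.],
                  [ 0., -1.,  2., -1.],
                  [ 0.,  0., -1.,  1.]])
M_toy = np.diag([1., 0., 1., 0.])      # nodes 0 and 2 are observed
y_toy = np.array([1., 0., -1., 0.])    # observations (0 where unobserved)
alpha_toy = 0.5
x_toy = np.linalg.solve(M_toy + alpha_toy * L_toy, y_toy)
# Gradient of ||y - Mx||^2 + alpha * x^T L x, which should vanish at the optimum.
grad = 2 * M_toy @ (M_toy @ x_toy - y_toy) + 2 * alpha_toy * L_toy @ x_toy
np.testing.assert_allclose(grad, 0, atol=1e-10)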
Implement it below.
Hints: np.linalg.solve().
In [ ]:
def solve(y, alpha):
    """
    Solve the optimization problem.

    Parameters:
        y: the observations
        alpha: the balance between fidelity and smoothness prior.

    Returns:
        x_pred: the predicted class
        x_star: the solution of the optimization problem
    """
    # Masking matrix: 1 on the diagonal where the genre is observed.
    M = np.diag(y != 0)
    # Reconstruct.
    x_star = np.linalg.solve(M + alpha * G.L, y)
    # Binarize.
    x_pred = np.ones(G.N)
    x_pred[x_star < 0] = -1
    return x_pred, x_star
x_pred, x_star = solve(y, alpha=1)
# Be sure that the prediction is binary.
np.testing.assert_equal(abs(x_pred), 1)
# Error rate.
err = np.count_nonzero(x - x_pred)
print('{} errors ({:.2%})'.format(err, err/G.N))
Let's visualize the various graph signals: x, y, x_star, and x_pred.
In [ ]:
G.plot_signal(x, vertex_size=20)
G.plot_signal(y, vertex_size=20)
G.plot_signal(x_star, vertex_size=20)
G.plot_signal(x_pred, vertex_size=20)
Compute the prediction accuracy as a function of $p \approx \frac{n}{N}$.
In [ ]:
alpha = 0.1
for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    y = prepare_observations(p)
    x_pred, _ = solve(y, alpha)
    err = np.count_nonzero(x - x_pred)
    print('{}: {:6.2%}'.format(p, err/G.N))
In this assignment, you hopefully gained some intuition about the graph Fourier transform (GFT) and got an idea of how we can leverage graphs to exploit the geometry of the data. Moreover, we saw that the PyGSP can considerably ease Graph Signal Processing (GSP)!
If you feel confident about predicting genres on the FMA, consider participating in our genre recognition challenge to label 35,000 tracks! I will surely have a semester or master project to offer to those who do well. :)