Problem set 3 (90 pts)

Important note: the template for your solution filename is Name_Surname_PS3.ipynb

For this problem set we do not run the bot, so try to debug your solutions with your own simple tests

Problem 1 (20 pts)

  • (5 pts) Prove that $\mathrm{vec}(AXB) = (B^\top \otimes A)\, \mathrm{vec}(X)$ if $\mathrm{vec}(X)$ is a columnwise reshape of a matrix into a long vector. What does it change if the reshape is rowwise?

Note:

  1. To make a columnwise reshape in Python one should use np.reshape(X, -1, order='F'), where 'F' stands for the Fortran (column-major) ordering.
  2. If $\mathrm{vec}(X)$ is a rowwise reshape, $$\mathrm{vec}(AXB)=(A \otimes B^\top)\, \mathrm{vec}(X).$$ Let $b_k$ be the $k^{th}$ column of $B$ and $x_i$ the $i^{th}$ column of $X$. Then $$ (AXB)_k = AXb_k = A \sum_i x_i b_{ik} = A \begin{bmatrix} x_1 & \dots & x_n \end{bmatrix} \begin{bmatrix} b_{1k} \\ b_{2k} \\ \vdots \\ b_{nk} \end{bmatrix} = \sum_i b_{ik}Ax_i = \begin{bmatrix} b_{1k}A & \dots & b_{nk}A \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \\ \vdots \\ x_{n} \end{bmatrix} = (b^\top_k \otimes A)\, \mathrm{vec}(X). $$ Stacking these rows over all $k$ gives $$ \mathrm{vec}(AXB) = (B^\top \otimes A)\, \mathrm{vec}(X), $$ q.e.d.
    If the reshape is rowwise (denote it $\mathrm{vec}_r$): $$ \mathrm{vec}_r (AXB) = \mathrm{vec}(B^\top X^\top A^\top) = (A \otimes B^\top)\, \mathrm{vec}(X^\top) = (A \otimes B^\top)\, \mathrm{vec}_r(X). $$
  • (2 pts) What is the complexity of a naive computation of $(A \otimes B) x$? Show how it can be reduced.
    Let $A \in \mathbb{R}^{n \times m}$, $B \in \mathbb{R}^{k \times p}$, $x \in \mathbb{R}^{mp \times 1}$, $X \in \mathbb{R}^{p \times m}$.
    The complexity of the naive product $(A \otimes B)x$ is $\mathcal{O}(nkmp) \sim \boxed{\mathcal{O}(N^4)}$.
    Let $x = \mathrm{vec}(X)$; then $$ (A \otimes B)\, \mathrm{vec}(X) = \mathrm{vec}(BXA^\top), $$ and the complexity of computing $\mathrm{vec}(BXA^\top)$ is $\mathcal{O}((p+n)km) \sim \boxed{\mathcal{O}(N^3)}$ (a small numerical check of this and of the identities above is sketched at the end of this problem).
  • (3 pts) Let matrices $A$ and $B$ have eigendecompositions $A = S_A\Lambda_A S_A^{-1}$ and $B = S_B\Lambda_B S^{-1}_B$. Find eigenvectors and eigenvalues of the matrix $A\otimes I + I \otimes B$.
    Let $W = S_A \otimes S_B$ and $W^{-1} = S_A^{-1} \otimes S_B^{-1}$, then: $$ W^{-1} (A \otimes I + I \otimes B) W = W^{-1} (A \otimes I) W + W^{-1} (I \otimes B) W = \\ = (S_A^{-1} \otimes S_B^{-1})(A \otimes I)(S_A \otimes S_B) + (S_A^{-1} \otimes S_B^{-1})(I \otimes B)(S_A \otimes S_B) = \\ = S_A^{-1} A S_A \otimes S^{-1}_B I S_B + S_A^{-1} I S_A \otimes S_B^{-1} B S_B = \\ = \Lambda_A \otimes I + I \otimes \Lambda_B = \begin{bmatrix} \lambda_1 I & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \lambda_n I \end{bmatrix} + \begin{bmatrix} \Lambda_B & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \Lambda_B \end{bmatrix}, \, \Lambda_B = \begin{bmatrix} \mu_1 & \dots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \dots & \mu_m \end{bmatrix} $$ Hence eigenvalues of $A \otimes I + I \otimes B$ are $\boxed{\lambda_i + \mu_j}$, $i = 1...n, j = 1...m$
    and the eigenvectors are the columns of $\boxed{S_A \otimes S_B}$.
  • (10 pts) Let $A = \mathrm{diag}\left(\frac{1}{1000},\frac{2}{1000},\dots \frac{999}{1000}, 1, 1000 \right)$. Estimate analytically the number of iterations required to solve a linear system with $A$ to relative accuracy $10^{-4}$ using
    • Richardson iteration with the optimal choice of parameter (use $2$-norm) $$ \|e_k\|_2 \leq q^k \|e_0\|_2, \qquad q = \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} = \frac{1000 - 10^{-3}}{1000 + 10^{-3}}. $$ With $x_0 = 0$ the relative error satisfies $\frac{\|x_k - x_* \|_2}{\|x_*\|_2} \leq q^k$, so it suffices that $$ q^k \leq 10^{-4} \;\Rightarrow\; k\, \mathrm{lg}\, q \leq -4 \;\Rightarrow\; k \geq \frac{-4}{\mathrm{lg}\, q} \approx \frac{4 \ln 10}{2\lambda_{\min}/\lambda_{\max}} \approx 4605170, $$ hence $\boxed{k \geq 4605170}$.
    • Chebyshev iteration (use $2$-norm)
$$ \|e_{k}\|_2 \leq C q^{k} \|e_0\|_2, \qquad q = \frac{\sqrt{\mathrm{cond}(A)} - 1}{\sqrt{\mathrm{cond}(A)} + 1} = \frac{999}{1001}. $$ Ignoring the constant $C$, it suffices that $q^k \leq 10^{-4}$, i.e. $$ k \geq \frac{-4}{\mathrm{lg}\, q} \approx \frac{4 \ln 10}{2/\sqrt{\mathrm{cond}(A)}} \approx 4605, \qquad \boxed{k \geq 4605}. $$
    • Conjugate gradient method (use $A$-norm).
      In Chapter 10 of Golub G.H., Van Loan C.F., Matrix Computations, it is shown that:
$$ \frac{\|x_k - x_* \|_A}{\| x_* \|_A} \leq 2 \left( \frac{\sqrt{\mathrm{cond}(A)} - 1}{\sqrt{\mathrm{cond}(A)} + 1} \right)^k. $$ Requiring the bound to be at most $10^{-4}$: $$ \mathrm{lg}\, 2 + k\, \mathrm{lg}\frac{999}{1001} \leq -4 \;\Rightarrow\; k \geq \frac{-4 - \mathrm{lg}\, 2}{\mathrm{lg}\frac{999}{1001}} \approx 4951, \qquad \boxed{k \geq 4951}. $$

But we can get a better analytical result from the sharper bound $$ \tau_k = \frac{\|x_k - x_* \|_A^2}{\| x_* \|^2_A} \leq \inf_{\deg(q) \leq k,\; q(0) = 1} \; \max_{i = 1,\dots,n} q(\lambda_i)^2. $$ Taking $q$ as the polynomial of degree $n$ with $q(0)=1$ whose roots are all the (distinct) eigenvalues of $A$ makes the right-hand side zero, so in exact arithmetic CG converges after at most as many iterations as $A$ has distinct eigenvalues, which is $1001$; hence $\boxed{k \leq 1001}$ iterations suffice.
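
A short numerical sanity check of the Kronecker identities above (a sketch; the small matrix sizes and the random seed are arbitrary choices made only for illustration):

In [ ]:
import numpy as np

rng = np.random.default_rng(0)
vec = lambda M: np.reshape(M, -1, order='F')   # columnwise (Fortran-order) reshape

# 1) vec(A X B) = (B^T kron A) vec(X)
A = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 2))
print(np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X)))   # expected: True

# 2) Fast matvec: (A kron B) vec(X) = vec(B X A^T), O((p + n) k m) instead of O(n k m p)
A = rng.standard_normal((4, 3))      # n x m
B = rng.standard_normal((6, 5))      # k x p
X = rng.standard_normal((5, 3))      # p x m
print(np.allclose(np.kron(A, B) @ vec(X), vec(B @ X @ A.T)))   # expected: True

# 3) Eigenvalues of A kron I + I kron B are all sums lambda_i(A) + mu_j(B)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((3, 3))
K = np.kron(A, np.eye(3)) + np.kron(np.eye(4), B)
sums = (np.linalg.eigvals(A)[:, None] + np.linalg.eigvals(B)[None, :]).ravel()
print(np.allclose(np.sort_complex(np.linalg.eigvals(K)), np.sort_complex(sums)))   # expected: True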

Problem 2 (40 pts)

Spectral graph partitioning and inverse iteration

Given a connected graph $G$ and its corresponding graph Laplacian matrix $L = D - A$ with eigenvalues $0=\lambda_1, \lambda_2, ..., \lambda_n$, where $D$ is its degree matrix and $A$ is its adjacency matrix, the Fiedler vector is an eigenvector corresponding to the second smallest eigenvalue $\lambda_2$ of $L$. The Fiedler vector can be used for graph partitioning: positive values correspond to one part of the graph and negative values to the other.

Inverse power method (15 pts)

To find the Fiedler vector we will use the inverse iteration with adaptive shifts (Rayleigh quotient iteration).

  • (5 pts) Write down the orthoprojection matrix on the space orthogonal to the eigenvector of $L$, corresponding to the eigenvalue $0$ and prove (analytically) that it is indeed an orthoprojection.
    First of all, it is known that the eigenvector corresponding to the eigenvalue $0$ is $e^\top = [ 1 \dots 1 ]$. Let us find the orthoprojection matrix onto the space orthogonal to $e$. The orthoprojection of an arbitrary vector $a$ onto that space is $$ a_{\bot} = a - a_{e} = I a - \frac{e e^\top}{e^\top e}\, a. $$ Hence the orthoprojection matrix is $$ \boxed{P = I - \frac{e e^\top}{e^\top e}}. $$ It is indeed an orthoprojection since $P^\top = P$ and $P^2 = P$ (the latter follows from $\frac{e e^\top}{e^\top e}\frac{e e^\top}{e^\top e} = \frac{e e^\top}{e^\top e}$). We can also check its action on an arbitrary vector $x^\top = [x_1 \dots x_n]$: $$ Px = x - \frac{1}{n} \begin{bmatrix} \sum_i x_i \\ \vdots \\ \sum_i x_i \\ \end{bmatrix} = x - \frac{(x, e)}{(e,e)}\, e, $$ i.e. $x$ minus its component along $e$; a small numerical check of these properties is sketched after the imports below.
  • (5 pts) Implement the spectral partitioning as the function partition:

In [7]:
import numpy as np
import scipy as sp
from scipy import sparse
from scipy.sparse import linalg
import networkx as nx
from networkx.linalg.algebraicconnectivity import fiedler_vector
import matplotlib.pyplot as plt
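
A quick numerical check that $P = I - \frac{ee^\top}{e^\top e}$ is indeed an orthoprojection, as claimed above (a sketch; the dimension n = 7 is an arbitrary small choice):

In [ ]:
# Check that P = I - e e^T / (e^T e) is symmetric, idempotent and annihilates e.
n = 7                                    # arbitrary small dimension
e = np.ones(n)
P = np.eye(n) - np.outer(e, e) / (e @ e)
print(np.allclose(P @ P, P),             # idempotent
      np.allclose(P, P.T),               # symmetric
      np.allclose(P @ e, 0))             # e lies in the kernel of P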

In [248]:
# INPUT:
# A - adjacency matrix (scipy.sparse.csr_matrix)
# shift - initial fixed shift (float)
# num_iter_fix - number of iterations with the fixed shift (int)
# num_iter_adapt - number of iterations with adaptive shift (int) -- Rayleigh quotient iteration steps
# x0 - initial guess (1D numpy.ndarray)
# eps - relative tolerance used in the stopping criterion (float)
# OUTPUT:
# x - normalized Fiedler vector (1D numpy.ndarray)
# eigs - eigenvalue estimations at each step (1D numpy.ndarray)
def operator_P(x):
    # Orthogonal projection onto the complement of span{e}, e = (1, ..., 1):
    # P x = x - e (e^T x) / (e^T e).
    return x - np.ones_like(x) * np.sum(x) / x.shape[0]

def is_small(eigs, eps):
    # Stopping criterion: relative change of the eigenvalue estimate is below eps.
    return abs(eigs[-2] - eigs[-1]) / abs(eigs[-1]) <= eps

def partition(A, shift, num_iter_fix, num_iter_adapt, x0, eps):
    x_k = x0 / np.linalg.norm(x0)
    eigs = np.array([0])
    L, I = nx.laplacian_matrix(nx.from_scipy_sparse_matrix(A)), sp.sparse.identity(x_k.shape[0])
    
    for _ in range(num_iter_fix):
        x_k = operator_P(x_k)
        x_k = sp.sparse.linalg.spsolve(L - shift * I, x_k)
        x_k /= np.linalg.norm(x_k, 2)
        eigs = np.append(eigs, np.dot(x_k, L.dot(x_k)))
        
        if is_small(eigs, eps):
            return x_k, eigs
        
    for _ in range(num_iter_adapt):
        x_k = operator_P(x_k)
        x_k = sp.sparse.linalg.spsolve(L - np.dot(x_k, L.dot(x_k)) * I, x_k)
        x_k /= np.linalg.norm(x_k, 2)
        eigs = np.append(eigs, np.dot(x_k, L.dot(x_k)))
        
        if is_small(eigs, eps):
            return x_k, eigs            
        
    return x_k, eigs

Algorithm must halt before num_iter_fix + num_iter_adapt iterations if the following condition is satisfied at some step $k$: $$ \boxed{|\lambda_k - \lambda_{k-1}| \,/\, |\lambda_k| \leq \varepsilon}. $$

Do not forget to use the orthogonal projection from above in the iterative process to get the correct eigenvector. It would also be natural to use shift=0 before the adaptive strategy is used. This, however, is not possible, since the matrix $L$ is singular and the sparse factorizations in scipy do not work in that case. Therefore, we first use a very small shift instead.

  • (3 pts) Generate a random lollipop_graph using networkx library and find its partition. Draw this graph with vertices colored according to the partition.

In [249]:
m, n = (10, 20)
G = nx.lollipop_graph(m, n)
A = nx.adjacency_matrix(G)
x0 = np.random.random((A.shape[0],)).astype(np.float64)

eigs_builtin = np.sort(np.linalg.eigh(nx.laplacian_matrix(G).todense())[0])
x, eigs = partition(A, 0.01, 10, 100, x0, 0.00001)
print("Are second smallest the same?", np.allclose(eigs_builtin[1], eigs[-1]))


Are second smallest the same? True

In [250]:
pos = nx.spring_layout(G)
plt.figure(figsize=(14,7))
nx.draw_networkx(G, pos, node_color=np.sign(x))


  • (2 pts) Start the method with a random initial guess x0, set num_iter_fix=0 and comment why the method can converge to a wrong eigenvalue.

In [251]:
x0 = np.random.random((A.shape[0],)).astype(np.float32)

eigs_builtin = np.sort(np.linalg.eigh(nx.laplacian_matrix(G).todense())[0])
x, eigs = partition(A, 1e-3, 0, 5, x0, 1e-5)
print("Are second smallest the same?", np.allclose(eigs_builtin[1], eigs[-2]))
print(eigs_builtin[1], eigs[-1])


Are second smallest the same? False
0.012734401114602275 1.180469665628412

The fixed-shift phase is important: without it the first adaptive shift is the Rayleigh quotient $x_0^\top L x_0$ of the random initial guess (this is exactly what the code uses as the shift), so the Rayleigh quotient iteration converges to whichever eigenvalue happens to be closest to that value, which is generally not $\lambda_2$.

Spectral graph properties (15 pts)

  • (5 pts) Prove that the multiplicity of the eigenvalue $0$ in the spectrum of the graph Laplacian is the number of its connected components.
  • (10 pts) The second-smallest eigenvalue of $L(G)$, $\lambda_2(L(G))$, is often called the algebraic connectivity of the graph $G$. A basic intuition behind the use of this term is that a graph with a higher algebraic connectivity typically has more edges, and can therefore be thought of as being “more connected”.
    To check this statement, create a few graphs with an equal number of vertices using networkx; one of them should be $C_{30}$, the simple cyclic graph, and one of them should be $K_{30}$, the complete graph. (You can also change the number of vertices if it makes sense for your experiments, but do not make it trivially small.) A possible sketch of such an experiment is given right after this list.
    • Find the algebraic connectivity of each graph using inverse iteration.
    • Plot the dependency $\lambda_2(G_i)$ on $|E_i|$.
    • Draw a partition for a chosen graph from the generated set.
    • Comment on the results.
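
A possible sketch of this experiment, reusing the partition function defined above. The particular graphs (path, cycle, two circulant graphs and the complete graph on 30 vertices) are an arbitrary illustrative choice; the first lines also illustrate the multiplicity claim from the previous item on a deliberately disconnected graph.

In [ ]:
# Multiplicity of the zero eigenvalue equals the number of connected components
# (illustrated on a disconnected union of three small graphs).
G_disc = nx.disjoint_union_all([nx.path_graph(4), nx.cycle_graph(5), nx.complete_graph(3)])
lam = np.sort(np.linalg.eigvalsh(nx.laplacian_matrix(G_disc).todense()))
print("number of (near-)zero eigenvalues:", np.sum(lam < 1e-10))   # expected: 3

# Algebraic connectivity vs number of edges for several 30-vertex graphs.
graphs = {
    'P_30 (path)':           nx.path_graph(30),
    'C_30 (cycle)':          nx.cycle_graph(30),
    'circulant(30, [1,2])':  nx.circulant_graph(30, [1, 2]),
    'circulant(30, [1..5])': nx.circulant_graph(30, [1, 2, 3, 4, 5]),
    'K_30 (complete)':       nx.complete_graph(30),
}

n_edges, lambda2 = [], []
for name, G_i in graphs.items():
    A_i = nx.adjacency_matrix(G_i)
    _, eigs_i = partition(A_i, 1e-2, 10, 100, np.random.random(A_i.shape[0]), 1e-6)
    n_edges.append(G_i.number_of_edges())
    lambda2.append(eigs_i[-1])
    print(f"{name}: |E| = {n_edges[-1]}, lambda_2 ~ {lambda2[-1]:.4f}")

plt.figure(figsize=(7, 5))
plt.plot(n_edges, lambda2, 'o-')
plt.xlabel('number of edges $|E|$')
plt.ylabel('algebraic connectivity $\\lambda_2$')
plt.grid()

# Partition of one graph from the set, drawn with the two parts in different colors.
G_show = graphs['circulant(30, [1,2])']
x_show, _ = partition(nx.adjacency_matrix(G_show), 1e-2, 10, 100,
                      np.random.random(30), 1e-6)
plt.figure(figsize=(7, 5))
nx.draw_networkx(G_show, nx.spring_layout(G_show), node_color=np.sign(x_show))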

Image bipartition (10 pts)

Let us deal here with a graph constructed from a binarized image. Consider the rule that graph vertices are only the pixels equal to $1$, and each vertex can have no more than $8$ connected vertices (its pixel neighbours), $\textit{i.e.}$ the graph degree is bounded by 8.

  • (3 pts) Find an image with minimal size equal to $(256, 256)$ and binarize it such that the graph built on black pixels has exactly $1$ connected component.
  • (5 pts) Write a function that constructs a sparse adjacency matrix from the binarized image, taking into account the rule above (a possible sketch is given after the image-loading cell below).
  • (2 pts) Find the partition of the resulting graph and draw the image in accordance with the partition.

In [265]:
from PIL import Image
import requests
url = "https://findicons.com/files/icons/2773/pictonic_free/256/lin_ubuntu.png"
img = np.array(Image.open(requests.get(url, stream=True).raw).convert('RGBA')).astype(np.uint8)[:,:,3]
img = 1.0 * (img < 100)
plt.imshow(img)


Out[265]:
<matplotlib.image.AxesImage at 0x7f2704704e80>
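
A possible sketch of the adjacency-matrix construction under the 8-neighbour rule, followed by a partition of the resulting pixel graph. It assumes the binarized image img from the cell above; the helper name image_to_adjacency and the parameter values passed to partition are illustrative, not part of the assignment template.

In [ ]:
def image_to_adjacency(img):
    # Vertices are the pixels equal to 1; edges connect 8-neighbouring nonzero pixels.
    idx = -np.ones(img.shape, dtype=int)
    ys, xs = np.nonzero(img)
    idx[ys, xs] = np.arange(len(ys))               # pixel -> vertex number
    h, w = img.shape
    rows, cols = [], []
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]
    for dy, dx in shifts:
        for y, x, v in zip(ys, xs, idx[ys, xs]):
            ny, nx_ = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx_ < w and idx[ny, nx_] >= 0:
                rows.append(v)
                cols.append(idx[ny, nx_])
    n = len(ys)
    return sparse.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

A_img = image_to_adjacency(img)
print(A_img.shape, A_img.nnz)

# Partition of the pixel graph, drawn back on the image grid (sign of the Fiedler vector).
x_img, _ = partition(A_img, 1e-3, 10, 100, np.random.random(A_img.shape[0]), 1e-6)
vis = np.full(img.shape, np.nan)
vis[np.nonzero(img)] = np.sign(x_img)
plt.figure(figsize=(7, 7))
plt.imshow(vis)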

Problem 3 (30 pts)

Say hi to the drone

You have received radar air-scan data of a terrorist hideout, made from a heavy-class surveillance drone. Unfortunately, it was made with an old-fashioned radar, so the picture is convolved with a diffraction pattern. You need to deconvolve the picture to recover the building plan.


In [170]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import hankel2
radiointel = np.load('radiointel.npy')
plt.subplot(1,2,1)
plt.imshow( np.abs(radiointel) )
plt.title('Intensity')
plt.subplot(1,2,2)
plt.imshow( np.angle(radiointel), cmap='bwr' )
plt.title('Phase')
plt.show()


In this problem you are asked to use an FFT-based matvec to build the convolution operator for a picture of size $N\times N$, where $N=300$, with the following kernel (2D Helmholtz scattering): $$ T_{\overline{i_1 j_1}, \overline{i_2 j_2} } \equiv eG_{i_1-i_2,\, j_1-j_2} = \frac{-1j}{4} H^{(2)}_0 \left( k_0 \cdot \Delta r_{ \overline{i_1 j_1}, \overline{i_2 j_2} } \right), \quad i_1,j_1, i_2, j_2 = 0,\dots, N-1, $$

except when both $i_1=i_2$ and $j_1 = j_2$; in that case set $$T_{i_1=i_2,\, j_1=j_2} = 0.$$

Here $1j$ is the imaginary unit and $H^{(2)}_0(x)$ is the (complex-valued) Hankel function of the second kind of order 0; see scipy.special.hankel2.

$$ \Delta r_{ \overline{i_1 j_1}, \overline{i_2 j_2} } = h \sqrt{ (i_1-i_2)^2 + (j_1-j_2)^2 } $$$$ h = \frac{1}{N-1}$$$$k_0 = 50.0$$

Tasks:

  1. (5 pts) Create the complex-valued kernel $eG$, a $(2N-1) \times (2N-1)$ matrix, according to the instructions above. Note that at the point where $\Delta r=0$ the value of $eG$ should be manually set to zero. Store it in the variable eG. Plot eG.real with plt.imshow.

  2. (5 pts) Write a function Gx that calculates the matvec of $T$ with a given vector $x$. Make sure all calculations and arrays use dtype=np.complex64. Hint: matvec with a delta function in pl

  3. (3 pts) What is the complexity of one matvec?

  4. (2 pts) Use scipy.sparse.linalg.LinearOperator to create an object that has attribute .dot() (this object will be further used in the iterative process). Note that .dot() input and output must be 1D vectors, so do not forget to use reshape.
  5. (15 pts) Write a function that takes an appropriate Krylov method(s) and solves the linear system $Gx=b$ to deconvolve radiointel. The result should be a binary mask array (real, integer, of 0s and 1s) of the plan of the building. Make sure the solver converged sufficiently and you did the post-processing properly. Plot the result as an image.

Note: you can use the standard fft and ifft from e.g. numpy.fft

1. Kernel (5 pts)


In [234]:
k0 = 50.0
N = 300

def make_eG(k0, N):
    # INPUT:  
    # k0 #dtype = float
    # N #dtype = int
    # OUTPUT:
    # np.array, shape = (2N-1, 2N-1), dtype = np.complex64
    # Squared integer offsets along each axis, from -(N-1) to N-1.
    row = np.square(np.linspace(-N + 1, N - 1, 2*N - 1))
    column = np.square(np.linspace(-N + 1, N - 1, 2*N - 1))
    row_m, col_m = np.meshgrid(row, column, indexing='ij')
    # Kernel eG[d1, d2] = -1j/4 * H0^(2)(k0 * h * sqrt(d1^2 + d2^2)), h = 1/(N-1).
    eG = -1j * hankel2(0, k0 / (N - 1) * np.sqrt(row_m + col_m)) / 4
    eG[N - 1, N - 1] = 0                      # zero out the singular point Delta r = 0
    # Roll so that the zero-offset element eG[0, 0] ends up at index (0, 0), as required
    # by the circular FFT convolution (equivalent to np.fft.ifftshift for this odd size).
    return np.roll(eG.astype(np.complex64), shift=-(N - 1), axis=(0, 1))

eG = make_eG(k0=k0, N=N)
plt.imshow(eG.real)


Out[234]:
<matplotlib.image.AxesImage at 0x7f068d0462b0>
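
A quick check of the kernel layout (a sketch): after the roll, the zero-offset element should sit at index (0, 0), and the kernel should be even in the offset, which can be probed via negative indexing.

In [ ]:
# The zero-offset element G[0, 0] (set to zero above) must be at index (0, 0),
# and eG must be even in the offset: eG[d1, d2] == eG[-d1, -d2].
print(eG[0, 0] == 0)                          # expected: True (the roll above places the zero offset at (0, 0))
print(np.allclose(eG[1, 2], eG[-1, -2]))      # expected: True (evenness via negative indexing)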

2. Matvec (5 pts)


In [235]:
def Gx(x, eG):
    # input:  
    # x, np.array, shape=(N**2, ), dtype = np.complex64
    # eG, np.array, shape=(2N-1, 2N-1), dtype = np.complex64
    # output:
    # matvec, np.array, shape = (N**2, ), dtype = np.complex64
    N = int(np.sqrt(x.shape[0]))
    # Zero-pad the N x N image to (2N-1) x (2N-1) and apply the circular convolution
    # with the kernel eG via FFT; the top-left N x N block is the required matvec.
    x_ready = np.pad(x.reshape((N, N)), ((0, N - 1), (0, N - 1)), 'constant')
    matvec = np.fft.ifft2(np.fft.fft2(eG) * np.fft.fft2(x_ready))[0:N, 0:N]
    return matvec.reshape(N**2).astype(np.complex64)

3. Complexity (3 pts)

The Big-O complexity of one matvec is $\mathcal{O}(N^2\log N)$: the dominant cost is the three 2D FFTs of size $(2N-1)\times(2N-1)$, each $\mathcal{O}(N^2\log N)$, while the elementwise product costs only $\mathcal{O}(N^2)$.

4. LinearOperator (2 pts)


In [236]:
from scipy.sparse import linalg
L_Gx = linalg.LinearOperator((N ** 2, N ** 2), matvec=lambda x, eG=eG: Gx(x, eG))
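
A quick sanity check of the operator (a sketch): applying $T$ to a delta function at pixel $(0,0)$ should return the kernel values $eG_{i,j}$ at non-negative offsets, which can be recomputed directly from the Hankel function.

In [ ]:
# T applied to a delta at pixel (0, 0) should reproduce the kernel column eG[i, j], i, j >= 0.
delta = np.zeros(N**2, dtype=np.complex64)
delta[0] = 1.0
col = L_Gx.dot(delta).reshape(N, N)

ii, jj = np.meshgrid(np.arange(N), np.arange(N), indexing='ij')
r = np.sqrt(ii**2 + jj**2)
r[0, 0] = 1.0                                  # placeholder to avoid evaluating the Hankel function at 0
ref = (-1j / 4) * hankel2(0, k0 / (N - 1) * r)
ref[0, 0] = 0.0
print(np.allclose(col, ref, atol=1e-4))        # expected: True up to single-precision FFT round-off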

5. Reconstruction (15 pts)


In [238]:
from scipy.sparse.linalg import gmres
maxiter = 200

def normalize(mask): #proper normalization to binary mask
    mask = np.clip(mask, a_min=0, a_max=1)
    mask = np.round(mask)
    mask = np.asarray(mask, dtype=int)
    return mask

errs=[]
def callback(err): #callback function to store the history of convergence
    global errs
    errs.append(err)
    return 

mask, _ = gmres(L_Gx, radiointel.reshape(N**2,1), maxiter=maxiter, callback = callback)

plt.figure(figsize=(7,7))
plt.imshow(normalize(mask.real).reshape((N, N)), cmap='binary')
plt.title('Restored building plan', size=16)
plt.colorbar()

plt.figure(figsize=(14,7))
plt.semilogy(errs)
plt.grid()
plt.title('Convergence', size=16)


Out[238]:
Text(0.5,1,'Convergence')
