Lecture 9: Structured matrices

Syllabus

Week 1: Python intro
Week 2: Matrices, vectors, norms, ranks
Week 3: Linear systems, eigenvectors, eigenvalues
Week 4: Singular value decomposition + test + homework seminar
Week 5: Sparse & structured matrices

Recap of the previous lecture

  • Sparse matrices, compressed sparse row format
  • Gaussian elimination and graphs

Today lecture

Today we will talk about structured matrices.

  • Shift-invariant structure: Toeplitz matrices
  • Circulant matrices and FFT
  • What is an inverse of a Toeplitz matrix?
  • Displacement rank (and Gohberg-Semencul formula)
  • Multilevel matrices

In many applications, the linear transformation is described by a shift-invariant integral operator, also called convolution: $$ f(x) = \int_{\Pi} K(x -y) u(y) dy, $$ where $\Pi$ is a square in $\mathbb{R}^d$.

Example applications:

  • Image processing (blurring), $K(x) = \exp(-\sigma\Vert x \Vert^2)$.
  • Filters in image processing/signal processing
  • Integral equations, $K(x) = \frac{1}{\Vert x \Vert}$ - electrostatics, $K(x)=\exp(i k \Vert x \Vert)/(\Vert x \Vert)$.
  • Autoregressive processes ($y(t) = \sum_k c_k y(t - t_k)$)

After the discretization on a uniform grid, we get discrete convolution:
$$ y_i = \sum_{j} t_{i - j} x_j, $$ and the matrix $T_{ij} = t_{i - j}, i, j = 1, \ldots, n$ is called a Toeplitz matrix: on each diagonal we have the same value.

For $4 \times 4$ case it looks as follows: $$ T = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{-3} \\ t_1 & t_0 & t_{-1} & t_{-2} \\ t_2 & t_1 & t_0 & t_{-1} \\ t_3 & t_2 & t_1 & t_0 \end{pmatrix}, $$ and the matrix is completely defined by 2n-1 parameters (first column, first row minus the first element).

Now, let us do a short demo on how the Toeplitz matrices can be used.

Consider a one-dimensional signal


In [1]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
n = 128
u = np.zeros(n)
u[9] = 0.25
u[10] = 0.5
u[11] = 0.25

u[70] = 1
plt.plot(u)


Out[1]:
[<matplotlib.lines.Line2D at 0x10e4552d0>]

Do some blurring of the signal


In [7]:
c = 1
T = [[np.exp(-0.1 * (i - j) ** 2) for i in xrange(n)] for j in xrange(n)]
T = c * np.array(T)
y = T.dot(u)
p = np.arange(n)
plt.plot(p, y, label='blurred')
plt.plot(p, u, label='original')
plt.legend(loc='best')


Out[7]:
<matplotlib.legend.Legend at 0x10eed7b10>

Now add some noise


In [9]:
e = np.random.randn(n)
w = y + 1e-3 * e
plt.plot(w)
plt.plot(y)


Out[9]:
[<matplotlib.lines.Line2D at 0x10eddb210>]

Now let us do the "deblurring" (we assume, that we know the matrix T).


In [10]:
urec = np.linalg.solve(T, w)
plt.plot(urec)


Out[10]:
[<matplotlib.lines.Line2D at 0x10f46cd10>]

It fails completely! What to do? Use SVD approximation to the matrix T.


In [16]:
u1, s1, v1 = np.linalg.svd(T)
r = 70
u1 = u1[:, :r]
s1 = s1[:r]
v1 = v1[:r, :]
urec_svd = v1.T.dot(u1.T.dot(w) / s1)
plt.plot(urec_svd)
plt.plot(u)


Out[16]:
[<matplotlib.lines.Line2D at 0x110293b90>]

Questions

  1. What is the complexity of taking Toeplitz matrix-by-vector product?
  2. What is the complexity of solving linear systems with Toeplitz matrices?

Matrix-by-vector product

A Toeplitz matrix is defined by $2n-1$ parameters, so we migh expect a $\mathcal{O}(n \log n)$ algorithm.
How we can do that?

Convolutions on the continious level

There is a well-known result that convolution is a product in the Fourier domain:
$$f = g * h, \quad \mbox{or } f(x) = \int g(x) h(x - y) dy,$$
$$\widehat{f} = \widehat{g} \widehat{h},$$ where $$ \widehat{f}(k) = \int e^{i k x} f(x) dx. $$ is the Fourier transform of a function.

Thus, the algorithm can look as follows:

  1. Compute Fourier transforms $\widehat{g}, \widehat{h}$.
  2. Multiply them $\widehat{f} = \widehat{g} \widehat{h}$.
  3. Compute inverse Fourier transform $$ f(x) = \int e^{-ik x} \widehat{f}(k) dk. $$

But we have to work on the discrete level, so we might replace the continious Fourier transform by the discrete Fourier transform (DFT).

Discrete Fourier transform

The DFT matrix is defined as $$ F_{kl} = w^{kl}, \quad k, l = 0, \quad n-1, $$ and $w = \exp(\frac{2 \pi i}{n})$ is the root of unity.
It is a discretization of the integral Fourier transform.

But a Toeplitz matrix is defined by $(2n-1)$ parameters!
To apply Fast Fourier Transform (FFT) we have to consider a subclass of Toeplitz matrices:
circulant matrices.

Circulant matrices

A circulant matrix is a Toeplitz matrix with
$$ t_{n - i} = t_i. $$ Example: $$ C = \begin{pmatrix} c_0 & c_3 & c_2 & c_1 \\ c_1 & c_0 & c_3 & c_2 \\ c_2 & c_1 & c_0 & c_3 \\ c_3 & c_2 & c_1 & c_0 \end{pmatrix}. $$ A circulant matrix is uniquely defined by its first column and has $n$ independent parameters.

Spectral theorem

Any circulant matrix $C$ is diagonalized by the Fourier matrix F.
$$ C = F^* \Lambda F, $$ and $\Lambda_{ii} = \lambda_i$ is the diagonal matrix and
$$ \lambda = Fc, $$ where $c$ is the first column of the circulant matrix.

Proof

Fourier matrix is unitary, $FF^* = I$. Thus, it is sufficient to show that the rows of the Fourier matrix are eigenvectors of the circulant matrix. And this is obvious from the definition. The eigenvalues also follow immediately.

Fast circulant matrix-by-vector product

Given the spectral theorem, it is easy to provide a $\mathcal{O}(n \log n)$ algorithm for circulant matrix-by-vector product via FFT.

Indeed, $$ y = Cx = F^* \Lambda F x = F^* \Big( \mathrm{diag}(Fc) \Big) Fx, $$ so you see a direct analogue with the continious case:

  1. Compute two direct FFTs
  2. Compute their elementwise product (diagonal matrix by vector)
  3. Compute one inverse FFT

A quick demo on the spectral theorem


In [34]:
import numpy as np
import scipy.linalg
n = 50
b = np.random.randn(n)
mm = scipy.linalg.circulant(b)
mm1 = np.fft.ifft(mm, axis = 0)
mm1 = np.fft.fft(mm1, axis = 1) 
#mm1 = np.fft.fft2(mm)
print np.linalg.norm(mm1 - np.diag(np.diag(mm1)))
w1 = np.linalg.eigvals(mm) 
w2 = np.fft.ifft(b) * n
plt.plot(w1.real, w1.imag, ls='', marker='x')
plt.plot(w2.real, w2.imag, ls='', marker='.')


1.04643809636e-14
Out[34]:
[<matplotlib.lines.Line2D at 0x111bd3dd0>]

Toeplitz matrix-by-vector product

Fine, but how it helps is with the Toeplitz matrices.
The eigenvectors of the Toeplitz matrices are not the Fourier vectors (although they are asymptotically similar). The idea is to embed a Toeplitz matrix into larger circulant matrix.

$$ T = \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{-3} \\ t_1 & t_0 & t_{-1} & t_{-2} \\ t_2 & t_1 & t_0 & t_{-1} \\ t_3 & t_2 & t_1 & t_0 \end{pmatrix} \rightarrow \begin{pmatrix} t_0 & t_{-1} & t_{-2} & t_{-3} & t_3& t_2 & t_1 \\ t_1 & t_0 & t_{-1} & t_{-2} & t_{-3} & t_3 & t_2\\ t_2 & t_1 & t_0 & t_{-1} & t_{-2} & t_{-3} & t_3\\ t_3 & t_2 & t_1 & t_0 & t_{-1} & t_{-2} & t_{-3} \\ t_{-3} & t_3 & t_2 & t_1 & t_0 & t_{-1} & t_{-2}\\ t_{-2} & t_{-3} & t_3 & t_2 & t_1 & t_0 & t_{-1} \\ t_{-1} & t_{-2} & t_{-3} & t_3 & t_2 & t_1 & t_0 \\ \end{pmatrix} $$

or $$ C = \begin{bmatrix} T & * \

     * & * 
    \end{bmatrix},
$$ i.e. $T$ is a submatrix of a circulant matrix, and $$
 C \begin{bmatrix}
     q \\
     0
   \end{bmatrix} = \begin{bmatrix}
   T q \\
   *
    \end{bmatrix}

$$


In [37]:
n = 512
a = np.random.randn(2*n)
b = np.random.randn(2*n-1)
%timeit np.fft.fft(a)
%timeit np.fft.fft(b)


100000 loops, best of 3: 12.5 µs per loop
10000 loops, best of 3: 26.4 µs per loop

Matrix-by-vector product by a Toeplitz matrix

Final algorithm:

  1. $c = [t_i; t_{n-i}]$, $x = [x; 0]$
  2. Compute circulant matrix-by-vector product
  3. Take the part of the solution

This is an $\mathcal{O}(n \log n)$ algorithm.

Multidimensional convolution

In two dimensions (one of the most important cases) the discrete convolution is defined as $$ y_{ii'} = \sum_{jj'} T_{(i-i')(j-j')} x_{jj'}, $$ which leads to block Toeplitz matrices: block Toeplitz with Toeplitz blocks (BTTB). (draw a picture on the board).

Block case

The situation is the same: embed into two-level circulant matrix (of size $[(2n-1) \times (2n-1)]^2$), do two-dimensional FFT, multiply and do inverse two-dimensional FFT.

A two-dimensional FFT is simple: it it just Kronecker product of two one-dimensional FFT. $$ F_2 = F \otimes F. $$ In practice, it means "apply FFT to columns, and apply FFT to rows".

Inversion of Toeplitz matrices

Inversion of the circulant matrix is trivial: $$C = F D F^* \rightarrow C^{-1} = F D^{-1} F^*,$$ and the inverse of a circulant matrix is a circulant matrix.
Inversion of Toeplitz matrices is notoriously difficult.

Inverse of Toeplitz matrix is not a Toeplitz matrix!


In [41]:
import numpy as np
import scipy
from scipy.linalg import toeplitz
n = 6
c = np.arange(n)
t = scipy.linalg.toeplitz(c**3)
print t
ii = np.linalg.inv(t)
print 'Inverse:'
print ii


[[  0   1   8  27  64 125]
 [  1   0   1   8  27  64]
 [  8   1   0   1   8  27]
 [ 27   8   1   0   1   8]
 [ 64  27   8   1   0   1]
 [125  64  27   8   1   0]]
Inverse:
[[ 0.04638091 -0.16018559  0.18135219 -0.06522316  0.07954043 -0.02896156]
 [-0.16018559  0.58152894 -0.73685373  0.36588599 -0.22669023  0.07954043]
 [ 0.18135219 -0.73685373  1.15422006 -0.88002651  0.36588599 -0.06522316]
 [-0.06522316  0.36588599 -0.88002651  1.15422006 -0.73685373  0.18135219]
 [ 0.07954043 -0.22669023  0.36588599 -0.73685373  0.58152894 -0.16018559]
 [-0.02896156  0.07954043 -0.06522316  0.18135219 -0.16018559  0.04638091]]

Displacement rank

It is important to have a structure that defines the inverse of a Toeplitz matrices in $\mathcal{O}(n \log n)$ parameters (otherwise we will have to store $\mathcal{O}(n^2)$ parameters).
The key idea here is the displacement rank concept.

Introduce a shift matrix $$ Z_f = \begin{bmatrix} 0 & 0 & 0 & f \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ \end{bmatrix}. $$

For any Toeplitz matrix $T$ it holds that the matrix $$ M = L(T) = Z_f T - T Z_e $$ has rank 2. This is the definition of displacement rank: for a fixed pair $Z_f$, $Z_e$, $L(T)$ has rank at most $r$.

Displacement rank is preserved by inversion!

Theorem If a matrix has displacement rank $r$ for a pair of generators $Z_f, Z_e$, then the inverse also has displacement rank for anot higher then $r$ for a pair of generators $Z^{\top}_e, Z^{\top}_f$.

Proof

$$Z_f T - T Z_e = UV^{\top},$$

therefore $$ T^{-1} Z_f - Z_e T^{-1} = T^{-1} U V^{\top} T^{-1}, $$ and the right-hand side has rank at most $r$. Moreover, if the right-hand side is given, it is easy to recover the elements of the matrix, or even compute matrix-by-vector products.
Further reading is a paper by Victor Pan

This is not an inversion algorithm, it is only the structure of the inverse.
Gohberg-Semencul algorithm was the first work on the subject and presented an $\mathcal{O}(n^2)$ algorithm.
Superfast algorithms ($\mathcal{O}(n \log n)$) are typically quite complicated, not stable and require the recursion.

Cauchy matrices

A matrix is called a Cauchy-like matrix (another example of a structured matrix), if
$$ a_{ij} = \frac{b_{ij}}{(x_i - y_j)}, $$ where matrix $B$ is of low rank. A Cauchy matrix corresponds to the case when $b_{ij} = 1$.

Displacement structure of the Cauchy matrix

Displacement structure of the Cauchy-like matrix is simple:
$$ x_i a_{ij} - a_{ij} y_j = b_{ij} (x_i - y_j), $$ i.e. $L(A) = D_1 A - A D_2$.
We can transform generators from one case to another. If we diagonalize $Z_e, Z_f$, we get the diagonal matrices,
i.e. by FFT we can reduce the Toeplitz matrix to a Cauchy-like matrix.

Good news about the Cauchy-like matrix is that there is an explicit inverse formula, $$ a_{ij} = \frac{b_{ij}}{(x_i - y_j)}, $$ which also allows for pivoting
(permuting rows and columns of a Cauchy-like matrix leaves it a Cauchy-like matrix).

FFT of a Toeplitz matrices and low-rank

The transformation between the Cauchy-like matrices and Toeplitz matrices is just a Fourier transform:
$$ P = F T F^*. $$ For a circulant matrix we will get the diagonal matrix.
But what we will get for a general Toeplitz matrix?


In [42]:
import numpy as np
import scipy.sparse
n = 100
a = [[1.0 / (i - j + 0.5) for i in xrange(n)] for j in xrange(n)]
a = np.array(a)
#ex = np.ones(n);
#a = scipy.sparse.spdiags(np.vstack((ex,  -2*ex, ex)), [-1, 0, 1], n, n, 'csr'); 
#a = a.todense()
mm1 = np.fft.ifft(a, axis=0)
mm1 = np.fft.fft(mm1, axis=1)

In [43]:
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
#fig, ax = ppl.subplots(111, projection='3d')
t = np.arange(n)
x, y = np.meshgrid(t, t)
ax.plot_surface(x, y, mm1.real, rstride=10, cstride=10)
ax.autoscale_view(tight=True)
ax.view_init(elev=10., azim=40)
fig.tight_layout()
#X, Y, Z = axes3d.get_test_data(0.05)
#ax.plot_wireframe(X, Y, Z, rstride=10, cstride=10)


The singularity is along the diagonal!
Off the diagonal we have nice and low-rank blocks!


In [44]:
b11 = mm1[n/2:, :][:, :n/2]
s = np.linalg.svd(b11)[1]
plt.semilogy(s)


Out[44]:
[<matplotlib.lines.Line2D at 0x112eb9e50>]

Block-low rank matrices

In fact there is more: every block outside the main diagonal has low rank. This class is called quasiseparable matrices. There exist exact $\mathcal{O}(n)$ algorithms for their inversion (and the inverse of a quasiseparable matrix). Thus, the algorithm:

  1. Compute quasiseparable representation: here you can do it in a fast way!
  2. Compute the inverse

A note on the block case

For multilevel Toeplitz matrices there is not a lot known about the inversion.
The best known algorithms have complexity $\mathcal{O}(N^{3/2})$. There is no explicit representation of the inverse.
However, there are a lot of possibilities for the iterative algorithms, since matrix-by-vector product is fast.

Take home message

  • Toeplitz matrices
  • Circulant matrices and FFT
  • Displacement structure
  • Low-rank blocks in the Fourier transform (and quasiseparable matrices)
Questions?

In [ ]:
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/custom.css", "r").read()
    return HTML(styles)
css_styling()


Out[ ]: