Lecture 3: Matrix norms, scalar products, unitary matrices

Recap of the previous lecture

Matrix multiplication is the core of NLA
This is all about computer memory hierarchy
Concept of block algorithms
(Advanced topic) Strassen and trilinear

Matrices and norms

Recall vector norms that allow to evaluate distance between two vectors or how large are elements of a vector.

How to generalize this concept to matrices?

A trivial answer is that there is no big difference between matrices and vectors, and here comes the simplest matrix norm - Frobenius norm: $$ \Vert A \Vert_F = \Big(\sum_{i=1}^n \sum_{j=1}^m |a_{ij}|^2\Big)^{1/2} $$

Matrix norms

$\Vert \cdot \Vert$ is called a matrix norm if it is a vector norm on the linear space of $n \times m$ matrices:

$\|A\| \geq 0$
$\|A\| = 0$ iff $A = O$
$\|\alpha A\| = |\alpha| \|A\|$
$\|A+B\| \leq \|A\| + \|B\|$

Additionally some matrix norms can satisfy the submultiplicative property

$\Vert A B \Vert \leq \Vert A \Vert \Vert B \Vert$

The submultiplicative property is needed in many places, for example in the estimates for the error of solution of linear systems (we will cover this subject later).

Example of a non-submultiplicative norm: Chebyshev norm $\|A\|_C = \displaystyle{\max_{i,j}}\, |a_{ij}|$

Operator norms

The most important class of the norms is the class of operator norms. Mathematically, they are defined as

$$ \Vert A \Vert_{*,**} = \sup_{x \ne 0} \frac{\Vert A x \Vert_*}{\Vert x \Vert_{**}}, $$

where $\Vert \cdot \Vert_*$ and $\| \cdot \|_{**}$ are vector norms.

It is easy to check that operator norms are submultiplicative.

Frobenius norm is not an operator norm, i.e. you can not find $\Vert \cdot \Vert_*$ and $\| \cdot \|_{**}$ that induce it. A nontrivial fact.

Matrix $p$-norms

Important case of operator norms are matrix $p$-norms, which are defined for $\|\cdot\|_* = \|\cdot\|_{**} = \|\cdot\|_p$.
Among all $p$-norms three norms are the most common ones:

$p = 2, \quad$ spectral norm, denoted by $\Vert A \Vert_2$.
$p = \infty, \quad \Vert A \Vert_{\infty} = \max_i \sum_j |a_{ij}|$.
$p = 1, \quad \Vert A \Vert_{1} = \max_j \sum_i |a_{ij}|$.

Let us check it for $p=\infty$ on a blackboard.

Spectral norm

Spectral norm, $\Vert A \Vert_2$ is one of the most used matrix norms (along with the Frobenius norm). It can not be computed directly from the entries using a simple formula, like the Euclidean norm, however, there are efficient algorithms to compute it. It is directly related to the singular value decomposition (SVD) of the matrix. It holds

$$ \Vert A \Vert_2 = \sigma_1(A) = \sqrt{\lambda_\max(A^*A)} $$

where $\sigma_1(A)$ is the largest singular value of the matrix $A$ and $^*$ is a conjugate transpose of the matrix. We will soon learn all about the SVD. Meanwhile, we can already compute the norm in Python.



In [1]:

    
import numpy as np
n = 100
a = np.random.randn(n, n) #Random n x n matrix
s1 = np.linalg.norm(a, 2) #Spectral
s2 = np.linalg.norm(a, 'fro') #Frobenius
s3 = np.linalg.norm(a, 1) #1-norm
s4 = np.linalg.norm(a, np.inf) #It was trick to find the infinity
print 'Spectral:', s1, '\nFrobenius:', s2, '\n1-norm', s3, '\ninfinity', s4









    



Spectral: 19.4348433329 
Frobenius: 99.4981700135 
1-norm 94.6994776999 
infinity 92.6149263332

Examples

Several examples where matrix norms arise:

$\displaystyle{\min_{\mathrm{rank}(A_r)=r}}\| A - A_r\|$ - finding best rank-r approximation. SVD helps to solve this problem for $\|\cdot\|_2$ and $\|\cdot\|_F$.

$\displaystyle{\min_B}\| P_\Omega \odot(A - B)\| + \mathrm{rank}(B)$ - matrix completion. $(P_\Omega)_{ij} = 1$ if $i,j\in\Omega$ and $0$ otherwise. $\odot$ denotes Hadamard product (elementwise)

$\displaystyle{\min_{B,C\geq 0}} \|A - BC\|$, nonnegative matrix factorization. Symbol $B\geq0$ here means that all elements of $B$ are nonnegative.

Scalar product

While norm is a measure of distance, the scalar product takes angle into account.

The scalar product is defined as $$ (x, y) = x^* y = \sum_{i=1}^n \overline{x}_i y_i , $$ where $\overline{x}$ denotes the complex conjugate of $x$. The Euclidean norm is then

$$ \Vert x \Vert_2 = (x, x)^{1/2}, $$

or it is said the the norm is induced by scalar product.

Remark. The angle between two vectors is defined as $$ \cos \phi = \frac{(x, y)}{\Vert x \Vert_2 \Vert y \Vert_2} $$

An important property of the scalar product is the Cauchy-Schwarz-Bunyakovski inequality: $$ |(x, y)| \leq \Vert x \Vert_2 \Vert y \Vert_2, $$ and thus the angle between two vectors is defined properly.

Matrices preserving the norm

For stability it is really important that the error does not grow after we apply some transformations. Suppose that you approximately get a vector,

$$ \Vert x - \widehat{x} \Vert \leq \varepsilon. $$

Let final result be (some) linear transformation of $x$:
$$ y = U x, \quad \widehat{y} = U \widehat{x}. $$ If we want to estimate a difference between $\widehat{y}$ and $y$:

$$ \Vert y - \widehat{y} \Vert = \Vert U ( x - \widehat{x}) \Vert \stackrel{\text{?}}{\leq} \varepsilon $$

The question is for which kind of matrices the norm of the vector will not change.

For the euclidean norm $\|\cdot\|_2$ the answer is unitary (or orthogonal in the real case) matrices.

Unitary (orthogonal) matrices

Let $U$ be complex $n \times r$ matrix, and $\Vert U z \Vert_2 = \Vert z \Vert_2$ for all $z$. This can happen if and only if

$$ U^* U = I_r, $$

where $I_r$ is an identity matrix $r\times r$.

Indeed, $$\Vert Uz \Vert_2^2 = (Uz, Uz) = (Uz)^* Uz = z^* (U^* U) z = z^* z = \|z\|_2^2,$$

which can only hold if $U^* U = I_r$.

Complex $n\times n$ square matrix is called unitary if $$ U^*U = UU^* = I_n, $$ which means that columns and rows of unitary matrices both form orthonormal basis in $\mathbb{C}^{n}$.

For rectangular matrices of size $m\times n$ ($n\not= m$) only one of the equalities can hold

$$ U^*U = I_n \text{ - left unitary for $m>n$} \quad \text{or} \quad UU^* = I_m \text{ - right unitary for $m<n$}. $$

In case of real matrices $U^* = U^T$ and matrices $$ U^TU = UU^T = I $$ are called orthogonal.

Unitary matrices

Important property: a product of two unitary matrices is a unitary matrix:

$$(UV)^* UV = V^* (U^* U) V = V^* V = I,$$

Later we will show that there are types of matrices (Householder reflections and Givens rotations) composition of which is able to produce arbitrary unitary matrix.
This idea is a core of some algorithms (e.g. QR decomposition).

Examples of unitary matrices

There are two important classes of unitary matrices, using composition of which we can make any unitary matrix:

Householder matrices
Givens (Jacobi) matrices

Other important examples are

Permutation matrix $P$ whose rows (columns) are permutation of rows (columns) of the identity matrix.
Fourier matrix $F_n = \frac{1}{\sqrt{n}} \{ e^{-i\frac{2\pi kl}{n}}\}_{k,l=0}^{n-1}$

Householder matrices

Householder matrix is the matrix of the form

$$H \equiv H(v) = I - 2 vv^*,$$

where $v$ is an $n \times 1$ column and $v^* v = 1$. Can you show that $H$ is unitary and Hermitian ($H^* = H$)? It is also a reflection:

$$ Hx = x - 2(v^* x) v$$

A simple proof: $H^* H = (I - 2 vv^*)(I - 2 v v^*) = I$

Important property of Householder reflection

A nice property of Housholder transformation is that it can zero all elements of a vector except for the first one: $$ H \begin{bmatrix} \times \\ \times \\ \times \\ \times \end{bmatrix} = \begin{bmatrix} \times \\ 0 \\ 0 \\ 0 \end{bmatrix}. $$

Proof. Let $e_1 = (1,0,\dots, 0)^T$, then we want to find $v$ such that

$$ H x = x - 2(v^* x) v = \alpha e_1, $$

where $\alpha$ is unknown constant. Multiplying by $x^*$ we get

$$ x^* x - 2 (v^* x) x^* v = \alpha x_1; $$$$ \|x\|_2^2 - 2 (v^* x)^2 = \alpha x_1 $$$$ (v^* x)^2 = \frac{\|x\|_2^2 - \alpha x_1}{2} $$

Since $v^*v =1$, $v = \dfrac{x-\alpha e_1}{2 v^* x}$ and $(v^* x)^2 = \dfrac{\|x\|_2^2 - \alpha x_1}{2}$ we obtain

$$ 1 = v^* v = \dfrac{x^*x - 2\alpha x_1 + \alpha^2}{4 (v^* x)^2} = \dfrac{\|x\|_2^2 - 2\alpha x_1 + \alpha^2}{2(\|x\|_2^2 - \alpha x_1)}; $$$$ \|x\|_2^2 - 2\alpha x_1 + \alpha^2 = 2\|x\|_2^2 - 2\alpha x_1; $$$$ \alpha^2 = \|x\|_2^2; $$$$ \alpha = \pm \|x\|_2. $$

So, $v$ exists and equals $$ v = \dfrac{x \pm \|x\|_2 e_1}{2v^* x} = \dfrac{x \pm \|x\|_2 e_1}{\pm\sqrt{2(\|x\|_2^2 \mp \|x\|_2 x_1)}} $$

Housholder algorithm for QR decomposition

Using the obtained property we can make arbitrary matrix $A$ lower triangular: $$ H_2 H_1 A = \begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & \boldsymbol{\times} & \times\\ 0 &0 & \boldsymbol{\times} & \times \\ 0 &0 & \boldsymbol{\times} & \times \end{bmatrix} $$ then finding $H_3=\text{diag}(I_2, \tilde H_3)$ such that $$ \tilde H_3 \begin{bmatrix} \boldsymbol{\times}\\ \boldsymbol{\times} \\ \boldsymbol{\times} \end{bmatrix} = \begin{bmatrix} \times \\ 0 \\ 0 \end{bmatrix}. $$ we get $$ H_3 H_2 H_1 A = \begin{bmatrix} \times & \times & \times & \times \\ 0 & \times & \times & \times \\ 0 & 0 & {\times} & \times\\ 0 &0 & 0 & \times \\ 0 &0 & 0 & \times \end{bmatrix} $$ Finding $H_4$ by analogy we arrive at upper-triangular matrix.

Corollary: (QR decomposition) Every $A\in \mathbb{C}^{n\times m}$, can be represented as $$ A = QR, $$ where $Q$ is unitary and $R$ is upper triangular.

Givens (Jacobi) matrix

A Givens matrix is a matrix

$$ G = \begin{bmatrix} \cos \alpha & -\sin \alpha \\ \sin \alpha & \cos \alpha \end{bmatrix}, $$

which is a rotation. For a general case, we select two $(i, j)$ planes and rotate vector $x$
$$ x' = G x, $$

only in $i$-th and $j$-th positions:

$$ x'_i = x_i\cos \alpha - x_j\sin \alpha , \quad x'_j = x_i \sin \alpha + x_j\cos\alpha, $$

with all other $x_i$ remain unchanged. Therefore, we can make elements in the $j$-th position zero by choosing $\alpha$ such that $$ \cos \alpha = \frac{x_i}{\sqrt{x_i^2 + x_j^2}}, \quad \sin \alpha = -\frac{x_j}{\sqrt{x_i^2 + x_j^2}} $$

QR via Givens rotations

Similarly we can make matrix upper-triagular using Givens rotations:

$$ \begin{bmatrix} \times & \times & \times \\ \bf{*} & \times & \times \\ \bf{*} & \times & \times \end{bmatrix} \to \begin{bmatrix} * & \times & \times \\ * & \times & \times \\ 0 & \times & \times \end{bmatrix} \to \begin{bmatrix} \times & \times & \times \\ 0 & * & \times \\ 0 & * & \times \end{bmatrix} \to \begin{bmatrix} \times & \times & \times \\ 0 & \times & \times \\ 0 & 0 & \times \end{bmatrix} $$

Givens vs. Housholder

Housholder is useful for dense matrices (complexity is $\approx$ twice smaller than for Jacobi) and we need to zero large number of elements.
Givens rotations are more suitable for sparse matrice or parallel machines as it acts locally on elements.

Summary

Unitary matrices preserve the norm
There are two "basic" classes of unitary matrices, Householder and Givens.
Every unitary matrix can be represented as a product of those.

Next week

SVD
You got PSet 1
Think of course projects

Questions?



In [1]:

    
from IPython.core.display import HTML
def css_styling():
    styles = open("./styles/custom.css", "r").read()
    return HTML(styles)
css_styling()









    Out[1]: