Vectors typically provide an (approximate) description of a physical (or some other) object. One of the main questions is how accurate the approximation is (1%, 10%). What counts as an acceptable representation, of course, depends on the particular application.
A norm is a quantitative measure of the smallness of a vector and is typically denoted as $\Vert x \Vert$. The norm should satisfy certain properties: $\Vert \alpha x \Vert = |\alpha| \Vert x \Vert$ for any scalar $\alpha$, the triangle inequality $\Vert x + y \Vert \leq \Vert x \Vert + \Vert y \Vert$, and $\Vert x \Vert \geq 0$ with $\Vert x \Vert = 0$ if and only if $x = 0$.
The distance between two vectors is then defined as $$ d(x, y) = \Vert x - y \Vert. $$
The Euclidean norm, or $2$-norm, is a special case of an important class of $p$-norms: $$ \Vert x \Vert_p = \Big(\sum_{i=1}^n |x_i|^p\Big)^{1/p}. $$ There are two very important special cases: the infinity norm (or Chebyshev norm), $\Vert x \Vert_\infty = \max_i |x_i|$, and the $1$-norm (or Manhattan norm), $\Vert x \Vert_1 = \sum_{i=1}^n |x_i|$.
We will give examples where the Manhattan norm is very important: it is at the heart of the compressed sensing methods that emerged in the mid-2000s as one of the most popular research topics.
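For instance, here is a quick comparison of the $1$-, $2$-, and $\infty$-norms of the same vector (a minimal sketch using NumPy's norm routine):
In [ ]:
import numpy as np
x = np.array([1.0, -2.0, 3.0])
print('1-norm:', np.linalg.norm(x, 1))              # |1| + |-2| + |3| = 6
print('2-norm:', np.linalg.norm(x, 2))              # sqrt(1 + 4 + 9) = sqrt(14)
print('infinity-norm:', np.linalg.norm(x, np.inf))  # max(|x_i|) = 3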
In [9]:
import numpy as np
n = 100
a = np.ones(n)
b = a + 1e-3 * np.random.randn(n)  # small random perturbation of a
print('Relative error:', np.linalg.norm(a - b) / np.linalg.norm(a))
All norms are equivalent in the sense that $$ C_1 \Vert x \Vert_* \leq \Vert x \Vert_{**} \leq C_2 \Vert x \Vert_* $$ for some constants $C_1(n), C_2(n)$ and any pair of norms $\Vert \cdot \Vert_*$ and $\Vert \cdot \Vert_{**}$. You will have some problems on this in your homework! The equivalence of norms basically means that if a vector is small in one norm, it is small in any other norm; the constants, however, can be large.
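For example, the standard bounds $\Vert x \Vert_2 \leq \Vert x \Vert_1 \leq \sqrt{n} \, \Vert x \Vert_2$ are easy to verify numerically (a quick sketch):
In [ ]:
import numpy as np
n = 100
x = np.random.randn(n)
n1 = np.linalg.norm(x, 1)
n2 = np.linalg.norm(x, 2)
# the equivalence constants between the 1-norm and the 2-norm grow with n
print('||x||_2 <= ||x||_1:', n2 <= n1)
print('||x||_1 <= sqrt(n) ||x||_2:', n1 <= np.sqrt(n) * n2)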
In [45]:
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
p = 3       # which p-norm we use
M = 100000  # number of sampling points
a = np.random.randn(M, 2)
# keep only the points that lie inside the unit ball of the p-norm
b = a[np.linalg.norm(a, p, axis=1) <= 1]
plt.scatter(b[:, 0], b[:, 1], s=1)
plt.axis('equal')
Out[45]: (scatter plot of the unit ball of the $p$-norm)
The $L_1$ norm, as was discovered quite recently, plays an important role in compressed sensing. The simplest formulation is as follows:
The solution is obviously non-unique, so a natural approach is to find the solution that is minimal in a certain sense: $$ \Vert x \Vert \rightarrow \min, \quad \mbox{subject to } Ax = f.$$
The typical choice $\Vert x \Vert = \Vert x \Vert_2$ leads to the linear least squares problem (and has been used for ages).
The choice $\Vert x \Vert = \Vert x \Vert_1$ leads to compressed sensing: it typically yields the sparsest solution.
A short demo
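Here is a minimal sketch (assuming SciPy is available; the problem sizes and the ground-truth vector are made up for illustration). We cast the $\ell_1$ minimization as a linear program by writing $x = u - v$ with $u, v \geq 0$ and minimizing $\sum_i (u_i + v_i)$ subject to $A(u - v) = f$:
In [ ]:
import numpy as np
from scipy.optimize import linprog
np.random.seed(0)
n, m = 20, 100  # 20 equations, 100 unknowns: the system is underdetermined
A = np.random.randn(n, m)
x_true = np.zeros(m)
x_true[[3, 30, 70]] = [1.0, -2.0, 1.5]  # sparse ground truth
f = A.dot(x_true)
# minimize sum(u) + sum(v) subject to A u - A v = f, u >= 0, v >= 0
c = np.ones(2 * m)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=f, bounds=(0, None))
x = res.x[:m] - res.x[m:]
print('Non-zeros in the recovered solution:', np.sum(np.abs(x) > 1e-6))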
It is worth noting that, formally, $\Vert x \Vert_0$ is the number of non-zero elements in $x$, so $\Vert x \Vert_1$ acts as a convex approximation to the number of non-zeros.
How do we measure distances between matrices? A trivial answer is that there is no big difference between matrices and vectors, and this leads to the Frobenius norm of the matrix: $$ \Vert A \Vert_F = \Big(\sum_{i=1}^n \sum_{j=1}^m |a_{ij}|^2\Big)^{1/2}. $$ There is a more general definition, though:
$\Vert \cdot \Vert$ is called a matrix norm if it is a vector norm on the linear space of $n \times m$ matrices, and it is also consistent with the matrix-by-matrix product, i.e., $$\Vert A B \Vert \leq \Vert A \Vert \Vert B \Vert.$$
The multiplicative property is needed in many places, for example in estimates for the error of the solution of linear systems (we will cover this subject later).
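As a quick numerical illustration, the Frobenius norm satisfies this property (a sketch):
In [ ]:
import numpy as np
a = np.random.randn(50, 50)
b = np.random.randn(50, 50)
lhs = np.linalg.norm(a.dot(b), 'fro')
rhs = np.linalg.norm(a, 'fro') * np.linalg.norm(b, 'fro')
print('||AB||_F <= ||A||_F ||B||_F:', lhs <= rhs)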
Can you think of some matrix norms?
The most important class of norms is the class of operator norms. Mathematically, they are defined as $$ \Vert A \Vert_* = \sup_{x \ne 0} \frac{\Vert A x \Vert_*}{\Vert x \Vert_*}, $$ where $\Vert \cdot \Vert_*$ is a vector norm. It is not difficult to show that an operator norm is a matrix norm. Among operator norms, the $p$-norms, where the vector $p$-norm is taken in the definition, are the most widely used; three of them are the most common: $p = 1$, $p = 2$ (the spectral norm), and $p = \infty$.
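To get a feeling for the supremum in the definition, we can sample random vectors and compare the best ratio found with the true spectral norm (a crude lower bound, for illustration only):
In [ ]:
import numpy as np
np.random.seed(0)
a = np.random.randn(10, 10)
x = np.random.randn(10, 10000)  # 10000 random test vectors as columns
ratios = np.linalg.norm(a.dot(x), axis=0) / np.linalg.norm(x, axis=0)
print('Sampled estimate:', ratios.max())
print('True spectral norm:', np.linalg.norm(a, 2))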
Note that the Frobenius norm is not an operator norm.
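A simple way to see this: every operator norm of the identity matrix equals $1$ (directly from the definition), while $\Vert I \Vert_F = \sqrt{n}$:
In [ ]:
import numpy as np
n = 100
eye = np.eye(n)
print('Spectral norm of I:', np.linalg.norm(eye, 2))       # 1.0
print('Frobenius norm of I:', np.linalg.norm(eye, 'fro'))  # sqrt(100) = 10.0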
The spectral norm, $\Vert A \Vert_2$, is undoubtedly the most used matrix norm. It cannot be computed directly from the entries by a simple formula like the Euclidean norm; however, there are efficient algorithms to compute it. It is directly related to the singular value decomposition (SVD) of the matrix: $$ \Vert A \Vert_2 = \sigma_1(A), $$ where $\sigma_1(A)$ is the largest singular value of the matrix $A$. We will soon learn all about the SVD. Meanwhile, we can already compute the norm in Python.
In [25]:
import numpy as np
n = 100
a = np.random.randn(n, n)       # random n x n matrix
s1 = np.linalg.norm(a, 2)       # spectral norm (largest singular value)
s2 = np.linalg.norm(a, 'fro')   # Frobenius norm
s3 = np.linalg.norm(a, 1)       # 1-norm (maximum absolute column sum)
s4 = np.linalg.norm(a, np.inf)  # infinity norm (maximum absolute row sum)
print('Spectral:', s1, 'Frobenius:', s2, '1-norm:', s3, 'infinity:', s4)