If the norm is a measure of distance, then the scalar product takes angles into account.

The scalar product is defined as
$$
(x, y) = \sum_{i=1}^n \overline{x}_i y_i,
$$
where $\overline{x}$ denotes the *complex conjugate* of $x$. The Euclidean norm is then
$$
\Vert x \Vert^2 = (x, x),
$$
or, as it is said, the norm is **induced** by the scalar product.

**Remark.** The angle between two vectors is defined as
$$
\cos \phi = \frac{(x, y)}{\Vert x \Vert \Vert y \Vert}
$$

**Cauchy–Bunyakovsky inequality** (also known as the Cauchy–Schwarz inequality):
$$
|(x, y)| \leq \Vert x \Vert \Vert y \Vert,
$$
and thus the angle between two vectors is well defined.

In matrix notation the scalar product can be written as
$$
(x, y) = x^* y,
$$
where $^*$ denotes the **conjugate transpose** of a matrix:

$$
B = A^*, \quad B_{ij} = \overline{A_{ji}}.
$$
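As a quick illustration (a minimal numpy sketch; note that `np.vdot` conjugates its first argument, matching the definition above):

```
import numpy as np

x = np.array([1 + 2j, 3j])
y = np.array([2 - 1j, 1 + 1j])
print(np.vdot(x, y))           # (x, y) with the first argument conjugated
print(x.conj().dot(y))         # the same via the conjugate transpose
print(np.vdot(x, x).real, np.linalg.norm(x) ** 2)  # the norm induced by the product
```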

For stability it is really important that errors do not grow. Suppose that you have computed a vector $\widehat{x}$ only approximately,

$$
\Vert x - \widehat{x} \Vert \leq \varepsilon.
$$
Then the final result is some linear transformation of $x$:

$$
y = Ux, \quad \widehat{y} = U \widehat{x}.
$$
If we want to estimate the difference between $\widehat{y}$ and $y$:

$$
\Vert y - \widehat{y} \Vert = \Vert U ( x - \widehat{x}) \Vert \leq \Vert U \Vert \varepsilon.
$$
The question is: for which matrices does the length of a vector **not change** under multiplication? For the Euclidean norm this leads to a very important class of matrices: **unitary** (or, in the real case, orthogonal) matrices.

Let $U$ be an $n \times r$ matrix, and $\Vert U z \Vert = \Vert z \Vert$ for all $z$. This can happen if and only if

$$
U^* U = I,
$$
where $I$ is the **identity matrix**.
Indeed, $$\Vert Uz \Vert^2 = (Uz, Uz) = (Uz)^* Uz = z^* (U^* U) z,$$
and this equals $z^* z$ for every $z$ if and only if $U^* U = I$ (note that $U^* U$ is Hermitian).

In the real case, when $U^* = U^{\top}$, the matrix is called orthogonal.
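For a quick experiment, one convenient way to obtain a random orthogonal matrix is the QR decomposition (a minimal sketch):

```
import numpy as np

np.random.seed(0)
n = 5
q, _ = np.linalg.qr(np.random.randn(n, n))    # orthogonal factor of a random matrix
print(np.allclose(q.T.dot(q), np.eye(n)))     # Q^T Q = I
z = np.random.randn(n)
print(np.linalg.norm(q.dot(z)), np.linalg.norm(z))  # lengths coincide
```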

Are there many unitary matrices? First of all, **a product of two unitary matrices is a unitary matrix:**

$$(UV)^* UV = V^* (U^* U) V = V^* V = I,$$
thus, if we can find some non-trivial examples of unitary matrices, we will be able to build arbitrary unitary transformations from them.

A Givens matrix is a matrix

$$
A = \begin{bmatrix}
\cos \alpha & \sin \alpha \\
-\sin \alpha & \cos \alpha
\end{bmatrix},
$$
which is a rotation. In the general case, we select a pair of coordinates $(i, j)$ and rotate only in that plane:
$$
x'_i = \cos \alpha x_i + \sin \alpha x_j, \quad x'_j = -\sin \alpha x_i + \cos\alpha x_j,
$$
with all other components unchanged.
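A sketch of such a rotation in numpy (the helper `givens` below is just for illustration):

```
import numpy as np

def givens(n, i, j, alpha):
    """Identity with a rotation by alpha embedded in the (i, j) plane."""
    g = np.eye(n)
    g[i, i] = g[j, j] = np.cos(alpha)
    g[i, j] = np.sin(alpha)
    g[j, i] = -np.sin(alpha)
    return g

x = np.random.randn(6)
g = givens(6, 1, 4, 0.3)
print(x)
print(g.dot(x))  # only components 1 and 4 change
print(np.linalg.norm(x), np.linalg.norm(g.dot(x)))  # the norm is preserved
```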

A matrix can be considered as a sequence of vectors, its columns:
$$
A = [a_1, \ldots, a_m],
$$
where $a_i \in \mathbb{C}^n$.

A matrix-by-vector product is equivalent to taking a linear combination of those vectors

$$
y = Ax \quad \Longleftrightarrow \quad y = a_1 x_1 + a_2 x_2 + \ldots + a_m x_m.
$$
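A quick check of this "column view" of the matrix-by-vector product (a minimal sketch):

```
import numpy as np

np.random.seed(0)
a = np.random.randn(4, 3)
x = np.random.randn(3)
y = a.dot(x)
y_cols = sum(x[i] * a[:, i] for i in range(3))  # combination of the columns
print(np.allclose(y, y_cols))
```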

A **linear space** spanned by the vectors $a_1, \ldots, a_m$ is defined as the set of all possible vectors of the form
$$
\sum_{i=1}^m a_i x_i,
$$
where $x_i$ are some coefficients and $a_i, \; i = 1, \ldots, m$, are the given vectors. In matrix form, the linear space is the collection of all vectors of the form

$$A x.$$
This set is also called the **range** of the matrix.

**Theorem.**

The dimension of the column space of the matrix is equal to the dimension of the row space of the matrix.
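We can sanity-check this numerically (a sketch): the rank of a random low-rank matrix and of its transpose coincide.

```
import numpy as np

np.random.seed(0)
a = np.random.randn(20, 5).dot(np.random.randn(5, 30))  # a rank-5 matrix
print(np.linalg.matrix_rank(a), np.linalg.matrix_rank(a.T))
```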

A very useful representation for the computation of the matrix rank is the **skeleton decomposition**
$$
A = U \widehat{A}^{-1} V^{\top},
$$
where $U$ consists of $r$ columns of $A$, $V^{\top}$ consists of $r$ rows of $A$, and $\widehat{A}$ is the $r \times r$ submatrix on their intersection, which must be **nonsingular**.

**Remark.**

An inverse of the matrix $P$ is the matrix $Q = P^{-1}$ such that

$$P Q = QP = I.$$

If the matrix is square and has full rank (i.e. equal to $n$), then the inverse exists.
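A one-line check of the definition (a sketch; a random square matrix is nonsingular with probability 1):

```
import numpy as np

np.random.seed(0)
p = np.random.randn(4, 4)
q = np.linalg.inv(p)
print(np.allclose(p.dot(q), np.eye(4)), np.allclose(q.dot(p), np.eye(4)))
```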

Let $U$ be the $r$ columns of $A$ that contain the submatrix $\widehat{A}$. Then they are linearly independent. Take any other column $u$ of $A$. Then it is a linear combination of the columns of $U$, i.e.

$$u = U x.$$

These are $n$ equations. We take $r$ of those corresponding to the rows that contain $\widehat{A}$ and get the equation

$$\widehat{u} = \widehat{A} x,$$ therefore

$$x = \widehat{A}^{-1} \widehat{u},$$ and that gives us the result.
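A minimal numerical sketch of the skeleton decomposition: for a random rank-$r$ matrix, the leading $r \times r$ submatrix is nonsingular with probability 1, so taking the first $r$ rows and columns works (in practice the choice of rows and columns requires care).

```
import numpy as np

np.random.seed(0)
n, m, r = 8, 6, 3
a = np.random.randn(n, r).dot(np.random.randn(r, m))  # exact rank-r matrix
u = a[:, :r]                  # r columns of a
vt = a[:r, :]                 # r rows of a
a_hat = a[:r, :r]             # their intersection
a_rec = u.dot(np.linalg.solve(a_hat, vt))  # U * A_hat^{-1} * V^T
print(np.linalg.norm(a - a_rec) / np.linalg.norm(a))  # ~ machine precision
```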

Any rank-$r$ matrix can be written in the form $$A = U \widehat{A}^{-1} V^{\top},$$ where $U$ is $n \times r$, $V$ is $m \times r$ and $\widehat{A}$ is $r \times r$, or $$ A = U' V'^{\top}. $$ So, every rank-$r$ matrix can be written as a product of a "tall" $n \times r$ matrix $U'$ and a "wide" $r \times m$ matrix $V'^{\top}$.

In index form,
$$
a_{ij} = \sum_{\alpha=1}^r u_{i \alpha} v_{j \alpha}.
$$
For rank 1 we have
$$
a_{ij} = u_i v_j,
$$
i.e. a rank-$1$ matrix corresponds to a separation of indices, and a rank-$r$ matrix is a sum of $r$ rank-$1$ matrices!

Multiplying by a rank-$r$ matrix in factored form costs $O(nr)$ operations instead of $O(n^2)$ for the full form:

In [1]:
```
import numpy as np
n = 1000
r = 10
u = np.random.randn(n, r)
v = np.random.randn(n, r)
a = u.dot(v.T) #u.dot(v.T.conj()) in the complex case
x = np.random.randn(n)
%timeit a.dot(x)
%timeit u.dot(v.T.dot(x))
```

We can also try to compute the matrix rank using the built-in `np.linalg.matrix_rank` function:

In [3]:
```
#Computing matrix rank
import numpy as np
n = 50
a = np.ones((n, n))
print('Rank of the matrix:', np.linalg.matrix_rank(a))
b = a + 1e-10 * np.random.randn(n, n)
print('Rank of the matrix:', np.linalg.matrix_rank(b))
```

For any matrix $A$ of rank $r < \min(m, n)$ and any $\varepsilon > 0$ there is a matrix $B$ whose rank is equal to $\min(m, n)$ and
$$
\Vert A - B \Vert \leq \varepsilon.
$$
So, does this mean that the matrix rank has no meaning numerically? (I.e., arbitrarily small perturbations lead to full rank!)

The solution is to compute the best **rank-$r$ approximation** to a matrix.

To compute a low-rank approximation, we need the **singular value decomposition (SVD)**.

Any matrix $A$ can be written as a product of three matrices:

$$
A = U \Sigma V^*,
$$
where $U$ is an $n \times K$ matrix with orthonormal columns, $V$ is an $m \times K$ matrix with orthonormal columns, $K = \min(m, n)$, and $\Sigma$ is a diagonal matrix with non-negative elements $\sigma_1 \geq \ldots \geq \sigma_K \geq 0$ on the diagonal.

Computation of the best rank-$r$ approximation is equivalent to setting $\sigma_{r+1} = 0, \ldots, \sigma_K = 0$. The error is then determined by the first omitted singular value:
$$
\min_{\mathrm{rank}(A_r) \leq r} \Vert A - A_r \Vert = \sigma_{r+1},
$$
which is why it is important to look at the decay of the singular values.

Indeed,

$\Vert A - A_r \Vert = \Vert U \Sigma V^* - A_r \Vert = \Vert \Sigma - U^* A_r V \Vert$,

where we used that for the spectral norm (and for the Frobenius norm as well)

$$\Vert Q X \Vert = \Vert X Q \Vert = \Vert X \Vert$$

for any unitary $Q$ and any matrix $X$. This is called **unitary invariance** of the spectral norm (and of the Frobenius norm as well).

What is left is to note that the best rank-$r$ approximation of a diagonal matrix is obtained by keeping its $r$ largest diagonal elements.
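As a quick numerical sanity check of this fact (a minimal sketch on a random matrix, not part of the derivation), the spectral-norm error of the truncated SVD should equal $\sigma_{r+1}$:

```
import numpy as np

np.random.seed(0)
a = np.random.randn(30, 20)
u, s, vh = np.linalg.svd(a, full_matrices=False)
r = 5
a_r = u[:, :r].dot(np.diag(s[:r]).dot(vh[:r, :]))  # truncated SVD
# the two numbers below should coincide
print(np.linalg.norm(a - a_r, 2), s[r])
```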

In [4]:
```
#Computing matrix rank with a larger perturbation
import numpy as np
n = 50
a = np.ones((n, n))
print('Rank of the matrix:', np.linalg.matrix_rank(a))
b = a + 1e-5 * np.random.randn(n, n)
print('Rank of the matrix:', np.linalg.matrix_rank(b))
```

In [15]:
```
u, s, v = np.linalg.svd(b)
print(s[1])  # the second singular value is small: b is close to rank 1
r = 1
u1 = u[:, :r]
s1 = s[:r]
v1 = v[:r, :]
a1 = u1.dot(np.diag(s1).dot(v1))
print(np.linalg.norm(a - a1, 'fro') / np.linalg.norm(a, 'fro'))
```

Low-rank matrices often arise as **function-related** matrices, i.e. matrices of the form
$$a_{ij} = f(x_i, y_j),$$
where $f$ is a certain function, and $x_i, \; i = 1, \ldots, n$, and $y_j, \; j = 1, \ldots, m$, are some **one-dimensional grids**.

In [32]:
```
%matplotlib inline
import numpy as np
import prettyplotlib as ppl
n = 1000
a = [[1.0 / (i + j + 1) for i in range(n)] for j in range(n)] #Hilbert matrix
a = np.array(a)
u, s, v = np.linalg.svd(a)
fig, ax = ppl.subplots(1, 1)
ax.semilogy(s[1:30])
#We have a very good low-rank approximation of it!
```
Out[32]: *(plot: the singular values of the Hilbert matrix on a semilogarithmic scale, decaying rapidly)*

Now we can do something more interesting, like function approximation.

In [23]:
```
import numpy as np
n = 128
t = np.linspace(0, 5, n)
x, y = np.meshgrid(t, t)
f = 1.0 / (x + y ** 2 + 0.5)
u, s, v = np.linalg.svd(f, full_matrices=False)
r = 10
u = u[:, :r]
s = s[:r]
v = v[:r, :] #Mind the transpose here!
fappr = u.dot(np.diag(s).dot(v))
er = np.linalg.norm(fappr - f, 'fro') / np.linalg.norm(f, 'fro')
```

And do some 3D plotting:
In [27]:
```
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(12, 5))
ax = fig.add_subplot(131, projection='3d')
ax.plot_surface(x, y, f)
ax.set_title('Original function')
ax = fig.add_subplot(132, projection='3d')
ax.plot_surface(x, y, fappr)
ax.set_title('Approximated function')
ax = fig.add_subplot(133, projection='3d')
ax.plot_surface(x, y, fappr - f)
ax.set_title('Approximation error with rank=%d, err=%3.1e' % (r, er))
fig.subplots_adjust()
fig.tight_layout()
```

Consider a linear factor model,

$$y = Ax,$$ where $y$ is a vector of length $n$, and $x$ is a vector of length $r$.

The data is organized as samples: we observe vectors

$$y_1, \ldots, y_T,$$
then the factor model can be written as

$$
Y = AX,
$$
where $Y$ is $n \times T$, $A$ is $n \times r$ and $X$ is $r \times T$. This is exactly a rank-$r$ model: it tells us that the vectors lie in a small subspace!

We can also use the SVD to recover this subspace (but not the independent components). Principal component analysis (PCA) can be done by the SVD!
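A minimal sketch of this on synthetic data (the names and sizes below are illustrative): generate samples from an exact factor model and recover the $r$-dimensional subspace from the left singular vectors.

```
import numpy as np

np.random.seed(0)
n, r, T = 50, 3, 1000
a_true = np.random.randn(n, r)
x = np.random.randn(r, T)
y = a_true.dot(x)              # samples following the factor model Y = AX
# (for real data one would subtract the sample mean first; omitted here)
u, s, vh = np.linalg.svd(y, full_matrices=False)
print(s[:r + 2])               # only r singular values are essentially nonzero
basis = u[:, :r]               # orthonormal basis of the recovered subspace
# the columns of a_true lie in the recovered subspace:
proj = basis.dot(basis.T.dot(a_true))
print(np.linalg.norm(proj - a_true) / np.linalg.norm(a_true))
```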
