If the norm is a measure of distance, then the scalar product takes angles into account.
The scalar product is defined as
$$
(x, y) = \sum_{i=1}^n \overline{x}_i y_i,
$$
where $\overline{x}_i$ denotes the complex conjugate of $x_i$. The Euclidean norm is then
$$
\Vert x \Vert^2 = (x, x),
$$
or, as it is said, the norm is induced by the scalar product.
Remark. The angle between two vectors is defined as $$ \cos \phi = \frac{(x, y)}{\Vert x \Vert \Vert y \Vert}. $$
An important property of the scalar product is the Cauchy-Bunyakovsky inequality: $$ |(x, y)| \leq \Vert x \Vert \Vert y \Vert, $$ which guarantees that the cosine above lies in $[-1, 1]$, so the angle between two vectors is well defined.
The scalar product can be written as a matrix-by-matrix product
$$
(x, y) = x^* y,
$$
where $^*$ is a conjugate transpose of the matrix:
$$
B = A^*, \quad B_{ij} = \overline{A_{ji}}.
$$
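A quick NumPy check of these definitions (a minimal sketch with random complex vectors; note that np.vdot conjugates its first argument, which matches the definition of $(x, y)$ above):
In [ ]:
import numpy as np
n = 5
x = np.random.randn(n) + 1j * np.random.randn(n)
y = np.random.randn(n) + 1j * np.random.randn(n)
dot_xy = np.vdot(x, y)                 # (x, y) = x^* y, same as x.conj().dot(y)
norm_x = np.sqrt(np.vdot(x, x).real)   # norm induced by the scalar product
norm_y = np.sqrt(np.vdot(y, y).real)
print(np.allclose(norm_x, np.linalg.norm(x)))  # matches the Euclidean norm
print(abs(dot_xy) <= norm_x * norm_y)          # Cauchy-Bunyakovsky inequality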
For stability it is really important that the error does not grow. Suppose that we have an approximation $\widehat{x}$ of a vector $x$, so that
$$
\Vert x - \widehat{x} \Vert \leq \varepsilon.
$$
Then the final result is (some) linear transformation of $x$:
$$
y = Ux, \quad \widehat{y} = U \widehat{x}.
$$
If we want to estimate the difference between $\widehat{y}$ and $y$, we get:
$$
\Vert y - \widehat{y} \Vert = \Vert U ( x - \widehat{x}) \Vert \leq \Vert U \Vert \varepsilon.
$$
The question is: for which matrices does the length of a vector not change? For the Euclidean norm this leads to a very important class of matrices: unitary (or, in the real case, orthogonal) matrices.
Let $U$ be an $n \times r$ matrix ($r \leq n$) such that $\Vert U z \Vert = \Vert z \Vert$ for all $z$. This holds if and only if
$$
U^* U = I,
$$
where $I$ is the $r \times r$ identity matrix.
Indeed, $$\Vert Uz \Vert^2 = (Uz, Uz) = (Uz)^* Uz = z^* (U^* U) z = z^* z,$$
which can hold for all $z$ only if $U^* U = I$.
In the real case, when $U^* = U^{\top}$, the matrix is called orthogonal.
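As a quick numerical check (a sketch; we obtain an orthogonal matrix from the QR decomposition of a random matrix, which is one simple way to generate one):
In [ ]:
import numpy as np
n = 100
q, _ = np.linalg.qr(np.random.randn(n, n))   # q is orthogonal: q^T q = I
z = np.random.randn(n)
print(np.allclose(q.T.dot(q), np.eye(n)))                         # U^* U = I
print(np.allclose(np.linalg.norm(q.dot(z)), np.linalg.norm(z)))   # the norm is preserved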
Are there many unitary matrices? First of all, a product of two unitary matrices is a unitary matrix:
$$(UV)^* UV = V^* (U^* U) V = V^* V = I,$$
thus, once we have some non-trivial examples of unitary matrices, we can multiply them to build up more complicated unitary transformations.
A Givens matrix is a matrix
$$
A = \begin{bmatrix}
\cos \alpha & \sin \alpha \\
-\sin \alpha & \cos \alpha
\end{bmatrix},
$$
which is a plane rotation by the angle $\alpha$. In the general case, we select the plane spanned by the coordinates $i$ and $j$ and rotate only in that plane:
$$
x'_i = \cos \alpha x_i + \sin \alpha x_j, \quad x'_j = -\sin \alpha x_i + \cos\alpha x_j,
$$
with all other components left unchanged.
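Below is a small sketch (the helper givens is our own, not a library routine) that builds an $n \times n$ Givens rotation acting in the $(i, j)$ plane and checks that it is orthogonal and preserves the norm:
In [ ]:
import numpy as np

def givens(n, i, j, alpha):
    # identity matrix with a 2x2 rotation embedded in rows/columns i and j
    g = np.eye(n)
    c, s = np.cos(alpha), np.sin(alpha)
    g[i, i] = c
    g[i, j] = s
    g[j, i] = -s
    g[j, j] = c
    return g

n = 6
g = givens(n, 1, 4, 0.3)
x = np.random.randn(n)
print(np.allclose(g.T.dot(g), np.eye(n)))                         # orthogonal
print(np.allclose(np.linalg.norm(g.dot(x)), np.linalg.norm(x)))   # norm preserved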
A matrix can be considered as a sequence of vectors, its columns:
$$
A = [a_1, \ldots, a_m],
$$
where $a_i \in \mathbb{C}^n$.
A matrix-by-vector product is equivalent to taking a linear combination of those vectors
$$
y = Ax, \quad \Longleftrightarrow y = a_1 x_1 + a_2 x_2 + \ldots +a_m x_m.
$$
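A one-line NumPy check of this column-wise view (a sketch with random data):
In [ ]:
import numpy as np
n, m = 4, 3
A = np.random.randn(n, m)
x = np.random.randn(m)
# A.dot(x) is the linear combination of the columns of A with coefficients x_i
print(np.allclose(A.dot(x), sum(x[i] * A[:, i] for i in range(m))))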
The linear space spanned by the vectors $a_1, \ldots, a_m$ is the set of all vectors of the form
$$
\sum_{i=1}^m a_i x_i,
$$
where $x_i$ are arbitrary coefficients and $a_i, i = 1, \ldots, m$ are the given vectors. In matrix form, this linear space is the collection of all vectors of the form
$$A x.$$
This set is also called the range of the matrix.
Theorem.
The dimension of the column space of a matrix is equal to the dimension of its row space; this common dimension is called the rank of the matrix.
A very useful representation for the computation of the matrix rank is the skeleton decomposition: select $r$ linearly independent columns of $A$ and $r$ linearly independent rows; their $r \times r$ intersection $\widehat{A}$, together with these columns and rows, recovers the whole matrix.
Remark.
The inverse of a matrix $P$ is the matrix $Q = P^{-1}$ such that
$ P Q = QP = I$.
If the matrix is square and has full rank (i.e. its rank is equal to $n$), then the inverse exists.
Let $U$ be the $n \times r$ matrix formed by the $r$ columns of $A$ that contain the submatrix $\widehat{A}$. These columns are linearly independent. Take any other column $u$ of $A$. Since the rank is $r$, it is a linear combination of the columns of $U$, i.e.
$$u = U x.$$
These are $n$ equations. We take $r$ of those corresponding to the rows that contain $\widehat{A}$ and get the equation
$$\widehat{u} = \widehat{A} x,$$ therefore
$$x = \widehat{A}^{-1} \widehat{u},$$ and that gives us the result.
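A small numerical sketch of the skeleton decomposition: we build an exactly rank-$r$ matrix and, for simplicity, take its first $r$ rows and columns, assuming the corresponding $\widehat{A}$ is nonsingular (which holds with probability one for random data).
In [ ]:
import numpy as np
n, m, r = 60, 40, 5
A = np.random.randn(n, r).dot(np.random.randn(r, m))   # exactly rank r

C = A[:, :r]       # r columns of A
R = A[:r, :]       # r rows of A
A_hat = A[:r, :r]  # their r x r intersection (assumed nonsingular)

A_rec = C.dot(np.linalg.solve(A_hat, R))   # C * A_hat^{-1} * R
print(np.linalg.norm(A - A_rec) / np.linalg.norm(A))   # close to machine precision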
Any rank-$r$ matrix can be written in the form $$A = U \widehat{A}^{-1} V^{\top},$$ where $U$ is $n \times r$, $V$ is $m \times r$ and $\widehat{A}$ is $r \times r$; equivalently, $$ A = U' V'^{\top}. $$ So, every rank-$r$ matrix can be written as a product of a "tall" $n \times r$ matrix $U'$ and a "long" $r \times m$ matrix $V'^{\top}$.
In the index form, it is
$$
a_{ij} = \sum_{\alpha=1}^r u_{i \alpha} v_{j \alpha}.
$$
For rank 1 we have
$$
a_{ij} = u_i v_j,
$$
i.e. it is a separation of indices, and a rank-$r$ matrix is a sum of $r$ rank-$1$ matrices!
In [1]:
import numpy as np
n = 1000
r = 10
u = np.random.randn(n, r)
v = np.random.randn(n, r)
a = u.dot(v.T)  # for complex matrices use u.dot(v.T.conj())
x = np.random.randn(n)
%timeit a.dot(x)           # full matrix-by-vector product: O(n^2) operations
%timeit u.dot(v.T.dot(x))  # using the low-rank factors: O(n r) operations
We can also try to compute the matrix rank using the built-in np.linalg.matrix_rank function.
In [3]:
#Computing matrix rank
import numpy as np
n = 50
a = np.ones((n, n))
print('Rank of the matrix:', np.linalg.matrix_rank(a))
b = a + 1e-10 * np.random.randn(n, n)
print('Rank of the perturbed matrix:', np.linalg.matrix_rank(b))
For any rank-$r$ matrix $A$ with $r < \min(m, n)$ and any $\varepsilon > 0$ there is a matrix $B$ whose rank is equal to $\min(m, n)$ and
$$
\Vert A - B \Vert = \varepsilon.
$$
So, does this mean that the matrix rank has no meaning numerically, since arbitrarily small perturbations make the matrix full-rank?
The solution is to compute the best rank-$r$ approximation to the matrix.
To compute a low-rank approximation, we need the singular value decomposition (SVD).
Any matrix $A$ can be written as a product of three matrices:
$$
A = U \Sigma V^*,
$$
where $U$ is an $n \times K$ matrix with orthonormal columns, $V$ is an $m \times K$ matrix with orthonormal columns, $K = \min(m, n)$, and $\Sigma$ is a $K \times K$ diagonal matrix with non-negative elements $\sigma_1 \geq \ldots \geq \sigma_K \geq 0$ (the singular values) on the diagonal.
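A minimal NumPy sketch checking the (thin) SVD and the reconstruction $A = U \Sigma V^*$ (note that np.linalg.svd returns $V^*$ directly as its third output):
In [ ]:
import numpy as np
n, m = 6, 4
A = np.random.randn(n, m) + 1j * np.random.randn(n, m)
u, s, vh = np.linalg.svd(A, full_matrices=False)   # u: n x K, s: K, vh: K x m
print(np.allclose(A, u.dot(np.diag(s)).dot(vh)))           # A = U Sigma V^*
print(np.allclose(u.conj().T.dot(u), np.eye(min(n, m))))   # orthonormal columns of U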
Computation of the best rank-$r$ approximation is equivalent to setting $\sigma_{r+1}= 0, \ldots, \sigma_K = 0$. The error is then determined by the first omitted singular value:
$$
\min_{\mathrm{rank}(A_r) \leq r} \Vert A - A_r \Vert = \sigma_{r+1},
$$
in the spectral norm; that is why it is important to look at the decay of the singular values.
Indeed,
$$\Vert A - A_r \Vert = \Vert U \Sigma V^* - A_r \Vert = \Vert \Sigma - U^* A_r V \Vert,$$
where we used that for the spectral norm (and for the Frobenius norm as well)
$$\Vert Q X \Vert = \Vert X Q \Vert = \Vert X \Vert$$
for any unitary $Q$ and any matrix $X$. This is called the unitary invariance of the spectral norm (and of the Frobenius norm).
What is left is to note that the best rank-$r$ approximation of the diagonal matrix $\Sigma$ is obtained by keeping its $r$ largest diagonal entries and setting the rest to zero.
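As a quick numerical sanity check (a sketch using NumPy's SVD as a black box; ord=2 in np.linalg.norm gives the spectral norm), the error of the truncated SVD indeed equals $\sigma_{r+1}$:
In [ ]:
import numpy as np
n, m, r = 40, 30, 5
A = np.random.randn(n, m)
u, s, vh = np.linalg.svd(A, full_matrices=False)
Ar = u[:, :r].dot(np.diag(s[:r])).dot(vh[:r, :])   # best rank-r approximation
print(np.linalg.norm(A - Ar, 2), s[r])             # the two numbers coincide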
Computing the SVD is tricky; we will talk about the algorithms hidden in this computation later. But we already can do this in Python!
In [4]:
#Computing matrix rank
import numpy as np
n = 50
a = np.ones((n, n))
print('Rank of the matrix:', np.linalg.matrix_rank(a))
b = a + 1e-5 * np.random.randn(n, n)
print('Rank of the perturbed matrix:', np.linalg.matrix_rank(b))
In [15]:
u, s, v = np.linalg.svd(b)
print(s[1])   # the second singular value is small: b is numerically close to rank 1
r = 1
u1 = u[:, :r]
s1 = s[:r]
v1 = v[:r, :]
a1 = u1.dot(np.diag(s1).dot(v1))
print(np.linalg.norm(a - a1, 'fro') / np.linalg.norm(a, 'fro'))
We can use SVD to compute approximations of function-related matrices, i.e. the matrices of the form $$a_{ij} = f(x_i, y_j),$$ where $f$ is a certain function, and $x_i, \quad i = 1, \ldots, n$ and $y_j, \quad j = 1, \ldots, m$ are some one-dimensional grids.
In [32]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
n = 1000
a = [[1.0 / (i + j + 1) for i in range(n)] for j in range(n)]  #Hilbert matrix
a = np.array(a)
u, s, v = np.linalg.svd(a)
fig, ax = plt.subplots(1, 1)
ax.semilogy(s[1:30])
#The singular values decay very fast: we have a very good low-rank approximation of it!
Out[32]: [semilogy plot showing the rapid decay of the singular values of the Hilbert matrix]
Now we can do something more interesting, like function approximation.
In [23]:
import numpy as np
n = 128
t = np.linspace(0, 5, n)
x, y = np.meshgrid(t, t)
f = 1.0 / (x + y ** 2 + 0.5)
u, s, v = np.linalg.svd(f, full_matrices=False)
r = 10
u = u[:, :r]
s = s[:r]
v = v[:r, :] #Mind the transpose here!
fappr = u.dot(np.diag(s).dot(v))
er = np.linalg.norm(fappr - f, 'fro') / np.linalg.norm(f, 'fro')
And do some 3D plotting.
In [27]:
%matplotlib inline
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure(figsize=(12, 5))
ax = fig.add_subplot(131, projection='3d')
ax.plot_surface(x, y, f)
ax.set_title('Original function')
ax = fig.add_subplot(132, projection='3d')
ax.plot_surface(x, y, fappr)
ax.set_title('Approximated function')
ax = fig.add_subplot(133, projection='3d')
ax.plot_surface(x, y, fappr - f)
ax.set_title('Approximation error with rank=%d, err=%3.1e' % (r, er))
fig.subplots_adjust()
fig.tight_layout()
Consider a linear factor model,
$$y = Ax,$$ where $y$ is a vector of length $n$ and $x$ is a vector of length $r$.
The data is organized as samples: we observe vectors
$$y_1, \ldots, y_T,$$
then the factor model can be written as
$$
Y = AX,
$$
where $Y$ is $n \times T$, $A$ is $n \times r$ and $X$ is $r \times T$. This is exactly a rank-$r$ model: it tells us that the observed vectors lie in a small $r$-dimensional subspace!
We can also use the SVD to recover this subspace (but not the independent components themselves). Principal component analysis can be done via the SVD!
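A small sketch with synthetic data (the sizes and the name A_true below are our own choices) showing that the SVD of the sample matrix recovers the rank and the column subspace of the factor model:
In [ ]:
import numpy as np
n, r, T = 100, 5, 500
A_true = np.random.randn(n, r)    # hidden factor loadings (made up for the demo)
X = np.random.randn(r, T)         # hidden factors
Y = A_true.dot(X)                 # observed samples: an n x T matrix of rank r

u, s, vh = np.linalg.svd(Y, full_matrices=False)
print(np.sum(s > 1e-10 * s[0]))   # numerical rank: prints r

# The first r left singular vectors span the same subspace as the columns of A_true:
# projecting A_true onto that subspace reproduces it up to rounding errors.
P = u[:, :r].dot(u[:, :r].T)      # orthogonal projector onto span(u[:, :r])
print(np.linalg.norm(P.dot(A_true) - A_true) / np.linalg.norm(A_true))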