Linear Algebra Basics

Notes from chapter 1 of Numerical Algorithms by Justin Solomon

Terminology

Sets

Natural numbers $$\mathbb{N}=\{n\in\mathbb{Z}: n>0\}$$

Cartesian product $$A\times B=\{(a, b): a\in A\text{ and }b \in B\}$$

Power of sets $$A^n=\underbrace{A\times A \times\ \ldots \times A}_{n\text{ times}}$$
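These set operations can be spot-checked for small finite sets with `itertools.product` (the sets `A` and `B` below are made-up examples):

```python
from itertools import product

A = {0, 1}
B = {"x", "y"}

# Cartesian product: all ordered pairs (a, b) with a in A and b in B
AxB = set(product(A, B))

# Power of a set: A^3 = A x A x A, i.e. all length-3 tuples over A
A3 = set(product(A, repeat=3))

# |A x B| = |A| * |B| and |A^n| = |A|^n
counts_match = len(AxB) == len(A) * len(B) and len(A3) == len(A) ** 3
```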

Vector space

A vector space over $\mathbb{R}$ is a set $\mathcal{V}$ closed under addition and scalar multiplication with:

  • Additive commutativity and associativity
  • Distributivity
  • Additive identity
  • Additive inverse
  • Multiplicative identity
  • Multiplicative compatibility

A member $\vec{v}\in\mathcal{V}$ is a vector

Span

All the vectors we can reach with addition and scalar multiplication.

The span of a set $S\subseteq\mathcal{V}$ is: $$\operatorname{span} S=\{a_1\vec{v_1}+\ldots+a_k\vec{v_k}: \vec{v_i}\in S\text{ and }a_i\in\mathbb{R}\text{ for all }i\}$$

  • Span $S$ is a subspace of $\mathcal{V}$
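Span membership can be tested numerically: stack the vectors of $S$ as columns and check whether least squares drives the residual to zero (a sketch; the vectors are made-up examples):

```python
import numpy as np

# Two columns spanning the xy-plane in R^3
S = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

def in_span(S, v, tol=1e-10):
    """True if v is (numerically) a linear combination of the columns of S."""
    coeffs, *_ = np.linalg.lstsq(S, v, rcond=None)
    return np.linalg.norm(S @ coeffs - v) < tol

inside = in_span(S, np.array([2.0, -3.0, 0.0]))   # lies in the xy-plane
outside = in_span(S, np.array([0.0, 0.0, 1.0]))   # points out of the plane
```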

Linear independence

A set $S\subseteq\mathcal{V}$ is linearly dependent if:

  • One of the elements in $S$ can be written as a linear combination of the other elements
  • $S$ contains $\vec{0}$
  • A nontrivial linear combination of elements of $S$ yields $\vec{0}$
  • There exists a $\vec{v}\in S$ such that $\operatorname{span} S=\operatorname{span}(S\backslash\{\vec{v}\})$
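Independence of a finite set of vectors can be checked by comparing the rank of the matrix whose columns are those vectors against the number of vectors (illustrative data):

```python
import numpy as np

indep = np.array([[1.0, 0.0],
                  [0.0, 1.0],
                  [0.0, 0.0]])          # two independent columns
# Append a column that is a combination of the first two
dep = np.column_stack([indep[:, 0], indep[:, 1], indep[:, 0] + indep[:, 1]])

def linearly_independent(M):
    """Columns of M are independent iff rank(M) equals the number of columns."""
    return np.linalg.matrix_rank(M) == M.shape[1]

is_indep = linearly_independent(indep)
is_dep_set_indep = linearly_independent(dep)
```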

Basis and dimension

  • Basis: linearly independent set $S\subset\mathcal{V}$ such that $\operatorname{span} S=\mathcal{V}$
  • Dimension: $|S|$ for any basis $S$ (all bases of $\mathcal{V}$ have the same size)
  • Standard basis: the set of vectors $\vec{e}_k$, where $\vec{e}_k$ has zeros everywhere except for a single one in the $k$-th position

$\mathbb{R}^n$

  • Vector space structure
  • Dot product
    • A metric
    • Geometry
    • $\lVert \mathbf{\vec{a}} \rVert_2 = \sqrt{\vec{a}.\vec{a}}$
    • Trigonometric identity $\vec{a}.\vec{b}=\lVert \mathbf{\vec{a}} \rVert_2\lVert \mathbf{\vec{b}} \rVert_2 \cos{\theta}$
    • Cauchy-Schwarz inequality ensures $|\vec{a}.\vec{b}|\leq \lVert \mathbf{\vec{a}} \rVert_2\lVert \mathbf{\vec{b}} \rVert_2$
    • Two vectors orthogonal when $\vec{a}.\vec{b}=0$
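These dot-product identities are easy to verify numerically (random vectors as a sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)
b = rng.standard_normal(5)

# ||a||_2 = sqrt(a . a)
norm_matches = np.isclose(np.sqrt(a @ a), np.linalg.norm(a))

# Cauchy-Schwarz: |a . b| <= ||a||_2 ||b||_2
cauchy_schwarz = abs(a @ b) <= np.linalg.norm(a) * np.linalg.norm(b)

# cos(theta) recovered from the trigonometric identity lies in [-1, 1]
cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
```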

Matrix product

  • Product of matrix $M\in\mathbb{R}^{m\times n}$ and another matrix in $\mathbb{R}^{n\times p}$ $$M\begin{pmatrix} \mid & \mid & & \mid \\ \vec{c_1} & \vec{c_2} & \ldots & \vec{c_p} \\ \mid & \mid & & \mid \end{pmatrix}=\begin{pmatrix} \mid & \mid & & \mid \\ M\vec{c_1} & M\vec{c_2} & \ldots & M\vec{c_p} \\ \mid & \mid & & \mid \end{pmatrix}$$
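The column-by-column view of the product can be checked directly: multiplying each column of the right factor by $M$ and stacking the results reproduces the full product (random matrices as a sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 3))      # M in R^{4x3}
C = rng.standard_normal((3, 2))      # columns c_1, c_2 in R^3

# Stack M @ c_j for each column c_j of C; should equal M @ C
columnwise = np.column_stack([M @ C[:, j] for j in range(C.shape[1])])
column_view_ok = np.allclose(columnwise, M @ C)
```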

Transpose

  • $\vec{a}.\vec{b}=\vec{a}^T\vec{b}$ where $\vec{a}^T$ is a row vector
  • $(AB)^T=B^TA^T$ — the dimensions would not match if the order were not flipped
  • Scalars thought of as matrices have the property $c^T=c$
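A quick numerical check of the transpose rules (random matrices; note $A^TB^T$ would not even have compatible shapes here):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))

# (AB)^T = B^T A^T
transpose_rule = np.allclose((A @ B).T, B.T @ A.T)

# a . b = a^T b for vectors
a, b = rng.standard_normal(3), rng.standard_normal(3)
dot_as_matmul = np.isclose(np.dot(a, b), a.T @ b)
```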

Residual norm example

$\lVert\vec{b}-A\vec{x}\rVert_2^2=(\vec{b}-A\vec{x}).(\vec{b}-A\vec{x})=(\vec{b}-A\vec{x})^T(\vec{b}-A\vec{x})=(\vec{b}^T-\vec{x}^TA^T)(\vec{b}-A\vec{x}) \\ = \vec{b}^T\vec{b}-\vec{b}^TA\vec{x}-\vec{x}^TA^T\vec{b}+\vec{x}^TA^TA\vec{x}$

Since $\vec{x}^TA^T\vec{b}$ is a scalar we can transpose it: $\vec{x}^TA^T\vec{b}=(\vec{x}^TA^T\vec{b})^T=\vec{b}^TA\vec{x}$

Which results in: $\lVert\vec{b}-A\vec{x}\rVert_2^2=\lVert A\vec{x}\rVert_2^2-2\vec{b}^TA\vec{x}+\lVert\vec{b}\rVert_2^2$

The same results can be derived using dot product identities:

$\lVert\vec{b}-A\vec{x}\rVert_2^2=(\vec{b}-A\vec{x}).(\vec{b}-A\vec{x}) \\ =\vec{b}.(\vec{b}-A\vec{x})-A\vec{x}.(\vec{b}-A\vec{x}) \\ =\vec{b}.\vec{b}-\vec{b}.A\vec{x}-A\vec{x}.\vec{b}+A\vec{x}.A\vec{x} \\ =\lVert A\vec{x}\rVert_2^2+\lVert \vec{b}\rVert_2^2-2A\vec{x}.\vec{b} \\ =\lVert A\vec{x}\rVert_2^2+\lVert \vec{b}\rVert_2^2-2\vec{b}^TA\vec{x}$
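The expanded residual norm can be verified numerically against the direct computation (random data as a sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
x = rng.standard_normal(3)
b = rng.standard_normal(5)

# ||b - Ax||^2 versus its expansion ||Ax||^2 - 2 b^T A x + ||b||^2
lhs = np.linalg.norm(b - A @ x) ** 2
rhs = np.linalg.norm(A @ x) ** 2 - 2 * b @ A @ x + np.linalg.norm(b) ** 2
expansion_ok = np.isclose(lhs, rhs)
```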

$A\vec{x}=\vec{b}$

  • Write $\vec{b}$ as a linear combination of the columns of $A$
  • Column space of $A$
    • Space of right-hand sides $\vec{b}$ for which the system $A\vec{x}=\vec{b}$ has a solution
    • Span of the columns of $A$
    • $col\ A\equiv\{A\vec{x}: \vec{x}\in\mathbb{R}^n\}$
    • Rank of $A$ is the dimension of $col\ A$
    • $A\vec{x}=\vec{b}$ is solvable when $\vec{b}\in col\ A$
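The solvability condition $\vec{b}\in col\ A$ can be tested numerically: appending $\vec{b}$ as an extra column must not increase the rank (illustrative matrix):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])     # col A is the xy-plane in R^3

def solvable(A, b):
    """Ax = b has a solution iff b is in col A, i.e. appending b keeps the rank."""
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

has_solution = solvable(A, np.array([3.0, -1.0, 0.0]))
no_solution = solvable(A, np.array([0.0, 0.0, 1.0]))
```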

Optimization

  • $f(\vec{x})\equiv\vec{a}.\vec{x}+c \implies \nabla f(\vec{x})=\vec{a}$
  • $f(\vec{x})\equiv\vec{x}^TA\vec{x} \implies \nabla f(\vec{x})=(A+A^T)\vec{x}$
  • $f(\vec{x})\equiv\vec{x}^TABC...Z\vec{x} \implies \nabla f(\vec{x})=(ABC...Z+(ABC...Z)^T)\vec{x}$
  • In general $A^TA\neq AA^T$, even for a square matrix
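The gradient identity $\nabla(\vec{x}^TA\vec{x})=(A+A^T)\vec{x}$ can be checked against central finite differences (random data as a sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
x = rng.standard_normal(3)

def f(y):
    return y @ A @ y                  # f(x) = x^T A x

grad = (A + A.T) @ x                  # claimed gradient

# Central finite differences as an independent check
eps, fd = 1e-6, np.zeros(3)
for i in range(3):
    e = np.zeros(3)
    e[i] = eps
    fd[i] = (f(x + e) - f(x - e)) / (2 * eps)

grad_ok = np.allclose(grad, fd, atol=1e-5)
```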

1.5

$Q\vec{x}=\vec{r}$

$E=f(\vec{x})=(A\vec{x}-\vec{a})^T(A\vec{x}-\vec{a})+(B\vec{x}-\vec{b})^T(B\vec{x}-\vec{b}) \\ =(\vec{x}^TA^T-\vec{a}^T)(A\vec{x}-\vec{a}) + (\vec{x}^TB^T-\vec{b}^T)(B\vec{x}-\vec{b}) \\ =\vec{x}^TA^TA\vec{x}-\vec{x}^TA^T\vec{a}-\vec{a}^TA\vec{x}+\vec{a}^T\vec{a}+\vec{x}^TB^TB\vec{x}-\vec{x}^TB^T\vec{b}-\vec{b}^TB\vec{x}+\vec{b}^T\vec{b} \\ =\vec{x}^TA^TA\vec{x}-2\vec{a}^TA\vec{x}+\vec{a}^T\vec{a}+\vec{x}^TB^TB\vec{x}-2\vec{b}^TB\vec{x}+\vec{b}^T\vec{b} $

$\nabla f(\vec{x})=(A^TA+(A^TA)^T)\vec{x}-2A^T\vec{a}+(B^TB+(B^TB)^T)\vec{x}-2B^T\vec{b}=0 \\ 2A^TA\vec{x}+2B^TB\vec{x}-2A^T\vec{a}-2B^T\vec{b}=0 \\ 2(A^TA+B^TB)\vec{x}=2(A^T\vec{a}+B^T\vec{b}) \\ (A^TA+B^TB)\vec{x}=A^T\vec{a}+B^T\vec{b}$

$Q=A^TA+B^TB \\ \vec{r}=A^T\vec{a}+B^T\vec{b}$

N.B. $A^TA\neq AA^T$
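The derived system $Q\vec{x}=\vec{r}$ can be solved numerically and spot-checked as a minimizer of $E$ against random perturbations (a sketch; all matrices and vectors are random examples):

```python
import numpy as np

rng = np.random.default_rng(5)
A, B = rng.standard_normal((4, 3)), rng.standard_normal((4, 3))
a, b = rng.standard_normal(4), rng.standard_normal(4)

# Q x = r with Q = A^T A + B^T B and r = A^T a + B^T b
Q = A.T @ A + B.T @ B
r = A.T @ a + B.T @ b
x = np.linalg.solve(Q, r)

def E(x):
    return np.linalg.norm(A @ x - a) ** 2 + np.linalg.norm(B @ x - b) ** 2

# x should beat small random perturbations of itself
is_min = all(E(x) <= E(x + 1e-3 * rng.standard_normal(3)) for _ in range(100))
```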


In [2]:
import numpy as np

In [14]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# np.dot(a, a.T) == np.matmul(a, a.T)
np.dot(a, a.T) == np.dot(a.T, a)


Out[14]:
array([[False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)

In [15]:
np.dot(a, a.T)


Out[15]:
array([[ 14,  32,  50],
       [ 32,  77, 122],
       [ 50, 122, 194]])

In [16]:
np.dot(a.T, a)


Out[16]:
array([[ 66,  78,  90],
       [ 78,  93, 108],
       [ 90, 108, 126]])

In [18]:
np.matmul(a.T, a)


Out[18]:
array([[ 66,  78,  90],
       [ 78,  93, 108],
       [ 90, 108, 126]])

In [19]:
np.matmul(a, a.T)


Out[19]:
array([[ 14,  32,  50],
       [ 32,  77, 122],
       [ 50, 122, 194]])

In [20]:
b = np.array([[1, 2], [3, 4]])
np.dot(b, b.T)


Out[20]:
array([[ 5, 11],
       [11, 25]])

In [21]:
np.dot(b.T, b)


Out[21]:
array([[10, 14],
       [14, 20]])

1.7

$A^T=\begin{pmatrix} \mid & \mid & & \mid \\ \vec{r}_1 & \vec{r}_2 & \ldots & \vec{r}_m \\ \mid & \mid & & \mid\end{pmatrix}= \begin{pmatrix} \_\_\_ & \vec{c}^T_1 & \_\_\_ \\ \_\_\_ & \vec{c}^T_2 & \_\_\_ \\ & \ldots & \\ \_\_\_ & \vec{c}^T_n & \_\_\_ \\ \end{pmatrix}$

$A^TA=\begin{pmatrix} \mid & \mid & & \mid \\ A^T\vec{c}_1 & A^T\vec{c}_2 & \ldots & A^T\vec{c}_n \\ \mid & \mid & & \mid\end{pmatrix}=\begin{pmatrix} \vec{c}^T_1\vec{c}_1 & \vec{c}^T_1\vec{c}_2 & \ldots & \vec{c}^T_1\vec{c}_n \\ \vec{c}^T_2\vec{c}_1 & \vec{c}^T_2\vec{c}_2 & \ldots & \vec{c}^T_2\vec{c}_n \\ \vdots & \vdots & \ddots & \vdots \\ \vec{c}^T_n\vec{c}_1 & \vec{c}^T_n\vec{c}_2 & \ldots & \vec{c}^T_n\vec{c}_n \end{pmatrix}$

$AA^T=\begin{pmatrix} \mid & \mid & & \mid \\ A\vec{r}_1 & A\vec{r}_2 & \ldots & A\vec{r}_m \\ \mid & \mid & & \mid\end{pmatrix}=\begin{pmatrix} \vec{r}^T_1\vec{r}_1 & \vec{r}^T_1\vec{r}_2 & \ldots & \vec{r}^T_1\vec{r}_m \\ \vec{r}^T_2\vec{r}_1 & \vec{r}^T_2\vec{r}_2 & \ldots & \vec{r}^T_2\vec{r}_m \\ \vdots & \vdots & \ddots & \vdots \\ \vec{r}^T_m\vec{r}_1 & \vec{r}^T_m\vec{r}_2 & \ldots & \vec{r}^T_m\vec{r}_m \end{pmatrix}$
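These Gram-matrix structures can be confirmed entrywise: $(A^TA)_{ij}$ is the dot product of columns $i$ and $j$, while $(AA^T)_{ij}$ is the dot product of rows $i$ and $j$ (random matrix as a sketch):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))

# (A^T A)_{ij} = c_i . c_j over columns of A
gram_cols = np.array([[A[:, i] @ A[:, j] for j in range(3)] for i in range(3)])
# (A A^T)_{ij} = r_i . r_j over rows of A
gram_rows = np.array([[A[i] @ A[j] for j in range(4)] for i in range(4)])

cols_ok = np.allclose(gram_cols, A.T @ A)
rows_ok = np.allclose(gram_rows, A @ A.T)
```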

1.9

$\Lambda(\vec{x},\ \lambda)=\lVert A\vec{x}\rVert_2^2-\lambda(\lVert B\vec{x}\rVert_2^2-1) \\ =\vec{x}^TA^TA\vec{x}+\lambda-\lambda\vec{x}^TB^TB\vec{x}$

$\frac{\partial \Lambda}{\partial \vec{x}}=2A^TA\vec{x}-2\lambda B^TB\vec{x}=0 \implies A^TA\vec{x}-\lambda B^TB\vec{x}=0 \\ \frac{\partial \Lambda}{\partial \lambda}=1-\vec{x}^TB^TB\vec{x}=0$

$\lVert A\vec{x}\rVert_2^2=\vec{x}^TA^TA\vec{x}=\lambda\vec{x}^T B^TB\vec{x} \\ \lVert A\vec{x}\rVert_2 = \sqrt{\lambda\vec{x}^TB^TB\vec{x}}=\sqrt{\lambda}$

1.10

$\Lambda(\vec{x}, \lambda)=\vec{a}.\vec{x}-\lambda(\lVert \vec{x}\rVert_2-1)$

$\lVert \vec{x}\rVert_2=(\vec{x}.\vec{x})^\frac{1}{2}$

$\frac{\partial \lVert \vec{x}\rVert_2}{\partial \vec{x}}=\frac{1}{2}(\vec{x}.\vec{x})^{-\frac{1}{2}}(2\vec{x})=\frac{\vec{x}}{\lVert \vec{x}\rVert_2}$

$\frac{\partial \Lambda}{\partial \vec{x}}=\vec{a}-\frac{\lambda\vec{x}}{\lVert \vec{x}\rVert_2}=0 \rightarrow \vec{x}=\frac{\lVert \vec{x}\rVert_2}{\lambda}\vec{a}$

$\frac{\partial \Lambda}{\partial \lambda}=\lVert \vec{x}\rVert_2-1=0 \rightarrow \lVert \vec{x}\rVert_2=1$

$\vec{x}=\frac{\vec{a}}{\lambda}$
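Substituting the constraint $\lVert\vec{x}\rVert_2=1$ gives $\lambda=\pm\lVert\vec{a}\rVert_2$, so the maximizer of $\vec{a}.\vec{x}$ over the unit sphere is $\vec{a}/\lVert\vec{a}\rVert_2$. A quick numerical spot check against random unit vectors (random data as a sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
a = rng.standard_normal(5)

x_star = a / np.linalg.norm(a)      # lambda = ||a||_2 gives the maximizer

# No random unit vector should achieve a larger dot product with a
beats_all = True
for _ in range(1000):
    u = rng.standard_normal(5)
    u /= np.linalg.norm(u)
    beats_all = beats_all and (a @ x_star >= a @ u - 1e-12)

# The maximum value is ||a||_2 (Cauchy-Schwarz is tight for parallel vectors)
max_is_norm = np.isclose(a @ x_star, np.linalg.norm(a))
```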

1.15

(a)

$X=B(B^TB)^{-1}B^T$

$X^2=(B(B^TB)^{-1}B^T)(B(B^TB)^{-1}B^T) \\ =B(B^TB)^{-1}(B^TB)(B^TB)^{-1}B^T \\ =B(B^TB)^{-1}B^T$

$X^2=X$
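Idempotence of $X=B(B^TB)^{-1}B^T$ can be verified numerically for a random full-column-rank $B$ (a sketch):

```python
import numpy as np

rng = np.random.default_rng(8)
B = rng.standard_normal((5, 3))     # full column rank almost surely

# X = B (B^T B)^{-1} B^T is the projector onto col B
X = B @ np.linalg.inv(B.T @ B) @ B.T
idempotent = np.allclose(X @ X, X)
```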

(b)

Aside

$I^N=I$


In [45]:
from functools import partial, reduce

In [33]:
eye = np.eye

In [42]:
for i in range(1, 5):
    assert np.all(np.dot(eye(i), eye(i)) == eye(i))

In [57]:
all([
    np.all(eye(i) == reduce(np.dot,
                            [eye(i) for _ in range(n)]))
    for i in range(1, 5)
    for n in range(2, 10)
])


Out[57]:
True

In [75]:
A = np.random.randint(1, 100, (3, 3))
assert np.all(np.dot(A, eye(3)) == A)
assert np.all(np.dot(eye(3), A) == A)

$AA=A$

$X=(I-A)$

$X^2=(I-A)(I-A) \\ =I - IA - AI + A^2 \\ =I - 2A + AA $

$X^2=I-2A+A \\ =I-A=X $

(c)

$\left(\frac{1}{2}I_{n\times n}-A\right)\left(\alpha I_{n\times n}+\beta A\right)=I \\ \frac{\alpha}{2}I + \frac{\beta}{2}A-\alpha A-\beta AA=I $

$(\frac{\beta}{2}-\alpha-\beta)A + (\frac{\alpha}{2}-1)I=0$

$\frac{\alpha}{2}-1=0 \rightarrow \alpha=2$

$\frac{\beta}{2}-2-\beta=0 \rightarrow \beta=-4$

$X^{-1}=2I -4A$
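The claimed inverse can be checked numerically using a projection matrix as the idempotent $A$ (random $B$ as a sketch; $\frac{1}{2}I-A$ is invertible since the eigenvalues of $A$ are 0 or 1):

```python
import numpy as np

rng = np.random.default_rng(9)
B = rng.standard_normal((4, 2))
A = B @ np.linalg.inv(B.T @ B) @ B.T      # idempotent: A @ A == A

X = 0.5 * np.eye(4) - A
X_inv = 2 * np.eye(4) - 4 * A             # claimed inverse 2I - 4A

inverse_ok = np.allclose(X @ X_inv, np.eye(4))
```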

(d)

$$ \begin{aligned} \lambda\vec{x} &= A\vec{x} \\ &= AA\vec{x} \\ &= A(A\vec{x}) \\ &= A(\lambda \vec{x}) \\ &= \lambda (A\vec{x}) \\ &= \lambda (\lambda \vec{x}) \\ &= \lambda^2\vec{x} \end{aligned} $$

$\lambda^2\vec{x}-\lambda\vec{x} = \lambda(\lambda-1)\vec{x} = 0$, and since an eigenvector satisfies $\vec{x}\neq\vec{0}$, every eigenvalue of an idempotent matrix is $\lambda\in\{0,\ 1\}$
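This conclusion can be spot-checked on a random projection matrix, whose eigenvalues should all be numerically 0 or 1 (a sketch):

```python
import numpy as np

rng = np.random.default_rng(10)
B = rng.standard_normal((5, 2))
A = B @ np.linalg.inv(B.T @ B) @ B.T      # idempotent matrix

eigvals = np.linalg.eigvals(A)
# Every eigenvalue of an idempotent matrix should be (numerically) 0 or 1
all_zero_or_one = all(np.isclose(l, 0) or np.isclose(l, 1) for l in eigvals)
```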