Linear Algebra Basics

Notes from chapter 1 of Numerical Algorithm by Justin Solomon

Terminology

Identity

Sets

Natural numbers $$\mathbb{N}=\{n\in\mathbb{Z}: n>0\}$$

Euclidean product $$A\times B=\{(a, b): a\in A\ and\ b \in B\}$$

Power of sets $$A^n=\underbrace{A\times A \times\ \ldots \times A}_{n\text{ times}}$$

Vector space

A vector space over $\mathbb{R}$ is a set $\mathcal{V}$ closed under addition and scalar multiplication with:

Additive commutativity and associativity
Distributivity
Additive identity
Additive inverse
Multiplicative identity
Multiplicative compatibility

A member $\vec{v}\in\mathcal{V}$ is a vector

Span

All the vectors we can reach with addition and scalar multiplication.

The span of a set $S\subseteq\mathcal{V}$ is: $$spanS=\{a_1\vec{v_1}+\ldots+a_k\vec{v_k}: \vec{v_i}\in S\ and\ a_i\in\mathbb{R}\ for\ all\ i\}$$

Span $S$ is a subspace of $\mathcal{V}$

Linear indepdence

A set $S\subseteq\mathcal{V}$ is linearly dependent if:

One of the elements in $S$ can be written as a linear combination of the other elements
S contains zero
A linear combination of elements yields 0
There exists a $\vec{v}\in S$ that $S=spanS\backslash\{\vec{v}\}$

Basis and dimension

Basis: linearly independent set $S\subset\mathcal{V}$ such that $spanS=\mathcal{V}$
Dimension: $|S|$
Standard basis: sets of vectors of $\vec{e_k}$ where $\vec{e_k}$ has all zeros except for a single one in the k-th position

$\mathbb{R}^n$

Vector space structure
Dot product
- A metric
- Geometry
- $\lVert \mathbf{\vec{a}} \rVert_2 = \sqrt{\vec{a}.\vec{a}}$
- Trigonometric identity $\vec{a}.\vec{b}=\lVert \mathbf{\vec{a}} \rVert_2\lVert \mathbf{\vec{b}} \rVert_2 \cos{\theta}$
- Cauchy-Schwarz inequality ensures $|\vec{a}.\vec{b}|\leq \lVert \mathbf{\vec{a}} \rVert_2\lVert \mathbf{\vec{b}} \rVert_2$
- Two vectors orthogonal when $\vec{a}.\vec{b}=0$

Matrix product

Product of matrix $M\in\mathbb{R}^{m\times n}$ and another matrix in $\mathbb{R}^{n\times p}$ $$M\begin{pmatrix} \mid & \mid & & \mid \\ \vec{c_1} & \vec{c_2} & \ldots & \vec{c_p} \\ \mid & \mid & & \mid \end{pmatrix}=\begin{pmatrix} \mid & \mid & & \mid \\ M\vec{c_1} & M\vec{c_2} & \ldots & M\vec{c_p} \\ \mid & \mid & & \mid \end{pmatrix}$$

Transpose

$\vec{a}.\vec{b}=\vec{a}^T\vec{b}$ where $\vec{a}^T$ is a row vector
$(AB)^T=B^TA^T$ the dimensions are not going to work if the order is not flipped
Scalars thought of as matrices have the property $c^T=c$

Residual norm example

$\lVert\vec{b}-A\vec{x}\lVert_2^2=(\vec{b}-A\vec{x}).(\vec{b}-A\vec{x})=(\vec{b}-A\vec{x})^T(\vec{b}-A\vec{x})=(\vec{b}^T-\vec{x}^TA^T)(\vec{b}-A\vec{x}) \\ = \vec{b}^T\vec{b}-\vec{b}^TA\vec{x}-\vec{x}^TA^T\vec{b}+\vec{x}^TA^TA\vec{x}$

Since $\vec{x}^TA^T\vec{b}$ is a scalar we can transpose it: $\vec{x}^TA^T\vec{b}=(\vec{x}^TA^T\vec{b})^T=\vec{b}^TA\vec{x}$

Which results in: $\lVert\vec{b}-A\vec{x}\lVert_2^2=\lVert A\vec{x}\lVert_2^2-2\vec{b}^TA\vec{x}+\lVert\vec{b}\lVert_2^2$

The same results can be derived using dot product identities:

$\lVert\vec{b}-A\vec{x}\lVert_2^2=(\vec{b}-A\vec{x}).(\vec{b}-A\vec{x}) \\ =\vec{b}.(\vec{b}-A\vec{x})-A\vec{x}.(\vec{b}-A\vec{x}) \\ =\vec{b}.\vec{b}-\vec{b}.A\vec{x}-A\vec{x}.\vec{b}+A\vec{x}.A\vec{x} \\ =\lVert A\vec{x}\lVert_2^2+\lVert \vec{b}\lVert_2^2-2A\vec{x}.b \\ =\lVert A\vec{x}\lVert_2^2+\lVert \vec{b}\lVert_2^2-2\vec{b}^TA\vec{x} $

$A\vec{x}=\vec{b}$

Write $\vec{b}$ as a linear combination of the column $A$
Column space of $A$
- Space of right-hand sides $\vec{b}$ for which the system $A\vec{x}=\vec{b}$ has a solution
- Span of the columns of $A$
- $col\ A\equiv\{A\vec{x}: \vec{x}\in\mathbb{R}^n\}$
- Rank of $A$ is the dimension of $col\ A$
- $A\vec{x}=\vec{b}$ is solvable when $\vec{b}\in col\ A$

Optimization

$f(\vec{x})\equiv\vec{a}.\vec{x}+\vec{c} \implies \nabla f(\vec{x})=\vec{a}$
$f(\vec{x})\equiv\vec{x}^TA\vec{x} \implies \nabla f(\vec{x})=(A+A^T)\vec{x}$
$f(\vec{x})\equiv\vec{x}^TABC...Z\vec{x} \implies \nabla f(\vec{x})=(ABC...Z+(ABC...Z)^T)\vec{x}$
$A^TA=AA^T$ for a square matrix

1.5

$Q\vec{x}=\vec{r}$

$E=f(\vec{x})=(A\vec{x}-\vec{a})^T(A\vec{x}-\vec{a})+(B\vec{x}-\vec{b})^T(B\vec{x}-\vec{b}) \\ =(\vec{x}^TA^T-\vec{a}^T)(A\vec{x}-\vec{a}) + (\vec{x}^TB^T-\vec{b}^T)(B\vec{x}-\vec{b}) \\ =\vec{x}^TA^TA\vec{x}-\vec{x}^TA^T\vec{a}-\vec{a}^TA\vec{x}+\vec{a}^T\vec{a}+\vec{x}^TB^TB\vec{x}-\vec{x}^TB^T\vec{b}-\vec{b}^TB\vec{x}+\vec{b}^T\vec{b} \\ =\vec{x}^TA^TA\vec{x}-2\vec{a}^TA\vec{x}+\vec{a}^T\vec{a}+\vec{x}^TB^TB\vec{x}-2\vec{b}^TB\vec{x}+\vec{b}^T\vec{b} $

$\nabla\ f(\vec{x})=(A^TA+(A^TA)^T)\vec{x}-2\vec{a}^TA+(B^TB+(B^TB)^T)\vec{x}-2\vec{b}^TB=0 \\ 2A^TA\vec{x}+2B^TB\vec{x}-2\vec{a}^TA-2\vec{b}^TB=0 \\ 2(A^TA+B^TB)\vec{x}=2(\vec{a}^TA+\vec{b}^TB) \\ (A^TA+B^TB)\vec{x}=(\vec{a}^TA+\vec{b}^TB) $

$Q=A^TA+B^TB \\ \vec{r}=\vec{a}^TA+\vec{b}^TB $

N.B. $A^TA\neq AA^T$



In [2]:

    
import numpy as np



In [14]:

    
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# np.dot(a, a.T) == np.matmul(a, a.T)
np.dot(a, a.T) == np.dot(a.T, a)









    Out[14]:





array([[False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)



In [15]:

    
np.dot(a, a.T)









    Out[15]:





array([[ 14,  32,  50],
       [ 32,  77, 122],
       [ 50, 122, 194]])



In [16]:

    
np.dot(a.T, a)









    Out[16]:





array([[ 66,  78,  90],
       [ 78,  93, 108],
       [ 90, 108, 126]])



In [18]:

    
np.matmul(a.T, a)









    Out[18]:





array([[ 66,  78,  90],
       [ 78,  93, 108],
       [ 90, 108, 126]])



In [19]:

    
np.matmul(a, a.T)









    Out[19]:





array([[ 14,  32,  50],
       [ 32,  77, 122],
       [ 50, 122, 194]])



In [20]:

    
b = np.array([[1, 2], [3, 4]])
np.dot(b, b.T)









    Out[20]:





array([[ 5, 11],
       [11, 25]])



In [21]:

    
np.dot(b.T, b)









    Out[21]:





array([[10, 14],
       [14, 20]])

1.7

$A^T=\begin{pmatrix} \mid & \mid & & \mid \\ \vec{r}_1 & \vec{r}_2 & \ldots & \vec{r}_m \\ \mid & \mid & & \mid\end{pmatrix}= \begin{pmatrix} \_\_\_ & \vec{c}^T_1 & \_\_\_ \\ \_\_\_ & \vec{c}^T_2 & \_\_\_ \\ & \ldots & \\ \_\_\_ & \vec{c}^T_n & \_\_\_ \\ \end{pmatrix}$

$A^TA=\begin{pmatrix} \mid & \mid & & \mid \\ A^T\vec{c}_1 & A^T\vec{c}_2 & \ldots & A^T\vec{c}_n \\ \mid & \mid & & \mid\end{pmatrix}=\begin{pmatrix} \vec{c}^T_1\vec{c}_1 & \vec{c}^T_1\vec{c}_2 & \ldots & \vec{c}^T_1\vec{c}_n \\ \vec{c}^T_2\vec{c}_1 & \vec{c}^T_2\vec{c}_2 & \ldots & \vec{c}^T_2\vec{c}_n \\ \vdots & \vdots & \ddots & \vdots \\ \vec{c}^T_n\vec{c}_1 & \vec{c}^T_n\vec{c}_2 & \ldots & \vec{c}^T_n\vec{c}_n \end{pmatrix}$

$AA^T=\begin{pmatrix} \mid & \mid & & \mid \\ A\vec{r}_1 & A\vec{r}_2 & \ldots & A\vec{r}_m \\ \mid & \mid & & \mid\end{pmatrix}=\begin{pmatrix} \vec{r}^T_1\vec{r}_1 & \vec{r}^T_1\vec{r}_2 & \ldots & \vec{r}^T_1\vec{r}_m \\ \vec{r}^T_2\vec{r}_1 & \vec{r}^T_2\vec{r}_2 & \ldots & \vec{r}^T_2\vec{r}_m \\ \vdots & \vdots & \ddots & \vdots \\ \vec{r}^T_m\vec{r}_1 & \vec{r}^T_m\vec{r}_2 & \ldots & \vec{r}^T_m\vec{r}_m \end{pmatrix}$

1.9

$\Lambda(x,\ \lambda)=\lVert A\vec{x}\lVert_2^2-\lambda(\lVert B\vec{x}\lVert _2^2-1) \\ =\vec{x}^TA^TA\vec{x}+\lambda-\lambda\vec{x}^TB^TB\vec{x} $

$\frac{\partial \Lambda}{\partial \vec{x}}=2A^TA\vec{x}-2\lambda B^TB\vec{x}=A^TA\vec{x}-\lambda B^TB\vec{x}=0 \\ \frac{\partial \Lambda}{\partial \vec{\lambda}}=1-\vec{x}^TB^TB\vec{x}=0 $

$\lVert A\vec{x}\lVert_2^2=\vec{x}^TA^TA\vec{x}=\vec{x}^T\lambda B^TB\vec{x} \\ \lVert A\vec{x}\lVert_2 = \sqrt{\lambda\vec{x}^T B^TB\vec{x}}$

1.10

$\Lambda(\vec{x}, \lambda)=\vec{a}.\vec{x}-\lambda(\lVert \vec{x}\lVert_2-1)$

$\lVert \vec{x}\lVert_2=(\vec{x}.\vec{x})^\frac{1}{2}$

$\frac{\partial \lVert \vec{x}\lVert_2}{\partial \vec{x}}=\frac{1}{2}(\vec{x}.\vec{x})^{1/2}(2\vec{x})=\frac{\vec{x}}{\lVert \vec{x}\lVert_2}$

$\frac{\partial \Lambda}{\partial \vec{x}}=\vec{a}-\frac{\lambda\vec{x}}{\lVert \vec{x}\lVert_2} \rightarrow \vec{x}=\frac{\vec{a}}{\lambda \lVert \vec{x}\lVert_2}$

$\frac{\partial \Lambda}{\partial \lambda}=\lVert x\lVert_2-1=0 \rightarrow \lVert x\lVert_2=1$

$\vec{x}=\frac{\vec{a}}{\lambda}$

1.15

(a)

$X=B(B^TB)^{-1}B^T$

$X^2=(B(B^TB)^{-1}B^T)(B(B^TB)^{-1}B^T) \\ =B(B^TB)^{-1}(B^TB)(B^TB)^{-1}B^T$

$X^2=X$

(b)

Aside

$I^N=I$



In [45]:

    
from functools import partial, reduce



In [33]:

    
eye = lambda x: np.eye(x)



In [42]:

    
for i in range(1, 5):
    assert np.all(np.dot(eye(i), eye(i)) == eye(i))



In [57]:

    
all([
    np.all(eye(i) == reduce(np.dot,
                            [eye(i) for _ in range(n)]))
    for i in range(1, 5)
    for n in range(2, 10)
])









    Out[57]:





True



In [75]:

    
A = np.random.randint(1, 100, (3, 3))
assert np.all(np.dot(A, eye(3)) == A)
assert np.all(np.dot(eye(3), A) == A)

$AA=A$

$X=(I-A)$

$X^2=(I-A)(I-A) \\ =I - IA - AI + A^2 \\ =I - 2A + AA $

$X^2=I-2A+A \\ =I-A=X $

(c)

$\left(\frac{1}{2}I_{n\times n}-A\right)\left(\alpha I_{n\times n}+\beta A\right)=I \\ \frac{\alpha}{2}I + \frac{\beta}{2}A-\alpha A-\beta AA=I $

$(\frac{\beta}{2}-\alpha-\beta)A + (\frac{\alpha}{2}-1)I=0$

$\frac{\alpha}{2}-1=0 \rightarrow \alpha=2$

$\frac{\beta}{2}-2-\beta=0 \rightarrow \beta=-4$

$X^{-1}=2I -4A$

(d)

$$ \begin{aligned} \lambda\vec{x} &= A\vec{x} \\ &= AA\vec{x} \\ &= A(A\vec{x}) \\ &= A(\lambda \vec{x}) \\ &= \lambda (A\vec{x}) \\ &= \lambda (\lambda \vec{x}) \\ &= \lambda^2\vec{x} \end{aligned} $$

$\lambda^2\vec{x}-\lambda\vec{x} = \lambda(\lambda-1)\vec{x} = 0$