3M1 Examples Paper crib (linear algebra)

Garth N. Wells (gnw20@cam.ac.uk)

(C) 2016-2020


In [1]:
import numpy as np
from scipy import linalg

Question 1

The usual rules for real matrices extend without modification to complex-valued matrices.

a. $\det(\boldsymbol{A}) = (4 + 4i)(4 + i) - (2-i)(-3 + 2i) = 16 + 13 i$

b. $\det(\boldsymbol{A}^{H}) = (4 - 4i)(4 - i) - (-3 - 2i)(2 + i) = 16 - 13i$

c. Inverse:

$$ \boldsymbol{A}^{-1} = \frac{1}{16 + 13 i } \begin{bmatrix} 4 + i & -2 + i \\ 3 - 2i & 4 + 4i \end{bmatrix} $$

Extra: SymPy can be used for symbolic computations:


In [2]:
import sympy
sympy.interactive.printing.init_printing(use_unicode=False, wrap_line=False)
A = sympy.Matrix([[4 + sympy.I*4 , 2- sympy.I], [-3 + sympy.I*2 , 4 + sympy.I]])
A


Out[2]:
$\displaystyle \left[\begin{matrix}4 + 4 i & 2 - i\\-3 + 2 i & 4 + i\end{matrix}\right]$

In [3]:
det = A.det()
sympy.simplify(det)


Out[3]:
$\displaystyle 16 + 13 i$

In [4]:
detH = A.H.det()
sympy.simplify(detH)


Out[4]:
$\displaystyle 16 - 13 i$

In [5]:
Ainv = A.inv()
sympy.simplify(Ainv)


Out[5]:
$\displaystyle \left[\begin{matrix}\frac{77}{425} - \frac{36 i}{425} & - \frac{19}{425} + \frac{42 i}{425}\\\frac{22}{425} - \frac{71 i}{425} & \frac{116}{425} + \frac{12 i}{425}\end{matrix}\right]$

In [6]:
sympy.simplify(A.multiply(A.inv()))


Out[6]:
$\displaystyle \left[\begin{matrix}1 & 0\\0 & 1\end{matrix}\right]$

Question 2

For a Hermitian matrix, $A_{ii} = \bar{A}_{ii}$, so the diagonal entries must be real.

Question 3

\begin{align} \det \left(\boldsymbol{Q} - \lambda \boldsymbol{I} \right) &= \det\left( \begin{bmatrix} \cos \theta - \lambda & - \sin \theta \\ \sin \theta & \cos \theta - \lambda \end{bmatrix} \right) \\ &= (\cos \theta - \lambda)^{2} + \sin^{2}\theta \\ &= \lambda^{2} - (2 \cos \theta) \lambda + 1 \\ &= 0 \end{align}

Computing roots,

\begin{align} \lambda &= \cos \theta \pm \sqrt{\cos^{2}\theta - 1} \\ &= \cos \theta \pm i \sin\theta \end{align}
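
A quick numerical check of the result (a sketch; the value θ = 0.3 is chosen arbitrarily):


In [ ]:
# Eigenvalues of a rotation matrix for an arbitrary angle θ
θ = 0.3
Q = np.array([[np.cos(θ), -np.sin(θ)], [np.sin(θ), np.cos(θ)]])
print(np.linalg.eigvals(Q))

# Compare with cos(θ) ± i sin(θ)
print(np.cos(θ) + 1j*np.sin(θ), np.cos(θ) - 1j*np.sin(θ))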

Question 4

a. If $\boldsymbol{x}$ is an eigenvector and $\lambda$ is an eigenvalue,

$$ \boldsymbol{M} \boldsymbol{x} = \lambda \boldsymbol{x} $$

Premultiplying by $\boldsymbol{x}^{H}$,

$$ \boldsymbol{x}^{H} \boldsymbol{M} \boldsymbol{x} = \lambda \boldsymbol{x}^{H} \boldsymbol{x} $$

Taking the complex conjugate of both sides and noting that $\boldsymbol{x}^{H} \boldsymbol{x}$ is real,

$$ \left(\boldsymbol{x}^{H} \boldsymbol{M} \boldsymbol{x}\right)^{H} = \boldsymbol{x}^{H} \boldsymbol{M}^{H} \boldsymbol{x} = \bar{\lambda} \boldsymbol{x}^{H} \boldsymbol{x} $$

Since $\boldsymbol{M}$ is Hermitian ($\boldsymbol{M} =\boldsymbol{M}^{H}$), comparing the above two equations, and noting that $\boldsymbol{x}^{H} \boldsymbol{x} > 0$ for an eigenvector, they can hold only if $\lambda = \bar{\lambda}$, which is true only if $\lambda$ is real.

b. Consider two eigenpairs $(\lambda_{1}, \boldsymbol{x}_{1})$ and $( \lambda_{2}, \boldsymbol{x}_{2})$, where $\lambda_{1} \ne \lambda_{2}$. We have:

\begin{align} \boldsymbol{M} \boldsymbol{x}_{1} &= \lambda_{1} \boldsymbol{x}_{1} \\ \boldsymbol{M} \boldsymbol{x}_{2} &= \lambda_{2} \boldsymbol{x}_{2} \end{align}

Premultiplying by $\boldsymbol{x}_{2}^{H}$ and $\boldsymbol{x}_{1}^{H}$, respectively,

\begin{align} \boldsymbol{x}_{2}^{H} \boldsymbol{M} \boldsymbol{x}_{1} &= \lambda_{1} \boldsymbol{x}_{2}^{H} \boldsymbol{x}_{1} \\ \boldsymbol{x}_{1}^{H} \boldsymbol{M} \boldsymbol{x}_{2} &= \lambda_{2} \boldsymbol{x}_{1}^{H} \boldsymbol{x}_{2} \end{align}

Taking the complex conjugate transpose of the first equation, and exploiting that $\boldsymbol{M} = \boldsymbol{M}^{H}$ and that $\lambda_{1}$ is real (from part a),

\begin{equation} \left(\boldsymbol{x}_{2}^{H} \boldsymbol{M} \boldsymbol{x}_{1}\right)^{H} = \boldsymbol{x}_{1}^{H} \boldsymbol{M} \boldsymbol{x}_{2} = \lambda_{1} \boldsymbol{x}_{1}^{H} \boldsymbol{x}_{2} \end{equation}

Comparing this to $\boldsymbol{x}_{1}^{H}\boldsymbol{M} \boldsymbol{x}_{2} = \lambda_{2} \boldsymbol{x}_{1}^{H}\boldsymbol{x}_{2}$, since $\lambda_{1} \ne \lambda_{2}$ both equations can hold only if $\boldsymbol{x}_{1}^{H} \boldsymbol{x}_{2} = 0$, i.e. the eigenvectors are orthogonal.
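
A numerical illustration of both results (a sketch; the Hermitian matrix below is chosen arbitrarily):


In [ ]:
# Arbitrary Hermitian test matrix
M = np.array([[2.0, 1 - 1j], [1 + 1j, 3.0]])

# eigh is specialised for Hermitian matrices and returns real eigenvalues
evals, evecs = np.linalg.eigh(M)
print(evals)

# Eigenvectors belonging to distinct eigenvalues are orthogonal
print(np.vdot(evecs[:, 0], evecs[:, 1]))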

Question 5

To show that $\boldsymbol{A}^{H} \boldsymbol{A}$ is positive semi-definite: \begin{equation} \boldsymbol{x}^{H} \boldsymbol{A}^{H} \boldsymbol{A} \boldsymbol{x} = (\boldsymbol{A}\boldsymbol{x})^{H} (\boldsymbol{A} \boldsymbol{x}) \ge 0 \end{equation} If $(\lambda, \boldsymbol{x})$ is an eigenpair of $\boldsymbol{A}^{H} \boldsymbol{A}$,

\begin{equation} \boldsymbol{A}^{H} \boldsymbol{A} \boldsymbol{x} = \lambda \boldsymbol{x} \end{equation}

Premultiplying both sides by $\boldsymbol{x}^{H}$:

\begin{equation} \boldsymbol{x}^{H} \boldsymbol{A}^{H} \boldsymbol{A} \boldsymbol{x} = \lambda \boldsymbol{x}^{H} \boldsymbol{x} \end{equation}

Since $\boldsymbol{x}^{H} \boldsymbol{A}^{H} \boldsymbol{A} \boldsymbol{x} \ge 0$ and $\boldsymbol{x}^{H} \boldsymbol{x} > 0$ (an eigenvector is non-zero), all eigenvalues $\lambda$ must be greater than or equal to zero.
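
A numerical check (a sketch; the complex matrix below is generated randomly):


In [ ]:
# Random complex matrix
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3)) + 1j*rng.standard_normal((4, 3))

# Eigenvalues of A^H A should all be real and non-negative
evals = np.linalg.eigvalsh((A.conj().T).dot(A))
print(evals)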

Question 6

a. Eigenvalues of $\boldsymbol{A}$ satisfy $\det(\boldsymbol{A} - \lambda\boldsymbol{I}) = 0$. For the transpose, we have \begin{equation} \det\left(\boldsymbol{A}^{T} - \lambda\boldsymbol{I}\right) = \det\left(\left(\boldsymbol{A} - \lambda \boldsymbol{I}\right)^{T}\right) = \det(\boldsymbol{A} - \lambda \boldsymbol{I}), \end{equation} since $\det\boldsymbol{A} = \det \boldsymbol{A}^{T}$ for any square matrix. Hence the eigenvalues of $\boldsymbol{A}$ and $\boldsymbol{A}^{T}$ are the same.

b. Noting that $\det\left(\boldsymbol{A}^{H}\right) = \det\left(\overline{\boldsymbol{A}}^{T}\right) = \overline{\det\left(\boldsymbol{A}^{T}\right)} = \overline{\det\left(\boldsymbol{A}\right)}$, \begin{equation} \det\left(\boldsymbol{A}^{H} - \lambda\boldsymbol{I}\right) = \det\left(\left(\boldsymbol{A} - \bar{\lambda} \boldsymbol{I}\right)^{H}\right) = \det\left( \overline{(\boldsymbol{A} - \bar{\lambda}\boldsymbol{I})^{T}}\right) = \overline{\det(\boldsymbol{A} - \bar{\lambda} \boldsymbol{I})}. \end{equation} Hence $\det(\boldsymbol{A}^{H} - \lambda\boldsymbol{I}) = 0$ if and only if $\det(\boldsymbol{A} - \bar{\lambda}\boldsymbol{I}) = 0$, i.e. the eigenvalues of $\boldsymbol{A}^{H}$ are the complex conjugates of the eigenvalues of $\boldsymbol{A}$.

c. If $\lambda$ and $\boldsymbol{x}$ are an eigenvalue and eigenvector, respectively, of $\boldsymbol{A}\boldsymbol{B}$, then \begin{equation} \boldsymbol{A}\boldsymbol{B} \boldsymbol{x} = \lambda \boldsymbol{x}. \end{equation} Premultiplying both sides by $\boldsymbol{B}$, \begin{equation} (\boldsymbol{B}\boldsymbol{A}) (\boldsymbol{B} \boldsymbol{x}) = \lambda (\boldsymbol{B}\boldsymbol{x}). \end{equation} We see that $\lambda$ is an eigenvalue of $\boldsymbol{B}\boldsymbol{A}$, with corresponding eigenvector $\boldsymbol{B} \boldsymbol{x}$ (provided $\boldsymbol{B}\boldsymbol{x} \ne \boldsymbol{0}$).
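
A numerical illustration of the three results (a sketch; the complex matrices below are generated randomly):


In [ ]:
# Arbitrary complex test matrices
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 1j*rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j*rng.standard_normal((3, 3))

print(np.sort_complex(np.linalg.eigvals(A)))           # eigenvalues of A
print(np.sort_complex(np.linalg.eigvals(A.T)))         # same as for A
print(np.sort_complex(np.linalg.eigvals(A.conj().T)))  # complex conjugates
print(np.sort_complex(np.linalg.eigvals(A.dot(B))))    # eigenvalues of AB
print(np.sort_complex(np.linalg.eigvals(B.dot(A))))    # same as for AB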

Question 7

Follows directly from the definition of the norms:

\begin{equation} \max_{i=1}^{n} |x_{i}| \le \sum_{i=1}^{n} |x_{i}| \le n \max_{i=1}^{n} |x_{i}|. \end{equation}
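
A quick numerical check (a sketch; the vector is chosen arbitrarily):


In [ ]:
# Check max|x_i| <= sum|x_i| <= n max|x_i| for an arbitrary vector
x = np.array([3.0, -1.0, 2.0, -4.0])
n = len(x)
print(np.linalg.norm(x, np.inf), np.linalg.norm(x, 1), n*np.linalg.norm(x, np.inf))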

Question 8

Recall that from the definition of a matrix operator norm it follows that $\| \boldsymbol{A}\boldsymbol{x} \| \le \| \boldsymbol{A} \| \| \boldsymbol{x} \|$.

a. From the definition of the matrix norm:

\begin{equation} \| \boldsymbol{A} \boldsymbol{B} \boldsymbol{x} \| \le \| \boldsymbol{A} \| \| \boldsymbol{B} \boldsymbol{x}\| \le \| \boldsymbol{A} \| \| \boldsymbol{B}\| \| \boldsymbol{x}\| \end{equation}

for all $\boldsymbol{x}$. Re-arranging,

\begin{equation} \frac{\| \boldsymbol{A} \boldsymbol{B} \boldsymbol{x} \|}{\|\boldsymbol{x}\|} \le \| \boldsymbol{A} \| \| \boldsymbol{B}\| \end{equation}

for all $\boldsymbol{x} \ne \boldsymbol{0}$. From the definition of the norm, $\| \boldsymbol{A} \boldsymbol{B} \|$ is the largest possible value of $\| \boldsymbol{A} \boldsymbol{B} \boldsymbol{x} \| / \|\boldsymbol{x}\|$ (over all $\boldsymbol{x}$, excluding the zero vector), hence

\begin{equation} \| \boldsymbol{A} \boldsymbol{B} \| \le \| \boldsymbol{A} \| \| \boldsymbol{B}\| \end{equation}

b. Consider \begin{equation} \| (\boldsymbol{A} + \boldsymbol{B}) \boldsymbol{x} \| = \| \boldsymbol{A}\boldsymbol{x} + \boldsymbol{B} \boldsymbol{x} \| \end{equation}

From the triangle inequality for vectors,

\begin{align} \| \boldsymbol{A}\boldsymbol{x} + \boldsymbol{B} \boldsymbol{x} \| &\le \| \boldsymbol{A}\boldsymbol{x} \| + \| \boldsymbol{B} \boldsymbol{x} \| \\ &\le \| \boldsymbol{A} \| \| \boldsymbol{x} \| + \| \boldsymbol{B} \| \| \boldsymbol{x} \| \end{align}

Re-arranging,

\begin{equation} \frac{\| (\boldsymbol{A} + \boldsymbol{B}) \boldsymbol{x} \|}{\| \boldsymbol{x} \|} \le \| \boldsymbol{A} \| + \| \boldsymbol{B} \| \end{equation}

Using the same argument as in part (a), we get $\| \boldsymbol{A} + \boldsymbol{B} \| \le \| \boldsymbol{A} \| + \| \boldsymbol{B} \|$.
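
A numerical spot check of both inequalities, using the 2-norm and arbitrary random matrices (a sketch):


In [ ]:
# Spot check: ||AB|| <= ||A|| ||B|| and ||A + B|| <= ||A|| + ||B|| (2-norm)
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
print(np.linalg.norm(A.dot(B), 2), np.linalg.norm(A, 2)*np.linalg.norm(B, 2))
print(np.linalg.norm(A + B, 2), np.linalg.norm(A, 2) + np.linalg.norm(B, 2))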

Question 9

Recall that the smallest possible value of $C$ for which $\|{\boldsymbol{A}\boldsymbol{x}}\| \le C \| \boldsymbol{x}\|$ holds is the operator norm $\|\boldsymbol{A}\|$. The task is to find $C$ for the different norms. It simplifies the proofs if we consider all vectors for which $\|\boldsymbol{x}\| = 1$ (this is possible since $\|\alpha \boldsymbol{x}\| = |\alpha| \|\boldsymbol{x}\|$), in which case the task is to find the smallest possible $C$ such that $\|\boldsymbol{A}\boldsymbol{x}\| \le C$.

What we need to do is (i) prove the inequality, and then (ii) exhibit a case where equality is attained, which shows that the bound is sharp (i.e. it is the smallest such $C$).

a. For the 1-norm, denoting the $j$th column of $\boldsymbol{A}$ by $\boldsymbol{a}_{j}$, and for a vector $\|\boldsymbol{x}\|_{1} = 1$:

\begin{align*} \| \boldsymbol{A} \boldsymbol{x}\|_{1} = \left\|\sum_{j=1}^{n} \boldsymbol{a}_{j} x_{j}\right\|_{1} \le \sum_{j=1}^{n} \|\boldsymbol{a}_{j}\|_{1} |x_{j}| \le \left(\max_{j=1}^{n} \|\boldsymbol{a}_{j}\|_{1}\right) \sum_{j=1}^{n} |x_{j}| = \max_{j=1}^{n} \|\boldsymbol{a}_{j}\|_{1}, \end{align*}

since $\sum_{j=1}^{n} |x_{j}| = \|\boldsymbol{x}\|_{1} = 1$.

This is the maximum column sum.

The term on the right could possibly be larger than $\| \boldsymbol{A} \|_{1}$, whereas the norm is the smallest possible value for the RHS that still satisfies the inequalities. If we can show a case for which equality is reached, $\|\boldsymbol{A} \boldsymbol{x}\|_{1} = \max_{j=1}^{n} \|\boldsymbol{a}_{j}\|_{1}$, we have the norm. For the unit vector $\boldsymbol{e}_{j}$, with $e_{j} = 1$ where $j$ is the column with the greatest 1-norm and $e_{i \ne j} = 0$, we have \begin{equation} \|\boldsymbol{A} \boldsymbol{e}_{j}\|_{1} = \max_{j=1}^{n} \|\boldsymbol{a}_{j}\|_{1} \end{equation} Therefore, \begin{equation} \|\boldsymbol{A}\|_{1} = \max_{j=1}^{n} \|\boldsymbol{a}_{j}\|_{1} \end{equation}

b. For a vector $\|\boldsymbol{x}\|_{\infty} = 1$:

\begin{align*} \|\boldsymbol{A} \boldsymbol{x}\|_{\infty} = \max_{i=1}^{m} |\sum_{j=1}^{n} a_{ij} x_{j}| &\le \max_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}| |x_{j}| \\ &\le \max_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}| \end{align*}

The RHS is the maximum row sum.

As before, we need to find a case with equality. If the row with the maximum sum is row $k$, we choose a vector $\boldsymbol{x}$ where $x_{i} = \pm 1$ such that the sign of $x_{i}$ is the same as the sign of the entry $a_{ki}$. We then have

\begin{equation} \|\boldsymbol{A}\|_{\infty} = \max_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}| \end{equation}
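
Both formulas can be checked against NumPy's built-in matrix norms (a sketch; the matrix is chosen arbitrarily):


In [ ]:
# Compare the column-sum and row-sum formulas with np.linalg.norm
A = np.array([[1.0, -7.0, 3.0], [2.0, 4.0, -5.0]])
print(np.abs(A).sum(axis=0).max(), np.linalg.norm(A, 1))       # max column sum
print(np.abs(A).sum(axis=1).max(), np.linalg.norm(A, np.inf))  # max row sum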

Question 10

Recall that the $l_{2}$ norm of $\boldsymbol{A}$ is the square root of the maximum eigenvalue of $\boldsymbol{A}^{T}\boldsymbol{A}$ (the matrix is real in this question). We have
$$
\boldsymbol{A}^{T}\boldsymbol{A}
=
\begin{bmatrix}
  26 & 5 \\ 5 & 1
\end{bmatrix}
$$
For this matrix, $\lambda = (27 \pm \sqrt{27^{2} - 4})/2$. Hence $\|\boldsymbol{A}\|_{2} \approx 5.1926$.

Checking for a vector of length $1$:
$$
\boldsymbol{A}
\begin{bmatrix} \cos \theta \\ \sin \theta \end{bmatrix}
=
\begin{bmatrix} \cos \theta \\ 5\cos \theta + \sin \theta \end{bmatrix}
$$
Computing the square of the norm,
$$
F(\theta) = \|\boldsymbol{A}\boldsymbol{x}\|_{2}^{2} = 25 \cos^{2} \theta + 10 \cos\theta\sin\theta + 1
$$
To find the extreme points of the function, we differentiate $F$ with respect to $\theta$ and set the derivative equal to zero:
\begin{align}
\frac{d F}{d \theta} &= -50 \cos\theta \sin\theta + 10 (\cos^{2} \theta - \sin^{2} \theta) \\
&= -25 \sin 2\theta + 10 \cos 2\theta \\
&= 0
\end{align}
The norm is maximised when $\theta = \arctan(2/5)/2$. Evaluating:


In [7]:
θ = np.arctan(2/5)/2
print("θ =", θ)

# F is the squared norm; its square root estimates the 2-norm
F = 25*np.cos(θ)*np.cos(θ) + 10*np.cos(θ)*np.sin(θ) + 1
print("sqrt(F) =", np.sqrt(F))


θ = 0.19025318855618245
sqrt(F) = 5.192582403567252
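
The value can also be checked directly against NumPy's operator 2-norm (a sketch; the matrix $\boldsymbol{A} = \begin{bmatrix} 1 & 0 \\ 5 & 1 \end{bmatrix}$ is reconstructed from $\boldsymbol{A}^{T}\boldsymbol{A}$ above):


In [ ]:
# Direct check with NumPy's operator 2-norm
A = np.array([[1.0, 0.0], [5.0, 1.0]])
print(np.linalg.norm(A, 2))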

Question 11 (condition number)

Recall that $\kappa(\boldsymbol{A}) = \| \boldsymbol{A} \| \| \boldsymbol{A}^{-1} \|$.

We have $\| \boldsymbol{A}\|_{1} = 3$ and $\| \boldsymbol{A}\|_{\infty} \approx 2$. Note that $$ \boldsymbol{A}^{-1} \approx -\frac{1}{2} \begin{bmatrix} 1 & -2 \\ -1 & 10^{-4} \end{bmatrix} $$ so $\| \boldsymbol{A}^{-1} \|_{1} \approx 1$ and $\| \boldsymbol{A}^{-1} \|_{\infty} \approx 3/2$. Therefore $\kappa_{1}(\boldsymbol{A}) \approx 3$ and $\kappa_{\infty}(\boldsymbol{A}) \approx 3$.

Recall that $\kappa_{2}(\boldsymbol{A}) = \sqrt{\lambda_{\max}(\boldsymbol{A}^{T}\boldsymbol{A})}/\sqrt{\lambda_{\min}(\boldsymbol{A}^{T}\boldsymbol{A})}$. As an approximation we ignore the $(1, 1)$ entry: $$ \boldsymbol{A}^{T}\boldsymbol{A} \approx \begin{bmatrix} 1 & 1 \\ 1 & 5 \end{bmatrix} $$ Computing the eigenvalues, $\lambda \approx 3 \pm \sqrt{5}$, therefore $\kappa_{2} \approx \sqrt{3 + \sqrt{5}}/\sqrt{3 - \sqrt{5}} \approx 2.618$.

The matrix is very well conditioned; however, LU factorisation may require pivoting. This is an issue with the LU factorisation algorithm rather than a pathological problem with the matrix.
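
As a check, np.linalg.cond evaluates condition numbers directly. The sketch below assumes the matrix in the question is $\boldsymbol{A} = \begin{bmatrix} 10^{-4} & 2 \\ 1 & 1 \end{bmatrix}$, which is consistent with the norms and inverse quoted above:


In [ ]:
# Condition numbers in the 1-, infinity- and 2-norms
# (assumes the question's matrix is [[1e-4, 2], [1, 1]])
A = np.array([[1.0e-4, 2.0], [1.0, 1.0]])
print(np.linalg.cond(A, 1), np.linalg.cond(A, np.inf), np.linalg.cond(A, 2))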

Question 12 (least squares)

A projection matrix $\boldsymbol{P}$ has the properties $\boldsymbol{P} = \boldsymbol{P}\boldsymbol{P}$ and $\boldsymbol{P}^{H} = \boldsymbol{P}$.

a. The solution to the least squares problem is $\hat{\boldsymbol{x}} = (\boldsymbol{A}^{H} \boldsymbol{A})^{-1} \boldsymbol{A}^{H} \boldsymbol{b}$. Therefore $\boldsymbol{r} = \boldsymbol{A} \hat{\boldsymbol{x}} - \boldsymbol{b} = \boldsymbol{A}(\boldsymbol{A}^{H} \boldsymbol{A})^{-1} \boldsymbol{A}^{H} \boldsymbol{b} - \boldsymbol{b}$. Inserting this expression for $\boldsymbol{r}$ into the expression in the question and re-arranging, $\boldsymbol{A}^{H}\boldsymbol{r} = \boldsymbol{A}^{H}\boldsymbol{A}(\boldsymbol{A}^{H} \boldsymbol{A})^{-1} \boldsymbol{A}^{H} \boldsymbol{b} - \boldsymbol{A}^{H}\boldsymbol{b} = \boldsymbol{0}$, which gives the result.

Vectors $\boldsymbol{A}\boldsymbol{z}$ lie in the column space of $\boldsymbol{A}$, hence the expression says that the least-squares residual is orthogonal to the column space of $\boldsymbol{A}$.

b. $$ \boldsymbol{P}\boldsymbol{P} = \boldsymbol{A} (\boldsymbol{A}^{H}\boldsymbol{A})^{-1}\boldsymbol{A}^{H} \boldsymbol{A}(\boldsymbol{A}^{H}\boldsymbol{A})^{-1}\boldsymbol{A}^{H} = \boldsymbol{A} (\boldsymbol{A}^{H}\boldsymbol{A})^{-1}\boldsymbol{A}^{H} = \boldsymbol{P} $$ and $$ \boldsymbol{P}^{H} = \boldsymbol{A}(\boldsymbol{A}^{H}\boldsymbol{A})^{-H}\boldsymbol{A}^{H} = \boldsymbol{A}(\boldsymbol{A}^{H}\boldsymbol{A})^{-1}\boldsymbol{A}^{H} = \boldsymbol{P}, $$ since $\boldsymbol{A}^{H}\boldsymbol{A}$ (and therefore its inverse) is Hermitian.

c. We can phrase a least squares problem as $$ \boldsymbol{A} \hat{\boldsymbol{x}} = \boldsymbol{A}(\boldsymbol{A}^{H}\boldsymbol{A})^{-1}\boldsymbol{A}^{H} \boldsymbol{b} = \boldsymbol{P} \boldsymbol{b}, $$ which says that $\boldsymbol{P}$ projects $\boldsymbol{b}$ into the column space of $\boldsymbol{A}$. Since $\boldsymbol{b}^{\prime} = \boldsymbol{P}\boldsymbol{b}$ lies in the column space of $\boldsymbol{A}$, the system $\boldsymbol{A} \hat{\boldsymbol{x}} = \boldsymbol{b}^{\prime}$ has a solution.
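
A numerical illustration of the projector properties (a sketch; the full-column-rank matrix and vector below are chosen arbitrarily):


In [ ]:
# P = A (A^H A)^{-1} A^H for an arbitrary full-column-rank (real) matrix
A = np.array([[1.0, 2.0], [0.0, 1.0], [1.0, -1.0]])
P = A.dot(np.linalg.inv(A.T.dot(A))).dot(A.T)

print(np.allclose(P.dot(P), P))  # P P = P
print(np.allclose(P.T, P))       # P^H = P (real case)

# The least-squares residual is orthogonal to the column space of A
b = np.array([1.0, 2.0, 3.0])
r = P.dot(b) - b
print(A.T.dot(r))                # approximately the zero vector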

Question 13 (pseudo inverse)

a.

  • Firstly, if the $m \times n$ matrix $\boldsymbol{A}$ has linearly independent rows, then the rank of $\boldsymbol{A}$ is $m$, the column space of $\boldsymbol{A}$ spans $\mathbb{C}^{m}$, and the nullspace of $\boldsymbol{A}^{H}$ contains the zero vector only.

  • Now, consider the nullspace of $\boldsymbol{A}\boldsymbol{A}^{H}$:

    \begin{equation} \boldsymbol{A}\boldsymbol{A}^{H} \boldsymbol{x} = \boldsymbol{0} \ \rightarrow \ \boldsymbol{x}^{H}\boldsymbol{A} \boldsymbol{A}^{H} \boldsymbol{x} = 0 \ \rightarrow \ (\boldsymbol{A}^{H} \boldsymbol{x})^{H}\boldsymbol{A}^{H} \boldsymbol{x} = 0 \end{equation}

    The above holds only if $\boldsymbol{A}^{H} \boldsymbol{x} = \boldsymbol{0}$, which says that $\boldsymbol{x}$ must come from the nullspace of $\boldsymbol{A}^{H}$. We have already determined that the nullspace of $\boldsymbol{A}^{H}$ contains only the zero vector, therefore $\boldsymbol{A}\boldsymbol{A}^{H}$ is full rank (the nullspace of $\boldsymbol{A}\boldsymbol{A}^{H}$ contains the zero vector only) and can be inverted.

b. Since $\boldsymbol{A}\boldsymbol{A}^{H}$ is square and full rank, it can be inverted, and $$ \boldsymbol{A} \boldsymbol{A}^{+} = \boldsymbol{A} \boldsymbol{A}^{H} (\boldsymbol{A}\boldsymbol{A}^{H})^{-1} = \boldsymbol{I}, $$ hence $\boldsymbol{A}^{+} = \boldsymbol{A}^{H}(\boldsymbol{A}\boldsymbol{A}^{H})^{-1}$ is a right-inverse of $\boldsymbol{A}$.
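
A numerical check (a sketch; the matrix below, with linearly independent rows, is chosen arbitrarily):


In [ ]:
# Right-inverse A^{+} = A^H (A A^H)^{-1} for a matrix with independent rows
A = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])
Ap = A.T.dot(np.linalg.inv(A.dot(A.T)))
print(A.dot(Ap))  # the 2x2 identity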

Question 14 (stationary iterative methods)

Define matrix $\boldsymbol{A}$:


In [8]:
A = np.array([[2, -1], [-1, 2]])
print(A)


[[ 2 -1]
 [-1  2]]

We split $\boldsymbol{A}$ such that $\boldsymbol{A} = \boldsymbol{N} - \boldsymbol{P}$. The iterative method will converge if the largest eigenvalue of $\boldsymbol{N}^{-1}\boldsymbol{P}$, in absolute value, is less than one.

For the Richardson method $\boldsymbol{N} = \boldsymbol{I}$:


In [9]:
# Richardson
N = np.identity(2)
P = N - A
M = np.linalg.inv(N).dot(P) 
print(np.linalg.eigvals(M))


[ 0. -2.]

The largest eigenvalue (in absolute value) is greater than one, therefore the method will not converge.

For the Jacobi method, $\boldsymbol{N} = \text{diag}(\boldsymbol{A})$:


In [10]:
# Jacobi
N = np.diag(np.diag(A))
P = N - A
M = np.linalg.inv(N).dot(P) 
print(np.linalg.eigvals(M))


[ 0.5 -0.5]

The largest eigenvalue (in absolute value) is less than one, therefore the method will converge.

For Gauss-Seidel, $\boldsymbol{N}$ is the lower triangular part of $\boldsymbol{A}$:


In [11]:
# Gauss-Seidel
N = np.tril(A)
P = N - A
M = np.linalg.inv(N).dot(P) 
print(np.linalg.eigvals(M))


[0.   0.25]

Gauss-Seidel will converge because the largest eigenvalue (in absolute value) is less than one.

Question 15 (SVD)

\begin{align} \boldsymbol{A}^{-1} &= \left(\boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^{H}\right)^{-1} \\ &=\boldsymbol{V}^{-H} \boldsymbol{\Sigma}^{-1} \boldsymbol{U}^{-1} \\ &=\boldsymbol{V} \boldsymbol{\Sigma}^{-1} \boldsymbol{U}^{H} \end{align}

A non-singular matrix cannot have any zero singular values. In fact, the smallest singular value is a measure of the 'distance' to the nearest singular matrix.
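
A numerical check (a sketch; the non-singular matrix is chosen arbitrarily, and recall that np.linalg.svd returns $\boldsymbol{V}^{H}$):


In [ ]:
# Check A^{-1} = V Sigma^{-1} U^H for an arbitrary non-singular matrix
A = np.array([[2.0, 1.0], [1.0, 3.0]])
U, s, Vh = np.linalg.svd(A)
Ainv = (Vh.T).dot(np.diag(1.0/s)).dot(U.T)
print(np.allclose(Ainv, np.linalg.inv(A)))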

Question 16 (SVD)

Define matrix $\boldsymbol{A}$:


In [12]:
A = np.array([[1, 3], [2, 2], [3, 1]])
print(A)


[[1 3]
 [2 2]
 [3 1]]

Compute the reduced SVD. Note that np.linalg.svd returns the last factor already transposed, i.e. the returned arrays satisfy $\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}$, where the array named V below is $\boldsymbol{V}^{T}$ in the usual notation $\boldsymbol{A} = \boldsymbol{U} \boldsymbol{\Sigma} \boldsymbol{V}^{T}$:


In [13]:
U, s, V = np.linalg.svd(A, full_matrices=False)
print(s)
print(U)
print(V)


[4.89897949 2.        ]
[[-5.77350269e-01  7.07106781e-01]
 [-5.77350269e-01  8.98662938e-17]
 [-5.77350269e-01 -7.07106781e-01]]
[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]

Compute the pseudoinverse by creating $\boldsymbol{\Sigma}^{+} = \boldsymbol{\Sigma}_{1}^{-1}$:


In [14]:
# Pseudoinverse: A^{+} = V Sigma^{+} U^{T} (the array V holds V^{T}, hence the transposes below)
Sigma_p = np.diag(1.0/s)
Ap = (V.T).dot(Sigma_p.dot(U.T))
print(Ap)


[[-0.16666667  0.08333333  0.33333333]
 [ 0.33333333  0.08333333 -0.16666667]]

Compute $\boldsymbol{A}^{+}\boldsymbol{A}$:


In [15]:
# Check that A^{+}A = I
print(Ap.dot(A))


[[ 1.00000000e+00 -3.88578059e-16]
 [ 2.49800181e-16  1.00000000e+00]]

which is the identity (to machine precision). Now compute $\boldsymbol{A}\boldsymbol{A}^{+}$:


In [16]:
# Check that AA^{+} \ne I
print(A.dot(Ap))


[[ 0.83333333  0.33333333 -0.16666667]
 [ 0.33333333  0.33333333  0.33333333]
 [-0.16666667  0.33333333  0.83333333]]

This is clearly not the identity.

(e) Recall that from $\boldsymbol{A}^{T} \boldsymbol{A} \hat{\boldsymbol{x}} = \boldsymbol{A}^{T} \boldsymbol{b}$ we have \begin{align} \hat{\boldsymbol{x}} &= (\boldsymbol{A}^{T} \boldsymbol{A})^{-1}\boldsymbol{A}^{T} \boldsymbol{b} \\ &= \boldsymbol{A}^{+} \boldsymbol{b} \end{align} Multiplying both sides by $\boldsymbol{A}$, $$ \boldsymbol{A}\hat{\boldsymbol{x}} = \underbrace{\boldsymbol{A} \boldsymbol{A}^{+}}_{\boldsymbol{P}} \boldsymbol{b}, $$ where $\boldsymbol{P}$ is the projection matrix from an earlier question. Recall that $\boldsymbol{P}$ projects a vector into the column space of $\boldsymbol{A}$, and that a projector leaves a vector that is already in the column space unchanged. Therefore, for any vector $\boldsymbol{b}$ in the column space of $\boldsymbol{A}$, the least-squares solution satisfies $\boldsymbol{A}\hat{\boldsymbol{x}} = \boldsymbol{b}$ exactly.
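
A quick numerical check using the $\boldsymbol{A}$ and $\boldsymbol{A}^{+}$ computed above (a sketch; the coefficients $[1, 2]$ are arbitrary):


In [ ]:
# Pick a b in the column space of A (a combination of the columns of A)
b_cs = A.dot(np.array([1.0, 2.0]))
xhat = Ap.dot(b_cs)
print(A.dot(xhat))  # reproduces b_cs exactly
print(b_cs)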

Question 17 (pseudo inverse)

Define matrix $\boldsymbol{A}$:


In [17]:
A = np.array([[0, 3, 0], [2, 0, 0]])
print(A)


[[0 3 0]
 [2 0 0]]

Compute SVD:


In [18]:
U, s, V = np.linalg.svd(A, full_matrices=False)
print(U)
print(s)
print(V)


[[1. 0.]
 [0. 1.]]
[3. 2.]
[[0. 1. 0.]
 [1. 0. 0.]]

Compute $\boldsymbol{\Sigma}^{+} = \boldsymbol{\Sigma}_{1}^{-1}$, and then $\boldsymbol{A}^{+}$


In [19]:
# \Sigma^{+}
Sigma_p = np.diag(1.0/s)

# A^{+}
Ap = (V.T).dot(Sigma_p).dot(U.T)
print(Ap)


[[0.         0.5       ]
 [0.33333333 0.        ]
 [0.         0.        ]]

Compute $\boldsymbol{A}^{+}\boldsymbol{A}$


In [20]:
ApA = Ap.dot(A)
print(ApA)


[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 0.]]

For this matrix, any $\boldsymbol{x} = [x_{1} \ \ x_{2} \ \ 0]$ (which is from the row space of $\boldsymbol{A}$) satisfies $\boldsymbol{A}^{+} \boldsymbol{A} \boldsymbol{x} = \boldsymbol{x}$.

Question 18 (rank deficient least squares)

Define the matrix and RHS vector:


In [21]:
A = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]])
print(A)

b = np.array([0, 2, 2])
print(b)


[[1 0 0]
 [1 0 0]
 [1 1 1]]
[0 2 2]

Compute the SVD and print singular values:


In [22]:
# Compute SVD
U, s, V = np.linalg.svd(A)
print(s)


[2. 1. 0.]

There is one zero singular value, so we need to 'trim' the last column from $\boldsymbol{U}$ and the last row from $\boldsymbol{V}$ (which NumPy returns as $\boldsymbol{V}^{T}$), and compute $\boldsymbol{\Sigma}_{1}^{+}$:


In [23]:
# Create views of U and V without the column/row corresponding to the zero singular value
U1 = U[:, :2]
print(U1)
V1 = V[:2,:]
print(V1)

# Create Sigma^{+}
S1 = np.diag(1.0/s[:-1])
print(S1)


[[-0.40824829 -0.57735027]
 [-0.40824829 -0.57735027]
 [-0.81649658  0.57735027]]
[[-0.81649658 -0.40824829 -0.40824829]
 [-0.57735027  0.57735027  0.57735027]]
[[0.5 0. ]
 [0.  1. ]]

Solve the problem, $\boldsymbol{x} = \boldsymbol{V}_{1} \boldsymbol{\Sigma}_{1}^{-1} \boldsymbol{U}_{1}^{T} \boldsymbol{b}$:


In [24]:
x = np.transpose(V1).dot(S1.dot(U1.T).dot(b))
print(x)


[1.  0.5 0.5]