The reader should be familiar with the eigenvalue decomposition, the singular value decomposition, and perturbation theory for the eigenvalue decomposition.
The reader should be able to understand and check the facts about perturbations of singular values and vectors.
For more details and the proofs of the Facts below, see R.-C. Li, Matrix Perturbation Theory, and the references therein.
Let $A\in\mathbb{C}^{m\times n}$ and let $A=U\Sigma V^*$ be its SVD.
For a matrix $B\in\mathbb{C}^{m\times n}$, the set of its singular values is $sv(B)=\{\sigma_1,\sigma_2,\ldots\}$, with $\sigma_1\geq \sigma_2\geq \cdots\geq 0$. Let $sv_{ext}(B)=sv(B)$ if $m\leq n$, and $sv_{ext}(B)=sv(B)\cup \{0,\ldots,0\}$ ($|m-n|$ additional zeros) if $m>n$.
A triplet $(u,\sigma,v)\in\mathbb{C}^{m}\times\mathbb{R}\times\mathbb{C}^{n}$ is a singular triplet of $A$ if $\|u\|_2=1$, $\|v\|_2=1$, $\sigma\geq 0$, $Av=\sigma u$, and $A^*u=\sigma v$.
$\tilde A=A+\Delta A$ is the perturbed matrix, where $\Delta A$ is the perturbation. The same notation is used for $\tilde A$, with all symbols carrying tildes.
The spectral condition number of $A$ is $\kappa_2(A)=\sigma_{\max}(A)/ \sigma_{\min}(A)$.
Let $X,Y\in\mathbb{C}^{n\times k}$ with $\mathop{\mathrm{rank}}(X)=\mathop{\mathrm{rank}}(Y)=k$. The canonical angles between their column spaces are $\theta_i=\cos^{-1}\sigma_i$, where $\sigma_i$ are the singular values of $(Y^*Y)^{-1/2}Y^*X(X^*X)^{-1/2}$. The canonical angle matrix between $X$ and $Y$ is
$$\Theta(X,Y)=\mathop{\mathrm{diag}}(\theta_1,\theta_2,\ldots,\theta_k).$$

Fact 1 (Mirsky's theorem). $\|\Sigma-\tilde\Sigma\|_2\leq \|\Delta A\|_2$ and $\|\Sigma-\tilde\Sigma\|_F\leq \|\Delta A\|_F$.
Fact 2 (residual bounds). Let $\|\tilde u\|_2=\|\tilde v\|_2=1$ and $\tilde \mu=\tilde u^* A \tilde v$. Let the residuals be $r=A\tilde v-\tilde \mu \tilde u$ and $s=A^*\tilde u - \tilde \mu \tilde v$, and let $\varepsilon=\max\{\|r\|_2,\|s\|_2\}$. Then $|\tilde \mu -\mu|\leq \varepsilon$ for some singular value $\mu$ of $A$.
Fact 3. The smallest error matrix $\Delta A$ for which $(\tilde u, \tilde \mu, \tilde v)$ is a singular triplet of $\tilde A$ satisfies $\| \Delta A\|_2=\varepsilon$.
Fact 4. Let $\mu\in sv_{ext}(A)$ be the singular value closest to $\tilde \mu$, let $(u,\mu,v)$ be the associated singular triplet, and let $$\eta=\mathop{\mathrm{gap}}(\tilde\mu)= \min_{\mu\neq\sigma\in sv_{ext}(A)}|\tilde\mu-\sigma|.$$ If $\eta>0$, then \begin{align*} |\tilde\mu-\mu |&\leq \frac{\varepsilon^2}{\eta},\\ \sqrt{\sin^2\theta(u,\tilde u)+ \sin^2\theta(v,\tilde v)} & \leq \frac{\sqrt{\|r\|_2^2 + \|s\|_2^2}}{\eta}. \end{align*}
Fact 5. Let $$ A=\begin{bmatrix} M & E \\ F & H \end{bmatrix}, \quad \tilde A=\begin{bmatrix} M & 0 \\ 0 & H \end{bmatrix}, $$ where $M\in\mathbb{C}^{k\times k}$, and set $\eta=\min |\mu-\nu|$ over all $\mu\in sv(M)$ and $\nu\in sv_{ext}(H)$, and $\varepsilon =\max \{ \|E\|_2,\|F\|_2 \}$. Then $$ \max |\sigma_j -\tilde\sigma_j| \leq \frac{2\varepsilon^2}{\eta+\sqrt{\eta^2+4\varepsilon^2}}. $$
Fact 6. Let $m\geq n$ and let $$ \begin{bmatrix} U_1^*\\ U_2^* \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix}= \begin{bmatrix} A_1 & \\ & A_2 \end{bmatrix}, \quad \begin{bmatrix} \tilde U_1^*\\ \tilde U_2^* \end{bmatrix} \tilde A \begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}= \begin{bmatrix} \tilde A_1 & \\ & \tilde A_2 \end{bmatrix}, $$ where $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$, $\begin{bmatrix} V_1 & V_2 \end{bmatrix}$, $\begin{bmatrix} \tilde U_1 & \tilde U_2 \end{bmatrix}$, and $\begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}$ are unitary, and $U_1,\tilde U_1\in \mathbb{C}^{m\times k}$, $V_1,\tilde V_1\in \mathbb{C}^{n\times k}$. Set $$ R=A\tilde V_1-\tilde U_1\tilde A_1,\quad S=A^*\tilde U_1-\tilde V_1 \tilde A_1. $$ Let $\eta=\min|\tilde \mu-\nu|$ over all $\tilde \mu\in sv(\tilde A_1)$ and $\nu\in sv_{ext}(A_2)$. If $\eta > 0$, then $$ \sqrt{\|\sin\Theta(U_1,\tilde U_1)\|_F^2 + \|\sin\Theta(V_1,\tilde V_1)\|_F^2} \leq \frac{\sqrt{\|R\|_F^2 + \|S\|_F^2 }}{\eta}. $$ (A sketch checking this bound is given after the Fact 5 demo below.)
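As a quick illustration (our sketch, not part of the original demos), the canonical angle matrix from the definition above can be computed from orthonormal bases of the two column spaces; the helper canonical_angles() is our own and is reused in the Fact 6 check later.

using LinearAlgebra
# Orthonormalize the columns via QR; the cosines of the canonical angles are
# the singular values of Q_X'*Q_Y (clamped to 1 against rounding errors)
canonical_angles(X,Y)=acos.(min.(svdvals(Matrix(qr(X).Q)'*Matrix(qr(Y).Q)),1.0))
X=rand(8,3)
Y=rand(8,3)
Θ=Diagonal(canonical_angles(X,Y))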
In [1]:
import Random
Random.seed!(421)
# A random m×n integer matrix
m=8
n=5
k=min(m,n)
A=rand(-9:9,m,n)
Out[1]:
In [2]:
# A small additive perturbation
ΔA=rand(m,n)/100
B=A+ΔA
Out[2]:
In [3]:
using LinearAlgebra
# SVDs of the original and the perturbed matrix
U,σ,V=svd(A)
U₁,σ₁,V₁=svd(B)
Out[3]:
In [4]:
# Fact 1 - Mirsky's theorem
maximum(abs,σ-σ₁),opnorm(Diagonal(σ)-Diagonal(σ₁)),
opnorm(ΔA), norm(σ-σ₁), norm(ΔA)
Out[4]:
In [5]:
# Residual bounds - how close is (x,ζ,y) to (U[:,j],σ[j],V[:,j])
j=rand(2:k-1)
x=round.(U[:,j],digits=3)
y=round.(V[:,j],digits=3)
x=x/norm(x)
y=y/norm(y)
ζ=(x'*A*y)[]
σ, j, ζ
Out[5]:
In [6]:
# Fact 2
r=A*y-ζ*x
s=A'*x-ζ*y
ϵ=max(norm(r),norm(s))
Out[6]:
In [7]:
minimum(abs,σ.-ζ), ϵ
Out[7]:
In [8]:
# Fact 4
η=min(abs(ζ-σ[j-1]),abs(ζ-σ[j+1]))
Out[8]:
In [9]:
ζ-σ[j], ϵ^2/η
Out[9]:
In [10]:
# Singular vector bound from Fact 4
# cos(θ)
cosθU=dot(x,U[:,j])
cosθV=dot(y,V[:,j])
# Bound
sqrt(1-cosθU^2+1-cosθV^2), sqrt(norm(r)^2+norm(s)^2)/η
Out[10]:
In [11]:
# Fact 5 - we create a small off-diagonal block perturbation
j=3
M=A[1:j,1:j]
H=A[j+1:m,j+1:n]
B=cat(M,H,dims=(1,2))
Out[11]:
In [12]:
E=rand(Float64,size(A[1:j,j+1:n]))/100
F=rand(Float64,size(A[j+1:m,1:j]))/100
C=map(Float64,B)
C[1:j,j+1:n]=E
C[j+1:m,1:j]=F
C
Out[12]:
In [13]:
svdvals(M)
Out[13]:
In [14]:
svdvals(H)'
Out[14]:
In [15]:
svdvals(M).-svdvals(H)'
Out[15]:
In [16]:
ϵ=max(opnorm(E),opnorm(F))
β=svdvals(B)
γ=svdvals(C)
# sv_ext(H) is sv(H) padded with zeros, since H is tall
η=minimum(abs,svdvals(M).-[svdvals(H);zeros(size(H,1)-size(H,2))]')
display([β γ])
maximum(abs,β-γ), 2*ϵ^2/(η+sqrt(η^2+4*ϵ^2))
Out[16]:
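Fact 6 was not checked above. Here is a minimal sketch of its bound, comparing the leading singular subspaces of A and of the perturbed matrix from In [2]; the subspace dimension l=2 is an arbitrary choice, and canonical_angles() is the hypothetical helper defined before In [1].

# Fact 6 - subspace bound for A vs. A+ΔA, using U,σ,V and U₁,σ₁,V₁ from In [3]
l=2
Ũ₁=U₁[:,1:l]
Ṽ₁=V₁[:,1:l]
Ã₁=Diagonal(σ₁[1:l])
R=A*Ṽ₁-Ũ₁*Ã₁
S=A'*Ũ₁-Ṽ₁*Ã₁
# η = gap between sv(Ã₁) and sv_ext(A₂); since m>n, sv_ext(A₂) has m-n extra zeros
η=minimum(abs,σ₁[1:l].-[σ[l+1:end];zeros(m-n)]')
lhs=sqrt(norm(sin.(canonical_angles(U[:,1:l],Ũ₁)))^2+
    norm(sin.(canonical_angles(V[:,1:l],Ṽ₁)))^2)
lhs, sqrt(norm(R)^2+norm(S)^2)/η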
Matrix $A\in\mathbb{C}^{m\times n}$ is multiplicatively perturbed to $\tilde A$ if $\tilde A=D_L^* A D_R$ for some $D_L\in\mathbb{C}^{m\times m}$ and $D_R\in\mathbb{C}^{n\times n}$.
Matrix $A$ is (highly) graded if it can be scaled as $A=GS$ such that $\kappa_2(G)$ is of modest magnitude. The scaling matrix $S$ is often diagonal. Interesting cases are when $\kappa_2(G)\ll \kappa_2(A)$.
Relative distances between two complex numbers $\alpha$ and $\tilde \alpha$ are:
\begin{align*} \zeta(\alpha,\tilde \alpha)&=\frac{|\alpha-\tilde\alpha|}{\sqrt{|\alpha\tilde \alpha|}}, \quad \textrm{for } \alpha\tilde\alpha\neq 0,\\ \varrho(\alpha,\tilde \alpha)&=\frac{|\alpha-\tilde\alpha|} {\sqrt{|\alpha|^2 + |\tilde \alpha|^2}}, \quad \textrm{for } |\alpha|+|\tilde\alpha|> 0. \end{align*}

Fact 1. If $D_L$ and $D_R$ are non-singular and $m\geq n$, then \begin{align*} \frac{\sigma_j}{\|D_L^{-1}\|_2\|D_R^{-1}\|_2}& \leq \tilde\sigma_j \leq \sigma_j \|D_L\|_2\|D_R\|_2, \quad \textrm{for } j=1,\ldots,n, \\ \| \mathop{\mathrm{diag}}(\zeta(\sigma_1,\tilde \sigma_1),\ldots, \zeta(\sigma_n,\tilde \sigma_n))\|_{2,F} & \leq \frac{1}{2}\|D_L^*-D_L^{-1}\|_{2,F} + \frac{1}{2}\|D_R^*-D_R^{-1}\|_{2,F}. \end{align*} (A sketch checking these bounds is given after Fact 5 below.)
Fact 2. Let $m\geq n$ and let $$ \begin{bmatrix} U_1^*\\ U_2^* \end{bmatrix} A \begin{bmatrix} V_1 & V_2 \end{bmatrix}= \begin{bmatrix} A_1 & \\ & A_2 \end{bmatrix}, \quad \begin{bmatrix} \tilde U_1^*\\ \tilde U_2^* \end{bmatrix} \tilde A \begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}= \begin{bmatrix} \tilde A_1 & \\ & \tilde A_2 \end{bmatrix}, $$ where $\begin{bmatrix} U_1 & U_2 \end{bmatrix}$, $\begin{bmatrix} V_1 & V_2 \end{bmatrix}$, $\begin{bmatrix} \tilde U_1 & \tilde U_2 \end{bmatrix}$, and $\begin{bmatrix} \tilde V_1 & \tilde V_2 \end{bmatrix}$ are unitary, and $U_1,\tilde U_1\in \mathbb{C}^{m\times k}$, $V_1,\tilde V_1\in \mathbb{C}^{n\times k}$. Set $$ R=A\tilde V_1-\tilde U_1\tilde A_1,\quad S=A^*\tilde U_1-\tilde V_1 \tilde A_1. $$ Let $\eta=\min \varrho(\mu,\tilde \mu)$ over all $\mu\in sv(A_1)$ and $\tilde \mu\in sv_{ext}(A_2)$. If $\eta > 0$, then \begin{align*} & \sqrt{\|\sin\Theta(U_1,\tilde U_1)\|_F^2 + \|\sin\Theta(V_1,\tilde V_1)\|_F^2} \\ & \leq \frac{1}{\eta}( \|(I-D_L^*)U_1\|_F^2+ \|(I-D_L^{-1})U_1\|_F^2 \\ & \quad +\|(I-D_R^*)V_1\|_F^2+ \|(I-D_R^{-1})V_1\|_F^2 )^{1/2}. \end{align*}
Fact 3. Let $A=GS$ and $\tilde A=\tilde GS$, and let $\Delta G=\tilde G-G$. Then $\tilde A=DA$, where $D=I+(\Delta G) G^{\dagger}$, and Fact 1 applies with $D_L=D$, $D_R=I$, and $$ \|D^*-D^{-1}\|_{2,F} \leq \bigg(1+\frac{1}{1-\|(\Delta G) G^{\dagger}\|_{2}}\bigg) \frac{\|(\Delta G) G^{\dagger}\|_{2,F}}{2}. $$ According to the notebook on Jacobi Method and High Relative Accuracy, a nearly optimal diagonal scaling is the one for which all columns of $G$ have unit norms, $S=\mathop{\mathrm{diag}} \big( \| A_{:,1}\|_2,\ldots,\|A_{:,n}\|_2 \big)$.
Fact 4. Let $A$ be a real upper-bidiagonal matrix with diagonal entries $a_1,a_2,\ldots,a_n$ and super-diagonal entries $b_1,b_2, \ldots,b_{n-1}$. Let the diagonal entries of $\tilde A$ be $\alpha_1 a_1,\alpha_2 a_2,\ldots,\alpha_n a_n$, and its super-diagonal entries be $\beta_1 b_1,\beta_2 b_2,\ldots,\beta_{n-1} b_{n-1}$. Then $\tilde A=D_L^* A D_R$ with \begin{align*} D_L &=\mathop{\mathrm{diag}} \bigg(\alpha_1,\frac{\alpha_1 \alpha_2}{\beta_1}, \frac{\alpha_1 \alpha_2 \alpha_3}{\beta_1 \beta_2},\cdots\bigg),\\ D_R &=\mathop{\mathrm{diag}} \bigg(1, \frac{\beta_1}{\alpha_1}, \frac{\beta_1 \beta_2}{\alpha_1 \alpha_2},\cdots\bigg). \end{align*} Let $\alpha=\prod\limits_{j=1}^n \max\{\alpha_j, 1/\alpha_j\}$ and $\beta=\prod\limits_{j=1}^{n-1} \max\{\beta_j, 1/\beta_j\}$. Then $$ (\alpha\beta)^{-1}\leq \| D_L^{-1}\|_2 \|D_R^{-1}\|_2 \leq \| D_L\|_2 \|D_R\|_2 \leq \alpha\beta, $$ and Fact 1 applies. This is a result of Demmel and Kahan.
Fact 5. Consider the block partitioned matrices \begin{align*} A & =\begin{bmatrix} B & C \\ 0 & D\end{bmatrix}, \\ \tilde A & = \begin{bmatrix} B & 0 \\ 0 & D\end{bmatrix} =A \begin{bmatrix} I & -B^{-1} C \\ 0 & I \end{bmatrix}\equiv A D_R. \end{align*} By Fact 1, $\zeta(\sigma_j,\tilde \sigma_j) \leq \frac{1}{2} \|B^{-1}C\|_2$. This is used as a deflation criterion in the SVD algorithm for bidiagonal matrices. (A sketch of this criterion is given at the end of the notebook.)
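Before turning to the bidiagonal example, here is a quick numerical check of Fact 1 (our sketch; the random diagonal scalings and the helper ζdist() for the relative distance $\zeta$ are our own choices, not part of the original demos).

# Fact 1 - multiplicative perturbation Ã=D_L^* A D_R of A from In [1]
DL=Diagonal(1 .+(rand(m).-0.5)/100)
DR=Diagonal(1 .+(rand(n).-0.5)/100)
Ã=DL'*A*DR
σA=svdvals(A)
σ̃=svdvals(Ã)
# First inequality: every perturbed singular value lies between the bounds
lower=σA/(opnorm(inv(DL))*opnorm(inv(DR)))
upper=σA*(opnorm(DL)*opnorm(DR))
# Second inequality: ζ-distances (named ζdist, since ζ is already taken above)
ζdist(x,y)=abs(x-y)/sqrt(abs(x*y))
(all(lower.<=σ̃.<=upper), maximum(ζdist.(σA,σ̃)),
    (opnorm(DL'-inv(DL))+opnorm(DR'-inv(DR)))/2)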
In order to illustrate Facts 1 to 3, we need an algorithm which computes the singular values with high relative accuracy. Such an algorithm, the one-sided Jacobi method, is discussed in the notebook on Jacobi and Lanczos Methods.
The algorithm actually used in the function svdvals() for Bidiagonal matrices is the zero-shift bidiagonal QR algorithm, which attains the accuracy given by Fact 4: if all
$1-\varepsilon \leq \alpha_i,\beta_j \leq 1+\varepsilon$, then
$$
(1-\varepsilon)^{2n-1} \leq (\alpha\beta)^{-1} \leq \alpha\beta \leq (1-\varepsilon)^{-(2n-1)}.
$$
In other words, relative changes of order $\varepsilon$ in the diagonal and super-diagonal elements cause relative changes of at most about $(2n-1)\varepsilon$ in the singular values.
However, if singular values and vectors are desired, the function svd() calls the standard algorithm, described in the next notebook, which does not attain this accuracy.
In [17]:
Random.seed!(135)
n=50
δ=100000
# The starting matrix
a=exp.(50*(rand(n).-0.5))
b=exp.(50*(rand(n-1).-0.5))
A=Bidiagonal(a,b,'U')
# Multiplicative perturbation
DL=ones(n)+(rand(n).-0.5)/δ
DR=ones(n)+(rand(n).-0.5)/δ
# The perturbed matrix
α=DL.*a.*DR
β=DL[1:end-1].*b.*DR[2:end]
B=Bidiagonal(α,β,'U')
# Check (uncomment): B-Diagonal(DL)*A*Diagonal(DR) should be numerically zero
# Relative changes of the diagonal entries
(A.dv-B.dv)./A.dv
Out[17]:
In [18]:
# The matrix is severely ill-conditioned
cond(A)
Out[18]:
In [19]:
(a-α)./a, (b-β)./b
Out[19]:
In [20]:
# Check which method is dispatched for a Bidiagonal matrix
@which svdvals(A)
Out[20]:
In [21]:
@which svdvals!(copy(A))
Out[21]:
In [22]:
σ=svdvals(A)
μ=svdvals(B)
[σ (σ-μ)./σ]
Out[22]:
In [23]:
# The standard algorithm
U,ν,V=svd(A);
In [24]:
ν
Out[24]:
In [25]:
# Relative errors of the singular values computed by the standard algorithm
(σ-ν)./σ
Out[25]:
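Finally, a sketch of the deflation criterion from Fact 5 on a small block upper-triangular matrix; the block sizes and magnitudes are arbitrary choices, and ζdist() is the helper defined in the Fact 1 sketch above.

# Fact 5 - deflation: zeroing the small block C changes singular values little
B₂=rand(4,4)
C₂=rand(4,2)/1000
D₂=rand(2,2)
T=[B₂ C₂; zeros(2,4) D₂]
T̃=[B₂ zeros(4,2); zeros(2,4) D₂]
σT=svdvals(T)
σT̃=svdvals(T̃)
# ζ(σⱼ,σ̃ⱼ) for each pair vs. the bound ½‖B⁻¹C‖₂
maximum(ζdist.(σT,σT̃)), opnorm(B₂\C₂)/2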