To measure changes, we need to define norms. For more details and the proofs of the Facts below, see R. Byers and B. N. Datta, Vector and Matrix Norms, Error Analysis, Efficiency, and Stability, and the references therein.
A norm on a vector space $X$ is a real-valued function $\| \cdot \| : X\to \mathbb{R}$ with the following properties:
1. $\|x\|\geq 0$ for all $x\in X$, and $\|x\|=0$ if and only if $x=0$,
2. $\|\alpha x\|=|\alpha|\,\|x\|$ for all scalars $\alpha$ and all $x\in X$,
3. $\|x+y\|\leq \|x\|+\|y\|$ for all $x,y\in X$ (triangle inequality).
Commonly encountered vector norms for $x\in\mathbb{C}^n$ are the Hölder norms ($p$-norms), $\|x\|_p=\left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$ for $p\geq 1$, in particular:
- the $1$-norm, $\|x\|_1=\sum_i |x_i|$,
- the Euclidean ($2$-)norm, $\|x\|_2=\sqrt{\sum_i |x_i|^2}$,
- the $\infty$-norm, $\|x\|_\infty=\max_i |x_i|$.
A vector norm is absolute if $\||x|\|=\|x\|$ for all $x$.
A vector norm is monotone if $|x|\leq |y|$ (entrywise) implies $\|x\|\leq \|y\|$.
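Both properties are easy to verify numerically for the norms above; a minimal check (the vectors are an arbitrary choice):

using LinearAlgebra
x=[1,-2,3]; y=[2,-2,4]    # |x| ≤ |y| entrywise
norm(abs.(x))==norm(x), norm(x,1)≤norm(y,1), norm(x,Inf)≤norm(y,Inf)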
From every vector norm we can derive a corresponding induced matrix norm (also called operator norm or natural norm):
$$\|A\| = \max\limits_{x\neq 0} \frac{\|Ax\|}{\|x\|}=\max\limits_{\|x\|=1} \|Ax\|.$$
For a matrix $A\in\mathbb{C}^{m\times n}$ we define:
- $\|A\|_1=\max_j \sum_{i=1}^m |a_{ij}|$, the maximal absolute column sum,
- $\|A\|_\infty=\max_i \sum_{j=1}^n |a_{ij}|$, the maximal absolute row sum,
- $\|A\|_2$, the spectral norm, equal to the largest singular value $\sigma_{\max}(A)$,
- $\|A\|_F=\sqrt{\sum_{i,j} |a_{ij}|^2}$, the Frobenius norm (which is not an induced norm).
A matrix norm is consistent if $\|A\cdot B\|\leq \|A\| \cdot \| B\|$ whenever $A$ and $B$ are compatible for matrix multiplication.
A matrix norm is absolute if $\||A|\|=\|A\|$.
In [1]:
import Random
Random.seed!(425)
x=rand(-4:4,5)
Out[1]:
In [2]:
using LinearAlgebra
norm(x,1), norm(x), norm(x,Inf)
Out[2]:
In [3]:
A=rand(-4:4,7,5)
Out[3]:
In [4]:
# norm(A,p) treats A as a vector of its entries; opnorm(A,p) is the induced (operator) norm
norm(A,1), norm(A), norm(A,2), norm(A,Inf),
opnorm(A), opnorm(A,1), opnorm(A,Inf)
Out[4]:
In [5]:
# Frobenius norm
norm(vec(A))
Out[5]:
In [6]:
A
Out[6]:
In [7]:
# Absolute norms
opnorm(A,1), opnorm(abs.(A),1), opnorm(A,Inf), opnorm(abs.(A),Inf), norm(A),
norm(abs.(A)), opnorm(A),opnorm(abs.(A))
Out[7]:
In [8]:
# Equivalence of norms
m,n=size(A)
# (1/√n)‖A‖_∞ ≤ ‖A‖_2 ≤ √m ‖A‖_∞
opnorm(A,Inf)/sqrt(n), opnorm(A), sqrt(m)*opnorm(A,Inf)
Out[8]:
In [9]:
# ‖A‖_2 ≤ ‖A‖_F ≤ √n ‖A‖_2
opnorm(A), norm(A), sqrt(n)*opnorm(A)
Out[9]:
In [10]:
# (1/√m)‖A‖_1 ≤ ‖A‖_2 ≤ √n ‖A‖_1
opnorm(A,1)/sqrt(m), opnorm(A), sqrt(n)*opnorm(A,1)
Out[10]:
In [11]:
# Fact 12
opnorm(A), sqrt(opnorm(A,1)*opnorm(A,Inf))
Out[11]:
In [12]:
# Fact 13
B=rand(n,rand(1:9))
norm(A*B), norm(A)*opnorm(B), opnorm(A)*norm(B)
Out[12]:
In [13]:
# Fact 14
x=rand(ComplexF64,10)
y=rand(ComplexF64,10)
A=x*y'
opnorm(A), norm(A), norm(x)*norm(y)
Out[13]:
In [14]:
# Fact 15
A=rand(-4:4,7,5)+im*rand(-4:4,7,5)
opnorm(A), opnorm(A'), norm(A), norm(A')
Out[14]:
In [15]:
# Unitary invariance - generate random unitary matrix U
U,R=qr(rand(ComplexF64,size(A)));
In [16]:
opnorm(A), opnorm(U*A), norm(A), norm(U*A)
Out[16]:
In [17]:
# Spectral radius
A=rand(ComplexF64,7,7)
maximum(abs,eigvals(A)), opnorm(A,Inf), opnorm(A,1), opnorm(A), norm(A)
Out[17]:
In [18]:
# Fact 18
B=A/(maximum(abs,eigvals(A))+2)
@show maximum(abs,eigvals(B))
norm(B^100)
Out[18]:
We want to answer the question:
How much does the value of a function change with respect to a change in its argument?
For function $f(x)$ and argument $x$, the absolute error with respect to the perturbation of the argument $\delta x$ is
$$ \| f(x+\delta x)-f(x)\| = \frac{\| f(x+\delta x)-f(x)\|}{\| \delta x \|} \|\delta x\| \equiv \kappa \|\delta x\|. $$
The condition or condition number $\kappa$ tells us how much the perturbation of the argument is amplified. (Its form resembles a derivative.)
Similarly, the relative error with respect to the relative perturbation of the argument is
$$ \frac{\| f(x+\delta x)-f(x)\|}{\| f(x) \|}= \frac{\| f(x+\delta x)-f(x)\|\cdot \|x\| }{\|\delta x\| \cdot\| f(x)\|} \cdot \frac{\|\delta x\|}{\|x\|} \equiv \kappa_{rel} \frac{\|\delta x\|}{\|x\|}. $$
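As a minimal scalar sketch (the choice of function is ours, not from the text): for $f(x)=x^2$ one has $\kappa_{rel}=2$, which the quotient above reproduces:

f(x)=x^2                     # assumed example function with κ_rel = 2
x=3.0; δx=1e-6
abs(f(x+δx)-f(x))/abs(f(x)) * abs(x)/abs(δx)   # ≈ 2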
Let $A\in\mathbb{C}^{n\times n}$. A pair $(\lambda,x)\in\mathbb{C}\times\mathbb{C}^{n}$ is an eigenpair of $A$ if $x\neq 0$ and $Ax=\lambda x$.
A triplet $(y,\lambda,x)\in\mathbb{C}^{n}\times\mathbb{C}\times\mathbb{C}^{n}$ is an eigentriplet of $A$ if $x,y\neq 0$, $Ax=\lambda x$, and $y^*A=\lambda y^*$.
The eigenvalue matrix of $A$ is the diagonal matrix $\Lambda=\mathop{\mathrm{diag}}(\lambda_1,\lambda_2,\ldots,\lambda_n)$.
If all eigenvalues are real, they can be ordered increasingly; $\Lambda^\uparrow$ denotes the eigenvalue matrix of increasingly ordered eigenvalues.
$\tau$ denotes a permutation of $\{1,2,\ldots,n\}$, and $\tilde\Lambda_\tau$ the eigenvalue matrix with the eigenvalues permuted accordingly.
$\tilde A=A+\Delta A$ is the perturbed matrix, where $\Delta A$ is the perturbation, and $(\tilde \lambda,\tilde x)$ are the eigenpairs of $\tilde A$.
The condition number of a nonsingular matrix $X$ is $\kappa(X)=\|X\| \|X^{-1}\|$.
Let $X,Y\in\mathbb{C}^{n\times k}$ with $\mathop{\mathrm{rank}}(X)=\mathop{\mathrm{rank}}(Y)=k$. The canonical angles between their column spaces, $\theta_i$, are defined by $\cos \theta_i=\sigma_i$, where $\sigma_i$ are the singular values of the matrix $$(Y^*Y)^{-1/2}Y^*X(X^*X)^{-1/2}.$$ The canonical angle matrix between $X$ and $Y$ is $$\Theta(X,Y)=\mathop{\mathrm{diag}}(\theta_1,\theta_2,\ldots,\theta_k). $$
The bounds become sharper as the matrices acquire more structure. Many bounds come in both spectral-norm and Frobenius-norm versions. For more details and the proofs of the Facts below, see R.-C. Li, Matrix Perturbation Theory, and the references therein.
There exists $\tau$ such that $$\|\Lambda- \tilde\Lambda_\tau\|_2\leq 4(\|A\|_2+\|\tilde A\|_2)^{1-1/n}\|\Delta A\|_2^{1/n}.$$
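A rough numerical check of this bound, pairing each perturbed eigenvalue with its nearest unperturbed one (this nearest-neighbour matching is our assumption; the Fact only asserts that some $\tau$ works):

A=rand(5,5); ΔA=rand(5,5)*1e-6
λ=eigvals(A); μ=eigvals(A+ΔA)
τ=[argmin(abs.(μ[i].-λ)) for i=1:5]   # nearest-neighbour matching
maximum(abs,μ-λ[τ]), 4*(opnorm(A)+opnorm(A+ΔA))^(1-1/5)*opnorm(ΔA)^(1/5)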
First-order perturbation bounds. Let $(y,\lambda,x)$ be an eigentriplet of a simple $\lambda$. $\Delta A$ changes $\lambda$ to $\tilde\lambda=\lambda+ \delta\lambda$, where $$ \delta\lambda=\frac{y^*(\Delta A)x}{y^*x}+O(\|\Delta A\|_2^2). $$
Let $\mu$ be a semisimple eigenvalue of $A$ of multiplicity $k$, and let $X,Y\in \mathbb{C}^{n\times k}$ be the matrices of the corresponding right and left eigenvectors, that is, $AX=\mu X$ and $Y^*A=\mu Y^*$, such that $Y^*X=I_k$. $\Delta A$ changes the $k$ copies of $\mu$ to $\tilde \mu_i=\mu+\delta\mu_i$, where $\delta\mu_i$ are the eigenvalues of $Y^*(\Delta A) X$, up to $O(\|\Delta A\|_2^2)$.
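A sketch with a double semisimple eigenvalue $\mu=2$ (the construction $A=X\Lambda X^{-1}$ is ours): the rows of $X^{-1}$ are left eigenvectors, so $Y^*X=I_k$ holds automatically.

X=rand(4,4)+4*I              # well-conditioned eigenvector matrix (our choice)
Λ=Diagonal([2,2,3,5.0])
A=X*Λ/X                      # μ=2 is semisimple with multiplicity k=2
Y=inv(X)                     # rows of Y are left eigenvectors, Y*X=I
ΔA=rand(4,4)*1e-5
δμ=eigvals(Y[1:2,:]*ΔA*X[:,1:2])
# compare, as unordered pairs, the true shifts with the first-order prediction
sort(eigvals(A+ΔA),by=z->abs(z-2))[1:2].-2, δμ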
Perturbations and the inverse: if $\|A\|_p<1$, then $I-A$ is nonsingular and (for proof see GVL p. 74 (98)) $$ (I-A)^{-1}=\sum\limits_{k=0}^\infty A^k $$ with $$ \|(I-A)^{-1}\|_p \leq \frac{1}{1-\|A\|_p},\qquad \|(I-A)^{-1}-I\|_p \leq \frac{\|A\|_p}{1-\|A\|_p}. $$
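A quick check of both inequalities, with a random $A$ scaled so that $\|A\|_1<1$:

A=rand(5,5)/10               # ‖A‖_1 < 1
opnorm(inv(I-A),1), 1/(1-opnorm(A,1)),
opnorm(inv(I-A)-I,1), opnorm(A,1)/(1-opnorm(A,1))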
Geršgorin Circle Theorem. If $X^{-1} A X=D+F$, where $D=\mathop{\mathrm{diag}}(d_1,\ldots,d_n)$ and $F$ has zero diagonal entries, then (for proof see GVL p. 357 (381)) $$\sigma(A) \subseteq \bigcup\limits_{i=1}^n D_i,$$ where $$D_i=\big\{z\in\mathbb{C} : |z-d_i| \leq \sum\limits_{j=1}^n |f_{ij}| \big\}. $$ Moreover, by continuity, if a connected component of $\bigcup_i D_i$ consists of $k$ disks, it contains exactly $k$ eigenvalues.
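A direct check with $X=I$, so that $D$ is the diagonal of $A$ and $F$ its off-diagonal part; every eigenvalue must lie in at least one disk:

A=rand(-4:4,5,5)
d=diag(A)
r=[sum(abs,A[i,:])-abs(A[i,i]) for i=1:5]   # disk radii
all(any(abs(z-d[i])≤r[i] for i=1:5) for z in eigvals(A))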
Bauer-Fike Theorem. If $A$ is diagonalizable and $A=X\Lambda X^{-1}$ is its eigenvalue decomposition, then (for proof see GVL p.357 (381)) $$ \max_i\min_j |\tilde \lambda_i - \lambda_j|\leq \|X^{-1}(\Delta A)X\|_p\leq \kappa_p(X)\|\Delta A\|_p. $$
If $A$ and $\tilde A$ are diagonalizable, there exists $\tau$ such that $$\|\Lambda-\tilde\Lambda_\tau\|_F\leq \sqrt{\kappa_2(X)\kappa_2(\tilde X)}\|\Delta A\|_F. $$ If $\Lambda$ and $\tilde\Lambda$ are real, then $$ \|\Lambda^\uparrow-\tilde\Lambda^\uparrow\|_{2,F} \leq \sqrt{\kappa_2(X)\kappa_2(\tilde X)}\|\Delta A\|_{2,F}. $$
If $A$ is normal, there exists $\tau$ such that $\|\Lambda-\tilde\Lambda_\tau\|_F\leq\sqrt{n}\|\Delta A\|_F$.
Hoffman-Wielandt Theorem. If $A$ and $\tilde A$ are normal, there exists $\tau$ such that $\|\Lambda-\tilde\Lambda_\tau\|_F\leq\|\Delta A\|_F$.
If $A$ and $\tilde A$ are Hermitian, for any unitarily invariant norm $\|\Lambda^\uparrow-\tilde\Lambda^\uparrow\| \leq \|\Delta A\|$. In particular, \begin{align*} \max_i|\lambda^\uparrow_i-\tilde\lambda^\uparrow_i|&\leq \|\Delta A\|_2,\\ \sqrt{\sum_i(\lambda^\uparrow_i-\tilde\lambda^\uparrow_i)^2}&\leq \|\Delta A\|_F. \end{align*}
Residual bounds. Let $A$ be Hermitian. For some $\tilde\lambda\in\mathbb{R}$ and $\tilde x\in\mathbb{C}^n$ with $\|\tilde x\|_2=1$, define residual $r=A\tilde x-\tilde\lambda\tilde x$. Then $|\tilde\lambda-\lambda|\leq \|r\|_2$ for some $\lambda\in\sigma(A)$.
Let, in addition, $\tilde\lambda=\tilde x^* A\tilde x$, let $\lambda$ be closest to $\tilde\lambda$ and $x$ be its unit eigenvector, and let $$\eta=\mathop{\mathrm{gap}}(\tilde\lambda)= \min_{\lambda\neq\mu\in\sigma(A)}|\tilde\lambda-\mu|.$$ If $\eta>0$, then $$ |\tilde\lambda-\lambda|\leq \frac{\|r\|_2^2}{\eta},\quad \sin\theta(x,\tilde x)\leq \frac{\|r\|_2}{\eta}. $$
Let $A$ be Hermitian, let $X\in\mathbb{C}^{n\times k}$ have full column rank, and let $M$ be a Hermitian $k\times k$ matrix with eigenvalues $\mu_1\leq\mu_2\leq\cdots\leq\mu_k$. Set $R=AX-XM$. Then there exist $\lambda_{i_1}\leq\lambda_{i_2}\leq\cdots\leq\lambda_{i_k}\in\sigma(A)$ such that \begin{align*} \max_{1\leq j\leq k} |\mu_j-\lambda_{i_j}|& \leq \frac{\|R\|_2}{\sigma_{\min}(X)},\\ \sqrt{\sum_{j=1}^k (\mu_j-\lambda_{i_j})^2}&\leq \frac{\|R\|_F}{\sigma_{\min}(X)}. \end{align*} (The indices $i_j$ need not be the same in the two formulae.)
If, additionally, $X^*X=I$ and $M=X^*AX$, and if all but $k$ of $A$'s eigenvalues differ from every one of $M$'s eigenvalues by at least $\eta>0$, then $$\sqrt{\sum_{j=1}^k (\mu_j-\lambda_{i_j})^2}\leq \frac{\|R\|_F^2}{\eta\sqrt{1-\|R\|_F^2/\eta^2}}.$$
Let $A=\begin{bmatrix} M & E^* \\ E & H \end{bmatrix}$ and $\tilde A=\begin{bmatrix} M & 0 \\ 0 & H \end{bmatrix}$ be Hermitian, and set $\eta=\min |\mu-\nu|$ over all $\mu\in\sigma(M)$ and $\nu\in\sigma(H)$. Then $$ \max |\lambda_j^\uparrow -\tilde\lambda_j^\uparrow| \leq \frac{2\|E\|_2^2}{\eta+\sqrt{\eta^2+4\|E\|_2^2}}. $$
Let $$ \begin{bmatrix} X_1^*\\ X_2^* \end{bmatrix} A \begin{bmatrix} X_1 & X_2 \end{bmatrix}= \begin{bmatrix} A_1 & \\ & A_2 \end{bmatrix}, \quad \begin{bmatrix} X_1 & X_2 \end{bmatrix} \textrm{unitary}, \quad X_1\in\mathbb{C}^{n\times k}. $$ Let $Q\in\mathbb{C}^{n\times k}$ have orthonormal columns and for a Hermitian $k\times k$ matrix $M$ set $R=AQ-QM$. Let $\eta=\min|\mu-\nu|$ over all $\mu\in\sigma(M)$ and $\nu\in\sigma(A_2)$. If $\eta > 0$, then $$ \|\sin\Theta(X_1,Q)\|_F\leq \frac{\|R\|_F}{\eta}. $$
In [19]:
A=[-3 7 -1; 6 8 -2; 72 -28 19]
Out[19]:
In [20]:
# (Right) eigenvectors
λ,X=eigen(A)
λ
Out[20]:
In [21]:
X
Out[21]:
In [22]:
cond(X)
Out[22]:
In [23]:
X[:,2]+X[:,3]
Out[23]:
In [24]:
# Left eigenvectors
λ1,Y=eigen(Matrix(A'))
Out[24]:
In [25]:
cond(Y)
Out[25]:
In [26]:
# Try k=2,3
k=3
Y[:,k]'*A-λ[k]*Y[:,k]'
Out[26]:
In [27]:
ΔA=rand(3,3)/20
B=A+ΔA
Out[27]:
In [28]:
norm(ΔA)
Out[28]:
In [29]:
μ,Z=eigen(B)
Out[29]:
In [30]:
# Fact 2
δλ=μ[1]-λ[1]
Out[30]:
In [31]:
k=1
Y[:,k]'*ΔA*X[:,k]/(Y[:,k]⋅X[:,k])
Out[31]:
In [32]:
n=10
c=0.5
J=Bidiagonal(c*ones(n),ones(n-1),'U')
Out[32]:
In [33]:
# Accurately defined eigenvalues
λ=eigvals(J)
Out[33]:
In [34]:
# Only one eigenvector
X=eigvecs(J)
Out[34]:
In [35]:
x=eigvecs(J)[:,1]
y=eigvecs(J')[:,1]
Out[35]:
In [36]:
y'*Matrix(J)-0.5*y'
Out[36]:
In [37]:
# Just one perturbed element in the lower left corner
ΔJ=sqrt(eps())*[zeros(n-1);1]*Matrix(I,1,n)
Out[37]:
In [38]:
B=J+ΔJ
μ=eigvals(B)
Out[38]:
In [39]:
# Fact 2
maximum(abs,λ-μ)
Out[39]:
In [40]:
y'*ΔJ*x/(y⋅x)
Out[40]:
However, since $B$ is diagonalizable, we can apply the Bauer-Fike theorem to it:
In [41]:
μ,Y=eigen(B)
Out[41]:
In [42]:
cond(Y)
Out[42]:
In [43]:
opnorm(inv(Y)*ΔJ*Y), cond(Y)*opnorm(ΔJ)
Out[43]:
In [44]:
using SpecialMatrices
In [46]:
n=6
C=SpecialMatrices.Circulant(rand(-5:5,n))
Out[46]:
In [47]:
λ=eigvals(Matrix(C))
Out[47]:
In [48]:
ΔC=rand(n,n)*0.0001
Out[48]:
In [49]:
@show opnorm(ΔC)
μ=eigvals(C+ΔC)
Out[49]:
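Since $C$ is normal (circulant) while $C+\Delta C$ need not be, the $\sqrt{n}\|\Delta A\|_F$ bound for normal matrices applies. We pair the eigenvalues by nearest-neighbour matching (our assumption, reasonable for perturbations this small):

τ=[argmin(abs.(μ[i].-λ)) for i=1:n]   # nearest-neighbour matching
norm(μ-λ[τ]), sqrt(n)*norm(ΔC)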
In [50]:
m=10
n=6
A=rand(m,n)
# Some scaling
D=Diagonal((rand(n).-0.5)*exp(20))
A=A*D
Out[50]:
In [51]:
using Statistics
A=cor(A)
Out[51]:
In [52]:
ΔA=cor(rand(m,n)*D)*1e-5
Out[52]:
In [53]:
B=A+ΔA
Out[53]:
In [54]:
λ,U=eigen(A)
μ=eigvals(B)
[λ μ]
Out[54]:
In [55]:
norm(ΔA)
Out[55]:
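Since both $A$ and $B=A+\Delta A$ are symmetric, the bounds for Hermitian matrices apply directly; eigen and eigvals return their eigenvalues in increasing order, so no matching is needed:

maximum(abs,λ-μ), opnorm(ΔA), norm(λ-μ), norm(ΔA)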
In [56]:
# ?round
In [57]:
# Residual bounds - how close are μ, y to λ[k], U[:,k]?
k=3
μ=round(λ[k],sigdigits=2)
y=round.(U[:,k],sigdigits=2)
y=y/norm(y)
Out[57]:
In [58]:
μ
Out[58]:
In [59]:
# Fact 9
r=A*y-μ*y
Out[59]:
In [60]:
minimum(abs,μ.-λ), norm(r)
Out[60]:
In [61]:
# Fact 10 - μ is Rayleigh quotient
μ=y⋅(A*y)
r=A*y-μ*y
Out[61]:
In [62]:
η=min(abs(μ-λ[k-1]),abs(μ-λ[k+1]))
Out[62]:
In [63]:
μ-λ[k], norm(r)^2/η
Out[63]:
In [64]:
# Eigenvector bound
# cos(θ)
cosθ=dot(y,U[:,k])
# sin(θ)
sinθ=sqrt(1-cosθ^2)
sinθ,norm(r)/η
Out[64]:
In [65]:
# Residual bounds - Facts 11 and 12
U=eigvecs(A)
Q=round.(U[:,1:3],sigdigits=2)
# Orthogonalize
F=qr(Q)
X=Matrix(F.Q)
M=X'*A*X
R=A*X-X*M
μ=eigvals(M)
R
Out[65]:
In [66]:
λ
Out[66]:
In [67]:
μ
Out[67]:
In [68]:
# The entries of μ are not ordered - which algorithm was called?
issymmetric(M)
Out[68]:
In [69]:
M=Hermitian(M)
R=A*X-X*M
μ=eigvals(M)
Out[69]:
In [70]:
η=λ[4]-λ[3]
Out[70]:
In [71]:
norm(λ[1:3]-μ), norm(R)^2/η
Out[71]:
In [72]:
A
Out[72]:
In [73]:
# Fact 13
M=A[1:3,1:3]
H=A[4:6,4:6]
E=A[4:6,1:3]
# Block-diagonal matrix
B=cat(M,H,dims=(1,2))
Out[73]:
In [74]:
# minimal gap over all pairs of eigenvalues of M and H
η=minimum(abs.(eigvals(M).-eigvals(H)'))
μ=eigvals(B)
[λ μ]
Out[74]:
In [75]:
2*norm(E)^2/(η+sqrt(η^2+4*norm(E)^2))
Out[75]:
In [76]:
# Eigenspace bounds - Fact 14
B=A+ΔA
μ,V=eigen(B)
Out[76]:
In [77]:
# sin(Θ(U[:,1:3],V[:,1:3]))
X=U[:,1:3]
Q=V[:,1:3]
cosθ=svdvals(inv(sqrt(Q'*Q))*Q'*X*inv(sqrt(X'*X)))   # inverse square roots, per the definition
sinθ=sqrt.(max.(0,1 .-cosθ.^2))    # clamp guards against rounding pushing cos θ above 1
Out[77]:
In [78]:
# Bound
M=Q'*A*Q
Out[78]:
In [79]:
R=A*Q-Q*M
Out[79]:
In [80]:
eigvals(M), λ
Out[80]:
In [81]:
η=abs(eigvals(M)[3]-λ[4])
norm(sinθ), norm(R)/η
Out[81]: