Since computing the SVD of $A$ can be seen as computing the EVD of the symmetric matrices $A^*A$, $AA^*$, or $\begin{bmatrix}0 & A \\ A^* & 0 \end{bmatrix}$, simple modifications of the corresponding EVD algorithms yield versions for computing the SVD.
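For example, the squared singular values of $A$ are the eigenvalues of $A^*A$, and the eigenvalues of $\begin{bmatrix}0 & A \\ A^* & 0 \end{bmatrix}$ are $\pm\sigma_i$ together with $m-n$ zeros. A minimal numerical check of these connections (the matrix is random and its size is chosen only for illustration):
In [ ]:
using LinearAlgebra
A=rand(6,4)
# Squared singular values of A are the eigenvalues of A'*A (returned in ascending order)
sqrt.(reverse(eigvals(Symmetric(A'*A)))) ≈ svdvals(A)
# Eigenvalues of [0 A; A' 0] are ±σᵢ together with m-n zeros; the largest n of them are the σᵢ
J=[zeros(6,6) A; A' zeros(4,4)]
sort(eigvals(Symmetric(J)),rev=true)[1:4] ≈ svdvals(A)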
For more details on the one-sided Jacobi method, see Z. Drmač, Computing Eigenvalues and Singular Values to High Relative Accuracy, and the references therein.
The reader should be familiar with the concepts of singular values and singular vectors, the related perturbation theory and algorithms, and with the Jacobi and Lanczos methods for the symmetric eigenvalue decomposition.
The reader should be able to recognise matrices that warrant high relative accuracy and to apply the Jacobi method to them, and to recognise matrices to which the Lanczos method can be applied efficiently and to do so.
Let $A\in\mathbb{R}^{m\times n}$ with $\mathop{\mathrm{rank}}(A)=n$ (therefore, $m\geq n$), and let $A=U\Sigma V^T$ be its thin SVD.
Let $A=BD$, where $D=\mathop{\mathrm{diag}} (\| A_{:,1}\|_2, \ldots, \|A_{:,n}\|_2)$ is the diagonal column scaling and $B$ is obtained by scaling the columns of $A$ to unit norm. Then $[B^T B]_{ii}=1$ for all $i$.
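A short sketch of this column scaling (the matrix here is hypothetical, chosen only to illustrate the definitions):
In [ ]:
using LinearAlgebra
m,n=8,5
A=rand(m,n)
D=Diagonal([norm(A[:,j]) for j=1:n])
B=A*inv(D)
# The diagonal of B'*B consists of ones
diag(B'*B)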
Let $\tilde U$, $\tilde V$ and $\tilde \Sigma$ be the approximations of $U$, $V$ and $\Sigma$, respectively, computed by a backward stable algorithm as $A+\Delta A=\tilde U\tilde \Sigma \tilde V^T$. Since the orthogonality of $\tilde U$ and $\tilde V$ cannot be guaranteed, this product in general does not represent an SVD. However, there exist nearby orthogonal matrices $\hat U$ and $\hat V$ such that $(I+E_1)(A+\Delta A)(I+E_2)=\hat U \tilde \Sigma \hat V^T$, where the departures from orthogonality, $E_1$ and $E_2$, are small in norm.
Standard algorithms compute the singular values with backward error $\| \Delta A\|\leq \phi\varepsilon \|A\|_2$, where $\varepsilon$ is machine precision and $\phi$ is a slowly growing function of $n$. The best error bound for the singular values is $|\sigma_j-\tilde \sigma_j|\leq \| \Delta A\|_2$, and the best relative error bound is $$ \max_j \frac{|\sigma_j-\tilde\sigma_j|}{\sigma_j}\leq \frac{\| \Delta A \|_2}{\sigma_n} \leq \phi \varepsilon \kappa_2(A). $$
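The bound $\phi\varepsilon\kappa_2(A)$ can essentially be attained for the smallest singular value. A hypothetical $2\times 2$ example (the matrices are chosen only to illustrate the bound):
In [ ]:
using LinearAlgebra
A=[1.0 0;0 1e-8]
ΔA=1e-10*Matrix(1.0I,2,2)
# ‖ΔA‖₂/‖A‖₂ = 1e-10 and κ₂(A) = 1e8, so the relative change of σ₂ is about 1e-2
(svdvals(A+ΔA)-svdvals(A))./svdvals(A)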
Let $\|[\Delta A]_{:,j}\|_2\leq \varepsilon \| A_{:,j}\|_2$ for all $j$. Then $A+\Delta A=(B+\Delta B)D$ and $\|\Delta B\|_F\leq \sqrt{n}\varepsilon$, and $$ \max_j \frac{|\sigma_j-\tilde\sigma_j|}{\sigma_j}\leq \| (\Delta B) B^{\dagger} \|_2\leq \sqrt{n}\varepsilon \| B^{\dagger}\|_2. $$ This is Fact 3 of relative perturbation theory.
It holds that $$ \| B^{\dagger} \|_2 \leq \kappa_2(B) \leq \sqrt{n} \min_{S=\mathop{\mathrm{diag}}} \kappa_2(A\cdot S)\leq \sqrt{n}\,\kappa_2(A). $$ Therefore, a numerical algorithm with a column-wise small backward error computes the singular values more accurately than an algorithm with a small norm-wise backward error.
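A quick numerical check of this chain of inequalities for a badly column-scaled matrix (the sizes and the scaling are arbitrary):
In [ ]:
using LinearAlgebra
m,n=8,5
A=rand(m,n)*Diagonal(exp.(10*(rand(n).-0.5)))
B=A*inv(Diagonal([norm(A[:,j]) for j=1:n]))
# ‖B†‖₂ ≤ κ₂(B) ≤ √n κ₂(A)
opnorm(pinv(B)), cond(B), sqrt(n)*cond(A)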
In each step, the one-sided Jacobi method computes the Jacobi rotation from the pivot submatrix of the current Gram matrix $A^TA$. Afterwards, $A$ is multiplied by the computed rotation from the right (only two columns are affected). Convergence of the Jacobi method for the symmetric matrix $A^TA$ to a diagonal matrix implies that $A$ converges to a matrix $AV$ with orthogonal columns, where $V$ is the accumulated product of rotations, $V^TV=I$. Then $AV=U\Sigma$, where $\Sigma=\mathop{\mathrm{diag}}(\| (AV)_{:,1}\|_2, \ldots, \| (AV)_{:,n}\|_2)$ and $U=AV\Sigma^{-1}$, so $A=U\Sigma V^T$ is the SVD of $A$.
The one-sided Jacobi method computes the SVD with the error bounds from Facts 2 and 3, provided that the condition number of the intermediate scaled matrices does not grow much; there is overwhelming numerical evidence for this. Alternatively, if $A$ is square, the one-sided Jacobi method can be applied to the transposed matrix $A^T=DB^T$ and the same error bounds apply, but now the condition number of the scaled matrix (this time scaled from the left) does not change at all. This approach is, however, slower; see the sketch below.
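A minimal sketch of the transposed approach for a square matrix, assuming the function myJacobiR defined in the code cell below has already been evaluated: if $A^T=U_1\Sigma V_1^T$, then $A=V_1\Sigma U_1^T$.
In [ ]:
# Hypothetical example; myJacobiR is defined in the next code cell
using LinearAlgebra
As=rand(5,5)
U₁,σ₁,V₁=myJacobiR(Matrix(As'))
# Residual of the SVD of As assembled from the SVD of Asᵀ
norm(As-V₁*Diagonal(σ₁)*U₁')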
The one-sided Jacobi method can be preconditioned by applying one QR factorization with full pivoting and one QR factorization without pivoting to $A$, to obtain faster convergence without sacrificing accuracy. This method is implemented in the LAPACK routine DGESVJ. Writing the wrapper for DGESVJ is a tutorial assignment.
In [1]:
using LinearAlgebra
In [8]:
function myJacobiR(A1::AbstractMatrix)
    A=deepcopy(A1)
    m,n=size(A)
    T=typeof(A[1,1])
    V=Matrix{T}(I,n,n)
    # Tolerance for rotation
    tol=sqrt(map(T,n))*eps(T)
    # Counters
    p=n*(n-1)/2
    sweep=0
    pcurrent=0
    # First criterion is for standard accuracy, second one is for relative accuracy
    # while sweep<30 && vecnorm(A-diagm(diag(A)))>tol
    while sweep<30 && pcurrent<p
        sweep+=1
        # Row-cyclic strategy
        for i = 1 : n-1
            for j = i+1 : n
                # Compute the 2 x 2 submatrix of A'*A
                F=A[:,[i,j]]'*A[:,[i,j]]
                # Check the tolerance - the first criterion is standard,
                # the second one is for relative accuracy
                # if A[i,j]!=zero(T)
                if abs(F[1,2])>tol*sqrt(F[1,1]*F[2,2])
                    # Compute c and s
                    τ=(F[1,1]-F[2,2])/(2*F[1,2])
                    t=sign(τ)/(abs(τ)+sqrt(1+τ^2))
                    c=1/sqrt(1+t^2)
                    s=c*t
                    G=LinearAlgebra.Givens(i,j,c,s)
                    # A*=G'
                    rmul!(A,adjoint(G))
                    # V*=G'
                    rmul!(V,adjoint(G))
                    pcurrent=0
                else
                    pcurrent+=1
                end
            end
        end
    end
    σ=[norm(A[:,k]) for k=1:n]
    for k=1:n
        A[:,k]./=σ[k]
    end
    # A, σ, V
    SVD(A,σ,adjoint(V))
end
Out[8]:
In [9]:
m=8
n=5
import Random
Random.seed!(432)
A=rand(-9.0:9,m,n)
Out[9]:
In [10]:
U,σ,V=myJacobiR(A)
Out[10]:
In [11]:
# Residual
A*V-U*Diagonal(σ)
Out[11]:
In [13]:
# Orthogonality
norm(U'*U-I),norm(V'*V-I)
Out[13]:
In [20]:
m=20
n=15
B=rand(m,n)
D=exp.(50*(rand(n).-0.5))
A=B*Diagonal(D)
Out[20]:
In [21]:
cond(B), cond(A)
Out[21]:
In [22]:
U,σ,V=myJacobiR(A);
In [23]:
[sort(σ,rev=true) svdvals(A)]
Out[23]:
In [24]:
(sort(σ,rev=true)-svdvals(A))./sort(σ,rev=true)
Out[24]:
In [25]:
norm(A*V-U*Diagonal(σ))
Out[25]:
In [26]:
U'*A*V
Out[26]:
In the alternative approach, we first apply the QR factorization with column pivoting to $A$ and then apply the one-sided Jacobi method to the square matrix $R^T$.
In [14]:
# ?qr
In [27]:
Q=qr(A,Val(true))
Out[27]:
In [28]:
diag(Q.R)
Out[28]:
In [29]:
Q.Q
Out[29]:
In [30]:
Matrix(Q.Q)
Out[30]:
In [31]:
# Residual
norm(Q.Q*Q.R-A[:,Q.p])
Out[31]:
In [33]:
Q.p
Out[33]:
In [38]:
UR,σR,VR=myJacobiR(Q.R')
Out[38]:
In [39]:
(sort(σ)-sort(σR))./sort(σ)
Out[39]:
Now $A=QRP^T$ and $R^T=U_R\Sigma_R V^T_R$, so
$$ A=(Q V_R)\, \Sigma_R\, (U_R^T P^T) $$
is an SVD of $A$. Here $PU_R=U_R[\mathop{\mathrm{invperm}}(p),:]$, which is used to form $V_1$ in the next cell.
In [40]:
# Check the residual
U₁=Q.Q*VR
V₁=UR[invperm(Q.p),:]
norm(A*V₁-U₁*Diagonal(σR))
Out[40]:
In [41]:
using Arpack
In [44]:
?svds;
In [45]:
m=20
n=15
A=rand(m,n);
In [46]:
U,σ,V=svd(A);
In [48]:
# Some largest singular values
k=6
σ₁,rest=svds(A,nsv=k);
(σ[1:k]-σ₁.S)./σ[1:k]
Out[48]:
In [49]:
m=2000
n=1500
Ab=rand(m,n);
In [50]:
@time Ub,σb,Vb=svd(Ab);
In [52]:
# This is rather slow
k=10
@time σl,rest=svds(Ab,nsv=k);
In [53]:
(σb[1:k]-σl.S)./σb[1:k]
Out[53]:
In [54]:
using SparseArrays
In [57]:
?sprand;
In [59]:
m=10000
n=3000
A=sprand(m,n,0.05)
Out[59]:
In [60]:
# No vectors, this takes about 5 sec.
k=100
@time σ₁,rest=svds(A,nsv=k,ritzvec=false)
Out[60]:
In [61]:
# Full matrix
@time σ₂=svdvals(Matrix(A));
In [62]:
maximum(abs,(σ₁.S-σ₂[1:k])./σ₂[1:k])
Out[62]: