Suppose we are given a data matrix $A$, and know that it has the form $$A=L+S,$$ where $L$ is of low rank and $S$ is sparse, but we know neither the rank of $L$ nor the non-zero entries of $S$.
Can we recover $L$ and $S$?
The answer is YES, with high probability and with an efficient algorithm.
Sparse + low-rank splitting can be successfully applied to video surveillance, face recognition, latent semantic indexing, and ranking and collaborative filtering.
The reader should be familiar with linear algebra concepts, particularly SVD and its properties and algorithms.
The reader should be able to apply sparse + low-rank splitting to real problems.
For more details see E. J. Candès, X. Li, Y. Ma, and J. Wright, Robust Principal Component Analysis?, Journal of the ACM, 58 (2011), no. 3.
Credits: The author wishes to thank Dimitar Ninevski, a former IAESTE intern, for collecting and preparing some of the material.
Let $A\in\mathbb{R}^{m\times n}$ have rank $r$, and let $A=U\Sigma V^T$ be its SVD.
The nuclear norm of $A$ is $\|A\|_*=\sum\limits_{i=1}^r \sigma_i(A)$.
Let $\|A\|_1=\sum\limits_{i,j} |A_{ij}|$ denote the $1$-norm of $A$ seen as a long vector.
Let $\|A\|_{\infty}=\max\limits_{i,j} |A_{ij}|$ denote the $\infty$-norm of $A$ seen as a long vector.
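As a quick sanity check, the three norms can be evaluated directly on a small matrix (the values below are chosen only for illustration):

```julia
using LinearAlgebra
# Small example matrix (chosen for illustration)
A=[1.0 -2.0; 0.0 3.0]
# Nuclear norm = sum of singular values
nuclear=sum(svdvals(A))
# 1-norm of A seen as a long vector
one_norm=sum(abs,A)
# ∞-norm of A seen as a long vector
inf_norm=maximum(abs,A)
nuclear, one_norm, inf_norm
```

For this matrix the nuclear norm equals $2\sqrt{5}$, since $(\sigma_1+\sigma_2)^2=\|A\|_F^2+2|\det A|=14+6=20$.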
Given $\tau>0$, the shrinkage operator $\mathcal{S}_{\tau}:\mathbb{R}\to \mathbb{R}$ is defined by $\mathcal{S}_{\tau}(x)=\mathop{\mathrm{sign}}(x)\max\{|x|-\tau,0\}$, and is extended to matrices by applying it to each element.
Given $\tau>0$, the singular value thresholding operator is $\mathcal{D}_{\tau}(A)=U \mathcal{S}_{\tau} (\Sigma) V^T$.
In [1]:
# Shrinkage
function Shr(x::Array{T},τ::T) where T
    sign.(x).*max.(abs.(x).-τ,zero(T))
end
Out[1]:
In [15]:
import Random
Random.seed!(421)
A=2*rand(3,5).-1
Out[15]:
In [16]:
Shr(A,0.5)
Out[16]:
In [17]:
using LinearAlgebra
In [18]:
# Singular value thresholding
function D(A::Array{T},τ::T) where T
    # The full SVD can be replaced by a faster truncated approach
    V=svd(A)
    S=Shr(V.S,τ)
    k=count(!iszero,S)
    return (V.U[:,1:k]*Diagonal(S[1:k]))*V.Vt[1:k,:]
end
Out[18]:
In [19]:
A
Out[19]:
In [20]:
svdvals(A)
Out[20]:
In [21]:
C=D(A,0.5)
Out[21]:
In [22]:
svdvals(C)
Out[22]:
Let $A=L+S$ be the splitting that we are looking for.
The problem can be formulated as $\mathop{\textrm{arg min}}\limits_{\displaystyle L+S=A} \mathop{\mathrm{rank}}(L)+\lambda \|S\|_0$, where $\|S\|_0$ counts the non-zero entries of $S$; this problem is non-convex and intractable in general.
The recovery problem is well posed if the incoherence conditions $$ \max_{i} \| U_{:,1:r}^T e_i\|_2^2\leq \frac{\mu r}{m}, \quad \max_{i} \| V_{:,1:r}^T e_i\|_2^2\leq \frac{\mu r}{n}, \quad \|UV^T\|_{\infty} \leq \sqrt{\frac{\mu r}{m\cdot n}}, $$ hold for some small parameter $\mu$.
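A sketch of how one might check the first two incoherence conditions numerically for a random low-rank matrix; the names `μU`, `μV`, and `μ_est` are ad hoc and not from the reference:

```julia
using LinearAlgebra
# Random rank-r matrix
m,n,r=100,80,5
L=randn(m,r)*randn(r,n)
F=svd(L)
U=F.U[:,1:r]
V=F.V[:,1:r]
# Smallest μ satisfying the row-norm conditions
# maxᵢ ‖U[i,:]‖² ≤ μr/m and maxᵢ ‖V[i,:]‖² ≤ μr/n
μU=(m/r)*maximum(sum(abs2,U,dims=2))
μV=(n/r)*maximum(sum(abs2,V,dims=2))
μ_est=max(μU,μV)
```

Random Gaussian factors are incoherent with high probability, so `μ_est` should be a small constant, far below its worst-case value $\max(m,n)/r$.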
If the incoherence conditions are satisfied, the Principal Component Pursuit estimate, $$ \mathop{\textrm{arg min}}\limits_{\displaystyle L+S=A} \|L\|_*+\lambda \|S\|_1, $$ with $\lambda=1/\sqrt{\max(m,n)}$, exactly recovers $L$ and $S$ with high probability.
The Principal Component Pursuit by Alternating Directions algorithm computes the above estimate.
In [23]:
function PCPAD(A::Array{T}) where T
    # Initialize
    δ=1.0e-7
    tol=δ*norm(A)
    m,n=size(A)
    S=zero(A)
    Y=zero(A)
    L=zero(A)
    T₁=zero(A)
    μ=(m*n)/(4*(norm(A[:],1)))
    μ₁=one(T)/μ
    λ=one(T)/sqrt(max(m,n))
    λμ₁=λ*μ₁
    ν=1e20
    maxiter=200
    iterations=0
    # Iterate
    while (ν>tol) && iterations<maxiter
        # println(iterations," ",ν)
        iterations+=1
        # Low-rank update via singular value thresholding
        L=D(A-S+μ₁*Y,μ₁)
        # Sparse update via shrinkage
        S=Shr(A-L+μ₁.*Y,λμ₁)
        # Dual update with the residual
        T₁=(A-L-S)
        Y+=(μ.*T₁)
        ν=norm(T₁)
    end
    L,S,iterations
end
Out[23]:
In [24]:
@time L,S,iter=PCPAD(A)
Out[24]:
In [25]:
rank(L),norm(A-L-S)
Out[25]:
In [26]:
L
Out[26]:
In [27]:
S
Out[27]:
In [28]:
# Now the real test
# Dimensions of the matrix
m=100
n=100
# Rank of the low-rank part L
k=10
# Generate L
L=rand(m,k)*rand(k,n)
rank(L)
Out[28]:
In [30]:
using SparseArrays
In [31]:
# Sparsity of the sparse part S
sparsity=0.1
# Generate S
S=10*sprand(m,n,sparsity)
nnz(S)
Out[31]:
In [32]:
# Generate the matrix A - it is a dense matrix with full rank
A=L+S
rank(A)
Out[32]:
In [33]:
# Decompose A into L₁ and S₁
@time L₁,S₁,iters=PCPAD(A);
In [34]:
iters, rank(L₁), norm(L), norm(L-L₁), norm(S), norm(S-S₁)
Out[34]:
Even when the iteration stops at `maxiter` without reaching the prescribed tolerance, the computed splitting is still good.
In [35]:
S₁
Out[35]:
In [36]:
Matrix(S)
Out[36]:
We will try to recover missing features. The images are chosen from the Yale Face Database.
In [37]:
using Images
In [39]:
# First a single image
img=load("files/17.jpg")
Out[39]:
In [40]:
show(img[1,1])
In [41]:
A=map(Float64,img)
Out[41]:
In [42]:
# Compute the splitting and show number of iterations
@time L,S,iters=PCPAD(A)
iters, rank(L), norm(A), norm(A-L-S)
Out[42]:
In [43]:
colorview(Gray,L)
Out[43]:
In [45]:
# Try S+0.5
colorview(Gray,S.+0.5)
Out[45]:
In [47]:
hcat(colorview(Gray,0.9*L+S.+0.2),img)
Out[47]:
In [48]:
# Another image
img=load("files/19.jpg")
Out[48]:
In [50]:
A=map(Float64,img)
L,S,iters=PCPAD(A)
iters, rank(L), norm(A), norm(A-L-S)
Out[50]:
In [51]:
hcat(img,colorview(Gray,L),colorview(Gray,L+S.+0.3))
Out[51]:
In [52]:
# Load all images in the collection
dir="./files/yaleB08/"
files=readdir(dir)
Out[52]:
In [53]:
# import Pkg; Pkg.add("Netpbm")
In [54]:
n=length(files)-1
images=Array{Any}(undef,n)
B=Array{Any}(undef,n)
for i=1:n
    images[i]=Images.load(joinpath(dir,files[i+1]))
    B[i]=map(Float64,images[i])
end
In [56]:
# See the images - last 9 images are meaningless
# for i=1:n; display(images[i]); end
In [57]:
# Form the big matrix - each image is converted to a column vector
mi,ni=size(images[1])
A=Array{Float64}(undef,mi*ni,n-9)
for i=1:n-9
    A[:,i]=vec(B[i])
end
size(A)
Out[57]:
In [58]:
# Now the big SVDs - 1 minute
@time L,S,iters=PCPAD(A)
iters, rank(L), norm(A), norm(A-L-S)
Out[58]:
In [60]:
reshape(S[:,1],mi,ni)
Out[60]:
In [61]:
for i=1:10 # n-9
    Li=reshape(L[:,i],mi,ni)
    Si=reshape(S[:,i],mi,ni).+0.3
    display(hcat(images[i],colorview(Gray,Li),colorview(Gray,Li+Si)))
end
In [62]:
img=load("files/mona-lisa_1.jpg")
Out[62]:
In [66]:
# As in the notebook S03
imgsep=map(Float64,channelview(img))
Out[66]:
In [67]:
size(imgsep)
Out[67]:
In [68]:
# 1-2 minutes
# Red
@time RL,RS,Riter=PCPAD(imgsep[1,:,:])
# Green
GL,GS,Giter=PCPAD(imgsep[2,:,:])
# Blue
BL,BS,Biter=PCPAD(imgsep[3,:,:])
Out[68]:
In [69]:
Giter, rank(GL), norm(imgsep[2,:,:]), norm(imgsep[2,:,:]-GL-GS)
Out[69]:
In [70]:
# Mona Lisa's low-rank component
colorview(RGB,RL,GL,BL)
Out[70]:
In [71]:
# Mona Lisa's sparse component
colorview(RGB,RS.+0.5,GS.+0.5,BS.+0.5)
Out[71]:
In [50]:
img=load("files/Mona_Lisa_detail_hands.jpg")
Out[50]:
In [51]:
imgsep=float(channelview(img))
@time RL,RS,Riter=PCPAD(imgsep[1,:,:])
GL,GS,Giter=PCPAD(imgsep[2,:,:])
BL,BS,Biter=PCPAD(imgsep[3,:,:])
Giter, rank(GL), norm(imgsep[2,:,:]), norm(imgsep[2,:,:]-GL-GS)
Out[51]:
In [52]:
colorview(RGB,RL,GL,BL)
Out[52]:
In [53]:
colorview(RGB,RS.+0.5,GS.+0.5,BS.+0.5)
Out[53]: