Julia Issue Note #5705

Linear algebra test failures for linear solve with triangular matrices

Recent Julia issues like #5705 have adjusted the bounds on linear algebra tests. The purpose of this notebook is to provide more extensive numerical demonstrations of why the current linear algebra tests are fundamentally broken and need to be systematically replaced by more theoretically justified error bounds, as explained in #5605.

The current testing strategy

Here is a snippet of code representing a typical linear algebra test in Julia:



In [1]:

    
#Test linear solve on upper triangular matrices
n=10
eltya=Float64 #eltype of matrix
eltyb=Float64 #eltype of vector

A=convert(Matrix{eltya}, randn(n,n))
A=triu(A)
b=convert(Vector{eltyb}, randn(n))

ε = max(eps(abs(float(one(eltya)))),eps(abs(float(one(eltyb)))))
x = A \ b
#@test_approx_eq_eps triu(a)*x b 20000ε
#Instead of the actual assertion, show the values being compared
@show norm(A*x), norm(b), 20000ε
abs(norm(A*x) - norm(b))/ε*1









    



(norm(*(A,x)),norm(b),*(20000,ε)) => (4.026090098980171,4.026090098980171,4.440892098500626e-12)






    Out[1]:





0.0

Mathematically, the test being performed corresponds to asserting that the inequality

$$ \left\vert \left\Vert A x\right\Vert - \left\Vert b \right\Vert \right\vert \le C \epsilon $$

is valid, where $\epsilon$ is machine epsilon and $C = 20000$ is an arbitrary magic constant.

Most of the time, this test will appear to be well-behaved. The deviation between $Ax$ and $b$ appears to be small in norm, and it looks like the magic number 20000 would obviously cover any contingency. But how robust is this observation?

Distribution of magic numbers



In [2]:

    
t=10^6
n=10
T=Float64

b=randn(n)
c=zeros(t) #log10 of magic numbers
ε=eps(real(one(T)))
for i=1:t
    A = convert(Matrix{T}, triu(randn(n,n)))
    x = A \ b
    c[i] = max(log10(abs(norm(A*x)-norm(b))/ε), -2) #arbitrary floor for plotting 0 on log scale
end



In [3]:

    
@show minimum(c), maximum(c)
x, y = hist(c, 50)









    



(minimum(c),maximum(c)) => (-2.0,12.659850021280592)






    Out[3]:





(-2.5:0.5:13.0,[67288,0,0,0,0,99814,180104,156724,160350,120193  …  65,24,14,0,5,0,0,0,0,1])

The statistics show that every now and then, a nearly singular matrix is sampled, causing the magic number to blow up.

Where do the old and new bounds $15000\epsilon$ and $20000\epsilon$ lie on the distribution?



In [4]:

    
using Gadfly
plot(x=x, y=y/(t*(x[2]-x[1])), Geom.bar, Guide.XLabel("log10(c)"), Guide.YLabel("Density"), 
     xintercept=[log10(15000), log10(20000)], Geom.vline(color="orange"))

Despite the large change in the absolute magnitude of the magic constant, we see that this only shifts the threshold slightly on this histogram of $log(c)$, thanks to the extremely long tail in the magic number $c$. What is the probability that a given random matrix will fail the old and new cutoffs?



In [5]:

    
for cutoff in (15000, 20000)
    p = sum([c .> log10(cutoff)])/t
    @show cutoff, p
end









    



(cutoff,p) => 





    



WARNING: deprecated syntax "x[i:]" at /Users/jiahao/.julia/v0.3/DataFrames/src/formula.jl:58.
Use "x[i:end]" instead.

WARNING: deprecated syntax "x[i:]" at /Users/jiahao/.julia/v0.3/DataFrames/src/formula.jl:71.
Use "x[i:end]" instead.

WARNING: deprecated syntax "x[i:]" at /Users/jiahao/.julia/v0.3/DataFrames/src/formula.jl:77.
Use "x[i:end]" instead.

WARNING: deprecated syntax "x[i:]" at /Users/jiahao/.julia/v0.3/DataFrames/src/formula.jl:92.
Use "x[i:end]" instead.

WARNING: deprecated syntax "x[i:]" at /Users/jiahao/.julia/v0.3/DataFrames/src/formula.jl:106.
Use "x[i:end]" instead.

WARNING: deprecated syntax "x[i:]" at /Users/jiahao/.julia/v0.3/DataFrames/src/formula.jl:111.
Use "x[i:end]" instead.

WARNING: deprecated syntax "x[i:]" at /Users/jiahao/.julia/v0.3/DataFrames/src/formula.jl:122.
Use "x[i:end]" instead.






    



(15000,0.032434)
(cutoff,p) => (20000,0.027863)

The change of the magic number in PR #5707 decreases the probability of failure by an essentially negligible amount.

Plotting the failure rate vs the magic constant $log(c)$ shows that in the grand scheme of things, the change in the magic constant does very little, and it would take an enormous shift in $c$ to make the failure rate negligible, so much so that the test would be practically useless.



In [6]:

    
plot(x=x, y=1-cumsum(y)/t, Geom.line, Guide.XLabel("log10(c)"), Guide.YLabel("Probability of failure"),
xintercept=[log10(15000), log10(20000)], Geom.vline(color="Orange"))









    













    Out[6]:

Theoretical error bounds on linear solvers

Let's have a look at which method is being dispatched by the linear solver.



In [7]:

    
@which A\b
@which \(Triangular(A), b)
@which Base.LinAlg.naivesub!(Triangular(A), b)









    Out[7]:




naivesub!(A::Triangular{T<:Number},b::AbstractArray{T,1}) at linalg/triangular.jl:127

For a triangular matrix $A$, this either dispatches to the LAPACK xTRTRS subroutine, which in turn calls the substitution solver xTRSM, or the naïve substitution solver Base.LinAlg.naivesub!, depending on the element type.

What does theory tell us about the error bounds on substitution? You can derive an error bound using first-order matrix perturbation theory, or if you're lazy, consult a reference like Higham's Accuracy and Stability of Numerical Algorithms, Ch. 8 (pdf).

Error bound results usually come in two flavors. First, we have forward error bounds, which bound the norm of the difference between the computed solution $\hat{x}$ and the exact solution $x$.

For the triangular system, we have the forward error bound (Higham, Theorem 8.5):

$$ \frac{\left\Vert x - \hat{x} \right\Vert_\infty}{\left\Vert x \right\Vert_\infty} \le \frac{\textrm{cond}(A,x) \gamma_n}{1 - \textrm{cond}(T) \gamma_n} $$

with constant

$$ \gamma_n = \frac{n\epsilon}{1-n\epsilon} $$

and the Skeel condition numbers

$$ \textrm{cond}(A,x) = \frac{\left\Vert \left\vert A \right\vert \left\vert A^{-1} \right\vert \left\vert x \right\vert \right\Vert_\infty}{\left\Vert x \right\Vert_\infty} $$

and

$$ \textrm{cond}(A) = \left\Vert \left\vert A \right\vert \left\vert A^{-1} \right\vert \right\Vert_\infty $$

where $|A|$ means that the modulus is taken componentwise for every element of $A$.

Second, we also have backward error bounds, which assert that the computed solution $\hat{x} = A\backslash b$ is an exact solution to a slightly perturbed problem $(A + \delta A) \hat{x} = b + \delta b$. Sometimes results are also known if only $A$ or only $b$ is assumed to be perturbed.

For the triangular system, we have the backward error bound (Higham, Theorem 8.3)

$$ \left\vert \delta A \right\vert \le \gamma_n \left\vert A \right\vert $$

where only the matrix $A$ is perturbed.

Forward error bound

Ordinarily, this error bound is useless in practice since the exact solution $x$ is not known. However, thanks to Julia's BigFloats, we can approximate the exact solution with that returned by big(A) \ big(b).



In [8]:

    
#Define the Skeel condition numbers
cond_skeel(A::AbstractMatrix) = norm(abs(inv(A))*abs(A), Inf)
cond_skeel(A::AbstractMatrix, x::AbstractVector) = norm(abs(inv(A))*abs(A)*abs(x), Inf)









    Out[8]:





cond_skeel (generic function with 2 methods)



In [9]:

    
t=10^4
n=10
T=Float64

b=randn(n)
c=zeros(t) #log10 of magic numbers
r=zeros(t) #ratio of lhs/rhs in inequality
ε=eps(real(one(T)))
γ = n*ε/(1-n*ε) #γ_n
for i=1:t
    A = convert(Matrix{T}, triu(randn(n,n)))
    ̂x = A \ b #we called this x earlier
    c[i] = max(log10(abs(norm(A* ̂x)-norm(b))/ε), -2) #arbitrary floor for plotting 0 on log scale
    bigA = big(A)
    x = bigA \ b #x is now the exact solution
    lhs = norm(̂x-x,Inf)/norm(x,Inf) 
    rhs = cond_skeel(bigA, x)*γ/(1-cond_skeel(bigA)*γ)
    r[i] = lhs/rhs
end



In [10]:

    
plot(x=c, y=r, Guide.XLabel("log10(c)"), Guide.YLabel("lhs/rhs"))

There doesn't seem to be much correlation between the originally coded test and the test using the forward error bound. But at least the forward error bound test seems to be robust; every point has lhs <= rhs.



In [11]:

    
x, y=hist(r, 100)
plot(x=x, y=y/(t*(x[2]-x[1])), Geom.bar, Guide.XLabel("lhs/rhs"), Guide.YLabel("Density"),
xintercept=[1.0], Geom.vline(color="orange"))



In [12]:

    
mean(r), std(r)/sqrt(t-1) #mean with its standard error









    Out[12]:





(8.240937806544264e-5,3.067119378454114e-6)

The inequality seems to be fairly loose, but at least nothing violates it.

Backward error bound

From the implicit definition of the perturbed matrix $\delta A$ in

$$ (A + \delta A) \hat{x} = b $$

we can rearrange this to get

$$ \delta A \hat{x} = b - A \hat{x} $$

And taking the norm gives an inequality on the residual $$ \left\Vert b - A \hat{x} \right\Vert = \left\Vert \delta A \hat{x} \right\Vert = \left\Vert \left\vert \delta A \hat{x} \right\vert \right\Vert \le \gamma_n \left\Vert \left\vert A \right\vert \left\vert \hat{x} \right\vert \right\Vert \le \gamma_n \left\Vert A \right\Vert \left\Vert \hat{x} \right\Vert. $$



In [13]:

    
t=10^4
n=10
T=Float64

b=randn(n)
c=zeros(t) #log10 of magic numbers
r=zeros(t) #ratio of lhs/rhs in inequality
ε=eps(real(one(T)))
γ = n*ε/(1-n*ε) #γ_n
for i=1:t
    A = convert(Matrix{T}, triu(randn(n,n)))
    x = A \ b
    c[i] = max(log10(abs(norm(A*x)-norm(b))/ε), -2) #arbitrary floor for plotting 0 on log scale
    resid = norm(abs(b - A*x))
    rhs = γ * norm(A) * norm(x)
    r[i] = resid/rhs
end



In [14]:

    
plot(x=c, y=r, Guide.XLabel("log10(c)"), Guide.YLabel("lhs/rhs"))

The original Julia test doesn't seem to correlate with the test derived from backward error result either. But at least none of the tests fail so far.



In [15]:

    
x, y = hist(r, 25)
plot(x=x, y=y/(t*(x[2]-x[1])), Geom.bar, Guide.XLabel("lhs/rhs"), Guide.YLabel("Density"))



In [16]:

    
mean(r), std(r)/sqrt(t-1)









    Out[16]:





(0.005644636805875966,4.95668601105437e-5)

Summary

This Issue Note describes two replacement tests for the existing test for the linear solve with a triangular matrix. Numerical testing indicates that the new tests are more robust and do not exhibit any spurious failures when applied to a moderate number of randomly sampled matrices.