Recent Julia issues like #5705 have adjusted the bounds on linear algebra tests. The purpose of this notebook is to provide more extensive numerical demonstrations of why the current linear algebra tests are fundamentally broken and need to be systematically replaced by more theoretically justified error bounds, as explained in #5605.
Here is a snippet of code representing a typical linear algebra test in Julia:
In [1]:
#Test linear solve on upper triangular matrices
n=10
eltya=Float64 #eltype of matrix
eltyb=Float64 #eltype of vector
A=convert(Matrix{eltya}, randn(n,n))
A=triu(A)
b=convert(Vector{eltyb}, randn(n))
ε = max(eps(abs(float(one(eltya)))), eps(abs(float(one(eltyb))))) #machine epsilon of the working precision
x = A \ b
#@test_approx_eq_eps triu(A)*x b 20000ε
#Instead of the actual assertion, show the values being compared
@show norm(A*x), norm(b), 20000ε
abs(norm(A*x) - norm(b))/ε #the magic number this particular sample requires
Out[1]:
Mathematically, the test being performed corresponds to asserting that the inequality
$$ \left\vert \left\Vert A x\right\Vert - \left\Vert b \right\Vert \right\vert \le C \epsilon $$is valid, where $\epsilon$ is machine epsilon and $C = 20000$ is an arbitrary magic constant.
Most of the time, this test will appear to be well-behaved. The deviation between $Ax$ and $b$ appears to be small in norm, and it looks like the magic number 20000 would obviously cover any contingency. But how robust is this observation?
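Written out explicitly, the assertion is just a boolean comparison, using A, x, b and ε from the cell above:
abs(norm(A*x) - norm(b)) <= 20000ε #true whenever the test passes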
In [2]:
t=10^6
n=10
T=Float64
b=randn(n)
c=zeros(t) #log10 of magic numbers
ε=eps(real(one(T)))
for i=1:t
    A = convert(Matrix{T}, triu(randn(n,n)))
    x = A \ b
    c[i] = max(log10(abs(norm(A*x)-norm(b))/ε), -2) #arbitrary floor for plotting 0 on log scale
end
In [3]:
@show minimum(c), maximum(c)
e, y = hist(c, 50) #bin edges and counts
x = (e[1:end-1]+e[2:end])/2 #bin midpoints, for plotting against the counts
Out[3]:
The statistics show that every now and then, a nearly singular matrix is sampled, causing the magic number to blow up.
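One way to see this directly is to record the condition number of each sample alongside its magic number; here is a minimal sketch (reusing n, T, b and ε from above, with fewer trials for speed; t2, κ and c2 are names introduced just for this check):
#Sketch: compare the magic number against the condition number of each sample
t2 = 10^4
κ = zeros(t2) #log10 of condition numbers
c2 = zeros(t2) #log10 of magic numbers
for i = 1:t2
    A = convert(Matrix{T}, triu(randn(n,n)))
    x = A \ b
    κ[i] = log10(cond(A))
    c2[i] = max(log10(abs(norm(A*x)-norm(b))/ε), -2)
end
@show cor(κ, c2) #expect the large values of c to come from ill-conditioned samples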
Where do the old and new bounds $15000\epsilon$ and $20000\epsilon$ lie on the distribution?
In [4]:
using Gadfly
plot(x=x, y=y/(t*(x[2]-x[1])), Geom.bar, Guide.XLabel("log10(c)"), Guide.YLabel("Density"),
xintercept=[log10(15000), log10(20000)], Geom.vline(color="orange"))
Out[4]:
Despite the seemingly large change in the absolute magnitude of the magic constant, the threshold shifts only slightly on this histogram of $\log_{10}(c)$, thanks to the extremely long tail in the distribution of the magic number $c$. What is the probability that a given random matrix will fail the old and new cutoffs?
In [5]:
for cutoff in (15000, 20000)
    p = sum(c .> log10(cutoff))/t #fraction of samples exceeding the cutoff
    @show cutoff, p
end
The change of the magic number in PR #5707 decreases the probability of failure by an essentially negligible amount.
Plotting the failure rate against the magic constant shows that, in the grand scheme of things, the change in the magic constant does very little. It would take an enormous increase in $c$ to make the failure rate negligible, by which point the test would be so loose as to be practically useless.
In [6]:
plot(x=x, y=1-cumsum(y)/t, Geom.line, Guide.XLabel("log10(c)"), Guide.YLabel("Probability of failure"),
xintercept=[log10(15000), log10(20000)], Geom.vline(color="orange"))
Out[6]:
In [7]:
@which A\b
@which \(Triangular(A), b)
@which Base.LinAlg.naivesub!(Triangular(A), b)
Out[7]:
For a triangular matrix $A$, this dispatches either to the LAPACK xTRTRS subroutine, which in turn calls the substitution solver xTRSM, or to the naïve substitution solver Base.LinAlg.naivesub!, depending on the element type.
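For reference, back-substitution for an upper triangular system looks like the following sketch (a simplified out-of-place version for illustration, not the exact Base.LinAlg.naivesub! source):
#Solve Ax = b for upper triangular A by back-substitution
function backsub(A::AbstractMatrix, b::AbstractVector)
    n = length(b)
    x = similar(b)
    for i = n:-1:1 #work upward from the last row
        s = b[i]
        for j = i+1:n
            s -= A[i,j]*x[j] #subtract the already-computed components
        end
        x[i] = s/A[i,i] #divide by the diagonal entry
    end
    x
end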
What does theory tell us about the error bounds on substitution? You can derive an error bound using first-order matrix perturbation theory, or, if you're lazy, consult a reference like Higham's Accuracy and Stability of Numerical Algorithms, Ch. 8.
Error bound results usually come in two flavors. First, we have forward error bounds, which bound the norm of the difference between the computed solution $\hat{x}$ and the exact solution $x$.
For the triangular system, we have the forward error bound (Higham, Theorem 8.5):
$$ \frac{\left\Vert x - \hat{x} \right\Vert_\infty}{\left\Vert x \right\Vert_\infty} \le \frac{\textrm{cond}(A,x) \gamma_n}{1 - \textrm{cond}(A) \gamma_n} $$with constant
$$ \gamma_n = \frac{n\epsilon}{1-n\epsilon} $$and the Skeel condition numbers
$$ \textrm{cond}(A,x) = \frac{\left\Vert \left\vert A^{-1} \right\vert \left\vert A \right\vert \left\vert x \right\vert \right\Vert_\infty}{\left\Vert x \right\Vert_\infty} $$and
$$ \textrm{cond}(A) = \left\Vert \left\vert A^{-1} \right\vert \left\vert A \right\vert \right\Vert_\infty $$where $|A|$ means that the modulus is taken componentwise for every element of $A$.
Second, we also have backward error bounds, which assert that the computed solution $\hat{x} = A\backslash b$ is an exact solution to a slightly perturbed problem $(A + \delta A) \hat{x} = b + \delta b$. Sometimes results are also known if only $A$ or only $b$ is assumed to be perturbed.
For the triangular system, we have the backward error bound (Higham, Theorem 8.3)
$$ \left\vert \delta A \right\vert \le \gamma_n \left\vert A \right\vert $$where only the matrix $A$ is perturbed.
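Since $b - A \hat{x} = \delta A \hat{x}$, this implies the componentwise residual bound $\left\vert b - A \hat{x} \right\vert \le \gamma_n \left\vert A \right\vert \left\vert \hat{x} \right\vert$, which can be spot-checked directly; a minimal self-contained sketch:
#Spot-check the componentwise residual bound |b - Ax̂| ≤ γ_n|A||x̂| on one sample
n, T = 10, Float64
ε = eps(real(one(T)))
γ = n*ε/(1-n*ε) #γ_n
A = convert(Matrix{T}, triu(randn(n,n)))
b = randn(n)
x̂ = A \ b
all(abs(b - A*x̂) .<= γ*abs(A)*abs(x̂)) #should hold for substitution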
In [8]:
#Define the Skeel condition numbers
cond_skeel(A::AbstractMatrix) = norm(abs(inv(A))*abs(A), Inf)
cond_skeel(A::AbstractMatrix, x::AbstractVector) = norm(abs(inv(A))*abs(A)*abs(x), Inf)/norm(x, Inf)
Out[8]:
In [9]:
t=10^4
n=10
T=Float64
b=randn(n)
c=zeros(t) #log10 of magic numbers
r=zeros(t) #ratio of lhs/rhs in inequality
ε=eps(real(one(T)))
γ = n*ε/(1-n*ε) #γ_n
for i=1:t
    A = convert(Matrix{T}, triu(randn(n,n)))
    x̂ = A \ b #the computed solution; we called this x earlier
    c[i] = max(log10(abs(norm(A*x̂)-norm(b))/ε), -2) #arbitrary floor for plotting 0 on log scale
    bigA = big(A)
    x = bigA \ b #reference solution in BigFloat precision, effectively exact
    lhs = norm(x̂-x,Inf)/norm(x,Inf)
    rhs = cond_skeel(bigA, x)*γ/(1-cond_skeel(bigA)*γ)
    r[i] = lhs/rhs
end
In [10]:
plot(x=c, y=r, Guide.XLabel("log10(c)"), Guide.YLabel("lhs/rhs"))
Out[10]:
There doesn't seem to be much correlation between the originally coded test and the test using the forward error bound. But at least the forward error bound test appears robust: every point satisfies lhs ≤ rhs.
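This can be verified directly on the data from the loop above:
@show all(r .<= 1), maximum(r) #no sample violates the forward error bound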
In [11]:
e, y = hist(r, 100) #bin edges and counts
x = (e[1:end-1]+e[2:end])/2 #bin midpoints
plot(x=x, y=y/(t*(x[2]-x[1])), Geom.bar, Guide.XLabel("lhs/rhs"), Guide.YLabel("Density"),
    xintercept=[1.0], Geom.vline(color="orange"))
Out[11]:
In [12]:
mean(r), std(r)/sqrt(t-1) #mean with its standard error
Out[12]:
The inequality seems to be fairly loose, but at least nothing violates it.
From the implicit definition of the perturbed matrix $\delta A$ in
$$ (A + \delta A) \hat{x} = b $$we can rearrange this to get
$$ \delta A \hat{x} = b - A \hat{x}. $$Taking norms, and using the componentwise bound $\left\vert \delta A \right\vert \le \gamma_n \left\vert A \right\vert$, gives an inequality on the residual: $$ \left\Vert b - A \hat{x} \right\Vert = \left\Vert \delta A \hat{x} \right\Vert \le \left\Vert \left\vert \delta A \right\vert \left\vert \hat{x} \right\vert \right\Vert \le \gamma_n \left\Vert \left\vert A \right\vert \left\vert \hat{x} \right\vert \right\Vert \le \gamma_n \left\Vert A \right\Vert \left\Vert \hat{x} \right\Vert. $$
In [13]:
t=10^4
n=10
T=Float64
b=randn(n)
c=zeros(t) #log10 of magic numbers
r=zeros(t) #ratio of lhs/rhs in inequality
ε=eps(real(one(T)))
γ = n*ε/(1-n*ε) #γ_n
for i=1:t
    A = convert(Matrix{T}, triu(randn(n,n)))
    x = A \ b
    c[i] = max(log10(abs(norm(A*x)-norm(b))/ε), -2) #arbitrary floor for plotting 0 on log scale
    resid = norm(b - A*x) #norm of the residual
    rhs = γ * norm(A) * norm(x)
    r[i] = resid/rhs
end
In [14]:
plot(x=c, y=r, Guide.XLabel("log10(c)"), Guide.YLabel("lhs/rhs"))
Out[14]:
The original Julia test doesn't seem to correlate with the test derived from the backward error result either. But at least none of the samples violate the bound so far.
In [15]:
e, y = hist(r, 25) #bin edges and counts
x = (e[1:end-1]+e[2:end])/2 #bin midpoints
plot(x=x, y=y/(t*(x[2]-x[1])), Geom.bar, Guide.XLabel("lhs/rhs"), Guide.YLabel("Density"))
Out[15]:
In [16]:
mean(r), std(r)/sqrt(t-1) #mean with its standard error
Out[16]: