
Saner error bounds on linear algebra tests #5605

Closed
jiahao opened this issue Jan 29, 2014 · 28 comments
Labels: domain:linear algebra · test

Comments

@jiahao
Member

jiahao commented Jan 29, 2014

In the current linear algebra test suite, many tests use heuristically chosen error bounds, e.g. something like N^2*eps(). This occasionally causes brittleness, in that the linear algebra tests fail for no apparent reason.
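For concreteness, a heuristic bound of this flavor looks roughly like the following sketch (modern Julia syntax; the size `n` and the check itself are illustrative, not the actual test code):

```julia
using LinearAlgebra, Random

Random.seed!(1234)               # the suite pins a "magic seed" like this
n = 10
a = rand(n, n)
b = rand(n)
x = a \ b

# Heuristic tolerance of the N^2*eps() flavor: the exponent and the
# implied constant are ad hoc, not derived from a perturbation analysis.
tol = n^2 * eps(Float64)
relative_residual = norm(a*x - b) / norm(b)

# Whether this passes depends on the condition number of the sampled
# matrix, which is exactly the brittleness described above.
passes = relative_residual <= tol
```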

The purpose of this issue is to track

  • Specific linear algebra tests which are brittle and have had their tolerances adjusted manually for no apparent reason
  • The brittleness of the magic seed that is currently used in the linear algebra test suite.
  • Thoughts on an appropriate fuzz factor for linear algebraic tests
  • The implementation of proper error bounds on linear algebra tests.

The long term hope is that the last action item will close this issue once and for all, but in the meantime we should keep track of any heuristic adjustments to the test suite.

Cross-reference: #5578 #5472 #4638

@staticfloat
Sponsor Member

Shouldn't all error bounds be in terms of eps and the number of instructions that could cause an error of eps in the output? I can't think of another way to determine the error bounds, other than just running our tests with a large sample pool and measuring them. But that seems rather self-defeating, as we want to ensure our tests are accurate, and logically our operations should have some finite, calculable bound. Perhaps our reasoning is wrong on some of these bounds?
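For a single inner product, the operation-count bound is classical: the rounding error is at most γ_n·Σ|x_i||y_i|, with γ_n = n·eps/(1 − n·eps) (Higham's notation). A minimal sketch of checking a computed result against that bound:

```julia
using Random

Random.seed!(1)
n = 1000
x = rand(n); y = rand(n)

# gamma_n = n*eps/(1 - n*eps) bounds the rounding error of an n-term
# recursive inner product (Higham, "Accuracy and Stability of
# Numerical Algorithms").
gamma_n = n * eps(Float64) / (1 - n * eps(Float64))
bound = gamma_n * sum(abs.(x) .* abs.(y))

computed  = sum(x .* y)                        # Float64 accumulation
reference = Float64(sum(big.(x) .* big.(y)))   # extended-precision reference
@assert abs(computed - reference) <= bound     # holds deterministically
```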

@jiahao
Member Author

jiahao commented Jan 29, 2014

Our reasoning is wrong on almost all of these bounds; they are either too strict or too lenient (and sometimes both), and the ones for which the error bounds are too strict occasionally break when the randomly sampled matrices change, which causes the test suite to die.

mschauer referenced this issue Jan 31, 2014
…y reflectors. Add Float16 and BigFloat to tests and test promotion. Cleaup promotion in LinAlg. Avoid promotion in mutating ! functions. Make Symmetric, Hermitian and QRs immutable. Make thin a keyword argument in svdfact. Remove cond argument in sqrtm.
@rickhg12hs
Contributor

Error update (XRef #5472):

$ ./julia -e 'versioninfo()'
Julia Version 0.3.0-prerelease+1347
Commit 896dcc9* (2014-01-31 12:34 UTC)
Platform Info:
  System: Linux (i686-redhat-linux)
  CPU: Genuine Intel(R) CPU           T2250  @ 1.73GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm

$ make test-linalg
    JULIA test/linalg
     * linalg
exception on 1: ERROR: assertion failed: |:(*(triu(a),x)) - b| <= 0.0017881393432617188
  :(*(triu(a),x)) = 4.000061    0.99801636
4.9999237       3.0001984
3.0004425       5.000412
5.000763        2.9999542
1.0000076       3.0000153
5.0000305       3.0000916
1.9999999       3.9999993
2       4
3       4
3       4

  b = 4 1
5       3
3       5
5       3
1       3
5       3
2       4
2       4
3       4
3       4

  difference = 0.0019836426 > 0.0017881393432617188
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:183
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
while loading linalg.jl, in expression starting on line 23
ERROR: assertion failed: |:(*(triu(a),x)) - b| <= 0.0017881393432617188
  :(*(triu(a),x)) = 4.000061    0.99801636
4.9999237       3.0001984
3.0004425       5.000412
5.000763        2.9999542
1.0000076       3.0000153
5.0000305       3.0000916
1.9999999       3.9999993
2       4
3       4
3       4

  b = 4 1
5       3
3       5
5       3
1       3
5       3
2       4
2       4
3       4
3       4

  difference = 0.0019836426 > 0.0017881393432617188
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:183
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
while loading linalg.jl, in expression starting on line 23
while loading /usr/local/src/julia/julia/test/runtests.jl, in expression starting on line 23

make[1]: *** [linalg] Error 1
make: *** [test-linalg] Error 2


@mschauer
Contributor

Thanks, exactly what I thought. This is #5526; I just made a comment there.

@andreasnoack
Member

Okay. Short term fix is that I install a 32 bit VB and adjust these.

@mschauer
Contributor

VB will be overkill, see my comment there.

jiahao added a commit that referenced this issue Feb 2, 2014
Implements two error bounds from [LAWNS 14](https://www.netlib.org/lapack/lawnspdf/lawn14.pdf) concerning the elementwise reconstruction of a matrix from its Cholesky factorization, and the error bounds of performing a linear solve using the Cholesky factors. (#5605)
jiahao added a commit that referenced this issue Feb 4, 2014
Replaces magic numbers in error bounds for general dense and triangular solve tests.

Fix #5678, ref #5605
@andreasnoack
Member

I'll continue the conversation from #5678 here.

Most of the linear algebra tests look superficially very similar and fail similarly, which is a problem.

Can you elaborate on that?

Better error bounds would be welcome, but the tests have actually worked pretty well for a long time. Recently, I included Float16 in the tests, but because the values get promoted before testing, the criterion in @test_approx_eq calculated bounds that were too small. Therefore, I had to introduce the ad hoc bounds that are causing all the problems now.

There are many tests and it will take some time to write correct error bounds for all of them. Hence, as a short term solution I don't think it is a problem just to widen the error bounds and get the tests to pass again. The primary reason for the tests is to detect when we break something.

@stevengj
Member

stevengj commented Feb 7, 2014

As I commented in #5705, the root problem here is not the error bounds, but rather the fact that the tests use a random matrix. Eventually, you will always hit a badly conditioned matrix by accident.

I would just use a deterministic matrix for the tests. Then you know how accurately the solution should be computed, and can set error bounds relative to that once and for all.

@mschauer
Contributor

mschauer commented Feb 7, 2014

I am not sure I understand you; the rng is seeded and will always produce the same matrices in the tests. srand(32); rand(Float64,2,2) is a deterministic matrix, barring changes in the random number generator.
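Indeed, a fixed seed makes the draw reproducible within a given Julia version (srand was later renamed Random.seed!; the RNG stream itself is not guaranteed stable across releases, which is the caveat mentioned above):

```julia
using Random

Random.seed!(32)                 # srand(32) in 0.3-era Julia
a1 = rand(Float64, 2, 2)
Random.seed!(32)                 # re-seeding replays the same stream
a2 = rand(Float64, 2, 2)
@assert a1 == a2                 # bit-for-bit identical draws
```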

@stevengj
Member

stevengj commented Feb 7, 2014

@mschauer, I didn't notice that the RNG has a deterministic seed in the tests. Nevermind then, that shouldn't be a problem.

@jiahao
Member Author

jiahao commented Feb 7, 2014

I really shouldn't have spent so much time on this, but I've just uploaded an IJulia notebook (nbviewer link) which does some numerical testing to find out how brittle the test case is when applied to many random matrices, and I've also done some testing to adapt the textbook forward and backward error bounds into tests that can be used to replace the magic number test in #5705.

I hope the notebook will explain clearly why I am not comfortable with continually adjusting the magic numbers. It is just pushing the issue down the road: what if we one day reorder the tests, or someone inserts a test before it, causing the test to sample a completely new random matrix?

cc: @andreasnoackjensen

@staticfloat
Sponsor Member

Thanks for the writeup, @jiahao. I think I understand better now. Also, I put in an nbviewer link for those of us too lazy to copy the notebook into our own IJulia setup.

@jiahao
Member Author

jiahao commented Feb 7, 2014

Thanks, I should remember to do that.

@StefanKarpinski
Sponsor Member

Well, I'm glad we have a random matrix expert or two around.

@jiahao
Member Author

jiahao commented Feb 7, 2014

I'm glad I'm useful for something other than supplying snacks for @JeffBezanson and @loladiro .

@mschauer
Contributor

mschauer commented Feb 7, 2014

Thank you for the notebook, jiahao. You note that changing a magic number like in #5705 "decreases the probability of failure by an essentially negligible amount". That was rather the point of the change from my perspective: it doesn't make the tests more or less brittle. It was simply the smallest change to make the tests on 32-bit systems behave similarly to 64-bit systems until a real solution is found, and so that a variant of #5578 can be merged without breaking everything.

@mschauer
Contributor

mschauer commented Feb 7, 2014

One idea: instead of plain random matrices, one could test with fixed diagonal matrices under a random orthogonal change of basis. The transform preserves the eigenvalues and condition number, so it is easier to reason about reasonable error bounds.
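A sketch of that construction (the sizes, the chosen spectrum, and the qr-based orthogonal factor are illustrative assumptions, not a prescribed recipe):

```julia
using LinearAlgebra, Random

Random.seed!(1)
n = 8
d = [10.0^k for k in 0:n-1]       # chosen eigenvalues, so cond ≈ 10^(n-1)
Q = Matrix(qr(randn(n, n)).Q)     # random orthogonal change of basis
A = Q * Diagonal(d) * Q'          # symmetric, spectrum ≈ d by construction

# The transform itself incurs roundoff, so these hold only up to a small
# relative error, but the condition number is now known by design.
@assert isapprox(cond(A), d[end] / d[1]; rtol = 1e-6)
@assert isapprox(sort(eigvals(Symmetric(A))), d; rtol = 1e-6)
```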

@rickhg12hs
Contributor

Error update (Xref #5472 ):

$ ./julia -E 'versioninfo()'
Julia Version 0.3.0-prerelease+1419
Commit a673e4c* (2014-02-06 22:55 UTC)
Platform Info:
  System: Linux (i686-redhat-linux)
  CPU: Genuine Intel(R) CPU           T2250  @ 1.73GHz
  WORD_SIZE: 32
  BLAS: libopenblas (DYNAMIC_ARCH NO_AFFINITY)
  LAPACK: libopenblas
  LIBM: libopenlibm
nothing
$ make test-linalg   # debug=true in test/linalg.jl
    JULIA test/linalg
     * linalg

[lots of passed tests ... and then ...]

type of a: Int32 type of b: Float16

(Automatic) upper Cholesky factor
exception on 1: ERROR: assertion failed: |:(det(capd)) - :(det(apd))| <= 0.002384185791015625
  :(det(capd)) = 1.626428240984628e9
  :(det(apd)) = 1.6264282409980435e9
  difference = 0.01341557502746582 > 0.002384185791015625
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:54
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
while loading linalg.jl, in expression starting on line 23
ERROR: assertion failed: |:(det(capd)) - :(det(apd))| <= 0.002384185791015625
  :(det(capd)) = 1.626428240984628e9
  :(det(apd)) = 1.6264282409980435e9
  difference = 0.01341557502746582 > 0.002384185791015625
 in error at error.jl:22
 in test_approx_eq at test.jl:68
 in anonymous at no file:54
 in runtests at /usr/local/src/julia/julia/test/testdefs.jl:5
 in anonymous at multi.jl:613
 in run_work_thunk at multi.jl:575
 in remotecall_fetch at multi.jl:647
 in remotecall_fetch at multi.jl:662
 in anonymous at multi.jl:1382
while loading linalg.jl, in expression starting on line 23
while loading /usr/local/src/julia/julia/test/runtests.jl, in expression starting on line 23

make[1]: *** [linalg] Error 1
make: *** [test-linalg] Error 2
$ 

@jiahao
Member Author

jiahao commented Feb 7, 2014

@mschauer I didn't mean to criticize the intention of #5705, but rather to point out that the smallest change to make the test pass on your system didn't fundamentally change the state of affairs. I think those of us actively discussing these issues understand this quite well, but the point of the numerical experiment was to quantify exactly what the situation looked like and hopefully give others a sense of why this is important.

@jiahao
Member Author

jiahao commented Feb 7, 2014

@mschauer we could generate matrices from random orthogonal/unitary similarity transformations of diagonal matrices, that is true, but those matrix multiplies don't come for free, and the tests would still have to account for the roundoff induced by those multiplications. I haven't thought through this case yet, but it is certainly plausible that in this particular situation the numerical stability of the rotation matrices would lead to nice behavior.

@mschauer
Contributor

mschauer commented Feb 7, 2014

The idea would be to take the result of the similarity transform "as is", that is as the random matrix we are testing with and basically forget about the way it was constructed. That doesn't change much about the integer cases though.

jiahao added a commit that referenced this issue Feb 8, 2014
- Add automatic promotion of integers to floats for condskeel
- Ref. #5605, #5705
@andreasnoack
Member

@jiahao I have followed your example and spent too much time on making a notebook with a reply.

https://nbviewer.ipython.org/gist/andreasnoackjensen/8882742

The answer is also related to #5726

@jiahao
Member Author

jiahao commented Feb 8, 2014

Thanks for the reply @andreasnoackjensen . A couple of points:

  1. I'm proposing to consider the test matrix as arbitrary rather than random. (pet peeve alert) There are actually quite a lot of tighter error bounds and results specific to Gaussian random matrices (quite a few from @alanedelman), which are our current test subjects, but I do not consider them to be relevant to the issue of testing linear algebra.
  2. The condition number is not known at the time of the computation, but it of course can be computed, and in fact can be done so fairly accurately for Float64. I've found that at lower precisions, this gets a little dicey. I used BigFloat but it is definitely overkill; explicitly casting to Float64 or Complex128 would almost certainly suffice.
  3. Note that the error bound involving Skeel condition numbers is structurally that of the constant γn = n*ε/(1-n*ε). In the derivation, Higham assumes that is always positive, which was not a bad assumption when only single and double precision were worth considering. I rederived this error bound last night (so take late night mathematics for what it is worth) and it looks like it is sufficient to just take the absolute value of the error bound (although technically this involves loosening the inequality bound yet further). This is implemented in Replace magic constant test for linear solve on triangular matrices #5726.
  4. I left this out of the version of the notebook I uploaded, but I can't find any way to justify an error bound that looks like the 'old' version being tested. The lhs of the current test is abs(norm(A*x)-norm(b)), which is exactly the lower bound of the triangle inequality as applied to the residual vector r=A*x-b, i.e. abs(norm(A*x)-norm(b)) <= norm(A*x-b) <= norm(A*x) + norm(b). Maybe it's possible to bound the leftmost side of the inequality, but to me it feels a lot more natural to just bound the residual itself.
  5. The reason why the error bounds on the forward error bound are loose is actually known; it comes from trying to abstract away the order in which the substitutions are done. (Higham, §§ 8.1-2 sketches out the derivation in considerable detail.) I was surprised, though, at how much slack this introduced into the error bound in the end. We could tighten the bounds considerably, at the cost of tying the test to a specific implementation of the linear solver.
  6. Your observation that one can sample the solution vector and reconstruct the rhs a posteriori is quite interesting. I don't think I've seen anyone propose that before. The sampling does look much more reasonable done this way. (The gap between the data and the bound is to be expected, though, since the bound demarcates the region of zero probability, and the gap is probably the region expected from possible, but very unlikely, samples.)
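The residual-based alternative in point 4 can be phrased as the standard normwise backward error, which a backward-stable solver keeps at O(n·eps) regardless of conditioning. A sketch, where the fuzz factor of 10 is my assumption, not a derived constant:

```julia
using LinearAlgebra, Random

Random.seed!(1)
n = 50
A = randn(n, n); b = randn(n)
x = A \ b

# Instead of |norm(A*x) - norm(b)| (the loose end of the triangle
# inequality), bound the residual itself via the normwise backward error
# eta = norm(r) / (norm(A)*norm(x) + norm(b))   (Rigal–Gaches).
r = A * x - b
eta = norm(r) / (norm(A) * norm(x) + norm(b))
@assert eta <= 10 * n * eps(Float64)     # fuzz factor 10 is illustrative
```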

Right now I still think that where possible, we should use theoretically justified error bounds. But adapting the textbook results has turned out to be somewhat trickier than I had expected. The test derived from the backward error bound is easy to reason about and I think it's correct. Conversely, the forward error test is tricky; the correct way to implement my version of the forward error bound test ought to account also for the error of the more precise estimator for the solution. For BigFloats I currently assume that the contribution to the error is exactly zero, when in fact for BigFloat γn ~ 2e-76. There has been a lot of literature on mixed precision calculations, which could be worth digging into, but I haven't yet unearthed results specific to the forward error bound between two estimated solutions of different precisions.

I smell an excellent paper about linear algebra testing if we want to pursue this further.

@alanedelman
Contributor

Reading real quickly now.

Lots of linear algebra error bounds:
https://www.netlib.org/lapack/lug/node72.html

Probably many are pessimistic

I have a preference for eps * (the right condition number) * (# ops/number) * (3,4,or 10)
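That recipe, applied to the forward error of a dense solve with a known solution, might look like the following sketch (the factor 10, the size, and the construction of the rhs are illustrative assumptions):

```julia
using LinearAlgebra, Random

Random.seed!(2)
n = 40
A = randn(n, n)
x_true = randn(n)
b = A * x_true                 # construct the rhs from a known solution
x = A \ b

# eps * (the right condition number) * (op-count factor) * small constant
fuzz = 10
bound = fuzz * n * eps(Float64) * cond(A)
@assert norm(x - x_true) / norm(x_true) <= bound
```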

lapack has lots of discussion about testing, good test matrices, etc
not that things can't be improved, but decades of good thinking has happened already

random matrices, i feel, are good for a few tests, but not overused

something i mentioned and often gets quoted:

We want to convey that random matrices are very special matrices. It is a mistake to link psychologically a random matrix with the intuitive notion of a ‘typical’ matrix or the vague concept of ‘any old matrix’. In fact, the larger the size of the matrix the more predictable it becomes.

@andreasnoack
Member

@jiahao

  1. If the distribution has unbounded support and any test failure is problematic, then I think arbitrariness and randomness can be considered the same. However, I see your point; we don't want to exploit probability results in these tests.
  2. I guess you only know an estimate of the condition number, but with a negligible error, right? In theory, for a very tight bound the inequality could be violated because of the estimation error in the condition number. This is not important, but I just wanted to differentiate knowns and unknowns.
  3. To my great frustration, my library account doesn't allow me to read SIAM material, so I haven't read Higham. With that reservation, I don't think γ is the problem in itself. Rather, it is the 1-cond(A)γ denominator, and I don't think the absolute value will suffice as a solution when the matrices are very ill conditioned. Please have a look here.
  4. I completely agree. I don't remember why it was like that before. The revisions of test_approx_eq I have done, have been completely theory ignorant.
  5. This has been a concern to me. There is a size/power trade off. If the bounds get too loose, we avoid false positives at the cost of not detecting true errors.
  6. I am a little surprised, as I thought it was the most obvious thing to do if you wanted to measure the error. I agree that a gap is to be expected. The important thing is the correlation ensuring that the error and the bound move together when the condition number gets big. You don't have that effect in the BigFloat solution, and I think that is problematic.

Let's read more of the LAPACK manual and the LAWNs and take what's there.

@jiahao
Member Author

jiahao commented Feb 11, 2014

I guess you only know an estimate of the condition number, but with a negligible error, right?

At some level, the distinction between that and the best possible answer that can be computed within floating-point roundoff becomes meaningless. For the dense matrix case, I think cond computes the full SVD and takes the ratio of the largest to smallest singular value, which is by definition the "classical" condition number. SVD itself isn't always that stable numerically...
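That is indeed what the dense path amounts to: the 2-norm condition number is the ratio of the extreme singular values. A minimal sketch using svdvals:

```julia
using LinearAlgebra

A = [4.0 1.0; 2.0 3.0]
s = svdvals(A)                 # singular values, sorted largest first

# cond(A) in the 2-norm is sigma_max / sigma_min
@assert isapprox(cond(A), s[1] / s[end])
```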

@KristofferC
Sponsor Member

With the guardsrand changes and the inactivity here, I think this can be closed.

9 participants