Intermittent LapackError(4) in test/linalg.jl; #1652

Closed
staticfloat opened this issue Dec 2, 2012 · 12 comments
Labels
kind:bug Indicates an unexpected problem or unintended behavior

Comments

@staticfloat
Sponsor Member

I get this intermittently on 32-bit Ubuntu 12.04, and this has shown up a few times on Travis:

LapackException(4)
 in syev! at lapack.jl:1241
 in anonymous at no file:341
 in load_now at util.jl:228
 in load_now at util.jl:240
 in require at util.jl:176
 in runtests at runtests.jl:4
 in anonymous at no file:57
 in include at boot.jl:237
 in process_options at client.jl:199
 in _start at client.jl:255
at /home/ubuntu/src/julia/test/linalg.jl:28

Any ideas? I'm not honestly sure what this error means.

@andreasnoack
Member

Not really. It is a convergence failure in the eigendecomposition, but syev! is not used in any Julia functions right now. It is only called in the linalg tests (and not on line 28) to ensure that the new syevr! produces the same values as the old syev!. However, the seed is set in the test, so the computation should be identical in every run and the failure therefore deterministic.
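The reasoning about the seed can be checked directly. The following is a minimal sketch in current Julia (1.x) syntax, not the 2012 syntax used in this thread; the seed value, the matrix size, and the use of `eigvals` on a `Symmetric` wrapper are assumptions made for illustration, not the contents of test/linalg.jl.

```julia
using Random, LinearAlgebra

# Hypothetical seed and size, chosen only for illustration.
Random.seed!(1234)
a1 = randn(4, 4)
Random.seed!(1234)
a2 = randn(4, 4)

# Re-seeding reproduces exactly the same input matrix.
println(a1 == a2)

# Symmetrize the input and run the symmetric eigensolver twice.
s = a1 + a1'
vals1 = eigvals(Symmetric(s))
vals2 = eigvals(Symmetric(s))

# Largest absolute difference between the two runs.
println(maximum(abs.(vals1 .- vals2)))
```

If the inputs are bit-identical and the results still differ or fail between runs, the nondeterminism has to come from below the test itself, for example from the LAPACK/BLAS build.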

@andreasnoack
Member

We can try to change the seed in the test and see if that solves it.

@staticfloat
Sponsor Member Author

I tried changing the seed to a few different values, and it didn't do much except cause different tests to fail at times.

I find it very strange that the LAPACK method fails sometimes and not others for the same input. Could this be a LAPACK bug? Would it help if I provided the actual matrices it is failing on?

@staticfloat
Sponsor Member Author

I am also getting the following error every now and then:

assertion failed: :( (r==chol(apd)) )
 in anonymous at no file:14
 in anonymous at no file:6
 in load_now at util.jl:228
 in load_now at util.jl:240
 in require at util.jl:176
 in runtests at runtests.jl:4
 in anonymous at no file:57
 in include at boot.jl:237
 in process_options at client.jl:199
 in _start at client.jl:255
at /home/ubuntu/src/julia/test/linalg.jl:103

If I put the following before the @assert:

println( r )
println( apd )
println( r - chol(apd) )
println( sum( r - chol(apd) ) )

I (sometimes!) see the following:

4x4 Complex64 Array:
 0.742501f0+0.0f0im  0.442769f0+0.0f0im  …  0.647775f0+0.0f0im
      0.0f0+0.0f0im  0.633343f0+0.0f0im     0.738515f0+0.0f0im
      0.0f0+0.0f0im       0.0f0+0.0f0im     0.141621f0+0.0f0im
      0.0f0+0.0f0im       0.0f0+0.0f0im     0.233825f0+0.0f0im
4x4 Complex64 Array:
 0.551307f0+0.0f0im  0.328756f0+0.0f0im  …  0.480973f0+0.0f0im
 0.328756f0-0.0f0im  0.597168f0+0.0f0im     0.754548f0+0.0f0im
 0.948766f0-0.0f0im   0.87534f0-0.0f0im      1.23537f0+0.0f0im
 0.480973f0-0.0f0im  0.754548f0-0.0f0im      1.03975f0+0.0f0im
4x4 Complex64 Array:
 0.0f0+0.0f0im  0.0f0+0.0f0im  0.0f0+0.0f0im      0.0f0+0.0f0im
 0.0f0+0.0f0im  0.0f0+0.0f0im  0.0f0+0.0f0im      0.0f0+0.0f0im
 0.0f0+0.0f0im  0.0f0+0.0f0im  0.0f0+0.0f0im      0.0f0+0.0f0im
 0.0f0+0.0f0im  0.0f0+0.0f0im  0.0f0+0.0f0im  1.3411f-7+0.0f0im
1.3411045f-7

That last element randomly flips from 0 to that value, and throws the test off. Is 1e-7 a good floating-point floor value? Also, I don't know if this affects anything, but your "complex matrices" are purely real, because they're being instantiated from a single randn() call.
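To put the question about 1e-7 in context, the observed difference can be compared with the machine epsilon of single precision. This is a small sketch in current Julia (1.x); `observed` is copied from the output above, and treating a small multiple of `eps(Float32)` as a tolerance is a common convention assumed here, not something prescribed by the test suite.

```julia
# Value printed by the debugging output above.
observed = 1.3411045f-7

# Machine epsilon for Float32: the gap between 1.0f0 and the next larger Float32.
single_eps = eps(Float32)

println(single_eps)             # about 1.19e-7
println(observed / single_eps)  # how many "epsilons" the difference amounts to

# The observed difference is within a couple of epsilons, i.e. at the
# rounding level of single precision for values of order one.
println(observed < 2 * single_eps)
```

So a difference of this size is what ordinary rounding in Float32 or Complex64 arithmetic looks like; whether a fixed floor such as 1e-7 is adequate depends on the magnitude of the entries being compared.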

@andreasnoack
Member

Thank you for the info. It is still weird though. It would be helpful if you could show me the matrix that triggers the syev! error.

The assertion failed: :( (r==chol(apd)) ) error is a floating-point error, I think. r and chol(apd) come from two different calls to LAPACK and can therefore differ by rounding error. I'll change that test.
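One way such a test could be changed is to compare with a tolerance instead of `==`. The sketch below uses current Julia (1.x), where `cholesky` replaces the `chol` of this thread; the seed, the way `apd` is built, and the tolerance of `10 * eps(Float32)` are all assumptions for illustration, not the change that was actually made to test/linalg.jl.

```julia
using LinearAlgebra, Random

Random.seed!(1)                      # hypothetical seed
a = randn(Float32, 4, 4)
apd = a' * a + 4f0 * I               # symmetric positive definite by construction

# Two separate factorizations of the same matrix.
r1 = Matrix(cholesky(Hermitian(apd)).U)
r2 = Matrix(cholesky(Hermitian(apd)).U)

# Exact equality is fragile if the two calls round differently;
# a relative tolerance of a few epsilons is more robust.
close_enough = isapprox(r1, r2; rtol = 10 * eps(Float32))
println(close_enough)

# Sanity check: the factor reproduces the original matrix up to rounding.
println(isapprox(r1' * r1, apd; rtol = 1f-4))
```

The tolerance is relative to the norm of the matrices, so it scales with the size of the entries rather than relying on a fixed absolute floor.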

The zero imaginary parts of the complex matrices are intentional. I just wanted to call all the versions of the LAPACK functions, and it is easier to check the results when they are identical across the LapackType.
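The idea of running one real-valued input through every element type can be sketched as follows, in current Julia (1.x). The list of four types stands in for the `LapackType` mentioned above, and the seed, the size, and the choice of `eigvals` as the routine under test are assumptions for illustration.

```julia
using LinearAlgebra, Random

Random.seed!(42)                     # hypothetical seed
areal = randn(4, 4)
asym = areal + areal'                # symmetric, so eigenvalues are real

results = Dict{DataType, Vector{Float64}}()
for T in (Float32, Float64, ComplexF32, ComplexF64)
    a = convert(Matrix{T}, asym)
    # Complex copies carry an imaginary part that is exactly zero.
    @assert all(iszero, imag(a))
    # Store eigenvalues in Float64 so the types can be compared directly.
    results[T] = Float64.(eigvals(Hermitian(a)))
end

# Every element type should give the same eigenvalues up to single-precision rounding.
reference = results[Float64]
for (T, vals) in results
    println(T, " agrees: ", isapprox(vals, reference; rtol = 1e-4))
end
```

Because the inputs differ only in element type, any disagreement beyond rounding points at the specific LAPACK variant rather than at the test data.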

@staticfloat
Sponsor Member Author

Try as I might, I can't get it to break anymore. Let's leave this open for a few days in case anyone else runs into this, and if I'm still unable to reproduce, I'll close it.

@pao
Member

pao commented Dec 5, 2012

Just happened to me in https://travis-ci.org/JuliaLang/julia/jobs/3507555, and I see that @andreasnoackjensen mentioned the other one I filed as #1682 (log at https://travis-ci.org/JuliaLang/julia/jobs/3506379).

@pao
Member

pao commented Dec 6, 2012

Not actually closed by e8ce0df.

@pao pao reopened this Dec 6, 2012
@kmsquire
Member

Not sure if this is related or a new issue, but I'm getting this consistently:

~/src/julia$ make test-linalg
    JULIA test/linalg
     * linalg
BoundsError()
 in check_bounds at abstractarray.jl:76
 in check_bounds at abstractarray.jl:111
 in ref at array.jl:326
 in \ at linalg_dense.jl:500
 in anonymous at no file:32
 in include at boot.jl:248
at /home/kmsquire/Source/julia/test/linalg.jl:104
make[1]: *** [linalg] Error 1
make: *** [test-linalg] Error 2

If it helps, a couple of times, I got Lapack errors instead:

~/src/julia$ make test-linalg
    JULIA test/linalg
     * linalg
LapackException(47317654700032)
 in potrs! at lapack.jl:1009
 in \ at linalg_dense.jl:447
 in anonymous at no file:32
 in include at boot.jl:248
at /home/kmsquire/Source/julia/test/linalg.jl:104
make[1]: *** [linalg] Error 1
make: *** [test-linalg] Error 2

(also LapackException(-4294967296)).

I've since done a make cleanall; make, and the Lapack exceptions are not appearing.

@kmsquire
Member

Rebuilding all deps fixed everything for me. Probably related to not rebuilding after recent 64-bit build-dep changes.
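The two odd exception codes are consistent with that explanation. The arithmetic below (current Julia 1.x syntax) splits each reported value into 32-bit halves; the helper `split_info` is hypothetical, and reading the result as "a 32-bit info of 0 stored in a 64-bit slot with stale upper bits" is an assumption that fits the 64-bit build-dep remark, not something confirmed in this thread.

```julia
# Hypothetical helper: split a 64-bit value into its low and high 32-bit halves.
split_info(info::Int64) = (low = info % Int32, high = (info >> 32) % Int32)

for info in (47317654700032, -4294967296)
    parts = split_info(Int64(info))
    println(info, " -> low = ", parts.low, ", high = ", parts.high)
end
# 47317654700032 -> low = 0, high = 11017
# -4294967296 -> low = 0, high = -1
```

In both cases the low 32 bits are zero, which is the value LAPACK uses for success, while only the upper half is nonzero. That pattern is what a mismatch between 32-bit and 64-bit integer conventions would produce, and it would go away after a clean rebuild of the dependencies.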

@andreasnoack
Member

Is this issue still relevant?

@vtjnash
Sponsor Member

vtjnash commented Feb 2, 2013

I don't think the Travis build has failed recently because of it, but then Travis has been failing to run quite a bit lately.

The failure assertion failed: :( (r==chol(apd)) ) was not actually fixed; the check was just removed from the test suite.

@vtjnash vtjnash closed this as completed Feb 2, 2013

5 participants