LU decomposition crashes Windows for 35x35 matrices #2124
Comments
Cc: @xianyi |
Might be a stack issue. Wouldn't be surprised. |
Is it an openblas bug, or something in our windows port? |
Can't tell yet. I can give you a windows VM if you want to debug. |
I should add that this seems to be limited to older Windows machines. The example is from Windows Server 2003. |
Don't have a test machine for that yet. Will set up a couple of VMs for it on julia.mit.edu |
I am able to get the same crash on Windows Server 2008, but to do so I need a 65x65 matrix. I cannot crash Julia on Windows 8. |
What's your compiler version? GCC 4.7? Xianyi |
+1 |
can someone tell me how to run the equivalent of |
|
Hi @ViralBShah, we are on Chinese New Year holiday. I think we can address this issue next week. Xianyi |
Ok. Have fun! Let me know if I should file this as an issue on openblas. |
I think we should ship the Windows version with Reference BLAS, if we can't get ATLAS working in the meanwhile, and until OpenBLAS can be stabilized. |
@andreasnoackjensen We should probably add some of these windows crashes as tests in |
Hi all, I don't know why it calls csyr in I just uploaded a simple dgetrf sample to gist https://gist.github.com/xianyi/4771129 Xianyi |
No, it is not so obvious why csyr is called. However, the problem again seems to be related to multithreading. If I set the number of threads to one, I don't get the error. |
Hi @andreasnoackjensen, is it 32 bit or 64 bit? Could you try the OpenBLAS develop branch? Could you try my dgetrf test https://gist.github.com/xianyi/4771129 ? Thank you, Xianyi |
Hi @xianyi, It was on a Windows Server 2008 64 bit machine, but I don't know much about the Windows build of Julia. Therefore I cannot try a build with the develop branch. Maybe @loladiro and @vtjnash can help here. I'll see if I can run your example, but I don't have access to a Windows machine with privileges to install programs. |
i added comments to xianyi's gist. current workaround for julia may be to add |
@xianyi I've narrowed this down to the stack being corrupted by the line in your gist:
|
Does this happen only in LU, or does it happen for other decompositions too? |
I have tested the other factorizations and the problem seems to be for LU only. However, that includes the solution of a general linear system which also crashes Julia. |
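For readers unfamiliar with why a crash in LU also breaks general linear solves: the solve factors the matrix as P·A = L·U and then does two triangular substitutions, so it goes through the same factorization routine. The following is a minimal pure-Python sketch of that relationship. It is not the OpenBLAS/LAPACK code and not the gist's test; the names `lu_factor` and `lu_solve` are hypothetical, and the pivot list is a plain row permutation rather than LAPACK's `ipiv` format.

```python
def lu_factor(a):
    """Factor a square matrix with partial pivoting.

    Returns (lu, piv): lu packs the unit-lower factor L (below the diagonal)
    and U (on and above the diagonal); piv[i] is the original row index that
    ended up in row i.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy
    piv = list(range(n))
    for k in range(n):
        # Choose the row with the largest absolute entry in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular")
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
            piv[k], piv[p] = piv[p], piv[k]
        # Eliminate the entries below the pivot.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    y = [b[piv[i]] for i in range(n)]  # apply the row permutation to b
    # Forward substitution with the unit-lower factor L.
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    # Back substitution with the upper factor U.
    x = y[:]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x
```

Calling `lu_solve(*lu_factor(a), b)` on a 35x35 random matrix mirrors the shape of the failing case, though this sketch is single-threaded and uses almost no stack, so it says nothing about the crash itself.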
@vtjnash Let's set number of threads to 1 on windows if that will solve the immediate release issue. |
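One way to try the single-thread workaround without rebuilding anything is to launch the process with `OPENBLAS_NUM_THREADS=1` in its environment, the same variable discussed later in this thread. Below is a rough sketch using only the Python standard library; the helper name `run_single_threaded` is hypothetical, and whether a given Julia build picks the variable up is an assumption to verify on the affected machine.

```python
import os
import subprocess
import sys


def run_single_threaded(cmd):
    """Run cmd with OPENBLAS_NUM_THREADS=1 added to a copy of the environment."""
    env = dict(os.environ)  # copy, so the parent process is left unchanged
    env["OPENBLAS_NUM_THREADS"] = "1"
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)
```

For example, `run_single_threaded(["julia", "-e", "lufact(rand(35,35))"])` would run the failing call under the workaround, assuming `julia` is on the PATH.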
@vtjnash, I also added a comment in my gist. Xianyi |
CBLAS does get linked into the openblas used by julia. |
Bumping to post 0.1. |
@xianyi Would it be possible to fix this in a few days? If so, we can build julia windows binaries with openblas now that we have released 0.1. |
@zchothia Could you investigate this issue? Thank you. |
Hi @vtjnash, I read your comments in my gist. However, when I built OpenBLAS on Linux and ran test_dgetrf on Windows, I didn't hit the segfault on Windows. What are the i686-w64-mingw32-gcc version on Linux and the gcc version on Windows? Thank you, Xianyi |
built with max OPENBLAS_NUM_THREADS of 80 tested with
and
and (which is really the same as the first one):
Oh, and my machine is a VMware instance with 2 processors (sometimes 4) running on a Core i7 2620m with 4 processors (all x86_64 / 64-bit). Are any of these make flags for openblas potentially at fault (or insufficient)?
|
Your i686-w64-mingw32-gcc is version 4.6. Did you use gcc 4.6 on Windows? I remember that 4.6 and 4.7 have different calling conventions on Windows. Xianyi |
IIUC, it appears that only the calling convention of C++11 changed: https://gcc.gnu.org/gcc-4.7/changes.html. |
Hi @vtjnash, please give me access to the VM. I cannot reproduce this bug on my machine :( Xianyi |
@xianyi I haven't started it yet (I think I need to find my windows install disk). However, I just identified the problem as stack overflow. The default stack on windows is 1MB, increasing it to 16MB fixes the problem ( |
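The general idea of giving stack-hungry code more room can be illustrated outside Julia. The sketch below uses Python's standard `threading.stack_size` to run a function on a thread with a 16MB stack; it is only an analogy for the diagnosis above, not the change that was applied to the Julia build, and the helper name `run_with_stack` is hypothetical.

```python
import threading


def run_with_stack(func, stack_bytes=16 * 1024 * 1024):
    """Run func() on a new thread with the requested stack size; return its result."""
    result = {}

    def target():
        result["value"] = func()

    # stack_size() affects threads created afterwards and returns the old setting.
    old = threading.stack_size(stack_bytes)
    try:
        worker = threading.Thread(target=target)
        worker.start()
        worker.join()
    finally:
        threading.stack_size(old)  # restore the previous setting
    return result.get("value")
```

If `func` raises, the helper returns `None`; a fuller version would capture and re-raise the exception in the caller.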
Julia itself can use quite a bit of stack space; can we bump the default to 8MB on windows (if that's enough to fix this)? |
16MB was enough to bump the max number of openblas threads up to somewhere between 10 and 60, then we run into some other segfault (which appears to be caused by a null pointer) |
note: fixing #1971 converted this segfault into a julia stack overflow exception for OPENBLAS_NUM_THREADS<30 (or so) at which point it turns into a MemoryError (or an openblas/lapack crash?) |
…mething much simpler (wasn't essential for this). closes #1971
(edit: now named `lufact(rand(33,33))`)
`julia> lud(rand(33))` ok! but (see also #1543)