StackOverflowError trying to throw OutOfGPUMemoryError, subsequent errors #2292
Comments
Update: it was not unending, it just took a very, very long time to clear the buffer of all the errors. Then the REPL was fine. Except a new StackOverflowError randomly appeared while the prompt was just sitting there. Task Manager shows the GPU memory is full.
Title changed from "StackOverflowError trying to throw OutOfGPUMemoryError, REPL broken" to "StackOverflowError trying to throw OutOfGPUMemoryError, subsequent errors"
Related: Deliberately calling
The problem is just that the OOM handling itself triggers an OOM. I don't see how that would imply that the entire GPU GC handling is broken.
The fact that I have to manually free GPU arrays to avoid OOM errors at all does, though. Separate issue, to be sure.
Not necessarily. When we encounter an OOM, we trigger GC to free up memory. If we hit a stack overflow first, that GC step never happens.
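To illustrate the point about GC being triggered on OOM, here is a minimal sketch in plain Julia. It is not CUDA.jl's implementation: `FakeOOMError`, `FakePool`, `collect_garbage!`, `raw_alloc!` and `alloc_with_gc!` are hypothetical names made up for this example, and the "pool" is just a pair of counters. It only shows the shape of the recovery path: an out-of-memory error leads to a collection and one retry, while any other exception (for example a StackOverflowError) propagates immediately, so the collection never runs.

```julia
# Hypothetical stand-ins for illustration; none of these names are CUDA.jl API.
struct FakeOOMError <: Exception
    requested::Int
end

# A toy memory pool: fixed capacity, bytes in use, and how many of the
# used bytes belong to unreachable objects that a collection could release.
mutable struct FakePool
    capacity::Int
    used::Int
    garbage::Int
end

# Pretend garbage collection: release everything that is garbage.
function collect_garbage!(pool::FakePool)
    pool.used -= pool.garbage
    pool.garbage = 0
    return pool
end

# Raw allocation: throws FakeOOMError when the request does not fit.
function raw_alloc!(pool::FakePool, bytes::Int)
    pool.used + bytes > pool.capacity && throw(FakeOOMError(bytes))
    pool.used += bytes
    return bytes
end

# Allocation with recovery: on OOM, collect and retry once.
# Any other exception is rethrown untouched, so the collection step
# is skipped entirely in that case.
function alloc_with_gc!(pool::FakePool, bytes::Int)
    try
        return raw_alloc!(pool, bytes)
    catch err
        err isa FakeOOMError || rethrow()
        collect_garbage!(pool)
        return raw_alloc!(pool, bytes)
    end
end

# 90 of 100 bytes are in use, 60 of them garbage: the first attempt fails,
# the collection frees 60 bytes, and the retry succeeds.
pool = FakePool(100, 90, 60)
alloc_with_gc!(pool, 50)
```

If the pool holds no garbage, the retry fails as well and the out-of-memory error reaches the caller, which matches the case where memory really is exhausted.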
Ohhh, okay, I understand. Thank you. Is
I didn't expect that the
Please test #2299
Describe the bug
My neural network was training smoothly, but then I got a StackOverflowError. The stack trace suggests this was caused by OutOfGPUMemoryError being thrown.
I can't show the original stack trace, because when I went to check the CUDA version via Pkg, the screen started being flooded with more StackOverflowErrors unendingly with "Error while freeing DeviceBuffer(_____)", where the details of the buffer are different each time. The REPL is broken.
The subsequent errors have this form:
To reproduce
Not available.
Manifest.toml
Expected behavior
These errors should not be causing StackOverflowErrors, and the REPL should be usable.
Version info
Details on Julia:
Details on CUDA:
Additional context
none