Task scheduling can result in delays when synchronizing #1525

Closed
maleadt opened this issue May 26, 2022 · 6 comments
Labels
bug Something isn't working

Comments

@maleadt
Member

maleadt commented May 26, 2022

See https://discourse.julialang.org/t/occasional-long-delays-in-cuda-jl/81545

@maleadt maleadt added the bug Something isn't working label May 26, 2022
@physicsjoke

physicsjoke commented Feb 13, 2023

I think I might have a similar problem. While testing some code that we ported from CPU, including a few kernels to draw random numbers and some array operations, Julia reproducibly gets stuck and needs to be killed/interrupted using Ctrl+C.
Nsys shows that the computations finish within seconds, with very short kernel execution times. I'd be glad to provide more info as needed.
[Screenshot: Screenshot_cuda_hangs]

I tried this with and without the nonblocking_synchronize line commented out, as suggested in that thread. The behaviour is the same.
It occurs both when using the Pkg.test mechanism and when including the "runtests.jl" file we made for the project.

@maleadt
Member Author

maleadt commented Feb 13, 2023

> I tried this with and without the nonblocking_synchronize line commented out, as suggested in that thread. The behaviour is the same.

Then this is a different issue. Julia should always respond to Ctrl-C, especially in the case of this issue. If it doesn't, attach GDB and see where it's stuck.

@physicsjoke

It does respond to Ctrl-C, but the test should finish, right?

@maleadt
Member Author

maleadt commented Feb 13, 2023

Ah, I misread. It cleanly reports a shutdown, right? Because if you mash Ctrl-C, you can get Julia killed uncleanly.

What I'm looking for is whether there's a task that just doesn't finish (in which case a Ctrl-C should be cleanly intercepted and result in an InterruptException), or whether a CUDA API call blocks (in which case a Ctrl-C will likely result in a fatal error, like a segfault). In the former case, you could try something like JuliaLang/julia#47933 to enumerate the outstanding tasks. In the latter case, you can use gdb to identify the blocking API call.
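
For the "task that doesn't finish" case, one rough way to see which tasks are still outstanding is to track them by hand. The sketch below assumes you control where the tasks are spawned; `tracked` and `spawn_tracked` are hypothetical names made up for illustration, not part of Julia or CUDA.jl, and this is not the mechanism proposed in JuliaLang/julia#47933.

```julia
# Hypothetical registry of the tasks spawned by the code under test.
const tracked = Task[]

# Hypothetical helper: create a task, remember it, then schedule it.
function spawn_tracked(f)
    t = Task(f)
    push!(tracked, t)
    schedule(t)
    return t
end

# Unbuffered channel used only to keep one task blocked.
gate = Channel{Nothing}(0)

spawn_tracked(() -> 1 + 1)         # finishes as soon as it runs
spawn_tracked(() -> take!(gate))   # blocks until something is put into `gate`

sleep(0.1)  # yield to the scheduler so both tasks get a chance to run

# Tasks that have not finished yet.
pending = filter(!istaskdone, tracked)
println(length(pending), " of ", length(tracked), " tasks still outstanding")

put!(gate, nothing)  # release the blocked task so the script can exit cleanly
```

Printing the `pending` list when a test run appears stuck can narrow down which piece of code spawned the task that never completes.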

Either way, this is a different issue, unrelated to the task scheduling delay reported here. And it may still be a user error; if you create a deadlock on a task, that will result in the task never finishing and Julia soft-hanging.
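
A minimal sketch of such a soft hang, using only Base Julia and no CUDA calls: a task waits on a channel that nothing ever writes to, so it never finishes. The names `ch` and `stuck` are made up for illustration. Polling with `timedwait` avoids blocking the caller the way `wait(stuck)` would.

```julia
# A task that blocks forever: nothing ever puts a value into the channel.
ch = Channel{Int}(0)
stuck = @async take!(ch)

# Poll for completion with a one-second deadline instead of calling
# wait(stuck), which would hang the caller as well.
status = timedwait(() -> istaskdone(stuck), 1.0)

println("status = ", status)   # :timed_out when the task is still blocked
println("task done? ", istaskdone(stuck))

# Closing the channel makes the pending take! throw, so the task can end.
close(ch)
```

If the deadline expires, the task is stuck in Julia code rather than inside a blocking CUDA API call, and Ctrl-C should still be intercepted cleanly.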

@physicsjoke

You were correct and this was not your issue, sorry for the confusion :)

@maleadt
Member Author

maleadt commented Apr 22, 2024

Should be fixed by #2025

@maleadt maleadt closed this as completed Apr 22, 2024