-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Post _on_completed
callbacks to io_service_
instead of running them synchronously
#39112
[core] Post _on_completed
callbacks to io_service_
instead of running them synchronously
#39112
Conversation
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Kicked off release tests for core and serve: https://buildkite.com/ray-project/release-tests-pr/builds/51486 |
…-asyncio-callback-to-loop
…-asyncio-callback-to-loop
Test test seems pretty promising |
previous release tests are here; https://buildkite.com/ray-project/release-tests-pr/builds/51486 (note: I didn't finish some of long running tests to run core tests) |
@edoakes can you take a look at failures from https://buildkite.com/ray-project/release-tests-pr/builds/51486#018a485f-94f3-4afa-96a5-3f3ea7cab789? (from logs its failures are pretty mysterious) |
Rerunning it with core https://buildkite.com/ray-project/release-tests-pr/builds/51571. I hope the wheel is correct this time... |
Test result so far seems pretty promising |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. The test result seems promising
_on_completed
callbacks to io_service_
instead of running them synchronously_on_completed
callbacks to io_service_
instead of running them synchronously
Signed-off-by: Edward Oakes <[email protected]>
@rkooo567 there are a lot of failures that have tracebacks like this:
Looks sort of suspicious (some others had this raised in |
Same tracebacks are happening on the release branch, so I'm not concerned about them: https://buildkite.com/ray-project/release-tests-branch/builds/2103#018a44b2-6415-495b-8823-7824101415f8 Note that this test also forcibly kills replicas constantly, so seeing some tracebacks is expected. |
All tests also passed on the serve cancellation PR with this change merged in: https://buildkite.com/ray-project/oss-ci-build-pr/builds/34426 |
…ning them synchronously (ray-project#39112) Fixes the deadlock described in the linked issue by posting callbacks to the `io_service_`. This avoids running them while holding locks (such as from the task manager).
…ning them synchronously (ray-project#39112) Fixes the deadlock described in the linked issue by posting callbacks to the `io_service_`. This avoids running them while holding locks (such as from the task manager). Signed-off-by: Edward Oakes <[email protected]>
* [core] Decrement `ObjectRef._on_completed` callback when callback errors (#38673) We manually increment and decrement the Python refcount for the callback to avoid Python GC'ing it while it's stored in the core worker. However, the decrement logic is skipped if the callback raises an exception. This change adds a `try: finally` block to always decrement the ref count. The added test case fails without the change and passes w/ it. Signed-off-by: Edward Oakes <[email protected]> * [core] Post `_on_completed` callbacks to `io_service_` instead of running them synchronously (#39112) Fixes the deadlock described in the linked issue by posting callbacks to the `io_service_`. This avoids running them while holding locks (such as from the task manager). Signed-off-by: Edward Oakes <[email protected]> --------- Signed-off-by: Edward Oakes <[email protected]>
…ning them synchronously (ray-project#39112) Fixes the deadlock described in the linked issue by posting callbacks to the `io_service_`. This avoids running them while holding locks (such as from the task manager).
…ning them synchronously (ray-project#39112) Fixes the deadlock described in the linked issue by posting callbacks to the `io_service_`. This avoids running them while holding locks (such as from the task manager).
…ning them synchronously (ray-project#39112) Fixes the deadlock described in the linked issue by posting callbacks to the `io_service_`. This avoids running them while holding locks (such as from the task manager). Signed-off-by: Jim Thompson <[email protected]>
…ning them synchronously (ray-project#39112) Fixes the deadlock described in the linked issue by posting callbacks to the `io_service_`. This avoids running them while holding locks (such as from the task manager). Signed-off-by: Victor <[email protected]>
Why are these changes needed?
Attempts to fix the deadlock described in the linked issue by posting callbacks to the
io_service_
. This avoids running them while holding locks (such as from the task manager).Related issue number
#39113 (will be closed once cherry-picked).
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.