forked from ray-project/ray
[core][state] Adding task failure type to state API [1/2] (ray-project#32834)

This PR adds the `ErrorType` information of failed tasks to the state API, so that it is queryable from the state API and can be made visible on the dashboard.

This PR would allow us to mark child task failures properly by differentiating on the error type (`WORKER_DIED` vs. `TASK_EXECUTION_EXCEPTION`) (see ray-project#32835).

**Major changes in the PR are:**
1. Propagate the `ErrorType` info (as well as the `RayErrorInfo`) into the `TaskManager` when it marks tasks as failed.
2. A bit of plumbing to the `TaskEventBuffer` interface for constructing a `TaskStatusEvent` (wrapping the various arguments into a struct, `TaskStateUpdate`, so that future additional info requires minimal changes to function signatures).
3. Embed the `ErrorType` into `RayErrorInfo` at the places where the two are used. (They are currently not coupled even though `ErrorType` is a field of `RayErrorInfo`.)

--------

The PR doesn't test all the error types yet, only some; others I believe are not relevant to task failures. I also need some help finding a way to test the others in unit tests.

**I believe these are not relevant to task failures:**
- [ ] OBJECT_UNRECONSTRUCTABLE
- [ ] OBJECT_IN_PLASMA
- [ ] RUNTIME_ENV_SETUP_FAILED
- [ ] OBJECT_LOST
- [ ] OWNER_DIED
- [ ] OBJECT_UNRECONSTRUCTABLE_MAX_ATTEMPTS_EXCEEDED
- [ ] OBJECT_UNRECONSTRUCTABLE_LINEAGE_EVICTED
- [ ] OBJECT_FETCH_TIMED_OUT
- [ ] ACTOR_PLACEMENT_GROUP_REMOVED
- [ ] ACTOR_UNSCHEDULABLE_ERROR
- [ ] OUT_OF_DISK_ERROR
- [ ] OBJECT_FREED
- [ ] DEPENDENCY_RESOLUTION_FAILED

**Relevant task failures (tested)**
- [x] WORKER_DIED
- [x] ACTOR_DIED
- [x] TASK_EXECUTION_EXCEPTION
- [x] TASK_CANCELLED
- [x] ACTOR_CREATION_FAILED: should be addressed together with this PR: ray-project#32726
- [x] LOCAL_RAYLET_DIED: not sure how to repro
- [x] TASK_PLACEMENT_GROUP_REMOVED
- [x] TASK_UNSCHEDULABLE_ERROR
- [x] OUT_OF_MEMORY
- [x] NODE_DIED

Co-authored-by: SangBin Cho <[email protected]>
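
The second and third changes listed in the commit message (bundling arguments into a `TaskStateUpdate` struct, and keeping the `ErrorType` inside `RayErrorInfo`) can be illustrated with a small Python sketch. This is not the code from the PR, which changes Ray's core rather than a Python module like this one. The field names `node_id`, `worker_id`, and `error_message`, and the helpers `make_status_event` and `is_worker_failure`, are hypothetical and exist only to show the pattern.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorType(Enum):
    # A small subset of the error types named in the commit message.
    WORKER_DIED = "WORKER_DIED"
    TASK_EXECUTION_EXCEPTION = "TASK_EXECUTION_EXCEPTION"


@dataclass
class RayErrorInfo:
    # Hypothetical stand-in: the error type travels with the error details
    # instead of being passed as a separate argument.
    error_type: ErrorType
    error_message: str = ""


@dataclass
class TaskStateUpdate:
    # Hypothetical stand-in for the struct described in the commit message.
    # Every field is optional, so adding a new one later does not change
    # the signature of make_status_event.
    node_id: Optional[str] = None
    worker_id: Optional[str] = None
    error_info: Optional[RayErrorInfo] = None


def make_status_event(
    task_id: str, status: str, update: Optional[TaskStateUpdate] = None
) -> dict:
    """Build a status event, copying only the fields that were set."""
    update = update or TaskStateUpdate()
    event = {"task_id": task_id, "state": status}
    if update.node_id is not None:
        event["node_id"] = update.node_id
    if update.worker_id is not None:
        event["worker_id"] = update.worker_id
    if update.error_info is not None:
        event["error_type"] = update.error_info.error_type.name
        event["error_message"] = update.error_info.error_message
    return event


def is_worker_failure(event: dict) -> bool:
    """Tell a worker death apart from an exception raised by the task."""
    return event.get("error_type") == ErrorType.WORKER_DIED.name


failed = make_status_event(
    "task-1",
    "FAILED",
    TaskStateUpdate(
        error_info=RayErrorInfo(ErrorType.WORKER_DIED, "worker crashed")
    ),
)
running = make_status_event("task-2", "RUNNING", TaskStateUpdate(node_id="node-a"))
```

Because callers pass one `TaskStateUpdate` object, a later field (for example, another piece of failure metadata) would only touch the dataclass and the event builder, not every call site.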
Showing 18 changed files with 327 additions and 84 deletions.