-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Refactoring Ray DAG object scanner #26917
Conversation
Signed-off-by: Siyuan Zhuang <[email protected]>
Signed-off-by: Siyuan Zhuang <[email protected]>
|
||
T = TypeVar("T") | ||
|
||
# Used in deserialization hooks to reference scanner instances. | ||
_instances: Dict[int, "_PyObjScanner"] = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we have a global variable for this. I have some concerns for a multi-threading env. It's normal that the user might have multiple thread creating the DAGs right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have the same issue for previous implementation. Also because of GIL, multi-threaded accessing actually is not an issue here (also in the code, the class object would not be shared by multiple threads); but it could be an issue in other parts of Ray DAG. I do not think we need to take care of it at this stage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand this. This is a dict, right? Do you mean that dict operation inside python is atomic? For example, one instruction for dict operations?
If so, I'm ok with this.
also in the code, the class object would not be shared by multiple threads
I'm not sure. _instances
is per-module right? so it'll be shared across threads. If we have multiple threads creating DAG, is this still thread safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, dict operation inside python is atomic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to know! Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. @ericl do you have any comments for this?
Some CI failure may be related. |
Could not reproduce the CI failure locally. Retesting. |
Signed-off-by: Siyuan Zhuang <[email protected]>
Looks like the CI failure is gone. |
The CI failure is fixed by #26960 . It is not related to this PR. Let me merge it. |
…rch-extra-index-url * 'master' of https://github.com/ray-project/ray: (26 commits) [runtime env] plugin refactor[6/n]: java api refactor (ray-project#26783) easy test? (ray-project#26905) [core] Introduce a flag which allows a longer timeout for raylet when GCS restarts. (ray-project#26919) [RLLib] Record framework and algorithm used by an RLlib run. (ray-project#26956) [train] set split locality_hints (ray-project#26973) [Serve] Make the checkpoint and recover only from GCS (ray-project#26753) [AIR DOC] minor tweaks to checkpoint user guide for clarity and consistency subheadings (ray-project#26937) [tune] Only sync down from cloud if needed (ray-project#26725) [Core] Refactoring Ray DAG object scanner (ray-project#26917) [air] Raise error on path-like access for Checkpoints (ray-project#26970) [AIR] Enable other notebooks previously marked with # REGRESSION (ray-project#26896) [RLlib] Simplify agent collector (ray-project#26803) [Datasets] Automatically cast tensor columns when building Pandas blocks. (ray-project#26924) [Workflow] Fix flaky example(ray-project#26960) [dashboard] Update cluster_activities endpoint to use pydantic. (ray-project#26609) [air] Un-revert "[air] remove unnecessary logs + improve repr for result" (ray-project#26942) [setup-dev] Add flag to skip symlink certain folders (ray-project#26899) [air/tune/docs] Cont. convert Tune examples to use Tuner.fit() (ray-project#26959) [air/tune/docs] Change Tuner() occurences in rest of ray/tune (ray-project#26961) [data] set iter_batches default batch_size (ray-project#26955) ... Signed-off-by: ddelange <[email protected]>
* make sure Ray DAG can work with minimal install Signed-off-by: Siyuan Zhuang <[email protected]>\ Signed-off-by: klwuibm <[email protected]>
* make sure Ray DAG can work with minimal install Signed-off-by: Siyuan Zhuang <[email protected]>\ Signed-off-by: Catch-Bull <[email protected]>
* make sure Ray DAG can work with minimal install Signed-off-by: Siyuan Zhuang <[email protected]>\ Signed-off-by: Rohan138 <[email protected]>
* make sure Ray DAG can work with minimal install Signed-off-by: Siyuan Zhuang <[email protected]>\ Signed-off-by: Frank Luan <[email protected]>
* make sure Ray DAG can work with minimal install Signed-off-by: Siyuan Zhuang <[email protected]>\ Signed-off-by: Scott Graham <[email protected]>
* make sure Ray DAG can work with minimal install Signed-off-by: Siyuan Zhuang <[email protected]>\
* make sure Ray DAG can work with minimal install Signed-off-by: Siyuan Zhuang <[email protected]>\ Signed-off-by: Stefan van der Kleij <[email protected]>
Signed-off-by: Siyuan Zhuang [email protected]
Why are these changes needed?
This PR fixes multiple important Ray DAG issues.
DAGNode
, you need to change the code inpy_obj_scanner
to register it.dag_node
andpy_obj_scanner
This PR addresses all of them.
Related issue number
Closes #25988
Checks
scripts/format.sh
to lint the changes in this PR.