
[core][accelerated DAGs] Make DAG teardown blocking and fix bug during close #45099

Merged
merged 6 commits into ray-project:master on May 3, 2024

Conversation

stephanie-wang
Contributor

Why are these changes needed?

Contains a few fixes related to DAG teardown:

  • Removes an unnecessary `.close()` call that would error if the DAG has a single output (instead of a `MultiOutputNode`).
  • Makes `dag.teardown()` blocking to ensure that actors can be reused after the teardown call returns.
  • Makes DAG teardown in `__del__` asynchronous. If synchronous, this can hang the driver upon shutdown. I'm not exactly sure why, but I believe this happens if the CoreWorker is shut down before `dag.teardown()` is complete. (See the sketch after this list.)
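
To make the behavior concrete, here is a minimal sketch of the pattern these bullets describe, assuming a background monitor that performs the actual cleanup. The class, the `_teardown_done` event, and the `_teardown_impl` helper are illustrative inventions; only `teardown(wait=...)`, `in_teardown`, and `_monitor` correspond to names visible in this PR's diff.

```python
import threading


class CompiledDAGSketch:
    """Illustrative only -- mirrors the teardown pattern described in the
    PR, not the actual ray.dag.compiled_dag_node implementation."""

    def __init__(self):
        self.in_teardown = False
        self._teardown_done = threading.Event()
        # Hypothetical stand-in for the DAG's background monitor.
        self._monitor = threading.Thread(target=self._teardown_impl, daemon=True)

    def _teardown_impl(self):
        # ... cancel actor tasks, close channels, etc. ...
        self._teardown_done.set()

    def teardown(self, wait: bool = True):
        if self.in_teardown:
            # Teardown already started elsewhere (e.g. by __del__);
            # an explicit call still waits for it to finish.
            if wait:
                self._teardown_done.wait()
            return
        self.in_teardown = True
        self._monitor.start()
        if wait:
            # Blocking: once this returns, the actors are safe to reuse.
            self._teardown_done.wait()

    def __del__(self):
        monitor = getattr(self, "_monitor", None)
        if monitor is not None:
            # Never block here: the CoreWorker may already be shutting
            # down, and a blocking teardown could hang the driver.
            self.teardown(wait=False)
```

The key design choice is that only `__del__` passes `wait=False`; every explicit `teardown()` call blocks until cleanup completes.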

python/ray/dag/compiled_dag_node.py
```diff
-        self.teardown()
+        monitor = getattr(self, "_monitor", None)
+        if monitor is not None:
+            # Teardown asynchronously.
```
Contributor
I think it is because `__del__` call ordering is not guaranteed when the Python interpreter exits.
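
For intuition, a small, hypothetical example of why relying on `__del__` ordering is fragile (none of these names are from Ray):

```python
class Resource:
    def __init__(self, name: str):
        self.name = name

    def __del__(self):
        print(f"finalizing {self.name}")


# At interpreter shutdown, CPython makes no guarantee about the order in
# which surviving module-level objects are finalized, so "dag" may be
# finalized after "core_worker" is already gone. A __del__ that blocks on
# another global (as a synchronous dag.teardown() would on the CoreWorker)
# can therefore hang or fail.
core_worker = Resource("core_worker")
dag = Resource("dag")
```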

```python
if self.in_teardown:
    if wait:
```
Contributor
Do we need this? Since we are already in teardown, isn't there supposed to be another caller that is already waiting?

Contributor Author

It's to make sure that explicitly calling `.teardown()` always waits until the DAG is torn down.
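
A hypothetical usage pattern this guarantees (`build_dag` is a stand-in for whatever constructs a new compiled DAG from the same actors, not a real API):

```python
dag.teardown()               # blocks until the DAG is fully torn down,
                             # even if teardown was already in progress
new_dag = build_dag(actors)  # safe: the actors are free to be reused
```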

can-anyscale merged commit 4937ac3 into ray-project:master May 3, 2024
4 of 5 checks passed
shaikhismail pushed a commit to shaikhismail/ray that referenced this pull request May 4, 2024
harborn pushed a commit to harborn/ray that referenced this pull request May 8, 2024
peytondmurray pushed a commit to peytondmurray/ray that referenced this pull request May 13, 2024
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 6, 2024
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request Jun 7, 2024