-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix dashboard process reporting on windows #45578
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
workers = { | ||
self._generate_worker_key(proc): proc for proc in raylet_proc.children() | ||
} | ||
workers = {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we do this for windows only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so. While PEP 397 does mention windows, there is a launcher for other operating systems. The problem found here is subtle and by making it windows-only may trip up users on other platforms.
Thanks for the hint. After a036832 running How do I see if the Edit: setproctitile mentions the windows limitation at the end of the README |
cc @anyscalesam - can you help shepherd here among core team? |
@anyscalesam ping |
It is a shame this missed the 2.30 release. |
10b40c9
to
921d111
Compare
@jjyao rebased and cleaned up commit history: one commit fixes the problem and the other cleans up tests and skips one that cannot succeed. |
# windows, get the child process not the runner | ||
for child in raylet_proc.children(): | ||
if child.children(): | ||
child = child.children()[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should do it only for windows, otherwise if a Ray task launches a child process, this will find that process not the Ray task worker process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, that would happen on windows too. I wonder if there is some better indicator that the process is the one we are looking for. I should add a test for a worker that has subprocess.Popen()
in its task.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be fine since we only get child of child and we know the direct child must be a launcher?
self._raylet_proc = curr_proc.parent() | ||
# The dashboard agent is a child of the raylet process. | ||
# It is not necessarily the direct child (python-windows | ||
# typically uses a py.exe runner to run python), so search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is incomplete.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Signed-off-by: mattip <[email protected]> Signed-off-by: Matti Picus <[email protected]>
Signed-off-by: mattip <[email protected]> Signed-off-by: Matti Picus <[email protected]>
Signed-off-by: Matti Picus <[email protected]>
Signed-off-by: Matti Picus <[email protected]>
Rebased to clear merge conflicts |
@jjyao CI is passing |
@jjyao : Can you merge that, please? Unfortunately, another release was missed, but then let's ensure the solution makes it to the next one. |
self._generate_worker_key(proc): proc for proc in raylet_proc.children() | ||
} | ||
workers = {} | ||
if sys.platform == "win32": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we split the proc finding logic vs the dict assignment? something like
def find_worker_child_proc(proc):
if sys.platform != "win32":
return proc
# TODO: add comment about the launcher and PEP 397
# TODO: add some checks here to make sure it's the worker ?
if child.children():
# TODO: assert len(child.children()) == 1 ?
return child.children()[0]
return child
workers = {
self._generate_worker_key(proc): proc
for proc in map(find_worker_child_proc, raylet_proc.children())
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe that can be done in a follow-on clean-up PR? It has been difficult to get reviewed and approved and I don't want to prolong the process.
Why are these changes needed?
Fixes two problems found on windows in process dashboard reporting:
num_fds
metric which is not supported on windowsraylet
process, it is not sufficient to just checkpid.parent()
, so now the check climbs up the process heirarchy until it finds a process namedraylet*
or reaches the root.raylet.child()
, so for each child the check climbs down the process heirarchy to the bottom.Happy to add a test, or to enable a skipped windows test. Where can I find one?
With these changes, I now see data in the actor panel of the dashboard.
Related issue number
Closes #44604
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.