What happened + What you expected to happen
We noticed that nodes start failing because of an agent failure like the following:
2023-07-21 02:20:52,503 ERROR reporter_agent.py:1112 -- Error publishing node physical stats.
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/_pslinux.py", line 1653, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/_common.py", line 480, in wrapper
    raise raise_from(err, None)
  File "<string>", line 3, in raise_from
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/_common.py", line 478, in wrapper
    return fun(self)
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/_pslinux.py", line 1695, in _parse_stat_file
    data = bcat("%s/%s/stat" % (self._procfs_path, self.pid))
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/_common.py", line 813, in bcat
    return cat(fname, fallback=fallback, _open=open_binary)
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/_common.py", line 801, in cat
    with _open(fname) as f:
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/_common.py", line 765, in open_binary
    return open(fname, "rb", buffering=FILE_READ_BUFFER_SIZE)
FileNotFoundError: [Errno 2] No such file or directory: '/proc/1214155/stat'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 1095, in _perform_iteration
    stats = self._get_all_stats()
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 626, in _get_all_stats
    "workers": self._get_workers(),
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 512, in _get_workers
    return [
  File "/usr/local/lib/python3.8/dist-packages/ray/dashboard/modules/reporter/reporter_agent.py", line 525, in <listcomp>
    if w.status() != psutil.STATUS_ZOMBIE
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/__init__.py", line 691, in status
    return self._proc.status()
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/_pslinux.py", line 1653, in wrapper
    return fun(self, *args, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/_pslinux.py", line 2187, in status
    letter = self._parse_stat_file()['status']
  File "/usr/local/lib/python3.8/dist-packages/ray/thirdparty_files/psutil/_pslinux.py", line 1660, in wrapper
    raise NoSuchProcess(self.pid, self._name)
psutil.NoSuchProcess: process no longer exists (pid=1214155)
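
The traceback suggests a race: a worker process (pid 1214155) exited after it was enumerated but before `w.status()` was called inside the list comprehension in `_get_workers`, so `psutil.NoSuchProcess` propagated and aborted the whole stats iteration. One possible way to tolerate that race is sketched below. This is not Ray's actual code or fix: `FakeProcess`, `live_workers`, and the local `NoSuchProcess` and `STATUS_*` names are hypothetical stand-ins so the sketch runs without psutil installed. In real code the exception would be `psutil.NoSuchProcess` and the handles would be `psutil.Process` objects.

```python
# Stand-ins so the sketch runs without psutil. In the reporter agent the real
# names would be psutil.NoSuchProcess and psutil.STATUS_ZOMBIE.
class NoSuchProcess(Exception):
    """Stand-in for psutil.NoSuchProcess."""


STATUS_ZOMBIE = "zombie"
STATUS_RUNNING = "running"


class FakeProcess:
    """Hypothetical process handle exposing only pid and status()."""

    def __init__(self, pid, status=None):
        self.pid = pid
        self._status = status  # None simulates a process that has already exited

    def status(self):
        if self._status is None:
            raise NoSuchProcess(f"process no longer exists (pid={self.pid})")
        return self._status


def live_workers(procs):
    """Return non-zombie processes, skipping any that vanish while being queried."""
    alive = []
    for proc in procs:
        try:
            if proc.status() != STATUS_ZOMBIE:
                alive.append(proc)
        except NoSuchProcess:
            # The worker exited between enumeration and the status query;
            # drop it instead of failing the whole stats collection.
            continue
    return alive


workers = [
    FakeProcess(101, STATUS_RUNNING),
    FakeProcess(1214155),            # exited before status() was called
    FakeProcess(103, STATUS_ZOMBIE),
]
print([p.pid for p in live_workers(workers)])  # prints [101]
```

Replacing the list comprehension with an explicit loop is a deliberate choice here: a per-process `try`/`except` cannot be expressed inside a comprehension filter without a helper function.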
Versions / Dependencies
latest
Reproduction script
n/a
Issue Severity
None
scv119 added the bug, P0, and triage labels on Jul 24, 2023.
scv119 added the core label and removed the triage label on Jul 24, 2023.