[Core] Fix session_name not reused when GCS restarts + node ip address not set for driver #39211

Closed
46 commits
d2f802a
ip
rkooo567 Jul 20, 2023
a391f66
ip
rkooo567 Jul 21, 2023
3966778
working now.
rkooo567 Jul 21, 2023
992d7ab
working + lint
rkooo567 Jul 21, 2023
8083748
Fix
rkooo567 Jul 21, 2023
73bdd49
Merge branch 'master' into automatically-set-node-ip-addr
rkooo567 Aug 10, 2023
a25aa34
ip
rkooo567 Aug 10, 2023
fa2dc4c
Merge branch 'master' into automatically-set-node-ip-addr
rkooo567 Aug 22, 2023
db2af38
ip
rkooo567 Aug 22, 2023
8476e4f
ip
rkooo567 Aug 22, 2023
ba69b1b
Made it work.
rkooo567 Aug 22, 2023
4a0dd44
working
rkooo567 Aug 22, 2023
f6ee80e
Fixed a broken test.
rkooo567 Aug 22, 2023
023b3a4
Merge branch 'master' into automatically-set-node-ip-addr
rkooo567 Aug 22, 2023
dba0008
Fixed the test failure.
rkooo567 Aug 22, 2023
77b933c
print error messages before assertion
rkooo567 Aug 23, 2023
87ea063
Merge branch 'master' into automatically-set-node-ip-addr
rkooo567 Aug 23, 2023
302ecd8
ip
rkooo567 Aug 23, 2023
cdbf084
more info for debugging.
rkooo567 Aug 23, 2023
b9aedbb
Merge branch 'master' into automatically-set-node-ip-addr
rkooo567 Aug 31, 2023
55979f0
ip
rkooo567 Aug 31, 2023
4dbb1af
ip
rkooo567 Aug 31, 2023
32de899
ip
rkooo567 Aug 31, 2023
632442a
ip
rkooo567 Aug 31, 2023
1705ec4
remove bind
rkooo567 Aug 31, 2023
cbff14f
try fixing it.
rkooo567 Aug 31, 2023
3107619
remove print
rkooo567 Aug 31, 2023
bb8e1f6
Work around.
rkooo567 Aug 31, 2023
eeb3610
.
rkooo567 Aug 31, 2023
484c68a
Revert
rkooo567 Sep 1, 2023
748ddf8
Wokrs not
rkooo567 Sep 1, 2023
f35a165
Fix failed ha tests.
rkooo567 Sep 1, 2023
01e492a
fix some tests.
rkooo567 Sep 1, 2023
8a14bcc
done
rkooo567 Sep 1, 2023
cd2b44d
maybe working?
rkooo567 Sep 1, 2023
9330cc6
Revert "maybe working?"
rkooo567 Sep 1, 2023
e32d538
Revert "done"
rkooo567 Sep 1, 2023
0e73a7e
Revert "fix some tests."
rkooo567 Sep 1, 2023
b3d590a
Revert "Fix failed ha tests."
rkooo567 Sep 1, 2023
5a41237
Revert "Wokrs not"
rkooo567 Sep 1, 2023
5df758f
clean up
rkooo567 Sep 1, 2023
951e54c
Revert "Revert "Wokrs not""
rkooo567 Sep 1, 2023
32e65fc
Revert "Revert "Fix failed ha tests.""
rkooo567 Sep 1, 2023
ec0591f
Revert "Revert "fix some tests.""
rkooo567 Sep 1, 2023
6734082
Revert "Revert "done""
rkooo567 Sep 1, 2023
672d996
Revert "Revert "maybe working?""
rkooo567 Sep 1, 2023
Revert "Fix failed ha tests."
This reverts commit f35a165.
rkooo567 committed Sep 1, 2023
commit b3d590ab60e4487cfda87ce5a1295c8122e6663b
7 changes: 6 additions & 1 deletion python/ray/_private/node.py
@@ -98,7 +98,7 @@ def __init__(
 ray_params.external_addresses = external_redis
 ray_params.num_redis_shards = len(external_redis) - 1
 storage_namespace = os.environ.get("RAY_external_storage_namespace")
-if head and storage_namespace is None:
+if storage_namespace is None:
     raise ValueError(
         "RAY_external_storage_namespace must be provided "
         "when using Ray with external Redis for the fault tolerance. "
@@ -963,6 +963,11 @@ def _wait_and_get_for_node_address(self, timeout_s: int = 60) -> str:
     The node_ip_address of the current session if it finds it
     within timeout_s.
 """
+# logger.error(f"Read file from {self.get_session_dir_path()}")
+path = Path(self.get_session_dir_path())
+file_names = [f.name for f in path.iterdir() if f.is_file()]
+# logger.error(file_names)
+
 for i in range(timeout_s):
     node_ip_address = self._get_cached_node_ip_address()

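Note on the first hunk: the one-line change to the `storage_namespace` check alters which nodes fail when `RAY_external_storage_namespace` is unset. Below is a minimal sketch of that difference, not the code in `node.py`. The helper names `check_storage_namespace` and `raises`, the `head_only` flag, and the shortened error message are hypothetical and exist only for illustration.

```python
from typing import Optional


def check_storage_namespace(
    head: bool,
    storage_namespace: Optional[str],
    head_only: bool,
) -> None:
    """Hypothetical helper mirroring the two conditions in the first hunk."""
    missing = storage_namespace is None
    # head_only=True models the removed line:
    #     if head and storage_namespace is None:
    # head_only=False models the added line:
    #     if storage_namespace is None:
    should_raise = (head and missing) if head_only else missing
    if should_raise:
        # Shortened message; the real message in the diff is longer.
        raise ValueError("RAY_external_storage_namespace must be provided")


def raises(head: bool, namespace: Optional[str], head_only: bool) -> bool:
    """Return True if the check raises ValueError for these inputs."""
    try:
        check_storage_namespace(head, namespace, head_only)
    except ValueError:
        return True
    return False
```

Under these assumptions, a non-head node without a namespace passes the removed condition but fails the added one, so after this revert the check applies to every node that uses external Redis, not only the head node.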