-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Core] Fix session_name not reused when GCS restarts + node ip address not set for driver #39211
Conversation
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
Signed-off-by: SangBin Cho <[email protected]>
@@ -34,19 +34,29 @@ Setting up Redis | |||
set the OS environment ``RAY_REDIS_ADDRESS`` to | |||
the Redis address, and supply the ``--redis-password`` flag with the password when calling ``ray start``: | |||
|
|||
You should also set the OS environment variable ``RAY_external_storage_namespace`` to isolate the data stored in Redis. | |||
This makes sure that there is no data conflicts if multiple Ray clusters share the same Redis instance. | |||
``RAY_external_storage_namespace`` must be an unique ID, and whenever you restart a head node, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we give users an example of unique id or generator code like uuid.uuid4()
?
# date including microsecond | ||
date_str = datetime.datetime.today().strftime("%Y-%m-%d_%H-%M-%S_%f") | ||
self._session_name = f"session_{date_str}_{os.getpid()}" | ||
else: | ||
assert not self._default_worker | ||
session_name = ray._private.utils.internal_kv_get_with_retry( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(This not a broker) . the behavior of worker node will be changed if worker node also set RAY_external_storage_namespace
.
origin:
worker node's session_name always fetch from kv storage.
Now:
If worker node set RAY_external_storage_namespace
. worker node's session_name will use RAY_external_storage_namespace
first.
If the RAY_external_storage_namespace is set improperly by the user, it can potentially cause issues with the worker nodes.
Sorry @larrylian this was the emergency PR in case we cannot get #39194 in. I think #39194 will work out, so let me close the issue |
ok |
Why are these changes needed?
This is the PR that combines #37644 and also allows to reuse the session_name when GCS is restarted.
Every GCS HA users are supposed to use
RAY_external_storage_namespace
, and we are using this name as a session_name. This also means all the directory will have different style naming when GCS HA is used (instead of session_, it will become session_).Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.