-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1/n] Stabilize GCS/Autoscaler interface: Introduce monitor server #31827
Conversation
Co-authored-by: Dmitri Gekhtman <[email protected]> Signed-off-by: Alex Wu <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment fix
namespace ray { | ||
namespace gcs { | ||
|
||
/// GcsNodeManager is responsible for managing and monitoring nodes as well as handing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace GcsNodeManager
with GcsMonitorServer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the rest of the docstring too I suppose
Can you link me a design doc? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I'm assuming most of the gRPC code is boilerplate is copied from the existing GcsNodeManager/NodeInfoService, but let me know if there's something different here that warrants closer attention.
There have been no design changes since the previous design docs (you can refer to those design docs), I'm just refactoring everything to a cleaner defined protos. I've updated the list of things we need to handle in the github issue but there will be no design changes. |
@@ -598,6 +601,13 @@ void GcsServer::InitGcsTaskManager() { | |||
rpc_server_.RegisterService(*task_info_service_); | |||
} | |||
|
|||
void GcsServer::InitMonitorServer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why we call it "MonitorServer" that's different from other modules' naming convention (*Manager)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly because it's not actually "managing" anything. Most of the other managers actually manage some state.
Remaining test failures look like DinD issues and known flakey tests which are all unrelated. Merging. |
…ay-project#31827) This is the first PR towards stabilizing the GCS autoscaler interface by introducing a new grpc service definition which we will provide backwards compatibility guarantees. This PR mostly just introduces scaffolding and a trivial GetRayVersion endpoint. By the end of this series of PRs, monitor.py will only communicate with the rest of the ray cluster via this service definition.
…ay-project#31827) This is the first PR towards stabilizing the GCS autoscaler interface by introducing a new grpc service definition which we will provide backwards compatibility guarantees. This PR mostly just introduces scaffolding and a trivial GetRayVersion endpoint. By the end of this series of PRs, monitor.py will only communicate with the rest of the ray cluster via this service definition.
Why are these changes needed?
This is the first PR towards stabilizing the GCS autoscaler interface by introducing a new grpc service definition which we will provide backwards compatibility guarantees.
This PR mostly just introduces scaffolding and a trivial
GetRayVersion
endpoint.By the end of this series of PRs,
monitor.py
will only communicate with the rest of the ray cluster via this service definition.Related issue number
Related to #31826
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.