[Cluster] Fix Kuberay capitalization (#37791)
Replace Kuberay with KubeRay, and add Kuberay to the list of banned words in the linter.

---------

Signed-off-by: Archit Kulkarni <[email protected]>
architkulkarni committed Nov 27, 2023
1 parent ce20fa5 commit b21831e
Showing 11 changed files with 33 additions and 33 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -16,7 +16,7 @@
.jar
/dashboard/client/build

-# Kuberay config lives in a separate repository
+# KubeRay config lives in a separate repository
python/ray/autoscaler/kuberay/config

# Files generated by flatc should be ignored
2 changes: 1 addition & 1 deletion .vale/styles/Google/WordList.yml
@@ -62,7 +62,7 @@ swap:
in order to: to
ingest: import|load
k8s: Kubernetes
"[Kk]uberay": KubeRay
"[Kk]ube[Rr]ay": KubeRay
long press: touch & hold
network IP address: internal IP address
omnibox: address bar
2 changes: 1 addition & 1 deletion ci/lint/check-banned-words.sh
@@ -2,7 +2,7 @@

# Checks Python and doc files for common mispellings.

BANNED_WORDS="RLLib Rllib"
BANNED_WORDS="RLLib Rllib Kuberay"

echo "Checking for common mis-spellings..."
for word in $BANNED_WORDS; do
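The hunk above is the linter change the commit description refers to: Kuberay now joins RLLib and Rllib as a banned spelling. As a rough illustration of the idea only (the real check lives in `ci/lint/check-banned-words.sh` and is implemented in shell, so its file selection and matching rules may differ), here is a hedged Python sketch of such a scan:

```python
# Rough illustration only; not the actual ci/lint/check-banned-words.sh logic.
# Scans a set of source files for banned spellings and exits non-zero on a hit.
import pathlib
import sys

BANNED_WORDS = ["RLLib", "Rllib", "Kuberay"]  # "Kuberay" is newly banned by this commit
CHECKED_SUFFIXES = {".py", ".rst", ".md"}  # assumption: which file types get scanned

def main() -> int:
    failures = []
    for path in pathlib.Path(".").rglob("*"):
        if not path.is_file() or path.suffix not in CHECKED_SUFFIXES:
            continue
        lines = path.read_text(encoding="utf-8", errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            for word in BANNED_WORDS:
                if word in line:
                    failures.append(f"{path}:{lineno}: banned word {word!r}")
    print("\n".join(failures) if failures else "No banned words found.")
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(main())
```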
4 changes: 2 additions & 2 deletions doc/source/cluster/vms/user-guides/community/index.rst
@@ -22,7 +22,7 @@ The following is a list of community supported cluster managers.
Using a custom cloud or cluster manager
=======================================

-The Ray cluster launcher currently supports AWS, Azure, GCP, Aliyun, vSphere and Kuberay out of the box. To use the Ray cluster launcher and Autoscaler on other cloud providers or cluster managers, you can implement the `node_provider.py <https://github.com/ray-project/ray/tree/master/python/ray/autoscaler/node_provider.py>`_ interface (100 LOC).
+The Ray cluster launcher currently supports AWS, Azure, GCP, Aliyun, vSphere and KubeRay out of the box. To use the Ray cluster launcher and Autoscaler on other cloud providers or cluster managers, you can implement the `node_provider.py <https://github.com/ray-project/ray/tree/master/python/ray/autoscaler/node_provider.py>`_ interface (100 LOC).
Once the node provider is implemented, you can register it in the `provider section <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/local/example-full.yaml#L18>`_ of the cluster launcher config.

.. code-block:: yaml
@@ -31,5 +31,5 @@ Once the node provider is implemented, you can register it in the `provider sect
type: "external"
module: "my.module.MyCustomNodeProvider"
-You can refer to `AWSNodeProvider <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/_private/aws/node_provider.py#L95>`_, `KuberayNodeProvider <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/_private/kuberay/node_provider.py#L148>`_ and
+You can refer to `AWSNodeProvider <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/_private/aws/node_provider.py#L95>`_, `KubeRayNodeProvider <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/_private/kuberay/node_provider.py#L148>`_ and
`LocalNodeProvider <https://github.com/ray-project/ray/blob/master/python/ray/autoscaler/_private/local/node_provider.py#L166>`_ for more examples.
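To make the interface described in this file more concrete, below is a minimal sketch of a custom provider that the ``external`` provider type in the YAML above could point at. It is a sketch under assumptions, not the authoritative interface: the hypothetical module path ``my.module.MyCustomNodeProvider`` mirrors the YAML example, the in-memory dict stands in for real cloud API calls, and the exact method signatures should be checked against ``node_provider.py``.

```python
# Minimal sketch of a custom node provider; a real provider would call a cloud or
# cluster-manager API instead of the in-memory dict used here.
from typing import Any, Dict, List, Optional

from ray.autoscaler.node_provider import NodeProvider


class MyCustomNodeProvider(NodeProvider):
    def __init__(self, provider_config: Dict[str, Any], cluster_name: str) -> None:
        super().__init__(provider_config, cluster_name)
        # node_id -> {"tags": {...}, "ip": ...}; stands in for the cloud's node inventory.
        self._nodes: Dict[str, Dict[str, Any]] = {}

    def non_terminated_nodes(self, tag_filters: Dict[str, str]) -> List[str]:
        return [
            node_id
            for node_id, node in self._nodes.items()
            if all(node["tags"].get(k) == v for k, v in tag_filters.items())
        ]

    def is_running(self, node_id: str) -> bool:
        return node_id in self._nodes

    def is_terminated(self, node_id: str) -> bool:
        return node_id not in self._nodes

    def node_tags(self, node_id: str) -> Dict[str, str]:
        return self._nodes[node_id]["tags"]

    def internal_ip(self, node_id: str) -> Optional[str]:
        return self._nodes[node_id].get("ip")

    def external_ip(self, node_id: str) -> Optional[str]:
        return self._nodes[node_id].get("ip")

    def create_node(
        self, node_config: Dict[str, Any], tags: Dict[str, str], count: int
    ) -> None:
        # Launch `count` nodes carrying the autoscaler-provided tags.
        for _ in range(count):
            self._nodes[f"node-{len(self._nodes) + 1}"] = {"tags": dict(tags), "ip": None}

    def terminate_node(self, node_id: str) -> None:
        self._nodes.pop(node_id, None)
```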
@@ -136,7 +136,7 @@ Then check the memory usage from the head node from the node memory usage view i
The Ray head node has more memory-demanding system components such as GCS or the dashboard.
Also, the driver runs from a head node by default. If the head node has the same memory capacity as worker nodes
and if you execute the same number of Tasks and Actors from a head node, it can easily have out-of-memory problems.
-In this case, do not run any Tasks and Actors on the head node by specifying ``--num-cpus=0`` when starting a head node by ``ray start --head``. If you use Kuberay, view `here <kuberay-num-cpus>`.
+In this case, do not run any Tasks and Actors on the head node by specifying ``--num-cpus=0`` when starting a head node by ``ray start --head``. If you use KubeRay, view `here <kuberay-num-cpus>`.

.. _troubleshooting-out-of-memory-reduce-parallelism:

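To illustrate the ``--num-cpus=0`` guidance in the hunk above, the following hedged sketch assumes a cluster whose head was started with `ray start --head --num-cpus=0` and that has at least one worker node; a CPU-requiring task then lands on a worker rather than on the head:

```python
# Assumes the head node was started with `ray start --head --num-cpus=0`
# and that at least one worker node has joined the cluster.
import ray

ray.init(address="auto")  # connect this driver to the running cluster

@ray.remote(num_cpus=1)
def where_am_i() -> str:
    # With 0 CPUs advertised on the head node, this task is scheduled on a worker.
    return ray.util.get_node_ip_address()

print("Task ran on node:", ray.get(where_am_i.remote()))
```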
2 changes: 1 addition & 1 deletion doc/source/serve/production-guide/kubernetes.md
@@ -246,7 +246,7 @@ Monitor your Serve application using the Ray Dashboard.
- Learn about the [Ray Serve logs](serve-logging) and how to [persistent logs](kuberay-logging) on Kubernetes.

:::{note}
-- To troubleshoot application deployment failures in Serve, you can check the Kuberay operator logs by running `kubectl logs -f <kuberay-operator-pod-name>` (e.g., `kubectl logs -f kuberay-operator-7447d85d58-lv7pf`). The Kuberay operator logs contain information about the Serve application deployment event and Serve application health checks.
+- To troubleshoot application deployment failures in Serve, you can check the KubeRay operator logs by running `kubectl logs -f <kuberay-operator-pod-name>` (e.g., `kubectl logs -f kuberay-operator-7447d85d58-lv7pf`). The KubeRay operator logs contain information about the Serve application deployment event and Serve application health checks.
- You can also check the controller log and deployment log, which are located under `/tmp/ray/session_latest/logs/serve/` in both the head node pod and worker node pod. These logs contain information about specific deployment failure reasons and autoscaling events.
:::

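As a programmatic complement to the `kubectl logs` command mentioned in the note above, the sketch below uses the official Kubernetes Python client to fetch the KubeRay operator logs. The namespace and label selector are illustrative assumptions and may differ in your deployment; this is not taken from the Ray or KubeRay docs.

```python
# Illustrative sketch: fetch KubeRay operator logs via the Kubernetes Python client
# instead of kubectl. Namespace and label selector below are assumptions.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running in a pod
core = client.CoreV1Api()

pods = core.list_namespaced_pod(
    namespace="default",
    label_selector="app.kubernetes.io/name=kuberay-operator",  # assumed operator label
)
for pod in pods.items:
    log = core.read_namespaced_pod_log(name=pod.metadata.name, namespace="default")
    print(f"--- logs from {pod.metadata.name} ---")
    print(log)
```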
16 changes: 8 additions & 8 deletions python/ray/autoscaler/_private/kuberay/node_provider.py
@@ -46,16 +46,16 @@
# it decreases the number of replicas and adds the exact pods that should be
# terminated to the scaleStrategy).

-# KuberayNodeProvider inherits from BatchingNodeProvider.
+# KubeRayNodeProvider inherits from BatchingNodeProvider.
# Thus, the autoscaler's create and terminate requests are batched into a single
# Scale Request object which is submitted at the end of autoscaler update.
# KubeRay node provider converts the ScaleRequest into a RayCluster CR patch
# and applies the patch in the submit_scale_request method.

-# To reduce potential for race conditions, KuberayNodeProvider
+# To reduce potential for race conditions, KubeRayNodeProvider
# aborts the autoscaler update if the operator has not yet processed workersToDelete -
-# see KuberayNodeProvider.safe_to_scale().
-# Once it is confirmed that workersToDelete have been cleaned up, KuberayNodeProvider
+# see KubeRayNodeProvider.safe_to_scale().
+# Once it is confirmed that workersToDelete have been cleaned up, KubeRayNodeProvider
# clears the workersToDelete list.


@@ -202,25 +202,25 @@ def _worker_group_replicas(raycluster: Dict[str, Any], group_index: int):
return raycluster["spec"]["workerGroupSpecs"][group_index].get("replicas", 1)


-class KuberayNodeProvider(BatchingNodeProvider): # type: ignore
+class KubeRayNodeProvider(BatchingNodeProvider): # type: ignore
def __init__(
self,
provider_config: Dict[str, Any],
cluster_name: str,
_allow_multiple: bool = False,
):
logger.info("Creating KuberayNodeProvider.")
logger.info("Creating KubeRayNodeProvider.")
self.namespace = provider_config["namespace"]
self.cluster_name = cluster_name

self.headers, self.verify = load_k8s_secrets()

assert (
provider_config.get(WORKER_LIVENESS_CHECK_KEY, True) is False
), f"To use KuberayNodeProvider, must set `{WORKER_LIVENESS_CHECK_KEY}:False`."
), f"To use KubeRayNodeProvider, must set `{WORKER_LIVENESS_CHECK_KEY}:False`."
assert (
provider_config.get(WORKER_RPC_DRAIN_KEY, False) is True
), f"To use KuberayNodeProvider, must set `{WORKER_RPC_DRAIN_KEY}:True`."
), f"To use KubeRayNodeProvider, must set `{WORKER_RPC_DRAIN_KEY}:True`."
BatchingNodeProvider.__init__(
self, provider_config, cluster_name, _allow_multiple
)
6 changes: 3 additions & 3 deletions python/ray/autoscaler/_private/providers.py
@@ -91,9 +91,9 @@ def _import_kubernetes(provider_config):


def _import_kuberay(provider_config):
-from ray.autoscaler._private.kuberay.node_provider import KuberayNodeProvider
+from ray.autoscaler._private.kuberay.node_provider import KubeRayNodeProvider

-return KuberayNodeProvider
+return KubeRayNodeProvider


def _import_aliyun(provider_config):
@@ -188,7 +188,7 @@ def _import_external(provider_config):
"gcp": "GCP",
"azure": "Azure",
"kubernetes": "Kubernetes",
"kuberay": "Kuberay",
"kuberay": "KubeRay",
"aliyun": "Aliyun",
"external": "External",
"vsphere": "vSphere",
2 changes: 1 addition & 1 deletion python/ray/autoscaler/kuberay/init-config.sh
@@ -1,6 +1,6 @@
#!/bin/bash

-# Clone pinned Kuberay commit to temporary directory, copy the CRD definitions
+# Clone pinned KubeRay commit to temporary directory, copy the CRD definitions
# into the autoscaler folder.
KUBERAY_BRANCH="v1.0.0"
OPERATOR_TAG="v1.0.0"
2 changes: 1 addition & 1 deletion python/ray/autoscaler/kuberay/ray-cluster.complete.yaml
@@ -1,5 +1,5 @@
# This is adapted from https://github.com/ray-project/kuberay/blob/master/ray-operator/config/samples/ray-cluster.complete.yaml
-# It is a general RayCluster that has most fields in it for maximum flexibility in the Ray/Kuberay integration MVP.
+# It is a general RayCluster that has most fields in it for maximum flexibility in the Ray/KubeRay integration MVP.
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
26 changes: 13 additions & 13 deletions python/ray/tests/kuberay/test_kuberay_node_provider.py
@@ -10,7 +10,7 @@
_worker_group_index,
_worker_group_max_replicas,
_worker_group_replicas,
-KuberayNodeProvider,
+KubeRayNodeProvider,
ScaleRequest,
)
from ray.autoscaler._private.util import NodeID
@@ -88,7 +88,7 @@ def test_worker_group_replicas(group_index, expected_max_replicas, expected_repl
def test_create_node_cap_at_max(
attempted_target_replica_count, expected_target_replica_count
):
"""Validates that KuberayNodeProvider does not attempt to create more nodes than
"""Validates that KubeRayNodeProvider does not attempt to create more nodes than
allowed by maxReplicas. For the config in this test, maxReplicas is fixed at 300.
Args:
@@ -98,8 +98,8 @@ def test_create_node_cap_at_max(
capped at maxReplicas (300, for the config in this test.)
"""
raycluster = get_basic_ray_cr()
with mock.patch.object(KuberayNodeProvider, "__init__", return_value=None):
kr_node_provider = KuberayNodeProvider(provider_config={}, cluster_name="fake")
with mock.patch.object(KubeRayNodeProvider, "__init__", return_value=None):
kr_node_provider = KubeRayNodeProvider(provider_config={}, cluster_name="fake")
scale_request = ScaleRequest(
workers_to_delete=set(),
desired_num_workers={"small-group": attempted_target_replica_count},
@@ -171,9 +171,9 @@ def mock_get(node_provider, path):
raise ValueError("Invalid path.")

with mock.patch.object(
KuberayNodeProvider, "__init__", return_value=None
), mock.patch.object(KuberayNodeProvider, "_get", mock_get):
kr_node_provider = KuberayNodeProvider(provider_config={}, cluster_name="fake")
KubeRayNodeProvider, "__init__", return_value=None
), mock.patch.object(KubeRayNodeProvider, "_get", mock_get):
kr_node_provider = KubeRayNodeProvider(provider_config={}, cluster_name="fake")
kr_node_provider.cluster_name = "fake"
nodes = kr_node_provider.non_terminated_nodes({})
assert kr_node_provider.node_data_dict == expected_node_data
@@ -238,16 +238,16 @@
],
)
def test_submit_scale_request(node_data_dict, scale_request, expected_patch_payload):
"""Test the KuberayNodeProvider's RayCluster patch payload given a dict
"""Test the KubeRayNodeProvider's RayCluster patch payload given a dict
of current node counts and a scale request.
"""
raycluster = get_basic_ray_cr()
# Add another worker group for the sake of this test.
blah_group = copy.deepcopy(raycluster["spec"]["workerGroupSpecs"][1])
blah_group["groupName"] = "blah-group"
raycluster["spec"]["workerGroupSpecs"].append(blah_group)
with mock.patch.object(KuberayNodeProvider, "__init__", return_value=None):
kr_node_provider = KuberayNodeProvider(provider_config={}, cluster_name="fake")
with mock.patch.object(KubeRayNodeProvider, "__init__", return_value=None):
kr_node_provider = KubeRayNodeProvider(provider_config={}, cluster_name="fake")
kr_node_provider.node_data_dict = node_data_dict
patch_payload = kr_node_provider._scale_request_to_patch_payload(
scale_request=scale_request, raycluster=raycluster
@@ -277,9 +277,9 @@ def mock_patch(kuberay_provider, path, patch_payload):
kuberay_provider._patched_raycluster = patch.apply(kuberay_provider._raycluster)

with mock.patch.object(
KuberayNodeProvider, "__init__", return_value=None
), mock.patch.object(KuberayNodeProvider, "_patch", mock_patch):
kr_node_provider = KuberayNodeProvider(provider_config={}, cluster_name="fake")
KubeRayNodeProvider, "__init__", return_value=None
), mock.patch.object(KubeRayNodeProvider, "_patch", mock_patch):
kr_node_provider = KubeRayNodeProvider(provider_config={}, cluster_name="fake")
kr_node_provider.cluster_name = "fake"
kr_node_provider._patched_raycluster = raycluster
kr_node_provider._raycluster = raycluster
