:orphan:

Ray Operator Advanced Configuration

This document covers configuration options and other details concerning autoscaling Ray clusters on Kubernetes. We recommend first reading this :ref:`introductory guide<ray-k8s-deploy>`.

Helm chart configuration

This section discusses RayCluster configuration options exposed in the Ray Helm chart's values.yaml file. The default settings in values.yaml were chosen for the purposes of demonstration. For production use cases, the values should be modified. For example, you will probably want to increase Ray Pod resource requests.

Setting custom chart values

To configure Helm chart values, you can pass in a custom values yaml and/or set individual fields.

# Pass in a custom values yaml.
$ helm install example-cluster -f custom_values.yaml ./ray
# Set custom values on the command line.
$ helm install example-cluster --set image=rayproject/ray:1.2.0 ./ray
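
For reference, here is a minimal sketch of a custom_values.yaml that raises pod resource requests and worker counts. The fields shown (image, headPodType, podTypes, CPU, memory, minWorkers, maxWorkers) are the chart options described in this guide; the head podType key rayHeadType is assumed here, so mirror the exact structure of the chart's values.yaml.

# custom_values.yaml (sketch only)
image: rayproject/ray:1.2.0
headPodType: rayHeadType      # assumed name of the head podType key
podTypes:
  rayHeadType:
    CPU: 2                    # request 2 CPUs for the head pod
    memory: 8Gi
  rayWorkerType:              # worker podType name used elsewhere in this guide
    minWorkers: 2
    maxWorkers: 10
    CPU: 4
    memory: 16Gi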

Refer to the Helm docs for more information.

Ray cluster configuration

A :ref:`Ray cluster<cluster-index>` consists of a head node and a collection of worker nodes. When deploying Ray on Kubernetes, each Ray node runs in its own Kubernetes Pod. The podTypes field of values.yaml represents the pod configurations available for use as nodes in the Ray cluster. The key of each podType is a user-defined name. The field headPodType identifies the name of the podType to use for the Ray head node. The rest of the podTypes are used as configuration for the Ray worker nodes.

Each podType specifies minWorkers and maxWorkers fields. The autoscaler will try to maintain at least minWorkers of the podType and can scale up to maxWorkers according to the resource demands of the Ray workload. A common pattern is to specify minWorkers = maxWorkers = 0 for the head podType; this signals that the podType is to be used only for the head node. You can use helm upgrade to adjust the fields minWorkers and maxWorkers without :ref:`restarting<k8s-restarts>` the Ray cluster.
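
For example, assuming the worker podType is named rayWorkerType (the name used in the commands later in this guide), the worker counts can be adjusted in place:

# Adjust worker counts for a running cluster; the Ray nodes are not restarted.
# If you installed with a custom values file, pass it again here (or use --reuse-values)
# so your other overrides are preserved.
$ helm upgrade example-cluster \
    --set podTypes.rayWorkerType.minWorkers=1 \
    --set podTypes.rayWorkerType.maxWorkers=10 ./ray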

The fields CPU, GPU, memory, and nodeSelector configure the Kubernetes PodSpec to use for nodes of the podType. The image field determines the Ray container image used by all nodes in the Ray cluster.

The rayResources field of each podType can be used to signal the presence of custom resources to Ray. To schedule Ray tasks and actors that use custom hardware resources, rayResources can be used in conjunction with nodeSelector (see the sketch after this list):

  • Use nodeSelector to constrain workers of a podType to run on a Kubernetes Node with specialized hardware (e.g., a particular GPU accelerator).
  • Signal availability of the hardware for that podType with rayResources: {"custom_resource": 3}.
  • Schedule a Ray task or actor to use that resource with @ray.remote(resources={"custom_resource": 1}).
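
A sketch of how these pieces fit together in values.yaml. The podType name gpuWorkerType, the node label accelerator: example-gpu, and the resource name custom_resource are illustrative placeholders, not chart defaults:

podTypes:
  gpuWorkerType:                       # illustrative podType name
    CPU: 4
    GPU: 1
    memory: 16Gi
    minWorkers: 0
    maxWorkers: 2
    # Constrain these workers to Kubernetes Nodes with the specialized hardware.
    nodeSelector:
      accelerator: example-gpu         # illustrative node label
    # Advertise the custom resource to Ray on each such worker.
    rayResources: {"custom_resource": 3}

A task or actor declared with @ray.remote(resources={"custom_resource": 1}) will then be scheduled only on nodes that advertise custom_resource, i.e. workers of this podType.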

By default, the fields CPU, GPU, and memory are used to configure the cpu, gpu, and memory resources advertised to Ray. However, rayResources can be used to override this behavior. For example, rayResources: {"CPU": 0} can be set for the head podType to avoid scheduling Ray tasks on the head node.
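
A sketch of that override for the head podType (the key rayHeadType is assumed; use whatever name your headPodType field points to):

podTypes:
  rayHeadType:                 # assumed head podType key
    CPU: 1                     # the head pod still requests a CPU from Kubernetes
    rayResources: {"CPU": 0}   # but advertises 0 CPUs to Ray, so no tasks land on the head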

Refer to the documentation in values.yaml for more details.

Note

If your application could benefit from additional configuration options in the Ray Helm chart (e.g., exposing more PodSpec fields), feel free to open a feature request on the Ray GitHub or a discussion thread on the Ray forums.

For complete configurability, it is also possible to launch a Ray cluster :ref:`without the Helm chart<no-helm>` or to modify the Helm chart.

Note

Some things to keep in mind about the scheduling of Ray worker pods and Ray tasks/actors:

1. The Ray Autoscaler executes scaling decisions by sending pod creation requests to the Kubernetes API server. If your Kubernetes cluster cannot accommodate more worker pods of a given podType, requested pods will enter the Pending state until a pod can be scheduled or a timeout expires. (One way to list Pending pods is shown after this list.)

2. If a Ray task requests more resources than are available in any podType, the Ray task cannot be scheduled.
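
One way to spot this situation is a standard kubectl field selector (not specific to Ray):

# List pods stuck in the Pending state in the "ray" namespace.
$ kubectl -n ray get pods --field-selector=status.phase=Pending
# "kubectl -n ray describe pod <pod-name>" then shows the scheduling events explaining why.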

Running multiple Ray clusters

The Ray Operator can manage multiple Ray clusters running within a single Kubernetes cluster. Since Helm does not support sharing resources between different releases, an additional Ray cluster must be launched in a Helm release separate from the release used to launch the Operator.

To enable launching multiple Ray clusters, the Ray Helm chart includes two flags:

  • operatorOnly: Start the Operator without launching a Ray cluster.
  • clusterOnly: Create a RayCluster custom resource without installing the Operator. (If the Operator has already been installed, a new Ray cluster will be launched.)

The following commands will install the Operator and two Ray Clusters in three separate Helm releases:

# Install the operator in its own Helm release.
$ helm install ray-operator --set operatorOnly=true ./ray

# Install a Ray cluster in a new namespace "ray".
$ helm -n ray install example-cluster --set clusterOnly=true ./ray --create-namespace

# Install a second Ray cluster. Launch the second cluster without any workers.
$ helm -n ray install example-cluster2 \
    --set podTypes.rayWorkerType.minWorkers=0 --set clusterOnly=true ./ray

# Examine the pods in both clusters.
$ kubectl -n ray get pods
NAME                                    READY   STATUS    RESTARTS   AGE
example-cluster-ray-head-type-v6tt9     1/1     Running   0          35s
example-cluster-ray-worker-type-fmn4k   1/1     Running   0          22s
example-cluster-ray-worker-type-r6m7k   1/1     Running   0          22s
example-cluster2-ray-head-type-tj666    1/1     Running   0          15s

Alternatively, the Operator and one of the Ray Clusters can be installed in the same Helm release:

# Start the operator. Install a Ray cluster in a new namespace.
$ helm -n ray install example-cluster --create-namespace ./ray

# Start another Ray cluster.
# The cluster will be managed by the operator created in the last command.
$ helm -n ray install example-cluster2 \
    --set podTypes.rayWorkerType.minWorkers=0 --set clusterOnly=true ./ray

The Operator pod outputs autoscaling logs for all of the Ray clusters it manages. Each line of output is prefixed by the string <cluster name>,<namespace>. This string can be used to filter for a specific Ray cluster's logs:

# The last 100 lines of logging output for the cluster with name "example-cluster2" in namespace "ray":
$ kubectl logs \
  $(kubectl get pod -l cluster.ray.io/component=operator -o custom-columns=:metadata.name) \
  | grep example-cluster2,ray | tail -n 100

Cleaning up resources

When cleaning up, RayCluster resources must be deleted before the Operator deployment is deleted. This is because the Operator must remove a finalizer from the RayCluster resource to allow deletion of the resource to complete.

If the Operator and RayCluster are created as part of the same Helm release, the RayCluster must be deleted :ref:`before<k8s-cleanup-basic>` uninstalling the Helm release. If the Operator and one or more RayClusters are created in multiple Helm releases, the RayCluster releases must be uninstalled before the Operator release.
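
Using the release names from the examples above, a clean teardown looks like this:

# Uninstall the RayCluster releases first...
$ helm -n ray uninstall example-cluster
$ helm -n ray uninstall example-cluster2
# ...and only then the Operator release.
$ helm uninstall ray-operator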

To remedy a situation where the Operator deployment was deleted first and RayCluster deletion is hanging, try one of the following:

  • Manually delete the RayCluster's finalizers with kubectl edit or kubectl patch (see the example after this list).
  • Restart the Operator so that it can remove RayCluster finalizers. Then remove the Operator.
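
For example, one common way to clear the finalizers with a merge patch (the cluster name and namespace match the earlier examples):

# Remove the finalizers from the stuck RayCluster so deletion can complete.
$ kubectl -n ray patch raycluster example-cluster --type merge \
    -p '{"metadata":{"finalizers":null}}'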

Cluster-scoped vs. namespaced operators

By default, the Ray Helm chart installs a cluster-scoped operator. This means that the operator manages all Ray clusters in your Kubernetes cluster, across all namespaces. The namespace into which the Operator Deployment is launched is determined by the chart field operatorNamespace. If this field is unset, the operator is launched into namespace default.

It is also possible to run a namespace-scoped Operator. This means that the Operator is launched into the namespace of the Helm release and manages only Ray clusters in that namespace. To run a namespaced Operator, add the flag --set namespacedOperator=True to your Helm install command.
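
For example:

# Install a namespace-scoped Operator together with a Ray cluster in namespace "ray".
$ helm -n ray install example-cluster --set namespacedOperator=True --create-namespace ./ray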

Warning

Do not simultaneously run namespaced and cluster-scoped Ray Operators within one Kubernetes cluster, as this will lead to unintended effects.

Deploying without Helm

It is possible to deploy the Ray Operator without Helm. The necessary configuration files are available on the Ray GitHub. The following manifests should be installed in the order listed:

Ray Cluster Lifecycle

Restart behavior

The Ray cluster will restart under the following circumstances:

  • There is an error in the cluster's autoscaling process. This will happen if the Ray head node goes down.
  • There has been a change to the Ray head pod configuration. In terms of the Ray Helm chart, this means either image or one of the following fields of the head's podType has been modified: CPU, GPU, memory, nodeSelector.

Similarly, all workers of a given podType will be discarded if

  • There has been a change to image or one of the following fields of the podType: CPU, GPU, memory, nodeSelector (see the upgrade example below).
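
For example, pushing a new image with helm upgrade changes the pod configuration of every podType, so the head restarts and existing workers are replaced (a sketch; re-pass your other value overrides as noted earlier):

# Changing image restarts the head and discards existing workers.
$ helm upgrade example-cluster --set image=rayproject/ray:1.2.0 ./ray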

Status information

Running kubectl -n <namespace> get raycluster will show all Ray clusters in the namespace with status information.

$ kubectl -n ray get rayclusters
NAME              STATUS    RESTARTS   AGE
example-cluster   Running   0          9s

The STATUS column reports the RayCluster's status.phase field. The following values are possible:

  • Empty/nil: This means the RayCluster resource has not yet been registered by the Operator.
  • Updating: The Operator is launching the Ray cluster or processing an update to the cluster's configuration.
  • Running: The Ray cluster's autoscaling process is running in a normal state.
  • AutoscalingExceptionRecovery: The Ray cluster's autoscaling process has crashed. Ray processes will restart. This can happen if the Ray head node goes down.
  • Error: There was an unexpected error while updating the Ray cluster. (The Ray maintainers would be grateful if you file a bug report with operator logs.)

The RESTARTS column reports the RayCluster's status.autoscalerRetries field. This tracks the number of times the cluster has restarted due to an autoscaling error.
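
To inspect these fields directly, kubectl's standard jsonpath output can be used (field names as documented above):

# Print status.phase and status.autoscalerRetries for one cluster.
$ kubectl -n ray get raycluster example-cluster \
    -o jsonpath='{.status.phase}{"\n"}{.status.autoscalerRetries}{"\n"}'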

Questions or Issues?