[Doc][KubeRay v1.1.0] Deprecate deploymentUnhealthySecondThreshold and serviceUnhealthySecondThreshold (#43626)

Signed-off-by: Kai-Hsun Chen <[email protected]>
kevin85421 committed Mar 1, 2024
1 parent d9fc537 commit d86d199
Showing 2 changed files with 1 addition and 21 deletions.
14 changes: 0 additions & 14 deletions doc/source/cluster/kubernetes/user-guides/rayservice.md
@@ -267,20 +267,6 @@ curl -X POST -H 'Content-Type: application/json' rayservice-sample-serve-svc:800
 # [Expected output]: 8
 ```
-
-### Other possible scenarios that trigger a new RayCluster preparation
-
-> Note: The following behavior is for KubeRay v0.6.2 or newer.
-For older versions, see [kuberay#1293](https://github.com/ray-project/kuberay/pull/1293) for more details.
-
-KubeRay also triggers a new RayCluster preparation if it considers a RayCluster unhealthy.
-In the RayService, KubeRay can mark a RayCluster as unhealthy in two possible scenarios.
-
-* Case 1: The KubeRay operator can't connect to the dashboard agent on the head Pod for more than the duration defined by the `deploymentUnhealthySecondThreshold` parameter. Both the default value and values in sample YAML files of `deploymentUnhealthySecondThreshold` are 300 seconds.
-
-* Case 2: The KubeRay operator marks a RayCluster as unhealthy if the status of a serve application is `DEPLOY_FAILED` or `UNHEALTHY` for a duration exceeding the `serviceUnhealthySecondThreshold` parameter. Both the default value and values in sample YAML files of `serviceUnhealthySecondThreshold` are 900 seconds.
-
-After KubeRay marks a RayCluster as unhealthy, it initiates the creation of a new RayCluster. Once the new RayCluster is ready, KubeRay redirects network traffic to it, and subsequently deletes the old RayCluster.
 
 ## Step 9: Clean up the Kubernetes cluster
 
 ```sh
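For context, the two thresholds whose documentation the hunk above removes were plain fields under the RayService `spec`. A minimal hedged sketch of where they sat (field values are the defaults quoted in the removed text; the `apiVersion`, resource name, and the rest of the spec are placeholders, not taken from this commit):

```yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: rayservice-sample   # placeholder name
spec:
  # Deprecated in KubeRay v1.1.0; this commit removes their documentation.
  # Case 1: dashboard agent on the head Pod unreachable for longer than
  # this many seconds -> RayCluster marked unhealthy.
  deploymentUnhealthySecondThreshold: 300
  # Case 2: a serve application stays DEPLOY_FAILED or UNHEALTHY for longer
  # than this many seconds -> RayCluster marked unhealthy.
  serviceUnhealthySecondThreshold: 900
  serveConfigV2: |
    # ... Serve config, as produced by `serve build` ...
  rayClusterConfig:
    # ... contents of a RayCluster CR's `spec` field ...
```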
8 changes: 1 addition & 7 deletions doc/source/serve/production-guide/kubernetes.md
@@ -30,18 +30,12 @@ Once the KubeRay controller is running, manage your Ray Serve application by cre
 
 Under the `spec` section in the `RayService` CR, set the following fields:
 
-**`serviceUnhealthySecondThreshold`**: Represents the threshold in seconds that defines when a service is considered unhealthy (application status is not RUNNING status). The default is 60 seconds. When the service is unhealthy, the KubeRay Service controller tries to recreate a new cluster and deploy the application to the new cluster.
-
-**`deploymentUnhealthySecondThreshold`**: Represents the number of seconds that the Serve application status can be unavailable before the service is considered unhealthy. The Serve application status is unavailable whenever the Ray dashboard is unavailable. The default is 60 seconds. When the service is unhealthy, the KubeRay Service controller tries to recreate a new cluster and deploy the application to the new cluster.
-
 **`serveConfigV2`**: Represents the configuration that Ray Serve uses to deploy the application. Using `serve build` to print the Serve configuration and copy-paste it directly into your [Kubernetes config](serve-in-production-kubernetes) and `RayService` CR.
 
 **`rayClusterConfig`**: Populate this field with the contents of the `spec` field from the `RayCluster` CR YAML file. Refer to [KubeRay configuration](kuberay-config) for more details.
 
 :::{tip}
-To enhance the reliability of your application, particularly when dealing with large dependencies that may require a significant amount of time to download, consider increasing the value of the `deploymentUnhealthySecondThreshold` to avoid a cluster restart.
-
-Alternatively, include the dependencies in your image's Dockerfile, so the dependencies are available as soon as the pods start.
+To enhance the reliability of your application, particularly when dealing with large dependencies that may require a significant amount of time to download, consider including the dependencies in your image's Dockerfile, so the dependencies are available as soon as the pods start.
 :::
 
 (serve-deploy-app-on-kuberay)=
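The `serveConfigV2` field kept in the hunk above holds the output of `serve build` verbatim. A hedged sketch of the general shape of such a value (the application name, import path, and deployment name are placeholders, not from this commit):

```yaml
# Example shape of a serveConfigV2 value, as produced by `serve build`
# and pasted under spec.serveConfigV2 in the RayService CR.
applications:
  - name: my_app              # placeholder application name
    import_path: my_app:app   # placeholder module:attribute import path
    route_prefix: /
    deployments:
      - name: MyDeployment    # placeholder deployment name
        num_replicas: 1
```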
