Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serve] Add Custom Usage Tags for Reporting #27061

Merged
merged 6 commits into from
Jul 27, 2022

Conversation

simon-mo
Copy link
Contributor

@simon-mo simon-mo commented Jul 26, 2022

Signed-off-by: simon-mo [email protected]

Why are these changes needed?

This PR adds two custom usage tag to track the total number of deployments in Ray Serve, as well as usages of two different versions of APIs.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is manually tested

@@ -1771,6 +1775,9 @@ def deploy(self, deployment_name: str, deployment_info: DeploymentInfo) -> bool:
self._deployment_states[deployment_name] = self._create_deployment_state(
deployment_name
)
record_extra_usage_tag(
TagKey.SERVE_NUM_DEPLOYMENTS, str(len(self._deployment_states))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this per cluster? usage stats is per cluster.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes!

# Serve
SERVE_USING_V1_API = auto()
SERVE_USING_V2_API = auto()
SERVE_NUM_DEPLOYMENTS = auto()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add descriptions of what this is measuring. (cc @thomasdesr @pcmoritz we shouldn't approve metrics without clear descriptions).


return _private_api.serve_start(detached, http_options, dedicated_cpu, **kwargs)
# Record after Ray has been started.
record_extra_usage_tag(TagKey.SERVE_USING_V1_API, "")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
record_extra_usage_tag(TagKey.SERVE_USING_V1_API, "")
record_extra_usage_tag(TagKey.SERVE_API_VERSION, "v1")

RLLIB_FRAMEWORK = auto()
RLLIB_ALGORITHM = auto()
RLLIB_NUM_WORKERS = auto()

# Serve
SERVE_USING_V1_API = auto()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SERVE_USING_V1_API = auto()

Copy link
Contributor

@ericl ericl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments inline.

@simon-mo simon-mo requested a review from ericl July 27, 2022 01:09
@simon-mo
Copy link
Contributor Author

@ericl addressed. PTAL

simon-mo and others added 5 commits July 26, 2022 23:33
Signed-off-by: simon-mo <[email protected]>
Signed-off-by: simon-mo <[email protected]>
Signed-off-by: simon-mo <[email protected]>
Copy link
Contributor

@sihanwang41 sihanwang41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question, how will the stats added in serve working with the doc mentioning "The data will be uploaded every hour with other usage stats data in the cluster."?

Simple scenario, i call deploy() twice in one hour, will we have two stats point in the database?

@@ -1771,6 +1775,9 @@ def deploy(self, deployment_name: str, deployment_info: DeploymentInfo) -> bool:
self._deployment_states[deployment_name] = self._create_deployment_state(
deployment_name
)
record_extra_usage_tag(
TagKey.SERVE_NUM_DEPLOYMENTS, str(len(self._deployment_states))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a function to accept the int instead of converting to str?

(Not python expert, but it looks like inefficient)

@@ -1856,3 +1863,8 @@ def update(self):

for tag in deleted_tags:
del self._deployment_states[tag]

if len(deleted_tags):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, do we need to collect the deleted num deployment? this will make the stats inaccurate since you also gather num deployment in the deploy(),

I suggest we don't gather stats in deploy() or delete(), we can have a dedicated stats thread and push the expected stats into the kv store.

(Just checked it is a synchronous call in record_extra_usage_tag. might not be very safe to do it in the critical code path)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just trying to capture the "latest" state here since it's last writer win for that hour.

@simon-mo simon-mo merged commit 8bb37c3 into ray-project:master Jul 27, 2022
Rohan138 pushed a commit to Rohan138/ray that referenced this pull request Jul 28, 2022
franklsf95 pushed a commit to franklsf95/ray that referenced this pull request Aug 2, 2022
gramhagen pushed a commit to gramhagen/ray that referenced this pull request Aug 15, 2022
gramhagen pushed a commit to gramhagen/ray that referenced this pull request Aug 15, 2022
Chong-Li pushed a commit to alipay/ant-ray that referenced this pull request Aug 16, 2022
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants