Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Serve] create internal request id to track request objects #45761

Merged
merged 8 commits into from
Jun 10, 2024

Conversation

GeneDer
Copy link
Contributor

@GeneDer GeneDer commented Jun 5, 2024

Why are these changes needed?

User found an issue where when the request id is reused by multiple requests around the same time, Serve's proxy will be unable to track the request objects correctly. We should not rely on client's request id to be unique during their lifecycle. This PR added a new internal_request_id on the RequestMetadata, randomly generated by the proxy, to track the request object. Client passed request_id continues to be set on the RequestMetadata and be used for logging. Also, updated the doc on the request id to make it more clear on what happens when it's re-used.

Related issue number

Closes #45723

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@GeneDer GeneDer added serve Ray Serve Related Issue go add ONLY when ready to merge, run all tests labels Jun 5, 2024
@GeneDer GeneDer marked this pull request as ready for review June 5, 2024 23:48
@@ -396,15 +396,18 @@ def _get_response_handler_info(
if version.parse(starlette.__version__) < version.parse("0.33.0"):
proxy_request.set_path(route_path.replace(route_prefix, "", 1))

internal_request_id = generate_request_id()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internal_request_id is generated here always

Copy link
Contributor

@JoshKarpel JoshKarpel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for jumping on this so quickly! Much appreciated!

doc/source/serve/monitoring.md Outdated Show resolved Hide resolved
@@ -677,6 +677,7 @@ class DeploymentHandleSource(str, Enum):
@dataclass
class RequestMetadata:
request_id: str
internal_request_id: str
Copy link

@jugalshah291 jugalshah291 Jun 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq: is the internal_request_id send as an header to the app behind ray serve proxy? (just a thought)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not, this will appear in the serve context object during the request if there is a need for the deployment to know about it, but the request client won't have this info. Do you think there is a need to pass this back to the client? Can make a new header for it if there is a need for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh actually reading it again, since it's in the context object, the "app" can get it through calling ray.serve.context._serve_request_context.get(). internal_request_id. It's not passed as a header to the "app" nor the request "client".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ not a public API

this internal request ID is purely an implementation detail, shouldn't be exposed to end users

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh ok, let me take it out👍

@GeneDer
Copy link
Contributor Author

GeneDer commented Jun 6, 2024

Thanks for jumping on this so quickly! Much appreciated!

You got it! Also thanks for the detailed issue, that made writing tests so much easier 😄

Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GeneDer I believe request_id is used in some internal data structures inside the replica scheduler logic -- does that part need to be updated?

doc/source/serve/monitoring.md Outdated Show resolved Hide resolved
@@ -677,6 +677,7 @@ class DeploymentHandleSource(str, Enum):
@dataclass
class RequestMetadata:
request_id: str
internal_request_id: str
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^ not a public API

this internal request ID is purely an implementation detail, shouldn't be exposed to end users

src/ray/protobuf/serve.proto Outdated Show resolved Hide resolved
GeneDer and others added 2 commits June 6, 2024 10:10
Co-authored-by: Edward Oakes <[email protected]>
Signed-off-by: Gene Der Su <[email protected]>
Signed-off-by: Gene Su <[email protected]>
@edoakes
Copy link
Contributor

edoakes commented Jun 6, 2024

@GeneDer I believe request_id is used in some internal data structures inside the replica scheduler logic -- does that part need to be updated?

I think at a minimum we are using request_id in some logs in the replica scheduler -- want to use the external one to avoid this causing confusion.

Signed-off-by: Gene Su <[email protected]>
Signed-off-by: Gene Su <[email protected]>
Copy link
Contributor

@edoakes edoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines +154 to +157
"""Sending 20 requests in parallel all with the same request id, but with
different request body.
"""
bodies = [{"app_name": f"an_{uuid.uuid4()}"} for _ in range(20)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did this fail consistently w/o the code changes?

could do many more concurrent requests here, like 500, to increase confidence

Copy link
Contributor Author

@GeneDer GeneDer Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, on master, this test will consistently failing with the requests are tracked incorrectly
image

@edoakes edoakes merged commit 5452d0b into ray-project:master Jun 10, 2024
6 checks passed
richardsliu pushed a commit to richardsliu/ray that referenced this pull request Jun 12, 2024
…ect#45761)

<!-- Thank you for your contribution! Please review
https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before
opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR.
If you don't have the access to it, we will shortly find a reviewer and
assign them to your PR. -->

## Why are these changes needed?

User found an issue where when the request id is reused by multiple
requests around the same time, Serve's proxy will be unable to track the
request objects correctly. We should not rely on client's request id to
be unique during their lifecycle. This PR added a new
`internal_request_id` on the `RequestMetadata`, randomly generated by
the proxy, to track the request object. Client passed `request_id`
continues to be set on the `RequestMetadata` and be used for logging.
Also, updated the doc on the request id to make it more clear on what
happens when it's re-used.

## Related issue number

Closes ray-project#45723

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git
commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for
https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I
added a
method in Tune, I've added it in `doc/source/tune/api/` under the
           corresponding `.rst` file.
- [ ] I've made sure the tests are passing. Note that there might be a
few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: Gene Su <[email protected]>
Signed-off-by: Gene Der Su <[email protected]>
Co-authored-by: Edward Oakes <[email protected]>
Signed-off-by: Richard Liu <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
go add ONLY when ready to merge, run all tests serve Ray Serve Related Issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Serve] Serve produces insecure and unexpected behavior when the multiple client use the same x-request-id
4 participants