Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[serve] Implement timeouts for streaming & enable all tests with experimental streaming flag turned on #36261

Merged
merged 6 commits into from
Jun 30, 2023

Conversation

edoakes
Copy link
Contributor

@edoakes edoakes commented Jun 9, 2023

Why are these changes needed?

This is the penultimate PR for turning RAY_SERVE_EXPERIMENTAL_STREAMING on by default.

In this change, we are now running all Serve tests under three conditions:

  • RAY_SERVE_EXPERIMENTAL_STREAMING=0 && RAY_SERVE_ENABLE_NEW_ROUTING=0
  • RAY_SERVE_EXPERIMENTAL_STREAMING=0 && RAY_SERVE_ENABLE_NEW_ROUTING=1
  • RAY_SERVE_EXPERIMENTAL_STREAMING=1 && RAY_SERVE_ENABLE_NEW_ROUTING=1

(Note that RAY_SERVE_EXPERIMENTAL_STREAMING=1 implies RAY_SERVE_ENABLE_NEW_ROUTING so the latter is not explicitly set).

This required making two functionality fixes:

  • Never using the streaming path for Java even if the flag is on.
  • Implementing timeouts in the streaming path.

I could split these into separate PRs, but they can't really be fully tested independently, so opting to combine them here.

Related issue number

Closes #35779

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@edoakes edoakes changed the title [WIP] flip flag to see what fails [serve] Enable all tests with experimental streaming flag turned on Jun 29, 2023
@edoakes edoakes requested a review from a team June 29, 2023 15:35
Copy link
Contributor

@shrekris-anyscale shrekris-anyscale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, this change looks good to me overall!

else:
curr_timeout_s = timeout_s - (time.time() - start)

obj_ref = await obj_ref_generator._next_async(timeout_s=curr_timeout_s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's obj_ref_generator's behavior if the timeout is negative? What if it's zero?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, let me check and ensure we're being defensive here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

zero == timeout immediately, negative == indefinite. I made sure we only pass None or >=0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we don't need to have timeout_s < 0 checking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know what you mean. self.request_timeout_s will always be set to None if it's <0 in the constructor

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't curr_timeout_s = timeout_s - (time.time() - start) make curr_timeout_s negative?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let me fix and add unit tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. I added this to utils.py::calculate_remaining_timeout and added a test case for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

Copy link
Contributor

@GeneDer GeneDer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small comment on how we set timeout, but otherwise LGTM

python/ray/serve/_private/http_proxy.py Outdated Show resolved Hide resolved
python/ray/serve/BUILD Show resolved Hide resolved
python/ray/serve/_private/http_proxy.py Show resolved Hide resolved
else:
curr_timeout_s = timeout_s - (time.time() - start)

obj_ref = await obj_ref_generator._next_async(timeout_s=curr_timeout_s)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, we don't need to have timeout_s < 0 checking?

@@ -56,6 +58,7 @@
"RAY_SERVE_HTTP_REQUEST_MAX_RETRIES cannot be negative."
)

TIMEOUT_ERROR_CODE = "timeout"
Copy link
Contributor

@sihanwang41 sihanwang41 Jun 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normally, the error code is a string of number? Is the timeout error code exposing to the users?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just for metrics. we already have one for disconnects (just below)

Copy link
Contributor

@sihanwang41 sihanwang41 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for addressing my comments!

Copy link
Contributor

@GeneDer GeneDer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@edoakes
Copy link
Contributor Author

edoakes commented Jun 29, 2023

FYI to reviewers: the diff grew because I refactored the tests to pull the timeout tests into test_request_timeout and add a test for timing out while pending assignment.

@edoakes edoakes changed the title [serve] Enable all tests with experimental streaming flag turned on [serve] Implement timeouts for streaming & enable all tests with experimental streaming flag turned on Jun 29, 2023
@edoakes edoakes force-pushed the streaming-on-by-default branch 2 times, most recently from 5944681 to cd6f29a Compare June 30, 2023 16:46
@edoakes edoakes requested review from zcin, architkulkarni and a team as code owners June 30, 2023 16:46
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
Signed-off-by: Edward Oakes <[email protected]>
@edoakes edoakes merged commit f40d236 into ray-project:master Jun 30, 2023
30 of 35 checks passed
SongGuyang pushed a commit to alipay/ant-ray that referenced this pull request Jul 12, 2023
…rimental streaming flag turned on (ray-project#36261)

This is the penultimate PR for turning RAY_SERVE_EXPERIMENTAL_STREAMING on by default.

In this change, we are now running all Serve tests under three conditions:

- RAY_SERVE_EXPERIMENTAL_STREAMING=0 && RAY_SERVE_ENABLE_NEW_ROUTING=0
- RAY_SERVE_EXPERIMENTAL_STREAMING=0 && RAY_SERVE_ENABLE_NEW_ROUTING=1
- RAY_SERVE_EXPERIMENTAL_STREAMING=1 && RAY_SERVE_ENABLE_NEW_ROUTING=1

(Note that RAY_SERVE_EXPERIMENTAL_STREAMING=1 implies RAY_SERVE_ENABLE_NEW_ROUTING so the latter is not explicitly set).

This required making two functionality fixes:

- Never using the streaming path for Java even if the flag is on.
- Implementing timeouts in the streaming path.

I could split these into separate PRs, but they can't really be fully tested independently, so opting to combine them here.

Signed-off-by: 久龙 <[email protected]>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
…rimental streaming flag turned on (ray-project#36261)

This is the penultimate PR for turning RAY_SERVE_EXPERIMENTAL_STREAMING on by default.

In this change, we are now running all Serve tests under three conditions:

- RAY_SERVE_EXPERIMENTAL_STREAMING=0 && RAY_SERVE_ENABLE_NEW_ROUTING=0
- RAY_SERVE_EXPERIMENTAL_STREAMING=0 && RAY_SERVE_ENABLE_NEW_ROUTING=1
- RAY_SERVE_EXPERIMENTAL_STREAMING=1 && RAY_SERVE_ENABLE_NEW_ROUTING=1

(Note that RAY_SERVE_EXPERIMENTAL_STREAMING=1 implies RAY_SERVE_ENABLE_NEW_ROUTING so the latter is not explicitly set).

This required making two functionality fixes:

- Never using the streaming path for Java even if the flag is on.
- Implementing timeouts in the streaming path.

I could split these into separate PRs, but they can't really be fully tested independently, so opting to combine them here.

Signed-off-by: harborn <[email protected]>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
…rimental streaming flag turned on (ray-project#36261)

This is the penultimate PR for turning RAY_SERVE_EXPERIMENTAL_STREAMING on by default.

In this change, we are now running all Serve tests under three conditions:

- RAY_SERVE_EXPERIMENTAL_STREAMING=0 && RAY_SERVE_ENABLE_NEW_ROUTING=0
- RAY_SERVE_EXPERIMENTAL_STREAMING=0 && RAY_SERVE_ENABLE_NEW_ROUTING=1
- RAY_SERVE_EXPERIMENTAL_STREAMING=1 && RAY_SERVE_ENABLE_NEW_ROUTING=1

(Note that RAY_SERVE_EXPERIMENTAL_STREAMING=1 implies RAY_SERVE_ENABLE_NEW_ROUTING so the latter is not explicitly set).

This required making two functionality fixes:

- Never using the streaming path for Java even if the flag is on.
- Implementing timeouts in the streaming path.

I could split these into separate PRs, but they can't really be fully tested independently, so opting to combine them here.
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
…rimental streaming flag turned on (ray-project#36261)

This is the penultimate PR for turning RAY_SERVE_EXPERIMENTAL_STREAMING on by default.

In this change, we are now running all Serve tests under three conditions:

- RAY_SERVE_EXPERIMENTAL_STREAMING=0 && RAY_SERVE_ENABLE_NEW_ROUTING=0
- RAY_SERVE_EXPERIMENTAL_STREAMING=0 && RAY_SERVE_ENABLE_NEW_ROUTING=1
- RAY_SERVE_EXPERIMENTAL_STREAMING=1 && RAY_SERVE_ENABLE_NEW_ROUTING=1

(Note that RAY_SERVE_EXPERIMENTAL_STREAMING=1 implies RAY_SERVE_ENABLE_NEW_ROUTING so the latter is not explicitly set).

This required making two functionality fixes:

- Never using the streaming path for Java even if the flag is on.
- Implementing timeouts in the streaming path.

I could split these into separate PRs, but they can't really be fully tested independently, so opting to combine them here.

Signed-off-by: e428265 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[serve][streaming] Implement timeout logic when using streaming interface
4 participants