
[Serve] log warning to reconfigure max_ongoing_requests if max_batch_size is less than max_ongoing_requests #43840

Merged: 14 commits into ray-project:master from fix-batching-issue on Mar 14, 2024

Conversation

@GeneDer (Contributor) commented on Mar 8, 2024

Why are these changes needed?

Our existing request rate limiting does not factor in the batch size. When using a large max_batch_size (greater than max_ongoing_requests), batching won't fill a batch up to max_batch_size even when batch_wait_timeout_s is set large enough to collect the requests. This PR looks at every method in the user code, finds the largest max_batch_size across all methods, and logs a warning suggesting the user reconfigure max_ongoing_requests so batches can actually reach the max_batch_size limit.
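
For context, here is a minimal sketch of the misconfiguration this warning targets (assuming Ray 2.10+, where the deployment parameter is named max_ongoing_requests; the class and method names are illustrative, not part of Ray's API):

```python
from ray import serve

@serve.deployment(max_ongoing_requests=5)  # router admits at most 5 requests per replica
class Model:
    # batch_wait_timeout_s=1.0 gives ample time to collect requests, but only
    # 5 requests can ever be in flight at once, so batches are capped at 5 and
    # can never reach max_batch_size=16; this is the case the PR warns about.
    @serve.batch(max_batch_size=16, batch_wait_timeout_s=1.0)
    async def handle_batch(self, requests):
        return [str(r) for r in requests]

    async def __call__(self, request):
        # Callers pass single requests; serve.batch assembles them into batches.
        return await self.handle_batch(request)

app = Model.bind()
```

With this change, Serve logs a warning for configurations like the above, suggesting the user raise max_ongoing_requests so batches can fill.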

Related issue number

Closes #43288

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@GeneDer GeneDer requested a review from a team March 8, 2024 23:58
@edoakes (Contributor) left a comment:


This change won't solve the issue in isolation because the sender side (router) will still respect the limit.

The solution is also a little bit hacky to implicitly change the behavior like this -- I'd rather we do this at deployment time and update max_ongoing_requests then and log a warning.

@GeneDer changed the title from "[Serve] fix batching bounded issue" to "[Serve] log warning to reconfigure max_ongoing_requests if max_batch_size is less than max_ongoing_requests" on Mar 11, 2024
@GeneDer (Contributor, Author) commented on Mar 12, 2024

@edoakes this is ready for another round of review. Changed to just log a warning and added a test for it


Review comment on this code in the diff:

```python
    return method._get_max_batch_size()

def check_max_batch_size_bounded(
```
@edoakes (Contributor) commented on this code:

I think it makes more sense to do this inside of serve.batching._BatchQueue rather than adding this method-parsing hackery. As written, this currently does not work if users use the set_max_batch_size setter on the method (which can be done dynamically at runtime in the replica).

We'll need access to max_ongoing_requests there, for which I'd suggest we add deployment_config as a private attribute of ReplicaContext (and set/update it properly in replica.py for that purpose). This will likely be generally useful in the future anyways.
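
A rough sketch of what that check might look like inside _BatchQueue, following the suggestion above (the helper name _warn_if_batch_cannot_fill and the _deployment_config attribute are hypothetical; the merged implementation may differ):

```python
import logging

from ray import serve

logger = logging.getLogger("ray.serve")

def _warn_if_batch_cannot_fill(max_batch_size: int) -> None:
    # serve.get_replica_context() is only available inside a running replica.
    context = serve.get_replica_context()
    # Hypothetical private attribute proposed in the review: the replica's
    # DeploymentConfig, set/updated in replica.py for this purpose.
    deployment_config = getattr(context, "_deployment_config", None)
    if deployment_config is None:
        return
    max_ongoing_requests = deployment_config.max_ongoing_requests
    if max_batch_size > max_ongoing_requests:
        logger.warning(
            f"`max_batch_size` ({max_batch_size}) exceeds `max_ongoing_requests` "
            f"({max_ongoing_requests}), so batches can never fill to "
            "`max_batch_size`. Consider raising `max_ongoing_requests`."
        )
```

Running the check inside _BatchQueue rather than scanning methods at deployment time also covers the set_max_batch_size setter, which can change the batch size dynamically at runtime.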

@GeneDer (Contributor, Author) replied:

Oh cool, didn't know there is a way for batching to get max_ongoing_requests. Thanks for the suggestion Ed!

@GeneDer (Contributor, Author) commented on Mar 13, 2024

@edoakes this is ready for review

Resolved review threads (outdated):
  • python/ray/serve/_private/replica.py
  • python/ray/serve/batching.py (two threads)
@GeneDer (Contributor, Author) commented on Mar 14, 2024

@edoakes this is ready for merge

@edoakes merged commit 0125ce8 into ray-project:master on Mar 14, 2024 (9 checks passed)
@GeneDer deleted the fix-batching-issue branch on March 14, 2024 18:21
ryanaoleary pushed a commit to ryanaoleary/ray that referenced this pull request on Jun 7, 2024:

[Serve] log warning to reconfigure `max_ongoing_requests` if `max_batch_size` is less than `max_ongoing_requests` (ray-project#43840)

Our existing request rate limiting does not factor in the batch size. When using a large max_batch_size (greater than max_ongoing_requests), batching won't fill a batch up to max_batch_size even when batch_wait_timeout_s is set large enough to collect the requests. This PR looks at every method in the user code, finds the largest max_batch_size across all methods, and logs a warning suggesting the user reconfigure max_ongoing_requests so batches can actually reach the max_batch_size limit.

---------

Signed-off-by: Gene Su <[email protected]>
Development

Successfully merging this pull request may close these issues.

[Serve] Dynamic Request Batching not sending max_batch_size (#43288)