Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Release Performance Regression 2.3.0/2.3.1 #33187

Merged
merged 1 commit into from
Mar 19, 2023

Conversation

zcin
Copy link
Contributor

@zcin zcin commented Mar 10, 2023

Why are these changes needed?

Output from compare_perf_metrics:
https://docs.google.com/spreadsheets/d/1lSCLAQ7ghc4uOk80L3e9S2Rxuy7a2d3nbJRPi9D7brM/edit#gid=0

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Cindy Zhang <[email protected]>
@zcin
Copy link
Contributor Author

zcin commented Mar 10, 2023

@rickyyx @scv119 Assigned this to you guys, feel free to re-assign if necessary.

"large_object_size": 107374182400,
"large_object_time": 325.56598805499993,
"large_object_time": 490.7879660510001,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this expected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are the PRs we have cherry picked since 2.3.0?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the one with very high variance on the V1 stack due to EBS. I think we should run it again to be sure that it's a regression and not just network tail latency.

#30711 (comment)

Copy link
Contributor Author

@zcin zcin Mar 10, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Synced with @rickyyx @cadedaniel, the metrics difference is believed to just be the expected variance. Kicked off a rerun of tests single_node and microbenchmark here and here so that we can be more confident in this hypothesis. @scv119

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @zcin, yeah let's retry here!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this may not be a transient issue...the rerun results are uploaded at this PR for ease of comparing.

  • New results for large_object_time are here, still a 49% regression.
  • New results for 1_n_async_actor_calls_async are here, <1%.

@scv119 scv119 merged commit 6f27610 into ray-project:master Mar 19, 2023
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request Mar 21, 2023
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023
chaowanggg pushed a commit to chaowanggg/ray-dev that referenced this pull request Apr 4, 2023
elliottower pushed a commit to elliottower/ray that referenced this pull request Apr 22, 2023
ProjectsByJackHe pushed a commit to ProjectsByJackHe/ray that referenced this pull request May 4, 2023
@zcin zcin deleted the compare-2.3.0-2.3.1 branch September 8, 2023 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants