
[Release] Add release logs for 2.9.0 commit 15a558e #41992

Merged 1 commit into ray-project:master on Dec 19, 2023

Conversation

architkulkarni
Contributor

Adds performance logs for Ray 2.9.0, taken from the release tests at https://buildkite.com/ray-project/release-tests-branch/builds?branch=releases%2F2.9.0 for commit 15a558e by running python fetch_release_logs.py 2.9.0.

Below I have included the result of running the regression script. From the release instructions:

This script will catch regressions in perf_metrics; you still need to manually check other metrics (e.g. _peak_memory)

I have not done the "manual check" and will leave that to the reviewers of this PR.

(base) architkulkarni@archit-Q4WXGF2WQY release_logs % python compare_perf_metrics 2.8.0 2.9.0                        
REGRESSION 21.61%: actors_per_second (THROUGHPUT) regresses from 753.4446893211699 to 590.5931553046038 (21.61%) in 2.9.0/benchmarks/many_actors.json
REGRESSION 14.94%: multi_client_put_gigabytes (THROUGHPUT) regresses from 36.34816401372876 to 30.918984626602807 (14.94%) in 2.9.0/microbenchmark.json
REGRESSION 13.26%: 1_n_async_actor_calls_async (THROUGHPUT) regresses from 8601.993472120319 to 7460.962715134404 (13.26%) in 2.9.0/microbenchmark.json
REGRESSION 13.10%: single_client_tasks_sync (THROUGHPUT) regresses from 1161.670131632561 to 1009.4349525282154 (13.10%) in 2.9.0/microbenchmark.json
REGRESSION 11.34%: n_n_actor_calls_async (THROUGHPUT) regresses from 30108.565209428394 to 26694.138600078164 (11.34%) in 2.9.0/microbenchmark.json
REGRESSION 10.64%: multi_client_tasks_async (THROUGHPUT) regresses from 27211.51041454346 to 24316.337428119852 (10.64%) in 2.9.0/microbenchmark.json
REGRESSION 10.02%: 1_n_actor_calls_async (THROUGHPUT) regresses from 9581.728569086026 to 8622.116661460657 (10.02%) in 2.9.0/microbenchmark.json
REGRESSION 9.21%: 1_1_async_actor_calls_sync (THROUGHPUT) regresses from 1377.3257452550822 to 1250.487251391533 (9.21%) in 2.9.0/microbenchmark.json
REGRESSION 8.67%: placement_group_create/removal (THROUGHPUT) regresses from 926.0840791839338 to 845.7511547073977 (8.67%) in 2.9.0/microbenchmark.json
REGRESSION 6.25%: 1_1_actor_calls_sync (THROUGHPUT) regresses from 2213.6033025230176 to 2075.2443816745968 (6.25%) in 2.9.0/microbenchmark.json
REGRESSION 6.12%: n_n_actor_calls_with_arg_async (THROUGHPUT) regresses from 2895.292478069285 to 2718.2145554952413 (6.12%) in 2.9.0/microbenchmark.json
REGRESSION 5.79%: client__put_gigabytes (THROUGHPUT) regresses from 0.12401864230452364 to 0.1168388142260294 (5.79%) in 2.9.0/microbenchmark.json
REGRESSION 5.62%: client__put_calls (THROUGHPUT) regresses from 856.533614603169 to 808.3571852957423 (5.62%) in 2.9.0/microbenchmark.json
REGRESSION 4.94%: n_n_async_actor_calls_async (THROUGHPUT) regresses from 24290.541801601616 to 23089.526825423094 (4.94%) in 2.9.0/microbenchmark.json
REGRESSION 3.77%: client__get_calls (THROUGHPUT) regresses from 1164.1583807193044 to 1120.242286739544 (3.77%) in 2.9.0/microbenchmark.json
REGRESSION 3.26%: single_client_get_object_containing_10k_refs (THROUGHPUT) regresses from 13.55352518200595 to 13.112230033151658 (3.26%) in 2.9.0/microbenchmark.json
REGRESSION 3.24%: single_client_tasks_and_get_batch (THROUGHPUT) regresses from 8.7124898510668 to 8.429852592930626 (3.24%) in 2.9.0/microbenchmark.json
REGRESSION 3.09%: client__1_1_actor_calls_concurrent (THROUGHPUT) regresses from 1038.8711159440322 to 1006.7547148607874 (3.09%) in 2.9.0/microbenchmark.json
REGRESSION 2.32%: single_client_tasks_async (THROUGHPUT) regresses from 8643.833466025399 to 8443.260998630982 (2.32%) in 2.9.0/microbenchmark.json
REGRESSION 2.29%: single_client_put_calls_Plasma_Store (THROUGHPUT) regresses from 5697.447666436941 to 5567.259268000422 (2.29%) in 2.9.0/microbenchmark.json
REGRESSION 2.27%: client__1_1_actor_calls_async (THROUGHPUT) regresses from 1036.0321459583472 to 1012.4837493368098 (2.27%) in 2.9.0/microbenchmark.json
REGRESSION 0.89%: client__1_1_actor_calls_sync (THROUGHPUT) regresses from 535.3383020010909 to 530.5597986550025 (0.89%) in 2.9.0/microbenchmark.json
REGRESSION 65.87%: stage_0_time (LATENCY) regresses from 7.927043914794922 to 13.148497581481934 (65.87%) in 2.9.0/stress_tests/stress_test_many_tasks.json
REGRESSION 44.84%: dashboard_p99_latency_ms (LATENCY) regresses from 3088.301 to 4473.111 (44.84%) in 2.9.0/benchmarks/many_actors.json
REGRESSION 15.81%: avg_pg_remove_time_ms (LATENCY) regresses from 0.7885501576572757 to 0.913254288288353 (15.81%) in 2.9.0/stress_tests/stress_test_placement_group.json
REGRESSION 15.50%: time_to_broadcast_1073741824_bytes_to_50_nodes (LATENCY) regresses from 82.940892212 to 95.796644017 (15.50%) in 2.9.0/scalability/object_store.json
REGRESSION 14.83%: dashboard_p95_latency_ms (LATENCY) regresses from 2237.99 to 2569.856 (14.83%) in 2.9.0/benchmarks/many_actors.json
REGRESSION 10.19%: stage_3_time (LATENCY) regresses from 2943.001654624939 to 3242.995056629181 (10.19%) in 2.9.0/stress_tests/stress_test_many_tasks.json
REGRESSION 9.51%: stage_3_creation_time (LATENCY) regresses from 2.260662794113159 to 2.475653648376465 (9.51%) in 2.9.0/stress_tests/stress_test_many_tasks.json
REGRESSION 3.30%: 3000_returns_time (LATENCY) regresses from 5.899374322999989 to 6.094248331000003 (3.30%) in 2.9.0/scalability/single_node.json
REGRESSION 2.08%: avg_pg_create_time_ms (LATENCY) regresses from 0.8868904699705661 to 0.9053212447438167 (2.08%) in 2.9.0/stress_tests/stress_test_placement_group.json
REGRESSION 0.86%: 10000_args_time (LATENCY) regresses from 17.66019733799999 to 17.811292093000006 (0.86%) in 2.9.0/scalability/single_node.json
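
For reviewers' reference, below is a minimal, hypothetical sketch of what a comparison like compare_perf_metrics does. It is not the actual script from the Ray repo; the directory layout and JSON schema (a top-level perf_metrics list of perf_metric_name / perf_metric_value / perf_metric_type entries) are assumptions based on the output above and the perf_metrics line quoted later in this thread.

```python
# Hypothetical reimplementation of a perf-metrics comparison; the real
# compare_perf_metrics script in the Ray repo may differ. Assumes each
# release-log JSON has a top-level "perf_metrics" list of dicts with
# perf_metric_name, perf_metric_value, and perf_metric_type keys.
import json
from pathlib import Path

def load_metrics(root: str) -> dict:
    """Map (relative json path, metric name) -> (metric type, value)."""
    metrics = {}
    for path in Path(root).rglob("*.json"):
        data = json.loads(path.read_text())
        for m in data.get("perf_metrics", []):
            key = (str(path.relative_to(root)), m["perf_metric_name"])
            metrics[key] = (m["perf_metric_type"], m["perf_metric_value"])
    return metrics

def compare(old_root: str, new_root: str) -> None:
    old, new = load_metrics(old_root), load_metrics(new_root)
    for key, (mtype, new_val) in new.items():
        if key not in old:
            continue  # metric added in the new release; nothing to compare
        _, old_val = old[key]
        # THROUGHPUT regresses when it drops; LATENCY regresses when it rises.
        if mtype == "THROUGHPUT" and new_val < old_val:
            pct = (old_val - new_val) / old_val * 100
        elif mtype == "LATENCY" and new_val > old_val:
            pct = (new_val - old_val) / old_val * 100
        else:
            continue
        print(f"REGRESSION {pct:.2f}%: {key[1]} ({mtype}) regresses "
              f"from {old_val} to {new_val} ({pct:.2f}%) in {new_root}/{key[0]}")

if __name__ == "__main__":
    compare("2.8.0", "2.9.0")  # mirrors: python compare_perf_metrics 2.8.0 2.9.0
```

With 2.8.0/ and 2.9.0/ release-log directories side by side, this prints REGRESSION lines in the same shape as the output above; the real script presumably also handles thresholds and sorting, which are omitted here.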


Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@architkulkarni added the release-blocker (P0 issue that blocks the release) and P0 (issues that should be fixed in short order) labels on Dec 18, 2023
@rickyyx
Contributor

rickyyx commented Dec 18, 2023

multi_client_put_gigabytes is variance.

n_n_actor_calls_with_arg_async: also variance.

dashboard_p99_latency_ms: also variance.

time_to_broadcast_1073741824_bytes_to_50_nodes seems to be variance; rerunning to confirm.

@rickyyx
Contributor

rickyyx commented Dec 18, 2023

1_n_async_actor_calls_async is due to the gRPC upgrade on 10.31 -> no fix.

n_n_actor_calls_async: also gRPC.

1_n_actor_calls_async: same.

Same for:

  • 1_1_async_actor_calls_sync
  • placement_group_create/removal
  • 1_1_actor_calls_sync
  • stage_3_time

@rickyyx
Contributor

rickyyx commented Dec 18, 2023

single_client_tasks_sync is also due to the gRPC upgrade (the initial drop from 1.2k was fixed in #41695, but the gRPC regression hasn't been fixed yet).

multi_client_tasks_async is the same story (there are two drops: one fixed, the other due to gRPC).

@architkulkarni
Contributor Author

FYI, we are still awaiting one last cherry-pick PR: #41990

@raulchen @rickyyx @jjyao do you think we should rerun these performance metrics after that PR is picked?

@rickyyx
Contributor

rickyyx commented Dec 18, 2023

> @raulchen @rickyyx @jjyao do you think we should rerun these performance metrics after that PR is picked?

Shouldn't impact core metrics, I think.

@architkulkarni
Contributor Author

@rickyyx Gotcha, thanks!

Thanks for the details about the regressions! Is the conclusion that there's no release-blocking regression? If so, you can approve this PR (we need two independent approvals to proceed with the release).

@rickyyx
Contributor

rickyyx commented Dec 18, 2023

> Is the conclusion that there's no release-blocking regression? If so, you can approve this PR (we need two independent approvals to proceed with the release).

@jjyao and I looked through them together, and there are two metrics we want to rerun to verify whether they're merely variance in the release branch. Will update once that's cleared up.

@architkulkarni
Contributor Author

Sounds good, thanks for your diligence

@jjyao
Collaborator

jjyao commented Dec 18, 2023

Ran time_to_broadcast_1073741824_bytes_to_50_nodes again (https://buildkite.com/ray-project/release/builds/4509):


broadcast_time = 64.09479726399996
object_size = 1073741824
num_nodes = 50
success = 1
perf_metrics = [{'perf_metric_name': 'time_to_broadcast_1073741824_bytes_to_50_nodes', 'perf_metric_value': 64.09479726399996, 'perf_metric_type': 'LATENCY'}]

So it's noise.
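
(A hedged way to read this rerun check, using the numbers from this thread: for a LATENCY metric, if a fresh run lands at or below the previous release's value, the originally flagged number was run-to-run variance. A sketch, not project tooling:)

```python
# Sanity check for a flagged LATENCY regression, with values from this thread.
baseline_2_8_0 = 82.940892212    # time_to_broadcast_... in the 2.8.0 logs
flagged_2_9_0 = 95.796644017     # value that tripped compare_perf_metrics
rerun_2_9_0 = 64.09479726399996  # Buildkite build 4509 rerun above

if rerun_2_9_0 <= baseline_2_8_0:
    print(f"rerun beats the 2.8.0 baseline; flagged {flagged_2_9_0} was noise")
else:
    pct = (rerun_2_9_0 - baseline_2_8_0) / baseline_2_8_0 * 100
    print(f"rerun still {pct:.2f}% slower; may be a real regression")
```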

@raulchen
Contributor

> FYI, we are still awaiting one last cherry-pick PR: #41990

I submitted another PR, #42000, to fix the issue instead. That PR only touches data code, so it shouldn't impact the core metrics you listed.

@jjyao
Collaborator

jjyao commented Dec 18, 2023

Ran actors_per_second again (https://buildkite.com/ray-project/release/builds/4506#018c7e60-0aed-48e3-9eed-825cbbc5566e):

actors_per_second = 614.2315272090922

Still slower than master. There might be a real regression in the release branch.

@jjyao
Collaborator

jjyao commented Dec 18, 2023

Another run of actors_per_second: https://buildkite.com/ray-project/release/builds/4527#018c7ef7-7ef3-4d50-85c2-cd63ba1b071e

actors_per_second = 652.0240412474651

So it's noise.

@jjyao merged commit 04f024a into ray-project:master on Dec 19, 2023 (9 of 10 checks passed)