Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize layout of function arguments in the Rust ABI - take 2 #97559

Closed
wants to merge 12 commits into from

Conversation

Urgau
Copy link
Member

@Urgau Urgau commented May 30, 2022

This PR is another attempt to solving the problem of floats being aggregated as integer, it's purpose is to supersede #94570 which was a big bulldozer and has cause some problem like #97540.

This PR is meant to address issues like #85265 (comment) where floats where being aggregated as integer causing poor codegen. It does this by only passing homogeneous aggregate of floats as array if <= ptr-size and by pointer on array if > ptr-size as an optimization.

It's also a reopening of #93564 with hopefully no regression at all.

Rust code + asm before/after

Rust code:

pub fn foo() -> Result<u64, core::ptr::NonNull<()>> {
    Ok(0)
}

ASM Before:

foo:
        mov     rax, rdi
        xorps   xmm0, xmm0
        movups  xmmword ptr [rdi], xmm0
        ret

ASM After:

foo:
	xor	eax, eax
	xor	edx, edx
	ret

cc @nbdd0121 @workingjubilee

Revert #94570
Fixes #97540

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label May 30, 2022
@rust-highfive
Copy link
Collaborator

r? @cjgillot

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 30, 2022
Comment on lines 3382 to 3385
} else if unlikely!(self.has_all_float(&arg.layout)) {
// We don't want to aggregate floats as an aggregates of Integer
// because this will hurt the generated assembly (#93490)
//
// As an optimization we want to pass homogeneous aggregate of floats
// greater than pointer size as indirect
if size > Pointer.size(self) {
arg.make_indirect();
}
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
} else if unlikely!(self.has_all_float(&arg.layout)) {
// We don't want to aggregate floats as an aggregates of Integer
// because this will hurt the generated assembly (#93490)
//
// As an optimization we want to pass homogeneous aggregate of floats
// greater than pointer size as indirect
if size > Pointer.size(self) {
arg.make_indirect();
}
} else {
} else if size > Pointer.size(self) && unlikely!(self.has_all_float(&arg.layout)) {
// We don't want to aggregate floats as an aggregates of Integer
// because this will hurt the generated assembly (#93490)
//
// As an optimization we want to pass homogeneous aggregate of floats
// greater than pointer size as indirect
arg.make_indirect();
} else {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the same logic as before. Before your suggestion if it was self.has_all_float(&arg.layout) but NOT size > Pointer.size(self) nothing would happen, with this suggestion it will take the else branch doing exactly the opposite of what we want to make.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any particular reason that the current behaviour is undesirable for small aggregates with float?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, even if it's desirable that small aggregates with float has a special treatment, it should probably be a separate PR that's backed by separate perf run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well on x86_64 this would mean that f64 -> i64, f32 -> i32 and [f32; 2] -> i64 which is not ideal and is the reason of this PR, ie let the possibility for the codegen to use floating point register for small aggregate of floats or to put them behind an indirection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

f64 and f32 are not Abi::Aggregate and will not hit this path. For [f32; 2], the current status quo is to convert it to integer.

Essentially this PR is doing two things:

  1. Pass not-all-float aggregates smaller than 2 register directly
  2. Stop casting all-float aggregates smaller than 1 register to integer

I am suggesting just do 1 in this PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried your suggestion locally and only one test failed. I will push it after the perf run so we can have a baseline to know if it's even necessary or not.

Copy link
Member Author

@Urgau Urgau May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@the8472
Copy link
Member

the8472 commented May 30, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 30, 2022
@bors
Copy link
Contributor

bors commented May 30, 2022

⌛ Trying commit 428cb36d1f205f6b99b2ae33849b1957b496d7eb with merge f220c170f6f0f60621e43d2cb761f7711ae57c43...

@bors
Copy link
Contributor

bors commented May 30, 2022

☀️ Try build successful - checks-actions
Build commit: f220c170f6f0f60621e43d2cb761f7711ae57c43 (f220c170f6f0f60621e43d2cb761f7711ae57c43)

@rust-timer
Copy link
Collaborator

Queued f220c170f6f0f60621e43d2cb761f7711ae57c43 with parent 4a8d2e3, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (f220c170f6f0f60621e43d2cb761f7711ae57c43): comparison url.

Instruction count

  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
0.6% 3.3% 132
Regressions 😿
(secondary)
1.7% 7.3% 76
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) 0.6% 3.3% 132

Max RSS (memory usage)

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
3.1% 3.9% 3
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-3.1% -3.1% 1
All 😿🎉 (primary) N/A N/A 0

Cycles

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
3.4% 4.4% 4
Regressions 😿
(secondary)
4.4% 7.9% 6
Improvements 🎉
(primary)
-2.5% -2.8% 2
Improvements 🎉
(secondary)
-3.6% -6.4% 15
All 😿🎉 (primary) 1.5% 4.4% 6

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels May 30, 2022
@Urgau Urgau force-pushed the fn-args-abi-optimization branch from 1d103bc to 4a6387a Compare May 31, 2022 08:47
@the8472
Copy link
Member

the8472 commented May 31, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 31, 2022
@bors
Copy link
Contributor

bors commented May 31, 2022

⌛ Trying commit 4a6387ad206cdbfda544eb8955d521547bbf8ea6 with merge 5f9cf23a74c3b0e90cc66c155ed37b74a7906093...

@bors
Copy link
Contributor

bors commented May 31, 2022

☀️ Try build successful - checks-actions
Build commit: 5f9cf23a74c3b0e90cc66c155ed37b74a7906093 (5f9cf23a74c3b0e90cc66c155ed37b74a7906093)

@rust-timer
Copy link
Collaborator

Queued 5f9cf23a74c3b0e90cc66c155ed37b74a7906093 with parent dcbd5f5, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (5f9cf23a74c3b0e90cc66c155ed37b74a7906093): comparison url.

Instruction count

  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
0.6% 3.1% 126
Regressions 😿
(secondary)
1.6% 7.3% 77
Improvements 🎉
(primary)
-0.9% -1.1% 4
Improvements 🎉
(secondary)
-1.1% -1.1% 3
All 😿🎉 (primary) 0.6% 3.1% 130

Max RSS (memory usage)

Results
  • Primary benchmarks: 😿 relevant regression found
  • Secondary benchmarks: 🎉 relevant improvement found
mean1 max count2
Regressions 😿
(primary)
1.1% 1.1% 1
Regressions 😿
(secondary)
N/A N/A 0
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-2.2% -2.2% 1
All 😿🎉 (primary) 1.1% 1.1% 1

Cycles

Results
  • Primary benchmarks: mixed results
  • Secondary benchmarks: mixed results
mean1 max count2
Regressions 😿
(primary)
2.8% 4.1% 4
Regressions 😿
(secondary)
2.7% 3.7% 3
Improvements 🎉
(primary)
-2.5% -2.5% 1
Improvements 🎉
(secondary)
-4.6% -8.1% 13
All 😿🎉 (primary) 1.7% 4.1% 5

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 31, 2022
@apiraino
Copy link
Contributor

apiraino commented Jun 30, 2022

Maybe ready for another round of review after the latest perf. run? Please switch back to the author if more work is needed, thanks!

@rustbot ready

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r? @workingjubilee

This is not an improvement in compile times, and I do not think the improvement in codegen is so clearly significant, from the examples given, as to warrant merging in spite of that. It would be sufficient if we had a sample microbenchmark this wins on in terms of runtime before/after.

@rustbot author

@rustbot rustbot removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 1, 2022
@Urgau Urgau force-pushed the fn-args-abi-optimization branch from a0ede51 to 683e13f Compare July 5, 2022 10:01
@Urgau
Copy link
Member Author

Urgau commented Jul 5, 2022

I'm surprised that regex-processing would be affected by changes to float codegen at all.

Me too, and thanks to @bjorn3 #97559 (comment) we now know that it wasn't related to float at all but a changed that I done the the max_by_val_size, which explain the regressions in regex.
Could I get another perf run, it should now measure only the cost of not aggregating floats.

@the8472
Copy link
Member

the8472 commented Jul 5, 2022

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 5, 2022
@bors
Copy link
Contributor

bors commented Jul 5, 2022

⌛ Trying commit 683e13f with merge f683963771547e2f8634ff14af3fe3926c12ebd5...

@bors
Copy link
Contributor

bors commented Jul 5, 2022

☀️ Try build successful - checks-actions
Build commit: f683963771547e2f8634ff14af3fe3926c12ebd5 (f683963771547e2f8634ff14af3fe3926c12ebd5)

@rust-timer
Copy link
Collaborator

Queued f683963771547e2f8634ff14af3fe3926c12ebd5 with parent 4045ce6, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (f683963771547e2f8634ff14af3fe3926c12ebd5): comparison url.

Instruction count

  • Primary benchmarks: 😿 relevant regressions found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
0.7% 1.0% 9
Regressions 😿
(secondary)
3.1% 9.3% 15
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
-0.2% -0.2% 1
All 😿🎉 (primary) 0.7% 1.0% 9

Max RSS (memory usage)

Results
  • Primary benchmarks: 🎉 relevant improvement found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
4.4% 5.1% 2
Improvements 🎉
(primary)
-0.0% -0.0% 1
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) -0.0% -0.0% 1

Cycles

Results
  • Primary benchmarks: no relevant changes found
  • Secondary benchmarks: 😿 relevant regressions found
mean1 max count2
Regressions 😿
(primary)
N/A N/A 0
Regressions 😿
(secondary)
3.8% 4.9% 11
Improvements 🎉
(primary)
N/A N/A 0
Improvements 🎉
(secondary)
N/A N/A 0
All 😿🎉 (primary) N/A N/A 0

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Footnotes

  1. the arithmetic mean of the percent change 2 3

  2. number of relevant changes 2 3

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jul 5, 2022
@eddyb
Copy link
Member

eddyb commented Jul 7, 2022

My issue with the current casting behaviour is that it forces everything through integer registers, which is really suboptimal when float is involved. My hope was that by using LLVM aggregates, LLVM could generate more optimal code, but obviously that's not case. I suppose another way would be having some logic that does classificiation for the Rust ABI similar to how x86_64 ABI classifies arguments.

@nbdd0121 Right, what SysV x64 ABI does is more or less the only way to reliably do this.

The whole thing with LLVM aggregates and us calling this mode "casting" obscures the fact that if you want to choose which registers something gets passed in, there's no way around looking at its contents and making a decision, nothing else can make that decision for you (though we can provide helpers for common patterns across ABIs, like the "homogeneous aggregate" one).

It would be much clearer if we were decomposing a Rust argument into LLVM scalar/vector arguments, and optionally a stack passing argument (but since Clang doesn't already do this for its ABI handling, we risk ABI incompatibilities, so we'd need a lot more testing - see also #65111).

@Mark-Simulacrum Mark-Simulacrum removed the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 31, 2022
@JohnCSimon JohnCSimon added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 11, 2022
notriddle added a commit to notriddle/rust that referenced this pull request Sep 20, 2022
…ts, r=Amanieu

Test that target feature mix up with homogeneous floats is sound

This pull-request adds a test in `src/test/abi/` that test that target feature mix up with homogeneous floats is sound.

This is basically is ripoff of [src/test/ui/simd/target-feature-mixup.rs](https://github.com/rust-lang/rust/blob/47d1cdb0bcac8e417071ce1929d261efe2399ae2/src/test/ui/simd/target-feature-mixup.rs) but for floats and without `#[repr(simd)]`.

*Extracted from rust-lang#97559 since I don't yet know what to do with that PR.*
@Urgau
Copy link
Member Author

Urgau commented Sep 25, 2022

I don't currently have the time to continue this particular PR, especially if I want to follow eddyb comment.

Closing it.

@Urgau Urgau closed this Sep 25, 2022
Manishearth added a commit to Manishearth/rust that referenced this pull request Nov 9, 2022
…ts, r=Amanieu

Test that target feature mix up with homogeneous floats is sound

This pull-request adds a test in `src/test/abi/` that test that target feature mix up with homogeneous floats is sound.

This is basically is ripoff of [src/test/ui/simd/target-feature-mixup.rs](https://github.com/rust-lang/rust/blob/47d1cdb0bcac8e417071ce1929d261efe2399ae2/src/test/ui/simd/target-feature-mixup.rs) but for floats and without `#[repr(simd)]`.

*Extracted from rust-lang#97559 since I don't yet know what to do with that PR.*
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Result that uses niches resulting in a final size of 16 bytes emits poor LLVM IR due to ABI