
Overflow checks don't get optimized out surprisingly #103132

Closed
YaLTeR opened this issue Oct 17, 2022 · 3 comments · Fixed by #109895
Assignees: nikic
Labels:
- A-LLVM: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
- C-bug: This is a bug.
- E-needs-test: Call for participation. An issue has been fixed and does not reproduce, but no test has been added.
- I-slow: Problems and improvements with respect to performance of generated code.
- T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.

Comments


YaLTeR commented Oct 17, 2022

I tried this code:

pub fn accumulate(arr: &[u8], weight: f32) {
    let weight = (weight * 256. * 256. * 256.) as u32;
    let weight = weight.min(256 * 256 * 256);

    // If you uncomment this, all examples will get optimized out fine on all versions.
    // let weight = weight.min(256 * 256 * 256 - 1);

    for x in arr {
        // This avoids overflow checks on 1.64.0 (but not on beta/nightly).
        // let result = f(*x, weight);

        // This inlined version has overflow checks.
        assert!(weight <= 256 * 256 * 256);
        let result = *x as u32 * weight;

        // This also has overflow checks for some reason.
        // let result = f2(*x as u32, weight);
    }
}

#[inline(always)]
fn f(x: u8, weight: u32) -> u32 {
    assert!(weight <= 256 * 256 * 256);
    x as u32 * weight
}

#[inline(always)]
fn f2(x: u32, weight: u32) -> u32 {
    assert!(weight <= 256 * 256 * 256);
    x * weight
}

https://godbolt.org/z/9c8qc88qo

I expected to see this happen: all three versions of the loop body have no overflow checks and are optimized out.

Instead, this happened: let result = f(*x, weight); has no overflow checks, while the other two versions do have them on 1.64.0 and below; on beta/nightly, all three versions have overflow checks. Also, uncommenting the alternative bound (weight.min(256 * 256 * 256 - 1)), which lowers the weight bound by one, removes the overflow checks from all three versions on both 1.64.0 and nightly. That is extra weird, because the asserted bound is satisfied with either value.
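For reference, the overflow really is impossible after the min: the largest operands are x = 255 and weight = 2^24, and their product still fits in a u32, so the checks are redundant in principle. A quick standalone arithmetic check (not from the issue; names are mine):

```rust
fn main() {
    // Bound established by `weight.min(256 * 256 * 256)` in the example above.
    let max_weight: u64 = 256 * 256 * 256; // 2^24 = 16_777_216
    let max_x: u64 = u8::MAX as u64;       // 255
    let max_product = max_x * max_weight;  // 4_278_190_080
    // u32::MAX is 4_294_967_295, so `*x as u32 * weight` can never overflow.
    assert!(max_product <= u32::MAX as u64);
    println!("{max_product}"); // prints 4278190080
}
```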

@YaLTeR YaLTeR added the C-bug Category: This is a bug. label Oct 17, 2022
@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Oct 17, 2022
@inquisitivecrystal inquisitivecrystal added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 17, 2022

nikic commented Nov 4, 2022

IR test case:

define void @test(ptr %arr.base, i64 %arr.size, i32 %arg) {
start:
  %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216)
  %end = getelementptr inbounds i8, ptr %arr.base, i64 %arr.size
  br label %loop

loop:
  %ptr = phi ptr [ %arr.base, %start ], [ %ptr.next, %loop.latch ]
  %cmp = icmp eq ptr %ptr, %end
  br i1 %cmp, label %exit, label %loop.latch

loop.latch:
  %ptr.next = getelementptr inbounds i8, ptr %ptr, i64 1
  %v = load i8, ptr %ptr, align 1
  %v.ext = zext i8 %v to i32
  %mulo = tail call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %v.ext, i32 %arg.min)
  %ov = extractvalue { i32, i1 } %mulo, 1
  br i1 %ov, label %panic, label %loop

exit:
  ret void

panic:
  call void @panic()
  unreachable
}

declare void @panic()
declare { i32, i1 } @llvm.umul.with.overflow.i32(i32, i32)
declare i32 @llvm.umin.i32(i32, i32)

I'd generally expect CVP to handle this, but:

LVI Getting block end value   %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216) at 'loop.latch'
PUSH:   %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216) in loop.latch
PUSH:   %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216) in loop
 compute BB 'loop' - overdefined because of pred (non local).
POP   %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216) in loop = overdefined
 compute BB 'loop.latch' - overdefined because of pred (non local).
POP   %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216) in loop.latch = overdefined
  Result = overdefined

Apparently, LVI fails to determine the range of a value that is used inside a loop. Sinking the umin into loop.latch does fold.

There are two possible fixes here: one is to support this pattern in LVI; the other is to add with.overflow support to SCCP.
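Since sinking the umin into loop.latch does fold at the IR level, a source-level workaround along the same lines may be worth trying: recompute the clamp inside the loop body so the umin is local to it. This is a hypothetical sketch (the accumulate_sunk name and return value are mine, and whether rustc/LLVM actually folds the checks this way should be verified on Godbolt):

```rust
// Hypothetical workaround sketch: the clamp is recomputed per iteration,
// mirroring the "sink the umin into the loop" transformation that does fold.
pub fn accumulate_sunk(arr: &[u8], weight: u32) -> u32 {
    let mut acc = 0u32;
    for x in arr {
        // `w <= 2^24` holds locally in the loop body, so `*x as u32 * w`
        // is at most 255 * 2^24 = 4_278_190_080, which fits in a u32.
        let w = weight.min(256 * 256 * 256);
        acc = acc.wrapping_add(*x as u32 * w);
    }
    acc
}
```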


nikic commented Nov 9, 2022

Upstream patch: https://reviews.llvm.org/D137713

@nikic nikic self-assigned this Nov 9, 2022
@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Apr 3, 2023

nikic commented Apr 3, 2023

Fixed by the LLVM 16 upgrade.

JohnTitor added a commit to JohnTitor/rust that referenced this issue Apr 11, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 12, 2023
@bors bors closed this as completed in 73f40d4 Apr 12, 2023