
Overflow checks don't get optimized out surprisingly #103132

Closed
YaLTeR opened this issue Oct 17, 2022 · 3 comments · Fixed by #109895
Assignees: nikic
Labels:
- A-LLVM: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.
- C-bug: This is a bug.
- E-needs-test: Call for participation. An issue has been fixed and does not reproduce, but no test has been added.
- I-slow: Problems and improvements with respect to performance of generated code.
- T-compiler: Relevant to the compiler team, which will review and decide on the PR/issue.

Comments


YaLTeR commented Oct 17, 2022

I tried this code:

pub fn accumulate(arr: &[u8], weight: f32) {
    let weight = (weight * 256. * 256. * 256.) as u32;
    let weight = weight.min(256 * 256 * 256);

    // If you uncomment this, all examples will get optimized out fine on all versions.
    // let weight = weight.min(256 * 256 * 256 - 1);

    for x in arr {
        // This avoids overflow checks on 1.64.0 (but not on beta/nightly).
        // let result = f(*x, weight);

        // This inlined version has overflow checks.
        assert!(weight <= 256 * 256 * 256);
        let result = *x as u32 * weight;

        // This also has overflow checks for some reason.
        // let result = f2(*x as u32, weight);
    }
}

#[inline(always)]
fn f(x: u8, weight: u32) -> u32 {
    assert!(weight <= 256 * 256 * 256);
    x as u32 * weight
}

#[inline(always)]
fn f2(x: u32, weight: u32) -> u32 {
    assert!(weight <= 256 * 256 * 256);
    x * weight
}

https://godbolt.org/z/9c8qc88qo

I expected to see this happen: all three versions of the loop body have no overflow checks and are optimized out.

Instead, this happened: let result = f(*x, weight); has no overflow checks, while the other two versions do have them on 1.64.0 and below; on beta/nightly, all three versions have overflow checks. Also, uncommenting the alternative bound (weight.min(256 * 256 * 256 - 1)), which lowers the weight bound by one, removes the overflow checks from all three versions on both 1.64.0 and nightly. That is extra weird, because the asserted bound is satisfied with either value.
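For reference, the overflow really is impossible after the min: the largest operands are x = 255 and weight = 2^24, and their product still fits in a u32, so the checks are redundant in principle. A quick standalone arithmetic check (not from the issue; names are mine):

```rust
fn main() {
    // Bound established by `weight.min(256 * 256 * 256)` in the example above.
    let max_weight: u64 = 256 * 256 * 256; // 2^24 = 16_777_216
    let max_x: u64 = u8::MAX as u64;       // 255
    let max_product = max_x * max_weight;  // 4_278_190_080
    // u32::MAX is 4_294_967_295, so `*x as u32 * weight` can never overflow.
    assert!(max_product <= u32::MAX as u64);
    println!("{max_product}"); // prints 4278190080
}
```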

@YaLTeR YaLTeR added the C-bug Category: This is a bug. label Oct 17, 2022
@nikic nikic added the A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. label Oct 17, 2022
@inquisitivecrystal inquisitivecrystal added I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 17, 2022

nikic commented Nov 4, 2022

IR test case:

define void @test(ptr %arr.base, i64 %arr.size, i32 %arg) {
start:
  %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216)
  %end = getelementptr inbounds i8, ptr %arr.base, i64 %arr.size
  br label %loop

loop:
  %ptr = phi ptr [ %arr.base, %start ], [ %ptr.next, %loop.latch ]
  %cmp = icmp eq ptr %ptr, %end
  br i1 %cmp, label %exit, label %loop.latch

loop.latch:
  %ptr.next = getelementptr inbounds i8, ptr %ptr, i64 1
  %v = load i8, ptr %ptr, align 1
  %v.ext = zext i8 %v to i32
  %mulo = tail call { i32, i1 } @llvm.umul.with.overflow.i32(i32 %v.ext, i32 %arg.min)
  %ov = extractvalue { i32, i1 } %mulo, 1
  br i1 %ov, label %panic, label %loop

exit:
  ret void

panic:
  call void @panic()
  unreachable
}

declare void @panic()
declare { i32, i1 } @llvm.umul.with.overflow.i32(i32, i32)
declare i32 @llvm.umin.i32(i32, i32)

I'd generally expect CVP to handle this, but:

LVI Getting block end value   %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216) at 'loop.latch'
PUSH:   %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216) in loop.latch
PUSH:   %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216) in loop
 compute BB 'loop' - overdefined because of pred (non local).
POP   %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216) in loop = overdefined
 compute BB 'loop.latch' - overdefined because of pred (non local).
POP   %arg.min = tail call i32 @llvm.umin.i32(i32 %arg, i32 16777216) in loop.latch = overdefined
  Result = overdefined

Apparently, LVI fails to determine the range of a value that is used inside a loop. Sinking the umin into loop.latch does fold.

There are two possible fixes here: one is to support this pattern in LVI; the other is to add with.overflow support to SCCP.
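Since sinking the umin into loop.latch does fold at the IR level, a source-level workaround along the same lines may be worth trying: recompute the clamp inside the loop body so the umin is local to it. This is a hypothetical sketch (the accumulate_sunk name and return value are mine, and whether rustc/LLVM actually folds the checks this way should be verified on Godbolt):

```rust
// Hypothetical workaround sketch: the clamp is recomputed per iteration,
// mirroring the "sink the umin into the loop" transformation that does fold.
pub fn accumulate_sunk(arr: &[u8], weight: u32) -> u32 {
    let mut acc = 0u32;
    for x in arr {
        // `w <= 2^24` holds locally in the loop body, so `*x as u32 * w`
        // is at most 255 * 2^24 = 4_278_190_080, which fits in a u32.
        let w = weight.min(256 * 256 * 256);
        acc = acc.wrapping_add(*x as u32 * w);
    }
    acc
}
```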


nikic commented Nov 9, 2022

Upstream patch: https://reviews.llvm.org/D137713

@nikic nikic self-assigned this Nov 9, 2022
@nikic nikic added the E-needs-test Call for participation: An issue has been fixed and does not reproduce, but no test has been added. label Apr 3, 2023

nikic commented Apr 3, 2023

Fixed by the LLVM 16 upgrade.

JohnTitor added a commit to JohnTitor/rust that referenced this issue Apr 11, 2023
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 12, 2023
@bors bors closed this as completed in 73f40d4 Apr 12, 2023