Bug 106523 - [11 Regression] forwprop miscompile
Summary: [11 Regression] forwprop miscompile
Status: RESOLVED FIXED
Alias: None
Product: gcc
Classification: Unclassified
Component: tree-optimization (show other bugs)
Version: 13.0
: P2 normal
Target Milestone: 12.3
Assignee: Jakub Jelinek
URL:
Keywords: wrong-code
Depends on:
Blocks:
 
Reported: 2022-08-04 10:23 UTC by Krister Walfridsson
Modified: 2024-07-19 12:08 UTC (History)
5 users (show)

See Also:
Host:
Target:
Build:
Known to work: 12.3.0, 13.0, 4.8.5
Known to fail: 10.1.0, 11.5.0, 12.1.0, 4.9.0, 5.1.0, 7.5.0
Last reconfirmed: 2022-08-04 00:00:00


Attachments
gcc13-pr106523.patch (2.85 KB, patch)
2023-01-16 17:43 UTC, Jakub Jelinek
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Krister Walfridsson 2022-08-04 10:23:41 UTC
The function f7 from testsuite/c-c++-common/rotate-2.c is miscompiled by
forwprop. This can be seen by running the function as

__attribute__((noinline)) unsigned char
f7 (unsigned char x, unsigned int y)
{
  unsigned int t = x;
  return (t << y) | (t >> ((-y) & 7));
}

int
main (void)
{
  volatile unsigned char x = 152;
  volatile unsigned int y = 19;
  if (f7(x, y) != 4)
    __builtin_abort ();

  return 0;
}

This fails at -O1 and higher optimization levels.

What is happening here is that forwprop1 has optimized the function
to
  _10 = x_7(D) r<< y_9(D);
  return _10;
Comment 1 Richard Biener 2022-08-04 10:48:58 UTC
Confirmed.
Comment 2 Martin Liška 2022-08-09 12:40:01 UTC
Started with 4.9.0.
Comment 3 Jakub Jelinek 2023-01-16 12:27:48 UTC
Started most likely with my r0-123331-gcb3b8d33faa3a649b0282 (r0-123327 was ok, r0-123331 is bad and the only one that is about rotates).
Comment 4 Jakub Jelinek 2023-01-16 12:54:10 UTC
So, from the matched patterns described in the comment above simplify_rotate,
I'm afraid
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
aren't valid, unless we can prove that Y < B (say through ranger), because
if Y is >= B, then the left shift is valid but doesn't leave any bits from X in the return value.
If ranger can't prove Y < B, then the only supportable patterns for those cases
I'm afraid are what is accepted since PR62263 change, those patterns with Y & (B - 1)
even for the first operand.
Comment 5 Jakub Jelinek 2023-01-16 17:43:18 UTC
Created attachment 54282 [details]
gcc13-pr106523.patch

Untested fix.
Comment 6 GCC Commits 2023-01-17 11:19:05 UTC
The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:001121e8921d5d1a439ce0e64ab04c5959b0bfd8

commit r13-5223-g001121e8921d5d1a439ce0e64ab04c5959b0bfd8
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 17 12:14:25 2023 +0100

    forwprop: Fix up rotate pattern matching [PR106523]
    
    The comment above simplify_rotate roughly describes what patterns
    are matched into what:
       We are looking for X with unsigned type T with bitsize B, OP being
       +, | or ^, some type T2 wider than T.  For:
       (X << CNT1) OP (X >> CNT2)                           iff CNT1 + CNT2 == B
       ((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2))     iff CNT1 + CNT2 == B
    
       transform these into:
       X r<< CNT1
    
       Or for:
       (X << Y) OP (X >> (B - Y))
       (X << (int) Y) OP (X >> (int) (B - Y))
       ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
       ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
       (X << Y) | (X >> ((-Y) & (B - 1)))
       (X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
       ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    
       transform these into (last 2 only if ranger can prove Y < B):
       X r<< Y
    
       Or for:
       (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
       (X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
       ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) (Y & (B - 1)))) \
         | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    
       transform these into:
       X r<< (Y & (B - 1))
    
    The following testcase shows that 2 of these are problematic.
    If T2 is wider than T, then the 2 which yse (-Y) & (B - 1) on one
    of the shift counts but Y on the can do something different from
    rotate.  E.g.:
    __attribute__((noipa)) unsigned char
    f7 (unsigned char x, unsigned int y)
    {
      unsigned int t = x;
      return (t << y) | (t >> ((-y) & 7));
    }
    if y is [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
    then it is UB, but for y in [9, 31] the left shift in this case
    will never leave any bits in the result, while in a rotate they are
    left there.  Say for y 5 and x 0xaa the expression gives
    0x55 which is the same thing as rotate, while for y 19 and x 0xaa
    0x5, which is different.
    Now, I believe the
       ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
       ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
    forms are ok, because B - Y still needs to be a valid shift count,
    and if Y > B then B - Y should be either negative or very large
    positive (for unsigned types).
    And similarly the last 2 cases above which use & (B - 1) on both
    shift operands are definitely ok.
    
    The following patch disables the
       ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    unless ranger says Y is not in [B, B2 - 1] range.
    
    And, looking at it again this morning, actually the Y equal to B
    case is still fine, if Y is equal to 0, then it is
    (T) (((T2) X << 0) | ((T2) X >> 0))
    and so X, for Y == B it is
    (T) (((T2) X << B) | ((T2) X >> 0))
    which is the same as
    (T) (0 | ((T2) X >> 0))
    which is also X.  So instead of the [B, B2 - 1] range we could use
    [B + 1, B2 - 1].  And, if we wanted to go further, even multiplies
    of B are ok if they are smaller than B2, so we could construct a detailed
    int_range_max if we wanted.
    
    2023-01-17  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/106523
            * tree-ssa-forwprop.cc (simplify_rotate): For the
            patterns with (-Y) & (B - 1) in one operand's shift
            count and Y in another, if T2 has wider precision than T,
            punt if Y could have a value in [B, B2 - 1] range.
    
            * c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
            f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
            __builtin_unreachable about shift count.
            * c-c++-common/rotate-2b.c: New test.
            * c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
            f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
            __builtin_unreachable about shift count.
            * c-c++-common/rotate-4b.c: New test.
            * gcc.c-torture/execute/pr106523.c: New test.
Comment 7 Jakub Jelinek 2023-01-17 11:20:22 UTC
Fixed on the trunk so far.
Comment 8 Krister Walfridsson 2023-01-17 23:46:52 UTC
This fixed most of the rotate issues my translation validation tool found. I assume the remaining issues are due to a different (but similar) bug, so I opened Bug 108440 for those. 

But the issue in Bug 108440 seems similar to the "Y equal to B case" discussed in comment #6, so I believe the comment is slightly wrong (as the rotate instruction will invoke UB when Y is equal to B).
Comment 9 GCC Commits 2023-02-10 17:47:08 UTC
The releases/gcc-12 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a558a4d3d1b488783b96dff7141d12e02ded3ad3

commit r12-9157-ga558a4d3d1b488783b96dff7141d12e02ded3ad3
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 17 12:14:25 2023 +0100

    forwprop: Fix up rotate pattern matching [PR106523]
    
    The comment above simplify_rotate roughly describes what patterns
    are matched into what:
       We are looking for X with unsigned type T with bitsize B, OP being
       +, | or ^, some type T2 wider than T.  For:
       (X << CNT1) OP (X >> CNT2)                           iff CNT1 + CNT2 == B
       ((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2))     iff CNT1 + CNT2 == B
    
       transform these into:
       X r<< CNT1
    
       Or for:
       (X << Y) OP (X >> (B - Y))
       (X << (int) Y) OP (X >> (int) (B - Y))
       ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
       ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
       (X << Y) | (X >> ((-Y) & (B - 1)))
       (X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
       ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    
       transform these into (last 2 only if ranger can prove Y < B):
       X r<< Y
    
       Or for:
       (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
       (X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
       ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) (Y & (B - 1)))) \
         | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    
       transform these into:
       X r<< (Y & (B - 1))
    
    The following testcase shows that 2 of these are problematic.
    If T2 is wider than T, then the 2 which yse (-Y) & (B - 1) on one
    of the shift counts but Y on the can do something different from
    rotate.  E.g.:
    __attribute__((noipa)) unsigned char
    f7 (unsigned char x, unsigned int y)
    {
      unsigned int t = x;
      return (t << y) | (t >> ((-y) & 7));
    }
    if y is [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
    then it is UB, but for y in [9, 31] the left shift in this case
    will never leave any bits in the result, while in a rotate they are
    left there.  Say for y 5 and x 0xaa the expression gives
    0x55 which is the same thing as rotate, while for y 19 and x 0xaa
    0x5, which is different.
    Now, I believe the
       ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
       ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
    forms are ok, because B - Y still needs to be a valid shift count,
    and if Y > B then B - Y should be either negative or very large
    positive (for unsigned types).
    And similarly the last 2 cases above which use & (B - 1) on both
    shift operands are definitely ok.
    
    The following patch disables the
       ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    unless ranger says Y is not in [B, B2 - 1] range.
    
    And, looking at it again this morning, actually the Y equal to B
    case is still fine, if Y is equal to 0, then it is
    (T) (((T2) X << 0) | ((T2) X >> 0))
    and so X, for Y == B it is
    (T) (((T2) X << B) | ((T2) X >> 0))
    which is the same as
    (T) (0 | ((T2) X >> 0))
    which is also X.  So instead of the [B, B2 - 1] range we could use
    [B + 1, B2 - 1].  And, if we wanted to go further, even multiplies
    of B are ok if they are smaller than B2, so we could construct a detailed
    int_range_max if we wanted.
    
    2023-01-17  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/106523
            * tree-ssa-forwprop.cc (simplify_rotate): For the
            patterns with (-Y) & (B - 1) in one operand's shift
            count and Y in another, if T2 has wider precision than T,
            punt if Y could have a value in [B, B2 - 1] range.
    
            * c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
            f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
            __builtin_unreachable about shift count.
            * c-c++-common/rotate-2b.c: New test.
            * c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
            f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
            __builtin_unreachable about shift count.
            * c-c++-common/rotate-4b.c: New test.
            * gcc.c-torture/execute/pr106523.c: New test.
    
    (cherry picked from commit 001121e8921d5d1a439ce0e64ab04c5959b0bfd8)
Comment 10 Jakub Jelinek 2023-02-10 17:56:23 UTC
Fixed for gcc 12.3 too.
Comment 11 Richard Biener 2023-07-07 10:43:46 UTC
GCC 10 branch is being closed.
Comment 12 Richard Biener 2024-07-19 12:08:12 UTC
Fixed in GCC 12.3