106523 – [11 Regression] forwprop miscompile

Bug 106523 - [11 Regression] forwprop miscompile

Summary: [11 Regression] forwprop miscompile

Status:	RESOLVED FIXED

Alias:	None

Product:	gcc
Classification:	Unclassified
Component:	tree-optimization (show other bugs)
Version:	13.0

Importance:	P2 normal
Target Milestone:	12.3
Assignee:	Jakub Jelinek

URL:
Keywords:	wrong-code

Depends on:
Blocks:

Reported:	2022-08-04 10:23 UTC by Krister Walfridsson
Modified:	2024-07-19 12:08 UTC (History)
CC List:	5 users (show)

See Also:	108440
Host:
Target:
Build:
Known to work:	12.3.0, 13.0, 4.8.5
Known to fail:	10.1.0, 11.5.0, 12.1.0, 4.9.0, 5.1.0, 7.5.0
Last reconfirmed:	2022-08-04 00:00:00

Attachments
gcc13-pr106523.patch (2.85 KB, patch) 2023-01-16 17:43 UTC, Jakub Jelinek	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Krister Walfridsson 2022-08-04 10:23:41 UTC

The function f7 from testsuite/c-c++-common/rotate-2.c is miscompiled by
forwprop. This can be seen by running the function as

__attribute__((noinline)) unsigned char
f7 (unsigned char x, unsigned int y)
{
  unsigned int t = x;
  return (t << y) | (t >> ((-y) & 7));
}

int
main (void)
{
  volatile unsigned char x = 152;
  volatile unsigned int y = 19;
  if (f7(x, y) != 4)
    __builtin_abort ();

  return 0;
}

This fails at -O1 and higher optimization levels.

What is happening here is that forwprop1 has optimized the function
to
  _10 = x_7(D) r<< y_9(D);
  return _10;

Comment 1 Richard Biener 2022-08-04 10:48:58 UTC

Confirmed.

Comment 2 Martin Liška 2022-08-09 12:40:01 UTC

Started with 4.9.0.

Comment 3 Jakub Jelinek 2023-01-16 12:27:48 UTC

Started most likely with my r0-123331-gcb3b8d33faa3a649b0282 (r0-123327 was ok, r0-123331 is bad and the only one that is about rotates).

Comment 4 Jakub Jelinek 2023-01-16 12:54:10 UTC

So, from the matched patterns described in the comment above simplify_rotate,
I'm afraid
   ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
   ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
aren't valid, unless we can prove that Y < B (say through ranger), because
if Y is >= B, then the left shift is valid but doesn't leave any bits from X in the return value.
If ranger can't prove Y < B, then the only supportable patterns for those cases
I'm afraid are what is accepted since PR62263 change, those patterns with Y & (B - 1)
even for the first operand.

Comment 5 Jakub Jelinek 2023-01-16 17:43:18 UTC

Created attachment 54282 [details]
gcc13-pr106523.patch

Untested fix.

Comment 6 GCC Commits 2023-01-17 11:19:05 UTC

The master branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:001121e8921d5d1a439ce0e64ab04c5959b0bfd8

commit r13-5223-g001121e8921d5d1a439ce0e64ab04c5959b0bfd8
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 17 12:14:25 2023 +0100

    forwprop: Fix up rotate pattern matching [PR106523]
    
    The comment above simplify_rotate roughly describes what patterns
    are matched into what:
       We are looking for X with unsigned type T with bitsize B, OP being
       +, | or ^, some type T2 wider than T.  For:
       (X << CNT1) OP (X >> CNT2)                           iff CNT1 + CNT2 == B
       ((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2))     iff CNT1 + CNT2 == B
    
       transform these into:
       X r<< CNT1
    
       Or for:
       (X << Y) OP (X >> (B - Y))
       (X << (int) Y) OP (X >> (int) (B - Y))
       ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
       ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
       (X << Y) | (X >> ((-Y) & (B - 1)))
       (X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
       ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    
       transform these into (last 2 only if ranger can prove Y < B):
       X r<< Y
    
       Or for:
       (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
       (X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
       ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) (Y & (B - 1)))) \
         | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    
       transform these into:
       X r<< (Y & (B - 1))
    
    The following testcase shows that 2 of these are problematic.
    If T2 is wider than T, then the 2 which yse (-Y) & (B - 1) on one
    of the shift counts but Y on the can do something different from
    rotate.  E.g.:
    __attribute__((noipa)) unsigned char
    f7 (unsigned char x, unsigned int y)
    {
      unsigned int t = x;
      return (t << y) | (t >> ((-y) & 7));
    }
    if y is [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
    then it is UB, but for y in [9, 31] the left shift in this case
    will never leave any bits in the result, while in a rotate they are
    left there.  Say for y 5 and x 0xaa the expression gives
    0x55 which is the same thing as rotate, while for y 19 and x 0xaa
    0x5, which is different.
    Now, I believe the
       ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
       ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
    forms are ok, because B - Y still needs to be a valid shift count,
    and if Y > B then B - Y should be either negative or very large
    positive (for unsigned types).
    And similarly the last 2 cases above which use & (B - 1) on both
    shift operands are definitely ok.
    
    The following patch disables the
       ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    unless ranger says Y is not in [B, B2 - 1] range.
    
    And, looking at it again this morning, actually the Y equal to B
    case is still fine, if Y is equal to 0, then it is
    (T) (((T2) X << 0) | ((T2) X >> 0))
    and so X, for Y == B it is
    (T) (((T2) X << B) | ((T2) X >> 0))
    which is the same as
    (T) (0 | ((T2) X >> 0))
    which is also X.  So instead of the [B, B2 - 1] range we could use
    [B + 1, B2 - 1].  And, if we wanted to go further, even multiplies
    of B are ok if they are smaller than B2, so we could construct a detailed
    int_range_max if we wanted.
    
    2023-01-17  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/106523
            * tree-ssa-forwprop.cc (simplify_rotate): For the
            patterns with (-Y) & (B - 1) in one operand's shift
            count and Y in another, if T2 has wider precision than T,
            punt if Y could have a value in [B, B2 - 1] range.
    
            * c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
            f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
            __builtin_unreachable about shift count.
            * c-c++-common/rotate-2b.c: New test.
            * c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
            f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
            __builtin_unreachable about shift count.
            * c-c++-common/rotate-4b.c: New test.
            * gcc.c-torture/execute/pr106523.c: New test.

Comment 7 Jakub Jelinek 2023-01-17 11:20:22 UTC

Fixed on the trunk so far.

Comment 8 Krister Walfridsson 2023-01-17 23:46:52 UTC

This fixed most of the rotate issues my translation validation tool found. I assume the remaining issues are due to a different (but similar) bug, so I opened Bug 108440 for those. 

But the issue in Bug 108440 seems similar to the "Y equal to B case" discussed in comment #6, so I believe the comment is slightly wrong (as the rotate instruction will invoke UB when Y is equal to B).

Comment 9 GCC Commits 2023-02-10 17:47:08 UTC

The releases/gcc-12 branch has been updated by Jakub Jelinek <jakub@gcc.gnu.org>:

https://gcc.gnu.org/g:a558a4d3d1b488783b96dff7141d12e02ded3ad3

commit r12-9157-ga558a4d3d1b488783b96dff7141d12e02ded3ad3
Author: Jakub Jelinek <jakub@redhat.com>
Date:   Tue Jan 17 12:14:25 2023 +0100

    forwprop: Fix up rotate pattern matching [PR106523]
    
    The comment above simplify_rotate roughly describes what patterns
    are matched into what:
       We are looking for X with unsigned type T with bitsize B, OP being
       +, | or ^, some type T2 wider than T.  For:
       (X << CNT1) OP (X >> CNT2)                           iff CNT1 + CNT2 == B
       ((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2))     iff CNT1 + CNT2 == B
    
       transform these into:
       X r<< CNT1
    
       Or for:
       (X << Y) OP (X >> (B - Y))
       (X << (int) Y) OP (X >> (int) (B - Y))
       ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
       ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
       (X << Y) | (X >> ((-Y) & (B - 1)))
       (X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
       ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    
       transform these into (last 2 only if ranger can prove Y < B):
       X r<< Y
    
       Or for:
       (X << (Y & (B - 1))) | (X >> ((-Y) & (B - 1)))
       (X << (int) (Y & (B - 1))) | (X >> (int) ((-Y) & (B - 1)))
       ((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) (Y & (B - 1)))) \
         | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    
       transform these into:
       X r<< (Y & (B - 1))
    
    The following testcase shows that 2 of these are problematic.
    If T2 is wider than T, then the 2 which yse (-Y) & (B - 1) on one
    of the shift counts but Y on the can do something different from
    rotate.  E.g.:
    __attribute__((noipa)) unsigned char
    f7 (unsigned char x, unsigned int y)
    {
      unsigned int t = x;
      return (t << y) | (t >> ((-y) & 7));
    }
    if y is [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
    then it is UB, but for y in [9, 31] the left shift in this case
    will never leave any bits in the result, while in a rotate they are
    left there.  Say for y 5 and x 0xaa the expression gives
    0x55 which is the same thing as rotate, while for y 19 and x 0xaa
    0x5, which is different.
    Now, I believe the
       ((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
       ((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
    forms are ok, because B - Y still needs to be a valid shift count,
    and if Y > B then B - Y should be either negative or very large
    positive (for unsigned types).
    And similarly the last 2 cases above which use & (B - 1) on both
    shift operands are definitely ok.
    
    The following patch disables the
       ((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
       ((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
    unless ranger says Y is not in [B, B2 - 1] range.
    
    And, looking at it again this morning, actually the Y equal to B
    case is still fine, if Y is equal to 0, then it is
    (T) (((T2) X << 0) | ((T2) X >> 0))
    and so X, for Y == B it is
    (T) (((T2) X << B) | ((T2) X >> 0))
    which is the same as
    (T) (0 | ((T2) X >> 0))
    which is also X.  So instead of the [B, B2 - 1] range we could use
    [B + 1, B2 - 1].  And, if we wanted to go further, even multiplies
    of B are ok if they are smaller than B2, so we could construct a detailed
    int_range_max if we wanted.
    
    2023-01-17  Jakub Jelinek  <jakub@redhat.com>
    
            PR tree-optimization/106523
            * tree-ssa-forwprop.cc (simplify_rotate): For the
            patterns with (-Y) & (B - 1) in one operand's shift
            count and Y in another, if T2 has wider precision than T,
            punt if Y could have a value in [B, B2 - 1] range.
    
            * c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
            f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
            __builtin_unreachable about shift count.
            * c-c++-common/rotate-2b.c: New test.
            * c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
            f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
            __builtin_unreachable about shift count.
            * c-c++-common/rotate-4b.c: New test.
            * gcc.c-torture/execute/pr106523.c: New test.
    
    (cherry picked from commit 001121e8921d5d1a439ce0e64ab04c5959b0bfd8)

Comment 10 Jakub Jelinek 2023-02-10 17:56:23 UTC

Fixed for gcc 12.3 too.

Comment 11 Richard Biener 2023-07-07 10:43:46 UTC

GCC 10 branch is being closed.

Comment 12 Richard Biener 2024-07-19 12:08:12 UTC

Fixed in GCC 12.3