-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #7868 #7883
Fix #7868 #7883
Conversation
Also turns out this doesn't actually fix the issue in all cases. sigh |
Actually does anybody know what the hardware does here? |
Ok, this version should be correct. It was pointed out to me that there is no way to build a compliant fneq out of arithmetic operations and logical ones should be used instead. @JeffBezanson please have a look and merge. |
I am very worried that throwing in a bit cast and integer operation for such a simple floating-point op might mess with the vectorizer or otherwise hurt performance. IEEE says the sign bit of a NaN is only specified for copy, negate, abs, and copySign. After the change to the constant folder for |
Because I don't actually think that's what the hardware does. |
Do you mean the hardware |
That's my impression, yes. I couldn't actually find a reference on what the hardware does, but from my initial tests last night it does not seem to match IEEE negate. |
Also, note that I'm referring to the hardware subtraction instruction here. With the proper optimizations, LLVM will turn it into the vxorps, which does behave like IEEE negate. |
That's not surprising; the hardware is correct there. Subtraction is of course a different operation. Does this code, with integer bitwise ops, also get optimized to vxorps? |
No, it does not unfortunately. I was planning to address that in LLVM though. |
The high-level argument why LLVM's behavior is incorrect here is that this is supposed to be a pure optimization and have no user-visible effect. Producing different bits is highly user visible. It doesn't really matter that the values involved are NaNs. By the IEEE standard, they can for many of these operations return any NaN they want – but it should be the same NaN regardless of whether constant folding is done or not. The fact that the hardware instruction flips the sign bit in the optimized version indicates to me that LLVM really ought to be flipping the sign in both versions. |
Well, technically the sign bit is undefined by IEEE if either input or output are NaNs so LLVM thinks it's fine. We all know how much LLVM likes to exploit undefined behavior. |
I think it's fine for an optimization to change a result that's undefined. Without that flexibility many low-level optimizations would be way too difficult or blocked entirely. |
Yes, I agree, It's just too bad that there's really no way implement negate other than bitcasting. |
I tried a C test program, and clang emits
for negation. Maybe that |
No, I doubt it. I'm pretty sure with optimizations disabled, LLVM will always emit the hardware subtraction. |
Looks like I was wrong. The |
This looks like it works reliably now. |
Awesome. Thank goodness we can take advantage of hacks they need to make clang work :) Does the -0.0 trick really not work on LLVM 3.3? Was clang just buggy then? |
@JeffBezanson The constant folder definitely doesn't. This was only fixed on Jun 8: llvm-mirror/llvm@bef256f. |
Ok, then this will have to do. |
It was pointed out to me that the reason fsub -0.0, x works, but fsub 0.0, x doesn't is that the latter does not negate if x==+0.0 |
Somewhere in the LLVM 3.4 timeframe somebody decided that the constant folder should always return positive NaNs. Anyway, a couple of months somebody finally realized that you can't implement IEEE negate if your Constant Folder doesn't produce negative NaNs, so the constant folder for fsub was made to preserve sign bits. This is the resulting mess 😢 Also IEEE floating point experts, is there anything in there (other than the negate argument), that would let me argue that -1.0*NaN should be -NaN rather than +NaN?