Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #7868 #7883

Merged
merged 1 commit into from
Aug 7, 2014
Merged

Fix #7868 #7883

merged 1 commit into from
Aug 7, 2014

Conversation

Keno
Copy link
Member

@Keno Keno commented Aug 7, 2014

Somewhere in the LLVM 3.4 timeframe somebody decided that the constant folder should always return positive NaNs. Anyway, a couple of months somebody finally realized that you can't implement IEEE negate if your Constant Folder doesn't produce negative NaNs, so the constant folder for fsub was made to preserve sign bits. This is the resulting mess 😢 Also IEEE floating point experts, is there anything in there (other than the negate argument), that would let me argue that -1.0*NaN should be -NaN rather than +NaN?

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

Also turns out this doesn't actually fix the issue in all cases. sigh

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

Actually does anybody know what the hardware does here?

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

Ok, this version should be correct. It was pointed out to me that there is no way to build a compliant fneq out of arithmetic operations and logical ones should be used instead. @JeffBezanson please have a look and merge.

@JeffBezanson
Copy link
Sponsor Member

I am very worried that throwing in a bit cast and integer operation for such a simple floating-point op might mess with the vectorizer or otherwise hurt performance.

IEEE says the sign bit of a NaN is only specified for copy, negate, abs, and copySign.

After the change to the constant folder for fsub, why doesn't it work just to use fsub?

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

Because I don't actually think that's what the hardware does.

@JeffBezanson
Copy link
Sponsor Member

Do you mean the hardware 0 - x does not implement IEEE negate, and LLVM lacks a negate instruction, so accessing the hardware neg is impossible? If so that is a very serious design problem in LLVM.

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

That's my impression, yes. I couldn't actually find a reference on what the hardware does, but from my initial tests last night it does not seem to match IEEE negate.

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

Also, note that I'm referring to the hardware subtraction instruction here. With the proper optimizations, LLVM will turn it into the vxorps, which does behave like IEEE negate.

@JeffBezanson
Copy link
Sponsor Member

That's not surprising; the hardware is correct there. Subtraction is of course a different operation.

Does this code, with integer bitwise ops, also get optimized to vxorps?

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

No, it does not unfortunately. I was planning to address that in LLVM though.

@StefanKarpinski
Copy link
Sponsor Member

The high-level argument why LLVM's behavior is incorrect here is that this is supposed to be a pure optimization and have no user-visible effect. Producing different bits is highly user visible. It doesn't really matter that the values involved are NaNs. By the IEEE standard, they can for many of these operations return any NaN they want – but it should be the same NaN regardless of whether constant folding is done or not. The fact that the hardware instruction flips the sign bit in the optimized version indicates to me that LLVM really ought to be flipping the sign in both versions.

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

Well, technically the sign bit is undefined by IEEE if either input or output are NaNs so LLVM thinks it's fine. We all know how much LLVM likes to exploit undefined behavior.

@JeffBezanson
Copy link
Sponsor Member

I think it's fine for an optimization to change a result that's undefined. Without that flexibility many low-level optimizations would be way too difficult or blocked entirely.

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

Yes, I agree, It's just too bad that there's really no way implement negate other than bitcasting.

@JeffBezanson
Copy link
Sponsor Member

I tried a C test program, and clang emits

  %4 = fsub fast double -0.000000e+00, %3

for negation. Maybe that -0.0 does the trick within llvm somehow?

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

No, I doubt it. I'm pretty sure with optimizations disabled, LLVM will always emit the hardware subtraction.

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

Looks like I was wrong. The -0.0 does indeed do the trick on LLVM 3.5.

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

This looks like it works reliably now.

@JeffBezanson
Copy link
Sponsor Member

Awesome. Thank goodness we can take advantage of hacks they need to make clang work :)

Does the -0.0 trick really not work on LLVM 3.3? Was clang just buggy then?

@Keno
Copy link
Member Author

Keno commented Aug 7, 2014

@JeffBezanson The constant folder definitely doesn't. This was only fixed on Jun 8: llvm-mirror/llvm@bef256f.

@JeffBezanson
Copy link
Sponsor Member

Ok, then this will have to do.

JeffBezanson added a commit that referenced this pull request Aug 7, 2014
@JeffBezanson JeffBezanson merged commit 04034ca into master Aug 7, 2014
@Keno
Copy link
Member Author

Keno commented Aug 8, 2014

It was pointed out to me that the reason fsub -0.0, x works, but fsub 0.0, x doesn't is that the latter does not negate if x==+0.0

@Keno Keno deleted the kf/7868 branch August 10, 2014 18:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants