faster Float32 and Float16 pow #40236
Conversation
(force-pushed from d1cd58b to 7aab2dd)
I think this is ready to go. I haven't done fully rigorous tests on it, but every test case I've used has worked, and conceptually this should be just over 0.5 ULP.
I've tested a wide variety of random numbers and haven't found more than 0.5 ULP of error.
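For reference, accuracy claims like the one above can be spot-checked against a `BigFloat` result. This helper is ours, not from the PR: it expresses the difference between a `Float32` result and a high-precision reference in units of the result's own spacing.

```julia
# Rough ULP-error check (our helper, not from the PR): compare a Float32
# result against a BigFloat reference, measured in units of eps(val).
ulp_error(val::Float32, ref::BigFloat) = abs(Float64(val) - Float64(ref)) / eps(val)

# A result exactly one float-spacing above the true value is 1 ULP off:
ulp_error(1.0f0 + eps(1.0f0), big"1.0")  # → 1.0
```

Sweeping random `(x, y)` pairs through a helper like this is how "haven't found more than 0.5 ULP" style claims are usually checked.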
Bumping this. Can someone look at it and merge?
```diff
@@ -867,29 +867,35 @@ end
     z
 end
 @inline function ^(x::Float32, y::Float32)
-    z = ccall("llvm.pow.f32", llvmcall, Float32, (Float32, Float32), x, y)
+    z = Float32(exp2_fast(log2(Float64(x))*y))
```
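The idea behind the new line can be sketched with Base's public `exp2` standing in for the internal `exp2_fast` (an assumption on our part): do the `log2` and the multiply in `Float64`, whose extra 29 mantissa bits absorb the intermediate rounding error, then round back to `Float32` once at the end.

```julia
# Sketch of the approach (Base exp2 in place of the internal exp2_fast):
# x^y = 2^(y * log2(x)), with the intermediate computed in Float64 so the
# single final rounding to Float32 dominates the error.
pow32_sketch(x::Float32, y::Float32) = Float32(exp2(log2(Float64(x)) * y))

pow32_sketch(2.0f0, 10.0f0)  # → 1024.0f0
```

Note that `log2` of a negative `Float64` throws a `DomainError`, which is the failure mode reported further down in this thread.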
Why don't we want to trust LLVM (aka libm) here anymore?
Julia 1.6 introduced about a 3x regression on this, due to a switch in which libm gets loaded.
Should we fix openlibm?
Probably. That said, I generally think that we shouldn't be relying on external libraries, so I'm not that sad about replacing this anyway.
@oscardssmith, does this cause a problem when used with `@fastmath`?

```julia
julia> versioninfo()
Julia Version 1.7.0-DEV.1006
Commit 248c02f531* (2021-04-24 17:37 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

julia> Float16(-1)^2
Float16(1.0)

julia> @fastmath Float16(-1)^2 # used in Colors.jl
ERROR: DomainError with -1.0:
log2 will only return a complex result if called with a complex argument. Try log2(Complex(x)).
```

The previous commit is OK:

```julia
julia> versioninfo()
Julia Version 1.7.0-DEV.998
Commit ac7974acef* (2021-04-23 20:59 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

julia> @fastmath Float16(-1)^2
Float16(1.0)
```
Another case (found in ColorVectorSpace.jl):

```julia
julia> (-1.0f0)^2.0f0
ERROR: DomainError with -1.0:
log2 will only return a complex result if called with a complex argument. Try log2(Complex(x)).
```

cf. PkgEval: https://github.com/JuliaCI/NanosoldierReports/blob/master/pkgeval/by_date/2021-04/25/report.md
That's unfortunate. I'll revert tomorrow unless I can think of something clever to fix it without too much performance impact.
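One hypothetical way around the `DomainError` (our sketch, not the fix that landed in the PR) is to peel the sign off before taking `log2`: `x^y` is real for negative `x` only when `y` is an integer, so the sign can be handled separately from the magnitude.

```julia
# Hypothetical sign-aware wrapper (not the PR's code): take log2 of |x|
# and reattach the sign when the exponent is an integer.
function pow32_signed(x::Float32, y::Float32)
    m = Float32(exp2(log2(abs(Float64(x))) * y))
    x >= 0 && return m
    isinteger(y) || throw(DomainError(x, "negative base needs an integer exponent"))
    return isodd(Int(y)) ? -m : m  # (-|x|)^y = ±|x|^y by parity of y
end

pow32_signed(-1.0f0, 2.0f0)  # → 1.0f0, rather than the DomainError above
```

The extra branch is roughly what such a fix costs in the hot path, which is the performance trade-off being weighed here.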
This reverts commit 1474566.
Approximately .5 ULP, relatively fast. Update float^integer as well
Returns `Float32` pow speed to roughly the speed of 1.5. Also speeds up `Float16` pow. These new methods are 0.5 ULP (from limited testing). There is further room to optimize these, but this fixes the regression. At some point, I hope to have a `Float64` version, but that will be much harder, as it requires a `log2` function that gives extra bits of precision and an `exp2` function that takes in a `Double Double`.
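A "double-double" represents a value as an unevaluated `hi + lo` pair of `Float64`s, giving roughly 106 mantissa bits — the extra precision a correctly rounded `Float64` pow would need from its `log2`. A minimal building block for that representation is Knuth's two-sum, which computes the pair exactly for a single addition (this is a standard textbook routine, not code from the PR):

```julia
# Two-sum: return (hi, lo) with hi + lo == a + b exactly, where hi is the
# rounded Float64 sum and lo is the rounding error that hi dropped.
function two_sum(a::Float64, b::Float64)
    hi = a + b
    v  = hi - a
    lo = (a - (hi - v)) + (b - v)  # recovered rounding error of a + b
    return hi, lo
end

two_sum(1.0, 1e-20)  # → (1.0, 1.0e-20): lo preserves the bits hi lost
```

A double-double `exp2` would take such a `(hi, lo)` pair as its argument so that none of the extra `log2` precision is thrown away before the final rounding.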