Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add pure Julia ldexp function #19491

Merged
merged 4 commits into from
Dec 8, 2016
Merged

Add pure Julia ldexp function #19491

merged 4 commits into from
Dec 8, 2016

Conversation

musm
Copy link
Contributor

@musm musm commented Dec 3, 2016

Get rid of openlibm ldexp function for a pure Julia implementation, which is also faster.

benchmark (median) using BenchmarkTools (master) and julia 0.5 for Float64

regular branch is
~ 1.5x speedup
subnormal number and subnormal output
~ 3.6x speedup
normal number and subnormal output
~ 2.6x speedup
subnormal number and normal output
~ 13x speedup

Test values for all branches (T = Float64, Float32)

    @test Amal.ldexp(T(0.0), 0)                       === T(0.0)
    @test Amal.ldexp(T(-0.0), 0)                      === T(-0.0)
    @test Amal.ldexp(T(Inf), 1)                       === T(Inf)
    @test Amal.ldexp(T(-Inf), 1)                      === T(-Inf)
    @test Amal.ldexp(T(NaN), 10)                      === T(NaN)
    @test Amal.ldexp(T(0.8), 4)                       === T(12.8)
    @test Amal.ldexp(T(-0.854375), 5)                 === T(-27.34)
    @test Amal.ldexp(T(1.0), 0)                       === T(1.0)
    @test  Amal.ldexp( realmin(T)/10, typemin(Int64)) === T(0.0)
    @test  Amal.ldexp(-realmin(T)/10, typemin(Int64)) === T(-0.0)
    @test Amal.ldexp(T(1.0), typemax(Int64))          === T(Inf)
    @test Amal.ldexp(T(1.0), typemin(Int64))          === T(0.0)

	# test against reference 
    @test  Amal.ldexp(realmin(T), -1)      == Base.ldexp(realmin(T), -1)
    @test  Amal.ldexp(realmin(T), -2)      == Base.ldexp(realmin(T), -2)
    @test  Amal.ldexp(realmin(T)/2, 0)     == Base.ldexp(realmin(T)/2, 0)
    @test  Amal.ldexp(realmin(T)/3, 0)     == Base.ldexp(realmin(T)/3, 0)
    @test  Amal.ldexp(realmin(T)/3, -1)    == Base.ldexp(realmin(T)/3, -1)
    @test  Amal.ldexp(realmin(T)/3, 11)    == Base.ldexp(realmin(T)/3, 11)
    @test  Amal.ldexp(realmin(T)/11, -10)  == Base.ldexp(realmin(T)/11, -10)
    @test  Amal.ldexp(-realmin(T)/11, -10) == Base.ldexp(-realmin(T)/11, -10)

@musm musm force-pushed the ldexp branch 2 times, most recently from 22c56ef to d680b8f Compare December 3, 2016 06:38
ldexp(x::Float64,e::Integer) = ccall((:scalbn,libm), Float64, (Float64,Int32), x, Int32(e))
ldexp(x::Float32,e::Integer) = ccall((:scalbnf,libm), Float32, (Float32,Int32), x, Int32(e))
function ldexp{T<:AbstractFloat}(x::T, q::Integer)
n = Int(q)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might as well correctly handle values outside of the range of Int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I have this is if q::Int128 then things are type unstable with k + q . It's not a problem for Int32 or Int64

return flipsign(T(0.0), x)
end
end
k += n
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if this overflows (e.g. if n = typemax(Int)?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k will overflow and be negative and line 378 catches that case

Copy link
Contributor

@simonbyrne simonbyrne Dec 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could handle Int128 and BigInt by moving line 353 here, and

if q > 50000
    return flipsign(T(Inf), x)
elseif q < -50000
    return flipsign(T(0), x)
end
n = q % Int

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would add another branch, is it worth it, because you still have to check that k is not overflowing and underflowing in addition to this, which currently is cleverly handled. This adds, based on benchmarking the various branches, approximately a 1ns penalty.

Currently we do a lot worse and cast to Int32.

I don't necessarily have a problem with this, but question if this is the best way to do it. The quest for the most generic function is satisfying, but it typically entails performance penalties.

Internally for my codes that need this function, I don't use this since it's a bit too slow in practice but instead use the following:
https://github.com/JuliaMath/Amal.jl/blob/master/src/ldexp.jl
it has some important cases where it doesn't pass all cases, but that's not important since those cases are handeled externally.

Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, another option would be

if q > typemax(Int)
    return flipsign(T(Inf), x)
elseif q < typemin(Int)
    return flipsign(T(0), x)
end
n = q % Int

which should be able to optimise away the check if q::Int.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very nice!

@musm musm force-pushed the ldexp branch 3 times, most recently from 9b88338 to ce4a913 Compare December 3, 2016 22:48
ys = xs << m
xu = ys | (xu & sign_mask(T))
k = 1 - signed(m)
if n < -50000 # underflow
Copy link
Contributor Author

@musm musm Dec 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only needs to checked in the subnormal number case, otherwise k > 0 and it's not possible to underflow, thereby eliminating an unecessary branch.

@musm musm force-pushed the ldexp branch 3 times, most recently from 3782359 to 928c973 Compare December 4, 2016 19:19
inttype(::Type{Float64}) = Int64
inttype(::Type{Float32}) = Int32
inttype(::Type{Float16}) = Int16
exponent_max{T<:AbstractFloat}(::Type{T}) = Int(exponent_mask(T) >> significand_bits(T))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonbyrne what do you think about these?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might be able to do without inttype (see comment below), but if not perhaps a more descriptive name (maybe intofwidth or something like that?).

an exponent_max function seems reasonable, but as it's really the biased exponent, maybe exponent_raw_max?

Also, it would probably be more appropriate to have these in float.jl rather than here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we don't need inttype using your comment below.

Another case where I think this function would be useful is for something like this
intasfloat{T<:IEEEFloat}(::Type{T}, m::Integer) = reinterpret(T, (m % inttype(T)) << significand_bits(T))

but personally I think it's best to not include it, until there's actually another function that would benefit.

for exponent_max I've just inlined it. It seems rather special case to add it as a another function in float.jl. However, if you have objections I don't mind to put it in float.jl or similar

return flipsign(T(Inf), x)
end
if k > 0 # normal case
xu = (xu & ~exponent_mask(T)) | unsigned((k % inttype(T)) << significand_bits(T))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not k % typeof(xu)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep thanks

ys = xs << unsigned(m)
xu = ys | (xu & sign_mask(T))
k = 1 - m
if q < -50000 # underflow
Copy link
Contributor Author

@musm musm Dec 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This number is more than safe for up to Float128

@musm
Copy link
Contributor Author

musm commented Dec 5, 2016

Fixed some typos in the comment and cleaned things up.
@simonbyrne I believe I have addressed all your comments.

@simonbyrne
Copy link
Contributor

@nanosoldier runbenchmarks("scalar", vs = ":master")

@simonbyrne simonbyrne added the domain:maths Mathematical functions label Dec 7, 2016
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@musm
Copy link
Contributor Author

musm commented Dec 8, 2016

exponent (didn't touch that function) and frexp (same change made here #19512 and no difference on nanosoldier) are totally spurios
cc @jrevels

@simonbyrne
Copy link
Contributor

Would you mind adding some tests for Int128 or BigInt (including a case where they cause under/overflow)?

Get rid of openlibm ldexp function for a pure Julia implementation,
which is also faster.
@musm
Copy link
Contributor Author

musm commented Dec 8, 2016

hi @simonbyrne I added tests, thanks.

@musm
Copy link
Contributor Author

musm commented Dec 8, 2016

I made a very trivial change to using == instead of === for comparison of finite nonzero values. Can we merge this or do we need to wait on tests again?

@simonbyrne
Copy link
Contributor

I've already mistakenly merged 1 "trivial" change that broke things in the last 24 hours, so maybe best to wait.

@@ -72,22 +72,22 @@ end
@test y == 0

# more ldexp tests
@test ldexp(T(0.8), 4) === T(12.8)
@test ldexp(T(-0.854375), 5) === T(-27.34)
@test ldexp(T(0.8), 4) == T(12.8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why test less specifically? === verifies that the types are the same. Would it fail here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it doesn't fail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then === is better, as == could miss return-type regressions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rest of the file is inconsistent anyways

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok w/e I dropped the commit, we can now merge

@@ -349,8 +349,49 @@ julia> ldexp(5., 2)
20.0
```
"""
ldexp(x::Float64,e::Integer) = ccall((:scalbn,libm), Float64, (Float64,Int32), x, Int32(e))
ldexp(x::Float32,e::Integer) = ccall((:scalbnf,libm), Float32, (Float32,Int32), x, Int32(e))
function ldexp{T<:AbstractFloat}(x::T, e::Integer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this isn't really valid for every single AbstractFloat subtype, is it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not: realistically it's probably only valid for Float16,Float32 and Float64

Copy link
Contributor

@simonbyrne simonbyrne Feb 2, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it assumes:

  1. Is a bitstype.
  2. Is base-2
  3. Uses the same layout as the IEEE format (sign | biased exponent | significand )
  4. Uses gradual underflow for subnormals

So it could be valid for, e.g. an x87 Float80 or an IEEE Float128, but not BigFloat, IEEE decimal (as in DecFP.jl), DoubleDoubles, IBM floats or formats which do funny things like two's complement exponents.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:maths Mathematical functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants