hypot performance questionable #36353

Closed
KlausC opened this issue Jun 18, 2020 · 19 comments · Fixed by #36365
Labels: domain:maths (Mathematical functions), performance (Must go faster)

Comments

@KlausC (Contributor) commented Jun 18, 2020

I have been making heavy use of hypot(a, b) with real arguments and discovered an interesting performance bottleneck.
I am aware of #31922 and #33224, which both focus on accuracy.
I want to present the following measurements:

julia> @benchmark hypot(1.0, 2.0)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     18.065 ns (0.00% GC)
  median time:      18.232 ns (0.00% GC)
  mean time:        18.726 ns (0.00% GC)
  maximum time:     60.926 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     997

julia> @benchmark MatrixAlgebra._hypot(1.0, 2.0)
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     1.830 ns (0.00% GC)
  median time:      1.843 ns (0.00% GC)
  mean time:        1.890 ns (0.00% GC)
  maximum time:     10.976 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1000

That is a runtime factor of 10 compared to this (admittedly naive) implementation:

function _hypot(x, y)
    s = sqrt(abs2(x) + abs2(y))
    # Fall back to the careful Base implementation only when the naive
    # result is suspect (Inf, NaN, or near the underflow threshold).
    _isclean(s) ? s : hypot(x, y)
end

function _isclean(a::T) where T<:AbstractFloat
    isfinite(a) && a >= sqrt(4 * floatmin(T))
end

Comparing the accuracy of both implementations gives:

julia> foo(10^6, Float64, hypot)
(4.005836504709864e-17, 4.7168514470489265e-17, 1.168161969438445e-16)

julia> foo(10^6, Float64, MatrixAlgebra._hypot)
(4.6543949325418543e-17, 5.704669978693873e-17, 2.108974170018499e-16)

julia> function foo(n, T, f)
           s3 = s2 = s1 = zero(T)
           for i = 1:n
               a, b = T.(randn(2))
               h = hypot(big(a), big(b))
               d = T(abs(h - f(a, b)) / h)
               s3 = max(s3, d)
               s2 = hypot(s2, d)
               s1 += d
           end
           s1 / n, s2 / sqrt(n), s3
       end
foo (generic function with 3 methods)

The return values of foo are the relative deviations from the accurate (BigFloat) results, aggregated in the 1-norm (mean), 2-norm (RMS), and Inf-norm (maximum).
I would like to discuss whether those accuracy deviations justify the performance burden.

@longemen3000 (Contributor)

I like this fast path; the only thing that comes to my mind is whether the ulp error is acceptable.
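
A minimal sketch of how that could be quantified in ulps against a BigFloat reference (the helper name ulp_error is hypothetical, not part of the thread's code):

# Error of f(a, b) in ulps of the correctly rounded Float64 result.
function ulp_error(f, a, b)
    ref = Float64(hypot(big(a), big(b)))   # correctly rounded reference
    abs(f(a, b) - ref) / eps(ref)
end

For example, ulp_error(_hypot, 1.0, 2.0) measures a single pair; sampling many random pairs gives a feel for the worst case.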

@kimikage (Contributor) commented Jun 19, 2020

I am concerned about the degradation of accuracy, too.

However, for Float32 I think the following specialization (which widens the intermediate value s to Float64) may be usable in most cases.

hypot_f32(a::Float32, b::Float32) = Float32(sqrt(Float64(a)^2 + Float64(b)^2))

(I have not yet identified the cases that would lead to inaccuracy.)
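
One rough way to search for such cases is to compare hypot_f32 against a BigFloat reference over randomly scaled inputs (a sketch only; the helper name search_inaccurate is hypothetical, and the exponent range is chosen to keep the Float32 reference finite):

# Brute-force search for Float32 inputs where hypot_f32 deviates from the
# correctly rounded result, measured in ulps.
function search_inaccurate(n)
    worst = 0.0f0
    for _ in 1:n
        a = rand(Float32) * exp2(Float32(rand(-126:126)))
        b = rand(Float32) * exp2(Float32(rand(-126:126)))
        ref = Float32(hypot(big(a), big(b)))
        worst = max(worst, abs(hypot_f32(a, b) - ref) / eps(ref))
    end
    worst   # maximum observed error in ulps
end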

@kimikage (Contributor)

BTW, the current implementation of hypot uses Base.Math.FMA_NATIVE. (cf. #33011)

@KlausC (Contributor, Author) commented Jun 19, 2020

Can anybody run the benchmarks on a machine with FMA_NATIVE == true? For example, an Intel Core i5 of the 4th generation or later.
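
For reference, a quick way to check the flag and compare both implementations (FMA_NATIVE is an internal constant referenced above, so its name and location may change between Julia versions; _hypot is the definition from the first post):

using BenchmarkTools

Base.Math.FMA_NATIVE          # whether this build detected fused multiply-add support

@benchmark hypot(1.0, 2.0)    # Base implementation
@benchmark _hypot(1.0, 2.0)   # naive fast path from the first post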

(A comment from @kimikage has been minimized.)

@KlausC (Contributor, Author) commented Jun 19, 2020

I would also be interested in the timings on that machine using _hypot from my post.

@kimikage (Contributor) commented Jun 19, 2020

Please wait a minute. 🤔

Does this eps call fail to constant-fold?

scale = eps(sqrt(floatmin(T))) #Rescaling constant

cf. #31922 (comment)
cc: @cfborges, @simonbyrne

@simonbyrne (Contributor)

It looks like it does...

julia> f(T) = eps(sqrt(floatmin(T)))
f (generic function with 1 method)

julia> @code_llvm f(Float64)

;  @ REPL[9]:1 within `f'
define double @julia_f_17254(%jl_value_t addrspace(10)*) {
top:
; ┌ @ float.jl:745 within `eps'
   %1 = call double @julia_ldexp_918(double 0x3CB0000000000000, i64 -511)
; └
  ret double %1
}

@simonbyrne (Contributor)

This seems to work correctly:

julia> g(T) = eps(T)*sqrt(floatmin(T))
g (generic function with 1 method)

julia> @code_llvm g(Float64)

;  @ REPL[21]:1 within `g'
define double @julia_g_17323(%jl_value_t addrspace(10)*) {
top:
  ret double 0x1CC0000000000000
}
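
A quick sanity check shows the rewritten expression is bit-for-bit equal to the original, so only the constant folding changes, not the value of the scaling constant:

julia> eps(sqrt(floatmin(Float64))) == eps(Float64) * sqrt(floatmin(Float64))
true

julia> eps(sqrt(floatmin(Float32))) == eps(Float32) * sqrt(floatmin(Float32))
true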

simonbyrne added a commit that referenced this issue Jun 19, 2020
The scaling constant in `hypot` wasn't being constant folded. Fixes #36353.
@KlausC (Contributor, Author) commented Jun 19, 2020

PR #36365 reduces the slowdown from 10-fold to 7-fold on my machine (without FMA_NATIVE). That may go down to 5-fold on FMA machines.
The question is: do we want to pay that price for one bit of accuracy in hypot?

@cfborges (Contributor)

I had noticed that evaluating those constants seemed to slow things down, but I did not know why. Glad someone has figured that out.
It's clear that code which avoids unnecessary floating-point exceptions and attempts to round correctly will be slower. My goal with that algorithm was simply to achieve correct rounding, since that is what the IEEE standard suggests (but does not require) for hypot. That said, I think that for most applications where speed is critical it is better to just use h = sqrt(x*x + y*y), since this can never be off by more than one ulp (as opposed to the Julia 1.1 version of hypot, which could be off by two). The only problem with this approach is that the user needs to be sure there are no extreme arguments, since those could lead to unnecessary overflow or underflow.
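
To illustrate the overflow risk with the naive formula (a small example, assuming Float64 inputs):

x = 1e300
sqrt(x*x + x*x)   # Inf: x*x overflows before the square root is taken
hypot(x, x)       # finite (about 1.414e300): internal rescaling avoids the overflow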

@kimikage (Contributor) commented Jun 19, 2020

micro optimization...

@inline function hypot(x::T, y::T) where T<:AbstractFloat
    absx, absy = abs(x), abs(y)
    
    # Return Inf if either or both inputs is Inf (Compliance with IEEE754)
    isinf(absx) && return absx
    isinf(absy) && return absy
    
    # Order the operands (ax >= ay)
    ax = ifelse(absx < absy, absy, absx)
    ay = ifelse(absx < absy, absx, absy)
    ...

Edit:
I applied the suggestion by @cfborges (#36353 (comment)).
The two branches seem to have gotten flipped while I was struggling with NaN handling. The intention is to use the hardware min/max instructions (without @fastmath), so my bug should have no effect on speed.
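
A minimal way to check that the ordering lowers to conditional selects rather than branches (a sketch; the helper order is hypothetical):

using InteractiveUtils

# `ifelse` evaluates both operands and compiles to a select, unlike the
# ternary `a < b ? b : a`, which may emit a branch.
order(a, b) = (ifelse(a < b, b, a), ifelse(a < b, a, b))  # (larger, smaller)

@code_native order(1.0, 2.0)   # expect compare + select/min/max instructions, no jumps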

@simonbyrne (Contributor)

@kimikage can you open a pull request?

@kimikage (Contributor)

Perhaps I could, but I don't think it is worth a separate PR right now. We still need to discuss #33011, the approach in the OP, and the Float32 specialization.

@cfborges (Contributor)

One minor comment: I think

# Order the operands
    ax = ifelse(absx < absy, absx, absy)
    ay = ifelse(absx < absy, absy, absx)

would actually need to be

# Order the operands
    ax = ifelse(absx < absy, absy, absx)
    ay = ifelse(absx < absy, absx, absy)

since the goal is to have ax >= ay.

simonbyrne added a commit that referenced this issue Aug 1, 2020
The scaling constant in `hypot` wasn't being constant folded. Fixes #36353.
simonbyrne reopened this Aug 1, 2020
@simonbyrne (Contributor)

I'll leave this open since my fix was only partial.

simeonschaub pushed a commit to simeonschaub/julia that referenced this issue Aug 11, 2020
The scaling constant in `hypot` wasn't being constant folded. Fixes JuliaLang#36353.
@Moelf (Sponsor Contributor) commented Sep 5, 2021

@kimikage I like the hypot_f32 approach; do you know if this is supported by others?

julia> a = sqrt(typemax(Float32) |> prevfloat)
1.8446743f19

julia> hypot(a,a)
2.6087633f19

julia> hypot_f32(a,a)
2.6087633f19
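
The range argument behind this: every Float32 converts to Float64 exactly, the squares are exact in Float64, and neither overflow nor underflow is possible, so the remaining Float64 rounding error is far below one Float32 ulp; only the rare double-rounding corner cases remain, which is what was left unverified above. A quick check of the range claim:

julia> Float64(floatmax(Float32))^2 < floatmax(Float64)   # no overflow possible
true

julia> Float64(nextfloat(0.0f0))^2 > 0.0                  # no underflow to zero either
true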

@oscardssmith (Member)

Yeah, I would approve that PR. Trying to do fancy stuff with Float32 is almost always worse than doing simple stuff with Float64.

ViralBShah added the domain:maths and performance labels Mar 12, 2022
@ViralBShah (Member)

hypot performance is greatly improved (perhaps because of #42122):

julia> @benchmark hypot(1.0, 2.0)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.430 ns … 10.624 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.437 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.442 ns ±  0.097 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

    ▁▄█▆▆▆▃                                                   
  ▂▅████████▇▇▇▆▁▇▆▅▃▂▂▂▂▁▂▁▁▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▃▃▃▃▃▃▃▂▂▂▂▂ ▃
  1.43 ns        Histogram: frequency by time        1.48 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
