
Make inv(::Complex) return zero when input has Inf magnitude #32050

Merged 1 commit on Jun 5, 2019

Conversation

giordano
Contributor

@giordano giordano commented May 16, 2019

Fix #23134.
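For reference, the core of the change can be sketched as an early return when either component is infinite. This is a sketch based on the `_inv_new` reproduction later in the thread, not the exact diff; `inv_sketch` is a hypothetical name:

```julia
# Sketch of the proposed behaviour (not the exact PR diff): if either
# component is infinite, the reciprocal has zero magnitude, so return a
# signed zero instead of letting conj(z)/abs2(z) produce NaNs.
function inv_sketch(z::Complex)
    c, d = reim(z)
    (isinf(c) | isinf(d)) && return complex(copysign(zero(c), c), flipsign(-zero(d), d))
    conj(z) / abs2(z)
end
```

With this, `inv_sketch(complex(Inf32))` returns `0.0f0 - 0.0f0im` instead of `NaN32 - 0.0f0im`, and the signs of the zeros follow the signs of the input components.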

@giordano giordano changed the title Make inv(::ComplexF64) return zero when input has Inf magnitude Make inv(::Complex) return zero when input has Inf magnitude May 16, 2019
@stevengj
Member

What is the performance impact of this?

@giordano
Contributor Author

giordano commented May 17, 2019

Before the PR:

julia> v64 = randn(ComplexF64, 100000);

julia> @btime inv.($v64);
  1.393 ms (2 allocations: 1.53 MiB)

julia> v32 = randn(ComplexF32, 100000);

julia> @btime inv.($v32);
  426.785 μs (2 allocations: 781.33 KiB)

julia> vbig = big.(randn(ComplexF64, 1000));

julia> @btime inv.($vbig);
  1.057 ms (13001 allocations: 695.44 KiB)

After the PR:

julia> v64 = randn(ComplexF64, 100000);

julia> @btime inv.($v64);
  1.522 ms (2 allocations: 1.53 MiB)

julia> v32 = randn(ComplexF32, 100000);

julia> @btime inv.($v32);
  1.601 ms (2 allocations: 781.33 KiB)

julia> vbig = big.(randn(ComplexF64, 1000));

julia> @btime inv.($vbig);
  1.064 ms (13001 allocations: 695.44 KiB)

So ComplexF64 has a ~10% penalty, Complex{BigFloat} is basically unchanged, and ComplexF32 is much worse. Note that for ComplexF32 I also changed the operation performed: before it was conj(widen(z))/abs2(widen(z)); in the PR it is inv(widen(z)).
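The two ComplexF32 strategies being compared above can be sketched as follows (hypothetical helper names; the real methods live in base/complex.jl):

```julia
# Both widen to ComplexF64 to avoid intermediate overflow in abs2, then
# round back to ComplexF32. The new version funnels through inv, which
# after this PR carries the Inf check.
inv32_old(z::Complex{Float32}) = oftype(z, conj(widen(z)) / abs2(widen(z)))
inv32_new(z::Complex{Float32}) = oftype(z, inv(widen(z)))
```

For finite inputs the two agree; they differ on inputs with an infinite component, which is the point of the PR.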

Note also that as it is now, ComplexF32 and Complex{BigFloat} give inconsistent results compared to ComplexF64:

julia> inv(complex(Inf32))
NaN32 - 0.0f0im

julia> inv(complex(big(Inf)))
NaN - 0.0im

julia> inv(complex(Inf))
0.0 - 0.0im

The PR makes everything consistent, as checked in the tests.

@stevengj
Member

stevengj commented May 17, 2019

With a big array, you are measuring cache and allocation time as well as computation. Can you just time it for a single number?

(Resolved review comment on base/complex.jl)
@stevengj
Member

I tried

julia> a = rand(ComplexF64, 1000);

julia> f(x) = sum(inv, x)
f (generic function with 1 method)

julia> g(x) = sum(newinv, x)
g (generic function with 1 method)

julia> @btime f($a);
  14.100 μs (0 allocations: 0 bytes)

julia> @btime g($a);
  2.306 μs (0 allocations: 0 bytes)

so the new version is 7x faster? Seems weird.

@giordano
Contributor Author

I don't know if that's the cause, but conj(z)/abs2(z) calls real(z) three times and imag(z) another three times, while the new version calls reim only once.
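Schematically, the two forms expand like this (hypothetical names, not the Base source; the compiler may well fold the repeated accesses either way):

```julia
# conj(z)/abs2(z) expands into several real/imag accesses...
inv_a(z) = complex(real(z), -imag(z)) / (real(z)^2 + imag(z)^2)

# ...while the reim form reads each component once.
function inv_b(z)
    c, d = reim(z)
    complex(c, -d) / (c * c + d * d)
end
```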

@stevengj
Member

Maybe it's a change in whether it inlines?

@stevengj
Member

If I do

function newinv2(z::Complex)
    c, d = reim(z)
    complex(c, -d)/(c * c + d * d)
end

then g2(x) = sum(newinv2, x) gives

julia> @btime g2($a);
  1.125 μs (0 allocations: 0 bytes)

which is 2x faster than the version with the isinf checks.

@stevengj
Member

stevengj commented May 17, 2019

Oh, I just realized that I'm comparing the wrong method — I need the inv(w::ComplexF64) version. With that, I get

julia> @btime g($a);
  14.973 μs (0 allocations: 0 bytes)

or about 6% slowdown.

@chethega
Contributor

For Complex{Float32}, my machine measures a slowdown from 2.3 to 8.5 cycles in batched/SIMD processing, and from 6 to 11 cycles for sequential processing. With Complex{Float64}, I measure 7 → 8 and 7 → 10.5 cycles (on a 2 GHz Broadwell). That's pretty expensive for a very rare corner case.

julia> using BenchmarkTools
julia> function _inv_new(z::Complex)
       c, d = reim(z)
       (isinf(c) | isinf(d)) && return complex(copysign(zero(c), c), flipsign(-zero(d), d))
       complex(c, -d)/(c * c + d * d)
       end
julia> _inv_old(z)=conj(z)/abs2(z);
julia> r=rand(Complex{Float32}, 1000); b=copy(r);
julia> @btime broadcast!($_inv_old, $b, $r); @btime broadcast!($_inv_new, $b, $r); @btime _inv_old($r[1]); @btime _inv_new($r[1]);
  1.143 μs (0 allocations: 0 bytes)
  4.268 μs (0 allocations: 0 bytes)
  3.075 ns (0 allocations: 0 bytes)
  5.506 ns (0 allocations: 0 bytes)
julia> r=rand(Complex{Float64}, 1000); b=copy(r);

julia> @btime broadcast!($_inv_old, $b, $r); @btime broadcast!($_inv_new, $b, $r); @btime _inv_old($r[1]); @btime _inv_new($r[1]);
  3.516 μs (0 allocations: 0 bytes)
  3.992 μs (0 allocations: 0 bytes)
  3.516 ns (0 allocations: 0 bytes)
  5.262 ns (0 allocations: 0 bytes)

@giordano
Contributor Author

giordano commented May 18, 2019

I ran your benchmark on my machine and got very different results:

julia> r=rand(Complex{Float32}, 1000); b=copy(r);

julia> @btime broadcast!($_inv_old, $b, $r); @btime broadcast!($_inv_new, $b, $r); @btime _inv_old($r[1]); @btime _inv_new($r[1]);
  1.095 μs (0 allocations: 0 bytes)
  2.793 μs (0 allocations: 0 bytes)
  3.811 ns (0 allocations: 0 bytes)
  3.287 ns (0 allocations: 0 bytes)

julia> r=rand(Complex{Float64}, 1000); b=copy(r);

julia> @btime broadcast!($_inv_old, $b, $r); @btime broadcast!($_inv_new, $b, $r); @btime _inv_old($r[1]); @btime _inv_new($r[1]);
  3.810 μs (0 allocations: 0 bytes)
  3.811 μs (0 allocations: 0 bytes)
  3.812 ns (0 allocations: 0 bytes)
  3.815 ns (0 allocations: 0 bytes)

For Float64 there is basically no change; for Float32 the scalar benchmark seems to favour the new version.

@stevengj
Member

stevengj commented May 20, 2019

@chethega, you made the same mistake I did: the conj(z)/abs2(z) version is not the method that is called for ComplexF32 or ComplexF64. See the inv(w::ComplexF64) method, which has a bunch of extra checks to handle overflow/underflow already.
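To see why ComplexF64 needs those extra overflow checks at all, compare the naive formula against a widened reference on a large input (a sketch; `BigFloat` stands in here for whatever scaling Base's actual ComplexF64 method does):

```julia
# abs2(z) overflows to Inf for large finite z, so the naive formula
# collapses to zero; computing in BigFloat and rounding back shows the
# correct tiny result that a scaled ComplexF64 method must recover.
z = 1e300 + 1e300im
naive = conj(z) / abs2(z)                      # abs2 overflows: 0.0 - 0.0im
robust = ComplexF64(inv(Complex{BigFloat}(z))) # ≈ 5.0e-301 - 5.0e-301im
```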

@giordano
Contributor Author

giordano commented Jun 4, 2019

Any other comments on this?

@StefanKarpinski
Sponsor Member

@stevengj, please merge if this looks good to you.

Successfully merging this pull request may close these issues.

Division of finite values by infinite complex values should always return zero
4 participants