Make inv(::Complex) return zero when input has Inf magnitude #32050
Conversation
Force-pushed from e1d5268 to 0b9c9eb
What is the performance impact of this?
Before the PR:

```julia
julia> v64 = randn(ComplexF64, 100000);

julia> @btime inv.($v64);
  1.393 ms (2 allocations: 1.53 MiB)

julia> v32 = randn(ComplexF32, 100000);

julia> @btime inv.($v32);
  426.785 μs (2 allocations: 781.33 KiB)

julia> vbig = big.(randn(ComplexF64, 1000));

julia> @btime inv.($vbig);
  1.057 ms (13001 allocations: 695.44 KiB)
```

After the PR:

```julia
julia> v64 = randn(ComplexF64, 100000);

julia> @btime inv.($v64);
  1.522 ms (2 allocations: 1.53 MiB)

julia> v32 = randn(ComplexF32, 100000);

julia> @btime inv.($v32);
  1.601 ms (2 allocations: 781.33 KiB)

julia> vbig = big.(randn(ComplexF64, 1000));

julia> @btime inv.($vbig);
  1.064 ms (13001 allocations: 695.44 KiB)
```

So there is a slowdown, most visibly for `ComplexF32`. Note also that, as it is now:

```julia
julia> inv(complex(Inf32))
NaN32 - 0.0f0im

julia> inv(complex(big(Inf)))
NaN - 0.0im

julia> inv(complex(Inf))
0.0 - 0.0im
```

The PR makes everything consistent, as checked in the tests.
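A minimal sketch of the behavior under discussion (hypothetical name `safeinv`, not the PR's actual implementation): return a signed zero whenever either component is infinite, and otherwise fall back to the textbook formula. The naive formula produces `Inf/Inf = NaN` for infinite inputs, which is the inconsistency shown above.

```julia
# Hypothetical sketch, not the PR's code: an inv(z) that returns a signed
# zero when the input has Inf magnitude, instead of the NaN produced by
# the naive conj(z)/abs2(z) formula (which hits Inf/Inf).
function safeinv(z::Complex)
    c, d = reim(z)
    if isinf(c) || isinf(d)
        # conj(z)/|z|^2 with |z| = Inf collapses to a signed zero.
        return complex(copysign(zero(c), c), -copysign(zero(d), d))
    end
    complex(c, -d) / (c * c + d * d)
end

safeinv(complex(Inf))    # 0.0 - 0.0im, matching the Float64 result above
```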
With a big array, you are measuring cache and allocation time as well as computation. Can you just time it for a single number?
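Timing a single scalar call removes the memory-traffic confound. A rough Base-only sketch (the thread itself uses BenchmarkTools' `@btime`, which gives more reliable numbers but requires the package):

```julia
# Rough scalar timing with only Base: warm up first so compilation is
# not included, then average many calls. BenchmarkTools' @btime with
# $-interpolation is the more rigorous tool used elsewhere in this thread.
z = 1.0 + 2.0im
inv(z)                                  # warm up / force compilation
t = @elapsed for _ in 1:10^6
    inv(z)
end
println(t / 10^6, " s per call (rough)")
```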
Force-pushed from 0b9c9eb to 78aef8f
I tried
so the new version is 7x faster? Seems weird.
I don't know if that's the cause, but
Maybe it's a change in whether it inlines?
If I do

```julia
function newinv2(z::Complex)
    c, d = reim(z)
    complex(c, -d) / (c * c + d * d)
end
```

then

```julia
julia> @btime g2($a);
  1.125 μs (0 allocations: 0 bytes)
```

which is 2x faster than the version with the
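One way to probe the inlining hypothesis raised above (hypothetical names, not from the thread): compile the same kernel with forced `@inline` and `@noinline` annotations and compare them in the same benchmark harness. Both compute identical values, so any timing gap is attributable to inlining.

```julia
# Hypothetical experiment for the inlining question: identical kernels,
# one with inlining forced and one with it forbidden.
@inline function inv_forced(z::Complex)
    c, d = reim(z)
    complex(c, -d) / (c * c + d * d)
end

@noinline function inv_noinline(z::Complex)
    c, d = reim(z)
    complex(c, -d) / (c * c + d * d)
end

inv_forced(3.0 + 4.0im)   # 0.12 - 0.16im, same as inv_noinline
```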
Oh, I just realized that I'm comparing the wrong method: I need the

```julia
julia> @btime g($a);
  14.973 μs (0 allocations: 0 bytes)
```

or about 6% slowdown.
For Complex{Float32}, my machine counts a slowdown of 2.3 -> 8.5 cycles in batch / simd, and 6 -> 11 cycles for sequential processing. With
I ran your benchmark on my machine and I got very different results:

```julia
julia> r = rand(Complex{Float32}, 1000); b = copy(r);

julia> @btime broadcast!($_inv_old, $b, $r); @btime broadcast!($_inv_new, $b, $r); @btime _inv_old($r[1]); @btime _inv_new($r[1]);
  1.095 μs (0 allocations: 0 bytes)
  2.793 μs (0 allocations: 0 bytes)
  3.811 ns (0 allocations: 0 bytes)
  3.287 ns (0 allocations: 0 bytes)

julia> r = rand(Complex{Float64}, 1000); b = copy(r);

julia> @btime broadcast!($_inv_old, $b, $r); @btime broadcast!($_inv_new, $b, $r); @btime _inv_old($r[1]); @btime _inv_new($r[1]);
  3.810 μs (0 allocations: 0 bytes)
  3.811 μs (0 allocations: 0 bytes)
  3.812 ns (0 allocations: 0 bytes)
  3.815 ns (0 allocations: 0 bytes)
```

For
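The thread does not show the `_inv_old`/`_inv_new` definitions. One plausible pair for reproducing this kind of comparison (an assumption, not the PR's actual code) is the naive formula versus a version that special-cases infinite components:

```julia
# Assumed harness, not from the thread: the naive inverse (which yields
# NaN for infinite inputs) vs. a version that returns a signed zero.
_inv_old(z::Complex) = conj(z) / abs2(z)

function _inv_new(z::Complex)
    c, d = reim(z)
    if isinf(c) || isinf(d)
        return complex(copysign(zero(c), c), -copysign(zero(d), d))
    end
    complex(c, -d) / (c * c + d * d)
end

_inv_old(complex(Inf))   # NaN - 0.0im  (the old inconsistency)
_inv_new(complex(Inf))   # 0.0 - 0.0im
```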
@chethega, you made the same mistake I did: the
Any other comment on this?
@stevengj, please merge if this looks good to you.
Fix #23134.