Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize real matrix * complex matrix by using real matmul #44074

Merged
merged 3 commits into from
Feb 8, 2022

Conversation

jishnub
Copy link
Contributor

@jishnub jishnub commented Feb 8, 2022

Currently the real matrix is converted to complex, followed by a complex matmul. This is sub-optimal (given that the imaginary part is all zeros), and we may instead split it as A * (Br + im * Bi) = (A*Br) + im*(A*Bi), where we carry out two matrix multiplications, but both are real. This is what is implemented in this PR.

There are two versions that I could come up with, as listed below, but I've currently gone with the allocating but fastest version. Open to changing this, if the allocations seem excessive.

julia> using LinearAlgebra, BenchmarkTools

julia> function mul2(A::StridedMatrix{<:LinearAlgebra.BlasReal}, B::StridedMatrix{Complex{T}}) where {T<:LinearAlgebra.BlasReal}
           TS = promote_type(eltype(A), eltype(B))
           TSR = promote_type(eltype(A), T)
           C = similar(B, TS, (size(A, 1), size(B, 2)))
           Cr = reinterpret(reshape, TSR, C)
           AB = similar(A, TSR, size(A, 1), size(B, 2))
           BRI = real(B)
           mul!(AB, A, BRI)
           Cr[1, :, :] .= AB
           BRI .= imag.(B)
           mul!(AB, A, BRI)
           Cr[2, :, :] .= AB
           return C
       end
mul2 (generic function with 1 method)

julia> mul3(A, B) = (A * real(B)) .+ im .* (A * imag(B))
mul3 (generic function with 1 method)

julia> function multests(f, A, B)
           C1 = @btime $f($A, $B)
           C2 = @btime $f($A, view($B, :, :))
           C3 = @btime $f(view($A, :, :), $B)
           C4 = @btime $f(view($A, :, :), view($B, :, :))
           C5 = @btime $f(view($A, :, 1:2:size($A, 2)), view($B, 1:2:size($B,1), :))
           return C1, C2, C3, C4, C5
       end;

julia> const A = rand(1000, 1000); const B = complex.(A);


julia> AB1 = multests(*, A, B);
  176.926 ms (4 allocations: 30.52 MiB)
  174.206 ms (4 allocations: 30.52 MiB)
  173.648 ms (4 allocations: 30.52 MiB)
  171.779 ms (4 allocations: 30.52 MiB)
  987.113 ms (7 allocations: 22.92 MiB)

julia> AB2 = multests(mul2, A, B);
  145.858 ms (6 allocations: 30.52 MiB)
  144.687 ms (6 allocations: 30.52 MiB)
  144.297 ms (6 allocations: 30.52 MiB)
  145.443 ms (6 allocations: 30.52 MiB)
  95.931 ms (6 allocations: 26.70 MiB)

julia> AB3 = multests(mul3, A, B);
  106.106 ms (10 allocations: 45.78 MiB)
  105.187 ms (10 allocations: 45.78 MiB)
  106.942 ms (10 allocations: 45.78 MiB)
  121.555 ms (10 allocations: 45.78 MiB)
  65.900 ms (10 allocations: 38.15 MiB)

julia> mapreduce(isapprox, &, AB1, AB2)
true

julia> mapreduce(isapprox, &, AB1, AB3)
true

mul2 already gets us some speed-up, which is considerable (albeit with some allocations) if the matrix is non-contiguous. mul3 is significantly faster than both, so I've used it in this PR.

@dkarrasch dkarrasch added domain:linear algebra Linear algebra performance Must go faster labels Feb 8, 2022
@dkarrasch dkarrasch merged commit f6def1a into JuliaLang:master Feb 8, 2022
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request Feb 17, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:linear algebra Linear algebra performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants