Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement multiply-add interface in LinearAlgebra #29634

Merged
merged 46 commits into from
Aug 14, 2019

Conversation

tkf
Copy link
Member

@tkf tkf commented Oct 13, 2018

I started working on "multiply-add" interface for computing C = αAB + βC for LinearAlgebra. The detail of the API is currently discussed in #23919. I go ahead and implement the feature using the name addmul!. I use this name only because it's easy to switch to other alternatives by simple replace. Let's do (and finish 😄) the naming discussion in #23919.

Remaining TODOs:

  • Implement multiply-add interface for most (?) of the types in LinearAlgebra
  • Pass existing tests for mul!
  • Tests for matmul.jl
  • Tests for generic.jl
  • Tests for uniformscaling.jl
  • Tests for diagonal.jl
  • Tests for bidiag.jl
  • Tests for triangular.jl
  • Benchmark / optimization
  • Pick a final name -> mul! Matrix Multiplication API #23919 (comment)
  • decide and implement NaN behavior

EDIT: I updated the summary "inplace" since I noticed there were some now irrelevant comments. (You can click edited ▼ above to see the old version.)

@jebej
Copy link
Contributor

jebej commented Oct 15, 2018

Wouldn't the two functions defined do exactly the same thing for the default arguments of alpha and beta (which should be true and false) since gemm_wrapper! already has those defaults? If so, what would be the point, except for those two functions to have a different name?

@tkf
Copy link
Member Author

tkf commented Oct 15, 2018

The idea is that ternary mul! and muladd! do C = A * B and C += A * B, respectively, to make muladd! similar to scalar muladd. Of course, the exact signature may be different from the one I write above. Maybe only 5-ary muladd! will be defined. Let's keep the API discussion in #23919.

@andreasnoack
Copy link
Member

I'd forgotten that. It's quite rare that there are no changes detected at all.

Then let's get thing merged such that we can finally start using the feature. It's been in demand for quite some time.

@JeffBezanson
Copy link
Sponsor Member

The tests for this are failing sometimes.

@fredrikekre
Copy link
Member

See #32924

@Jutho
Copy link
Contributor

Jutho commented Oct 30, 2019

I hope that by mentioning the following issue/inconsistency here, the relevant people are directly informed:

The function

mul!(C::StridedMatrix{T}, A::StridedVecOrMat{T}, B::StridedVecOrMat{T}, α::Number, β::Number)

tries to call BLAS/gemm whenever the scalars can be converted to type T.

However, for e.g. adjoint A and complex eltype, there is

mul!(C::StridedMatrix{T}, adjA::Adjoint{<:Any,<:StridedVecOrMat{T}}, B::StridedVecOrMat{T}, alpha::Union{T, Bool}, beta::Union{T, Bool}) where {T<:BlasComplex}

which requires the scalars to be of type T or of type Bool, while other values of the scalars (e.g. I wanted to use alpha=-1) fall back to

mul!(C::AbstractMatrix, adjA::Adjoint{<:Any,<:AbstractVecOrMat}, B::AbstractVecOrMat, alpha::Number, beta::Number)

which directly calls

generic_matmatmul!

instead of BLAS/gemm. The same seems to be true for other combinations of adjoint and transpose.

@Jutho
Copy link
Contributor

Jutho commented Oct 30, 2019

Also, I notice that with a code which uses many small matrix multiplications, the construction of the MulAddMul object takes a significant fraction of the time. I first assumed this was an issue with profiling, but replacing mul! calls with corresponding BLAS.gemm! calls confirms that these profiles are telling the truth.

Here an example profile:
profile

@tkf
Copy link
Member Author

tkf commented Oct 30, 2019

As for Adjoint/Transpose, I think we just need something like #33229 (as you've already discovered).

MulAddMul construction takes time when alpha and beta are not constant. A MWE is

N = 10
A = randn(N, N)
B = randn(N, N)
C = similar(A)

function demo(C, A, B, alpha, beta, n)
    for _ in 1:n
        mul!(C, A, B, alpha, beta)
    end
end

demo(C, A, B, 1, 0, 1)
@time @profile demo(C, A, B, 1, 0, 10^5)

while MulAddMul vanishes from the profile if I do

function demo(C, A, B, n)
    for _ in 1:n
        mul!(C, A, B, 1, 0)
    end
end

Maybe something like #29634 (comment) would be the solution? Maybe we can also add an early branch in mul! to directly call BLAS.gemm! without constructing MulAddMul?

@Jutho
Copy link
Contributor

Jutho commented Oct 30, 2019

Thanks for the quick response and the pointers @tkf .

For the MulAddMul issue, I had constant values in the code, but maybe some higher levels up, so I don't know if they were propagated correctly. It also seemed to be worse for the adjoint(A)*B case than for the A*B case, which was maybe related to the first issue. I can profile further if needed, but have currently worked around these by directly calling gemm!, and it seems you are already aware of these issues. That's also why I didn't open new issues right away.

@tkf
Copy link
Member Author

tkf commented Oct 31, 2019

Yeah, propagating constants for long call chains is not super robust... FYI, one option may be to use StaticNumbers.jl. This calls gemm! and compiles away MulAddMul construction:

using StaticNumbers
demo(C, A, B, static(1), static(0), 1)
@time @profile demo(C, A, B, static(1), static(0), 10^5)

I also tried #29634 (comment) and it works (i.e., MulAddMul vanishes from the profile). But, IIUC, it works by duplicating function body. Not sure if we want the code bloat, especially the ones implemented in pure Julia.

@tkf tkf mentioned this pull request Jan 16, 2020
tkf added a commit to tkf/julia that referenced this pull request Jan 29, 2020
tkf added a commit to tkf/julia that referenced this pull request Jan 31, 2020
KristofferC added a commit that referenced this pull request Feb 1, 2020
* Construct MulAddMul at gemm_wrapper! call sites

* Add branches manually in MulAddMul constructor

This is suggested by chethega in:
#29634 (comment)

* Update stdlib/LinearAlgebra/src/generic.jl

Co-Authored-By: Kristoffer Carlsson <[email protected]>

Co-authored-by: Kristoffer Carlsson <[email protected]>
KristofferC pushed a commit that referenced this pull request Feb 5, 2020
* Construct MulAddMul at gemm_wrapper! call sites

* Add branches manually in MulAddMul constructor

This is suggested by chethega in:
#29634 (comment)

* Update stdlib/LinearAlgebra/src/generic.jl

Co-Authored-By: Kristoffer Carlsson <[email protected]>

Co-authored-by: Kristoffer Carlsson <[email protected]>
(cherry picked from commit 2da42e0)
KristofferC added a commit that referenced this pull request Apr 11, 2020
* Construct MulAddMul at gemm_wrapper! call sites

* Add branches manually in MulAddMul constructor

This is suggested by chethega in:
#29634 (comment)

* Update stdlib/LinearAlgebra/src/generic.jl

Co-Authored-By: Kristoffer Carlsson <[email protected]>

Co-authored-by: Kristoffer Carlsson <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet