
Let muladd accept arrays #37065

Merged: 7 commits into JuliaLang:master from mcabbott:muladd, Oct 29, 2020
Conversation

@mcabbott (Contributor) commented Aug 16, 2020

This extends Base.muladd to work on arrays, using mul!, which makes some combinations more efficient:

julia> A = rand(20,5); x = rand(5, 1000); b = rand(20);

julia> @btime $A * $x .+ $b;
  29.787 μs (4 allocations: 312.66 KiB)

julia> @btime muladd($A, $x, $b);
  22.322 μs (2 allocations: 156.33 KiB)

Motivated by things like FluxML/Flux.jl#1272, where it would be convenient to fuse these two operations, rather than only fusing the addition with the activation function via broadcasting, as in tanh.((W*x) .+ b).

Needs tests, and some thought about the muladd(::Adjoint, ...) and muladd(..., ::Number) cases, if anyone thinks this is a good idea.
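A rough sketch of the approach, with a hypothetical helper name (not the PR's exact code):

```julia
using LinearAlgebra

# Hypothetical sketch of the fused operation: write b into the output
# buffer by broadcasting, then accumulate A*x on top with 5-arg mul!.
function muladd_sketch(A::AbstractMatrix, x::AbstractVecOrMat, b)
    T = promote_type(eltype(A), eltype(x), eltype(b))
    C = similar(A, T, axes(A, 1), axes(x)[2:end]...)
    C .= b                     # broadcast b into C
    mul!(C, A, x, true, true)  # C = 1*A*x + 1*C, no temporary for A*x
end
```

Only the result array is allocated, which is where the halved allocation count in the benchmark above comes from.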

@mcabbott (Contributor, Author) commented Oct 2, 2020

Now has tests, and muladd(A, x, b) accepts any b for which C = A*x; C .= C .+ b would work. It behaves the obvious way when A is an Adjoint vector, with C (and hence the result) either a scalar or another Adjoint vector.
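For instance, the Adjoint behavior described above looks like this on a build including the PR:

```julia
using LinearAlgebra

v = [1, 2]; w = [3, 4]
muladd(v', w, 5)                   # scalar: (1*3 + 2*4) + 5 == 16
muladd(v', [1 0; 0 1], [10, 20]')  # another Adjoint row vector, equal to [11 22]
```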

@dkarrasch added the domain:linear algebra label on Oct 29, 2020
@dkarrasch merged commit 72971c4 into JuliaLang:master on Oct 29, 2020
@mcabbott deleted the muladd branch on Oct 29, 2020
@mateuszbaran (Contributor)

I think this change broke the following:
On 1.5.2:

julia> using LinearAlgebra

julia> muladd([2 2; 1 2], [2 1; 1 2], I)
2×2 Array{Int64,2}:
 7  6
 4  6

while currently on master:

julia> using LinearAlgebra

julia> muladd([2 2; 1 2], [2 1; 1 2], I)
ERROR: MethodError: no method matching length(::UniformScaling{Bool})
Closest candidates are:
  length(::Union{Base.KeySet, Base.ValueIterator}) at abstractdict.jl:54
  length(::Union{Adjoint{T, S}, Transpose{T, S}} where S where T) at /home/mateusz/repos/julia/julia/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/adjtrans.jl:194
  length(::IdDict) at iddict.jl:136
  ...
Stacktrace:
 [1] _similar_for(c::UnitRange{Int64}, #unused#::Type{Bool}, itr::UniformScaling{Bool}, #unused#::Base.HasLength)
   @ Base ./array.jl:575
 [2] _collect(cont::UnitRange{Int64}, itr::UniformScaling{Bool}, #unused#::Base.HasEltype, isz::Base.HasLength)
   @ Base ./array.jl:608
 [3] collect(itr::UniformScaling{Bool})
   @ Base ./array.jl:602
 [4] broadcastable(x::UniformScaling{Bool})
   @ Base.Broadcast ./broadcast.jl:682
 [5] broadcasted
   @ ./broadcast.jl:1303 [inlined]
 [6] muladd(A::Matrix{Int64}, B::Matrix{Int64}, z::UniformScaling{Bool})
   @ LinearAlgebra ~/repos/julia/julia/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:214
 [7] top-level scope
   @ REPL[12]:1

julia> versioninfo()
Julia Version 1.6.0-DEV.1384
Commit 4711fc3cdf (2020-10-30 14:50 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4800MQ CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.0 (ORCJIT, haswell)

@dkarrasch (Member)

Thanks for reporting this! I guess we should have restricted the third argument here to Union{Number,AbstractVecOrMat} then? That would actually make some of the dimension checks unnecessary. Or did you have other types in mind, @mcabbott?

@mcabbott (Contributor, Author) commented Oct 30, 2020

Oops, I was working on the assumption that the only existing use for muladd was with numbers. I left z untyped just for simplicity, but it could be constrained as you suggest.

The error here comes from the broadcast [6 6; 4 5] .+ I, i.e. from the fact that rand(2,2) .= I doesn't work; I'm somewhat surprised it doesn't. If it did, then this method would be more efficient than the fallback. Might it be worth fixing?
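The failure is easy to reproduce; the following is just an illustration, assuming a Julia build without special broadcasting support for UniformScaling:

```julia
using LinearAlgebra

# Broadcasting with I fails because Broadcast.broadcastable falls back to
# collect, and UniformScaling has no length (as in the stacktrace above):
err = try
    zeros(2, 2) .= I
    nothing
catch e
    e
end
err isa MethodError
```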

However, if there are other existing uses of muladd with matrices, then this whole thing may need to tread more carefully. For instance, on master this example hits this PR's mul!(similar(...)) method:

julia> muladd(Diagonal([1,2]), [3,4], [5,6])
2-element SparseArrays.SparseVector{Int64, Int64} with 2 stored entries:
  [1]  =  8
  [2]  =  14

julia> muladd(Diagonal([1,2]), [3,4], [5,6]) # on Julia 1.5
2-element Array{Int64,1}:
  8
 14

@mateuszbaran (Contributor)

Wouldn't replacing .= with copyto! fix this problem? It might be easier than fixing broadcasting with I.

@dkarrasch (Member)

Yes, broadcasting in setindex! is #34130, and there's also #23197. Unfortunately, I don't think copyto! would be a fix: if z is a Number but the output is a vector/matrix, then only the first element would get filled. Thus we would need many more methods.
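Concretely (plain Base behavior, nothing specific to this PR): a Number iterates as a single element, so copyto! touches only the first entry, while .= fills all of them.

```julia
# copyto! treats the Number as a one-element iterator:
C = zeros(2, 2)
copyto!(C, 5)
C == [5 0; 0 0]   # only C[1] was set

# Broadcasting assignment fills every entry, which is what muladd needs:
D = zeros(2, 2)
D .= 5
D == [5 5; 5 5]
```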

@mateuszbaran (Contributor)

Yes, good point. I don't have any good ideas then 🙂 .

@mcabbott (Contributor, Author) commented Oct 30, 2020

Thanks for the links. I guess this use of I is only possible in one method, and could be explicitly handled, or rejected. Things like muladd(I, [2 2; 1 2], [3 4; 5 6]) still hit the fallback and can't be made more efficient anyway. (Edit: muladd([1,2]', [3,4], I) is an error right now too.)

But I'm a little worried about all the possibilities with UpperTriangular and friends.

@dkarrasch (Member)

I was thinking about whether there is some way to deduce the type of A*B and make C similar to that, in order to do mul!(C, A, B, ...). I tried a little, but I don't think I found a good solution. At times, even that return type happens to be suboptimal. I don't remember the issue, but there was a difference depending on whether the structured matrix was the left or the right factor, because we usually (always?) take the right factor and make C similar to it.
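One way to sketch that idea (an illustration, not a proposed implementation): deduce the eltype with Base.promote_op and, as * itself does, allocate C similar to the right factor.

```julia
using LinearAlgebra

A = Diagonal([1, 2])
B = [1.0 2.0; 3.0 4.0]
T = Base.promote_op(*, eltype(A), eltype(B))  # eltype of A*B, without computing it
C = similar(B, T, axes(A, 1), axes(B, 2))     # container guessed from B
mul!(C, A, B)                                  # fill C with A*B
```

The container guess is the weak point: as noted above, the result can differ depending on which side the structured matrix sits.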

@mcabbott (Contributor, Author) commented Oct 31, 2020

I think quite a lot of the structured types returned by A*B are destroyed by .+ z. There are a few exceptions:

di = Diagonal(1:3)
ut = UpperTriangular(rand(1:9, 3,3))

di * di .+ di isa Diagonal
ut * ut .+ ut isa UpperTriangular
ut * ut .+ di isa UpperTriangular

And it looks like only similar(::Diagonal, axes...) gives a sparse array:

similar(di, 3,3) isa SparseMatrixCSC
similar(parent(di), 3,3) isa Matrix
similar(ut, 3,3) isa Matrix

So maybe something like this is almost enough?

const UpperUnion = Union{UpperTriangular, UnitUpperTriangular, Diagonal}

function Base.muladd(A::UpperUnion, B::UpperUnion, z::UpperUnion)
    T = promote_type(eltype(A), eltype(B), eltype(z))
    C = similar(parent(B), T, axes(A,1), axes(B,2))
    C .= z
    mul!(C, A, B, true, true)
    UpperTriangular(C)
end
Base.muladd(A::Diagonal, B::Diagonal, z::Diagonal) =
    Diagonal(A.diag .* B.diag .+ z.diag)
Base.muladd(A::Union{Diagonal, UniformScaling}, B::Union{Diagonal, UniformScaling}, z::Union{Diagonal, UniformScaling}) =
    Diagonal(_diag_or_value(A) .* _diag_or_value(B) .+ _diag_or_value(z))
Base.muladd(A::UniformScaling, B::UniformScaling, z::UniformScaling) =
    UniformScaling(A.λ * B.λ + z.λ)

_diag_or_value(A::Diagonal) = A.diag
_diag_or_value(A::UniformScaling) = A.λ

@mateuszbaran (Contributor)

I think there may be many more patterns where A*B is sparse but z is dense: Bidiagonal, Tridiagonal, SymTridiagonal, SparseMatrixCSC, etc. Determining the correct result type of Base.muladd(A::AbstractMatrix{TA}, B::AbstractMatrix{TB}, z) where {TA, TB} would be very complex; the signature is very broad. Maybe for now you could implement this muladd only for either A or B being a StridedVecOrMat?

@mcabbott (Contributor, Author)

Sparse matrices appear to be handled correctly, in that similar(sm) gives the same type as the naive A*y .+ z version:

m = rand(3,3); sm = sparse(m)
v = rand(3); sv = sparse(v);

sm * sm .+ 1 isa SparseMatrixCSC
sm * sm .+ v isa SparseMatrixCSC
sm * sm .+ m isa SparseMatrixCSC

sm * sm + 1 # error
sm * sm + v # error
sm * sm + m isa Matrix

The snippet above should handle the dense triangular cases. Calling similar on Bidiagonal or SymTridiagonal gives a sparse matrix, matching what * does.

Still hoping a little bit that it won't be necessary to go all the way down to StridedArray, but we may end up there.

@mateuszbaran (Contributor)

I don't quite understand. Right now in Base.muladd(A::AbstractMatrix{TA}, B::AbstractMatrix{TB}, z) where {TA, TB} you allocate the result based on the type of A. But if either B or z is dense, the result should also be dense. How are you going to achieve that?

@mcabbott (Contributor, Author)

Sure, matching the existing choices for every possible trio of types probably can't be done with any simple rule. Besides the sparse .+ dense examples, I don't know why these are the way they are:

SymTridiagonal(m+m') * m isa Matrix
m * SymTridiagonal(m+m') isa SparseMatrixCSC

I hope nobody's relying on that. But more importantly, someone may be relying on this, which ought not to go via mul!:

using StaticArrays
muladd(SA[1 2; 3 4], SA[5,6], SA[7,8]) 

Which argues for making it much more opt-in.

@dkarrasch (Member)

> I don't know why these are the way they are:

They are the way they are because, as I said, we call similar on the right factor, not on the left; see #36551.

@mcabbott (Contributor, Author) commented Oct 31, 2020

Thanks, I hadn't seen that discussion. There's "why" as in which method, and "why" as in what's desired... and it sounds like, for my failed attempt at an exotic case nobody would want, someone wants the opposite.

@mcabbott (Contributor, Author) commented Oct 31, 2020

Not quite done but this is a branch:

master...mcabbott:mulladd2

This supports triangular & diagonal matrices. It also supports UniformScaling by itself, mixed with Diagonal, and perhaps weirdly as the 3rd argument with two matrices.

@mcabbott (Contributor, Author)

Now at #38250.

c42f pushed a commit that referenced this pull request Nov 20, 2020
This adjusts #37065 to be much more cautious about what arrays it acts on: it calls mul! on StridedArrays, treats a few special types like Diagonal, UpperTriangular, and UniformScaling, and sends anything else to muladd(A,y,z) = A*y .+ z.

However, this broadcasting restricts the shape of z, mostly such that A*y .= z would work. That ensures you should get the same error from the mul!(::StridedMatrix, ...) method as from the fallback broadcasting one. Both allow z of lower dimension than the existing muladd(x,y,z) = x*y+z.

But x*y+z also allows z to have trailing dimensions, as long as they are of size 1. I made the broadcasting method allow these too, which I think should make this non-breaking. (I presume this is rarely used, and thus not worth sending to the fast method.)

Structured matrices such as UpperTriangular should all go to x*y+z. Some combinations could be made more efficient but it gets complicated. Only the case of 3 diagonals is handled.
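The trailing size-1 dimensions mentioned in the commit message are the ones that ordinary array addition already accepts via promote_shape; a quick check of that baseline behavior:

```julia
# x*y + z allows z with trailing dimensions of size 1:
A = ones(2, 2)
z3 = ones(2, 2, 1)
size(A * A + z3)   # the promoted shape, (2, 2, 1)
```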
3 participants