Adding inplace multiplication for (unit)triangular matrices #36972

mcognetta · 2020-08-08T07:52:28Z

This PR closes #36828.

Adding in place mul! for (ε|Unit)(Upper|Lower)Triangular matrices.

Before, something like

julia> x = UpperTriangular(rand(3, 3))
3×3 UpperTriangular{Float64,Matrix{Float64}}:
 0.896774  0.0032823  0.133023
  ⋅        0.842607   0.414514
  ⋅         ⋅         0.98005

julia> A = UpperTriangular(rand(3, 3))
3×3 UpperTriangular{Float64,Matrix{Float64}}:
 0.42575  0.567843  0.19379
  ⋅       0.137596  0.636028
  ⋅        ⋅        0.959794

julia> mul!(x, A, A)

would fail with a method error, while

julia> mul!(rand(3, 3), A, A)

would work but destroy the underlying triangular structure.

Now,

julia> mul!(x, A, A)

works and maintains the triangular structure.

reverting an accidental doc change; removing old version of function

mcognetta · 2020-08-20T21:40:25Z

Gentle bump

mcognetta · 2020-08-31T02:58:46Z

I was inspired by @bjack205's JuliaCon talk (sorry for the random mention), so I changed a slice to a view to reduce allocations. It now allocates half as much as using a slice. Making the code

        view(C, :, i) = A*view(B, :, i)

seems to still allocate as much as the slicing. Hopefully someone with more experience with views will see this and explain.
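On the allocation question: the right-hand side `A*view(B, :, i)` builds a fresh result vector before anything is stored, so turning the operands into views cannot remove that allocation. A minimal sketch of one way around it, assuming a dense destination `C` (the helper name `colmul_views!` is hypothetical and not part of the PR), writes each product directly into a column view with 3-arg mul!:

```julia
using LinearAlgebra

# Hypothetical helper, not part of the PR: multiply column by column,
# writing each product directly into a view of the dense destination.
function colmul_views!(C::Matrix, A::UpperTriangular, B::Matrix)
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("A and B do not match"))
    size(C) == (size(A, 1), size(B, 2)) || throw(DimensionMismatch("C has the wrong size"))
    for i in axes(B, 2)
        # mul! stores the result in the view; A * view(B, :, i) would
        # allocate a new vector for every column
        mul!(view(C, :, i), A, view(B, :, i))
    end
    return C
end

A = UpperTriangular(rand(4, 4))
B = rand(4, 4)
C = zeros(4, 4)
colmul_views!(C, A, B)
```

This only illustrates where the allocation comes from; it makes no claim about which variant benchmarks best.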

mcognetta · 2020-08-31T04:27:22Z

I was able to reduce it to only 1 allocation using a preallocated vector (a la https://discourse.julialang.org/t/inplace-multiplication-by-a-square-matrix/1702/4), but in benchmarking I found the original version + @views to be the most performant.

For the original change:

julia> @btime mul!(x, A, A)
  10.779 ms (1000 allocations: 3.97 MiB)

For the first view change:

julia> @btime mul!(x, A, A)
  11.684 ms (1000 allocations: 3.97 MiB)

For the preallocated view change:

julia> @btime mul!(x, A, A)
  49.991 ms (1 allocation: 4.06 KiB)

For the @views change:

julia> @btime mul!(x, A, A)
  10.594 ms (500 allocations: 1.98 MiB)

mcognetta · 2020-08-31T05:32:43Z

Sorry, I am using this thread as my personal notebook. I can get it slightly faster (seems to scale at about 10% faster from my benchmarks) and with zero allocations using:

julia> function mul!(C::UpperTriangular, A::UpperTriangular, B::UpperTriangular)
           m, n = size(B, 1), size(B, 2)
           if m != size(A, 1)
               throw(DimensionMismatch("right hand side B needs first dimension of size $(size(A,1)), has size $m"))
           end
           if C === A || C === B
               throw(ArgumentError("output matrix must not be aliased with input matrix"))
           end
           @views for i in 1:n
               mul!(parent(C)[:, i], A, B[:, i])
           end
           return C
       end

Current version (slicing):

julia> @btime mul!(x, A, A)
  11.551 ms (1000 allocations: 3.97 MiB)

The above version:

julia> @btime mul!(x, A, A)
  9.076 ms (0 allocations: 0 bytes)

But this strikes me as hacky and highly dependent on triangular matrices being backed by a dense matrix and on parent being an O(1) operation. Without parent(C) I get a method error:

ERROR: MethodError: no method matching lmul!(::UpperTriangular{Float64,Matrix{Float64}}, ::SubArray{Float64,1,UpperTriangular{Float64,Matrix{Float64}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},false})

which stems from

julia/stdlib/LinearAlgebra/src/triangular.jl

Line 699 in 41e603e

 mul!(C::AbstractVector , A::AbstractTriangular, B::AbstractVector) = lmul!(A, copyto!(C, B)) 

This can probably be fixed, but seems like more work than what should be done here.

Pinging @dkarrasch since you seem to be active in this area. Any thoughts?

oxinabox · 2020-10-11T22:02:14Z

The docstring of mul! says that normally one should extend 5 arg mul!, and 3 arg mul! will redispatch to that.

It seems like it would not be too hard to bring this PR in line with that.
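For reference, the 5-arg form `mul!(C, A, B, α, β)` overwrites `C` with `A*B*α + C*β`, and the 3-arg call corresponds to `α = true, β = false`. A small sketch with dense matrices (the variable names are only for illustration):

```julia
using LinearAlgebra

A = rand(3, 3)
B = rand(3, 3)
C0 = rand(3, 3)

# 5-arg mul!(C, A, B, α, β) overwrites C with A*B*α + C*β
C5 = copy(C0)
mul!(C5, A, B, 2.0, 0.5)

# 3-arg mul!(C, A, B) corresponds to α = true, β = false
C3 = copy(C0)
mul!(C3, A, B)
C5b = copy(C0)
mul!(C5b, A, B, true, false)
```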

mcognetta · 2020-10-12T02:14:26Z

Not ready for review anymore. Adding the 5-argument method causes a big slowdown due to allocations. I am working on it now.

dkarrasch · 2020-10-12T07:45:16Z

stdlib/LinearAlgebra/src/triangular.jl

+ @views for i in 1:n
+     C[:, i] = alpha*A*B[:, i] + beta*C[:, i]
+ end

What about

Suggested change

- @views for i in 1:n
-     C[:, i] = alpha*A*B[:, i] + beta*C[:, i]
- end
+ for (Bi, Ci) in zip(eachcol(B), eachcol(C))
+     mul!(Ci, A, Bi, alpha, beta)
+ end

eachcol, zip, and 5-arg mul! should all be essentially allocation free.
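A self-contained sketch of this column-wise pattern, assuming a dense destination and a dense right-hand side (the helper name `colwise_mul!` is hypothetical; the PR's methods have different signatures):

```julia
using LinearAlgebra

# Hypothetical helper: column-wise 5-arg mul!, i.e.
# C[:, i] = alpha*A*B[:, i] + beta*C[:, i] for every column i.
function colwise_mul!(C::Matrix, A::UpperTriangular, B::Matrix,
                      alpha::Number, beta::Number)
    size(A, 2) == size(B, 1) || throw(DimensionMismatch("A and B do not match"))
    size(C) == (size(A, 1), size(B, 2)) || throw(DimensionMismatch("C has the wrong size"))
    # eachcol yields column views, so Ci is updated in place
    for (Bi, Ci) in zip(eachcol(B), eachcol(C))
        mul!(Ci, A, Bi, alpha, beta)
    end
    return C
end

A = UpperTriangular(rand(4, 4))
B = rand(4, 4)
C0 = rand(4, 4)
C = copy(C0)
colwise_mul!(C, A, B, 2.0, 3.0)
```

The sketch checks the arithmetic only and says nothing about how the loop performs relative to other methods.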

dkarrasch · 2020-10-12T07:47:52Z

stdlib/LinearAlgebra/test/triangular.jl

+@testset "inplace mul of appropriate types should preserve triagular structure" begin
+ for elty1 in (Float32, Float64, ComplexF32, ComplexF64, Int)
+ for elty2 in (Float32, Float64, ComplexF32, ComplexF64, Int)
+ A = UpperTriangular(rand(elty1, 5, 5))


rand(Int,...) may yield very large integers, which may overflow upon multiplication and addition.
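To illustrate the concern: Int arithmetic in Julia wraps silently on overflow. One possible workaround, shown here purely as an assumption and not necessarily what the PR ended up doing, is to sample test integers from a small range:

```julia
# Int arithmetic wraps silently on overflow
wrapped = typemax(Int) + 1   # equals typemin(Int)

# Assumed alternative for tests: sample from a small range so that products
# and sums of 5×5 matrices stay far away from the Int limits
A = rand(-10:10, 5, 5)
B = rand(-10:10, 5, 5)
P = A * B   # each entry is a sum of 5 products, each at most 100 in magnitude
```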

dkarrasch · 2020-10-12T08:48:37Z

~~For 3-arg mul!, there is another option to fix this:~~

function lmul!(A::UpperTriangular, B::UpperTriangular)
    lmul!(A, parent(B))
    return B
end

~~etc. This overwrites also the hidden part of parent(B), but hey, it's a mutating function.~~

EDIT: I forgot to erase the hidden terms, so one would have to run tril! or triu! on parent(B) first.

dkarrasch · 2020-10-12T14:13:49Z

Ok, so I did some more testing.

So my eachcol proposal did improve the performance significantly, but implementing the 5-arg mul! columnwise is not a good idea. For both small and large matrices, this is outperformed by the currently called generic_matmat_mul!. I'd suggest simply removing those methods.
For the 3-arg mul!, I strongly suggest doing (in the big loop)

@eval mul!(C::$cty, A::$aty, B::$bty) = (lmul!(A, copyto!(parent(C), B)); C)

because this has then the chance to call BLAS triangular multiplication. That (a) solves the issue, and (b) is faster than A*B, because you avoid the allocation of the array behind C.
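Written out as a standalone function for the UpperTriangular case only, the idea can be sketched as follows. The name `trimul_sketch!` and the aliasing check are assumptions for illustration; the suggestion above generates methods for the type combinations inside an `@eval` loop instead.

```julia
using LinearAlgebra

# Hypothetical standalone version of the suggested one-liner,
# restricted to UpperTriangular arguments.
function trimul_sketch!(C::UpperTriangular, A::UpperTriangular, B::UpperTriangular)
    # Assumed guard: the storage behind C is overwritten before A is used
    parent(C) === parent(A) && throw(ArgumentError("C must not alias A"))
    # Copy B (including its structural zeros) into the storage behind C,
    # then multiply that storage in place from the left by A
    lmul!(A, copyto!(parent(C), B))
    return C
end

A = UpperTriangular(rand(4, 4))
B = UpperTriangular(rand(4, 4))
C = UpperTriangular(zeros(4, 4))
trimul_sketch!(C, A, B)
```

The result keeps its `UpperTriangular` wrapper because only the parent array is mutated and `C` itself is returned.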

dkarrasch · 2020-10-16T12:36:07Z

Sorry, @mcognetta, I couldn't resist pushing some changes and simplifying the tests. They were pretty heavy. Now, the tests also check for the correct output type.

dkarrasch · 2020-10-29T11:29:51Z

Bump?

mcognetta · 2020-10-30T01:05:46Z

@dkarrasch sorry for the delay. Thanks for your changes. It seems they are substantially faster than what I tried earlier. I don't think there is much else to do?

mcognetta · 2020-10-30T01:57:42Z

Added a few tests for the 5-arg version.

dkarrasch · 2020-10-30T06:02:30Z

There is the gc-failure, which I'm afraid of ignoring. I thought rebasing onto current master should help.

oxinabox · 2020-11-02T14:07:34Z

Since this is changing 3-arg mul! and not 5-arg mul!, do we need to add a condition into the 5-arg mul! to redispatch mul!(C, A, B, true, false) to use it?

dkarrasch · 2020-11-16T15:19:27Z

There are two strange errors for which tests pass, though, and two clearly unrelated test failures that are known and tracked. I suggest to finally merge this.

mcognetta · 2020-11-16T15:27:39Z

Shall we close this since it's merged?

dkarrasch · 2020-11-16T15:29:11Z

Wait, no, it's not merged yet. I was seeing if anyone raised objections.

adding inplace multiplication for (unit)triangular matrices

88a7b7c

reverting an accidental doc change; removing old version of function

mcognetta force-pushed the inplace_triangular_mul branch from ef4b949 to 88a7b7c · August 8, 2020 07:55

change slice to view to reduce allocations by half

7c4150f

change to @views

3297a53

changing to 5-arg version of mul!

5eb923b

dkarrasch reviewed Oct 12, 2020

dkarrasch added the domain:linear algebra label Oct 16, 2020

rm 5-arg mul!, simplify tests, test result type

9a50dbf

expand function definition

abde4fe

dkarrasch mentioned this pull request Oct 16, 2020

Preserve structure under unit[upper/lower]triangular multiplication #38058

Merged

mcognetta mentioned this pull request Oct 26, 2020

mul! on 3 Triangular Matrixes errors #38185

Closed

5 arg tests

089173a

Merge branch 'master' into inplace_triangular_mul

859739e

oxinabox approved these changes Nov 16, 2020

dkarrasch merged commit ef1b6d3 into JuliaLang:master Nov 17, 2020

mcognetta deleted the inplace_triangular_mul branch November 17, 2020 21:18
