-
-
Notifications
You must be signed in to change notification settings - Fork 5.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding inplace multiplication for (unit)triangular matrices #36972
Conversation
reverting an accidental doc change reverting an accidental doc change removing old version of function
ef4b949
to
88a7b7c
Compare
Gentle bump |
I was inspired by @bjack205 's JuliaCon talk (sorry for the random mention), so I changed a slice to a view to reduce allocations. It now allocates half has much as using a slice. Making the code view(C, :, i) = A*view(B, :, i) seems to still allocate as much as the slicing. Hopefully someone with more experience with views will see this and explain. |
I was able to reduce it to only 1 allocation using a preallocated vector (a la https://discourse.julialang.org/t/inplace-multiplication-by-a-square-matrix/1702/4), but in benchmarking I found the original version + For the original change: julia> @btime mul!(x, A, A)
10.779 ms (1000 allocations: 3.97 MiB) For the first view change: julia> @btime mul!(x, A, A)
11.684 ms (1000 allocations: 3.97 MiB) For the preallocated view change: julia> @btime mul!(x, A, A)
49.991 ms (1 allocation: 4.06 KiB) For the julia> @btime mul!(x, A, A)
10.594 ms (500 allocations: 1.98 MiB) |
Sorry, I am using this thread as my personal notebook. I can get it slightly faster (seems to scale at about 10% faster from my benchmarks) and with zero allocations using: julia> function mul!(C::UpperTriangular, A::UpperTriangular, B::UpperTriangular)
m, n = size(B, 1), size(B, 2)
if m != size(A, 1)
throw(DimensionMismatch("right hand side B needs first dimension of size $(size(A,1)), has size $m"))
end
if C === A || C === B
throw(ArgumentError("output matrix must not be aliased with input matrix"))
end
@views for i in 1:n
mul!(parent(C)[:, i], A, B[:, i])
end
return C
end Current version (slicing): julia> @btime mul!(x, A, A)
11.551 ms (1000 allocations: 3.97 MiB) The above version: julia> @btime mul!(x, A, A)
9.076 ms (0 allocations: 0 bytes) But this strikes me as hacky and highly dependant on triangular matrices being backed by a dense matrix and ERROR: MethodError: no method matching lmul!(::UpperTriangular{Float64,Matrix{Float64}}, ::SubArray{Float64,1,UpperTriangular{Float64,Matrix{Float64}},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},false}) which stems from julia/stdlib/LinearAlgebra/src/triangular.jl Line 699 in 41e603e
This can probably be fixed, but seems like more work than what should be done here. Pinging @dkarrasch since you seem to be active in this area. Any thoughts? |
The docstring of It seems like it would not be too hard to bring this PR inline with that. |
Not ready for review anymore. Adding the 5-argument method causes a big slowdown due to allocations. I am working on it now. |
@views for i in 1:n | ||
C[:, i] = alpha*A*B[:, i] + beta*C[:, i] | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about
@views for i in 1:n | |
C[:, i] = alpha*A*B[:, i] + beta*C[:, i] | |
end | |
for (Bi, Ci) in zip(eachcol(B), eachcol(C)) | |
mul!(Ci, A, Bi, alpha, beta) | |
end |
eachcol
, zip
, and 5-arg mul!
should all be essentially allocation free.
@testset "inplace mul of appropriate types should preserve triagular structure" begin | ||
for elty1 in (Float32, Float64, ComplexF32, ComplexF64, Int) | ||
for elty2 in (Float32, Float64, ComplexF32, ComplexF64, Int) | ||
A = UpperTriangular(rand(elty1, 5, 5)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rand(Int,...)
may yield very large integers, which may overflow upon multiplication and addition.
function lmul!(A::UpperTriangular, B::UpperTriangular)
lmul!(A, parent(B))
return B
end
EDIT: I forgot to erase the hidden terms, so one would have to run |
Ok, so I did some more testing.
@eval mul!(C::$cty, A::$aty, B::$bty) = (lmul!(A, copyto!(parent(C), B)); C) because this has then the chance to call BLAS triangular multiplication. That (a) solves the issue, and (b) is faster than |
Sorry, @mcognetta, I couldn't withstand pushing some changes and simplifying the tests. They were pretty heavy. Now, the tests also check for the correct output type. |
Bump? |
@dkarrasch sorry for the delay. Thanks for your changes. It seems they are substantially faster than what I tried earlier. I don't think there is much else to do? |
Added a few tests for the 5-arg version. |
There is the gc-failure, which I'm afraid of ignoring. I thought rebasing onto current master should help. |
Since this is changing 3 arg |
There are two strange errors for which tests pass, though, and two clearly unrelated test failures that are known and tracked. I suggest to finally merge this. |
Shall we close this since it's merged? |
Wait, no, it's not merged yet. I was seeing if anyone raised objections. |
This PR closes #36828.
Adding in place
mul!
for(ε|Unit)(Upper|Lower)Triangular
matrices.Before, something like
would fail with a method error, while
would work but destroy the underlying triangular structure.
Now,
works and maintains the triangular structure.