add generalized dot product #32739

dkarrasch · 2019-07-30T16:54:14Z

I went ahead and added a generalized dot product dot(x, A, y), as discussed in #16573. I shamelessly copied (and slightly adjusted) @simonbyrne's code for the completely sparse case #16573 (comment) and the dense case #16573 (comment), and added methods for structured matrices and sparse matrices/generic vectors. There are presumably a couple of things to do:

add tests
NEWS entry (I'm not sure, but I guess we would want to announce this feature)
further optimizations (edit: I tried to reduce memory access similar to the corresponding multiplication functions, such that I hope that avoiding allocating the intermediate multiplication result will pay off)

I'm not an expert in @simd and these kind of enhancements, so I'm happy to learn whether one would expect improvements by using such features.

This is written with the option of block matrices in mind, and that should be tested obviously.~~Since we need zero of eltypes for type promotion, how would one do that? We don't have StaticArrays to our disposal.~~

If I'm not mistaken, then this would close #16573. ~~I don't see how one would leverage symmetry of A, or the fact that x === y.~~

stdlib/LinearAlgebra/src/generic.jl

chethega · 2019-07-31T09:49:26Z

I think we also want specializations for Transpose and Adjoint for better memory access patterns.

For example dot(x, A::Adjoint, y) = dot(y, A', x)' and something for dot(x, a::Transpose, y). I think we should write out the code instead of trying to be clever like above, especially since we probably need to write something for the transpose case anyway. I have not thought about possible ambiguity errors yet, so don't trust the above signature.

stdlib/LinearAlgebra/src/generic.jl

stevengj · 2019-07-31T12:50:58Z

I think we also want specializations for Transpose and Adjoint for better memory access patterns.

This wouldn't hurt, but I'm not clear on how useful it would be. For any dot(x, A, y) that defines a true inner product you should have A == A', so there's not much point in doing dot(x, A', y).

dkarrasch · 2019-07-31T13:11:07Z

@stevengj I had true inner products in mind when I had the square assumption earlier, but I guess we are implementing here the machinery for ternary multiplication in general, in which there is no square and no symmetry assumption?

stevengj · 2019-07-31T13:17:14Z

Sure, the dot(x, A, y) function should work for general A. I'm just not sure how much trouble we should take to optimize nonsymmetric cases.

stevengj · 2019-07-31T13:21:20Z

Probably should also have a specialized implementation for UniformScaling so that dot(x, I, y) calls dot(x,y)

stdlib/LinearAlgebra/src/generic.jl

thchr · 2019-07-31T15:18:07Z

Given the proposed function signature, this should probably use eachindex and axes rather than 1:n and size; otherwise, access will in fact not be inbounds if supplied with e.g. an OffsetArray, leading to arbitrary answers/segfaults.
For the general signature, the fix could be something like:

function dot(x::AbstractVector, A::AbstractMatrix, y::AbstractVector)
    (axes(x)..., axes(y)...) == axes(A) || throw(DimensionMismatch())

    T = promote_type(promote_type(eltype(x), eltype(A)), eltype(y))
    s = zero(T)
    @inbounds for j in eachindex(y)
        yj = y[j]
        temp = zero(T)
        @simd for i in eachindex(x)
            temp += adjoint(x[i]) * A[i,j]
        end
        s += temp * yj
    end
    return s
end

(The above also removes some of the early-termination calls; they don't really seem warranted to me)

Irregardless of the above, I noticed that this is not doing better than ordinary x'*A*y, time-wise, for large input. Specifically, if

x′ = rand(N); A′ = rand(N,M); y′ = rand(M)

Then, for N = M = 1000

@btime adjoint($x′)*$A′*$y′  # 193.201 μs (1 allocation: 7.94 KiB)
@btime dot($x′,$A′,$y′)      # 312.699 μs (0 allocations: 0 bytes)

Though it does better for (N,M) less than ~700. Still, there might be room for improvement.

More timings ...

For N = M = 2000:

@btime adjoint($x′)*$A′*$y′  # 1.340 ms (1 allocation: 15.75 KiB)
@btime dot($x′,$A′,$y′)      # 1.615 ms (0 allocations: 0 bytes)

For N = M = 4000

@btime adjoint($x′)*$A′*$y′  # 6.418 ms (2 allocations: 31.33 KiB)
@btime dot($x′,$A′,$y′)      # 7.083 ms (0 allocations: 0 bytes)

stdlib/LinearAlgebra/src/matmul.jl

dkarrasch · 2019-07-31T15:57:06Z

@thchr Thanks for your comments. You're right, the generic one will need careful optimization, because it competes with BLAS multiplications, which has some overhead due to the allocation of the intermediate result. For very large problems, as you show, this does not outweigh the performance advantage of BLAS multiplication. So, we will need to make sure we catch up with the pure Julia multiplication.

dkarrasch · 2019-07-31T16:41:36Z

I got started with some tests for the generic case. More tests will follow, and maybe the structured versions also need a little update. I was including the above comments into the generic version. This is how it looks like now:

function dot(x::AbstractVector, A::AbstractMatrix, y::AbstractVector)
    (axes(x)..., axes(y)...) == axes(A) || throw(DimensionMismatch())
    T = typeof(dot(first(x), first(A), first(y)))
    zeroxA = zero(adjoint(first(x))*first(A))
    s = zero(T)
    @inbounds for j in eachindex(y)
        yj = y[j]
        if !iszero(yj)
            temp = zeroxA
            @simd for i in eachindex(x)
                temp += adjoint(x[i]) * A[i,j]
            end
            s += temp * yj
        end
    end
    return s
end

There is a little issue with this. I wrote it to make it work for both number and matrix eltypes. In the block matrix case, the generic 3-arg dot allocates each time it adds something to temp. I thought I'll add a corresponding version where the eltypes of x, A, and y are Numbers, but this introduces ambiguity. So I could restrict to number types and simplify the type stuff, or need to find a way to distinguish these cases and have temp += adjoint(x[i]) * A[i,j] not allocate in the block matrix case.

…ifolds

dkarrasch · 2019-08-01T20:05:27Z

Tests are passing! I won't have much/any time the upcoming weeks (due to exams, holidays 🏖 etc), so I'd like to suggest to keep this PR at the current level (modulo minor improvements). The tests cover the number eltype cases and pass. I think I wrote the code such that it should work with block matrices/vectors. However, I'd like to keep testing (and potentially required adjustments/fixes) for an independent PR. Compared to my initial commits, I have also taken care of minimizing memory access whenever possible.

Regarding the fully dense version and the comparison to plain multiplication, I'm fine with putting x'A*y as a fallback for now, or a simple size-dependent distinction. The goal here was to introduce dot with the 3-arg signature, to direct *(::Adjoint,::Matrix,::Vector) to it, and to provide implementations for the sparse and structured cases, for which interest was expressed by different people. Optimizing the fully dense version may require thorough investigation and benchmarking, and seems to me like warrant a separate PR.

If there is no hurry, I'm also fine with developing the mentioned features in this PR further as time permits.

stevengj · 2019-08-03T14:21:05Z

I think if the user calls dot(x, A, y), then we should assume that they want an allocation-free algorithm even if it is slightly slower.

thchr · 2019-08-05T20:25:37Z

stdlib/LinearAlgebra/src/generic.jl

+function dot(x::AbstractVector, A::AbstractMatrix, y::AbstractVector)
+ (axes(x)..., axes(y)...) == axes(A) || throw(DimensionMismatch())
+ T = typeof(dot(first(x), first(A), first(y)))
+ zeroxA = zero(adjoint(first(x))*first(A))


Something like

T = promote_op(dot, eltype(x), eltype(A), eltype(y))

and similar for zeroxA might be better since that gets inferred (the current formulation isn't, at least on v1.1.1; appears to incur some array-size testing so far as I can tell from @code_warntype).

Yes, I know. These two lines are written such that the code can handle block matrices/vectors, so eltypes like Vector or Matrix. The issue is that calling one or zero on them fails, because the type does not contain any information on the size. And also, the same "first" approach is taken in the

function dot(x::AbstractArray, y::AbstractArray)

method a few lines above. Of course, this approach still breaks down when we have blocks of varying lengths, so this may not be a good idea either. To handle that, we would need to initialize temp in every loop separately. I'll push a suggestion for how to fix that.

dkarrasch · 2019-08-07T16:18:23Z

The one failing test "wasn't me". So this seems to be in rather good shape, I'd say. Is it realistic to include this into Julia v1.3? I think it would be nice to have that feature (if it is complete), and optimize it later if necessary.

I have consistently interpreted dot(x, A, y) as dot(A'x, y), which in most cases is nothing but dot(x, A*y) anyway, except for non-associative numbers. I felt that the interpretation as dot(A'x, y) really expresses what we're doing: (i) computing some dot, which acts recursively, in contrast to ternary *, (ii) we first multiply x and A, and only then with elements from y, (iii) but makes obvious what the implementation difference must be to dot(A'x, y), i.e., not storing A'x.

I have covered all special matrices (bidiag, tridiags, diagonal, uniformscaling, symmetric/hermitian, triangular), what is missing is (Upper)Hessenberg, if I haven't overlooked anything else. Tests cover the Number eltype cases, the generic version (but no other) is also tested for block matrices. The hermitian/symmetric case is slightly suboptimal in that we go over the diagonal twice too often, but I felt that writing out all 4 cases with :U, :L, x===y and x !== y would lead to "code bloat", as @stevengj would say.

dkarrasch · 2019-08-09T16:14:33Z

I'm close to pulling the plug for a while, so don't be surprised to see no response. UpperHessenberg now also has its own ternary dot method.

I'd be very happy to find the feature in Julia v1.3, of which feature freeze is in one week, or so?

StefanKarpinski · 2019-08-09T16:21:43Z

Correct: feature freeze is slated for Aug 15th.

antoine-levitt · 2019-08-10T09:33:53Z

I think if the user calls dot(x, A, y), then we should assume that they want an allocation-free algorithm even if it is slightly slower.

Not sure this is a correct assumption; If I see that there's a specialized implementation for this, I'm going to assume that it's faster than the standard ones, or people wouldn't have bothered writing a specialized implementation. Especially in the common case where x and y are vectors and A is a dense matrix, the allocation of a temporary vector doesn't matter much. Also BLAS might use threading and therefore be (potentially much) faster. At least if this method has the potential to be slower than dot(x,A*y) it should be documented as such.

stdlib/LinearAlgebra/src/generic.jl

stdlib/LinearAlgebra/src/uniformscaling.jl

dkarrasch · 2019-08-16T19:18:17Z

I have tried to improve/fix a couple of things quickly via the web interface; thanks @stevengj for the review. I have also created a NEWS entry. Unfortunately, there is a little merge conflict which I don’t know how to resolve via the web interface. If someone could rebase, that would be much appreciated.

This should be only relevant to cases like `dot(x, J, y)`, where `x` and `y` are vectors of quaternion vectors, and `J` is a quaternion `UniformScaling`.

StefanKarpinski · 2019-09-05T13:59:23Z

Man, who knew that was going to take so long? Thanks for sticking it out, @dkarrasch!

pkofod · 2019-09-05T14:39:12Z

This request originally came from @cortner's work to add preconditioners to Optim.jl! Can't believe that was in 2016.

dkarrasch · 2019-09-05T16:27:43Z

Well, thanks, and thanks also to the reviewers for their patience and care, especially @stevengj!

ararslan requested a review from andreasnoack July 30, 2019 17:32

ararslan added the domain:linear algebra Linear algebra label Jul 30, 2019

chethega reviewed Jul 30, 2019

View reviewed changes

stdlib/LinearAlgebra/src/generic.jl Outdated Show resolved Hide resolved

chethega reviewed Jul 31, 2019

View reviewed changes

stdlib/LinearAlgebra/src/generic.jl Outdated Show resolved Hide resolved

stevengj reviewed Jul 31, 2019

View reviewed changes

stdlib/LinearAlgebra/src/generic.jl Outdated Show resolved Hide resolved

thchr reviewed Jul 31, 2019

View reviewed changes

stdlib/LinearAlgebra/src/matmul.jl Outdated Show resolved Hide resolved

dkarrasch referenced this pull request in JuliaManifolds/Manifolds.jl Aug 1, 2019

flat!, sharp! and optimized inner of cotangent vectors for metric man…

ea6ef68

…ifolds

StefanKarpinski added the status:forget me not PRs that one wants to make sure aren't forgotten label Aug 2, 2019

StefanKarpinski requested a review from stevengj August 2, 2019 20:40

thchr reviewed Aug 5, 2019

View reviewed changes

dkarrasch changed the title ~~WIP: add generalized dot product~~ add generalized dot product Aug 8, 2019

dkarrasch mentioned this pull request Aug 9, 2019

Add dot method for symmetric matrices #32827

Merged

stevengj reviewed Aug 16, 2019

View reviewed changes

stdlib/LinearAlgebra/src/generic.jl Show resolved Hide resolved

stevengj reviewed Aug 16, 2019

View reviewed changes

stdlib/LinearAlgebra/src/uniformscaling.jl Outdated Show resolved Hide resolved

stevengj added the needs news A NEWS entry is required for this change label Aug 16, 2019

dkarrasch added 19 commits August 31, 2019 18:26

fix complex case in symmetric gendot

791be8b

interpret dot(x, A, y) as dot(A'x, y), test accordingly

5d6bbbb

use correct tolerance in triangular tests

c176bfc

add gendot for UpperHessenberg, and tests

9510dfd

fix docstring of 3-arg dot

a6bbb45

add generic 3-arg dot for UniformScaling

e4668ec

add generic fallback

0932238

This should be only relevant to cases like `dot(x, J, y)`, where `x` and `y` are vectors of quaternion vectors, and `J` is a quaternion `UniformScaling`.

add gendot with middle argument Number

6310fa6

merge NEWS

a66a6a1

attach docstring to generic fallback

fb97cc3

simplify scalar/uniform scaling gendot

c9112fc

use dot(A'x,y) for fallback

cffa4aa

use accessor functions in sparse code, generalize to Abstract..., tests

48bdbc1

revert fallback definition

ff377e6

add compat note and jldoctest

48874e2

remove redundant Number version

5568795

write out loops in symmetric/hermitian case

8051614

test quaternions in uniformscaling gendot

200732c

fix uniformscaling test

d304264

andreasnoack merged commit 2425ae7 into JuliaLang:master Sep 5, 2019

dkarrasch deleted the dk/gendot branch September 5, 2019 16:27

tkf mentioned this pull request Sep 17, 2019

Define private dot function to avoid type-piracy JuliaNLSolvers/Optim.jl#749

Closed

mcabbott mentioned this pull request Dec 4, 2019

Add dot(x,A,y) JuliaLang/Compat.jl#683

Merged

mtfishman mentioned this pull request Mar 31, 2020

Function name changes ITensor/ITensors.jl#245

Closed

simeonschaub removed the status:forget me not PRs that one wants to make sure aren't forgotten label May 29, 2021

JeffFessler mentioned this pull request Sep 19, 2021

Suboptimal dot product speed for last dimension SubArray slicing #42305

Closed

stevengj mentioned this pull request Aug 23, 2022

dot(x,A,x) fails for 1x1 SymTridiagonal #46461

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add generalized dot product #32739

add generalized dot product #32739

dkarrasch commented Jul 30, 2019 •

edited

Loading

chethega commented Jul 31, 2019 •

edited

Loading

stevengj commented Jul 31, 2019 •

edited

Loading

dkarrasch commented Jul 31, 2019

stevengj commented Jul 31, 2019

stevengj commented Jul 31, 2019

thchr commented Jul 31, 2019

dkarrasch commented Jul 31, 2019

dkarrasch commented Jul 31, 2019

dkarrasch commented Aug 1, 2019

stevengj commented Aug 3, 2019

thchr Aug 5, 2019 •

edited

Loading

dkarrasch Aug 6, 2019

dkarrasch commented Aug 7, 2019

dkarrasch commented Aug 9, 2019

StefanKarpinski commented Aug 9, 2019

antoine-levitt commented Aug 10, 2019

dkarrasch commented Aug 16, 2019

StefanKarpinski commented Sep 5, 2019

pkofod commented Sep 5, 2019

dkarrasch commented Sep 5, 2019

add generalized dot product #32739

add generalized dot product #32739

Conversation

dkarrasch commented Jul 30, 2019 • edited Loading

chethega commented Jul 31, 2019 • edited Loading

stevengj commented Jul 31, 2019 • edited Loading

dkarrasch commented Jul 31, 2019

stevengj commented Jul 31, 2019

stevengj commented Jul 31, 2019

thchr commented Jul 31, 2019

dkarrasch commented Jul 31, 2019

dkarrasch commented Jul 31, 2019

dkarrasch commented Aug 1, 2019

stevengj commented Aug 3, 2019

thchr Aug 5, 2019 • edited Loading

Choose a reason for hiding this comment

dkarrasch Aug 6, 2019

Choose a reason for hiding this comment

dkarrasch commented Aug 7, 2019

dkarrasch commented Aug 9, 2019

StefanKarpinski commented Aug 9, 2019

antoine-levitt commented Aug 10, 2019

dkarrasch commented Aug 16, 2019

StefanKarpinski commented Sep 5, 2019

pkofod commented Sep 5, 2019

dkarrasch commented Sep 5, 2019

dkarrasch commented Jul 30, 2019 •

edited

Loading

chethega commented Jul 31, 2019 •

edited

Loading

stevengj commented Jul 31, 2019 •

edited

Loading

thchr Aug 5, 2019 •

edited

Loading