
Structured matrix multiplication #814

Merged · 28 commits · Sep 24, 2020

Conversation

mateuszbaran (Collaborator)

In this PR I'm expanding support for multiplication of structured, statically sized matrices, which should improve the usability of StaticArrays. It partially fixes #790. Together with the allocation changes in Julia 1.5, this makes some generic code in Manifolds.jl non-allocating.
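As a rough sketch (not part of the PR text), this is the kind of product the change targets; the exact set of supported wrapper types is defined by the PR, not by this snippet:

```julia
using StaticArrays, LinearAlgebra, BenchmarkTools

A = UpperTriangular(SMatrix{3,3}(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0))
B = SMatrix{3,3}(9.0, 8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0)

# A product mixing a LinearAlgebra wrapper with a plain static matrix;
# with this PR it should stay statically sized, and on Julia 1.5+ it
# should not allocate for small sizes like this one.
C = A * B
@btime $A * $B
```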

@mateuszbaran (Collaborator, Author)

It should be possible to switch matrix multiplication methods from triangular.jl to this approach. What do you think?

@mateuszbaran linked an issue on Aug 12, 2020 that may be closed by this pull request
@mschauer (Collaborator)

Ref: I still have https://github.com/mschauer/StaticLinearMaps.jl/blob/master/src/StaticLinearMaps.jl lying around, just to demonstrate that sometimes you want to think of static matrices as static linear maps and leave the storage/details to the implementation

@mateuszbaran (Collaborator, Author)

> Ref: I still have https://github.com/mschauer/StaticLinearMaps.jl/blob/master/src/StaticLinearMaps.jl lying around, just to demonstrate that sometimes you want to think of static matrices as static linear maps and leave the storage/details to the implementation

That's interesting, but this PR is not that ambitious 🙂.

@mateuszbaran (Collaborator, Author)

A few notes:

  1. Triangular matrix multiplication on master is always fully unrolled.
  2. The approach I'm using here for code generation only works with full unrolling as far as I can tell, so it will only be used for small matrices (a sketch of the idea follows this list).
  3. For larger matrices I'm going to retain the old behavior: chunked or looped multiplication for non-wrapped matrices, and falling back to LinearAlgebra for larger ones. Do we want to convert back to a static matrix after going through LinearAlgebra?
  4. In-place multiplication needs a bit of work.
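To make note 2 concrete, here is a hedged sketch of full unrolling via a generated function; `unrolled_mul` is a hypothetical name, and the PR's actual kernels differ in structure and naming:

```julia
using StaticArrays

# Hypothetical sketch: fully unrolled multiplication via code generation.
# For statically known sizes M, K, N the generated body is straight-line
# code with no loops, which is why this only pays off for small matrices.
@generated function unrolled_mul(a::SMatrix{M,K}, b::SMatrix{K,N}) where {M,K,N}
    # One explicit sum-of-products expression per output element,
    # emitted in column-major order to match the SMatrix constructor.
    exprs = [:(+($([:(a[$i, $k] * b[$k, $j]) for k in 1:K]...)))
             for i in 1:M, j in 1:N]
    return quote
        @inbounds SMatrix{M, N}(tuple($(exprs...)))
    end
end

unrolled_mul(SMatrix{2,2}(1.0, 2.0, 3.0, 4.0), SMatrix{2,2}(1.0, 0.0, 0.0, 1.0))
```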

@mateuszbaran (Collaborator, Author)

I've made a few updates and here are some benchmarks: https://gist.github.com/mateuszbaran/ff63993fb4f66a9089dcf324139de6ac

This excludes many cases where master just throws an error. There are a few small regressions that I will investigate, but overall it is a huge improvement.

@mateuszbaran (Collaborator, Author)

These performance regressions are most likely random noise; I've checked the generated code for one of the worst regressions and it's the same in this PR as on master.

@mateuszbaran marked this pull request as ready for review on August 19, 2020.
@mateuszbaran (Collaborator, Author)

Here are some new benchmarks after incorporating changes from #818 (thanks @chriselrod!): https://gist.github.com/mateuszbaran/ff63993fb4f66a9089dcf324139de6ac

I've also tried looking for better heuristics for method selection, but there are no obviously better choices as far as I can tell: https://gist.github.com/mateuszbaran/e62ba317690b25270b2c8bc1ef307d6b
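For context on what "heuristics for method selection" means here, the choice is roughly size-based; the cutoffs and the name `mul_kernel_choice` below are illustrative guesses, not values from this PR:

```julia
# Illustrative sketch of size-based kernel selection; the thresholds
# are placeholders, not the PR's actual tuning.
function mul_kernel_choice(M::Int, K::Int, N::Int)
    if M * K * N <= 8^3
        return :unrolled       # small: fully unrolled generated code
    elseif max(M, K, N) <= 14
        return :chunked        # medium: chunked/looped kernel
    else
        return :blas_fallback  # large: go through LinearAlgebra
    end
end
```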

@c42f (Member)

c42f commented Sep 16, 2020

> Do we want to convert back to a static matrix after going through LinearAlgebra?

Yes, I think the rule should be that we preserve the static array type where possible. I think it makes composite array operations more predictable overall, even though there's a conversion cost.
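A minimal sketch of that rule, under the assumption of a hypothetical helper (`_mul_blas` is not the PR's actual function name):

```julia
using StaticArrays, LinearAlgebra

# Hypothetical sketch: past the static-kernel cutoff, multiply through
# plain Arrays (the BLAS path) and convert back, so the caller still
# gets a statically sized result at the cost of one extra copy.
function _mul_blas(a::StaticMatrix{M,K}, b::StaticMatrix{K,N}) where {M,K,N}
    c = Matrix(a) * Matrix(b)
    return SMatrix{M, N}(c)
end
```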

@c42f (Member) left a comment

This seems like an epic contribution! And it deletes almost as much code as it adds, which is always great to see. I'm very glad to see the triangular multiplication special cases gone.

To check, does this incorporate all of #818?

I think at this point you and @chriselrod are the experts on this, so I think you should merge it when you're both ready.

Unless, that is, you want to direct my attention to any particular part of the implementation? I had a quick look through at a high level.

TSize(s::Number) = TSize(Size(s))
istranpose(::TSize{<:Any,T}) where T = T
istranspose(::TSize{<:Any,T}) where T = (T === :transpose)
@c42f (Member)

Haha 👍

@mateuszbaran (Collaborator, Author)

> This seems like an epic contribution! And it deletes almost as much code as it adds, which is always great to see. I'm very glad to see the triangular multiplication special cases gone.

Thanks! I would like to go through those triangular multiplication special cases again because the old versions are sometimes faster and I'm not yet sure why.

> To check, does this incorporate all of #818?

Not all of it, just the muladd part. On the other hand, the reordering part has a smaller impact (#818 (comment)), and I don't know how reordering would impact performance for types other than Float64.
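For reference, "the muladd part" means accumulating inner products with fused multiply-adds instead of separate `*` and `+`; a toy version of the pattern (names illustrative, not from #818):

```julia
# Toy illustration of muladd-style accumulation: each step can compile
# to a fused multiply-add instruction, with one rounding per step.
function dot_muladd(a::NTuple{N}, b::NTuple{N}) where {N}
    acc = a[1] * b[1]
    for k in 2:N
        acc = muladd(a[k], b[k], acc)
    end
    return acc
end

dot_muladd((1.0, 2.0, 3.0), (4.0, 5.0, 6.0))  # == 32.0
```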

> I think at this point you and @chriselrod are the experts on this, so I think you should merge it when you're both ready.

> Unless, that is, you want to direct my attention to any particular part of the implementation? I had a quick look through at a high level.

I would like to go through these changes once more before it's merged, but there aren't any particular parts that would need your attention.

@mateuszbaran (Collaborator, Author)

mateuszbaran commented Sep 23, 2020

Performance regressions are very few, and I don't see a consistent pattern in them; they look more or less random and are not that severe (up to about 50% slower on Skylake). I will merge this tomorrow if no one objects.

Successfully merging this pull request may close these issues: Uncallable method of _mul