Optimize real matrix * complex matrix by using real matmul #44074
Currently, the real matrix is converted to complex, followed by a complex matmul. This is sub-optimal, since the imaginary part of the converted matrix is all zeros. We may instead split the product as
A * (Br + im * Bi) = (A*Br) + im*(A*Bi)
where we carry out two matrix multiplications, both of them real. This is what is implemented in this PR.

There are two versions that I could come up with, but I've currently gone with the allocating but fastest version. Open to changing this if the allocations seem excessive.
`mul2` already gets us some speed-up, which is considerable (albeit with some allocations) if the matrix is non-contiguous. `mul3` is significantly faster than both, so I've used it in this PR.
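
For illustration, the split described above can be sketched roughly as follows. This is a minimal sketch and not the PR's `mul2` or `mul3`: the name `split_mul` is hypothetical, and it assumes that allocating real copies of the real and imaginary parts of `B` is acceptable.

```julia
using LinearAlgebra

# Hypothetical helper for illustration only (not the PR's `mul2` or `mul3`).
# Computes A * B for real A and complex B using two real matmuls.
function split_mul(A::AbstractMatrix{<:Real}, B::AbstractMatrix{<:Complex})
    Br = real(B)   # allocates a real matrix holding the real parts of B
    Bi = imag(B)   # allocates a real matrix holding the imaginary parts of B
    # A*Br and A*Bi are both real products; recombine them elementwise.
    return complex.(A * Br, A * Bi)
end

A = [1.0 2.0; 3.0 4.0]
B = [1.0+2.0im 0.0-1.0im; 2.0+0.0im 1.0+1.0im]
C = split_mul(A, B)

# Compare against the generic product, which promotes A to complex.
@show C ≈ A * B
```

The two temporaries `Br` and `Bi`, plus the two real products, are where the extra allocations in an approach like this would come from. Whether that trade-off pays off depends on the matrix sizes and memory layout.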