Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make matmulsm mergeable (Fixes #1407) #1417

Merged

Conversation

vincent-ehrmanntraut
Copy link

As promised in #1407 I spent some time on making MATMULSM mergeable.
For most protocols, the new implementation will merge as much communication as possible while respecting the batch size.

In order for the allocator to be able to determine the memory addresses accessed by a MATMUSLM, I needed to add some temporary fields to the instruction that contain the used rows and columns. This is somewhat hacky, but it's the cleanest solution I found.
If those fields are not present, the allocator will fall back to the previous logic where the instructions is made to be dependent of all previous memory writes.

Further, the Hemi protocol's implementation will correctly calculate the matrix products, but will do so sequentially as that would require to somehow merge the communication of multiple matrix preps.

@CLAassistant
Copy link

CLAassistant commented Jun 11, 2024

CLA assistant check
All committers have signed the CLA.

@vincent-ehrmanntraut
Copy link
Author

Is there anything preventing this pull request from being merged? If so, is there something I can do to facilitate the merge?

@mkskeller
Copy link
Member

Thank you for submitting this. I can see several issues with the Python part:

  1. Multithreading causes issues. For example, the following doesn't compile:
M = sint.Matrix(10, 10)
M.dot(M, n_threads=2)
  1. Using the indices argument causes issues as well:
M = sint.Matrix(10, 10)
M.direct_mul(M, indices=[regint(0), regint.inc(10), regint.inc(10),
                         regint(0)])
  1. The pull request stops working with Python 3.8. I would rather not bump the requirement unless there's a good reason to do so.

@vincent-ehrmanntraut
Copy link
Author

You should find that the recent commit fixes the mentioned issues.

@mkskeller
Copy link
Member

mkskeller commented Jun 25, 2024

Thank you. Unfortunately, I've found another issue, namely that the following program doesn't compile in reasonable time:

sint.Matrix(1000, 1000) * sint.Matrix(1000, 1000)

Prior to the change, Scripts/compile-run.py mascot test -- -F takes less than 15 minutes to run.

This is probably related to the changes in allocator.py, which seem to go beyond the basic merging. What is the underlying thinking there? There is no complete dependency logic for memory-related instruction because that would be inefficient, which didn't create any issues with the non-mergeable matrix multiplication.

@vincent-ehrmanntraut
Copy link
Author

vincent-ehrmanntraut commented Jun 25, 2024

I indeed added a complete dependency logic for the matrix multiplication, as the previous logic seemed to add dependencies to all previous writes, making merges impossible.

However, I noticed that the first approach was very naïve, leading to significant redundant computations. I've optimized it a little bit, and added a timeout such that building the dependencies is aborted after ten seconds (for each matrix multiplication) and the "order of memory instructions" warning is printed (if it was not printed before).
This timeout is reached when multiplying two 1700x1700 matrices on my machine, 1000x1000 matrices only required 3 seconds. I think that this should be reasonable for most cases.

@mkskeller
Copy link
Member

Thank you for this. However, the timeout is problematic because it might produce different bytecode on different machines. The idea is that compiling the same high-level code proceeds exactly the same way every time so that different parties can compile it independently. Can you solve this in another way?

@vincent-ehrmanntraut
Copy link
Author

That is a good point, I changed the allocator such that the threshold depends on the size of the matrices, not the compilation time.

@mkskeller mkskeller merged commit 41999a3 into data61:master Jul 2, 2024
2 checks passed
@mkskeller
Copy link
Member

Thank you

@vincent-ehrmanntraut vincent-ehrmanntraut deleted the mergeable-matmulsm-squashed branch July 2, 2024 06:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants