Make matmulsm mergeable (Fixes #1407) #1417

vincent-ehrmanntraut · 2024-06-11T07:26:52Z

As promised in #1407 I spent some time on making MATMULSM mergeable.
For most protocols, the new implementation will merge as much communication as possible while respecting the batch size.

In order for the allocator to be able to determine the memory addresses accessed by a MATMUSLM, I needed to add some temporary fields to the instruction that contain the used rows and columns. This is somewhat hacky, but it's the cleanest solution I found.
If those fields are not present, the allocator will fall back to the previous logic where the instructions is made to be dependent of all previous memory writes.

Further, the Hemi protocol's implementation will correctly calculate the matrix products, but will do so sequentially as that would require to somehow merge the communication of multiple matrix preps.

CLAassistant · 2024-06-11T07:26:59Z

All committers have signed the CLA.

vincent-ehrmanntraut · 2024-06-17T08:09:33Z

Is there anything preventing this pull request from being merged? If so, is there something I can do to facilitate the merge?

mkskeller · 2024-06-19T05:05:11Z

Thank you for submitting this. I can see several issues with the Python part:

Multithreading causes issues. For example, the following doesn't compile:

M = sint.Matrix(10, 10)
M.dot(M, n_threads=2)

Using the indices argument causes issues as well:

M = sint.Matrix(10, 10)
M.direct_mul(M, indices=[regint(0), regint.inc(10), regint.inc(10),
                         regint(0)])

The pull request stops working with Python 3.8. I would rather not bump the requirement unless there's a good reason to do so.

…hon 3.8

vincent-ehrmanntraut · 2024-06-24T12:01:05Z

You should find that the recent commit fixes the mentioned issues.

mkskeller · 2024-06-25T02:15:33Z

Thank you. Unfortunately, I've found another issue, namely that the following program doesn't compile in reasonable time:

sint.Matrix(1000, 1000) * sint.Matrix(1000, 1000)

Prior to the change, Scripts/compile-run.py mascot test -- -F takes less than 15 minutes to run.

This is probably related to the changes in allocator.py, which seem to go beyond the basic merging. What is the underlying thinking there? There is no complete dependency logic for memory-related instruction because that would be inefficient, which didn't create any issues with the non-mergeable matrix multiplication.

vincent-ehrmanntraut · 2024-06-25T07:20:47Z

I indeed added a complete dependency logic for the matrix multiplication, as the previous logic seemed to add dependencies to all previous writes, making merges impossible.

However, I noticed that the first approach was very naïve, leading to significant redundant computations. I've optimized it a little bit, and added a timeout such that building the dependencies is aborted after ten seconds (for each matrix multiplication) and the "order of memory instructions" warning is printed (if it was not printed before).
This timeout is reached when multiplying two 1700x1700 matrices on my machine, 1000x1000 matrices only required 3 seconds. I think that this should be reasonable for most cases.

mkskeller · 2024-06-28T01:15:33Z

Thank you for this. However, the timeout is problematic because it might produce different bytecode on different machines. The idea is that compiling the same high-level code proceeds exactly the same way every time so that different parties can compile it independently. Can you solve this in another way?

…reshold

vincent-ehrmanntraut · 2024-06-28T07:48:07Z

That is a good point, I changed the allocator such that the threshold depends on the size of the matrices, not the compilation time.

mkskeller · 2024-07-02T03:13:51Z

Thank you

Make MATMULSM mergeable

4d7b8a0

Fix multithreaded MATMULSM, direct_mul. Should also now work with Pyt…

a490c39

…hon 3.8

Optimize allocator, add timeout for building the dependency graph

aa2441c

Change MATMULSM allocator timeout from a time threshould to a size th…

c745269

…reshold

mkskeller merged commit 41999a3 into data61:master Jul 2, 2024
2 checks passed

vincent-ehrmanntraut deleted the mergeable-matmulsm-squashed branch July 2, 2024 06:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make matmulsm mergeable (Fixes #1407) #1417

Make matmulsm mergeable (Fixes #1407) #1417

vincent-ehrmanntraut commented Jun 11, 2024

CLAassistant commented Jun 11, 2024 •

edited

Loading

vincent-ehrmanntraut commented Jun 17, 2024

mkskeller commented Jun 19, 2024

vincent-ehrmanntraut commented Jun 24, 2024

mkskeller commented Jun 25, 2024 •

edited

Loading

vincent-ehrmanntraut commented Jun 25, 2024 •

edited

Loading

mkskeller commented Jun 28, 2024

vincent-ehrmanntraut commented Jun 28, 2024

mkskeller commented Jul 2, 2024

Make matmulsm mergeable (Fixes #1407) #1417

Make matmulsm mergeable (Fixes #1407) #1417

Conversation

vincent-ehrmanntraut commented Jun 11, 2024

CLAassistant commented Jun 11, 2024 • edited Loading

vincent-ehrmanntraut commented Jun 17, 2024

mkskeller commented Jun 19, 2024

vincent-ehrmanntraut commented Jun 24, 2024

mkskeller commented Jun 25, 2024 • edited Loading

vincent-ehrmanntraut commented Jun 25, 2024 • edited Loading

mkskeller commented Jun 28, 2024

vincent-ehrmanntraut commented Jun 28, 2024

mkskeller commented Jul 2, 2024

CLAassistant commented Jun 11, 2024 •

edited

Loading

mkskeller commented Jun 25, 2024 •

edited

Loading

vincent-ehrmanntraut commented Jun 25, 2024 •

edited

Loading