Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ggml : add broadcast support for BLAS ggml_mul_mat() #460

Closed
ggerganov opened this issue Aug 19, 2023 · 0 comments · Fixed by #466
Closed

ggml : add broadcast support for BLAS ggml_mul_mat() #460

ggerganov opened this issue Aug 19, 2023 · 0 comments · Fixed by #466
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@ggerganov
Copy link
Owner

ggerganov commented Aug 19, 2023

For F16 and F32 CPU matrix multiplications BLAS will likely remain necessary, especially on Apple Silicon where we have Accelerate CBLAS using the AMX coprocessor. So we need to resolve the following TODOs:

ggml/src/ggml.c

Lines 10796 to 10799 in 08c57df

// TODO: handle case when src0 is broadcast-able into src1 across 2nd,3rd dimension
// ref: https://github.com/ggerganov/ggml/pull/224
GGML_ASSERT(ne02 == ne12);
GGML_ASSERT(ne03 == ne13);

ggml/src/ggml.c

Lines 10782 to 10785 in 08c57df

// TODO: handle case when src0 is broadcast-able into src1 across 2nd,3rd dimension
// ref: https://github.com/ggerganov/ggml/pull/224
GGML_ASSERT(ne02 == ne12);
GGML_ASSERT(ne03 == ne13);

This will fix SAM inference on Apple Silicon: #459

The broadcast logic is implemented in the same function for the non-BLAS branch:

ggml/src/ggml.c

Lines 10905 to 10909 in 08c57df

// broadcast factors
const int64_t r2 = ne12/ne02;
const int64_t r3 = ne13/ne03;

Should be simple enough to adapt, but need to make sure it works correctly. Probably the best way to test it is to use the SAM inference example, which works without BLAS but asserts with BLAS.

@ggerganov ggerganov added enhancement New feature or request good first issue Good for newcomers labels Aug 19, 2023
ggerganov added a commit that referenced this issue Aug 21, 2023
…t BLAS mul_mat (#466)

* Improve ADD_REL_POS perf in SAM by doing it inplace

- Add unit tests for the ADD_REL_POS operation
- I am not sure if this is valid implementation as we reuse the src0
  memory in order to avoid copying it
- When running SAM with the "Example output" command, image, point and
  16 threads, this reduces the cumulative time of the ADD_REL_POS operation
  from 1000-1100 ms to 180-200ms
- There is further room for optimization in the access patterns used in
  the implementation of the opration

* Add non-inplace version for the GGML_OP_ADD_REL_POS

* Fix map_unary warnings and refactor LayerNorm2d + remove ggml_cont in it

* Fix Mac printf format warnings

* sam : add ggml_graph_print() comment

* ggml : add broadcast support for BLAS ggml_mul_mat() (#460)

* Remove not needed build_forward_expand from add-rel-pos unit test

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

1 participant