-
Notifications
You must be signed in to change notification settings - Fork 930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ggml : add broadcast support for BLAS ggml_mul_mat() #460
Labels
Comments
ggerganov
added a commit
that referenced
this issue
Aug 21, 2023
ggerganov
added a commit
that referenced
this issue
Aug 21, 2023
…t BLAS mul_mat (#466) * Improve ADD_REL_POS perf in SAM by doing it inplace - Add unit tests for the ADD_REL_POS operation - I am not sure if this is valid implementation as we reuse the src0 memory in order to avoid copying it - When running SAM with the "Example output" command, image, point and 16 threads, this reduces the cumulative time of the ADD_REL_POS operation from 1000-1100 ms to 180-200ms - There is further room for optimization in the access patterns used in the implementation of the opration * Add non-inplace version for the GGML_OP_ADD_REL_POS * Fix map_unary warnings and refactor LayerNorm2d + remove ggml_cont in it * Fix Mac printf format warnings * sam : add ggml_graph_print() comment * ggml : add broadcast support for BLAS ggml_mul_mat() (#460) * Remove not needed build_forward_expand from add-rel-pos unit test --------- Co-authored-by: Georgi Gerganov <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
For F16 and F32 CPU matrix multiplications BLAS will likely remain necessary, especially on Apple Silicon where we have Accelerate CBLAS using the AMX coprocessor. So we need to resolve the following TODOs:
ggml/src/ggml.c
Lines 10796 to 10799 in 08c57df
ggml/src/ggml.c
Lines 10782 to 10785 in 08c57df
This will fix SAM inference on Apple Silicon: #459
The broadcast logic is implemented in the same function for the non-BLAS branch:
ggml/src/ggml.c
Lines 10905 to 10909 in 08c57df
Should be simple enough to adapt, but need to make sure it works correctly. Probably the best way to test it is to use the SAM inference example, which works without BLAS but asserts with BLAS.
The text was updated successfully, but these errors were encountered: