Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve ADD_REL_POS perf in SAM by doing it inplace #466

Merged
merged 7 commits into from
Aug 21, 2023

Conversation

YavorGIvanov
Copy link
Collaborator

@YavorGIvanov YavorGIvanov commented Aug 20, 2023

  • Add unit test for the ADD_REL_POS operation
  • I am not sure if this is valid implementation as we reuse the src0 memory in order to avoid copying it. Alternatively we can do CPY in the model implementation or add_rel_pos with just the rel_w and rel_h tensors and then ggml_add between the result and KQ_scaled
  • When running SAM with the "Example output" command, image, point and 16 threads, this reduces the cumulative time of the ADD_REL_POS operation from 1000-1100 ms to 180-200ms (this is on AMD Ryzen Threadripper 2990WX 32-Core Processor)
  • There is further room for optimization in the access patterns used in the implementation of the operation
  • Resolve ggml : add broadcast support for BLAS ggml_mul_mat() #460

YavorGIvanov added 3 commits August 20, 2023 18:34
- Add unit tests for the ADD_REL_POS operation
- I am not sure if this is valid implementation as we reuse the src0
  memory in order to avoid copying it
- When running SAM with the "Example output" command, image, point and
  16 threads, this reduces the cumulative time of the ADD_REL_POS operation
  from 1000-1100 ms to 180-200ms
- There is further room for optimization in the access patterns used in
  the implementation of the opration
@YavorGIvanov
Copy link
Collaborator Author

Added non-inplace version also for the ADD_REL_POS operation and fix deprecated operation warnings (for using map_unary in SAM). Additionally refactoerd LayerNorm2D in separate function and removed 2 ggml_cont from it. I will remove more ggml_conts and ggml_repeats in a separate PR.

@ggerganov ggerganov merged commit 170388d into master Aug 21, 2023
2 checks passed
@ggerganov ggerganov deleted the improve-add-rel-pos-perf branch August 21, 2023 12:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ggml : add broadcast support for BLAS ggml_mul_mat()
2 participants