-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add some new ops, fix some operators and add batch operations to certain operators. #747
Conversation
@leejet I believe that before merging this, the respective Metal kernels should be created to maintain compatibility, although it doesn't really help much since sd.cpp doesn't benefit from the Metal backend as matrix operations are very slow. |
OK, I will find time to add support for Metal backend. |
@FSSRepo I have implemented Metal backend support for ggml_arange and ggml_timestep_embedding, and tested it with the test-arrange and test-timestep-embedding test cases to ensure it works correctly. Could you find some time to review and merge this PR? |
Ok, I will test this PR with latest sd.cpp |
|
These two operations won't significantly improve performance, but in stable video diffusion, it's necessary to perform these operations multiple times within the UNet network. Additionally, the parameters may change with each execution. If I don't execute these operations when computing the graph, I would have to remember the tensors produced by each operation and their corresponding parameters. Then, during the ggml_gallocr_alloc_graph, I'd have to copy the data, making the process more complex. Implementing this through ggml operators would make the code more elegant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have tests for these operators in test-backend-ops
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After adding tests to test-backend-ops
we can merge
src/ggml.c
Outdated
} | ||
} | ||
} | ||
float variance = sum2 / (ne00 * ne01 * step); | ||
float variance = sum2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the reason to change these normalization to be inside the loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because previously, it calculated the sum of (v * v) and then divided by (ne00 * ne01 * step), I was concerned that in certain situations, it might lead to overflow or precision issues. Therefore, I changed it to calculate the sum of (v * v)/(ne00 * ne01 * step) to obtain the variance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't cause problems since we accumulate into double
(i.e. ggml_float
). I've introduced a separate accumulator per row and restored the normalization to be at the end in order to avoid extra division in the loop
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Georgi Gerganov <[email protected]>
I have modified the code based on the suggestions from the review and added tests to test-backend-ops. There are two uncertain points still awaiting feedback from the maintainer. |
I tested it in the sd.cpp project, and everything works well.