
Why GGML_F16_STEP is 32? #526

Open
xshen053 opened this issue Sep 15, 2023 · 3 comments
@xshen053

It seems GGML_F16_STEP is an arbitrary value?
I feel it could just as well be 16 or 64.

@AngryLoki

AngryLoki commented Sep 21, 2023

GGML_F16_STEP is set to 32 on AVX CPUs because these CPUs essentially cannot perform half-float math operations (other than converting to F32) due to a lack of instructions. As a result, an unrolled loop over 32 half-floats expands into 32 floats = 1024 bits, which takes 4 ymm registers. And because most operations have 2 sources, the 4 destination + 8 source ymm registers fit within the total of 16 ymm registers. It may not be optimal for unary or in-place binary operations, but it is a decent approximation.

For other platforms GGML_F16_STEP is not always 32, as you can see in the code.

@xshen053
Author

Thanks for the reply. Actually, on ARM it is still 32; I am testing on ARM.

@ggerganov
Owner

The value is whatever was optimal during testing on my machine. There is probably a proper way to calculate the optimal value (as @AngryLoki explained), but in this case it was just trial and error.
