
Why GGML_F16_STEP is 32? #526

Open
xshen053 opened this issue Sep 15, 2023 · 3 comments
@xshen053

It seems GGML_F16_STEP is an arbitrary value?
I feel it could just as well be 16 or 64.

@AngryLoki

AngryLoki commented Sep 21, 2023

GGML_F16_STEP is set to 32 on AVX CPUs because these CPUs essentially cannot perform half-float math operations (other than converting to F32) due to a lack of instructions. As a result, an unrolled loop over 32 half-floats expands into 32 floats = 1024 bits, which takes 4 ymm registers. And because most operations have 2 sources, the 4 destination + 8 source ymm registers fit within the total of 16 ymm registers. It may not be optimal for unary or in-place binary operations, but it is a decent approximation.

For other platforms GGML_F16_STEP is not always 32, as you can see in the code.

@xshen053
Author

Thanks for the reply. Actually, on ARM it is still 32; I am testing on ARM.

@ggerganov
Owner

The value is whatever was optimal during testing on my machine. There is probably a proper way to calculate the optimal value (as @AngryLoki explained), but in this case it was just trial and error.
