ggml_vec_dot_f16 - efficient way to implement this #85
Hi,

Not really an issue, more a curiosity: instead of

sumf += sum[0] + sum[1] + sum[2] + sum[3] + sum[4] + sum[5] + sum[6] + sum[7];

did you try, for A64, something similar to this?

sumf = vaddvq_f32(vaddq_f32(vcvt_f32_f16(vget_low_f16(sum)), vcvt_high_f32_f16(sum)));
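For context, here is a minimal sketch of how such a reduction could sit inside an f16 dot product on AArch64 with ARMv8.2-A FP16 intrinsics. The function name `dot_f16`, the loop shape, and the compile flags are assumptions for illustration, not the actual `ggml_vec_dot_f16`:

```c
// Minimal sketch, assuming an ARMv8.2-A target with FP16 arithmetic
// (compile with e.g. -march=armv8.2-a+fp16). dot_f16 is a hypothetical
// stand-in for the ggml loop; n is assumed to be a multiple of 8.
#include <arm_neon.h>

float dot_f16(const __fp16 *x, const __fp16 *y, int n) {
    float16x8_t acc = vdupq_n_f16(0.0f);
    for (int i = 0; i < n; i += 8) {
        // Fused multiply-add of 8 half-precision lanes per iteration.
        acc = vfmaq_f16(acc, vld1q_f16(x + i), vld1q_f16(y + i));
    }
    // Widen each half of the f16x8 accumulator to f32, add the two
    // halves, then reduce the float32x4_t with one horizontal add.
    return vaddvq_f32(vaddq_f32(vcvt_f32_f16(vget_low_f16(acc)),
                                vcvt_high_f32_f16(acc)));
}
```

The `vaddvq_f32` path replaces the eight scalar adds (and the store/reload of the accumulator they imply) with two widening conversions, one vector add, and one horizontal add, which is typically fewer instructions on A64.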
Comments

I believe you are looking at some outdated version.

You are right. I was just checking the gpt-j homepage, where you wrote "Still, I'm curious to know if there is a more efficient way to implement this", but I now see that you actually found the same solution. I don't know if the instruction …