ggml_vec_dot_f16 - efficient way to implement this #85

Open
igsoft opened this issue Apr 13, 2023 · 2 comments


igsoft commented Apr 13, 2023

Hi, not really an issue, more a curiosity. Instead of

sumf += sum[0] + sum[1] + sum[2] + sum[3] + sum[4] + sum[5] + sum[6] + sum[7];

did you try something like this on A64?

sumf = vaddvq_f32(vaddq_f32(vcvt_f32_f16(vget_low_f16(sum)), vcvt_high_f32_f16(sum)));
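
For reference, a minimal self-contained sketch of the two reductions being compared. It assumes `sum` is a float16x8_t holding eight FP16 partial sums; the function names and surrounding scaffolding are illustrative, not taken from ggml.

```c
#include <arm_neon.h>

/* Lane-by-lane reduction, as in the quoted scalar line: widen both
 * halves to float, spill to memory, and add the elements one by one. */
static float reduce_lanes(float16x8_t sum) {
    float tmp[8];
    vst1q_f32(tmp,     vcvt_f32_f16(vget_low_f16(sum)));
    vst1q_f32(tmp + 4, vcvt_high_f32_f16(sum));
    return tmp[0] + tmp[1] + tmp[2] + tmp[3]
         + tmp[4] + tmp[5] + tmp[6] + tmp[7];
}

/* A64 horizontal-add version from the comment above: widen both halves
 * to float32x4_t, add them pairwise, then reduce across lanes with
 * vaddvq_f32 (available on AArch64 only). */
static float reduce_neon(float16x8_t sum) {
    float32x4_t lo = vcvt_f32_f16(vget_low_f16(sum)); /* lanes 0..3 */
    float32x4_t hi = vcvt_high_f32_f16(sum);          /* lanes 4..7 */
    return vaddvq_f32(vaddq_f32(lo, hi));
}
```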

ggerganov (Owner) commented

I believe you are looking at some outdated version.
Can you send a link to the exact line?


igsoft commented Apr 13, 2023

You are right. I was looking at the gpt-j page, where you wrote "Still, I'm curious to know if there is a more efficient way to implement this", but I now see that you have already arrived at the same solution. I don't know whether the vcvt_high_f32_f16 instruction gives an advantage on this line.
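
On that last point, a hedged illustration: as far as I know, vcvt_high_f32_f16 lowers to a single FCVTL2 on the high half of the vector, while the explicit vget_high_f16 form may or may not be fused by the compiler into the same instruction, so any advantage depends on the toolchain. The function names below are illustrative.

```c
#include <arm_neon.h>

/* Two equivalent ways to widen the high four FP16 lanes of `sum`. */

/* Extract the high half, then convert; an FCVTL after an extract,
 * unless the compiler fuses the pair into FCVTL2. */
static float32x4_t widen_high_explicit(float16x8_t sum) {
    return vcvt_f32_f16(vget_high_f16(sum));
}

/* Convert the high half directly; maps to a single FCVTL2. */
static float32x4_t widen_high_direct(float16x8_t sum) {
    return vcvt_high_f32_f16(sum);
}
```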
