Fix compile failure and further optimize by using vec_msum & vec_sum4s on ppc64le #849

Merged: 4 commits, Jun 16, 2024

penghongbo
Contributor

I recently found that ggml no longer compiles on ppc64le because of issues introduced by recent commits. This PR fixes the build and also adds some further optimizations for ppc64le. Please review. Thank you.
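For readers unfamiliar with the two intrinsics named in the title, here is a minimal portable sketch of their per-lane semantics as I understand them. It is a scalar model, not the code from this PR: the function names `model_msum_s8u8` and `model_sum4s_s8` are hypothetical, the byte-wide overloads are chosen only for illustration (which overloads the PR uses is not stated here), and the multiply-sum model assumes inputs small enough that the 32-bit accumulator does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of
 *   vec_msum(vector signed char, vector unsigned char, vector signed int):
 * each of the 4 int32 lanes adds the products of its 4 byte pairs
 * to the matching accumulator lane. Overflow is not modeled. */
static void model_msum_s8u8(const int8_t a[16], const uint8_t b[16],
                            const int32_t acc[4], int32_t out[4]) {
    assert(a && b && acc && out);
    for (int lane = 0; lane < 4; ++lane) {
        int32_t sum = acc[lane];
        for (int k = 0; k < 4; ++k) {
            sum += (int32_t)a[4 * lane + k] * (int32_t)b[4 * lane + k];
        }
        out[lane] = sum;
    }
}

/* Hypothetical scalar model of
 *   vec_sum4s(vector signed char, vector signed int):
 * each int32 lane is the sum of its 4 signed bytes plus the matching
 * lane of b, saturated to the int32 range. */
static void model_sum4s_s8(const int8_t a[16], const int32_t b[4],
                           int32_t out[4]) {
    assert(a && b && out);
    for (int lane = 0; lane < 4; ++lane) {
        int64_t sum = b[lane];
        for (int k = 0; k < 4; ++k) {
            sum += a[4 * lane + k];
        }
        if (sum > INT32_MAX) sum = INT32_MAX; /* saturate high */
        if (sum < INT32_MIN) sum = INT32_MIN; /* saturate low */
        out[lane] = (int32_t)sum;
    }
}
```

Both operations reduce four narrow elements into one 32-bit lane in a single step, which is the pattern a quantized dot product needs when accumulating products of 8-bit values.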

I confirm that ggml now compiles on ppc64le and that all tests pass.

The table below shows the speedup in vec_dot_q float32 throughput relative to the current master branch, measured with test-quantizer-perf -i 10000 on RHEL 9.2 (Power10 machine). The code was compiled with gcc 12.2.1. For most types the new code is about 10%-30% faster. The largest gain is for q4_K, at about 60%, while q5_0 is essentially unchanged.

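| Type | q4_0 | q4_1 | q5_0 | q5_1 | q8_0 | q2_K | q3_K | q4_K | q5_K | q6_K | iq3_xxs | iq4_nl | iq3_s | iq2_s | iq4_xs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Speedup | 1.09 | 1.17 | 0.99 | 1.15 | 1.28 | 1.24 | 1.24 | 1.63 | 1.32 | 1.09 | 1.16 | 1.07 | 1.09 | 1.16 | 1.27 |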
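As a quick way to relate the speedup ratios above to the percentages quoted in the text, the following sketch converts a ratio to a percentage gain and finds the best entry. The values are copied from the table; the helper names (`speedup_to_percent`, `best_index`, `count_in_range`) are hypothetical and not part of ggml.

```c
#include <assert.h>
#include <stddef.h>

/* Types and speedup ratios copied from the table above. */
static const char *const kTypes[15] = {
    "q4_0", "q4_1", "q5_0", "q5_1", "q8_0", "q2_K", "q3_K", "q4_K",
    "q5_K", "q6_K", "iq3_xxs", "iq4_nl", "iq3_s", "iq2_s", "iq4_xs"};
static const double kSpeedup[15] = {
    1.09, 1.17, 0.99, 1.15, 1.28, 1.24, 1.24, 1.63,
    1.32, 1.09, 1.16, 1.07, 1.09, 1.16, 1.27};

/* A speedup ratio r corresponds to a throughput change of (r - 1) * 100 percent. */
static double speedup_to_percent(double ratio) {
    return (ratio - 1.0) * 100.0;
}

/* Index of the largest ratio in v[0..n-1]; n must be non-zero. */
static size_t best_index(const double *v, size_t n) {
    assert(v && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; ++i) {
        if (v[i] > v[best]) best = i;
    }
    return best;
}

/* Number of ratios that fall inside the closed interval [lo, hi]. */
static size_t count_in_range(const double *v, size_t n, double lo, double hi) {
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (v[i] >= lo && v[i] <= hi) ++count;
    }
    return count;
}
```

For example, `kTypes[best_index(kSpeedup, 15)]` selects q4_K, whose ratio of 1.63 corresponds to roughly a 63% gain.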

ggerganov merged commit 83a2f3c into ggerganov:master on Jun 16, 2024
4 checks passed