Fix compile failure and further optimize by using vec_msum & vec_sum4s on ppc64le #849

penghongbo · 2024-06-04T05:13:32Z

ggml recently stopped compiling on ppc64le because of issues introduced by a recent commit. This PR fixes the build and adds some further optimizations for ppc64le. Please review. Thank you.

I confirm that ggml now compiles on ppc64le. All tests passed.

The table below shows the speedup in vec_dot_q float32 throughput relative to the current master branch, measured with test-quantizer-perf -i 10000 on RHEL 9.2 (Power10 machine). The code was compiled with gcc-12.2.1. The new code typically gives about a 10%-30% improvement. The best case is q4_K, at about 60%, while q5_0 is essentially unchanged (0.99).

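| Type | q4_0 | q4_1 | q5_0 | q5_1 | q8_0 | q2_K | q3_K | q4_K | q5_K | q6_K | iq3_xxs | iq4_nl | iq3_s | iq2_s | iq4_xs |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| speedup | 1.09 | 1.17 | 0.99 | 1.15 | 1.28 | 1.24 | 1.24 | 1.63 | 1.32 | 1.09 | 1.16 | 1.07 | 1.09 | 1.16 | 1.27 |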
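
For readers unfamiliar with the two intrinsics named in the title, here is a rough scalar model of what they compute, as I understand the AltiVec/VSX semantics. It is not the code from this PR: the function names (`model_msum_s16`, `model_sum4s_s8`, `model_dot_s16`) are made up for illustration, and only one operand-type variant of each intrinsic is modeled (16-bit inputs for vec_msum, signed 8-bit inputs for vec_sum4s). The sketch is plain C so it runs on any machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of vec_msum on signed 16-bit lanes:
 * each of the 4 int32 output lanes receives its accumulator plus the
 * two products of the int16 pairs that fall in that 32-bit word.
 * The arithmetic is modulo 2^32 (no saturation). */
static void model_msum_s16(const int16_t a[8], const int16_t b[8],
                           const int32_t acc[4], int32_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        /* unsigned math models wrap-around without signed-overflow UB */
        uint32_t sum = (uint32_t)acc[i];
        sum += (uint32_t)((int32_t)a[2 * i] * (int32_t)b[2 * i]);
        sum += (uint32_t)((int32_t)a[2 * i + 1] * (int32_t)b[2 * i + 1]);
        out[i] = (int32_t)sum;
    }
}

/* Hypothetical scalar model of vec_sum4s on signed 8-bit lanes:
 * each int32 output lane is its accumulator plus the sum of the four
 * int8 values in that 32-bit word, saturated to the int32 range. */
static void model_sum4s_s8(const int8_t a[16], const int32_t acc[4],
                           int32_t out[4]) {
    for (int i = 0; i < 4; ++i) {
        int64_t sum = acc[i];
        for (int j = 0; j < 4; ++j) {
            sum += a[4 * i + j];
        }
        if (sum > INT32_MAX) sum = INT32_MAX;
        if (sum < INT32_MIN) sum = INT32_MIN;
        out[i] = (int32_t)sum;
    }
}

/* Integer dot product built on the multiply-sum model: multiply and
 * accumulate happen in one step per 8-element chunk, and the 4 partial
 * sums are reduced once at the end. */
static int32_t model_dot_s16(const int16_t *a, const int16_t *b, int n) {
    assert(n % 8 == 0); /* this sketch handles whole 8-lane chunks only */
    int32_t acc[4] = {0, 0, 0, 0};
    for (int k = 0; k < n; k += 8) {
        model_msum_s16(a + k, b + k, acc, acc);
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

The point of the model is that a fused multiply-sum folds the multiply, the pairwise add, and the accumulate into one operation per chunk. That is the kind of instruction-count saving this PR is after, though the exact sequences used in the real kernels differ per quantization type.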

penghongbo added 4 commits June 3, 2024 22:23

fix compile issues introduced by loongarch_asx

81ab568

restore quant changes to merge

f6e002a

fix compile issues introduced by loongarch_asx

0ba346e

further optimize by using vec_msum & vec_sum4s on ppc64le

9672484

ggerganov approved these changes Jun 16, 2024

ggerganov merged commit 83a2f3c into ggerganov:master Jun 16, 2024
4 checks passed