Fix compile failure and further optimize by using vec_msum & vec_sum4s on ppc64le #849
Recently I found that ggml no longer compiles on ppc64le because of issues introduced by recent commits. This PR fixes the build and also adds some further optimizations for ppc64le. Please review. Thank you.
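The optimizations named in the title rely on two AltiVec/VSX intrinsics. As a rough guide to what they compute, here is a scalar model of the element-wise semantics of `vec_msum(vector signed char, vector unsigned char, vector signed int)` (multiply-sum mixed byte, modulo) and `vec_sum4s(vector signed short, vector signed int)` (sum across partial, saturating). This is a sketch for illustration only, not the code in this PR: the function names are hypothetical, lanes are indexed in logical element order (ignoring register endianness), and the dot-product helper only shows how a 16-byte int8 × uint8 dot product can be reduced with one multiply-sum plus a horizontal add.

```c
#include <stdint.h>

/* Hypothetical scalar model of vec_msum(signed char a, unsigned char b, signed int c):
   each 32-bit lane i gets c[i] plus the four byte products a[4i+j] * b[4i+j].
   The hardware wraps modulo 2^32; the final int64 -> int32 cast wraps the same way on GCC. */
static void msum_s8u8_model(const int8_t a[16], const uint8_t b[16],
                            const int32_t c[4], int32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        int64_t acc = c[i];
        for (int j = 0; j < 4; j++)
            acc += (int64_t)a[4 * i + j] * b[4 * i + j];
        out[i] = (int32_t)acc;
    }
}

/* Hypothetical scalar model of vec_sum4s(signed short a, signed int b):
   each 32-bit lane i gets b[i] + a[2i] + a[2i+1], saturated to the int32 range. */
static void sum4s_s16_model(const int16_t a[8], const int32_t b[4], int32_t out[4]) {
    for (int i = 0; i < 4; i++) {
        int64_t acc = (int64_t)b[i] + a[2 * i] + a[2 * i + 1];
        if (acc > INT32_MAX) acc = INT32_MAX;
        if (acc < INT32_MIN) acc = INT32_MIN;
        out[i] = (int32_t)acc;
    }
}

/* Hypothetical helper: 16-element int8 x uint8 dot product using one
   multiply-sum into a zero accumulator, then a horizontal add of the 4 lanes. */
static int32_t dot16_s8u8_model(const int8_t a[16], const uint8_t b[16]) {
    const int32_t zero[4] = {0, 0, 0, 0};
    int32_t lanes[4];
    msum_s8u8_model(a, b, zero, lanes);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
}
```

The point of the multiply-sum form is that a single instruction replaces separate widen, multiply and add steps: sixteen byte products are accumulated into four 32-bit lanes at once, leaving only one horizontal reduction at the end of a block.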
I confirm that it now compiles on ppc64le, and all tests pass.
Below is the speedup in vec_dot_q float32 throughput compared with the current master branch on RHEL 9.2 (Power10 machine), measured with `test-quantizer-perf -i 10000`. The code was compiled with gcc 12.2.1. Typically the new code gives about a 10% to 30% improvement; the largest gain is for q4_K, at about 60%.