Added RISC-V Vector Support for K-Quants and improved the existing intrinsics #3453

Tameem-10xE · 2023-10-03T12:11:53Z

Hi,

In #2929, we added RISC-V intrinsics for the dot product functions in GGML. This PR improves those existing dot product functions in ggml.c and adds new RISC-V vector intrinsics for the k_quants and row quantize (Q8_0 and Q8_1) functions. With this, llama.cpp fully supports running on a RISC-V vector processor with GGUF.

In the future, this will enable GGML and llama.cpp to run efficiently on RISC-V hardware with vector support, and it opens a way to compare performance against other vector architectures such as Intel AVX and Arm Neon.

Update: I got access to a RISC-V vector board with 8 cores and 4 GB RAM. The vectorized build is 6-7 times faster than the scalar version on the same board.

Running a llama.cpp AI model on RVV 1.0 vs RISC-V scalar

RISC-V vector intrinsics support is added for the following K_quants functions, for both QKK = 256 and QKK = 64 block sizes:

   ggml_vec_dot_q2_K_q8_K
   ggml_vec_dot_q3_K_q8_K
   ggml_vec_dot_q4_K_q8_K
   ggml_vec_dot_q5_K_q8_K
   ggml_vec_dot_q6_K_q8_K

RVV intrinsics are also added for the following Q8 row quantization functions:

    quantize_row_q8_0
    quantize_row_q8_1
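
For readers unfamiliar with what these functions compute, here is a minimal scalar sketch of Q8_0 row quantization. It is not the RVV code from this PR. The names block_q8_0_sketch and quantize_row_q8_0_sketch are hypothetical, the scale is kept as a float (ggml stores it as fp16), and rounding is done by hand to avoid a libm dependency.

```c
#include <stdint.h>

#define QK8_0 32  /* number of values per Q8_0 block */

/* Hypothetical simplified block: ggml stores the scale as fp16,
   a plain float is used here to keep the sketch self-contained. */
typedef struct {
    float  d;          /* per-block scale */
    int8_t qs[QK8_0];  /* quantized values in [-127, 127] */
} block_q8_0_sketch;

/* Round half away from zero, like roundf, without needing libm. */
static int round_away(float v) {
    return (int)(v >= 0.0f ? v + 0.5f : v - 0.5f);
}

/* Quantize k floats into k / QK8_0 blocks; k is assumed to be a multiple of QK8_0. */
static void quantize_row_q8_0_sketch(const float *x, block_q8_0_sketch *y, int k) {
    const int nb = k / QK8_0;
    for (int i = 0; i < nb; i++) {
        /* 1. find the largest magnitude in the block */
        float amax = 0.0f;
        for (int j = 0; j < QK8_0; j++) {
            float v = x[i * QK8_0 + j];
            if (v < 0.0f) v = -v;
            if (v > amax) amax = v;
        }
        /* 2. the scale maps amax to 127; guard against an all-zero block */
        const float d  = amax / 127.0f;
        const float id = (d != 0.0f) ? 1.0f / d : 0.0f;
        y[i].d = d;
        /* 3. scale and round every value */
        for (int j = 0; j < QK8_0; j++) {
            y[i].qs[j] = (int8_t) round_away(x[i * QK8_0 + j] * id);
        }
    }
}
```

The max search in step 1 and the scale-and-round loop in step 3 are the parts that map naturally onto vector instructions.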

The following dot product functions have also been optimized by using a fractional LMUL (1/2) instead of LMUL = 1. I am a little skeptical of this change: it works correctly, but I have noticed some decrease in inference accuracy, which I think could be a problem with my system or weights. I still prefer to keep it, since it uses far fewer vector registers after the product:

    ggml_vec_dot_q4_0_q8_0
    ggml_vec_dot_q4_1_q8_1
    ggml_vec_dot_q5_0_q8_0
    ggml_vec_dot_q5_1_q8_1
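
The register saving from fractional LMUL can be checked with a little arithmetic. The sketch below is plain C, not RVV intrinsics, and its helper names (lmul_t, vlmax, widened, regs_used) are hypothetical. It uses the RVV 1.0 relation VLMAX = (VLEN / SEW) * LMUL and the rule that a widening operation (for example int8 x int8 -> int16) produces its result at twice the LMUL of its sources.

```c
/* LMUL written as a fraction num/den, e.g. 1/2 -> {1, 2}. */
typedef struct { unsigned num, den; } lmul_t;

/* Maximum number of elements per vector register group:
   VLMAX = (VLEN / SEW) * LMUL. */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, lmul_t lmul) {
    return (vlen_bits / sew_bits) * lmul.num / lmul.den;
}

/* Register grouping of the result of a widening operation: 2 * LMUL. */
static lmul_t widened(lmul_t lmul) {
    if (lmul.den > 1) {
        lmul.den /= 2;   /* 1/2 -> 1 */
    } else {
        lmul.num *= 2;   /* 1 -> 2 */
    }
    return lmul;
}

/* Whole vector registers occupied by one group
   (a fractional group still occupies one register). */
static unsigned regs_used(lmul_t lmul) {
    return lmul.num >= lmul.den ? lmul.num / lmul.den : 1;
}
```

Assuming VLEN = 256 as in the QEMU command in this description, vlmax(256, 8, {1, 2}) is 16, which covers the 16 packed bytes of a Q4_0 block, and the widened int16 product then fits in a single register (regs_used(widened({1, 2})) is 1) instead of the two needed when starting from LMUL = 1. On hardware with a smaller VLEN, a fractional group holds fewer elements, so this sizing should be rechecked there.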

Finally, the vector initialization in Q5 from a temporary array is replaced by the vid_v intrinsic.
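
As a rough illustration of why an index vector is useful here: in Q5_0 the fifth bit of each of the 32 quants is packed into a 32-bit field (qh), and element i needs bit i of that field. The scalar model below is a sketch under that assumption, not the PR's code. vid_model stands in for what vid.v produces (0, 1, 2, ...), and both function names are hypothetical.

```c
#include <stdint.h>

/* Scalar model of vid.v: writes the element indices 0, 1, ..., vl-1. */
static void vid_model(uint32_t *idx, unsigned vl) {
    for (unsigned i = 0; i < vl; i++) {
        idx[i] = i;
    }
}

/* For each of the first vl elements (vl <= 32), pick bit idx[i] of qh
   and move it to bit position 4, where it completes a 5-bit quant. */
static void q5_high_bits(uint32_t qh, uint8_t *out, unsigned vl) {
    uint32_t idx[32];
    vid_model(idx, vl);  /* replaces a hand-written {0, 1, 2, ...} table */
    for (unsigned i = 0; i < vl; i++) {
        out[i] = (uint8_t)(((qh >> idx[i]) & 1u) << 4);
    }
}
```

Generating the indices in a register avoids keeping a constant table in memory and loading it for every block.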

[Compilation]
Ubuntu: 22.10
riscv-toolchain: 2023.07.05 riscv64 linux glibc

To compile for RISC-V, run:

$   make   llama-cli                   # For RISC-V CPU

$   make clean
$   make   RISCV_CROSS_COMPILE=1       # For Cross Compilation only

[Directly on RISC-V CPU]

$   ./llama-cli -m ./path/to/model.gguf -p "Anything" -n 50

[QEMU]

$   qemu-riscv64 -L /path/to/sysroot/  -cpu rv64,v=true,vlen=256,elen=64,vext_spec=v1.0 ./llama-cli -m ./path/to/model.gguf -p "Anything" -n 50

Note: Running under the QEMU emulator can be very slow and may take 2-5 minutes per token.

Any feedback is welcome. If you have suggestions or improvements, especially for the fractional LMUL change, please share.

Thanks!

grigohas · 2024-07-10T07:54:10Z

Hello, I am doing what you suggested and I have results. I have two questions. When I want to run it without the vector processor in QEMU, what command do I have to run? Also, how can I check that the two runs are different and that the one with the vector processor is working as I intended? Sorry, I am new to this.

Tameem-10xE · 2024-07-10T08:10:49Z

Hi, to run on the CPU (scalar), provide the path to the RISC-V toolchain and then use QEMU:

make llama-cli CC="riscv64-unknown-linux-gnu-gcc -march=rv64gc -mabi=lp64d" CXX="riscv64-unknown-linux-gnu-g++ -march=rv64gc -mabi=lp64d"

qemu-riscv64 -L /path/to/sysroot/  -cpu rv64 ./llama-cli -m ./path/to/model.gguf -p "Anything" -n 100

You can set the seed to get the same results, i.e. llama-cli -s (some_seed number) ...

More details: RVV article
Also, this PR is old and many things have changed, e.g. main -> llama-cli.

Thank you

grigohas · 2024-07-10T08:17:28Z

Yeah, I read this article, but when I run the make command you provided, I get a "march=native" error. From what I found in the Makefile, I have to pass RISCV_CROSS_COMPILE=1 RISCV=1.

Tameem-10xE · 2024-07-10T09:48:51Z

Sorry, yes. I just noticed that the Makefile has been reordered and RISCV=1 is required in the current version.

Tameem-10xE · 2024-07-10T10:03:37Z

After line 432 in the Makefile, replace the vector flags with the scalar ones, i.e.

MK_CFLAGS += -march=rv64gc -mabi=lp64d
MK_CXXFLAGS += -march=rv64gc -mabi=lp64d

and then build with the following before running QEMU:

make llama-cli RISCV=1 CC="riscv64-unknown-linux-gnu-gcc" CXX="riscv64-unknown-linux-gnu-g++"

grigohas · 2024-07-10T11:12:30Z

Okay, one last question. I use the same seed and I have results both with and without vector, but the only difference in the log output is the print time: with vector it is 2-2.5x longer than without. Is that correct?

Tameem-10xE · 2024-07-10T12:18:19Z

Yes, on QEMU the vector emulation is much slower. I do not know the actual reason; it could be that QEMU has to emulate the vector processor in addition to the scalar one, or it could be parallel processing issues. The log also uses real time for the comparison. This should not be the case on an actual RISC-V vector board.

grigohas · 2024-09-11T12:33:04Z

Hello again, I am running llama with the vector extension on gem5, but since there is nothing in the log to check whether the vector extension is enabled, how do I know?

Tameem-10xE · 2024-09-11T18:40:29Z

Hi, I've submitted a PR (#9442) which prints RISCV_VECT=1 on the terminal if the vector processor is found. I also slightly changed the Makefile so it no longer requires a flag for RISC-V vector boards; only RISCV_CROSS_COMPILE=1 is needed for the emulator (i.e. QEMU).

The following is the output from the RISC-V BPI-F3 board with vector support,
...

...

...

...
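
Independently of that PR, a build-time check is possible from C. The sketch below relies on the __riscv_vector macro, which RISC-V compilers predefine when the vector extension is enabled through -march. The function name built_with_rvv is hypothetical. Note that this only shows how the binary was compiled, not whether the simulator or board actually executes vector instructions.

```c
/* Returns 1 when this translation unit was compiled with the RISC-V
   vector extension enabled (e.g. -march=rv64gcv), 0 otherwise. */
static int built_with_rvv(void) {
#if defined(__riscv_vector)
    return 1;
#else
    return 0;
#endif
}
```

Printing the result of built_with_rvv() at startup gives a quick way to tell a scalar build from a vector build when both are run under the same simulator.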

grigohas · 2024-10-15T06:45:02Z

Hello, can I run llama-embedding on RISC-V with RVV too? I want to run a BERT LLM through llama.cpp, but I need torch for RISC-V and I cannot find a way to import it on a RISC-V platform.

Tameem-10xE added 2 commits October 2, 2023 13:46

Tameem-10xE force-pushed the llama-rvv branch from 388a59a to f6883a7 Compare October 3, 2023 12:17

ggerganov approved these changes Oct 3, 2023

ggerganov merged commit 79f34ab into ggerganov:master Oct 3, 2023
33 checks passed

Tameem-10xE deleted the llama-rvv branch October 10, 2023 09:03

Tameem-10xE mentioned this pull request Feb 8, 2024

[ERROR] Futex facility returned an unexpected error code riscv-software-src/riscv-isa-sim#1443

Closed

Tameem-10xE mentioned this pull request Mar 4, 2024

[GGML] Added RISC-V Vector Intrinsics Support #2929

Merged
