
Add Nv/AMD sycl target build cmd #5357

Closed · wants to merge 4 commits

Conversation

abhilash1910
Collaborator

CMake modification for Nv/AMD SYCL builds.
@NeoZhangJianyu @ggerganov @airMeng @AidanBeltonS @Alcpz

@abhilash1910
Collaborator Author

abhilash1910 commented Feb 6, 2024

SYCL runtime performance across the Nvidia and AMD vendors is based on the sycl.cpp codebase. Features added to the SYCL code base will be tested as-is on the other vendors. Priority-wise, we optimize for Intel GPUs first, then evaluate the codebase's performance on Nvidia and AMD under the SYCL runtime and provide improvements.

@NeoZhangJianyu
Collaborator

NeoZhangJianyu commented Feb 6, 2024

@abhilash1910

  1. Please make sure the tests pass on NV & AMD GPUs.
  2. Update README-sycl.md with guidance on how to install the related software, build, and run.
  3. Add the GPU models you verified to the supported list in README-sycl.md.
  4. Run the CI via ci/run.sh on NV & AMD GPUs.

@Alcpz

Alcpz commented Feb 6, 2024

@abhilash1910 @AidanBeltonS has been trying to run the SYCL version on Nvidia GPUs, but the tests are still not passing.
Another issue is that AMD builds require manually specifying --offload-arch, as there is currently no default value for it. We will be aiding with the review shortly.
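Since there is no default --offload-arch for AMD, a build script has to splice the user's architecture into the compiler flags. A minimal sketch (the gfx90a value is only an example and must match your GPU):

```shell
# Sketch: assemble the AMD SYCL offload flags from a user-supplied arch.
# There is no default --offload-arch, so the user must provide one.
OFFLOAD_ARCH="${OFFLOAD_ARCH:-gfx90a}"   # example arch; replace with your GPU's
AMD_SYCL_FLAGS="-fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=${OFFLOAD_ARCH}"
echo "$AMD_SYCL_FLAGS"
```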

@0cc4m
Collaborator

0cc4m commented Feb 6, 2024

I tried running it on Nvidia and AMD on Linux, but ran into some issues. It takes very long to compile and still only reports the Intel/CPU devices in the system, then dies.

» build_sycl/bin/main -t 16 -f ~/llama.cpp/input.txt -b 512 -c 2048 -n 128 --ignore-eos -m ~/koboldcpp/models/airoboros-m-7b-3.1.2.Q4_K_S.gguf -ngl 1000
Log start
main: build = 2080 (bea82a05)
main: built with Intel(R) oneAPI DPC++/C++ Compiler 2024.0.0 (2024.0.0.20231017) for x86_64-unknown-linux-gnu
main: seed  = 1707249047
GGML_SYCL_DEBUG=0
ggml_init_sycl: GGML_SYCL_F16:   no
ggml_init_sycl: SYCL_USE_XMX: yes
found 4 SYCL devices:
  Device 0: Intel(R) Arc(TM) A770 Graphics,     compute capability 1.3,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
  Device 1: Intel(R) FPGA Emulation Device,     compute capability 1.2,
        max compute_units 32,   max work group size 67108864,   max sub group size 64,  global mem size 134931963904
  Device 2: AMD EPYC 7302 16-Core Processor                ,    compute capability 3.0,
        max compute_units 32,   max work group size 8192,       max sub group size 64,  global mem size 134931963904
  Device 3: Intel(R) Arc(TM) A770 Graphics,     compute capability 3.0,
        max compute_units 512,  max work group size 1024,       max sub group size 32,  global mem size 16225243136
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device
[...]
llama_new_context_with_model: n_ctx      = 2048
llama_new_context_with_model: freq_base  = 10000,0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:            KV buffer size =   256,00 MiB
llama_new_context_with_model: KV self size  =  256,00 MiB, K (f16):  128,00 MiB, V (f16):  128,00 MiB
llama_new_context_with_model:        CPU input buffer size   =    12,01 MiB
llama_new_context_with_model:            compute buffer size =   171,60 MiB
llama_new_context_with_model:        CPU compute buffer size =     8,80 MiB
llama_new_context_with_model: graph splits (measure): 3
Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)Exception caught at file:/home/user/upstream-llama.cpp/ggml-sycl.cpp, line:12706

@abhilash1910 abhilash1910 marked this pull request as draft February 7, 2024 03:07
@AidanBeltonS
Collaborator

> I tried running it on Nvidia and AMD on Linux, but ran into some issues. It takes very long to compile and still only reports the Intel/CPU devices in the system, then dies.
>
> Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)Exception caught at file:/home/user/upstream-llama.cpp/ggml-sycl.cpp, line:12706

When you build for non-SPIR-V targets (i.e. Nvidia and AMD), you must pass the device triple to the compiler: -fsycl-targets=nvptx64-nvidia-cuda for Nvidia, and -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a for AMD. Please note, you must replace gfx90a with your AMD GPU's architecture. Then you will no longer get an invalid-binary error.

However, as pointed out before NVidia and AMD are not yet passing all tests so you should not expect it to run properly just yet.
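For reference, those flags would be passed at configure time roughly like this. This is only a sketch: -DLLAMA_SYCL=ON and the icx/icpx compilers are assumed from the existing SYCL build instructions, and gfx90a is an example architecture.

```shell
# Nvidia: sketch of a SYCL build configured for the CUDA PTX target
cmake -B build -DLLAMA_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
      -DCMAKE_CXX_FLAGS="-fsycl-targets=nvptx64-nvidia-cuda"

# AMD: --offload-arch has no default and must match your GPU (gfx90a is an example)
cmake -B build -DLLAMA_SYCL=ON \
      -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx \
      -DCMAKE_CXX_FLAGS="-fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a"

cmake --build build --config Release
```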

@NeoZhangJianyu
Collaborator

> I tried running it on Nvidia and AMD on Linux, but ran into some issues. It takes very long to compile and still only reports the Intel/CPU devices in the system, then dies.
>
> Native API failed. Native API returns: -42 (PI_ERROR_INVALID_BINARY) -42 (PI_ERROR_INVALID_BINARY)Exception caught at file:/home/user/upstream-llama.cpp/ggml-sycl.cpp, line:12706

> When you build for non spirv targets (i.e. NVidia and AMD) you must pass the device triple to the compiler i.e. -fsycl-targets=nvptx64-nvidia-cuda and -fsycl-targets=amdgcn-amd-amdhsa -Xsycl-target-backend=amdgcn-amd-amdhsa --offload-arch=gfx90a for Nvidia and AMD respectively. Please note, you must replace gfx90a with your AMD GPU's architecture. Then you will no longer get an invalid binary error.
>
> However, as pointed out before NVidia and AMD are not yet passing all tests so you should not expect it to run properly just yet.

@abhilash1910 @AidanBeltonS
Is it possible to document these compile settings in CMakeLists.txt or README-sycl.md, so that this issue comes up less often?

I suggest adding an "AOT" subsection to the "Build" chapter of README-sycl.md.

@AidanBeltonS
Collaborator

> @abhilash1910 @AidanBeltonS Is it possible to update such compile setting info in the CMakeFile.txt or README-sycl.md? so that reduce same issue.
>
> I suggest adding a sub chapter "AOT" in chapter "build" in README-sycl.md.

Yes, I think we should update the CMake and README to properly support this. However, I do not propose making this change until the CUDA and HIP backends pass their tests, which is currently not the case.
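One way such CMake support could look is sketched below. The cache-variable names (LLAMA_SYCL_TARGET, LLAMA_SYCL_DEVICE_ARCH) are hypothetical, not the merged design; the flag values are taken from the comments above.

```cmake
# Hypothetical cache variables; names are illustrative only
set(LLAMA_SYCL_TARGET "INTEL" CACHE STRING "SYCL target: INTEL, NVIDIA, or AMD")
set(LLAMA_SYCL_DEVICE_ARCH "" CACHE STRING "Device arch for AOT, e.g. gfx90a (required for AMD)")

if (LLAMA_SYCL_TARGET STREQUAL "NVIDIA")
    add_compile_options(-fsycl-targets=nvptx64-nvidia-cuda)
    add_link_options(-fsycl-targets=nvptx64-nvidia-cuda)
elseif (LLAMA_SYCL_TARGET STREQUAL "AMD")
    # There is no default --offload-arch, so fail early with a clear message
    if (NOT LLAMA_SYCL_DEVICE_ARCH)
        message(FATAL_ERROR "LLAMA_SYCL_DEVICE_ARCH must be set for AMD builds")
    endif()
    add_compile_options(-fsycl-targets=amdgcn-amd-amdhsa
                        -Xsycl-target-backend=amdgcn-amd-amdhsa
                        --offload-arch=${LLAMA_SYCL_DEVICE_ARCH})
    add_link_options(-fsycl-targets=amdgcn-amd-amdhsa
                     -Xsycl-target-backend=amdgcn-amd-amdhsa
                     --offload-arch=${LLAMA_SYCL_DEVICE_ARCH})
endif()
```

Failing the configure step when the AMD arch is missing would surface the PI_ERROR_INVALID_BINARY issue at build time instead of at runtime.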

@abhilash1910
Collaborator Author

Addressed in #5738. Closing.
