Support multiple GPUs (split mode) on SYCL backend #5806

NeoZhangJianyu · 2024-03-01T07:52:58Z

Support multiple GPUs (split mode) on the SYCL backend.
Split mode: [none, layer] are supported; [row] is not supported yet and is still under development.
Unify the GPU settings with the cuBLAS backend:

Support setting the main GPU with --main-gpu.
Support detecting all Level Zero GPUs that share the highest Max compute units value.
Remove the use of GGML_SYCL_DEVICE to set the main GPU.

The device list is shown in this format:

found 6 SYCL devices:
|ID| Name                                        |compute capability|Max compute units|Max work group|Max sub group|Global mem size|
|--|---------------------------------------------|------------------|-----------------|--------------|-------------|---------------|
| 0|            Intel(R) Data Center GPU Flex 170|               1.3|              512|          1024|           32|    16225243136|
| 1|               Intel(R) FPGA Emulation Device|               1.2|               64|      67108864|           64|   540713414656|
| 2|     Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz|               3.0|               64|          8192|           64|   540713414656|
| 3|            Intel(R) Data Center GPU Flex 170|               3.0|              512|          1024|           32|    16225243136|
| 4|            Intel(R) Data Center GPU Flex 170|               3.0|              512|          1024|           32|    16225243136|
| 5|            Intel(R) Data Center GPU Flex 170|               1.3|              512|          1024|           32|    16225243136|
detect 2 SYCL GPUs: [0,5] with Max compute units:512
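
The selection rule described above (keep only the Level Zero GPUs that share the highest Max compute units value) can be sketched roughly as follows. This is a minimal illustration, not the backend's actual code: the `DeviceInfo` struct, its `is_level_zero_gpu` flag, and `select_top_gpus` are hypothetical names, and which devices count as Level Zero GPUs is inferred here from the `detect 2 SYCL GPUs: [0,5]` line rather than stated in the table.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical record for one row of the device table (not the backend's real type).
struct DeviceInfo {
    int  id;                 // device ID as printed in the list
    bool is_level_zero_gpu;  // assumed flag: a GPU exposed through Level Zero
    int  max_compute_units;  // "Max compute units" column
};

// Return the IDs of the Level Zero GPUs whose compute-unit count equals
// the largest count found among Level Zero GPUs.
std::vector<int> select_top_gpus(const std::vector<DeviceInfo> & devices) {
    int top = 0;
    for (const auto & d : devices) {
        if (d.is_level_zero_gpu) {
            top = std::max(top, d.max_compute_units);
        }
    }
    std::vector<int> ids;
    for (const auto & d : devices) {
        if (d.is_level_zero_gpu && d.max_compute_units == top) {
            ids.push_back(d.id);
        }
    }
    return ids;
}
```

With the six devices listed above, and assuming only IDs 0 and 5 are flagged as Level Zero GPUs, this returns the IDs 0 and 5, which matches the last line of the listing.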

Add support for OPs:

hardsigmoid
hardswish
pool2d

Use the device index to set/get internal GPU data, the same as the cuBLAS backend.
Use the device ID to set/get GPU device info.
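
To make the index/ID distinction concrete, here is a small sketch under the assumption that the selected GPUs are stored as a dense list, so the index is the position in that list and the ID is the number printed in the device table. `GpuList`, `id_of`, and `index_of` are hypothetical names for illustration and are not taken from the PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: maps a dense device index (0..n-1) to a SYCL device ID and back.
struct GpuList {
    std::vector<int> ids;  // selected device IDs, e.g. {0, 5}

    // index -> ID; throws std::out_of_range for an invalid index
    int id_of(int index) const { return ids.at(index); }

    // ID -> index; returns -1 when the ID is not one of the selected GPUs
    int index_of(int id) const {
        for (std::size_t i = 0; i < ids.size(); ++i) {
            if (ids[i] == id) {
                return static_cast<int>(i);
            }
        }
        return -1;
    }
};
```

For a list holding IDs 0 and 5, index 1 refers to device ID 5, so per-device internal data would be stored at slot 1 while device info queries would use ID 5.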

NeoZhangJianyu · 2024-03-01T08:01:05Z

@airMeng, @luoyu-intel, @abhilash1910, could you help review this PR?

Thank you!

slaren · 2024-03-01T12:50:37Z

examples/llama-bench/llama-bench.cpp

 int device_list[GGML_SYCL_MAX_DEVICES];
 ggml_sycl_get_gpu_list(device_list, GGML_SYCL_MAX_DEVICES);


I think this can be removed now, device_list does not seem to be used anymore.

yes, rm it.

slaren · 2024-03-01T12:52:03Z

examples/sycl/run-llama2.sh

+#ZES_ENABLE_SYSMAN=1, Support to get free memory of GPU by sycl::aspect::ext_intel_free_memory. Recommended to use when --split-mode = layer.
+
+#use all GPUs with same max compute units
+ZES_ENABLE_SYSMAN=1 ./build/bin/main -m models/llama-2-7b.Q4_0.gguf -p "${INPUT2}" -n 400 -e -ngl 33 -s 0 -mg $GGML_SYCL_DEVICE


Note that -mg is ignored with -sm layer, which is the default, so passing it here does nothing.

yes, rm -mg
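
The point about -mg can be illustrated with a rough sketch of how the two supported split modes might choose devices. It assumes that `none` runs on the main GPU only and that `layer` spreads work over every detected GPU, which is why the main GPU value has no effect there. `SplitMode` and `devices_in_use` are hypothetical names, not llama.cpp APIs.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the split modes this PR supports on SYCL (row is left out).
enum class SplitMode { None, Layer };

// Return the device indices that take part in inference.
std::vector<int> devices_in_use(SplitMode mode, int main_gpu, int n_gpus) {
    if (mode == SplitMode::None) {
        // Only here does main_gpu matter, so only here is it validated.
        if (main_gpu < 0 || main_gpu >= n_gpus) {
            throw std::out_of_range("main_gpu out of range");
        }
        return {main_gpu};
    }
    // Layer mode: main_gpu is ignored and all GPUs are used.
    std::vector<int> all;
    for (int i = 0; i < n_gpus; ++i) {
        all.push_back(i);
    }
    return all;
}
```

In this sketch, calling `devices_in_use(SplitMode::Layer, ...)` gives the same result for any `main_gpu` value, which mirrors the review comment.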

ggerganov · 2024-03-01T13:34:17Z

common/common.cpp

 } else if (arg_next == "row") {
+#ifdef GGML_USE_SYCL
+ fprintf(stderr, "warning: The split mode value:[row] is not supported by llama.cpp with SYCL. It's developing.\nExit!\n");
+ exit(1);
+#endif // GGML_USE_SYCL
 params.split_mode = LLAMA_SPLIT_MODE_ROW;


Suggested change

 } else if (arg_next == "row") {
#ifdef GGML_USE_SYCL
 fprintf(stderr, "warning: The split mode value:[row] is not supported by llama.cpp with SYCL. It's developing.\nExit!\n");
 exit(1);
#endif // GGML_USE_SYCL
 params.split_mode = LLAMA_SPLIT_MODE_ROW;

yes, accept it.
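
For readers who want to see the check in isolation, the following is a minimal sketch of the same idea written as a testable function that reports an error instead of calling exit(1). It is not the code in common/common.cpp: `split_mode_t`, `parse_split_mode`, and the runtime `sycl_build` flag (standing in for the `#ifdef GGML_USE_SYCL` guard) are assumptions made for illustration.

```cpp
#include <string>

// Local stand-in for the split-mode values; the real enum is defined by llama.cpp.
enum split_mode_t { SPLIT_NONE, SPLIT_LAYER, SPLIT_ROW };

// Parse the value given to --split-mode.
// Returns false and fills `err` for an unknown value, or when `row` is
// requested while `sycl_build` is true.
bool parse_split_mode(const std::string & arg, bool sycl_build,
                      split_mode_t & out, std::string & err) {
    if (arg == "none")  { out = SPLIT_NONE;  return true; }
    if (arg == "layer") { out = SPLIT_LAYER; return true; }
    if (arg == "row") {
        if (sycl_build) {
            err = "split mode [row] is not supported with SYCL";
            return false;
        }
        out = SPLIT_ROW;
        return true;
    }
    err = "unknown split mode: " + arg;
    return false;
}
```

Returning an error to the caller rather than exiting inside the parser is a design choice of this sketch only; the PR itself prints a warning and exits.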

ggerganov · 2024-03-01T13:35:38Z

llama.cpp

+#if (defined(GGML_USE_CUBLAS) || defined(GGML_USE_SYCL))
+#define GGML_USE_CUBLAS_SYCL
+#endif
+


GGML_USE_CUBLAS_SYCL appears to be unused

yes, rm it.

* suport multiple cards: split-mode - layer|row
* rm warning
* rebase with master, support tow new OPs, close feature for -sm=row, fix for unit test
* update news
* fix merge error
* update according to review comments

NeoZhangJianyu added 5 commits February 28, 2024 19:34

suport multiple cards: split-mode - layer|row

f87da8e

rm warning

33563a8

rebase with master, support tow new OPs, close feature for -sm=row, fix for unit test

4c29df3

update news

47a572d

Merge branch 'master' into mulcards

6b01068

NeoZhangJianyu requested review from ggerganov and slaren March 1, 2024 07:59

fix merge error

5db8896

slaren reviewed Mar 1, 2024

slaren approved these changes Mar 1, 2024

ggerganov approved these changes Mar 1, 2024

update according to review comments

e4cc412

NeoZhangJianyu merged commit 7156413 into ggerganov:master Mar 2, 2024
60 checks passed

NeoZhangJianyu mentioned this pull request Mar 5, 2024

SYCL backend support Multi-card #5282

Closed

5 tasks

AidanBeltonS mentioned this pull request Mar 12, 2024

SYCL NVidia build failing #6026

Closed
