Excessively slow prompt processing time with 70B partially offloaded in SYCL #5272

Closed
Jacoby1218 opened this issue Feb 2, 2024 · 6 comments

Comments

@Jacoby1218

Prompt processing is extremely slow with a 70B model partially offloaded.
llama-bench.exe -ngl 20 -m "D:\models\lzlv_70b_fp16_hf.Q4_K_M.gguf"
Using device 0 (Intel(R) Arc(TM) A770 Graphics) as main device

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | --- | ---: |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | SYCL | 20 | pp 512 | 2.14 ± 0.28 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | SYCL | 20 | tg 128 | 1.03 ± 0.01 |

build: a28c5ef (2045)

@airMeng
Collaborator

airMeng commented Feb 2, 2024

Hi @Jacoby1218, could you provide some reference data to show the size of the gap? For example, performance on an RTX 4070 Ti (16 GB), or running entirely on the iGPU/CPU?

@Jacoby1218
Author

Jacoby1218 commented Feb 2, 2024

I don't have any other GPU to test, but I can provide results from my CPU and other backends.

| model | size | params | backend | threads | test | t/s |
| --- | ---: | ---: | --- | ---: | --- | ---: |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | BLAS | 6 | pp 512 | 1.93 ± 0.06 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | BLAS | 6 | tg 128 | 0.81 ± 0.02 |

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | --- | ---: |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | Vulkan | 20 | pp 512 | 7.02 ± 0.25 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | Vulkan | 20 | tg 128 | 0.97 ± 0.04 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | OpenCL | 20 | pp 512 | 8.81 ± 1.10 |
| llama 70B Q4_K - Medium | 38.58 GiB | 68.98 B | OpenCL | 20 | tg 128 | 0.82 ± 0.02 |
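
To put a number on the gap, here is a rough sketch that normalizes the mean t/s figures from this thread against the SYCL run. The values are copied from the llama-bench output above; the helper name `relative_to` is just illustrative, and the ± spread is ignored, so treat the ratios as approximate.

```python
# Mean throughput (tokens/s) copied from the llama-bench results in this thread.
# The ± spread reported by llama-bench is deliberately ignored here.
pp512 = {"SYCL": 2.14, "BLAS": 1.93, "Vulkan": 7.02, "OpenCL": 8.81}
tg128 = {"SYCL": 1.03, "BLAS": 0.81, "Vulkan": 0.97, "OpenCL": 0.82}


def relative_to(baseline, results):
    """Return each backend's throughput as a multiple of the baseline backend."""
    base = results[baseline]
    return {name: round(value / base, 2) for name, value in results.items()}


pp_ratio = relative_to("SYCL", pp512)
tg_ratio = relative_to("SYCL", tg128)

for name in pp512:
    print(f"{name:7s} pp512 x{pp_ratio[name]:<5} tg128 x{tg_ratio[name]}")
```

On these numbers, prompt processing on Vulkan and OpenCL comes out at roughly 3.3x and 4.1x the SYCL figure, while token generation stays between about 0.8x and 1.0x of SYCL on every backend.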

@airMeng
Collaborator

airMeng commented Feb 2, 2024

I think this may be due to a lack of optimization for multi-batch processing. It has been recorded in #5277, please stay tuned!

Contributor

This issue is stale because it has been open for 30 days with no activity.

@github-actions github-actions bot added the stale label Mar 18, 2024
@airMeng
Collaborator

airMeng commented Mar 24, 2024

I think this has been improved with #6217, please give it a try.

@github-actions github-actions bot removed the stale label Mar 25, 2024
@github-actions github-actions bot added the stale label Apr 24, 2024
Contributor

github-actions bot commented May 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed May 9, 2024