[SYCL] Fixed minor bug when enabling FP16 for non intel targets #6464

OuadiElfarouki · 2024-04-03T16:56:22Z

Similar to gemm_batch and gemm_batch_impl, this moves the #ifdef __INTEL_MKL__ guard from gemm_impl to its wrapper gemm, limiting the guard to the unsupported types only. This avoids the misleading runtime error(*) raised when GGML_SYCL_F16 is enabled on non-Intel device targets.

(*) The error can be reproduced on the master branch by building for an NVIDIA target with GGML_SYCL_F16 enabled and running a prompt-processing benchmark (-n 0) with batch > 1. Error message: "The oneAPI Math Kernel Library (oneMKL) Interfaces Project does not support this API".
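
For readers unfamiliar with the pattern, here is a minimal sketch of the guard placement described above. It is not the code from ggml-sycl.cpp: the names `dtype`, `half_t`, `bfloat16_t`, `gemm_impl_sketch` and `gemm_sketch` are hypothetical, and treating bf16 as the only Intel-specific case is an assumption made purely for illustration.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins; none of these names come from ggml-sycl.cpp.
enum class dtype { f16, f32, bf16 };
struct half_t     { std::uint16_t bits; };
struct bfloat16_t { std::uint16_t bits; };

// The implementation carries no vendor guard, so every instantiation the
// wrapper dispatches to is usable on any backend.
template <typename T>
std::string gemm_impl_sketch(const char *name) {
    return std::string("gemm<") + name + "> executed";
}

// The wrapper dispatches on the element type. Only the case assumed to be
// unsupported outside Intel MKL sits behind the guard; f16 stays reachable.
std::string gemm_sketch(dtype t) {
    switch (t) {
    case dtype::f32:
        return gemm_impl_sketch<float>("f32");
    case dtype::f16:
        return gemm_impl_sketch<half_t>("f16");
#ifdef __INTEL_MKL__
    case dtype::bf16:  // assumption: an Intel-only case, for illustration
        return gemm_impl_sketch<bfloat16_t>("bf16");
#endif
    default:
        throw std::runtime_error("gemm: unsupported type combination");
    }
}
```

With the guard in the wrapper, a build without `__INTEL_MKL__` still reaches the f16 path, and only the guarded case falls through to the explicit error.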

@NeoZhangJianyu @airMeng @AidanBeltonS @abhilash1910

Co-authored-by: AidanBeltonS <[email protected]>

github-actions · 2024-04-03T17:09:34Z

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3: 533 iterations 🚀

Concurrent users: 8, duration: 10m
HTTP request : avg=8757.71ms p(90)=24565.15ms fails=0, finish reason: stop=533 truncated=0
Prompt processing (pp): avg=235.86tk/s p(90)=697.6tk/s total=206.18tk/s
Token generation (tg): avg=98.76tk/s p(90)=281.61tk/s total=132.2tk/s
ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=sycl_fix_non_intel_fp16 commit=a7c67582140a72935054019a25beb35df34addfb

Time series

[Charts omitted: four xychart plots titled "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 533 iterations", each plotted over timestamps 1712163546 --> 1712164168]
llamacpp:prompt_tokens_seconds
llamacpp:predicted_tokens_seconds
llamacpp:kv_cache_usage_ratio
llamacpp:requests_processing

abhilash1910

LGTM!
pinging @NeoZhangJianyu @ggerganov for a look when available

…ganov#6464) * moved INTEL_MKL guard from gemm_impl to gemm (wrapper) * Update ggml-sycl.cpp Co-authored-by: AidanBeltonS <[email protected]> --------- Co-authored-by: AidanBeltonS <[email protected]>

OuadiElfarouki and others added 5 commits March 28, 2024 15:43

moved INTEL_MKL guard from gemm_impl to gemm (wrapper)

4070423

Merge branch 'master' into sycl_fix_non_intel_fp16

f746e70

Update ggml-sycl.cpp

84ef62e

Co-authored-by: AidanBeltonS <[email protected]>

Merge branch 'master' into sycl_fix_non_intel_fp16

c0c4b30

Merge branch 'master' into sycl_fix_non_intel_fp16

a7c6758

AidanBeltonS approved these changes Apr 4, 2024

abhilash1910 approved these changes Apr 4, 2024

abhilash1910 requested a review from ggerganov April 5, 2024 03:10

abhilash1910 merged commit 1b496a7 into ggerganov:master Apr 5, 2024
58 checks passed
