
ggml: add hw_accel in data structure #2054

Closed
wants to merge 2 commits into from

Conversation

@zhouwg (Contributor) commented Apr 15, 2024

This PR serves multiple purposes:
(1) attempt to fix issue ggerganov/ggml#795

(2) borrow an advantage from PyTorch (the user can specify whether a GGML op, such as mulmat, is accelerated by a specific backend)

(3) prepare for submitting Qualcomm's QNN backend upstream to GGML, from "PoC: Add Qualcomm mobile SoC native backend for GGML, zhouwg/kantv#121": whisper.cpp first, then llama.cpp, because llama.cpp is much more complicated than whisper.cpp

Please refer to this commit: zhouwg/kantv@ce44da6

or

please refer to this commit: https://github.com/zhouwg/kantv/blob/kantv-poc-with-qnn/core/ggml/llamacpp/ggml.c#L16137

This is a workaround (it breaks the OO principle inside the original GGML) for this TODO in the original ggml: https://github.com/ggerganov/ggml/blob/master/src/ggml-backend.c#L1127

I personally think this member is not redundant (it is NOT the same as the existing "backend" field in "struct ggml_tensor"), and it will NOT bring side effects to the existing code. Of course, I understand we should not bring too much "useful code" into the existing GGML internals, and we should keep GGML as compact and clean as possible.
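As an illustration of the idea only (not the actual patch in this PR), here is a minimal sketch of what such an extra member could look like next to the existing "backend" field; the struct and field names (use_hwaccel, hwaccel_device) are hypothetical:

    #include <stdbool.h>

    // Minimal sketch only; the struct and field names are illustrative, not the real ggml API.
    struct ggml_tensor_sketch {
        int  backend;         // stands in for the existing "backend" field of struct ggml_tensor
        bool use_hwaccel;     // hypothetical: whether ops on this tensor should be hardware-accelerated
        int  hwaccel_device;  // hypothetical: which accelerator/device to target (e.g. a QNN device)
    };

The point is simply that such a per-tensor acceleration hint is orthogonal to where the tensor's data currently lives, which is what the existing "backend" field expresses.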


Updated on 04-17-2024: this change is not essential; there is a better method (workaround) for this TODO in the original ggml: https://github.com/ggerganov/ggml/blob/master/src/ggml-backend.c#L1127.

In fact, "gpu_device" in struct whisper_context_params is semantically the same as use_hwaccel: a special value of "gpu_device" could be interpreted as "no hardware acceleration", falling back to the original default backend (a sketch of this interpretation follows the struct below).

There are 2 * n combinations here: 2 (use_gpu: true/false) * n (gpu_device: 0-n)

    struct whisper_context_params {
        bool  use_gpu;
        int   gpu_device;  // CUDA device

        // [EXPERIMENTAL] Token-level timestamps with DTW
        bool dtw_token_timestamps;
        enum whisper_alignment_heads_preset dtw_aheads_preset;

        int dtw_n_top;
        struct whisper_aheads dtw_aheads;

        size_t dtw_mem_size; // TODO: remove
    };
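For example, here is a hedged sketch of how those existing fields could be interpreted without adding a new member; the helper name and the use of a negative gpu_device as a "no acceleration" sentinel are assumptions, not part of the whisper.cpp API:

    #include <stdbool.h>

    // Hypothetical helper, assuming whisper.h is included for struct whisper_context_params.
    // use_gpu == false, or a negative gpu_device, is read as "no hardware acceleration"
    // (fall back to the default backend); otherwise gpu_device selects the device to use.
    static bool whisper_wants_hwaccel(const struct whisper_context_params * params) {
        return params->use_gpu && params->gpu_device >= 0;
    }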

So I'd like to close this PR accordingly.


Updated on 04-17-2024, 22:58

This member is useful/helpful for some scenarios, so I'd like to reopen this PR.

@zhouwg closed this Apr 17, 2024
@zhouwg (Contributor, Author) commented Apr 17, 2024

@hey-shashikant, thanks for your time and approval. Could you help take a look at another PR, #2073 (same as this PR)?

That PR is useful and helpful for some scenarios.

Thanks again.
