ggml : add option for controlling work distribution across threads #291
Comments
Making this configurable would also be nice for the cuBLAS backend. When the whole model fits on the GPU, increasing the number of threads doesn't improve eval speed (tokens/sec), but it does increase the CPU load on the system due to the busy loop. Even with So a yield flag would be a great addition to give the user control. A busy loop with a fallback to a yield might also be a good 'automatic' solution that could be used as the default.
See ggerganov/llama.cpp#1507
And comment: ggerganov/llama.cpp#1507 (comment)
Another thing to be investigated is the usage of sched_yield() and potentially making it user configurable: ggerganov/whisper.cpp@09a6325
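
To make the idea of a user-configurable yield concrete, here is a minimal sketch in C of a wait loop that can busy-spin, call sched_yield() on every iteration, or spin for a bounded number of iterations before falling back to yielding. The names `wait_mode`, `wait_for_flag`, and `spin_limit` are hypothetical and used only for illustration. They are not part of ggml, and this is not how ggml's thread synchronization is actually implemented.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

// Hypothetical wait strategies (illustrative names, not ggml API).
enum wait_mode {
    WAIT_SPIN,    // pure busy loop: lowest latency, highest CPU load
    WAIT_YIELD,   // call sched_yield() on every iteration
    WAIT_HYBRID,  // spin up to spin_limit iterations, then start yielding
};

// Block until *flag becomes true.
// Returns the number of sched_yield() calls made while waiting.
static int wait_for_flag(atomic_bool * flag, enum wait_mode mode, int spin_limit) {
    int spins  = 0;
    int yields = 0;

    while (!atomic_load_explicit(flag, memory_order_acquire)) {
        if (mode == WAIT_SPIN) {
            continue; // never give up the CPU
        }
        if (mode == WAIT_HYBRID && spins < spin_limit) {
            spins++;  // still inside the spin budget
            continue;
        }
        sched_yield(); // let the scheduler run another thread
        yields++;
    }

    return yields;
}
```

With something like this, a worker thread would call `wait_for_flag(&ready, mode, spin_limit)` where `mode` comes from a user setting. The hybrid mode is one possible 'automatic' default: short waits stay on the fast spinning path, while long waits (for example, when the GPU is doing all the work) stop burning a full core. A suitable `spin_limit` would have to be found by measurement.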