-
Notifications
You must be signed in to change notification settings - Fork 966
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
callback to abort ggml_graph_compute() #328
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could be wrong, but I think that the introduction of pthread_cancel
is not necessary.
If you resolve this - we can merge
Thanks for the quick response!
|
Really good idea! We can use this quite well in Just weighing in here - the callback should accept some
and
This is important for the case where we need to query some external data (eg, the |
Co-authored-by: Georgi Gerganov <[email protected]>
Adding The overhead from every thread checking the callback should not be too big |
|
Even if
|
I see, @mqy that's also a good way to do it. Unsure which way to go. |
Me too, sorry, I saw your new commit after submitting comment :) The callback is graceful.
I can't say yes or no, it depends. Anyway, please let me ask some questions:
Overall I think it's not good to create So, I suggest adding state management APIs, for example: ggml_graph_compute_abort(ctx_or_plan) or more generic ggml_graph_compute_set_state(ctx_or_plan, ABORT). |
I think using a callback for aborting is pretty standard (e.g. tensorflow/keras). It's also easier for a user to pass in a function rather than having to create a separate thread and modifying some shared state to abort. By his comment earlier, I think @smspillaz plans on using a callback, but I'm unsure.
Agreed! The context is the best place to store it. |
We should wait until ggml: breaking change to ggml_graph_compute(): planning is required before computing llama.cpp#1999 is merged or just add this feature into it. |
Sure, callback is more versatile, and convenient to use, this is why I said Overall, if we consider conditional abort is important, I think the callback design wins. |
After merging ggerganov/llama.cpp#1999 the I think we should add the abort callback to the new |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know when to merge 👍
I was waiting for that to be merged so I could add it onto Just made the changes, @ggerganov |
@@ -15977,6 +15981,10 @@ static thread_ret_t ggml_graph_compute_thread(void * data) { | |||
int node_n = -1; | |||
|
|||
while (true) { | |||
if (cplan->abort_callback && cplan->abort_callback(cplan->abort_callback_data)) { | |||
state->shared->node_n += 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found an issue while testing that sometimes all but one thread aborts and that one thread is spinning, waiting for shared->node_n
to be updated. So, I decided to modify node_n
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. Would be nice to have some test program for this functionality added
closes #308
This assumes that once the callback returns true, it will continue to return true