Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

callback to abort ggml_graph_compute() #328

Merged
merged 13 commits into from
Jul 11, 2023
Merged

Conversation

CCLDArjun
Copy link
Contributor

@CCLDArjun CCLDArjun commented Jul 2, 2023

closes #308

This assumes that once the callback returns true, it will continue to return true

Copy link
Owner

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be wrong, but I think that the introduction of pthread_cancel is not necessary.
If you resolve this - we can merge

src/ggml.c Outdated Show resolved Hide resolved
include/ggml/ggml.h Show resolved Hide resolved
src/ggml.c Outdated Show resolved Hide resolved
@CCLDArjun
Copy link
Contributor Author

Thanks for the quick response!

pthread_join waits for the thread to exit, so it would require every thread to check the abort callback. Because that seemed inefficient, i decided to only have the main thread check it and clean everything up.

@smspillaz
Copy link
Contributor

Really good idea! We can use this quite well in ggml-gobject with GCancellable.

Just weighing in here - the callback should accept some user_data pointer. Eg:

void ggml_graph_compute_with_abort(struct ggml_context * ctx, struct ggml_cgraph * cgraph, bool (*abort_callback)(void *), void *abort_callback_data)

and

if (abort_callback(abort_callback_data)) { ... }

This is important for the case where we need to query some external data (eg, the GCancellable state in my case) in order to determine whether to abort computation.

@ggerganov
Copy link
Owner

pthread_join waits for the thread to exit, so it would require every thread to check the abort callback. Because that seemed inefficient, i decided to only have the main thread check it and clean everything up.

Adding pthread_cancel would also need some Windows-specific implementation and as far as I can understand how this call works, it can leave things into intermediate state. I could be misunderstanding, but I prefer to avoid it.

The overhead from every thread checking the callback should not be too big

@CCLDArjun CCLDArjun requested a review from ggerganov July 5, 2023 07:16
@Green-Sky
Copy link
Contributor

pthread_cancel and equivalent are not available on all platforms too (thinking of android here)

@mqy
Copy link

mqy commented Jul 5, 2023

Even if pthread_cancel or sig USER works, that are async way to force stopping running threads.
Async is not easy to control, thinks about not all threads are killed at once, some are stopping, but some are still running.

Graceful stop is preferred if possible, not only because we can control details, but also because we can implement it.

  • ggml_graph_comput() execute node runners one by one, thus we have natural checkpoints between node computing.
    average node computing is fast; there are hundreds of mul_mat in every graph compute, even if single mul_mat is slow, chances we abort at next node.
  • in ggml : change ggml_graph_compute() API to not require context llama.cpp#1999, the ggml_graph_compute_plan is passed into ggml_graph_compute() and ggml_graph_compute_thread(). To notify thread group, we simply add a field abort into ggml_graph_compute_plan.

@CCLDArjun
Copy link
Contributor Author

I see, pthread_cancel seems like a bad idea. Earlier today I changed it so that every thread calls the callback and immediately returns. This might be graceful?

@mqy that's also a good way to do it. Unsure which way to go.

@mqy
Copy link

mqy commented Jul 5, 2023

I see, pthread_cancel seems like a bad idea. Earlier today I changed it so that every thread calls the callback and immediately returns. This might be graceful?

Me too, sorry, I saw your new commit after submitting comment :) The callback is graceful.

Unsure which way to go.

I can't say yes or no, it depends. Anyway, please let me ask some questions:

  1. to determine whether abort or not, compute threads have to call an external function with opaque data, this is unnecessary, a bool flag is enough, right?
  2. why the new ggml_graph_compute_with_abort()? to avoid changing signature of ggml_graph_compute?

Overall I think it's not good to create ggml_graph_compute_with_abort() which changed the compute implementation even if just a thin wrapper. It should be seen as anti-pattern. Suppose in the future we want to support suspend/resume, should we add ggml_graph_compute_with_suspend and ggml_graph_compute_with_resume?

So, I suggest adding state management APIs, for example: ggml_graph_compute_abort(ctx_or_plan) or more generic ggml_graph_compute_set_state(ctx_or_plan, ABORT).

@CCLDArjun
Copy link
Contributor Author

I can't say yes or no, it depends. Anyway, please let me ask some questions:

  1. to determine whether abort or not, compute threads have to call an external function with opaque data, this is unnecessary, a bool flag is enough, right?

I think using a callback for aborting is pretty standard (e.g. tensorflow/keras). It's also easier for a user to pass in a function rather than having to create a separate thread and modifying some shared state to abort. By his comment earlier, I think @smspillaz plans on using a callback, but I'm unsure.

  1. why the new ggml_graph_compute_with_abort()? to avoid changing signature of ggml_graph_compute?

Overall I think it's not good to create ggml_graph_compute_with_abort() which changed the compute implementation even if just a thin wrapper. It should be seen as anti-pattern. Suppose in the future we want to support suspend/resume, should we add ggml_graph_compute_with_suspend and ggml_graph_compute_with_resume?

So, I suggest adding state management APIs, for example: ggml_graph_compute_abort(ctx_or_plan) or more generic ggml_graph_compute_set_state(ctx_or_plan, ABORT).

Agreed! The context is the best place to store it.

@CCLDArjun
Copy link
Contributor Author

We should wait until ggml: breaking change to ggml_graph_compute(): planning is required before computing llama.cpp#1999 is merged or just add this feature into it.

@mqy
Copy link

mqy commented Jul 6, 2023

I think using a callback for aborting is pretty standard

Sure, callback is more versatile, and convenient to use, this is why I said it depends.

Overall, if we consider conditional abort is important, I think the callback design wins.

@ggerganov
Copy link
Owner

@CCLDArjun

After merging ggerganov/llama.cpp#1999 the ggml_graph_compute() API has changed now so the PR needs to be adapted.

I think we should add the abort callback to the new struct ggml_cplan. The user will set the callback and the data by directly accessing the members of the struct. The new ggml_graph_compute_with_ctx() will not support abort for now as we want to eventually move away from it anyway.

Copy link
Owner

@ggerganov ggerganov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know when to merge 👍

@CCLDArjun
Copy link
Contributor Author

I was waiting for that to be merged so I could add it onto struct ggml_cplan but I didn't realize that it had been merged.

Just made the changes, @ggerganov

@ggerganov ggerganov merged commit aded898 into ggerganov:master Jul 11, 2023
2 checks passed
@@ -15977,6 +15981,10 @@ static thread_ret_t ggml_graph_compute_thread(void * data) {
int node_n = -1;

while (true) {
if (cplan->abort_callback && cplan->abort_callback(cplan->abort_callback_data)) {
state->shared->node_n += 1;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found an issue while testing that sometimes all but one thread aborts and that one thread is spinning, waiting for shared->node_n to be updated. So, I decided to modify node_n

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup. Would be nice to have some test program for this functionality added

ggerganov added a commit that referenced this pull request Jul 11, 2023
ggerganov added a commit that referenced this pull request Jul 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ggml : add mechanism to abort ggml_graph_compute()
5 participants