GGML takes down the whole process even for recoverable errors #123

Open
KerfuffleV2 opened this issue May 1, 2023 · 7 comments
Comments

@KerfuffleV2

From the perspective of application code consuming the GGML library, it's really not ideal to have the whole process just die, even for something as simple as not having enough memory to create a tensor.

It might be difficult to get rid of all asserts, but I think it would be relatively simple to significantly limit the surface area where that kind of thing can happen. For example, tensor creation functions could simply return NULL for recoverable errors like mismatched dimensions, not enough memory, etc.

Most of these operations require a context, so a fairly simple way to communicate more detail about the problem back to the user would be to have something like an errno field in the context which could hold a code for that error. You could also have a buffer there if you wanted to hold a more user-friendly description of what happened (just whatever the assert prints out now). This would require allocating... I don't know, like 128-256 bytes of extra data in the context but that doesn't seem likely to be a big issue.

Through use of a #define toggle it should be possible to preserve the current behavior (if you want the ability to not have to check for NULL from stuff like tensor operations) but allow something like a language binding to handle failure in a more graceful way.

I'm not just creating this issue to complain about the problem. If a change like that would be accepted, I can implement it myself and submit a pull.

(Also, is it preferable to submit GGML changes here or in the llama.cpp repo?)

@ggerganov
Owner

The current "error handling" logic in ggml is the following:

  • assert things that are computationally heavy or do not rely on user input
    for example, checking that all values of a tensor are not NaN, or that the tensor shapes during the forward call of a certain operator are the expected ones
  • GGML_ASSERT things that are lightweight and/or depend on user input
    for example, that the type of a tensor is the expected one

The logic is that when you build in Debug, there will be extra checks via assert, which we can afford to be computationally heavy and will also verify that the internal expectations and assumptions of the processing are correct.
We don't want these to be enabled during Release in order to not slow down the processing, but we still want to "sanitize" the user's input.

I don't think it is a good idea to change this behaviour now.
Error handling is for more "production ready" code, while we are currently in early development stage. If something is incorrect, we just want to crash immediately and fix the issue. Passing errors up the stack will just complicate the code for no reason.

The computation of the required memory size for a context has to be implemented in another way, without relying on error codes. For the moment, I think the no_alloc idea is the way to go, although it seems a bit cumbersome, so I am thinking about some alternative way as well.

@KerfuffleV2
Author

GGML_ASSERT things that are lightweight and/or depend on user input
for example, the type of a tensor is the expected one

This is the kind of thing I'm talking about then, not the normal assert.

I'm not talking about something like "what if we're in ggml_compute, we start computing the graph, and we find some invariant has been violated": crashing there may not be ideal, but it makes sense, and like you said, at this stage in GGML development you want to fail in a way that brings attention to the problem.

So for a condition that indicates any sort of bug inside GGML, it makes perfect sense to fail in the current way. However:

struct ggml_tensor * ggml_add(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b) {
    GGML_ASSERT(ggml_are_same_shape(a, b));
    /* make the tensor */
    return new_tensor;
}

If the application passes in b with a different shape from a, that doesn't indicate GGML did anything wrong. There's no internal GGML error; it's not a bug in GGML. Why do we have to abort() here and take down the whole process?

What I'm proposing is changing user-facing functions like that to just return NULL and save some error information in the context object. Not even as the default behavior, just something that could optionally be enabled by changing a #define.

I think this could be done with almost no actual code changes, it would mostly just be a change to the GGML_ASSERT define.

Is that still something you're opposed to?

@ggerganov
Owner

I guess the specific example can become:

struct ggml_tensor * ggml_add(struct ggml_context * ctx, struct ggml_tensor * a, struct ggml_tensor * b) {
    if (!ggml_are_same_shape(a, b)) {
        GGML_ASSERT(false, "ggml_add: input tensors are not the same shape");
        return NULL;
    }
    /* make the tensor */
    return new_tensor;
}

And then GGML_ASSERT can be disabled during compilation with an explicit define. This way, a user that wants to handle errors from ggml can do that by compiling with that define.

But by default, i.e. without explicitly disabling it, GGML_ASSERT must crash.
The reason is simple: when implementing a new model, or working on an existing one, we write many operations and often get things wrong. So instead of handling the error for every single ggml_add() call, we rely on GGML_ASSERT to tell us what is wrong.

@KerfuffleV2
Author

KerfuffleV2 commented May 3, 2023

Yes, that's pretty similar to what I was thinking of, but I think it can be an even less invasive change.

This is pseudocode and the name of the defines/etc can be whatever you prefer:

#ifndef GGML_RECOVERABLE
// Exactly the same as now
#define GGML_ASSERT(x) \
    do { \
        if (!(x)) { \
            fprintf(stderr, "GGML_ASSERT: %s:%d: %s\n", __FILE__, __LINE__, #x); \
            abort(); \
        } \
    } while (0)
#else
#define GGML_ASSERT(x) \
    do { \
        if (!(x)) { \
            snprintf(ctx->last_error, sizeof(ctx->last_error), \
                "GGML_ASSERT: %s:%d: %s", __FILE__, __LINE__, #x); \
            return NULL; \
        } \
    } while (0)
#endif

In real life you'd probably want a slightly different name for the fallible assert, like GGML_CHECK or something like that, which would take a ctx argument instead of magically expecting it to be in scope. If GGML_RECOVERABLE was disabled, this would work exactly like GGML_ASSERT does currently. Otherwise it could set a field in the context (maybe an error code also) and return NULL (or nullptr, whatever).

That's just pseudocode of the basic idea I'm talking about: a real implementation would take into account stuff like the buffer not having enough space, thread safety, etc.

Those new fields in the context could also be gated by that #define.

The reason is simple: when implementing a new model, or working on existing one, we write many operations and often we get things wrong.

Right, and you also don't want to have to change stuff like whisper, llama.cpp, etc. to check all return values (in addition to breaking everything that currently uses GGML). What I'm talking about would involve no changes from an application's perspective unless you turned on the define that made some errors recoverable.

Does this sound like something you'd be okay with in principle? I can spend some time making a proof of concept if so.

@ggerganov
Owner

Yes, let's give this idea a try and see how it goes

@KerfuffleV2
Author

KerfuffleV2 commented May 4, 2023

Great! Do you want the pull here (as opposed to the llama.cpp repo)? Also, do you have any preference for the names of the defines? Specifically, the one that is used to guard user-facing operations like tensor creation and the one that's used to turn this feature on.

@ggerganov
Owner

In the llama.cpp repo it will get more attention, but both places work.
I don't have any preferences in mind - use whatever you think is appropriate and we can discuss in the PR.
