ggml assert hook #701
# Conflicts: # src/ggml.c
ggml needs better error handling, but when a If you have a program that absolutely must never end with an |
I don't agree with this. For example, in stable-diffusion, I can wrap each model in a try/catch, then reallocate memory and reinitialize it. Instead of aborting, we can catch globally in our own program and restart it. This is better than abort. |
If we want to keep the abort logic, I would rather restrict it to the debug stage. Once we reach the release stage, we still need recovery measures. Otherwise, we will have to spend a lot of effort running a daemon process, and those resources would otherwise be used for inference. |
Well, this is not a matter of opinion, a failed To be clear, I agree that we need better error handling in ggml; asserts should be used only to detect logical errors in ggml, but as it is right now, this is just the way it is. |
I have read the previous history, and I know the purpose of ggml's current ASSERT, so I kept the original logic in the code. This hook is only meant for those who have thought carefully and know how to handle the failures. If you think this is not workable, do you have a better approach? |
There are a few things you can do to minimize the chance of hitting an assert:
|
I need to show you the hell I encountered. This is very scary for me at the moment. Because Go has a defer/recover mechanism, when a segmentation fault is received the program should exit, but instead it hangs and cannot recover.
|
What's even more helpless is that I can't catch any errors at all. |
@Cyberhan123 What kind of errors do you want to handle in
Think about If Microsoft is scanning for |
It's a very simple idea. I can get the intended error when calling the language through golang. The above error is actually a simple setting problem. I found it through gdb tracking. For stable-diffusion.cpp, it is needed in image generation. When img2img, set |
And when we use common app, there are often many serious errors. As a person with experience in developing for mass users, my usual approach is to |
So the problem is that the assumption of abort makes the program too strict. As a developer, I cannot guarantee that my code is completely error-free, or that every tensor passed into ggml is perfect. Neither I nor the developers who have invested a lot of energy, including you, are in the wrong. If your suggestion is to start a daemon process, then I can only accept it. |
I think that it is very important to do proper error handling for application developers to be able to use ggml in production software. The software should never crash except for purely logical errors within ggml. Currently, ggml does not have a framework to report errors to the user. The only way we have to report an error in some functions is by returning At some point, we will approach a "ggml 1.0" release and we will need to address this and other issues, but we are not there yet. I think that running the ggml code as a daemon in a different process is the best solution at the moment. |
Glad you could reply. I think logging should not be done by the ggml library; it should be done by llama.cpp, but of course that is up to you to decide. I also hope that ggml can have a better error handling mechanism. Thank you for your selfless dedication, which allows me to use AI on an AMD GPU on Windows. |
Hello, @slaren @ggerganov , if you have time, please give your suggestions.
When I was writing the desktop version of stable-diffusion.cpp, I encountered many situations where the program crashed because GGML_ASSERT aborted, even though there was no error in the desktop program itself, only user input errors. I looked through the historical issue #123 and found that it describes an ambitious plan, but one that is too complicated for ggml. I want to add a behavior hook only for GGML_ASSERT. It keeps the default behavior while allowing users to customize it.
For C++ users, the hook can throw an exception, which they can then catch to release memory and restore the program.
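A minimal sketch of the C++ case, under stated assumptions: `demo_assert_hook_t`, `g_demo_assert_hook`, and `DEMO_ASSERT` are hypothetical stand-ins invented for illustration, not the actual ggml API or the exact signature added by this PR. The hook throws, and the caller catches at a boundary it controls.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical hook type, for illustration only (not a real ggml symbol).
using demo_assert_hook_t = void (*)(const char * file, int line, const char * expr);

// Hypothetical hook slot; a null hook keeps the default abort behavior.
static demo_assert_hook_t g_demo_assert_hook = nullptr;

// Hypothetical stand-in for GGML_ASSERT: call the hook first, abort if it returns.
#define DEMO_ASSERT(x)                                                     \
    do {                                                                   \
        if (!(x)) {                                                        \
            if (g_demo_assert_hook) {                                      \
                g_demo_assert_hook(__FILE__, __LINE__, #x);                \
            }                                                              \
            std::abort();                                                  \
        }                                                                  \
    } while (0)

// A hook that turns a failed assertion into a C++ exception.
static void throwing_hook(const char * file, int line, const char * expr) {
    throw std::runtime_error(std::string(file) + ":" + std::to_string(line) +
                             ": assertion failed: " + expr);
}

// Library-like function that asserts on bad input.
static int checked_div(int a, int b) {
    DEMO_ASSERT(b != 0);
    return a / b;
}

// Caller-side wrapper: returns false and fills `err` instead of aborting.
static bool try_div(int a, int b, int & out, std::string & err) {
    try {
        out = checked_div(a, b);
        return true;
    } catch (const std::exception & e) {
        err = e.what();
        return false;
    }
}
```

One caveat with this approach: if the asserting code is compiled as C, unwinding through its frames may not be supported by the toolchain, and anything the library allocated before the failure is not released automatically, so the caller still has to discard and rebuild the affected state.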
For C users, they can hook in a C exception library, or they can implement their own exception mechanism through longjmp.
It will also be friendly to Go users like me. I can use try/catch in stable-diffusion.cpp to manage exceptions, and then pass errors along by exposing methods like sd_get_latest_err(). To be honest, I am not good at C/C++, so I don't want to expand the scope of the implementation. I want to leave the remaining design space to the community. Can we add a GGML_ERROR method, then refine the errors, eliminate the coexistence of assert and GGML_ASSERT in the library, and gradually convert from GGML_ASSERT to GGML_ERROR, so that we can avoid hard-coding error messages or having to look up the file and line number of an assertion?
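To make the `sd_get_latest_err()` idea concrete, here is a small sketch of a C-callable wrapper that catches exceptions and stores the message for later retrieval. Only the name `sd_get_latest_err` comes from the description above; `sd_demo_generate`, `generate_impl`, and the thread-local storage are assumptions for illustration, not the real stable-diffusion.cpp API.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Last error message for the calling thread (assumed storage strategy).
static thread_local std::string g_last_err;

// Hypothetical internal routine that reports bad input by throwing.
static int generate_impl(int width, int height) {
    if (width <= 0 || height <= 0) {
        throw std::invalid_argument("width and height must be positive");
    }
    return width * height;
}

// Hypothetical C entry point: returns 0 on success, -1 on failure.
// No exception is allowed to cross the C boundary.
extern "C" int sd_demo_generate(int width, int height, int * out_pixels) {
    try {
        *out_pixels = generate_impl(width, height);
        g_last_err.clear();
        return 0;
    } catch (const std::exception & e) {
        g_last_err = e.what();
    } catch (...) {
        g_last_err = "unknown error";
    }
    return -1;
}

// Returns the message of the most recent failure on this thread ("" if none).
extern "C" const char * sd_get_latest_err(void) {
    return g_last_err.c_str();
}
```

A Go caller would check the return code and then read the message. Because a goroutine can move between OS threads across separate cgo calls, a thread-local slot like this one may need the caller to pin the thread, or the wrapper could instead return the message through an output parameter in the same call.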
I have collected the related questions below:
ggerganov/llama.cpp#4627
ggerganov/llama.cpp#4385
If the program aborts, it is not allowed to be published on the Windows Store. See details: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/abort?view=msvc-170