
Baby-llama.cpp report bus error #4830

Closed
YangZyyyy opened this issue Jan 9, 2024 · 6 comments
YangZyyyy commented Jan 9, 2024

System: macOS Ventura 13.2.1
CPU: M2 Pro

Reproduction steps:

  1. git clone https://github.com/ggerganov/llama.cpp
  2. cd llama.cpp
  3. make
  4. ./baby-llama

The terminal then prints:

init model
init_kv_cache
zsh: bus error  ./baby-llama

A similar error occurs on my Linux server:

System: Ubuntu 22.04.2 LTS
Architecture: x86_64
CPU(s): 128
Model name: Intel(R) Xeon(R) CPU @ 2.90GHz

Following the same steps on this machine, the terminal prints:

init model
init_kv_cache
Floating point exception (core dumped)
NawafAlansari (Contributor) commented:

I have encountered the same issue while trying to run this example. I am trying to implement something similar to what is described in this issue. If no one else is on it, I'd like to try finding and implementing a fix.

NawafAlansari (Contributor) commented:

After further investigation, I've pinpointed the issue within the forward_batch function, specifically at line 988, which calls ggml_build_forward_expand(gf, inpL). It seems the root of the problem lies in the size of gf->visited_hash_table being zero at the time of this call.

Here's a breakdown of the call sequence leading to the exception:

  1. ggml_build_forward_expand(gf, inpL) is invoked, passing the computation graph gf as an argument.
  2. Inside ggml_build_forward_expand, the computation graph gf is passed to ggml_visit_parents.
  3. ggml_visit_parents then passes the graph's hash table to ggml_hash_insert.
  4. Finally, ggml_hash_insert calls ggml_hash_find, where the issue manifests. The problematic line in ggml_hash_find is the first one:
   size_t h = ggml_hash(key) % hash_set.size;

It seems that hash_set.size is zero, so the modulo operation divides by zero and raises SIGFPE (arithmetic exception); this division by zero in ggml_hash_find is the immediate cause of the crash.

I'm currently exploring why gf->visited_hash_table's size is zero at this point in the execution and how we can ensure it's properly initialized before reaching this critical operation. Any further suggestions would be greatly appreciated.
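To make the failure mode concrete, here is a minimal, self-contained C sketch of the pattern described above. It is not ggml's actual code: toy_hash_set and toy_hash_find are illustrative stand-ins for the real structures.

#include <stddef.h>
#include <stdio.h>

/* Illustrative stand-in for the graph's visited hash table. */
struct toy_hash_set {
    size_t size; /* number of slots; zero if the graph was never allocated */
};

/* Mirrors the shape of the failing lookup: the first statement takes the
   key's hash modulo the table size. */
static size_t toy_hash_find(struct toy_hash_set set, size_t key_hash) {
    /* With set.size == 0 this is an integer division by zero: undefined
       behavior in C, raised as SIGFPE on x86_64. */
    size_t h = key_hash % set.size;
    return h;
}

int main(void) {
    struct toy_hash_set empty = { 0 }; /* zero-initialized, like gf = {} */
    printf("%zu\n", toy_hash_find(empty, 42));
    return 0;
}

Note that on x86_64 the zero modulus reliably traps as SIGFPE, while AArch64 integer division by zero returns zero without trapping, so on the M2 machine the bad index would instead lead to an invalid memory access later on, which may explain why the same bug surfaces there as a bus error.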

NawafAlansari (Contributor) commented:

Okay, I was able to fix the issue by changing the lines in the main function that contain

gf = {}

to

// allocate the graph (and its visited hash table) inside the ggml context
struct ggml_cgraph * gf = ggml_new_graph_custom(ctx0, LLAMA_TRAIN_MAX_NODES, true);

I am not sure this is the best way to go about it, but it works for now; any suggestions would be appreciated. I can also try to push the fix.

slaren (Collaborator) commented Feb 16, 2024

@NawafAlansari That looks correct. When the baby-llama example was created, graphs had a fixed size and could be allocated on the stack, but that changed a while ago: graphs now need to be allocated in a ggml_context (see ggerganov/ggml#567 for more details). A PR to fix the example would be very welcome.
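For reference, here is a minimal sketch of the allocation pattern described above, assuming a build against ggml's public header. The mem_size value and the placeholder input tensor are illustrative; ggml_init, ggml_new_graph_custom, and ggml_build_forward_expand are the actual entry points.

#include "ggml.h"

/* Defined in llama.cpp's common/train.h; repeated here (with its value at
   the time of this issue) so the sketch is self-contained. */
#define LLAMA_TRAIN_MAX_NODES 16384

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 128u*1024*1024, /* illustrative scratch size */
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx0 = ggml_init(params);

    /* The graph, including its visited hash table, is allocated inside the
       context, so the hash table has a non-zero size before any expand. */
    struct ggml_cgraph * gf = ggml_new_graph_custom(ctx0, LLAMA_TRAIN_MAX_NODES, true);

    /* Placeholder standing in for the example's inpL tensor. */
    struct ggml_tensor * inpL = ggml_new_tensor_1d(ctx0, GGML_TYPE_F32, 10);
    ggml_build_forward_expand(gf, inpL);

    ggml_free(ctx0);
    return 0;
}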

NawafAlansari (Contributor) commented:

@slaren I just made a PR with a fix for the example.

ggerganov added a commit that referenced this issue Feb 19, 2024
* Fixed the baby-llama issue (see issue #4830)

* minor : fix whitespaces

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Nexesenex pushed a commit to Nexesenex/kobold.cpp that referenced this issue Feb 19, 2024
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this issue Mar 13, 2024
github-actions bot added the stale label Mar 20, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this issue Apr 1, 2024

github-actions bot commented Apr 4, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 4, 2024