
Assertion ggml_nelements(a) == ne0*ne1*ne2 when loading TheBloke/Llama-2-70B-GGML/llama-2-70b.ggmlv3.q2_K.bin #2445

Closed
xvolks opened this issue Jul 29, 2023 · 3 comments


xvolks commented Jul 29, 2023

Loading the Llama 2 70B model from TheBloke with rustformers/llm seems to work, but it fails at inference.

llama.cpp raises an assertion regardless of the use_gpu option:

Loading of model complete
Model size = 27262.60 MB / num tensors = 723
[2023-07-29T14:24:19Z INFO  actix_server::builder] starting 10 workers
[2023-07-29T14:24:19Z INFO  actix_server::server] Actix runtime found; starting in Actix runtime
GGML_ASSERT: llama-cpp/ggml.c:6192: ggml_nelements(a) == ne0*ne1*ne2

This might be related to the model files, but the models from TheBloke are usually reliable.

Running on a MacBook Pro M1 Max with 32 GB RAM.
macOS 14.0.0 (23A5301g)
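
For context, the assertion at llama-cpp/ggml.c:6192 is the element-count check in ggml's 3-D reshape: ggml_reshape_3d(ctx, a, ne0, ne1, ne2) requires ggml_nelements(a) == ne0*ne1*ne2, i.e. a reshape must preserve the total number of elements. A minimal sketch of the invariant, assuming ggml's public C API (the shapes below are invented for illustration, not taken from the 70B model):

```c
// Sketch of the invariant behind the assertion (assumes ggml's public C API;
// the tensor shapes here are made up for illustration).
#include "ggml.h"

int main(void) {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,   // small scratch arena
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // 8192 * 64 = 524288 elements
    struct ggml_tensor * a = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 8192, 64);

    // OK: 128 * 64 * 64 == 524288 == ggml_nelements(a)
    struct ggml_tensor * ok = ggml_reshape_3d(ctx, a, 128, 64, 64);
    (void) ok;

    // Would abort with the exact assertion from this issue:
    // GGML_ASSERT: ggml_nelements(a) == ne0*ne1*ne2
    // struct ggml_tensor * bad = ggml_reshape_3d(ctx, a, 128, 64, 63);

    ggml_free(ctx);
    return 0;
}
```

A mismatch like this usually means the graph-building code computed tensor dimensions that disagree with what the loaded weights actually contain, which fits the observation below that rolling back llama.cpp made the same model work.
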

@dillfrescott

A similar error happened to me too. It's not the model; it's something in llama.cpp. I rolled back to yesterday's commit and it worked fine.


github-actions bot commented Apr 9, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

github-actions bot closed this as completed Apr 9, 2024

oldmanjk commented Jun 1, 2024

Same issue when loading DeepSeek-V2-Chat. Reopen? @ggerganov
$ ./imatrix --seed 0 --threads 24 --threads-batch 32 --file [FILE] --flash-attn --model ggml-model-f32.gguf -o [OUTPUT] --no-ppl

GGML_ASSERT: ggml.c:5715: ggml_nelements(a) == ne0*ne1
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)

Edit - Trying again as the root user produced this extra output:

[New LWP 1533186]
[New LWP 1533187]
[New LWP 1533188]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007702834ea42f in __GI___wait4 (pid=1535286, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	../sysdeps/unix/sysv/linux/wait4.c: No such file or directory.
#0  0x00007702834ea42f in __GI___wait4 (pid=1535286, stat_loc=0x0, options=0, usage=0x0) at ../sysdeps/unix/sysv/linux/wait4.c:30
30	in ../sysdeps/unix/sysv/linux/wait4.c
#1  0x0000635da3c593fb in ggml_print_backtrace ()
#2  0x0000635da3c83415 in ggml_reshape_2d ()
#3  0x0000635da3ca0ec2 in llm_build_kqv(ggml_context*, llama_model const&, llama_hparams const&, llama_cparams const&, llama_kv_cache const&, ggml_cgraph*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, int, int, float, std::function<void (ggml_tensor*, char const*, int)> const&, int) ()
#4  0x0000635da3ca3797 in llm_build_kv(ggml_context*, llama_model const&, llama_hparams const&, llama_cparams const&, llama_kv_cache const&, ggml_cgraph*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, int, int, int, float, std::function<void (ggml_tensor*, char const*, int)> const&, int) [clone .constprop.0] ()
#5  0x0000635da3d1dfae in llm_build_context::build_deepseek2() ()
#6  0x0000635da3caa8cf in llama_build_graph(llama_context&, llama_batch const&, bool) ()
#7  0x0000635da3cc7f49 in llama_new_context_with_model ()
#8  0x0000635da3d40578 in llama_init_from_gpt_params(gpt_params&) ()
#9  0x0000635da3c55d48 in main ()
[Inferior 1 (process 1533172) detached]
Aborted

Edit - -ngl 0 changes nothing
Edit - -b 256 changes nothing
Edit - disabling flash attention fixed it
