Something magic in here... #10
trinhdoduyhungss started this conversation in General
-
Please tell the model size and data type (FP16, FP32, Q4_0, Q4_1). This is a known issue: #8. I've bumped the memory this morning; please try the latest commit and check whether it works. If it does not work, then that's a bug in
-
I tried to run your code on my computer (Windows). Everything is smooth and it works well: even though my local machine has only 16 GB of RAM (just 8 GB available), the 14B Raven model still runs (although slowly, about 2 minutes to answer "What is your name?"). However, the magic starts here: when I pushed the weights to my server (Linux) with 126 GB of RAM (100 GB available) and a 32-core CPU, I got this error:
```
Loading 20B tokenizer
System info: AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
Loading RWKV model
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 10662867344, available 10662862009)
Segmentation fault (core dumped)
```
What is happening here?
I tried downloading the model again, converting it to ggml, and quantizing it once more on the server, then running it again, but I still got the error above. Hmm, I have no idea...