
prompt is too long (539 tokens, max 508) #536

Open
muhammadfhadli1453 opened this issue Sep 25, 2023 · 0 comments
When I run the Llama 2 model after quantizing it, I get the following error. I thought the maximum context for Llama 2 was 4096 tokens.

llama_new_context_with_model: kv self size  =  256,00 MB
llama_new_context_with_model: compute buffer total size =   71,97 MB
llama_new_context_with_model: VRAM scratch buffer: 70,50 MB

system_info: n_threads = 20 / 40 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | 
main: error: prompt is too long (539 tokens, max 508)

I followed this tutorial to quantize the model: https://towardsdatascience.com/quantize-llama-models-with-ggml-and-llama-cpp-3612dfbcc172
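
A likely explanation, assuming a standard llama.cpp build: 4096 tokens is the model's training context, but the main example defaults to a 512-token context window and reserves 4 tokens for generation, which is where the "max 508" in the error comes from (512 − 4 = 508). Passing a larger context size with the -c / --ctx-size flag should let the 539-token prompt through; the model path below is illustrative:

# -c raises the runtime context window to match Llama 2's 4096-token training context
./main -m ./models/llama-2-7b.Q4_K_M.gguf -c 4096 -p "your prompt here"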
