
Question about prompt formatting issue #1543

Open
kcz358 opened this issue Mar 8, 2024 · 4 comments

@kcz358

kcz358 commented Mar 8, 2024

Hi, thank you for such a convenient tool for evaluating LLMs.

I was testing different models on my own custom tasks and found that the performance was much lower than I expected. I logged the results and, as a sanity check, ran the same prompts through the Mixtral 8x7B Instruct model directly. However, I found that the results were somewhat different.

I suspect this is because some special tokens are not included by the HuggingFace LM tok_encode or tok_batch_encode functions, since these functions pass the strings directly to the tokenizer. However, for many models a chat template needs to be applied to the prompt, for example

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")

For Mixtral 8x7B Instruct, this adds the special [INST] tokens. I wonder if there could be a feature similar to custom tasks, where we could pass something like an include_model_path and then inherit from the HF LM class to override the tok_encode or tok_batch_encode functions and get the best performance. I believe this would make it more flexible for users to add new models without cloning the whole repo.
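Something along these lines is what I have in mind (a rough sketch only, assuming the harness's HFLM class and register_model decorator are available to subclass; the file name, model alias, and subclass name are purely illustrative):

```python
# my_chat_model.py -- user-side file, loaded via a hypothetical include_model_path option
from lm_eval.api.registry import register_model
from lm_eval.models.huggingface import HFLM


@register_model("hf-chat")
class ChatTemplateHFLM(HFLM):
    """HFLM subclass that applies the tokenizer's chat template before encoding."""

    def tok_encode(self, string, **kwargs):
        # Wrap the raw prompt as a single user turn and let the tokenizer's
        # chat template add the model-specific special tokens
        # (e.g. [INST] ... [/INST] for Mixtral 8x7B Instruct).
        messages = [{"role": "user", "content": string}]
        return self.tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, tokenize=True
        )
```

With something like this, users could just register their own class instead of patching the library in place.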

Thank you!

@LSinev
Contributor

LSinev commented Mar 8, 2024

This may be connected with:
#1098
#1209
#1287
#1490

@LSinev
Contributor

LSinev commented Mar 14, 2024

Please check whether this PR works for you: #1578

@kcz358
Author

kcz358 commented Mar 15, 2024

Hi, @LSinev. I am currently hardcoding my chat template for the models I want to test. I will also check out this PR, and would like to ask whether it will be included in the next pip release?
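For reference, my current hard-coded workaround looks roughly like this (a minimal sketch assuming the Mixtral Instruct [INST] format; the helper name is just a placeholder):

```python
def wrap_mixtral_instruct(prompt: str) -> str:
    # Hard-coded Mixtral/Mistral Instruct turn format; the BOS token is
    # left for the tokenizer to add, so only the [INST] markers go here.
    return f"[INST] {prompt} [/INST]"
```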

@LSinev
Contributor

LSinev commented Mar 15, 2024

I think you can add pros and cons for particular PRs for the next release to the discussion here: #1560 (comment)
