Evaluate Gemma with Chat Template #2069
Hi, this is blocked on #2058, I'll get that merged ASAP!
Thanks @haileyschoelkopf for your reply! It seems we will just remove the system instruction for Gemma-style models. I'm wondering whether that hurts the performance of Gemma chat models? Many benchmarks provide a brief description at the beginning of each prompt, and that would be removed.
How about adding an option for these kinds of models to put the system instruction at the beginning of the first user message instead? A conversation that originally starts with a system message would instead start with the system text prepended to the first user turn, or wrapped in tags so the LLM can tell it is a system instruction. For now I modified the chat template in the tokenizer config to do this, but it would be easier if you could add that option.
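The fold-into-first-user-turn idea above can be sketched as a small preprocessing helper. This is a minimal illustration, not part of lm-evaluation-harness; the function name and the `<<SYS>>` tag convention are assumptions chosen for the example.

```python
def fold_system_into_first_user(messages, tag=False):
    """Merge a leading system message into the first user turn.

    `messages` is a list of {"role": ..., "content": ...} dicts in the
    usual Hugging Face chat format. This helper and its `tag` option are
    illustrative, not an lm-evaluation-harness API.
    """
    if not messages or messages[0]["role"] != "system":
        return list(messages)
    system_text = messages[0]["content"]
    # Copy the remaining turns so the caller's list is left untouched.
    rest = [dict(m) for m in messages[1:]]
    if tag:
        # Wrap in pseudo-tags so the model can distinguish the system
        # instruction from the actual user text.
        system_text = f"<<SYS>>\n{system_text}\n<</SYS>>"
    for m in rest:
        if m["role"] == "user":
            m["content"] = system_text + "\n\n" + m["content"]
            break
    return rest
```

The transformed list can then be passed to `tokenizer.apply_chat_template` as usual, since it no longer contains a `system` turn.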
Hi, I'm trying to evaluate `gemma-it` models from Hugging Face on MMLU. When I set `--apply_chat_template --fewshot_as_multiturn`, the tokenizer raises an error, because Gemma does not support system messages: https://huggingface.co/google/gemma-2b-it/blob/main/tokenizer_config.json#L1507

What is the best way to evaluate Gemma chat models? Should I use chat templating or not? Should I remove the description for each document, or move the description to the first user turn instead of the system prompt? Thank you for any help!
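One workaround, as mentioned in the comments above, is to edit `chat_template` in the model's `tokenizer_config.json` so that a leading system message is folded into the first user turn instead of raising an error. The following is only a rough sketch based on Gemma's `<start_of_turn>`/`<end_of_turn>` turn format; the exact tokens and whitespace handling should be checked against the template shipped with the model.

```jinja
{{ bos_token }}
{%- if messages and messages[0]['role'] == 'system' -%}
  {%- set system_text = messages[0]['content'] -%}
  {%- set messages = messages[1:] -%}
{%- else -%}
  {%- set system_text = '' -%}
{%- endif -%}
{%- for message in messages -%}
  {%- set role = 'model' if message['role'] == 'assistant' else message['role'] -%}
<start_of_turn>{{ role }}
{% if loop.first and system_text %}{{ system_text }}

{% endif %}{{ message['content'] }}<end_of_turn>
{% endfor -%}
{%- if add_generation_prompt -%}
<start_of_turn>model
{% endif -%}
```

This keeps the benchmark description in the prompt (prepended to the first user turn) rather than dropping it entirely.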