Evaluate Gemma with Chat Template #2069

Open · pyf98 opened this issue Jul 5, 2024 · 3 comments


pyf98 commented Jul 5, 2024

Hi, I'm trying to evaluate the gemma-it models from Hugging Face on MMLU. When I set --apply_chat_template --fewshot_as_multiturn, the tokenizer raises the error below, because Gemma's chat template does not support system messages: https://huggingface.co/google/gemma-2b-it/blob/main/tokenizer_config.json#L1507

jinja2.exceptions.TemplateError: System role not supported
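
For reference, a minimal reproduction sketch (assuming transformers is installed and you have access to google/gemma-2b-it; the prompt text is just an illustration):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2b-it")
messages = [
    {"role": "system", "content": "The following are multiple choice questions about world history."},
    {"role": "user", "content": "Question: ..."},
]
# Raises jinja2.exceptions.TemplateError: System role not supported
tok.apply_chat_template(messages, tokenize=False)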

What is the best way to evaluate Gemma chat models? Should I use chat templating or not? Should I drop the description for each document, or move it into the first user turn instead of the system prompt? Thank you for any help!

@haileyschoelkopf (Collaborator)

Hi, this is blocked on #2058 , I'll get that merged ASAP!

@haileyschoelkopf linked a pull request on Jul 8, 2024 that will close this issue
pyf98 (Author) commented Jul 9, 2024

Thanks @haileyschoelkopf for your reply! It seems the system instruction will simply be dropped for Gemma-style models. I'm wondering whether this hurts the performance of Gemma chat models: many benchmarks provide a brief task description at the beginning, and that would be removed.

KT313 commented Sep 5, 2024

How about adding an option for these kinds of models that puts the system instruction at the beginning of the first user message instead, something like --system-as-user?

So originally, something like:

<bos><role>system
This is a system message.
<role>user
This is a user message.

would become:

<bos><role>user
This is a system message.
This is a user message.

Or, with tags so that the LLM can tell it's a system instruction:

<bos><role>user
<system>This is a system message.</system>
This is a user message.

For now, I modified the chat template in the tokenizer config to this:

"chat_template": "{{ bos_token }}{% if messages[0]['role'] == 'system' %}{% for message in messages[1:] %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{% if (loop.index0  == 0)  %}{{ '<start_of_turn>' + role + '\n' + '<system>' + messages[0]['content'] + '</system>' + '\n'+ message['content'] | trim + '<end_of_turn>\n' }}{% endif %}{% if (loop.index0  != 0)  %}{{ '<start_of_turn>' + role + '\n' + message['content'] | trim + '<end_of_turn>\n' }}{% endif %}{% endfor %}{% endif %}{% if messages[0]['role'] != 'system' %}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{ '<start_of_turn>' + role + '\n' + message['content'] | trim + '<end_of_turn>\n' }}{% endfor %}{% endif %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}"

It would be easier if you could add that --system-as-user option to lm_eval, doing something like:

# If the conversation starts with a system message, fold it into the first user turn
if messages[0]['role'] == "system":
    messages[1]['content'] = "<system>" + messages[0]['content'] + "</system>\n" + messages[1]['content']
    messages = messages[1:]
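
Or wrapped up as a standalone helper (the function name is hypothetical; --system-as-user is not an existing lm_eval flag):

def fold_system_into_first_user(messages):
    """Merge a leading system message into the first user turn,
    for chat templates that reject the system role (e.g. Gemma)."""
    if messages and messages[0]["role"] == "system":
        merged = dict(messages[1])  # copy so the caller's list isn't mutated
        merged["content"] = "<system>" + messages[0]["content"] + "</system>\n" + merged["content"]
        return [merged] + messages[2:]
    return messages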
