
Add chat template #1873

Merged · 30 commits into EleutherAI:main on Jun 3, 2024
Conversation

@KonradSzafer (Contributor)

This pull request adds chat template support: few-shot examples can be provided either as a conversation between user and assistant or as a single message from the user, and a system prompt can also be supplied.

@clefourrier @NathanHB
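
For illustration, here is a minimal sketch of the two few-shot layouts this enables. This is not the PR's exact code: the message dicts simply follow the usual Hugging Face role/content convention, and the questions are made up.

```python
# Illustrative sketch, not the PR's implementation: two ways few-shot
# examples can be arranged as chat messages before a tokenizer's chat
# template is applied.

fewshot_pairs = [
    ("What is 2 + 2?", "4"),
    ("What is 3 + 5?", "8"),
]
question = "What is 7 + 6?"
system_prompt = "You are a helpful assistant."

# Option A: each example as its own user/assistant exchange
# (the multiturn style).
multiturn = [{"role": "system", "content": system_prompt}]
for q, a in fewshot_pairs:
    multiturn.append({"role": "user", "content": q})
    multiturn.append({"role": "assistant", "content": a})
multiturn.append({"role": "user", "content": question})

# Option B: all examples folded into a single user message.
joined = "\n\n".join(f"{q}\n{a}" for q, a in fewshot_pairs)
single_turn = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": f"{joined}\n\n{question}"},
]

# Either list can then be rendered with, e.g.,
# tokenizer.apply_chat_template(messages, tokenize=False,
#                               add_generation_prompt=True)
```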

@clefourrier (Contributor)

@KonradSzafer, please fix the linting.

@haileyschoelkopf (Contributor) left a comment


Thanks very much @KonradSzafer for bearing with me!

I had a couple of other small comments about edge cases, but in general it looks really great!

Review threads on lm_eval/api/model.py and lm_eval/api/task.py (all resolved)
@haileyschoelkopf (Contributor) left a comment


Final round of comments! Will also add a docs update to docs/model_guide.md imminently.

Review threads on lm_eval/api/model.py and docs/interface.md (all resolved)
@LSinev (Contributor)

LSinev commented May 31, 2024

May I suggest that the chat template actually used be reported in some artifacts from the run (with or without --log-samples), along with the system instruction.

Review threads on lm_eval/evaluator.py (resolved)
@KonradSzafer (Contributor, Author)

> May I suggest that the chat template actually used be reported in some artifacts from the run (with or without --log-samples), along with the system instruction.

Hi @LSinev! Good point, updated the branch with the changes!
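
Roughly, the idea is that the results artifact now carries the template and system instruction alongside the scores. A hedged sketch of the shape; the key names here are my assumptions, not necessarily the exact ones used in the branch:

```python
import json

from transformers import AutoTokenizer

# Example model; any tokenizer that ships a chat template works here.
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

run_metadata = {
    # The Jinja chat template string that was actually applied.
    "chat_template": tokenizer.chat_template,
    # The system prompt passed to the run, if any.
    "system_instruction": "You are a helpful assistant.",
}

# Stored alongside the evaluation results (and the per-sample logs
# when --log-samples is used).
print(json.dumps(run_metadata, indent=2))
```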

@haileyschoelkopf (Contributor) left a comment


LGTM! Thanks for all your work on this :)

Review threads on docs/model_guide.md (resolved)
Co-authored-by: Hailey Schoelkopf <[email protected]>
@haileyschoelkopf merged commit 070d31d into EleutherAI:main on Jun 3, 2024
8 checks passed
@djstrong (Contributor)

djstrong commented Jun 8, 2024

I have tested our model with the template.

[score tables (images not preserved) comparing results without and with the chat template]

Avg g - average of generate_until versions of tasks
Avg mc - average of multiple_choice versions of tasks

I will evaluate more models.

@djstrong (Contributor)

What do you think: should models be tested with the fewshot_as_multiturn argument, or without it?

@KonradSzafer (Contributor, Author)

Hi @djstrong!
fewshot_as_multiturn tends to result in lower scores, but it is more reflective of how the model is used in the real world.
To learn more about the differences in results when using the chat template, I recommend reading this discussion.
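
For reference, a sketch of how the two modes can be compared from Python. It assumes the post-merge simple_evaluate keyword arguments (apply_chat_template, fewshot_as_multiturn); treat the exact names as assumptions if you are on a different version, and the model here is just an example:

```python
import lm_eval

common = dict(
    model="hf",
    model_args="pretrained=HuggingFaceH4/zephyr-7b-beta",  # example model
    tasks=["gsm8k"],
    num_fewshot=5,
    apply_chat_template=True,
)

# All few-shot examples packed into a single user message.
single_turn = lm_eval.simple_evaluate(**common, fewshot_as_multiturn=False)

# Each few-shot example as its own user/assistant exchange.
multi_turn = lm_eval.simple_evaluate(**common, fewshot_as_multiturn=True)

for name, res in [("single-turn", single_turn), ("multiturn", multi_turn)]:
    print(name, res["results"]["gsm8k"])
```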

@djstrong (Contributor)

I tested this on only one model so far, and I can confirm that scores with fewshot_as_multiturn are mostly lower.

I have tested a dozen models with and without chat templates, but I still need to analyze the results. They are available here: https://huggingface.co/spaces/speakleash/open_pl_llm_leaderboard with the ,chat suffix.
