Cannot have both a group list and task list #1767
Comments
Hi! Could you elaborate on the use case for defining both multiple tasks and multiple groups in a single YAML file? Currently, we operate under the assumption that if a list of tasks is specified, the YAML defines a single new group/benchmark that aggregates over that list of tasks. If a list of tasks is not specified, the file is a task YAML and is only meant to define a single task for now. Would setting things up similarly to MMLU solve your use case, with one group containing a few subgroups and their subtasks? https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/mmlu/default/_mmlu.yaml
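To make the assumption above concrete, here is a minimal sketch of the rule as described in this comment. It is not the harness's actual loading code: `classify_config` is a hypothetical helper, and the config dictionaries (including names like `my_benchmark` and `subtask_a`) are made up for illustration.

```python
from typing import Any, Mapping


def classify_config(cfg: Mapping[str, Any]) -> str:
    """Hypothetical helper: classify a parsed YAML config by the rule above.

    A config whose `task` entry is a list is treated as a group/benchmark
    definition; anything else is treated as a single-task definition.
    """
    task = cfg.get("task")
    if isinstance(task, (list, tuple)):
        return "group"
    return "task"


# Illustrative configs (not taken from the repository).
group_cfg = {"group": "my_benchmark", "task": ["subtask_a", "subtask_b"]}
task_cfg = {"task": "subtask_a", "group": ["alias_one", "alias_two"]}

print(classify_config(group_cfg))
print(classify_config(task_cfg))
```

Under this rule, a file that carries both a list of tasks and a list of groups is still read as one group definition, which is the situation the issue title describes.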
I ended up doing that (multiple files, like MMLU) to fix the problem. I viewed a group as a way to invoke the evaluation of tasks, and a list of groups as a way to call this task or set of tasks by multiple different names. I didn't see anywhere in the docs that you are only supposed to have either a group list or a task list but not both, hence this issue. If this is intended behavior, I'm happy to make a PR to update the docs so the implicit assumptions are made explicit. I wanted it mostly for convenience: to define this task list itself as a group and be able to call it from multiple different group names ala
Makes sense! A docs update would be greatly appreciated! We are planning to differentiate between "groups"/tags (a convenient name/shortcut for calling a group of tasks) and "groups"/benchmarks, which require a separate YAML file and should explicitly define how to aggregate a single score over a task grouping. I hope this will add clarity here too.
@haileyschoelkopf
Hi @vovoluck, this seems like a separate issue from the one here. However,
I think this is a bug.
I'm trying to add DecodingTrust like so
However I encounter an error here: