Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot have both a group list and task list #1767

Open
steven-basart opened this issue Apr 29, 2024 · 5 comments
Open

Cannot have both a group list and task list #1767

steven-basart opened this issue Apr 29, 2024 · 5 comments
Labels
asking questions For asking for clarification / support on library usage. bug Something isn't working.

Comments

@steven-basart
Copy link

steven-basart commented Apr 29, 2024

I think this is a bug.
I'm trying to add decoding trust like so

group:
  - decoding_trust
  - dt_adv_demonstration
  - dt_adv_demonstration_spurious
task: 
  - task: spurious_entail
    test_split: spurious.PP.entailBias
    doc_to_text: "Here are some examples for the task {{examples}}. Here's the question: {{input}}"
    doc_to_choice: ['no', 'yes']
  - task: spurious_nonentail
    test_split: spurious.PP.nonEntailBias
    doc_to_text: "Here are some examples for the task {{examples}}. Here's the question: {{input}}"
    doc_to_choice: ['no', 'yes']
  - task: spurious_underverb
    test_split: spurious.embeddedUnderVerb.entailBias
    doc_to_text: "Here are some examples for the task {{examples}}. Here's the question: {{input}}"
    doc_to_choice: ['no', 'yes']
  - task: spurious_underverb_nonentail
    test_split: spurious.embeddedUnderVerb.nonEntailBias
    doc_to_text: "Here are some examples for the task {{examples}}. Here's the question: {{input}}"
    doc_to_choice: ['no', 'yes']
  - task: spurious_relclause_entail
    test_split: spurious.lRelativeClause.entailBias
    doc_to_text: "Here are some examples for the task {{examples}}. Here's the question: {{input}}"
    doc_to_choice: ['no', 'yes']
  - task: spurious_relclause_nonentail
    test_split: spurious.lRelativeClause.nonEntailBias
    doc_to_text: "Here are some examples for the task {{examples}}. Here's the question: {{input}}"
    doc_to_choice: ['no', 'yes']
  - task: spurious_passive_entail
    test_split: spurious.passive.entailBias
    doc_to_text: "Here are some examples for the task {{examples}}. Here's the question: {{input}}"
    doc_to_choice: ['no', 'yes']
  - task: spurious_passive_nonentail
    test_split: spurious.passive.nonEntailBias
    doc_to_text: "Here are some examples for the task {{examples}}. Here's the question: {{input}}"
    doc_to_choice: ['no', 'yes']
  - task: spurious_passive_nonentail
    test_split: spurious.sRelativeClause.entailBias
    doc_to_text: "Here are some examples for the task {{examples}}. Here's the question: {{input}}"
    doc_to_choice: ['no', 'yes']
  - task: spurious_passive_nonentail
    test_split: spurious.sRelativeClause.nonEntailBias
    doc_to_text: "Here are some examples for the task {{examples}}. Here's the question: {{input}}"
    doc_to_choice: ['no', 'yes']
dataset_path: AI-Secure/DecodingTrust
dataset_name: adv_demonstration
output_type: multiple_choice
doc_to_target: label
metric_list:
  - metric: acc
metadata:
  version: 1.0

However I encounter an error here:

File "lm-evaluation-harness/lm_eval/tasks/__init__.py", line 314, in _get_task_and_group
    tasks_and_groups[config["group"]] = {
TypeError: unhashable type: 'list'
@haileyschoelkopf
Copy link
Contributor

Hi!

Could you elaborate more about what the particular use case of defining both multiple tasks and multiple groups in a given YAML file would be?

Currently, we operate under the assumption that if a list of tasks is specified, that the YAML is being used to define a single new group/benchmark which will aggregate over that list of tasks. If a list of tasks is not specified, then the file is a task YAML and is only meant to define a single task for now.

Would setting things up similar to MMLU solve your use case, with one group containing a few subgroups and their subtasks? https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/mmlu/default/_mmlu.yaml

@haileyschoelkopf haileyschoelkopf added bug Something isn't working. asking questions For asking for clarification / support on library usage. labels May 1, 2024
@steven-basart
Copy link
Author

I ended up doing that (multiple files like mmlu) to fix the problem.

I viewed a group as a way to call evaluating tasks. A list of groups was a way to call this task or set of tasks from multiple different names. I didn't see anywhere in the docs that you were only supposed to have either a group list or a task list but not both hence I made this issue. If this intended behavior happy to make a PR to update the docs with the implicit assumptions made explicit.

I wanted it mostly for convenience to create this task list itself as a group and be able to call this task list from multiple different group names ala decoding_trust or dt_subset

@haileyschoelkopf
Copy link
Contributor

Makes sense!

A docs update would be greatly appreciated! We are planning on moving toward differentiating "groups"/tags (a convenient name/shortcut to call a group of tasks) and "groups" / benchmarks which require a separate YAML file and should explicitly define how to aggregate a single score over a task grouping, which I hope will add clarity here too.

@vovoluck
Copy link

vovoluck commented Jun 3, 2024

@haileyschoelkopf
If I have firewall issues, and want to load the dataset from disk, not from online downloading, where should I use the load_from_disk function from datasets lib, in lm_eval codes?

@haileyschoelkopf
Copy link
Contributor

Hi @vovoluck , this seems like a separate issue from the one here. However, Task (and particularly ConfigurableTask objects) have a download() method, which you could edit to use load_from_disk to your liking.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
asking questions For asking for clarification / support on library usage. bug Something isn't working.
Projects
None yet
Development

No branches or pull requests

3 participants