Group agg rework #1741

Open
wants to merge 84 commits into
base: main
Commits
2a2566e
add group_config arg
lintangsutawika Apr 23, 2024
6da6d18
add a group config that allows disabling table for group score and gr…
lintangsutawika Apr 23, 2024
5a98162
fixed size configuration
lintangsutawika Apr 24, 2024
cd0ad1b
adjust config
lintangsutawika Apr 24, 2024
eb9c6a5
add group config
lintangsutawika Apr 24, 2024
4dd6906
adjust mmlu to use group_config
lintangsutawika Apr 24, 2024
9551bbf
fixed args input in aggregate_subtask_metrics
lintangsutawika Apr 25, 2024
bcc887a
fixed issues related to printing alias of group and updated yaml
lintangsutawika Apr 25, 2024
939a0cb
Merge branch 'main' of https://github.com/EleutherAI/lm-evaluation-ha…
lintangsutawika Apr 25, 2024
9ccec64
update all mmlu variants to include group_config
lintangsutawika Apr 25, 2024
0579b30
edit format
lintangsutawika Apr 25, 2024
9633619
modify mmlu tasks
lintangsutawika May 6, 2024
3473e19
adjust group to also be a configurable group
lintangsutawika May 7, 2024
86039e8
add configurable group
lintangsutawika May 7, 2024
fe2a147
simplify get_task_list
lintangsutawika May 7, 2024
62572f0
adjust group scoring with using ConfigurableGroup
lintangsutawika May 7, 2024
09bc7c6
adjust args
lintangsutawika May 7, 2024
cb085b0
update mmlu
lintangsutawika May 7, 2024
c23c930
update mmlu
lintangsutawika May 7, 2024
ad70d20
update to work with new group and task configuration
lintangsutawika May 7, 2024
03982e0
readd group_agg
lintangsutawika May 7, 2024
75dfac4
readd files
lintangsutawika May 7, 2024
110e5a2
move prepare_print_tasks to evaluator_utils
lintangsutawika May 7, 2024
8d59330
resolved merge conflict
lintangsutawika May 7, 2024
9f698c2
sort set to False by default, fix predict_only arg
lintangsutawika May 7, 2024
2a81753
add version for groups
lintangsutawika May 7, 2024
5637397
reversed task list
lintangsutawika May 7, 2024
44d7039
update additional condition when loading a group in a group yaml
lintangsutawika May 8, 2024
9f06432
update truthfulqa
lintangsutawika May 8, 2024
3f770bb
add description regarding tags replacing group
lintangsutawika May 8, 2024
2f2322b
replace group to tag
lintangsutawika May 8, 2024
c6b8132
Merge branch 'group-agg-rework' of https://github.com/EleutherAI/lm-e…
lintangsutawika May 8, 2024
a88886a
fixed conditional statement
lintangsutawika May 8, 2024
1fae728
remove warning
lintangsutawika May 10, 2024
f36fb47
update loading of task group and newly added tags
lintangsutawika May 10, 2024
1320394
reformat with pre-commit
lintangsutawika May 10, 2024
7350958
fixed info log
lintangsutawika May 10, 2024
719fa9b
update
lintangsutawika May 10, 2024
5c289f9
fix bug
lintangsutawika May 10, 2024
5a3a957
fix bug
lintangsutawika May 10, 2024
2f3d427
use task id to differentiate tasks
lintangsutawika May 10, 2024
e098647
convert all groups to configurable groups
lintangsutawika May 10, 2024
39c4027
use task_id
lintangsutawika May 10, 2024
0905615
reformat
lintangsutawika May 10, 2024
78c3f7d
add task_id for python tasks as well
lintangsutawika May 11, 2024
1ef5b0b
add task_id for python tasks as well
lintangsutawika May 11, 2024
f4d2e6e
add task_id for python tasks as well
lintangsutawika May 11, 2024
3924608
revert truthfulqa
lintangsutawika May 11, 2024
0ffdee0
revert mmlu tasks
lintangsutawika May 11, 2024
1960eb9
new mmlu config
lintangsutawika May 11, 2024
41e64b2
new group config parameter `tag_to_task`
lintangsutawika May 11, 2024
79ce346
Update truthfulqa_mc2.yaml
lintangsutawika May 11, 2024
aef7028
reformat
lintangsutawika May 11, 2024
6d1753d
add _process_group_config
lintangsutawika May 15, 2024
c90655d
adjust task_id
lintangsutawika May 15, 2024
88fea8a
add get_subtask_list function to get proper subtask list
lintangsutawika May 16, 2024
e66c5f5
group config to_dict update
lintangsutawika May 16, 2024
923f3e8
remove tag check
lintangsutawika May 16, 2024
104292f
update mmlu
lintangsutawika May 16, 2024
3be0916
fix config passing issues
lintangsutawika May 17, 2024
fd9cd80
add test yaml
lintangsutawika May 17, 2024
3e1301b
resolved merge conflict from latest version
lintangsutawika Jun 4, 2024
4eeb871
format fix
lintangsutawika Jun 4, 2024
be8b547
add documentation
lintangsutawika Jun 6, 2024
4140dd9
corner case for single tag being called
lintangsutawika Jun 6, 2024
5032eba
fix indentation
lintangsutawika Jun 7, 2024
e6b1581
formatting
lintangsutawika Jun 7, 2024
9a30374
update all mmlu variants
lintangsutawika Jun 7, 2024
ed1f757
Update docs/task_guide.md
lintangsutawika Jun 10, 2024
e8f4918
remove group_alias
lintangsutawika Jun 10, 2024
016615f
Merge branch 'group-agg-rework' of https://github.com/EleutherAI/lm-e…
lintangsutawika Jun 10, 2024
26578d2
Update docs/task_guide.md
lintangsutawika Jun 10, 2024
1848d66
remove version for metadata
lintangsutawika Jun 10, 2024
b002849
Merge branch 'group-agg-rework' of https://github.com/EleutherAI/lm-e…
lintangsutawika Jun 10, 2024
9e940f3
Update docs/task_guide.md
lintangsutawika Jun 10, 2024
ac1a1ce
update mmlu/
lintangsutawika Jun 10, 2024
e184c50
removed " " in make_table
lintangsutawika Jun 10, 2024
0f095f7
Merge branch 'group-agg-rework' of https://github.com/EleutherAI/lm-e…
lintangsutawika Jun 10, 2024
80d0f41
change how aggregate_metric is loaded
lintangsutawika Jun 10, 2024
9fa3b3f
change how aggregate_metric is loaded
lintangsutawika Jun 10, 2024
5b64fb5
update aggregate_metric arg
lintangsutawika Jun 10, 2024
83c070d
update format
lintangsutawika Jun 10, 2024
5b527a7
update format
lintangsutawika Jun 10, 2024
4376566
update task_id to be updateable and uses group:task format
lintangsutawika Jun 27, 2024
fixed issues related to printing alias of group and updated yaml
lintangsutawika committed Apr 25, 2024
commit bcc887adc73bf2fab3835346e9f9e519cddb1d71
32 changes: 19 additions & 13 deletions lm_eval/evaluator_utils.py
@@ -164,7 +164,10 @@ def get_sample_size(task, limit: Optional[int]) -> Union[int, None]:


 def prepare_print_tasks(
-    task_hierarchy: dict, results: dict, tab=0
+    task_hierarchy: dict,
+    results: dict,
+    tab=0,
+    group_tab=0,
 ) -> Tuple[dict, dict]:
     """
     @param task_hierarchy: Dictionary representing the group hierarchy of tasks. Each key is a group name and its
@@ -197,17 +200,20 @@ def prepare_print_tasks(
         results_agg[group_name]["alias"] = tab_string + group_name

     if len(task_list) > 0:
-        groups_agg[group_name] = results[group_name].copy()
-        # groups_agg[group_name]["tab"] = tab
-        if "samples" in groups_agg[group_name]:
-            groups_agg[group_name].pop("samples")
-
-        if "alias" in groups_agg[group_name]:
-            groups_agg[group_name]["alias"] = (
-                tab_string + groups_agg[group_name]["alias"]
-            )
-        else:
-            groups_agg[group_name]["alias"] = tab_string + group_name
+        if " " not in results[group_name]:
+            group_tab_string = " " * group_tab + "- " if group_tab > 0 else ""
+            groups_agg[group_name] = results[group_name].copy()
+            group_tab += 1
+
+            if "samples" in groups_agg[group_name]:
+                groups_agg[group_name].pop("samples")
+
+            if "alias" in groups_agg[group_name]:
+                groups_agg[group_name]["alias"] = (
+                    group_tab_string + groups_agg[group_name]["alias"]
+                )
+            else:
+                groups_agg[group_name]["alias"] = group_tab_string + group_name

         for task_name in task_list:
             if task_name in task_hierarchy:
@@ -222,7 +228,7 @@ def prepare_print_tasks(
                 }

             _results_agg, _groups_agg = prepare_print_tasks(
-                _task_hierarchy, results, tab + 1
+                _task_hierarchy, results, tab + 1, group_tab
             )
             results_agg = {**results_agg, **_results_agg}
             groups_agg = {**groups_agg, **_groups_agg}
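
The alias prefix built in the hunk above can be tried in isolation. The following is a minimal sketch that assumes only the expression shown in the diff; the helper name `indent_alias` is hypothetical and is not part of lm_eval.

```python
def indent_alias(name: str, depth: int) -> str:
    """Prefix `name` with the list-style marker used for nested table rows.

    Hypothetical helper (not in lm_eval). It mirrors the expression from the
    diff: `" " * group_tab + "- " if group_tab > 0 else ""`.
    """
    # The conditional expression binds more loosely than `+`, so the whole
    # concatenation is the "true" branch and depth 0 yields an empty prefix.
    prefix = " " * depth + "- " if depth > 0 else ""
    return prefix + name


# Walk down three nesting levels and show the resulting row labels.
for depth, name in enumerate(["flan_held_in", "ANLI R1", "prompt-0"]):
    print(repr(indent_alias(name, depth)))
```

At depth 0 the name is returned unchanged; each deeper level adds one leading space before the `- ` marker, which is what produces the stepped look in the printed table.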
4 changes: 2 additions & 2 deletions lm_eval/tasks/__init__.py
@@ -178,7 +178,6 @@ def load_task(config, task, group=None, yaml_path=None):
 yaml_path = self._get_yaml_path(group_name)

 if (update_config is not None) and ("group_alias" in update_config):
-    group_name = update_config["group_alias"]
     update_config.pop("group_alias")

 if isinstance(name_or_config, dict):
@@ -240,8 +239,9 @@ def load_task(config, task, group=None, yaml_path=None):

 all_subtasks = {}
 if parent_name is not None:
-    # all_subtasks = {group_name: (parent_name, None)}
     parent_group_config = self._get_config(parent_name)
+    if "group_alias" in parent_group_config:
+        parent_name = parent_group_config["group_alias"]
     all_subtasks = {group_name: (parent_name, parent_group_config)}

 fn = partial(
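
The second hunk above swaps the parent's internal name for its `group_alias` when one is configured. A minimal sketch of that lookup follows, under the assumption that group configs are plain dicts; `resolve_parent` and the `configs` mapping are hypothetical stand-ins for the `self._get_config` call in the diff, not lm_eval API.

```python
from typing import Dict, Optional, Tuple


def resolve_parent(
    parent_name: Optional[str], configs: Dict[str, dict]
) -> Optional[Tuple[str, dict]]:
    """Return (display name, config) for a parent group, or None if there is no parent.

    `configs` is a hypothetical in-memory registry keyed by group name.
    """
    if parent_name is None:
        return None
    parent_config = configs[parent_name]
    # Prefer the human-readable alias when the parent config defines one,
    # otherwise fall back to the internal group name.
    display_name = parent_config.get("group_alias", parent_name)
    return display_name, parent_config


configs = {
    "anli_r1_flan": {"group_alias": "ANLI R1"},
    "flan_held_in": {},
}
print(resolve_parent("anli_r1_flan", configs))
print(resolve_parent("flan_held_in", configs))
```

Using `dict.get` with a default collapses the `if "group_alias" in ...` check and the reassignment into one step; the behaviour is the same as the two added lines.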
14 changes: 14 additions & 0 deletions lm_eval/tasks/benchmarks/flan/flan_held_in.yaml
@@ -4,6 +4,8 @@ task:
   # ANLI R1
   - group: anli_r1_flan
     group_alias: ANLI R1
+    group_config:
+      aggregate_metric: True
     task:
       - task: anli_r1
         task_alias: prompt-0
@@ -53,6 +55,8 @@ task:
   # ANLI R2
   - group: anli_r2_flan
     group_alias: ANLI R2
+    group_config:
+      aggregate_metric: True
     task:
       - task: anli_r2
         task_alias: prompt-0
@@ -102,6 +106,8 @@ task:
   # ANLI R3
   - group: anli_r3_flan
     group_alias: ANLI R3
+    group_config:
+      aggregate_metric: True
     task:
       - task: anli_r3
         task_alias: prompt-0
@@ -151,6 +157,8 @@ task:
   # Arc Easy
   - group: arc_easy_flan
     group_alias: Arc Easy
+    group_config:
+      aggregate_metric: True
     task:
       - task: arc_easy
         task_alias: prompt-0
@@ -190,6 +198,8 @@ task:
   # Arc Challenge
   - group: arc_challenge_flan
     group_alias: Arc Challenge
+    group_config:
+      aggregate_metric: True
     task:
       - task: arc_challenge
         task_alias: prompt-0
@@ -229,6 +239,8 @@ task:
   # BoolQ
   - group: boolq_flan
     group_alias: BoolQ
+    group_config:
+      aggregate_metric: True
     task:
       - task: boolq
         task_alias: prompt-0
@@ -283,6 +295,8 @@ task:
   # RTE
   - group: rte_flan
     group_alias: RTE
+    group_config:
+      aggregate_metric: True
     task:
       - task: rte
         task_alias: prompt-0
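
This commit only switches `aggregate_metric: True` on for each Flan sub-group; the aggregation code itself is not part of the diff. As an illustration only, the sketch below shows one common way a group score could be derived from subtask scores: a mean that is optionally weighted by subtask size. The function name `weighted_mean`, its parameters, and the example numbers are all assumptions, not the harness's implementation.

```python
from typing import Sequence


def weighted_mean(
    scores: Sequence[float], sizes: Sequence[int], weight_by_size: bool = True
) -> float:
    """Combine per-subtask scores into one group score (illustrative only)."""
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    # With weight_by_size=False every subtask counts equally (a plain mean).
    weights = list(sizes) if weight_by_size else [1] * len(scores)
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Two made-up prompt variants: 100 and 300 evaluated documents.
print(weighted_mean([0.5, 1.0], [100, 300]))                        # 0.875
print(weighted_mean([0.5, 1.0], [100, 300], weight_by_size=False))  # 0.75
```

The two calls differ because the size-weighted mean lets the larger subtask dominate, while the unweighted mean treats each prompt variant as one vote.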