
Group agg rework #1741

Open · wants to merge 84 commits into main
Conversation

@lintangsutawika (Contributor) commented Apr 23, 2024

  1. By default, a group will not report an aggregate score over its subtasks.
  2. To show an aggregate, a group_config must be defined in the YAML, consisting of aggregate_metric (True/False, default False) and weight_by_size (True/False, default False).
  3. task_id is used in ConfigurableGroup and ConfigurableTask as the identifier, in lieu of the task name/group name.
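A minimal sketch of what such a group YAML might look like under the scheme described above. The group_config field names (aggregate_metric, weight_by_size) come from the description; the group and subtask names here are hypothetical placeholders, not tasks from the repository:

```yaml
# Hypothetical group YAML illustrating the group_config described above.
group: my_benchmark        # hypothetical group name
task:
  - my_subtask_a           # hypothetical subtasks
  - my_subtask_b
group_config:
  aggregate_metric: true   # opt in to an aggregate score (default: false)
  weight_by_size: true     # weight subtasks by dataset size (default: false)
```

With both flags left at their defaults, the group would list its subtasks' scores without any aggregate row.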

@lintangsutawika lintangsutawika marked this pull request as ready for review April 25, 2024 18:05
@lintangsutawika (Contributor, Author)

@haileyschoelkopf I've only added the group_config to MMLU tasks and flan_held_in. Let me know which other benchmarks need the aggregation added back in.

Review threads (outdated, resolved): lm_eval/evaluator.py; lm_eval/api/task.py (two threads)
@haileyschoelkopf (Contributor) left a comment:
Very close to done!

Left some comments, and messaged you re: how we should perhaps change the GroupConfig interface

Other than that:

  • We should go through and determine which of the existing groups should aggregate and which should not. I can do this.
  • I need to go through the evaluator.py changes again more finely; there's quite a lot of logic in there at this point that I want to look at more closely.
  • We should couple this with a README update mentioning this change and referring people to Discord and the docs files to better understand what's affected.

Will try to aim for having this merged by end of next week. Sorry for all the delays!

Review thread (outdated, resolved): lm_eval/utils.py
@@ -242,7 +242,7 @@ def get_original(self, newarr):
     return res


-def make_table(result_dict, column: str = "results", sort_results: bool = True):
+def make_table(result_dict, column: str = "results", sort_results: bool = False):
Review comment (Contributor):
reason for not sorting by default?

Reply (Contributor, Author):
It messes up the grouping: the group name rows also get sorted along with everything else.

Review threads:
  • lm_eval/tasks/mmlu/flan_n_shot/loglikelihood/_mmlu.yaml (outdated, resolved)
  • docs/task_guide.md (five threads, resolved; three outdated)
  • lm_eval/api/task.py (outdated, resolved)
@@ -0,0 +1,45 @@
+group: test-1
Review comment (Contributor):

Is there an associated test for this? Note also that #1916 should help with PRs like this.
