Proper way to add arguments to chosen metrics? #1483
Comments
"Metrics there support arguments, so how to use arguments?" I was under the impression that we do support kwargs passed to HF Evaluate metrics by putting them into the YAML (example here with exact_match), but if I have misunderstood your use case, or if this is not working as expected, let me know!
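For concreteness, here is a minimal sketch of what such a YAML metric entry might look like, assuming that any keys beyond metric/aggregation/higher_is_better in a metric_list entry are forwarded as kwargs to the underlying HF Evaluate metric (ignore_case and ignore_punctuation are options of evaluate's exact_match):

```yaml
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    # assumed to be passed through as kwargs to evaluate's exact_match
    ignore_case: true
    ignore_punctuation: true
```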
@LSinev did this help resolve your issue, or are you still facing problems with it? (I'd like to close the issue to keep the total number of issues manageable if it did, but if not, no worries; happy to keep it open and field more questions!)
Thanks for your help. I will try the described solution while experimenting with moving a custom Python Task to YAML form using ConfigurableTask. I am not sure about the time frame, so I will probably bookmark this issue for myself to be able to find it once it is closed. As a suggestion for keeping the total number of issues manageable, it may help to close the issues from 2022 on the 5th page of the issue list, since those may have lost their relevance to the current codebase.
Hey, I'm chiming in here since I think my use case is related. I'm trying to add a custom metric where I need some information from the dataset docs. To be more concrete, I would like to compute per-group macro F1 scores and then the Gini index across groups on this dataset, where a "group" is defined by the target_ident column. In other words, I can't simply compute F1 and then aggregate; the only way to compute the metric is to use the predictions together with the target_ident column values across all entries. What would be the best way to implement this? (In general, group-based aggregation is very relevant in several fairness-related evaluation scenarios.)
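This is not a harness API, just a minimal standalone sketch of the computation described above: group per-example predictions and references by the target_ident value, compute macro F1 per group, then take a Gini index over the group scores. Function names are hypothetical, and numpy plus scikit-learn are assumed:

```python
from collections import defaultdict

import numpy as np
from sklearn.metrics import f1_score


def gini_index(values):
    """Gini index over non-negative scores: 0.0 means perfectly equal across groups."""
    v = np.sort(np.asarray(values, dtype=float))
    n = len(v)
    if n == 0 or v.sum() == 0:
        return 0.0
    cum = np.cumsum(v)
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n


def per_group_f1_and_gini(preds, golds, groups):
    """Macro F1 per group (e.g. per target_ident value), plus the Gini index across groups."""
    by_group = defaultdict(lambda: ([], []))
    for pred, gold, group in zip(preds, golds, groups):
        by_group[group][0].append(pred)
        by_group[group][1].append(gold)
    group_f1 = {
        group: f1_score(gold_list, pred_list, average="macro")
        for group, (pred_list, gold_list) in by_group.items()
    }
    return {"per_group_f1": group_f1, "gini": gini_index(list(group_f1.values()))}
```

If I understand the harness correctly, this kind of corpus-level computation is usually wired in from a Python-defined task by having process_results emit per-document (prediction, gold, group) tuples and supplying an aggregation function over all items; whether the YAML-only path supports it is part of the open question here.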
This may turn into a discussion, a PR, or just a pointer to a line in the documentation.
For example, I want to use macro-averaged F1 for benchmarking on a dataset. It seems there is no way to do this other than defining a custom function, since extra metric arguments are not supported throughout the code.
So developers come up with their own solutions, for example (a minimal sketch of this pattern follows the references below):
lm-evaluation-harness/lm_eval/tasks/kobest/utils.py (line 47 in 96d185f) together with
lm-evaluation-harness/lm_eval/tasks/kobest/kobest_wic.yaml (lines 17 to 21 in 96d185f)
or
lm-evaluation-harness/lm_eval/tasks/super_glue/cb/t5_utils.py (lines 19 to 23 in 96d185f)
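For context, a minimal sketch of the kind of helper those references point at: an aggregation function over (gold, prediction) pairs that calls scikit-learn's f1_score with average="macro" and is then referenced from the task YAML. The exact contents of the referenced files may differ; this only illustrates the pattern.

```python
from sklearn.metrics import f1_score


def macro_f1_score(items):
    # items is the list of (gold, prediction) pairs collected over the whole corpus
    golds, preds = zip(*items)
    return f1_score(golds, preds, average="macro")
```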
Are there any proper solutions that avoid defining custom metrics in code, as written in the docs? The metrics there support arguments, so how can those arguments be used?