Proper way to add arguments to chosen metrics? #1483
Comments
"Metrics there support arguments, so how to use arguments?" I was under the impression that we do support kwargs passed to HF Evaluate metrics by putting them into the YAML (example here with exact_match), but if I have misunderstood your use case, or if this is not working as expected, let me know!
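For concreteness, here is a minimal sketch of what such a YAML metric entry might look like, assuming that any keys beyond metric/aggregation/higher_is_better in a metric_list entry are forwarded as kwargs to the underlying HF Evaluate metric (ignore_case and ignore_punctuation are options of evaluate's exact_match):

```yaml
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    # assumed to be passed through as kwargs to evaluate's exact_match
    ignore_case: true
    ignore_punctuation: true
```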
@LSinev did this help resolve your issue, or are you still facing problems with it? (I'd like to close the issue to keep the total number of issues manageable if it did, but if not, no worries; happy to keep it open and field more questions!)
Thanks for your help. I will try the described solution while experimenting with moving a custom Python Task to YAML form using ConfigurableTask. I am not sure about the time frame, so I will probably bookmark this issue for myself to be able to find it once it is closed. As a suggestion for keeping the total number of issues manageable, it may help to close the issues from 2022 on the 5th page of the issue list, since those may have lost their relevance to the current codebase.
Hey, I'm chiming in here since I think my use case is related. I'm trying to add a custom metric where I need some information from the dataset docs. To be more concrete, I would like to compute per-group macro F1 scores and then the Gini index across groups on this dataset, where a "group" is defined by the target_ident column. In other words, I can't simply compute F1 and then aggregate; the only way to compute the metric is to use the predictions together with the target_ident column values across all entries. What would be the best way to implement this? (In general, group-based aggregation is very relevant in several fairness-related evaluation scenarios.)
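This is not a harness API, just a minimal standalone sketch of the computation described above: group per-example predictions and references by the target_ident value, compute macro F1 per group, then take a Gini index over the group scores. Function names are hypothetical, and numpy plus scikit-learn are assumed:

```python
from collections import defaultdict

import numpy as np
from sklearn.metrics import f1_score


def gini_index(values):
    """Gini index over non-negative scores: 0.0 means perfectly equal across groups."""
    v = np.sort(np.asarray(values, dtype=float))
    n = len(v)
    if n == 0 or v.sum() == 0:
        return 0.0
    cum = np.cumsum(v)
    return (n + 1 - 2 * (cum / cum[-1]).sum()) / n


def per_group_f1_and_gini(preds, golds, groups):
    """Macro F1 per group (e.g. per target_ident value), plus the Gini index across groups."""
    by_group = defaultdict(lambda: ([], []))
    for pred, gold, group in zip(preds, golds, groups):
        by_group[group][0].append(pred)
        by_group[group][1].append(gold)
    group_f1 = {
        group: f1_score(gold_list, pred_list, average="macro")
        for group, (pred_list, gold_list) in by_group.items()
    }
    return {"per_group_f1": group_f1, "gini": gini_index(list(group_f1.values()))}
```

If I understand the harness correctly, this kind of corpus-level computation is usually wired in from a Python-defined task by having process_results emit per-document (prediction, gold, group) tuples and supplying an aggregation function over all items; whether the YAML-only path supports it is part of the open question here.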
This may turn into a discussion, a PR, or just a pointer to a line in the documentation.
For example, I want to use macro-averaged F1 for benchmarking on a dataset. It seems there is no way to do this other than defining a custom function, since extra metric arguments are not supported throughout the code.
So developers come up with their own solutions, for example (a minimal sketch of this pattern follows the references below):
lm-evaluation-harness/lm_eval/tasks/kobest/utils.py (line 47 in 96d185f) together with
lm-evaluation-harness/lm_eval/tasks/kobest/kobest_wic.yaml (lines 17 to 21 in 96d185f)
or
lm-evaluation-harness/lm_eval/tasks/super_glue/cb/t5_utils.py (lines 19 to 23 in 96d185f)
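For context, a minimal sketch of the kind of helper those references point at: an aggregation function over (gold, prediction) pairs that calls scikit-learn's f1_score with average="macro" and is then referenced from the task YAML. The exact contents of the referenced files may differ; this only illustrates the pattern.

```python
from sklearn.metrics import f1_score


def macro_f1_score(items):
    # items is the list of (gold, prediction) pairs collected over the whole corpus
    golds, preds = zip(*items)
    return f1_score(golds, preds, average="macro")
```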
Are there any proper solutions that avoid defining custom metrics in code, as written in the docs? The metrics there support arguments, so how can those arguments be used?