Proper way to add arguments to chosen metrics? #1483

Open
LSinev opened this issue Feb 27, 2024 · 4 comments
Labels
asking questions For asking for clarification / support on library usage.

Comments

@LSinev
Contributor

LSinev commented Feb 27, 2024

This may turn into a discussion, a PR, or just a pointer to a line in the documentation.
For example, I want to use macro-averaged F1 for benchmarking on a dataset. There seems to be no way to do this other than defining a custom function, since extra arguments are not supported anywhere in the code.
So developers come up with their own solutions, like:

    fscore = f1_score(golds, preds, average="macro")

with usage in the task YAML along the lines of:

    - metric: f1
      aggregation: !function utils.macro_f1_score
      average: macro
      hf_evaluate: true
      higher_is_better: true
or
    metric_fn_kwargs = {
        "beta": 1,
        "labels": range(3),
        "average": "macro",
    }
(not sure, but this may serve as inspiration for a decorator to use with metrics)
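
For reference, the custom-function workaround above would typically live in the task's utils.py and look roughly like this. This is only a sketch: the name macro_f1_score and the assumption that the harness hands the aggregation a list of (gold, pred) pairs come from the snippet above, not from the library docs.

```python
# utils.py, a hypothetical helper module referenced by `!function utils.macro_f1_score`
import sklearn.metrics


def macro_f1_score(items):
    # assumption: the aggregation receives a list of (gold, pred) pairs
    golds, preds = zip(*items)
    return sklearn.metrics.f1_score(golds, preds, average="macro")
```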

Is there a proper solution that does not require defining custom metrics in code? The docs say:

All metrics supported in HuggingFace Evaluate can also be used

Metrics there support arguments, so how are those arguments supposed to be passed?

@haileyschoelkopf added the "asking questions" label Feb 27, 2024
@haileyschoelkopf
Contributor

All metrics supported in HuggingFace Evaluate can also be used

Metrics there support arguments, so how are those arguments supposed to be passed?

I was under the impression that we do support kwargs passed to HF Evaluate metrics by putting them into the YAML: see the example here with exact_match

but if I have misunderstood your use case, or this is not working as expected, let me know!
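
For concreteness, a metric_list entry along these lines is what that refers to. This is a sketch based on the exact_match example; the extra keys shown (ignore_case, ignore_punctuation) are assumptions about what one might want to forward to the metric, not a definitive config.

```yaml
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
    # assumption: additional keys here are forwarded as kwargs to the metric
    ignore_case: true
    ignore_punctuation: true
```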

@haileyschoelkopf
Contributor

@LSinev did this help to resolve your issue, or are you facing problems with this?

(If it did, I want to close the issue to keep the total number of issues manageable, but if not, no worries, happy to keep it open and field more questions!)

@LSinev
Contributor Author

LSinev commented Mar 3, 2024

Thanks for your help. I will try the described solution while experimenting with moving a custom Python Task to YAML form using ConfigurableTask. I am not sure about the time frame for this, so I will probably bookmark this issue for myself so I can still find it once it is closed.

As a suggestion for keeping the total number of issues manageable: it may help to close the issues from 2022 sitting on the fifth page of the issue list, as they may have lost their relevance to the current codebase.

@g8a9

g8a9 commented Apr 26, 2024

Hey, I'm chiming in here since I think my use case is related. I'm trying to add a custom metric evaluation where I need some information from the dataset docs. To be more concrete, I would like to compute per-group macro F1 scores and then the Gini index across groups on this dataset, where "group" here is the target_ident column.

In other words, I can't simply compute F1 and then aggregate, since the metric can only be computed from the predictions together with the target_ident column value across all entries. What would be the best way to implement this?

(In general, group-based aggregation is very relevant in several fairness-related evaluation scenarios.)
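
A minimal sketch of the computation itself, outside of any harness plumbing: it assumes you can collect, for every doc, the gold label, the prediction, and the target_ident value (e.g. via a custom process_results that forwards the group). That integration path is an assumption, not a documented API.

```python
from collections import defaultdict

from sklearn.metrics import f1_score


def per_group_macro_f1(golds, preds, groups):
    """Macro F1 computed separately for each group value (e.g. target_ident)."""
    buckets = defaultdict(lambda: ([], []))
    for gold, pred, group in zip(golds, preds, groups):
        buckets[group][0].append(gold)
        buckets[group][1].append(pred)
    return {
        group: f1_score(g, p, average="macro")
        for group, (g, p) in buckets.items()
    }


def gini_index(scores):
    """Gini coefficient over a collection of non-negative scores (0 = perfectly equal)."""
    xs = sorted(scores)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * weighted) / (n * total) - (n + 1) / n


# usage sketch:
# group_f1 = per_group_macro_f1(golds, preds, groups)
# disparity = gini_index(group_f1.values())
```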
