Standardize metrics #1167
base: main
Conversation
…rness into standardize_metrics
I don't quite get the distinction between
Hmmm, I think I’d prefer to leave metrics as solo functions rather than objects, and just have, say, accuracy implemented as acc(items). What do you think? This reduces the cognitive load needed for a user to make their own custom metric in a task’s utils.py, since they don’t need to be closely acquainted with our library, just with the type signature of the per-doc results, which they’d need to know anyway. Won’t this class make it harder for us to load arbitrary HF evaluate metrics as desired?
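The "solo function" shape suggested above could look like the sketch below. This is a hypothetical illustration, not the harness's actual API: the name acc follows the comment, but the assumption that items is a list of (gold, pred) pairs is mine.

```python
# Hypothetical sketch: a metric as a plain function rather than a class.
# Assumes items is a list of (gold, pred) pairs; that shape is an
# illustration, not the harness's real per-doc result type.
def acc(items):
    """Accuracy over a list of (gold, pred) pairs."""
    if not items:
        return 0.0
    correct = sum(1 for gold, pred in items if gold == pred)
    return correct / len(items)
```

A custom metric in a task's utils.py would then only need to match this one signature, with no base class to subclass.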
@baberabb yes, it's the same. The condition is meant so that we could call a function like

@haileyschoelkopf I get what you mean. I think it might be better to push metrics to where aggregation is done, where the score is calculated dataset-wise. I think this is pretty much the standard for all metric functions like
Agreed, I think this makes sense! (i.e., to have all metrics perform the previous “aggregation” duty in one go, with the per-sample calculations bundled into that one function)
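The "all in one go" idea above could be sketched as a single function that both scores each sample and aggregates, so no separate aggregation function is needed. The name exact_match and the (gold, pred) item shape here are assumptions for illustration.

```python
# Sketch of a metric that bundles per-sample calculation and
# aggregation into one call, per the discussion above. The item
# shape (gold, pred string pairs) is an assumed example.
def exact_match(items):
    # per-sample calculation ...
    per_sample = [float(gold.strip() == pred.strip()) for gold, pred in items]
    # ... and aggregation, done in the same function
    return sum(per_sample) / len(per_sample)
```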
I think I'm a little confused about where this PR is currently supposed to leave us in terms of how metrics work. People still provide a metric + aggregation function, but the actual metric function is now silently more of an aggregation, sometimes? I think we should go with one of the following:
What do you think? |
self._aggregation_list[metric_name] = metric_config[
    "aggregation"
]
assert isinstance(metric_name, str)
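For context, the quoted diff registers each metric's aggregation keyed by metric name while reading the task config. A minimal, simplified sketch of that registration loop (the class shape and config keys here are assumptions based on the snippet, not the actual task.py):

```python
# Simplified sketch of config-driven metric registration, inferred
# from the diff above; the Task class and config keys are assumed.
class Task:
    def __init__(self, metric_configs):
        self._aggregation_list = {}
        for metric_config in metric_configs:
            metric_name = metric_config["metric"]
            assert isinstance(metric_name, str)
            self._aggregation_list[metric_name] = metric_config["aggregation"]

task = Task([{"metric": "acc", "aggregation": "mean"}])
```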
I think we should try to minimize the amount of metric handling code that is in the task.py file if possible
I don't have a strong preference for it to be in task.py, but currently I think it's the most relevant place, since the metric is attached to the task through its config. Maybe an intermediate process between loading the config and the task could work?
This PR becomes mostly tidying up the metrics and aggregation. For the metrics part, I ended up making it so that most of the pass-through functions could be removed. The way it works is that, by default, metrics are run at the aggregation level, but it stays backwards compatible with the current implementation of the metric being run in
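The backwards-compatible dispatch described above could be sketched as follows: by default the metric is called once on all items and returns the final score, while the legacy path applies it per sample and aggregates separately. The function and parameter names here are hypothetical, not the PR's actual code.

```python
# Hypothetical sketch of the backwards-compatible dispatch: run the
# metric at the aggregation level by default, or fall back to the
# legacy per-sample metric + separate aggregation. Names are assumed.
def compute(metric_fn, items, aggregation=None, dataset_level=True):
    if dataset_level:
        # new default: metric_fn sees all items, returns the final score
        return metric_fn(items)
    # legacy path: score each item, then aggregate separately
    scores = [metric_fn(item) for item in items]
    return aggregation(scores)
```

This keeps existing per-sample metrics working unchanged while letting new metrics own their aggregation.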
I think this could work, but at the risk of over-abstraction?
I think this would be more in line with how most metrics work. But the issue is which parameters we want to share from the