-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementing Anthropic's discrimination evaluation #2072
Comments
Based on my understanding, LM-eval-harness is not able to do the cross-group analysis required for Anthropic’s discrim eval. Each metric is applied to each individual prompt, and then it is aggregated in a way that doesn’t account for differences in prompts. For now, I'm “rescuing” all the results and saving them so I can process them outside lm-eval-harness (shown in code above). |
A bit hacky, but maybe you can pass your results from metric computation as a tuple (logit_diff, doc grouping id, {any other info?}) and have a custom aggregation aggregate across each group and report the final aggregated score? I'd like to make this possible to implement--will try to take a closer look asap. |
Makes sense. I may also want to report different results for various groups (e.g. race, gender, etc.), while my impression was that an aggregation returns a single number |
Implementing Anthropic's discrimination evaluation requires evaluating logit differences among groups like age, gender, race, etc. This seems difficult to implement with "metrics" and "aggregation", as there doesn't seem to be a way to have the age/gender/race information leak through.
Is there something I'm missing about lm-eval-harness's features that would allow for an easier implementation?
YAML file:
Utils.py file
The text was updated successfully, but these errors were encountered: