Issues: EleutherAI/lm-evaluation-harness

Issues list

Run Gemma LM in Huggingface (simple patch) | Labels: feature request
#1455 by haileyschoelkopf was closed Feb 26, 2024
2 tasks
Display n-shot better for groups and for hardcoded-fewshot tasks | Labels: feature request
#1360 by haileyschoelkopf was closed Feb 1, 2024
[New Task] Upstream remaining Okapi multilingual tasks | Labels: feature request, good first issue, help wanted
#1244 by haileyschoelkopf was closed Feb 21, 2024
Add Logits to OpenAI ChatCompletions model | Labels: declined, feature request, help wanted
#1196 by haileyschoelkopf was closed May 23, 2024
[New Task] Paloma Eval Suite | Labels: feature request, good first issue, help wanted
#1176 by haileyschoelkopf was closed Jun 19, 2024
Unify "metric" and "aggregation" abstractions | Labels: feature request
#1158 by haileyschoelkopf was closed Mar 15, 2024
Print "higher_is_better" in results table | Labels: feature request, good first issue, help wanted
#1153 by haileyschoelkopf was closed Jun 3, 2024
Add --predict_only mode (run without scoring outputs) | Labels: feature request, help wanted
#1152 by haileyschoelkopf was closed Jan 31, 2024
Support wrapping prompts with a given Chat Template | Labels: feature request, help wanted, opinions wanted
#1098 by haileyschoelkopf was closed Jun 11, 2024 v0.4.3
Upstream Mamba integration | Labels: feature request
#1085 by haileyschoelkopf was closed Dec 22, 2023
[New Task] SIQA | Labels: feature request, good first issue, help wanted
#1027 by haileyschoelkopf was closed Nov 28, 2023
[New Task Request] IFEval / Instruction-Following Eval | Labels: feature request, good first issue, help wanted
#1012 by haileyschoelkopf was closed Feb 9, 2024
[New Task] Implement GPQA dataset | Labels: feature request, good first issue, help wanted
#1010 by haileyschoelkopf was closed Mar 5, 2024
[Refactor] Allow for some tasks to force zero-shot | Labels: feature request
#962 by haileyschoelkopf was closed Nov 29, 2023 v0.4.0
[Refactor] Revamp Testing / CI pipeline | Labels: feature request, help wanted
#656 by haileyschoelkopf was closed Mar 4, 2024
2 of 6 tasks
Temperature Sampling + Maj@K
#387 by haileyschoelkopf was closed Feb 9, 2023
Add Anthropic Model-written Eval datasets to harness? | Labels: feature request, good first issue
#375 by haileyschoelkopf was closed Nov 8, 2023
2 tasks