Issues: EleutherAI/lm-evaluation-harness

Issues list

Organize / Cleanup Logging + Levels (labels: documentation, feature request)
#1192 opened Dec 21, 2023 by haileyschoelkopf
Update Zeno Integration (labels: feature request)
#1175 opened Dec 20, 2023 by haileyschoelkopf
4 tasks
process_results() operating on the dataset, not example, level (labels: feature request, good first issue)
#1163 opened Dec 18, 2023 by StellaAthena
Revamp + automate Task Table documentation feature (labels: documentation, feature request, good first issue, help wanted)
#1160 opened Dec 18, 2023 by haileyschoelkopf
3 tasks
[Discussion] Add Major Code Benchmarks (labels: opinions wanted)
#1157 opened Dec 18, 2023 by haileyschoelkopf
6 tasks
Upstream Llemma Math Task Suite (labels: feature request)
#1151 opened Dec 18, 2023 by haileyschoelkopf
Support for top-k metrics with num_return_sequences>1 (labels: feature request)
#1117 opened Dec 13, 2023 by yuyemin
Refactor main evaluate() loop into more readable sub-functions (labels: documentation, feature request)
#1100 opened Dec 11, 2023 by haileyschoelkopf
Add ZeroScrolls Benchmark (labels: feature request, good first issue, help wanted)
#1083 opened Dec 8, 2023 by haileyschoelkopf
Add task variants replicating Llama 1 / 2 evaluation numbers (labels: feature request)
#1078 opened Dec 7, 2023 by haileyschoelkopf
A new DROP benchmark is needed (labels: opinions wanted)
#1050 opened Nov 30, 2023 by StellaAthena
HuggingFace model prompt formatting (labels: feature request)
#1209 opened Dec 24, 2023 by daniel-furman v0.4.3
Stability Upstream translated task (labels: feature request)
#1006 opened Nov 20, 2023 by StellaAthena
chatglm2 acc=0 on lambada_openai dataset, is it correct? (labels: bug)
#959 opened Nov 2, 2023 by changwangss
TGI support - API evaluation of HF models (labels: feature request, help wanted)
#869 opened Sep 19, 2023 by ManuelFay
RACE dataset error? (labels: bug)
#835 opened Sep 6, 2023 by RanchiZhao
Pile tasks on big-refactor use dataset_names from old dataset loader that don't exist on HF (labels: bug, good first issue, help wanted)
#731 opened Aug 3, 2023 by yeoedward
Issue when Evaluating ChatGLM
#705 opened Jul 26, 2023 by Tardissss