Issues: EleutherAI/lm-evaluation-harness
assert len(continuation_enc) error in _loglikelihood_tokens for certain (but not all) tasks?
#1053
opened Dec 2, 2023 by
lhl
Organize / Cleanup Logging + Levels
documentation
Improvements or additions to documentation.
feature request
A feature that isn't implemented yet.
#1192
opened Dec 21, 2023 by
haileyschoelkopf
Update Zeno Integration
feature request
A feature that isn't implemented yet.
#1175
opened Dec 20, 2023 by
haileyschoelkopf
4 tasks
process_results() operating on the dataset, not example, level
feature request
#1163
opened Dec 18, 2023 by
StellaAthena
Revamp + automate Task Table documentation feature
documentation
Improvements or additions to documentation.
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1160
opened Dec 18, 2023 by
haileyschoelkopf
3 tasks
[Discussion] Add Major Code Benchmarks
opinions wanted
For discussing open questions.
#1157
opened Dec 18, 2023 by
haileyschoelkopf
6 tasks
Upstream Llemma Math Task Suite
feature request
A feature that isn't implemented yet.
#1151
opened Dec 18, 2023 by
haileyschoelkopf
Support for top-k metrics with num_return_sequences>1
feature request
A feature that isn't implemented yet.
#1117
opened Dec 13, 2023 by
yuyemin
Refactor main evaluate() loop into more readable sub-functions
documentation
Improvements or additions to documentation.
feature request
A feature that isn't implemented yet.
#1100
opened Dec 11, 2023 by
haileyschoelkopf
Add ZeroScrolls Benchmark
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1083
opened Dec 8, 2023 by
haileyschoelkopf
Add task variants replicating Llama 1 / 2 evaluation numbers
feature request
A feature that isn't implemented yet.
#1078
opened Dec 7, 2023 by
haileyschoelkopf
FileNotFoundError: Couldn't find a module script at exact_match.py. Module 'exact_match' doesn't exist on the Hugging Face Hub either.
bug
Something isn't working.
#1071
opened Dec 6, 2023 by
xinghuang2050
A new DROP benchmark is needed
opinions wanted
For discussing open questions.
#1050
opened Nov 30, 2023 by
StellaAthena
The tokenizer add_special_tokens parameter for t5 model lambada task
#1017
opened Nov 22, 2023 by
daisyden
Stability Upstream translated task
feature request
A feature that isn't implemented yet.
#1006
opened Nov 20, 2023 by
StellaAthena
toxigen task measures toxicity classification rather than whether generations are toxic?
#974
opened Nov 8, 2023 by
laphang
chatglm2 acc=0 on lambada_openai dataset, is it correct?
bug
Something isn't working.
#959
opened Nov 2, 2023 by
changwangss
TGI support - API evaluation of HF models
feature request
A feature that isn't implemented yet.
help wanted
Contributors and extra help welcome.
#869
opened Sep 19, 2023 by
ManuelFay
Pile tasks on big-refactor use dataset_names from old dataset loader that don't exist on HF
bug
Something isn't working.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#731
opened Aug 3, 2023 by
yeoedward