Issues: EleutherAI/lm-evaluation-harness
MNLI task giving (very) different results than the HuggingFace task accuracy metric
bug
Something isn't working.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#320
opened May 8, 2022 by
JunShern
updated Apr 30, 2023
Central repository for results from running the evaluations
#662
opened Jul 6, 2023 by
c1505
updated Sep 27, 2023
Should num_fewshot be type list?
feature request
A feature that isn't implemented yet.
#837
opened Sep 6, 2023 by
Wehzie
updated Oct 14, 2023
"RuntimeError: CUDA out of memory" on lm-eval 0.3.0 through GPT-NeoX evaluate past a certain number of nodes
bug
Something isn't working.
duplicate
This issue or pull request already exists.
help wanted
Contributors and extra help welcome.
#884
opened Sep 23, 2023 by
AIproj
updated Oct 17, 2023
RACE dataset error?
bug
Something isn't working.
#835
opened Sep 6, 2023 by
RanchiZhao
updated Nov 2, 2023
"Please select a token to use as pad_token" error for alpaca-lora-7b model
#434
opened Apr 24, 2023 by
oshev
updated Nov 8, 2023
TGI support - API evaluation of HF models
feature request
A feature that isn't implemented yet.
help wanted
Contributors and extra help welcome.
#869
opened Sep 19, 2023 by
ManuelFay
updated Nov 8, 2023
[New Task] COLLIE
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1013
opened Nov 21, 2023 by
haileyschoelkopf
updated Nov 21, 2023
A new DROP benchmark is needed
opinions wanted
For discussing open questions.
#1050
opened Nov 30, 2023 by
StellaAthena
updated Dec 8, 2023
Add ZeroScrolls Benchmark
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1083
opened Dec 8, 2023 by
haileyschoelkopf
updated Dec 8, 2023
Verify Stopsequences Don't Impact Scores
validation
For validation of task implementations.
#1086
opened Dec 9, 2023 by
haileyschoelkopf
updated Dec 9, 2023
Async support for OpenAI ChatCompletions
feature request
A feature that isn't implemented yet.
#1095
opened Dec 11, 2023 by
haileyschoelkopf
updated Dec 12, 2023
Implement TyDiQA
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#193
opened Jun 10, 2021 by
sdtblck
updated Dec 14, 2023
Is it possible to use BLEU with multiple references?
#1125
opened Dec 14, 2023 by
juliafalcao
updated Dec 14, 2023
Upstream Llemma Math Task Suite
feature request
A feature that isn't implemented yet.
#1151
opened Dec 18, 2023 by
haileyschoelkopf
updated Dec 18, 2023
process_results() operating on the dataset, not example, level
feature request
#1163
opened Dec 18, 2023 by
StellaAthena
updated Dec 18, 2023
chatglm2 acc=0 on lambada_openai dataset, is it correct?
bug
Something isn't working.
#959
opened Nov 2, 2023 by
changwangss
updated Dec 22, 2023
Revamp + automate Task Table documentation feature
documentation
Improvements or additions to documentation.
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1160
opened Dec 18, 2023 by
haileyschoelkopf
updated Dec 22, 2023
3 tasks
Documentation about tasks <> request types mapping
#1202
opened Dec 22, 2023 by
anjor
updated Dec 22, 2023
Update Zeno Integration
feature request
A feature that isn't implemented yet.
#1175
opened Dec 20, 2023 by
haileyschoelkopf
updated Dec 28, 2023
4 tasks
More Flexible Answer Extraction Code
feature request
A feature that isn't implemented yet.
#1159
opened Dec 18, 2023 by
haileyschoelkopf
updated Jan 1, 2024
CoQA's implementation only predicts the last answer of each text
bug
Something isn't working.
good first issue
Good for newcomers
#1231
opened Jan 1, 2024 by
glerzing
updated Jan 1, 2024