Issues: EleutherAI/lm-evaluation-harness
#1013 [New Task] COLLIE. Labels: feature request, good first issue, help wanted. Opened Nov 21, 2023 by haileyschoelkopf.
#193 Implement TyDiQA. Labels: feature request, good first issue, help wanted. Opened Jun 10, 2021 by sdtblck.
#320 MNLI task giving (very) different results than the HuggingFace task accuracy metric. Labels: bug, good first issue, help wanted. Opened May 8, 2022 by JunShern.
#827 Quac Dataset. Labels: feature request, good first issue. Opened Sep 4, 2023 by RanchiZhao.
#837 Should num_fewshot be type list? Labels: feature request. Opened Sep 6, 2023 by Wehzie.
#869 TGI support - API evaluation of HF models. Labels: feature request, help wanted. Opened Sep 19, 2023 by ManuelFay.
#959 chatglm2 acc=0 on lambada_openai dataset, is it correct? Labels: bug. Opened Nov 2, 2023 by changwangss.
#974 toxigen task measures toxicity classification rather than whether generations are toxic? Opened Nov 8, 2023 by laphang.
#191 Implement XQuAD. Labels: feature request, good first issue, help wanted. Opened Jun 10, 2021 by sdtblck.
#1017 The tokenizer add_special_tokens parameter for t5 model lambada task. Opened Nov 22, 2023 by daisyden.
#1050 A new DROP benchmark is needed. Labels: opinions wanted. Opened Nov 30, 2023 by StellaAthena.
#1083 Add ZeroScrolls Benchmark. Labels: feature request, good first issue, help wanted. Opened Dec 8, 2023 by haileyschoelkopf.
#1086 Verify Stopsequences Don't Impact Scores. Labels: validation. Opened Dec 9, 2023 by haileyschoelkopf.
#1095 Async support for OpenAI ChatCompletions. Labels: feature request. Opened Dec 11, 2023 by haileyschoelkopf.
#1157 [Discussion] Add Major Code Benchmarks. Labels: opinions wanted. Opened Dec 18, 2023 by haileyschoelkopf. Task list: 6 tasks.
#1160 Revamp + automate Task Table documentation feature. Labels: documentation, feature request, good first issue, help wanted. Opened Dec 18, 2023 by haileyschoelkopf. Task list: 3 tasks.
#1163 process_results() operating on the dataset, not example, level. Labels: feature request. Opened Dec 18, 2023 by StellaAthena.