Issues: EleutherAI/lm-evaluation-harness
Issues list
always get acc,acc_norm, perplexity =1 on triviaqa task based on llama2 model
bug
Something isn't working.
#1239
opened Jan 3, 2024 by
learner-crapy
updated Jan 3, 2024
Organize / Cleanup Logging + Levels
documentation
Improvements or additions to documentation.
feature request
A feature that isn't implemented yet.
#1192
opened Dec 21, 2023 by
haileyschoelkopf
updated Jan 5, 2024
Only a single filtered_resps is logged for repeat > 1 for each sample
bug
Something isn't working.
#1232
opened Jan 1, 2024 by
baberabb
updated Jan 8, 2024
Implement MLQA
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#192
opened Jun 10, 2021 by
sdtblck
updated Jan 15, 2024
Implement XQuAD
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#191
opened Jun 10, 2021 by
sdtblck
updated Jan 15, 2024
wrong regular expression for exact match scoring
#1303
opened Jan 17, 2024 by
Hannibal046
updated Jan 17, 2024
AssertionError: len(continuation_enc) > 0
bug
Something isn't working.
#1297
opened Jan 16, 2024 by
pszemraj
updated Jan 19, 2024
assert len(continuation_enc) error in _loglikelihood_tokens for certain (but not all) tasks?
#1053
opened Dec 2, 2023 by
lhl
updated Jan 21, 2024
NAN value for truthfulqa_mc2 on full finetuned model TinyLlama
#1340
opened Jan 23, 2024 by
hahmad2008
updated Jan 24, 2024
Different fewshot with and w/o accelerate DDP
#1308
opened Jan 17, 2024 by
baberabb
updated Jan 28, 2024
Speed up + streamline prompt template rendering runtime
feature request
A feature that isn't implemented yet.
help wanted
Contributors and extra help welcome.
#1286
opened Jan 15, 2024 by
haileyschoelkopf
updated Jan 29, 2024
How to compute the perplexity only on the answer?
asking questions
For asking for clarification / support on library usage.
#1370
opened Jan 30, 2024 by
Luobots
updated Jan 30, 2024
Request for files to be placed in 'path/containing/training/set/ngrams'.
#1375
opened Jan 31, 2024 by
dsdanielpark
updated Jan 31, 2024
Hello, I would like to know if there is a method to use "generate_until" to evaluate on the ceval or cmmlu dataset. I'm using a chat model, which adds a prompt template to make it answer questions. However, the model's answer choices (like A, B, C, D) may not necessarily be the first generated token.
#1362
opened Jan 27, 2024 by
noforit
updated Feb 1, 2024
Refactor main evaluate() loop into more readable sub-functions
documentation
Improvements or additions to documentation.
feature request
A feature that isn't implemented yet.
#1100
opened Dec 11, 2023 by
haileyschoelkopf
updated Feb 6, 2024
refactored v0.4 version shows differences to the existing harness in Japanese.
#1392
opened Feb 4, 2024 by
leocnj
updated Feb 7, 2024
KeyError on some metrics from huggingface/evaluate
bug
Something isn't working.
#1302
opened Jan 17, 2024 by
alexrs
updated Feb 7, 2024
NotADirectoryError on dataset headqa_en
bug
#1428
opened Feb 13, 2024 by
RylanSchaeffer
updated Feb 14, 2024
Speed up openai API calls
feature request
A feature that isn't implemented yet.
#1410
opened Feb 8, 2024 by
Some-random
updated Feb 15, 2024
how to add tasks with requests based on the answers for the previous requests?
#1432
opened Feb 16, 2024 by
artemorloff
updated Feb 17, 2024
Support for top-k metrics with num_return_sequences>1
feature request
A feature that isn't implemented yet.
#1117
opened Dec 13, 2023 by
yuyemin
updated Feb 18, 2024
using additional scoring model to process results
#1439
opened Feb 18, 2024 by
artemorloff
updated Feb 18, 2024
Pile tasks on big-refactor use dataset_names from old dataset loader that don't exist on HF
bug
Something isn't working.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#731
opened Aug 3, 2023 by
yeoedward
updated Feb 19, 2024