Issues: EleutherAI/lm-evaluation-harness

Issues list

always get acc,acc_norm, perplexity =1 on triviaqa task based on llama2 model (label: bug)
#1239 opened Jan 3, 2024 by learner-crapy updated Jan 3, 2024
Organize / Cleanup Logging + Levels (labels: documentation, feature request)
#1192 opened Dec 21, 2023 by haileyschoelkopf updated Jan 5, 2024
Only a single filtered_resps is logged for repeat > 1 for each sample (label: bug)
#1232 opened Jan 1, 2024 by baberabb updated Jan 8, 2024
Implement MLQA (labels: feature request, good first issue, help wanted)
#192 opened Jun 10, 2021 by sdtblck updated Jan 15, 2024
Implement XQuAD (labels: feature request, good first issue, help wanted)
#191 opened Jun 10, 2021 by sdtblck updated Jan 15, 2024
wrong regular expression for exact match scoring
#1303 opened Jan 17, 2024 by Hannibal046 updated Jan 17, 2024
Can't find task table in docs?
#1319 opened Jan 19, 2024 by macabdul9 updated Jan 19, 2024
AssertionError: len(continuation_enc) > 0 (label: bug)
#1297 opened Jan 16, 2024 by pszemraj updated Jan 19, 2024
NAN value for truthfulqa_mc2 on full finetuned model TinyLlama
#1340 opened Jan 23, 2024 by hahmad2008 updated Jan 24, 2024
Different fewshot with and w/o accelerate DDP
#1308 opened Jan 17, 2024 by baberabb updated Jan 28, 2024
Speed up + streamline prompt template rendering runtime (labels: feature request, help wanted)
#1286 opened Jan 15, 2024 by haileyschoelkopf updated Jan 29, 2024
How to compute the perplexity only on the answer? (label: asking questions)
#1370 opened Jan 30, 2024 by Luobots updated Jan 30, 2024
Refactor main evaluate() loop into more readable sub-functions (labels: documentation, feature request)
#1100 opened Dec 11, 2023 by haileyschoelkopf updated Feb 6, 2024
Low results on TriviaQA
#1292 opened Jan 16, 2024 by yafuly updated Feb 7, 2024
KeyError on some metrics from huggingface/evaluate (label: bug)
#1302 opened Jan 17, 2024 by alexrs updated Feb 7, 2024
NotADirectoryError on dataset headqa_en (label: bug)
#1428 opened Feb 13, 2024 by RylanSchaeffer updated Feb 14, 2024
Speed up openai API calls (label: feature request)
#1410 opened Feb 8, 2024 by Some-random updated Feb 15, 2024
Support for top-k metrics with num_return_sequences>1 (label: feature request)
#1117 opened Dec 13, 2023 by yuyemin updated Feb 18, 2024
using additional scoring model to process results
#1439 opened Feb 18, 2024 by artemorloff updated Feb 18, 2024
Pile tasks on big-refactor use dataset_names from old dataset loader that don't exist on HF (labels: bug, good first issue, help wanted)
#731 opened Aug 3, 2023 by yeoedward updated Feb 19, 2024