Issues: EleutherAI/lm-evaluation-harness
filter <class 'utils.WordSortFilter'> is not registered!
#2108
opened Jul 16, 2024 by
Mr-Nobody1
updated Jul 16, 2024
The evaluation results are inconsistent across different GPUs
#2077
opened Jul 8, 2024 by
DonteFlynn
updated Jul 15, 2024
Task description newline characters removed by Jinja templating, affecting generated requests and performance
#1817
opened May 9, 2024 by
ma0li
updated Jul 15, 2024
Question: Realtoxicityprompts takes >10 seconds per query, is this expected behavior?
#2096
opened Jul 12, 2024 by
meg-huggingface
updated Jul 12, 2024
How to RUN BENCHMARK on GGUF Models ?
#2086
opened Jul 10, 2024 by
RakshitAralimatti
updated Jul 10, 2024
evaluation extremely slow with llama_cpp/gguf
bug
Something isn't working.
#1472
opened Feb 26, 2024 by
mobicham
updated Jul 9, 2024
Implementing Anthropic's discrimination evaluation
#2072
opened Jul 5, 2024 by
notrichardren
updated Jul 9, 2024
max_new_tokens and max_length conflict
#2070
opened Jul 5, 2024 by
meg-huggingface
updated Jul 8, 2024
The response is too short to extract answer on GPQA. What should I set to extend it?
#2081
opened Jul 8, 2024 by
URRealHero
updated Jul 8, 2024
Using Language Models as Evaluators
feature request
A feature that isn't implemented yet.
#1831
opened May 13, 2024 by
lintangsutawika
updated Jul 6, 2024
Running evaluation on Gemma-2 27B model
#2063
opened Jul 4, 2024 by
zeynepgulhanuslu
updated Jul 4, 2024
coqa not working
bug
Something isn't working.
#1529
opened Mar 5, 2024 by
lchu-ibm
updated Jun 29, 2024
Inconsistent evaluation results with Chat Template
#1841
opened May 14, 2024 by
shiweijiezero
updated Jun 28, 2024
Long time testing Qwen2-72B
bug
Something isn't working.
#1984
opened Jun 18, 2024 by
djstrong
updated Jun 28, 2024
Multiple issues Encountered During Tasks Verification
#1885
opened May 25, 2024 by
zhabuye
updated Jun 28, 2024