Issues: EleutherAI/lm-evaluation-harness
What is the output_type in the metric for?
#1976
opened Jun 17, 2024 by
dennisrall
updated Jun 19, 2024
Incorrect Multilingual arc implementation
#2000
opened Jun 19, 2024 by
hynky1999
updated Jun 19, 2024
Add HuggingFace Text Generation Inference Support
#2001
opened Jun 19, 2024 by
taoari
updated Jun 19, 2024
Add support to azure-openai deployed models
#1733
opened Apr 22, 2024 by
bcarvalho-via
updated Jun 23, 2024
Wrong calculation of score when there are ties?
#2007
opened Jun 21, 2024 by
apohllo
updated Jun 24, 2024
Unbelievably long time when hosting the gguf model?
#1971
opened Jun 16, 2024 by
hzgdeerHo
updated Jun 25, 2024
Compatibility with Models from PyReft Library
#2012
opened Jun 23, 2024 by
crux82
updated Jun 25, 2024
toxigen task measures toxicity classification rather than whether generations are toxic?
#974
opened Nov 8, 2023 by
laphang
updated Jun 26, 2024
Multiple issues Encountered During Tasks Verification
#1885
opened May 25, 2024 by
zhabuye
updated Jun 28, 2024
Long time testing Qwen2-72B
bug
#1984
opened Jun 18, 2024 by
djstrong
updated Jun 28, 2024
Inconsistent evaluation results with Chat Template
#1841
opened May 14, 2024 by
shiweijiezero
updated Jun 28, 2024
coqa not working
bug
#1529
opened Mar 5, 2024 by
lchu-ibm
updated Jun 29, 2024
Running evaluation on Gemma-2 27B model
#2063
opened Jul 4, 2024 by
zeynepgulhanuslu
updated Jul 4, 2024
Using Language Models as Evaluators
feature request
#1831
opened May 13, 2024 by
lintangsutawika
updated Jul 6, 2024
The response is too short to extract answer on GPQA. What should I set to extend it?
#2081
opened Jul 8, 2024 by
URRealHero
updated Jul 8, 2024
max_new_tokens and max_length conflict
#2070
opened Jul 5, 2024 by
meg-huggingface
updated Jul 8, 2024
Implementing Anthropic's discrimination evaluation
#2072
opened Jul 5, 2024 by
notrichardren
updated Jul 9, 2024