Issues: EleutherAI/lm-evaluation-harness
Stability Upstream translated task
#1006; opened Nov 20, 2023 by StellaAthena; updated Feb 19, 2024; label: feature request (A feature that isn't implemented yet.)
Task assigned to only one group when multiple groups are run
#1436; opened Feb 17, 2024 by baberabb; updated Feb 19, 2024; label: bug (Something isn't working.)
janitor_util C++ splits multibyte characters into non-UTF bytes(?)
#1452; opened Feb 21, 2024 by mycoalchen; updated Feb 21, 2024
log_samples File name too long. Need truncation or override
#1454; opened Feb 21, 2024 by ryxli; updated Feb 23, 2024; labels: feature request, good first issue (Good for newcomers), help wanted (Contributors and extra help welcome.)
Issue with bigbench_gender_inclusive_sentences_german_multiple_choice
#1473; opened Feb 26, 2024 by ayulockin; updated Feb 26, 2024
Truncation
#1426; opened Feb 13, 2024 by mdocekal; updated Feb 26, 2024; label: asking questions (For asking for clarification / support on library usage.)
Acc vs acc_norm
#1396; opened Feb 5, 2024 by sam-paech; updated Feb 26, 2024; label: asking questions
How can we do evaluation few-shot in Chat-template?
#1490; opened Feb 28, 2024 by srn-source; updated Feb 28, 2024
Output format of samples has been changed
#1493; opened Feb 28, 2024 by christyler3030; updated Feb 28, 2024; label: bug
concurrent api request to accelerate evaluation
#1504; opened Mar 1, 2024 by jordane95; updated Mar 1, 2024; label: feature request
ValueError: Tasks not found: persona_desire-for-acquiring-eval-results.
#1512; opened Mar 3, 2024 by RylanSchaeffer; updated Mar 3, 2024
Run pawsx task got "TypeError: 'NoneType' object cannot be interpreted as an integer" error.
#1539; opened Mar 7, 2024 by weizhixiaoyi; updated Mar 7, 2024; label: bug
Whitespace before label in MultipleChoiceTask causes wrong label probability prediction
#1556; opened Mar 11, 2024 by RibinMTC; updated Mar 11, 2024
Implement the SuperGLUE evaluation
#22; opened Sep 16, 2020 by StellaAthena; updated Mar 11, 2024; label: feature request; 1 of 2 tasks completed
Expose Configuration Options for Perplexity calculations
#1565; opened Mar 12, 2024 by haileyschoelkopf; updated Mar 12, 2024; label: feature request
(Question) How can I fully utilize the number of cores in my CPU?
#1576; opened Mar 14, 2024 by WCSY-YG; updated Mar 14, 2024
Sanity checking the semantic meaning of "perplexity" in code
#1581; opened Mar 15, 2024 by RylanSchaeffer; updated Mar 15, 2024; label: asking questions
Make Adding New MCQA Metrics Easier
#1585; opened Mar 15, 2024 by haileyschoelkopf; updated Mar 15, 2024; label: feature request
Add task variants replicating Llama 1 / 2 evaluation numbers
#1078; opened Dec 7, 2023 by haileyschoelkopf; updated Mar 16, 2024; label: feature request