Issues: EleutherAI/lm-evaluation-harness
Check compatibility of local-completions with VLLM (returns logits) for multiple_choice tasks
bug
Something isn't working.
#1949
opened Jun 11, 2024 by
haileyschoelkopf
updated Jun 11, 2024
Add MMLU-Pro Dataset
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1947
opened Jun 11, 2024 by
haileyschoelkopf
updated Jun 11, 2024
Add Regression Testing
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1883
opened May 24, 2024 by
haileyschoelkopf
updated May 31, 2024
[Discussion] Add Major Code Benchmarks
opinions wanted
For discussing open questions.
#1157
opened Dec 18, 2023 by
haileyschoelkopf
updated May 28, 2024
6 tasks
Add a docs FAQ section
documentation
Improvements or additions to documentation.
#1676
opened Apr 5, 2024 by
haileyschoelkopf
updated May 27, 2024
Add New Lambada Translations
good first issue
Good for newcomers
#1501
opened Feb 29, 2024 by
haileyschoelkopf
updated May 27, 2024
Allow Task objects to defer dataset download
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1558
opened Mar 11, 2024 by
haileyschoelkopf
updated May 22, 2024
Allow --include_path to import an externally-defined LM subclass
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1457
opened Feb 22, 2024 by
haileyschoelkopf
updated May 15, 2024
[Discussion/Feedback] VLM + Multimodal benchmarking
opinions wanted
For discussing open questions.
#1155
opened Dec 18, 2023 by
haileyschoelkopf
updated May 13, 2024
Add More Tests
feature request
A feature that isn't implemented yet.
#1827
opened May 12, 2024 by
haileyschoelkopf
updated May 12, 2024
New Task Request: LegalBench
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1754
opened Apr 26, 2024 by
haileyschoelkopf
updated Apr 26, 2024
[New Task] CommonsenseQA
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1026
opened Nov 27, 2023 by
haileyschoelkopf
updated Apr 17, 2024
Better Document Data-Parallel interface / clean it up
feature request
A feature that isn't implemented yet.
#1684
opened Apr 7, 2024 by
haileyschoelkopf
updated Apr 7, 2024
Cleanup Dependencies Further
feature request
A feature that isn't implemented yet.
#1683
opened Apr 7, 2024 by
haileyschoelkopf
updated Apr 7, 2024
Add better test coverage for models
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1613
opened Mar 20, 2024 by
haileyschoelkopf
updated Apr 7, 2024
Add docstring for HFLM's many keyword args
documentation
Improvements or additions to documentation.
feature request
A feature that isn't implemented yet.
good first issue
Good for newcomers
help wanted
Contributors and extra help welcome.
#1682
opened Apr 7, 2024 by
haileyschoelkopf
updated Apr 7, 2024
Add nemo LM class to table of supported models / libraries
documentation
Improvements or additions to documentation.
#1681
opened Apr 7, 2024 by
haileyschoelkopf
updated Apr 7, 2024
Add alternate (configurable) launcher / orchestration + sweep functionality
#1622
opened Mar 22, 2024 by
haileyschoelkopf
updated Mar 22, 2024
Make managing task variants / subversions easier
feature request
A feature that isn't implemented yet.
#1602
opened Mar 18, 2024 by
haileyschoelkopf
updated Mar 18, 2024
Add task variants replicating Llama 1 / 2 evaluation numbers
feature request
A feature that isn't implemented yet.
#1078
opened Dec 7, 2023 by
haileyschoelkopf
updated Mar 16, 2024
Make Adding New MCQA Metrics Easier
feature request
A feature that isn't implemented yet.
#1585
opened Mar 15, 2024 by
haileyschoelkopf
updated Mar 15, 2024
Expose Configuration Options for Perplexity calculations
feature request
A feature that isn't implemented yet.
#1565
opened Mar 12, 2024 by
haileyschoelkopf
updated Mar 12, 2024
Refactor main evaluate() loop into more readable sub-functions
documentation
Improvements or additions to documentation.
feature request
A feature that isn't implemented yet.
#1100
opened Dec 11, 2023 by
haileyschoelkopf
updated Feb 6, 2024
Speed up + streamline prompt template rendering runtime
feature request
A feature that isn't implemented yet.
help wanted
Contributors and extra help welcome.
#1286
opened Jan 15, 2024 by
haileyschoelkopf
updated Jan 29, 2024
Organize / Cleanup Logging + Levels
documentation
Improvements or additions to documentation.
feature request
A feature that isn't implemented yet.
#1192
opened Dec 21, 2023 by
haileyschoelkopf
updated Jan 5, 2024