EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

[Discussion] Add Major Code Benchmarks

#1157 opened Dec 18, 2023 by haileyschoelkopf

Open 4

37 Open 46 Closed

Issues list

How to evaluate very large models (>= 70b) ?

#2193 opened Aug 6, 2024 by eldarkurtic

After executing tmmluplus, there are no group scores displayed, only the scores for each individual task are shown.

#2164 opened Jul 31, 2024 by zhuangyuan123

API model: Evaluation fails when all samples are cached

#2141 opened Jul 27, 2024 by baberabb

ValueError when task name collide with local directory names

#2122 opened Jul 20, 2024 by alat-rights

Issues with BBH benchmark

#2095 opened Jul 12, 2024 by berkatil

Having issues with MMLU benchmark

#2094 opened Jul 12, 2024 by berkatil

Implementing Anthropic's discrimination evaluation

#2072 opened Jul 5, 2024 by notrichardren

max_new_tokens and max_length conflict

#2070 opened Jul 5, 2024 by meg-huggingface

Fix partial caching of openai models

#1997 opened Jun 19, 2024 by ciaranby

Multiple issues Encountered During Tasks Verification

#1885 opened May 25, 2024 by zhabuye

How to use Zeno

#1842 opened May 14, 2024 by DavidAdamczyk

eval gsm8k from local dataset folder with the bug info "ValueError: BuilderConfig 'main' not found."

#1829 opened May 12, 2024 by Jp-17

TypeError: 'NoneType' object is not iterable when using cache and loglikelihood_rolling

#1821 opened May 10, 2024 by mdocekal

ValueError: BuilderConfig 'pile_freelaw' not found., issue on running PILE eval

#1714 opened Apr 16, 2024 by Harryalways317

Addition of BedrockChatModel

#1708 opened Apr 16, 2024 by jacquelinegarrahan

List of until tokens in cli args not supported (label: feature request)

#1677 opened Apr 5, 2024 by tianyi-chen

Add a docs FAQ section (label: documentation)

#1676 opened Apr 5, 2024 by haileyschoelkopf

Clarification on API Endpoint: /v1/completions vs /v1/chat/completions

#1637 opened Mar 26, 2024 by gerayking

Allow registering custom LM implementations without requiring loading and modifying lm_eval code (label: bug)

#1621 opened Mar 22, 2024 by apetrov-msk

Negative perplexity values (label: asking questions)

#1595 opened Mar 17, 2024 by shikhar-srivastava

Make Adding New MCQA Metrics Easier (label: feature request)

#1585 opened Mar 15, 2024 by haileyschoelkopf

When using parallelize=True, raise Runtime Error: expected all tensors to be on the same device (label: bug)

#1575 opened Mar 14, 2024 by feiba54

coqa not working (label: bug)

#1529 opened Mar 5, 2024 by lchu-ibm

Add New Lambada Translations (label: good first issue)

#1501 opened Feb 29, 2024 by haileyschoelkopf

llama / gguf interface broken? (label: bug)

#1437 opened Feb 17, 2024 by Nold360
