EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

[Discussion] Add Major Code Benchmarks

#1157 opened Dec 18, 2023 by haileyschoelkopf

Open 4

0 Open 718 Closed

Issues list

Add Logits to OpenAI ChatCompletions model

Labels: declined (a proposed dataset or feature request that will not be implemented), feature request (a feature that isn't implemented yet), help wanted (contributors and extra help welcome)

#1196 by haileyschoelkopf was closed May 23, 2024

Support wrapping prompts with a given Chat Template

Labels: feature request, help wanted, opinions wanted (for discussing open questions)

#1098 by haileyschoelkopf was closed Jun 11, 2024 v0.4.3

pubmedqa task data fails to download

#312 by stas00 was closed May 11, 2022

Does lm-eval support models like OPT or LLama?

#401 by Jeffwan was closed Mar 25, 2023

Implement the Natural Questions evaluation

Labels: feature request, good first issue (good for newcomers)

#9 by StellaAthena was closed Aug 21, 2023

1 of 2 tasks

Support for ggml

Labels: good first issue, help wanted

#417 by philwee was closed Nov 3, 2023

FileNotFoundError: Couldn't find a module script at exact_match.py. Module 'exact_match' doesn't exist on the Hugging Face Hub either.

Labels: bug (something isn't working)

#1071 by xinghuang2050 was closed Jul 1, 2024

Add --predict_only mode (run without scoring outputs)

Labels: feature request, help wanted

#1152 by haileyschoelkopf was closed Jan 31, 2024

About the results of WizardMath on GSM8K

#1274 by tianshuocong was closed Feb 7, 2024

Local dataset or model path support

Labels: bug

#1224 by ycsong1212 was closed Jan 2, 2024

Add a way to instantiate from HF.AutoModel

#521 by svenhendrikx was closed Jun 28, 2023

I get this error whenever I try to run an eval: ImportError: cannot import name 'HfApi' from 'huggingface_hub'

#1826 by menhguin was closed May 26, 2024

Revert PR 497 for MMLU/hendrycksTest to be compatible with Open LLM Leaderboard

#614 by taoari was closed Nov 8, 2023

Dummy perplexity on LAMBADA

Labels: good first issue, help wanted

#350 by lostmsu was closed Nov 8, 2023

KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'

Labels: bug

#1250 by kirayomato was closed Jan 31, 2024

Can not add mmlu task inside benchmark?

#1301 by fahadh4ilyas was closed Feb 1, 2024

Error when running request generate_until

#1310 by fahadh4ilyas was closed Feb 2, 2024

Pass the model as .pt from args!

#535 by NamburiSrinath was closed Jun 4, 2023

Bad results for LLaMA

Labels: bug, good first issue, help wanted

#443 by juletx was closed Aug 8, 2023

GGUF Local Model

Labels: bug

#1254 by kolbeuk was closed Jan 8, 2024

Security features from the Hugging Face datasets library

Labels: feature request, good first issue, help wanted

#1135 by lhoestq was closed Mar 3, 2024

Inverse Scaling Tasks?

Labels: feature request, good first issue, help wanted

#1442 by RylanSchaeffer was closed Jul 3, 2024

Implement GPT-3 style contamination study

Labels: feature request

#231 by StellaAthena was closed Nov 1, 2023

RecursionError: maximum recursion depth exceeded

Labels: bug

#442 by philwee was closed Nov 8, 2023

Winogrande Performance Discrepency

Labels: bug

#1249 by lintangsutawika was closed Jan 8, 2024
