Issues: EleutherAI/lm-evaluation-harness

Using Language Models as Evaluators [feature request]
#1831 opened May 13, 2024 by lintangsutawika
Add More Tests [feature request]
#1827 opened May 12, 2024 by haileyschoelkopf
Multi Label Classification
#1814 opened May 9, 2024 by IsraelAbebe
Gemini 1.5/Ultra support
#1808 opened May 8, 2024 by notrichardren
Support OpenAI's Batch API
#1770 opened May 2, 2024 by djstrong
Cannot have both a group list and task list [asking questions, bug]
#1767 opened Apr 29, 2024 by steven-basart
Bug in yaml parsing
#1762 opened Apr 28, 2024 by jordane95
Output constrained support
#1759 opened Apr 27, 2024 by Mihaiii
New Task Request: LegalBench [feature request, good first issue, help wanted]
#1754 opened Apr 26, 2024 by haileyschoelkopf