EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

[Discussion] Add Major Code Benchmarks

#1157 opened Dec 18, 2023 by haileyschoelkopf

Open 4

209 Open 686 Closed

Issues list

Inconsistent evaluation results with Chat Template

#1841 opened May 14, 2024 by shiweijiezero

👍

Add support to azure-openai deployed models

#1733 opened Apr 22, 2024 by bcarvalho-via

👍

Integrate Semantic Answer Similarity (SAS) into the evaluation metrics.

#1703 opened Apr 15, 2024 by gonzalo-santamaria-iic

👍

Avoid slow testing due to network issues.

#1824 opened May 11, 2024 by pixeli99

👍

ValueError: BuilderConfig 'pile_freelaw' not found., issue on running PILE eval

#1714 opened Apr 16, 2024 by Harryalways317

👍

Should num_fewshot be type list?

feature request

A feature that isn't implemented yet.

#837 opened Sep 6, 2023 by Wehzie

👍

How to interpret generated results for truthful_qa test

#993 opened Nov 15, 2023 by Joetib

👍

How to filter to see only generate_until: lm-eval --tasks list

#1772 opened May 2, 2024 by chigkim

👍

Allow --include_path to import an externally-defined LM subclass

feature request

A feature that isn't implemented yet.

good first issue

Good for newcomers

help wanted

Contributors and extra help welcome.

#1457 opened Feb 22, 2024 by haileyschoelkopf

👍

Save fewshot_as_multiturn argument in results.json

#1941 opened Jun 10, 2024 by djstrong

👍

Sanity checking the semantic meaning of "perplexity" in code

asking questions

For asking for clarification / support on library usage.

#1581 opened Mar 15, 2024 by RylanSchaeffer

👍

Whitespace before label in MultipleChoiceTask causes wrong label probability prediction

#1556 opened Mar 11, 2024 by RibinMTC

👍

Add a way to instantiate from HF.AutoModel (again)

#1978 opened Jun 17, 2024 by dmitrii-palisaderesearch

👍

Quac Dataset

feature request

A feature that isn't implemented yet.

good first issue

Good for newcomers

#827 opened Sep 4, 2023 by RanchiZhao

👍

OpenaiCompletionsLM invokes the completions API with max_tokens set to 0

#1903 opened May 29, 2024 by chimezie

👍

[TruthfulQA] update rouge-score version or add a way to suppress tokenizer logging

#1692 opened Apr 9, 2024 by skramer-dev

👍

Support loading slices of a split from a dataset

#1788 opened May 6, 2024 by alexrs

👍

Allow Task objects to defer dataset download

feature request

A feature that isn't implemented yet.

good first issue

Good for newcomers

help wanted

Contributors and extra help welcome.

#1558 opened Mar 11, 2024 by haileyschoelkopf

👍

Gemini 1.5/Ultra support

#1808 opened May 8, 2024 by notrichardren

👍

[New Task] CommonsenseQA

feature request

A feature that isn't implemented yet.

good first issue

Good for newcomers

help wanted

Contributors and extra help welcome.

#1026 opened Nov 27, 2023 by haileyschoelkopf

👍

Request for files to be placed in 'path/containing/training/set/ngrams'.

#1375 opened Jan 31, 2024 by dsdanielpark

👍

OpenAI models not working with truthfulqa_* tasks

#1704 opened Apr 15, 2024 by ichitaka

👍

Add tasks for performance on long context lengths

feature request

A feature that isn't implemented yet.

#1748 opened Apr 25, 2024 by nairbv

👍

wrong regular expression for exact match scoring

#1303 opened Jan 17, 2024 by Hannibal046

👍

globally normalized models

#960 opened Nov 3, 2023 by denizyuret

👍
