EleutherAI / lm-evaluation-harness Public

Issues: EleutherAI/lm-evaluation-harness

[Discussion] Add Major Code Benchmarks

#1157 opened Dec 18, 2023 by haileyschoelkopf

Open 4


262 open, 758 closed


Issues list

RACE: nlp -> datasets [bug]

#44 by cfoster0 was closed Oct 22, 2020, updated Oct 22, 2020

Add flag to allow the evaluations to be carried out on a subset of the eval tasks [feature request]

#60 by StellaAthena was closed Nov 23, 2020, updated Nov 23, 2020

Support richer example-packing functionality [feature request]

#31 by zphang was closed Jan 4, 2021, updated Jan 4, 2021

Support writing out predictions [feature request]

#32 by zphang was closed Jan 4, 2021, updated Jan 4, 2021

Double check all of the zero/few-shot formats [documentation]

#34 by leogao2 was closed Jan 4, 2021, updated Jan 4, 2021

Make the eval_harness talk to the server [feature request]

#62 by StellaAthena was closed Jan 4, 2021, updated Jan 4, 2021

Implement arithmetic evaluations [feature request, good first issue]

#25 by StellaAthena was closed Jan 28, 2021, updated Jan 28, 2021

2 tasks done

Possible Bug?: argmax in sat.py comparison [bug]

#83 by nicholaskross was closed Jan 28, 2021, updated Jan 28, 2021

Implement all GLUE evaluations

#92 by leogao2 was closed Jan 28, 2021, updated Jan 28, 2021

Implement the LAMBADA evaluation [feature request]

#6 by StellaAthena was closed Jan 29, 2021, updated Jan 30, 2021

Implement the SAT evaluation [feature request, good first issue]

#27 by StellaAthena was closed Jan 8, 2021, updated Feb 3, 2021

2 tasks done

Implement the TriviaQA evaluation [feature request, good first issue]

#11 by StellaAthena was closed Jan 30, 2021, updated Feb 3, 2021

2 tasks done

Implement the Adversarial Natural Language Inference (ANLI) evaluation [feature request]

#24 by StellaAthena was closed Jan 30, 2021, updated Feb 3, 2021

1 of 2 tasks

Implement the RACE evaluation [feature request, good first issue]

#21 by StellaAthena was closed Jan 30, 2021, updated Feb 3, 2021

2 tasks done

Implement the PhysicalQA evaluation [feature request, good first issue]

#14 by StellaAthena was closed Feb 3, 2021, updated Feb 3, 2021

1 of 2 tasks

Implement the WSC273 Winograd Schemas Challenge evaluation [feature request, good first issue]

#12 by StellaAthena was closed Feb 3, 2021, updated Feb 3, 2021

2 tasks done

Implement the ARC Challenge evaluation [feature request, good first issue]

#15 by StellaAthena was closed Feb 5, 2021, updated Feb 5, 2021

2 tasks done

Implement the PubMedQA evaluation

#125 by leogao2 was closed Feb 6, 2021, updated Feb 6, 2021

Implement the SciQ evaluation

#118 by leogao2 was closed Feb 6, 2021, updated Feb 6, 2021

Implement the adversarially-mined Winogrande evaluation [feature request, good first issue]

#13 by StellaAthena was closed Feb 3, 2021, updated Feb 8, 2021

2 tasks done

Implement the WebQuestions evaluation [feature request, good first issue]

#10 by StellaAthena was closed Feb 8, 2021, updated Feb 8, 2021

1 of 2 tasks

Implement the OpenBookQA evaluation [feature request, good first issue]

#16 by StellaAthena was closed Feb 9, 2021, updated Feb 9, 2021

2 tasks done

Implement the HellaSwag evaluation [feature request, good first issue]

#7 by StellaAthena was closed Feb 8, 2021, updated Feb 11, 2021

2 tasks done

Implement the QA4MRE evaluation

#113 by leogao2 was closed Feb 12, 2021, updated Feb 12, 2021

Implement the Natural Language Inference (NLI) evaluation [feature request, good first issue]

#23 by StellaAthena was closed Feb 12, 2021, updated Feb 12, 2021

1 of 2 tasks
