Issues: EleutherAI/lm-evaluation-harness

Issues list

RACE: nlp -> datasets [bug]
#44 by cfoster0 was closed Oct 22, 2020 updated Oct 22, 2020
Add flag to allow the evaluations to be carried out on a subset of the eval tasks [feature request]
#60 by StellaAthena was closed Nov 23, 2020 updated Nov 23, 2020
Support richer example-packing functionality. [feature request]
#31 by zphang was closed Jan 4, 2021 updated Jan 4, 2021
Support writing out predictions [feature request]
#32 by zphang was closed Jan 4, 2021 updated Jan 4, 2021
Double check all of the zero/few-shot formats [documentation]
#34 by leogao2 was closed Jan 4, 2021 updated Jan 4, 2021
Make the eval_harness talk to the server [feature request]
#62 by StellaAthena was closed Jan 4, 2021 updated Jan 4, 2021
Implement arithmetic evaluations [feature request] [good first issue]
#25 by StellaAthena was closed Jan 28, 2021 updated Jan 28, 2021
2 tasks done
Possible Bug?: argmax in sat.py comparison [bug]
#83 by nicholaskross was closed Jan 28, 2021 updated Jan 28, 2021
Implement all GLUE evaluations
#92 by leogao2 was closed Jan 28, 2021 updated Jan 28, 2021
Implement the LAMBADA evaluation [feature request]
#6 by StellaAthena was closed Jan 29, 2021 updated Jan 30, 2021
Implement the SAT evaluation [feature request] [good first issue]
#27 by StellaAthena was closed Jan 8, 2021 updated Feb 3, 2021
2 tasks done
Implement the TriviaQA evaluation [feature request] [good first issue]
#11 by StellaAthena was closed Jan 30, 2021 updated Feb 3, 2021
2 tasks done
Implement the Adversarial Natural Language Inference (ANLI) evaluation [feature request]
#24 by StellaAthena was closed Jan 30, 2021 updated Feb 3, 2021
1 of 2 tasks
Implement the RACE evaluation [feature request] [good first issue]
#21 by StellaAthena was closed Jan 30, 2021 updated Feb 3, 2021
2 tasks done
Implement the PhysicalQA evaluation [feature request] [good first issue]
#14 by StellaAthena was closed Feb 3, 2021 updated Feb 3, 2021
1 of 2 tasks
Implement the WSC273 Winograd Schemas Challenge evaluation [feature request] [good first issue]
#12 by StellaAthena was closed Feb 3, 2021 updated Feb 3, 2021
2 tasks done
Implement the ARC Challenge evaluation [feature request] [good first issue]
#15 by StellaAthena was closed Feb 5, 2021 updated Feb 5, 2021
2 tasks done
Implement the PubMedQA Evaluation
#125 by leogao2 was closed Feb 6, 2021 updated Feb 6, 2021
Implement the SciQ evaluation
#118 by leogao2 was closed Feb 6, 2021 updated Feb 6, 2021
Implement the adversarially-mined Winogrande evaluation [feature request] [good first issue]
#13 by StellaAthena was closed Feb 3, 2021 updated Feb 8, 2021
2 tasks done
Implement the WebQuestions evaluation [feature request] [good first issue]
#10 by StellaAthena was closed Feb 8, 2021 updated Feb 8, 2021
1 of 2 tasks
Implement the OpenBookQA evaluation [feature request] [good first issue]
#16 by StellaAthena was closed Feb 9, 2021 updated Feb 9, 2021
2 tasks done
Implement the HellaSwag evaluation [feature request] [good first issue]
#7 by StellaAthena was closed Feb 8, 2021 updated Feb 11, 2021
2 tasks done
Implement the QA4MRE evaluation
#113 by leogao2 was closed Feb 12, 2021 updated Feb 12, 2021
Implement the Natural Language Inference (NLI) evaluation [feature request] [good first issue]
#23 by StellaAthena was closed Feb 12, 2021 updated Feb 12, 2021
1 of 2 tasks