
Implement the HellaSwag evaluation #7

Closed
2 tasks done
StellaAthena opened this issue Sep 16, 2020 · 2 comments · Fixed by #43
Labels: feature request (A feature that isn't implemented yet.), good first issue (Good for newcomers)

Comments

@StellaAthena
Member

StellaAthena commented Sep 16, 2020

From the GPT-3 paper:

The HellaSwag dataset [ZHB+19] involves picking the best ending to a story or set of instructions. The examples were adversarially mined to be difficult for language models while remaining easy for humans (who achieve 95.6% accuracy). GPT-3 achieves 78.1% accuracy in the one-shot setting and 79.3% accuracy in the few-shot setting, outperforming the 75.4% accuracy of a fine-tuned 1.5B parameter language model [ZHR+19] but still a fair amount lower than the overall SOTA of 85.6% achieved by the fine-tuned multi-task model ALUM.

  • Data processing code implemented
  • Evaluation implemented
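As the paper excerpt describes, HellaSwag is a multiple-choice task: the model scores each candidate ending given the context and the highest-scoring ending is taken as the prediction. A minimal sketch of that scoring loop, where `loglikelihood` is a hypothetical stand-in for a real language-model call (here a toy word-overlap scorer):

```python
# Sketch of HellaSwag-style multiple-choice scoring: score each candidate
# ending conditioned on the context, then pick the argmax.
# `loglikelihood` is a hypothetical stand-in for a real LM scoring call.

def loglikelihood(context: str, continuation: str) -> float:
    # Toy scorer: rewards word overlap with the context, mildly
    # penalizes longer continuations. A real implementation would sum
    # the model's token log-probabilities for `continuation`.
    ctx_words = set(context.lower().split())
    cont_words = continuation.lower().split()
    overlap = sum(1 for w in cont_words if w in ctx_words)
    return overlap - 0.1 * len(cont_words)

def predict(context: str, endings: list[str]) -> int:
    """Return the index of the highest-scoring candidate ending."""
    scores = [loglikelihood(context, e) for e in endings]
    return max(range(len(endings)), key=scores.__getitem__)

doc = {
    "ctx": "She cracked the eggs into the bowl and reached for the whisk.",
    "endings": [
        "She whisked the eggs until they were frothy.",
        "The car would not start in the cold.",
    ],
    "label": 0,
}
pred = predict(doc["ctx"], doc["endings"])
```

Accuracy is then just the fraction of documents where `pred` matches `label`.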

The evaluation code should be modeled after the interface in lm_eval/base.py and the example of the BoolQ task in lm_eval/tasks/superglue.py.
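A hypothetical sketch of the shape such a task class might take; the actual interface is defined in lm_eval/base.py, and the method names below are illustrative assumptions rather than a copy of the real API:

```python
# Illustrative HellaSwag task sketch. Method names (doc_to_text,
# doc_to_target) are assumptions for this sketch; the authoritative
# interface lives in lm_eval/base.py.

class HellaSwagTask:
    def doc_to_text(self, doc: dict) -> str:
        # The context the model must continue.
        return doc["ctx"]

    def doc_to_target(self, doc: dict) -> str:
        # The gold ending, looked up by its label index.
        return doc["endings"][int(doc["label"])]

task = HellaSwagTask()
example = {
    "ctx": "He put on his running shoes and",
    "endings": ["jogged out the door.", "baked a loaf of bread."],
    "label": 0,
}
```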

@StellaAthena StellaAthena added the feature request label Sep 16, 2020
@StellaAthena StellaAthena added this to To do in Implementing Evaluations via automation Sep 16, 2020
@StellaAthena StellaAthena changed the title from "Implement the HellaSwag test" to "Implement the HellaSwag evaluation" Sep 16, 2020
@cfoster0
Contributor

cfoster0 commented Oct 1, 2020

Note: HuggingFace includes this in its datasets package.

https://huggingface.co/datasets/hellaswag

@cfoster0
Contributor

cfoster0 commented Oct 5, 2020

Working on adding this for dedupe.

@StellaAthena StellaAthena moved this from To do to In progress in Implementing Evaluations Oct 5, 2020
@StellaAthena StellaAthena linked a pull request Oct 23, 2020 that will close this issue
@StellaAthena StellaAthena added the Eval Set label and removed the feature request label Oct 23, 2020
@StellaAthena StellaAthena moved this from In progress to Data integrated, Eval not done in Implementing Evaluations Oct 23, 2020
@StellaAthena StellaAthena reopened this Jan 5, 2021
@StellaAthena StellaAthena added the feature request and good first issue labels Jan 5, 2021
@leogao2 leogao2 moved this from In Progress to To do in Implementing Evaluations Jan 28, 2021
@anishthite anishthite self-assigned this Feb 7, 2021
@leogao2 leogao2 moved this from To do, Evaluations to Implement to Done, evaluations in Implementing Evaluations Feb 8, 2021
@leogao2 leogao2 assigned jon-tow and unassigned anishthite Feb 8, 2021
@leogao2 leogao2 closed this as completed Feb 8, 2021
leogao2 pushed a commit that referenced this issue Mar 28, 2021: "Fork update and long-overdue SQuAD fixes"
StellaAthena pushed a commit that referenced this issue Apr 29, 2022
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023: "Fork update and long-overdue SQuAD fixes"
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023: "Fork update and long-overdue SQuAD fixes"
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023
lintangsutawika pushed a commit that referenced this issue Jul 8, 2024