
Implement the StoryCloze evaluation #8

Closed · 1 of 2 tasks
StellaAthena opened this issue Sep 16, 2020 · 3 comments · Fixed by #53
Labels: feature request (A feature that isn't implemented yet.), good first issue (Good for newcomers)

Comments

@StellaAthena (Member) commented Sep 16, 2020

From the GPT-3 paper:

We next evaluate GPT-3 on the StoryCloze 2016 dataset [MCH+16], which involves selecting the correct ending sentence for five-sentence long stories. Here GPT-3 achieves 83.2% in the zero-shot setting and 87.7% in the few-shot setting (with K = 70). This is still 4.1% lower than the fine-tuned SOTA using a BERT based model [LDL19] but improves over previous zero-shot results by roughly 10%.

  • [x] Data processing code implemented
  • [ ] Evaluation implemented

The evaluation code should be modeled after the interface in lm_eval/base.py and the example of the BoolQ task in lm_eval/tasks/superglue.py.
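For anyone picking this up, here is a rough sketch of the shape such a task could take, following the Task interface in lm_eval/base.py and the loglikelihood-comparison pattern BoolQ uses: build the four-sentence story as context, request the log-likelihood of each candidate fifth sentence, and score accuracy on whichever is higher. The class name, data path, and CSV column names (based on the StoryCloze 2016 release format) are assumptions to verify against the real files; this is not the implementation that was eventually merged.

```python
# Hypothetical sketch only -- not the merged implementation.
# Assumes the StoryCloze 2016 CSV layout (InputSentence1..4,
# RandomFifthSentenceQuiz1/2, AnswerRightEnding) and a local data path.
import csv

from lm_eval.base import Task, rf


class StoryCloze2016(Task):
    # Assumed location; the data must be placed here by hand.
    DATA_PATH = "data/storycloze/cloze_test_val__spring2016.csv"

    def download(self):
        # The data is gated behind the ROCStories access form and
        # cannot be fetched automatically; nothing to do here.
        pass

    def has_training_docs(self):
        return False

    def has_validation_docs(self):
        return True

    def has_test_docs(self):
        # The hidden test split would be handled analogously if available.
        return False

    def validation_docs(self):
        with open(self.DATA_PATH, newline="") as f:
            return list(csv.DictReader(f))

    def doc_to_text(self, doc):
        # The four-sentence story is the context.
        return " ".join(doc[f"InputSentence{i}"] for i in range(1, 5))

    def doc_to_target(self, doc):
        # AnswerRightEnding is "1" or "2"; used for few-shot examples.
        return " " + doc["RandomFifthSentenceQuiz" + doc["AnswerRightEnding"]]

    def construct_requests(self, doc, ctx):
        # One loglikelihood request per candidate ending, mirroring how
        # BoolQ scores its " yes"/" no" continuations.
        ll_1, _ = rf.loglikelihood(ctx, " " + doc["RandomFifthSentenceQuiz1"])
        ll_2, _ = rf.loglikelihood(ctx, " " + doc["RandomFifthSentenceQuiz2"])
        return ll_1, ll_2

    def process_results(self, doc, results):
        gold = int(doc["AnswerRightEnding"]) - 1
        pred = 0 if results[0] > results[1] else 1
        return {"acc": 1.0 if pred == gold else 0.0}

    def aggregation(self):
        return {"acc": lambda items: sum(items) / len(items)}

    def higher_is_better(self):
        return {"acc": True}
```

Once registered in the task registry, something like this would presumably be picked up by the evaluator like any other task; the only real gating here is data access, since the CSVs cannot be redistributed.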

@StellaAthena StellaAthena added the feature request A feature that isn't implemented yet. label Sep 16, 2020
@StellaAthena StellaAthena added this to To do in Implementing Evaluations via automation Sep 16, 2020
@cfoster0 (Contributor) commented Oct 5, 2020

Form to get access is here:

https://cs.rochester.edu/nlp/rocstories/

@StellaAthena (Member, Author) commented:

I have the download links and can provide them to anyone who DM’s me on Discord.

@StellaAthena StellaAthena added Eval Set and removed feature request A feature that isn't implemented yet. labels Oct 23, 2020
@anishthite anishthite self-assigned this Oct 24, 2020
@anishthite anishthite moved this from To do to In progress in Implementing Evaluations Oct 24, 2020
@anishthite anishthite linked a pull request Oct 24, 2020 that will close this issue
Implementing Evaluations automation moved this from In progress to Data integrated, Eval not done Oct 24, 2020
@StellaAthena StellaAthena reopened this Jan 5, 2021
@StellaAthena StellaAthena added feature request A feature that isn't implemented yet. good first issue Good for newcomers labels Jan 5, 2021
@leogao2 leogao2 moved this from In Progress to To do in Implementing Evaluations Jan 28, 2021
@leogao2 leogao2 moved this from To do, Evaluations to Implement to Deferred in Implementing Evaluations Feb 8, 2021
@leogao2 leogao2 moved this from Deferred to To do, Evaluations to Implement in Implementing Evaluations Feb 11, 2021
@leogao2 leogao2 moved this from To do, Evaluations to Implement to Deferred in Implementing Evaluations Jun 12, 2021
@jon-tow (Member) commented Apr 1, 2022

Implemented in #300

@jon-tow jon-tow closed this as completed Apr 1, 2022
Implementing Evaluations automation moved this from Deferred to Done, evaluations Apr 1, 2022
StellaAthena added a commit that referenced this issue Apr 29, 2022
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023