
Implement the StoryCloze evaluation #8

Closed · 1 of 2 tasks
StellaAthena opened this issue Sep 16, 2020 · 3 comments · Fixed by #53
Labels: feature request (A feature that isn't implemented yet.), good first issue (Good for newcomers)

Comments

@StellaAthena (Member) commented Sep 16, 2020

From the GPT-3 paper:

We next evaluate GPT-3 on the StoryCloze 2016 dataset [MCH+16], which involves selecting the correct ending sentence for five-sentence long stories. Here GPT-3 achieves 83.2% in the zero-shot setting and 87.7% in the few-shot setting (with K = 70). This is still 4.1% lower than the fine-tuned SOTA using a BERT based model [LDL19] but improves over previous zero-shot results by roughly 10%.

  • [x] Data processing code implemented
  • [ ] Evaluation implemented

The evaluation code should be modeled after the interface in lm_eval/base.py and the example of the BoolQ task in lm_eval/tasks/superglue.py.
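For anyone picking this up, here is a rough sketch of the shape such a task could take, following the Task interface in lm_eval/base.py and the loglikelihood-comparison pattern BoolQ uses: build the four-sentence story as context, request the log-likelihood of each candidate fifth sentence, and score accuracy on whichever is higher. The class name, data path, and CSV column names (based on the StoryCloze 2016 release format) are assumptions to verify against the real files; this is not the implementation that was eventually merged.

```python
# Hypothetical sketch only -- not the merged implementation.
# Assumes the StoryCloze 2016 CSV layout (InputSentence1..4,
# RandomFifthSentenceQuiz1/2, AnswerRightEnding) and a local data path.
import csv

from lm_eval.base import Task, rf


class StoryCloze2016(Task):
    # Assumed location; the data must be placed here by hand.
    DATA_PATH = "data/storycloze/cloze_test_val__spring2016.csv"

    def download(self):
        # The data is gated behind the ROCStories access form and
        # cannot be fetched automatically; nothing to do here.
        pass

    def has_training_docs(self):
        return False

    def has_validation_docs(self):
        return True

    def has_test_docs(self):
        # The hidden test split would be handled analogously if available.
        return False

    def validation_docs(self):
        with open(self.DATA_PATH, newline="") as f:
            return list(csv.DictReader(f))

    def doc_to_text(self, doc):
        # The four-sentence story is the context.
        return " ".join(doc[f"InputSentence{i}"] for i in range(1, 5))

    def doc_to_target(self, doc):
        # AnswerRightEnding is "1" or "2"; used for few-shot examples.
        return " " + doc["RandomFifthSentenceQuiz" + doc["AnswerRightEnding"]]

    def construct_requests(self, doc, ctx):
        # One loglikelihood request per candidate ending, mirroring how
        # BoolQ scores its " yes"/" no" continuations.
        ll_1, _ = rf.loglikelihood(ctx, " " + doc["RandomFifthSentenceQuiz1"])
        ll_2, _ = rf.loglikelihood(ctx, " " + doc["RandomFifthSentenceQuiz2"])
        return ll_1, ll_2

    def process_results(self, doc, results):
        gold = int(doc["AnswerRightEnding"]) - 1
        pred = 0 if results[0] > results[1] else 1
        return {"acc": 1.0 if pred == gold else 0.0}

    def aggregation(self):
        return {"acc": lambda items: sum(items) / len(items)}

    def higher_is_better(self):
        return {"acc": True}
```

Once registered in the task registry, something like this would presumably be picked up by the evaluator like any other task; the only real gating here is data access, since the CSVs cannot be redistributed.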

@StellaAthena StellaAthena added the feature request A feature that isn't implemented yet. label Sep 16, 2020
@StellaAthena StellaAthena added this to To do in Implementing Evaluations via automation Sep 16, 2020
@cfoster0 (Contributor) commented Oct 5, 2020

Form to get access is here:

https://cs.rochester.edu/nlp/rocstories/

@StellaAthena (Member, Author) commented:

I have the download links and can provide them to anyone who DM’s me on Discord.

@StellaAthena StellaAthena added Eval Set and removed feature request A feature that isn't implemented yet. labels Oct 23, 2020
@anishthite anishthite self-assigned this Oct 24, 2020
@anishthite anishthite moved this from To do to In progress in Implementing Evaluations Oct 24, 2020
@anishthite anishthite linked a pull request Oct 24, 2020 that will close this issue
Implementing Evaluations automation moved this from In progress to Data integrated, Eval not done Oct 24, 2020
@StellaAthena StellaAthena reopened this Jan 5, 2021
@StellaAthena StellaAthena added feature request A feature that isn't implemented yet. good first issue Good for newcomers labels Jan 5, 2021
@leogao2 leogao2 moved this from In Progress to To do in Implementing Evaluations Jan 28, 2021
@leogao2 leogao2 moved this from To do, Evaluations to Implement to Deferred in Implementing Evaluations Feb 8, 2021
@leogao2 leogao2 moved this from Deferred to To do, Evaluations to Implement in Implementing Evaluations Feb 11, 2021
@leogao2 leogao2 moved this from To do, Evaluations to Implement to Deferred in Implementing Evaluations Jun 12, 2021
@jon-tow (Member) commented Apr 1, 2022

Implemented in #300

@jon-tow jon-tow closed this as completed Apr 1, 2022
Implementing Evaluations automation moved this from Deferred to Done, evaluations Apr 1, 2022
StellaAthena added a commit that referenced this issue Apr 29, 2022
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023