
Implement the SAT evaluation #27

Closed
2 tasks done
StellaAthena opened this issue Sep 16, 2020 · 4 comments · Fixed by #57, #80 or #82
Assignees
cfoster0 · leogao2
Labels
feature request (A feature that isn't implemented yet.) · good first issue (Good for newcomers)

Comments

StellaAthena (Member) commented Sep 16, 2020

From the GPT-3 paper:

To test GPT-3 on another task that is somewhat unusual relative to the typical distribution of text, we collected a set of 374 “SAT analogy” problems [TLBS03]. Analogies are a style of multiple choice question that constituted a section of the SAT college entrance exam before 2005. A typical example is “audacious is to boldness as (a) sanctimonious is to hypocrisy, (b) anonymous is to identity, (c) remorseful is to misdeed, (d) deleterious is to result, (e) impressionable is to temptation”. The student is expected to choose which of the five word pairs has the same relationship as the original word pair; in this example the answer is “sanctimonious is to hypocrisy”. On this task GPT-3 achieves 65.2% in the few-shot setting, 59.1% in the one-shot setting, and 53.7% in the zero-shot setting, whereas the average score among college applicants was 57% [TL05] (random guessing yields 20%). As shown in Figure 3.12, the results improve with scale, with the full 175 billion model improving by over 10% compared to the 13 billion parameter model.
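For reference, a minimal sketch of how this kind of multiple-choice question is typically scored in a zero-shot LM evaluation: format the stem pair as a context, score each candidate pair as a continuation, and pick the most likely one. The `loglikelihood` helper below is hypothetical, a stand-in for whatever scoring call the harness exposes.

```python
# Minimal sketch of zero-shot analogy scoring. `loglikelihood` is a
# hypothetical stand-in for the harness's model-scoring interface.
def loglikelihood(context: str, continuation: str) -> float:
    """Stand-in: total log-probability of `continuation` given `context`."""
    raise NotImplementedError

def score_analogy(stem, choices):
    """stem: a pair like ("audacious", "boldness");
    choices: five (word, word) pairs. Returns the index of the
    choice whose completion the model finds most likely."""
    context = f"{stem[0]} is to {stem[1]} as"
    scores = [loglikelihood(context, f" {a} is to {b}") for a, b in choices]
    return max(range(len(choices)), key=scores.__getitem__)
```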

- [x] Data processing code implemented
- [x] Evaluation implemented

The evaluation code should be modeled after the interface in lm_eval/base.py and the example of the BoolQ task in lm_eval/tasks/superglue.py.
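A rough skeleton of what that might look like, following the BoolQ pattern. The hook names below assume a `doc_to_text`/`doc_to_target` style interface, so check lm_eval/base.py for the exact methods; the data loader and document schema are left hypothetical.

```python
from lm_eval.base import Task  # the interface the issue points to


class SATAnalogies(Task):
    # Sketch only: hook names assume the base.py interface resembles
    # the BoolQ example in lm_eval/tasks/superglue.py.

    def has_training_docs(self):
        # Turney's set is evaluation-only; no training split.
        return False

    def test_docs(self):
        # Hypothetical loader. Each doc is assumed to look like:
        # {"stem": ("audacious", "boldness"),
        #  "choices": [("sanctimonious", "hypocrisy"), ...],  # 5 pairs
        #  "answer": 0}
        raise NotImplementedError("parse the 374 problems here")

    def doc_to_text(self, doc):
        a, b = doc["stem"]
        return f"{a} is to {b} as"

    def doc_to_target(self, doc):
        c, d = doc["choices"][doc["answer"]]
        return f" {c} is to {d}"
```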

@StellaAthena added the feature request label Sep 16, 2020
cfoster0 (Contributor) commented Oct 1, 2020

Should be available on request from Peter Turney.

https://www.apperceptual.com/

@StellaAthena added the Eval Set label and removed the feature request label Oct 23, 2020
cfoster0 (Contributor) commented

Will post here if/when we get a response.

cfoster0 (Contributor) commented

Got a response. PM me on the Discord if you need access.

@StellaAthena linked a pull request Oct 26, 2020 that will close this issue
@StellaAthena reopened this Nov 18, 2020
@StellaAthena reopened this Jan 5, 2021
@StellaAthena added the feature request and good first issue labels Jan 5, 2021
nicholaskross (Contributor) commented

I could do the eval here.

@StellaAthena linked a pull request Jan 6, 2021 that will close this issue
@StellaAthena linked a pull request Jan 9, 2021 that will close this issue
@leogao2 assigned cfoster0 and unassigned nicholaskross Feb 3, 2021
@leogao2 self-assigned this Feb 3, 2021
StellaAthena added a commit that referenced this issue Apr 29, 2022
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023
lintangsutawika pushed a commit that referenced this issue Jul 8, 2024