
Implement RACE evaluation #104

Merged 2 commits on Jan 30, 2021

Conversation

jon-tow
Member

@jon-tow jon-tow commented Jan 29, 2021

Changes

  • Adds RACE evaluation.

Notes

  • RACE is one of the datasets on which GPT-3 performs better with proper normalization. See Section 2.4 Evaluation, Language Models are Few-Shot Learners:
    ... on a small number of datasets (ARC, OpenBookQA, and RACE) we gain additional benefit as measured on the development set by normalizing by the unconditional probability of each completion, by computing P(completion|context) / P(completion|answer_context), where answer_context is the string "Answer: " or "A: " and is used to prompt that the completion should be an answer but is otherwise generic.
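The normalization quoted above can be sketched as follows. This is a minimal illustration, not the harness's actual implementation; `loglikelihood` is a hypothetical scoring function standing in for whatever model interface returns log P(continuation | context).

```python
def pick_answer(loglikelihood, context, completions, answer_context="Answer: "):
    """Select the completion maximizing
    P(completion | context) / P(completion | answer_context).

    `loglikelihood(ctx, cont)` is assumed to return the model's
    log P(cont | ctx); equivalently we maximize the difference of logs.
    """
    scores = []
    for completion in completions:
        cond = loglikelihood(context, completion)           # log P(completion | context)
        uncond = loglikelihood(answer_context, completion)  # log P(completion | "Answer: ")
        scores.append(cond - uncond)                        # log of the probability ratio
    # Return the index of the highest-scoring completion.
    return max(range(len(completions)), key=lambda i: scores[i])
```

Dividing by the unconditional probability discounts completions that the model assigns high probability regardless of the question (e.g. generic or frequent strings), which is why it helps on multiple-choice tasks like RACE.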

Issue Reference: #21

@leogao2 leogao2 self-requested a review January 29, 2021 22:17
@leogao2 leogao2 merged commit 0f30237 into EleutherAI:master Jan 30, 2021
@jon-tow jon-tow deleted the race-evaluation branch January 30, 2021 20:49
This was linked to issues Jan 30, 2021
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this pull request Aug 17, 2023

Successfully merging this pull request may close these issues.

  • RACE: nlp -> datasets
  • Implement the RACE evaluation