
Implement RACE evaluation #104

Merged 2 commits on Jan 30, 2021

Conversation

jon-tow
Member

@jon-tow jon-tow commented Jan 29, 2021

Changes

  • Adds RACE evaluation.

Notes

  • RACE is one of the datasets on which GPT-3 performs better with proper normalization. See Section 2.4 Evaluation, Language Models are Few-Shot Learners:
    ... on a small number of datasets (ARC, OpenBookQA, and RACE) we gain additional benefit as measured on the development set by normalizing by the unconditional probability of each completion, by computing P(completion|context) / P(completion|answer_context), where answer_context is the string "Answer: " or "A: " and is used to prompt that the completion should be an answer but is otherwise generic.
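The normalization quoted above can be sketched as follows. This is a minimal illustration, not the harness's actual implementation; `loglikelihood` is a hypothetical scoring function standing in for whatever model interface returns log P(continuation | context).

```python
def pick_answer(loglikelihood, context, completions, answer_context="Answer: "):
    """Select the completion maximizing
    P(completion | context) / P(completion | answer_context).

    `loglikelihood(ctx, cont)` is assumed to return the model's
    log P(cont | ctx); equivalently we maximize the difference of logs.
    """
    scores = []
    for completion in completions:
        cond = loglikelihood(context, completion)           # log P(completion | context)
        uncond = loglikelihood(answer_context, completion)  # log P(completion | "Answer: ")
        scores.append(cond - uncond)                        # log of the probability ratio
    # Return the index of the highest-scoring completion.
    return max(range(len(completions)), key=lambda i: scores[i])
```

Dividing by the unconditional probability discounts completions that the model assigns high probability regardless of the question (e.g. generic or frequent strings), which is why it helps on multiple-choice tasks like RACE.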

Issue Reference: #21

@leogao2 leogao2 self-requested a review January 29, 2021 22:17
@leogao2 leogao2 merged commit 0f30237 into EleutherAI:master Jan 30, 2021
@jon-tow jon-tow deleted the race-evaluation branch January 30, 2021 20:49
This was linked to issues Jan 30, 2021
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this pull request Aug 17, 2023

Successfully merging this pull request may close these issues.

  • RACE: nlp -> datasets
  • Implement the RACE evaluation