Implement the TriviaQA evaluation #11

Closed · 2 tasks done
StellaAthena opened this issue Sep 16, 2020 · 1 comment · Fixed by #53 or #107
Assignees: anishthite
Labels: feature request (A feature that isn't implemented yet.), good first issue (Good for newcomers)

Comments

StellaAthena (Member) commented Sep 16, 2020

From the GPT-3 paper:

In this section we measure GPT-3’s ability to answer questions about broad factual knowledge. Due to the immense amount of possible queries, this task has normally been approached by using an information retrieval system to find relevant text in combination with a model which learns to generate an answer given the question and the retrieved text. Since this setting allows a system to search for and condition on text which potentially contains the answer it is denoted “open-book”. [RRS20] recently demonstrated that a large language model can perform surprisingly well directly answering the questions without conditioning on auxiliary information. They denote this more restrictive evaluation setting as “closed-book”. Their work suggests that even higher-capacity models could perform even better and we test this hypothesis with GPT-3. We evaluate GPT-3 on the 3 datasets in [RRS20]: Natural Questions [KPR+19], WebQuestions [BCFL13], and TriviaQA [JCWZ17], using the same splits. Note that in addition to all results being in the closed-book setting, our use of few-shot, one-shot, and zero-shot evaluations represent an even stricter setting than previous closed-book QA work: in addition to external content not being allowed, fine-tuning on the Q&A dataset itself is also not permitted.
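To make that setting concrete, here is a purely illustrative one-shot, closed-book prompt: a question/answer demonstration pair followed by a new question, with no retrieved passage and no fine-tuning. The model must answer from its parameters alone.

```python
# Illustrative only: a one-shot, closed-book prompt. No evidence text
# is provided; the demonstration pair is the entire conditioning.
prompt = (
    "Q: Who wrote Pride and Prejudice?\n"
    "A: Jane Austen\n"
    "\n"
    "Q: In which year did the Berlin Wall fall?\n"
    "A:"
)
```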

- [x] Data processing code implemented
- [x] Evaluation implemented

The evaluation code should be modeled after the interface in lm_eval/base.py and the example of the BoolQ task in lm_eval/tasks/superglue.py; a rough sketch follows.
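A hypothetical sketch of what such a task class might look like is below. The method names follow the general shape of the harness's `Task` interface, but the exact base-class API should be checked against lm_eval/base.py and the BoolQ example; the `self._data` attribute and the document field names are assumptions, not the repo's actual code.

```python
# Hypothetical sketch only: verify the base-class API against
# lm_eval/base.py and the BoolQ task in lm_eval/tasks/superglue.py.
from lm_eval.base import Task


class TriviaQA(Task):
    def has_training_docs(self):
        return True

    def has_validation_docs(self):
        return True

    def training_docs(self):
        # Assumes the data-processing step exposes an iterable of
        # per-question dicts under self._data (an assumption).
        return self._data["train"]

    def validation_docs(self):
        return self._data["validation"]

    def doc_to_text(self, doc):
        # Closed-book setting: the prompt is the bare question, with
        # no retrieved evidence text.
        return "Question: " + doc["question"] + "\nAnswer:"

    def doc_to_target(self, doc):
        # TriviaQA ships a canonical answer string plus aliases; the
        # canonical value is used as the target here.
        return " " + doc["answer"]["value"]
```

TriviaQA is conventionally scored by normalized exact match against the full alias list, not just the canonical value, so the results-processing step would likely compare the model's completion against `doc["answer"]["aliases"]` as well.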

@StellaAthena added the "feature request" label Sep 16, 2020
@StellaAthena added this to To do in Implementing Evaluations via automation Sep 16, 2020
cfoster0 (Contributor) commented Oct 1, 2020

Note: Hugging Face includes this dataset in its `datasets` package.

https://huggingface.co/datasets/trivia_qa
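For reference, a minimal loading sketch, assuming the `rc.nocontext` config (the questions with retrieved evidence stripped, which matches the closed-book setting); config and field names should be verified against the dataset card:

```python
# Minimal sketch: load TriviaQA via the HuggingFace datasets package.
# "rc.nocontext" drops the evidence documents, matching the closed-book
# setting; verify config and field names on the dataset card.
from datasets import load_dataset

trivia_qa = load_dataset("trivia_qa", "rc.nocontext")
example = trivia_qa["validation"][0]
print(example["question"])
print(example["answer"]["value"])    # canonical answer string
print(example["answer"]["aliases"])  # accepted alternatives for scoring
```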

@anishthite moved this from To do to In progress in Implementing Evaluations Oct 4, 2020
@anishthite self-assigned this Oct 4, 2020
@StellaAthena added the "Eval Set" label and removed the "feature request" label Oct 23, 2020
@anishthite linked a pull request Oct 24, 2020 that will close this issue
Implementing Evaluations automation moved this from In progress to Data integrated, Eval not done Oct 24, 2020
@StellaAthena reopened this Jan 5, 2021
@StellaAthena added the "feature request" and "good first issue" labels Jan 5, 2021
@leogao2 moved this from In Progress to To do in Implementing Evaluations Jan 28, 2021
@leogao2 moved this from To do to Done in Implementing Evaluations Jan 30, 2021
@leogao2 closed this as completed Jan 30, 2021
StellaAthena pushed a commit that referenced this issue Apr 29, 2022
qmdnls pushed a commit to qmdnls/lm-evaluation-harness that referenced this issue Aug 17, 2023: Add `"\n###\n"` example separator (…ext-separator)
LZY-the-boys pushed a commit to LZY-the-boys/lm-evaluation-harness-fast that referenced this issue Sep 12, 2023: Add `"\n###\n"` example separator (…ext-separator)
Projects: Implementing Evaluations (Done, evaluations)
5 participants