
Implement the Adversarial Natural Language Inference (ANLI) evaluation #24

Closed · 1 of 2 tasks
StellaAthena opened this issue Sep 16, 2020 · 1 comment
Labels: feature request (a feature that isn't implemented yet)
StellaAthena (Member) commented Sep 16, 2020

From the GPT-3 paper:

> Natural Language Inference (NLI) [Fyo00] concerns the ability to understand the relationship between two sentences. In practice, this task is usually structured as a two or three class classification problem where the model classifies whether the second sentence logically follows from the first, contradicts the first sentence, or is possibly true (neutral).
>
> SuperGLUE includes an NLI dataset, RTE, which evaluates the binary version of the task. On RTE, only the largest version of GPT-3 performs convincingly better than random (56%) in any evaluation setting, but in a few-shot setting GPT-3 performs similarly to a single-task fine-tuned BERT Large. We also evaluate on the recently introduced Adversarial Natural Language Inference (ANLI) dataset [NWD+19]. ANLI is a difficult dataset employing a series of adversarially mined natural language inference questions in three rounds (R1, R2, and R3). Similar to RTE, all of our models smaller than GPT-3 perform at almost exactly random chance on ANLI, even in the few-shot setting (∼ 33%), whereas GPT-3 itself shows signs of life on Round 3. Results for ANLI R3 are highlighted in Figure 3.9 and full results for all rounds can be found in Appendix H. These results on both RTE and ANLI suggest that NLI is still a very difficult task for language models and they are only just beginning to show signs of progress.
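For concreteness, each item in this task is a premise/hypothesis pair with one of three labels. A made-up example (not an actual ANLI record):

```python
# Illustrative only; this record is invented, not taken from the dataset.
example = {
    "premise": "The cat sat on the mat all afternoon.",
    "hypothesis": "The cat was on the mat.",
    "label": 0,  # 0 = entailment, 1 = neutral, 2 = contradiction
}
```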

- [x] Data processing code implemented
- [ ] Evaluation implemented

This should be modeled after the BoolQ task in lm_eval/tasks/superglue.py.
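A rough sketch of how that could look, assuming the `Task` interface in `lm_eval/base.py` and the HuggingFace `anli` dataset (the class name, prompt format, and download mechanism below follow the BoolQ pattern and are illustrative, not the final implementation):

```python
# Sketch only: names and prompt wording are assumptions modeled on BoolQ.
from datasets import load_dataset
from lm_eval.base import Task

class ANLIRound1(Task):
    def download(self):
        self.data = load_dataset("anli")  # assumes the HF hub dataset

    def has_training_docs(self):
        return True

    def has_validation_docs(self):
        return True

    def has_test_docs(self):
        return True

    def training_docs(self):
        return self.data["train_r1"]

    def validation_docs(self):
        return self.data["dev_r1"]

    def test_docs(self):
        return self.data["test_r1"]

    def doc_to_text(self, doc):
        # Premise/hypothesis framing, analogous to BoolQ's passage/question prompt.
        return (f"{doc['premise']}\nQuestion: {doc['hypothesis']} "
                "True, False, or Neither?\nAnswer:")

    def doc_to_target(self, doc):
        # ANLI labels: 0 = entailment, 1 = neutral, 2 = contradiction.
        return " " + ["True", "Neither", "False"][doc["label"]]
```

Rounds 2 and 3 would be near-identical classes pointing at the `_r2`/`_r3` splits.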

@StellaAthena StellaAthena added the feature request label Sep 16, 2020
@StellaAthena StellaAthena added this to To do in Implementing Evaluations via automation Sep 16, 2020
cfoster0 (Contributor) commented Oct 1, 2020

Note: HuggingFace includes this in its datasets package.

https://huggingface.co/datasets/anli
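Loading it might look like this (split names per the dataset card: one train/dev/test triple per adversarial round):

```python
from datasets import load_dataset

anli = load_dataset("anli")
# Splits: train_r1/dev_r1/test_r1, train_r2/dev_r2/test_r2, train_r3/dev_r3/test_r3
print(anli["dev_r1"][0])
# Each record has 'uid', 'premise', 'hypothesis', 'label' (0/1/2), and 'reason'
```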

@leogao2 leogao2 moved this from To do to In progress in Implementing Evaluations Oct 5, 2020
@leogao2 leogao2 moved this from In progress to Data integrated, Eval not done in Implementing Evaluations Oct 6, 2020
@StellaAthena StellaAthena added the Eval Set label and removed the feature request label Oct 23, 2020
@StellaAthena StellaAthena reopened this Jan 5, 2021
@StellaAthena StellaAthena added the feature request and good first issue labels Jan 5, 2021
@StellaAthena StellaAthena removed the good first issue label Jan 21, 2021
@leogao2 leogao2 moved this from In Progress to To do in Implementing Evaluations Jan 28, 2021
@leogao2 leogao2 changed the title Implement the Adversarial Natural Language Inference (ANIL) evaluation Implement the Adversarial Natural Language Inference (ANLI) evaluation Jan 28, 2021
@leogao2 leogao2 moved this from To do to In Progress in Implementing Evaluations Jan 30, 2021
@leogao2 leogao2 moved this from In Progress to Done in Implementing Evaluations Jan 30, 2021
@leogao2 leogao2 closed this as completed Jan 30, 2021
StellaAthena pushed a commit to asas-lab/lm-evaluation-harness that referenced this issue May 29, 2022: Add the Schema Guided Dialogue (DSTC8) - Response generation (…stc8)
lintangsutawika pushed a commit that referenced this issue Jul 8, 2024: xnli changes for open models