CommonsenseQA 2.0: Exposing the limits of AI through Gamification

CommonsenseQA 2.0 is a yes/no question-answering challenge set collected using a game called "Teach-Your-AI".

At a high level, a player authors a yes/no question, is shown the AI's answer, and then marks whether the AI was correct. The player's goal is to earn points, which serve as a flexible vehicle for steering player behaviour. First, points are given for beating the AI, that is, for authoring questions the AI answers incorrectly. This incentivizes the player to ask difficult questions, conditioned on their understanding of the AI's capabilities. Second, the player earns points for using particular phrases in the question. This gives the game designer control over skewing the distribution of questions towards topics or other phenomena of interest. Last, questions are validated by humans, and points are deducted for questions that fail validation. This pushes players to author questions with broad agreement among people.
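The three incentives can be made concrete with a toy scoring rule. This is a sketch only: the point values, the `Round` type, and the `score` function are hypothetical and chosen for illustration, not taken from the game itself; the actual scoring is described in the paper.

```python
from dataclasses import dataclass

# HYPOTHETICAL point values for illustration only; not the game's real values.
BEAT_AI_POINTS = 1
PROMPT_POINTS = 1
FAILED_VALIDATION_PENALTY = 1


@dataclass
class Round:
    ai_was_wrong: bool       # player marked the AI's answer as incorrect
    prompts_used: int        # number of suggested phrases used in the question
    passed_validation: bool  # whether human validators accepted the question


def score(r: Round) -> int:
    """Toy scoring rule mirroring the three incentives described above."""
    points = 0
    if r.ai_was_wrong:
        points += BEAT_AI_POINTS                # reward for beating the AI
    points += PROMPT_POINTS * r.prompts_used    # reward for steering phrases
    if not r.passed_validation:
        points -= FAILED_VALIDATION_PENALTY     # penalty for failing validation
    return points
```

For example, `score(Round(ai_was_wrong=True, prompts_used=2, passed_validation=True))` evaluates to 3 under these made-up values. The point of the sketch is the shape of the incentive, not the numbers.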

For more details, see our NeurIPS-21 benchmark submission, "CommonsenseQA 2.0: Exposing the Limits of AI through Gamification", and the project website.

Changelog

07/06/2021 Version 2.01 is out.

CommonsenseQA 2.0 Dataset

The dataset directory contains all dataset files:

CSQA2_train.jsonl.gz - all training examples
CSQA2_dev.jsonl.gz - all development set examples
CSQA2_test_no_answers.jsonl.gz - all test set examples, without answers or validations
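The files are gzip-compressed JSON Lines, so they can be read with the Python standard library alone. A minimal sketch, assuming the files have been downloaded locally; the helper name `read_jsonl_gz` is hypothetical:

```python
import gzip
import json
from pathlib import Path
from typing import Iterator, Union


def read_jsonl_gz(path: Union[str, Path]) -> Iterator[dict]:
    """Yield one example (a dict) per non-empty line of a gzipped JSON Lines file."""
    # "rt" opens the gzip stream in text mode, so each line comes back as str.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines, e.g. a trailing newline
                yield json.loads(line)
```

For example, `dev = list(read_jsonl_gz("dataset/CSQA2_dev.jsonl.gz"))` loads the development set, assuming the file sits at that (hypothetical) local path. A generator is used so the training file can be streamed without holding every example in memory.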

Dataset Format

The dataset is provided as gzip-compressed JSON Lines (jsonl) files, where each line is a single example with the following format:

{
 "id": "Unique identifier for the example (5454c14ad01e722c2619b66778daa98b)",
 "question": "Natural language question or assertion to which the answer is yes or no (for assertions: yes is considered true, and no is considered false)",
 "answer": [
  "answer1",
  "answer2"
 ],
 "confidence": "A number between 0 and 1.0 related to the quality of the question as produced by the Automatic question verification model (see section 2.2 in the main paper)",
 "relational_prompt": "The relational prompt as displayed to the player (see section 2.1 in the main paper for details)",
 "relational_prompt_used": "True/False, indicates whether the composing player has chosen to use the relational prompt",
 "topic_prompt": "The topic prompt as displayed to the player (see section 2.1 in the main paper for details)",
 "topic_prompt_used": "True/False, indicates whether the composing player has chosen to use the topic prompt",
 "validations": "A list of player validations for the question; each value is one of: yes, no, bad question, sensitive"
}
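Once examples are loaded as dicts, the `validations` and `confidence` fields can be summarized directly. A minimal sketch: `majority_validation` and `high_confidence` are hypothetical helpers, and the default threshold of 0.5 is arbitrary, not a value recommended by the authors.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Values a validation can take, per the format description above.
VALIDATION_LABELS = {"yes", "no", "bad question", "sensitive"}


def majority_validation(example: dict) -> Tuple[Optional[str], float]:
    """Return the most common validation label and the fraction of validators who chose it.

    Returns (None, 0.0) when the example has no validations (e.g. the test set).
    """
    validations = example.get("validations") or []
    unknown = set(validations) - VALIDATION_LABELS
    if unknown:
        raise ValueError(f"unexpected validation labels: {sorted(unknown)}")
    if not validations:
        return None, 0.0
    label, count = Counter(validations).most_common(1)[0]
    return label, count / len(validations)


def high_confidence(examples: List[dict], min_confidence: float = 0.5) -> List[dict]:
    """Keep examples whose verifier confidence is at least min_confidence (arbitrary threshold)."""
    return [ex for ex in examples if float(ex.get("confidence", 0.0)) >= min_confidence]
```

For an example whose validations are `["yes", "yes", "no"]`, `majority_validation` returns `("yes", 2/3)`. Raising on unknown labels is a deliberate choice so that any label outside the four documented values is noticed rather than silently counted.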

Supplementary Material

The supplementary material document provides additional information about the Dataset, the Data Collection through Gamification, Dataset Analysis, the Experimental Evaluation, and the Model Analysis. The supplementary material zip file contains GPT-3 predictions, as well as contrast-set and qualitative reasoning skills information.
