UnifiedQA: Crossing Format Boundaries With a Single QA System

Khashabi, Daniel; Min, Sewon; Khot, Tushar; Sabharwal, Ashish; Tafjord, Oyvind; Clark, Peter; Hajishirzi, Hannaneh

arXiv:2005.00700 (cs)

[Submitted on 2 May 2020 (v1), last revised 7 Oct 2020 (this version, v3)]

Abstract: Question answering (QA) tasks have been posed using a variety of formats, such as extractive span selection, multiple choice, etc. This has led to format-specialized models, and even to an implicit division in the QA community. We argue that such boundaries are artificial and perhaps unnecessary, given the reasoning abilities we seek to teach are not governed by the format. As evidence, we use the latest advances in language modeling to build a single pre-trained QA model, UnifiedQA, that performs surprisingly well across 17 QA datasets spanning 4 diverse formats. UnifiedQA performs on par with 9 different models that were trained on individual datasets themselves. Even when faced with 12 unseen datasets of observed formats, UnifiedQA performs surprisingly well, showing strong generalization from its out-of-format training data. Finally, simply fine-tuning this pre-trained QA model into specialized models results in a new state of the art on 6 datasets, establishing UnifiedQA as a strong starting point for building QA systems.

Comments:	EMNLP 2020 (Findings)
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2005.00700 [cs.CL]
	(or arXiv:2005.00700v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2005.00700

Submission history

From: Daniel Khashabi Mr.
[v1] Sat, 2 May 2020 04:42:14 UTC (941 KB)
[v2] Tue, 6 Oct 2020 07:46:48 UTC (3,922 KB)
[v3] Wed, 7 Oct 2020 03:12:45 UTC (7,232 KB)
