Not Enough Data? Deep Learning to the Rescue!

Anaby-Tavor, Ateret; Carmeli, Boaz; Goldbraich, Esther; Kantor, Amir; Kour, George; Shlomov, Segev; Tepper, Naama; Zwerdling, Naama

arXiv:1911.03118 (cs)

[Submitted on 8 Nov 2019 (v1), last revised 27 Nov 2019 (this version, v2)]

Abstract: Building on recent advances in natural language modeling and text generation, we propose a novel data augmentation method for text classification tasks. We use a powerful pre-trained neural network model to synthesize new labeled data for supervised learning, focusing mainly on cases where labeled data is scarce. Our method, referred to as language-model-based data augmentation (LAMBADA), fine-tunes a state-of-the-art language generator to a specific task through an initial training phase on the existing (usually small) labeled data. Given a class label, the fine-tuned model then generates new sentences for that class. Our process filters these new sentences using a classifier trained on the original data. In a series of experiments, we show that LAMBADA improves classifiers' performance on a variety of datasets. Moreover, LAMBADA significantly improves upon state-of-the-art techniques for data augmentation, specifically those applicable to text classification tasks with little data.
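The generate-then-filter loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `make_toy_generator` is a hypothetical stand-in for the fine-tuned language generator, `train_bow_classifier` is a hypothetical stand-in for the classifier trained on the original data, and the retention rule (keep a candidate only if the classifier agrees with its intended label, then take the most confident few per class) is one plausible reading of "filters", with `per_class` and `candidates` as made-up parameters.

```python
import random
from collections import Counter, defaultdict

# Each example is a (label, sentence) pair.


def train_bow_classifier(data):
    """Toy stand-in for the classifier trained on the original data."""
    counts = defaultdict(Counter)
    for label, text in data:
        counts[label].update(text.lower().split())

    def predict(text):
        words = text.lower().split()
        # Add-one smoothed word-overlap score per class.
        scores = {c: 1 + sum(counts[c][w] for w in words) for c in counts}
        best = max(scores, key=scores.get)
        # Confidence = winning score as a share of all class scores.
        return best, scores[best] / sum(scores.values())

    return predict


def make_toy_generator(data, seed=0):
    """Toy stand-in for the fine-tuned generator (assumption, not GPT-like)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    all_words = []
    for label, text in data:
        words = text.lower().split()
        by_label[label].extend(words)
        all_words.extend(words)

    def generate(label, length=4):
        # Mostly class-specific words, sometimes any word, so that some
        # candidates drift off-class and the filter has work to do.
        return " ".join(
            rng.choice(by_label[label]) if rng.random() < 0.7
            else rng.choice(all_words)
            for _ in range(length)
        )

    return generate


def augment(data, generate, predict, per_class=5, candidates=20):
    """Generate candidates per class, then keep the classifier-approved ones."""
    synthetic = []
    for label in sorted({lbl for lbl, _ in data}):
        kept = []
        for _ in range(candidates):
            sentence = generate(label)
            predicted, confidence = predict(sentence)
            if predicted == label:  # discard candidates the classifier rejects
                kept.append((confidence, sentence))
        kept.sort(reverse=True)  # most confident first
        synthetic.extend((label, s) for _, s in kept[:per_class])
    return synthetic


# Tiny made-up seed set standing in for the scarce labeled data.
seed_data = [
    ("flight", "book a flight to boston"),
    ("flight", "find cheap flight tickets"),
    ("weather", "will it rain tomorrow"),
    ("weather", "show the weather forecast"),
]
predict = train_bow_classifier(seed_data)
generate = make_toy_generator(seed_data)
synthetic = augment(seed_data, generate, predict)
augmented_data = seed_data + synthetic
```

In a realistic setting the two stand-ins would be replaced by an actual fine-tuned generator and a trained classifier; the structure of `augment` is the only part meant to mirror the description above.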

Comments:	20 pages
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:1911.03118 [cs.CL]
	(or arXiv:1911.03118v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.1911.03118

Submission history

From: Segev Shlomov [view email]
[v1] Fri, 8 Nov 2019 08:30:22 UTC (339 KB)
[v2] Wed, 27 Nov 2019 12:15:52 UTC (133 KB)
