Tailor: Generating and Perturbing Text with Semantic Controls

Ross, Alexis; Wu, Tongshuang; Peng, Hao; Peters, Matthew E.; Gardner, Matt

arXiv:2107.07150 (cs)

[Submitted on 15 Jul 2021 (v1), last revised 17 Mar 2022 (this version, v2)]

Abstract: Controlled text perturbation is useful for evaluating and improving model generalizability. However, current techniques rely on training a model for every target perturbation, which is expensive and hard to generalize. We present Tailor, a semantically-controlled text generation system. Tailor builds on a pretrained seq2seq model and produces textual outputs conditioned on control codes derived from semantic representations. We craft a set of operations to modify the control codes, which in turn steer generation towards targeted attributes. These operations can be further composed into higher-level ones, allowing for flexible perturbation strategies. We demonstrate the effectiveness of these perturbations in multiple applications. First, we use Tailor to automatically create high-quality contrast sets for four distinct natural language processing (NLP) tasks. These contrast sets contain fewer spurious artifacts and are complementary to manually annotated ones in their lexical diversity. Second, we show that Tailor perturbations can improve model generalization through data augmentation. Perturbing just 2% of training data leads to a 5.8-point gain on an NLI challenge set measuring reliance on syntactic heuristics.
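The idea of modifying control codes with small operations that compose into higher-level perturbations can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the `Header` and `Control` types, the role labels, the `change_voice` and `swap_roles` operations, and the rendered string format are hypothetical and are not taken from the Tailor paper or its released code. The sketch only edits the control codes; in the system the abstract describes, the edited codes would then condition a pretrained seq2seq generator, which is not modeled here.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Control:
    role: str     # illustrative semantic role label, e.g. "AGENT" or "PATIENT"
    keyword: str  # content hint for the span that fills this role


@dataclass(frozen=True)
class Header:
    verb: str
    voice: str                     # "active" or "passive"
    controls: Tuple[Control, ...]  # one control code per argument span


# An operation maps one header to a new header (hypothetical interface).
Op = Callable[[Header], Header]


def change_voice(voice: str) -> Op:
    """Return an operation that sets the voice attribute of the predicate."""
    return lambda h: replace(h, voice=voice)


def swap_roles(a: str, b: str) -> Op:
    """Return an operation that exchanges two role labels, keeping keywords."""
    mapping = {a: b, b: a}

    def op(h: Header) -> Header:
        swapped = tuple(
            replace(c, role=mapping.get(c.role, c.role)) for c in h.controls
        )
        return replace(h, controls=swapped)

    return op


def compose(*ops: Op) -> Op:
    """Chain primitive operations, left to right, into a higher-level one."""

    def op(h: Header) -> Header:
        for f in ops:
            h = f(h)
        return h

    return op


def render(h: Header) -> str:
    """Serialize the header as a prompt prefix (made-up format)."""
    parts = [f"VERB+{h.voice}: {h.verb}"]
    parts += [f"{c.role}: {c.keyword}" for c in h.controls]
    return "[" + " | ".join(parts) + "]"


header = Header(
    verb="comfort",
    voice="active",
    controls=(Control("AGENT", "the doctor"), Control("PATIENT", "the athlete")),
)

# A higher-level perturbation built from two primitive operations.
passivize_and_swap = compose(change_voice("passive"), swap_roles("AGENT", "PATIENT"))
perturbed = passivize_and_swap(header)
print(render(header))
print(render(perturbed))
```

Because the dataclasses are frozen, each operation returns a new header and leaves the original untouched, which keeps the original and perturbed control codes available side by side, as a contrast pair would require.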

Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2107.07150 [cs.CL] (or arXiv:2107.07150v2 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2107.07150

Submission history

From: Tongshuang Wu
[v1] Thu, 15 Jul 2021 06:38:59 UTC (2,827 KB)
[v2] Thu, 17 Mar 2022 20:02:12 UTC (2,752 KB)
