Exploring the Limits of Domain-Adaptive Training for Detoxifying Large-Scale Language Models

Wang, Boxin; Ping, Wei; Xiao, Chaowei; Xu, Peng; Patwary, Mostofa; Shoeybi, Mohammad; Li, Bo; Anandkumar, Anima; Catanzaro, Bryan

arXiv:2202.04173 (cs)

[Submitted on 8 Feb 2022 (v1), last revised 21 Oct 2022 (this version, v3)]

Abstract: Pre-trained language models (LMs) are shown to easily generate toxic language. In this work, we systematically explore domain-adaptive training to reduce the toxicity of language models. We conduct this study on three dimensions: training corpus, model size, and parameter efficiency. For the training corpus, we propose to leverage the generative power of LMs and generate nontoxic datasets for domain-adaptive training, which mitigates the exposure bias and is shown to be more data-efficient than using a curated pre-training corpus. We demonstrate that the self-generation method consistently outperforms the existing baselines across various model sizes on both automatic and human evaluations, even when it uses a training corpus that is 1/3 smaller. We then comprehensively study detoxifying LMs with parameter sizes ranging from 126M up to 530B (3x larger than GPT-3), a scale that has never been studied before. We find that i) large LMs have toxicity levels similar to those of smaller ones given the same pre-training corpus, and ii) large LMs require more effort to detoxify. We also explore parameter-efficient training methods for detoxification. We demonstrate that adding and training adapter-only layers in LMs not only saves a lot of parameters but also achieves a better trade-off between toxicity and perplexity than whole-model adaptation for the large-scale models.
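
The data-construction idea in the abstract (generate text with the LM, then keep only nontoxic samples for domain-adaptive training) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and is not taken from the abstract: the helper `select_nontoxic`, the `keep_fraction` parameter, and the keyword-based `toy_toxicity_score`, which merely stands in for whatever toxicity classifier one would actually use. The generation step itself is replaced by a hard-coded list of strings.

```python
from typing import Callable, List, Tuple


def select_nontoxic(
    samples: List[str],
    toxicity_score: Callable[[str], float],
    keep_fraction: float = 0.5,
) -> List[str]:
    """Keep the least-toxic fraction of the samples (hypothetical helper)."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Score every sample once, then sort ascending by toxicity score.
    # sorted() is stable, so ties keep their original order.
    scored: List[Tuple[float, str]] = sorted(
        ((toxicity_score(s), s) for s in samples),
        key=lambda pair: pair[0],
    )
    n_keep = max(1, int(len(scored) * keep_fraction)) if scored else 0
    return [text for _, text in scored[:n_keep]]


def toy_toxicity_score(text: str) -> float:
    """Assumed stand-in for a real classifier: share of blocklisted words."""
    blocklist = {"idiot", "stupid"}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(w.strip(".,!?") in blocklist for w in words) / len(words)


# Stand-in for LM-generated continuations (no model is called here).
generated = [
    "The weather is lovely today.",
    "You are a stupid idiot!",
    "Thanks for the helpful answer.",
    "That was a stupid plan.",
]

# The retained samples would form the corpus for domain-adaptive training.
corpus = select_nontoxic(generated, toy_toxicity_score, keep_fraction=0.5)
print(corpus)
```

In this sketch the two samples with a score of zero are retained; the fine-tuning step on the retained corpus is outside its scope.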

Comments:	NeurIPS 2022
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Cite as:	arXiv:2202.04173 [cs.CL]
	(or arXiv:2202.04173v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2202.04173

Submission history

From: Wei Ping
[v1] Tue, 8 Feb 2022 22:10:40 UTC (374 KB)
[v2] Fri, 7 Oct 2022 06:32:54 UTC (367 KB)
[v3] Fri, 21 Oct 2022 23:01:28 UTC (368 KB)
