R-Drop: Regularized Dropout for Neural Networks

Liang, Xiaobo; Wu, Lijun; Li, Juntao; Wang, Yue; Meng, Qi; Qin, Tao; Chen, Wei; Zhang, Min; Liu, Tie-Yan

arXiv:2106.14448 (cs)

[Submitted on 28 Jun 2021 (v1), last revised 29 Oct 2021 (this version, v2)]

Abstract: Dropout is a powerful and widely used technique to regularize the training of deep neural networks. In this paper, we introduce a simple regularization strategy built on dropout in model training, namely R-Drop, which forces the output distributions of different sub-models generated by dropout to be consistent with each other. Specifically, for each training sample, R-Drop minimizes the bidirectional KL-divergence between the output distributions of two sub-models sampled by dropout. Theoretical analysis reveals that R-Drop reduces the freedom of the model parameters and complements dropout. Experiments on 5 widely used deep learning tasks (18 datasets in total), including neural machine translation, abstractive summarization, language understanding, language modeling, and image classification, show that R-Drop is universally effective. In particular, it yields substantial improvements when applied to fine-tune large-scale pre-trained models, e.g., ViT, RoBERTa-large, and BART, and achieves state-of-the-art (SOTA) performance with the vanilla Transformer model on WMT14 English→German translation (30.91 BLEU) and WMT14 English→French translation (43.95 BLEU), even surpassing models trained with extra large-scale data and expert-designed advanced variants of Transformer models. Our code is available on GitHub: this https URL.
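The per-sample objective described in the abstract can be sketched in a few lines. The following is a minimal, framework-free Python illustration, not the authors' released code. It assumes a hypothetical one-layer linear classifier with inverted dropout on its inputs, runs two stochastic forward passes on the same sample (two dropout sub-models), and adds the negative log-likelihood of both passes to the bidirectional KL-divergence between their output distributions, averaging the two KL directions. The weighting `alpha`, the dropout `rate`, and all function names are illustrative assumptions; the abstract does not specify them.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); terms with p_i == 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def dropout(x, rate, rng):
    # Inverted dropout: zero each unit with probability `rate`,
    # and rescale the survivors by 1 / (1 - rate).
    keep = 1.0 - rate
    return [xi / keep if rng.random() < keep else 0.0 for xi in x]


def forward(weights, x, rate, rng):
    # Hypothetical toy model: dropout on the inputs, then a linear map to logits.
    h = dropout(x, rate, rng)
    return [sum(w * hi for w, hi in zip(row, h)) for row in weights]


def r_drop_loss(weights, x, label, alpha=1.0, rate=0.3, rng=None):
    # Two forward passes draw two different dropout masks (two sub-models).
    rng = rng or random.Random()
    p1 = softmax(forward(weights, x, rate, rng))
    p2 = softmax(forward(weights, x, rate, rng))
    # Negative log-likelihood of the true label under both sub-models.
    nll = -math.log(p1[label]) - math.log(p2[label])
    # Bidirectional (symmetric) KL between the two output distributions.
    kl = 0.5 * (kl_divergence(p1, p2) + kl_divergence(p2, p1))
    # `alpha` is an assumed weight on the consistency term.
    return nll + alpha * kl, nll, kl


if __name__ == "__main__":
    w = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
    x = [1.0, 2.0, -1.0]
    total, nll, kl = r_drop_loss(w, x, label=1, rate=0.5, rng=random.Random(0))
    print(f"total={total:.4f} nll={nll:.4f} kl={kl:.4f}")
```

With `rate=0.0` both passes see the same input, the two distributions coincide, and the KL term vanishes, which makes a quick sanity check. In a deep learning framework the same idea amounts to running each batch through the model twice in training mode and adding the weighted symmetric KL to the usual loss.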

Comments:	Accepted by NeurIPS 2021
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2106.14448 [cs.LG]
	(or arXiv:2106.14448v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2106.14448

Submission history

From: Lijun Wu
[v1] Mon, 28 Jun 2021 08:01:26 UTC (788 KB)
[v2] Fri, 29 Oct 2021 13:02:31 UTC (287 KB)