Unsupervised Cross-lingual Representation Learning for Speech Recognition

Conneau, Alexis; Baevski, Alexei; Collobert, Ronan; Mohamed, Abdelrahman; Auli, Michael

arXiv:2006.13979 (cs)

[Submitted on 24 Jun 2020 (v1), last revised 15 Dec 2020 (this version, v2)]

Abstract: This paper presents XLSR, which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages. We build on wav2vec 2.0, which is trained by solving a contrastive task over masked latent speech representations and jointly learns a quantization of the latents shared across languages. The resulting model is fine-tuned on labeled data, and experiments show that cross-lingual pretraining significantly outperforms monolingual pretraining. On the CommonVoice benchmark, XLSR shows a relative phoneme error rate reduction of 72% compared to the best known results. On BABEL, our approach improves word error rate by 16% relative compared to a comparable system. Our approach enables a single multilingual speech recognition model which is competitive with strong individual models. Analysis shows that the latent discrete speech representations are shared across languages, with increased sharing for related languages. We hope to catalyze research in low-resource speech understanding by releasing XLSR-53, a large model pretrained in 53 languages.
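
The contrastive task mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the commonly described wav2vec 2.0 formulation, in which the context vector at a masked time step must pick out the true quantized latent from a set of distractors by temperature-scaled cosine similarity. The function names, the toy vectors, and the temperature value of 0.1 are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(context, target, distractors, temperature=0.1):
    """Negative log-probability of the true target among all candidates.

    context:      context vector at one masked time step (assumed input)
    target:       the true quantized latent for that step
    distractors:  quantized latents sampled from other steps
    temperature:  illustrative value, not taken from the paper
    """
    candidates = [target] + list(distractors)
    logits = [cosine(context, q) / temperature for q in candidates]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Cross-entropy with the true target at index 0.
    return log_norm - logits[0]


# Toy 2-dimensional example (hypothetical values).
context = [1.0, 0.0]
target = [0.9, 0.1]
distractors = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = contrastive_loss(context, target, distractors)
# Pretend a distractor were the true target: the loss should be larger.
loss_bad = contrastive_loss(context, distractors[0], [target, distractors[1]])
print(loss_good < loss_bad)
```

In this sketch the loss is small when the context vector is close to the true latent and far from the distractors, which is the pressure that pushes the model to encode the masked speech content. Because the quantized latents come from a codebook shared across languages, the same objective applies unchanged to multilingual pretraining.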

Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2006.13979 [cs.CL]
	(or arXiv:2006.13979v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2006.13979

Submission history

From: Alexis Conneau
[v1] Wed, 24 Jun 2020 18:25:05 UTC (282 KB)
[v2] Tue, 15 Dec 2020 23:19:19 UTC (660 KB)
