A Study of Gender Impact in Self-supervised Models for Speech-to-Text Systems

Boito, Marcely Zanon; Besacier, Laurent; Tomashenko, Natalia; Estève, Yannick

arXiv:2204.01397 (cs)

[Submitted on 4 Apr 2022 (v1), last revised 5 Jul 2022 (this version, v2)]

Abstract: Self-supervised models for speech processing emerged recently as popular foundation blocks in speech processing pipelines. These models are pre-trained on unlabeled audio data and then used in speech processing downstream tasks such as automatic speech recognition (ASR) or speech translation (ST). Since these models are now used in research and industrial systems alike, it becomes necessary to understand the impact caused by some features such as gender distribution within pre-training data. Using French as our investigation language, we train and compare gender-specific wav2vec 2.0 models against models containing different degrees of gender balance in their pre-training data. The comparison is performed by applying these models to two speech-to-text downstream tasks: ASR and ST. Results show the type of downstream integration matters. We observe lower overall performance using gender-specific pre-training before fine-tuning an end-to-end ASR system. However, when self-supervised models are used as feature extractors, the overall ASR and ST results follow more complex patterns in which the balanced pre-trained model does not necessarily lead to the best results. Lastly, our crude 'fairness' metric, the relative performance difference measured between female and male test sets, does not display a strong variation from balanced to gender-specific pre-trained wav2vec 2.0 models.
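
The 'fairness' metric mentioned in the abstract is described only as a relative performance difference between female and male test sets; the exact normalisation is not given here. The sketch below is one possible reading, assuming the gap is divided by the mean of the two scores. The function name `relative_gap` and the example scores are hypothetical and are not taken from the paper.

```python
def relative_gap(score_female: float, score_male: float) -> float:
    """Signed relative difference between two test-set scores.

    Assumption (not from the paper): the difference is normalised by the
    mean of the two scores, so the value is symmetric up to its sign.
    """
    mean = (score_female + score_male) / 2.0
    if mean == 0:
        # Both scores are zero: no gap to report.
        return 0.0
    return (score_female - score_male) / mean


# Hypothetical word error rates (in %), for illustration only.
gap = relative_gap(12.0, 8.0)
print(f"relative gap: {gap:.2f}")
```

With an error-rate metric such as WER, a positive value under this convention would mean the female test set scores worse than the male one; for a metric where higher is better, such as BLEU, the sign reads the other way.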

Comments:	Accepted to INTERSPEECH 2022 (Special session Inclusive and Fair Speech Technologies)
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2204.01397 [cs.CL]
	(or arXiv:2204.01397v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2204.01397

Submission history

From: Marcely Zanon Boito
[v1] Mon, 4 Apr 2022 11:28:19 UTC (40 KB)
[v2] Tue, 5 Jul 2022 11:20:47 UTC (39 KB)
