Release 22.10 (Hydref / October 2022) · techiaith/docker-huggingface-stt-cy

Read this release note in English

Dyma ein modelau a sgriptiau ym mis Hydref 2022 (22.10) ar gyfer adnabod lleferydd Cymraeg effeithiol ar sail y ddull wav2vec2. Yn newydd yn y cyhoeddiad yma o'r gwaith yw:

sgriptiau cychwynnol i rhag-hyfforddi modelau gyda rhagor o sain leferydd Cymraeg ac yna i'w fireinio ar gyfer wireddu adnabod lleferydd Cymraeg gwell
o ganlyniad, model arbrofol newydd ('wav2vec2-base-cy') sydd wedi ei rhag-hyfforddi gyda dros 180 awr o leferydd Cymraeg o amrywiaeth o fideos YouTube.
modd i hyfforddi gydag is-setiau ein hunain a fwy defnyddiol o fersiwn 11 o Common Voice Cymraeg a Saesneg (gweler https://github.com/techiaith/docker-commonvoice-custom-splits-builder) a chyhoeddwyd ym mis Medi 2022.
o ganlyniad, model acwstig adnabod lleferydd dwyieithog Cymraeg a Saesneg newydd ('wav2vec2-xlsr-ft-en-cy') gyda WER o 17.07% ar set profi ddilys o Common Voice
a model adnabod lleferydd Cymraeg ('wav2vec2-xlsr-ft-cy') lawer mwy effeithiol a chywir gyda gostyngiad o 67% yn y WER o 12.38% i 4.05% ar gyfer adnabod lleferydd Cymraeg yn unig ar set profi ddilys o Common Voice.
seilwaith gweinydd API trawsgrifio newydd gyda'r modd i gysylltu ag API sy'n atgyweirio atalnodi a chyfalafu mewn testunau Cymraeg (gweler https://github.com/techiaith/docker-atalnodi-server)

D.S. er bod y WER wedi gwella i 4.05% ar set brofi o Common Voice bellach, ond promptiau wedi eu darllen yn bwyllog sydd yn y set brofi honno. Gyda sgyrsiau naturiol, digymell, mae’r WER yn agosach at 30% ac angen rhagor o waith hyfforddi a gwerthuso.

Ceir ffeiliau modelau ar wefan HuggingFace:

in English

These are our models and scripts as of October 2022 (22.10) for effective Welsh speech recognition based on wav2vec2. New in this release are:

initial scripts to pre-train models on more Welsh speech audio and then fine-tune them, as an experiment in improving Welsh speech recognition results.
as a result, a new experimental model ('wav2vec2-base-cy') which has been pre-trained with over 180 hours of Welsh speech collected from a variety of videos on YouTube.
a means to train with our own custom, more useful splits of version 11 of Common Voice Welsh and English, published in September 2022 (see https://github.com/techiaith/docker-commonvoice-custom-splits-builder).
as a result, a new Welsh and English bilingual speech recognition acoustic model ('wav2vec2-xlsr-ft-en-cy') with a word error rate (WER) of 17.07% when evaluated on a test set from Common Voice.
and a much more accurate Welsh-only speech recognition model ('wav2vec2-xlsr-ft-cy'), whose WER on a test set from Common Voice fell by 67%, from 12.38% to 4.05%.
new transcription API server infrastructure that supports connecting to an API that can restore punctuation and capitalization in Welsh texts (see https://github.com/techiaith/docker-atalnodi-server).

N.B. although the WER has now improved to 4.05% on a test set from Common Voice, that test set consists of prompts that have been read carefully and calmly. With natural, spontaneous or conversational speech, the WER is believed to be closer to 30%, so more training and evaluation work is needed.
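
For readers less familiar with the metric, here is a minimal sketch of how a WER figure and the 67% relative reduction quoted above can be computed. It assumes the standard definition (word-level edit distance divided by the number of reference words). The function names and the Welsh sample sentences are illustrative only and are not taken from the project's scripts, which may normalise text differently before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j]
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Fraction by which the WER fell relative to the old value."""
    return (old_wer - new_wer) / old_wer


# The release figures: 12.38% down to 4.05%
print(f"{relative_reduction(12.38, 4.05):.0%}")  # prints 67%
```

Note that the 67% is a relative reduction: the absolute drop is 8.33 percentage points.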

Model files can be found on the HuggingFace website:
