
21.08 (Awst / August 2021)

DewiBrynJones released this on 26 Aug 2021 at 15:04


Here are our August 2021 (21.08) scripts for training, evaluating, using and hosting (as a speech recognition API) your own Welsh speech recognition models, based on wav2vec2 by Facebook AI and HuggingFace, and KenLM by Kenneth Heafield and others.
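
For readers new to wav2vec2: models fine-tuned for speech recognition in this way usually emit one label per audio frame under the CTC (Connectionist Temporal Classification) scheme, and the simplest way to turn those labels into text, without a language model, is greedy decoding. The sketch below is an illustration only, not code from this release; it assumes a blank token named `<pad>` and `|` as the word delimiter, which are common in HuggingFace wav2vec2 vocabularies but should be checked against the released model's vocabulary. When KenLM is used as well, this greedy step is typically replaced by a beam search that also scores candidate words with the language model.

```python
from itertools import groupby

# Assumed token names; the released model's vocabulary may use different ones.
BLANK = "<pad>"   # CTC blank token (hypothetical name)
WORD_DELIM = "|"  # word-boundary token (hypothetical name)


def ctc_greedy_decode(frame_tokens, blank=BLANK, delim=WORD_DELIM):
    """Turn per-frame best labels into text using CTC greedy decoding."""
    # 1. Collapse runs of identical labels ("b", "b" -> "b").
    collapsed = [tok for tok, _ in groupby(frame_tokens)]
    # 2. Drop blanks; a blank between two identical labels keeps both ("l", blank, "l").
    chars = [tok for tok in collapsed if tok != blank]
    # 3. Map the delimiter to spaces and tidy up surrounding whitespace.
    text = "".join(" " if c == delim else c for c in chars)
    return " ".join(text.split())


frames = ["b", "b", "<pad>", "o", "r", "r", "e", "|", "d", "d", "a"]
print(ctc_greedy_decode(frames))  # bore da
```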

This release also includes models trained on the Welsh data from Mozilla Common Voice version 7 (published in July 2021) and on the Welsh text corpus from OSCAR (August 2021).

Models can be found on the HuggingFace website: https://huggingface.co/techiaith/wav2vec2-xlsr-ft-cy/tree/21.08

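In simple evaluations on the Welsh Common Voice test set, using the acoustic model (wav2vec2) and the language model (KenLM) together during inference gives a word error rate (WER) of about 14%, i.e. roughly 14% of the words in a sentence are misrecognized.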
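
Word error rate is the number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference; a WER of 14% therefore corresponds to roughly 14 errors per 100 reference words. The following is a minimal sketch of a corpus-level WER in Python, assuming transcripts are already normalized (case, punctuation) and split on whitespace. The release's own evaluation scripts may normalize or aggregate differently, and the example sentences are made up for illustration.

```python
from typing import Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions and insertions (Levenshtein)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Total word edits divided by total reference words over all (reference, hypothesis) pairs."""
    edits = 0
    ref_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        edits += word_edit_distance(ref, hypothesis.split())
        ref_words += len(ref)
    if ref_words == 0:
        raise ValueError("references contain no words")
    return edits / ref_words


pairs = [
    ("mae hi'n braf heddiw", "mae hin braf heddiw"),  # one substitution
    ("bore da", "bore da"),                            # exact match
]
print(f"WER: {corpus_wer(pairs):.1%}")  # WER: 16.7%
```

Pooling edits and reference words across the whole test set, rather than averaging per-sentence rates, keeps short sentences from dominating the score.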