DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Speech research paper. Project DeepSpeech uses Google's TensorFlow to make the implementation easier.
Documentation for installation, usage, and training models is available on deepspeech.readthedocs.io.
For the Quran workflow and model release, see the folder data/quran. To skip the importer script and download the dataset directly, go to the archive.
Thanks to Omer Asif, a notebook (ipynb) is shared on Colab. Feel free to tune it, reproduce our work, and reshare.
As the workflow explains, the engine is built in two steps:
- Step-1: Imam Only dataset:
  WER: 0.056551, CER: 0.039540, loss: 24.844383
- Step-2: Imam + Filtered Users dataset:
  WER: 0.099118, CER: 0.065586, loss: 39.312599
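
For readers unfamiliar with the metrics above, here is a minimal sketch of how WER (word error rate) and CER (character error rate) are commonly computed: the Levenshtein edit distance between a reference transcript and the recognized text, divided by the reference length. The function names `edit_distance`, `wer`, and `cer` are illustrative and not part of DeepSpeech. The sketch scores a single utterance; how the project's evaluation aggregates over the whole test set is not stated here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (lists of words or strings)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edits divided by the reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edits divided by reference length."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    print(wer("the cat sat", "the cat sit"))
    print(cer("the cat sat", "the cat sit"))
```

Lower is better for both metrics; a WER of 0.056551 corresponds to roughly 5.7 word errors per 100 reference words.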