Traditional ASR (Signal Analysis, MFCC, DTW, HMM & Language Modelling) and DNNs (Custom Models & Baidu DeepSpeech Model) on Indian Accent Speech
<< The pre-trained model has been uploaded in response to requests >>
The generated trie file has also been uploaded to the pre-trained-models directory, so you can skip the KenLM Toolkit step.
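If you ever need to regenerate lm.binary and the trie yourself, the skipped step roughly amounts to the sketch below. It assumes KenLM's lmplz and build_binary tools and DeepSpeech 0.6.1's generate_trie binary are already built and on your PATH; corpus.txt and alphabet.txt are placeholder file names.

```python
# Sketch of the (skippable) KenLM step: build lm.binary from a text corpus and
# derive the trie with DeepSpeech 0.6.1's generate_trie tool.
# Assumes lmplz, build_binary and generate_trie are on PATH; file names are placeholders.
import subprocess

# 1. Train a 5-gram ARPA language model on the raw text corpus
subprocess.run(["lmplz", "-o", "5", "--text", "corpus.txt", "--arpa", "lm.arpa"], check=True)

# 2. Convert the ARPA file into KenLM's binary format (the lm.binary DeepSpeech loads)
subprocess.run(["build_binary", "lm.arpa", "lm.binary"], check=True)

# 3. Build the trie from the alphabet and the binary LM
subprocess.run(["generate_trie", "alphabet.txt", "lm.binary", "trie"], check=True)
```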
To understand the context, theory and explanation of this project, head over to my blog:
https://towardsdatascience.com/indian-accent-speech-recognition-2d433eb7edac
Starter code for using the model is given in the file Starter.ipynb. You can run it in Google Colab if you upload the 3 files (given as params) to your Google Drive.
- Install DeepSpeech 0.6.1
- Download the pre-trained model (.pbmm), language model and trie file.
- Download instructions are given in the pre-trained-models folder. After downloading, pass these files as arguments.
!deepspeech --model speech/output_graph.pbmm --lm speech/lm.binary --trie speech/trie --audio /content/06_M_artic_01_004.wav
If you run into issues while loading the pre-trained model, it is most likely due to a mismatch in your DeepSpeech version.
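The same transcription can also be run from Python instead of the CLI. The following is a minimal sketch against the DeepSpeech 0.6.1 Python API; the paths are placeholders and the beam width / lm_alpha / lm_beta values are the library defaults, not values tuned for this project.

```python
# Minimal inference sketch with the DeepSpeech 0.6.1 Python API.
# Paths are placeholders; beam width and LM weights are the stock defaults, not tuned values.
import wave
import numpy as np
from deepspeech import Model

BEAM_WIDTH = 500
LM_ALPHA = 0.75
LM_BETA = 1.85

ds = Model("speech/output_graph.pbmm", BEAM_WIDTH)
ds.enableDecoderWithLM("speech/lm.binary", "speech/trie", LM_ALPHA, LM_BETA)

# DeepSpeech expects 16 kHz, 16-bit mono PCM audio
with wave.open("06_M_artic_01_004.wav", "rb") as wav:
    audio = np.frombuffer(wav.readframes(wav.getnframes()), dtype=np.int16)

print(ds.stt(audio))
```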
- vui_notebook.ipynb: Custom DNN models and a comparative analysis for building a custom speech recognition model.
- DeepSpeech_Training.ipynb: Retraining of the DeepSpeech model with Indian-accent voice data (a rough sketch of the underlying training invocation is given after this list).
- Training_Instructions.docx: Instructions for training the DeepSpeech model.
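For reference, the retraining in DeepSpeech_Training.ipynb ultimately comes down to invoking DeepSpeech.py on CSV manifests of the Indian-accent data. A rough sketch of such an invocation is given below; it assumes the mozilla/DeepSpeech v0.6.1 checkout is the working directory with its requirements installed, and every path and hyperparameter shown is a placeholder rather than the setting actually used here.

```python
# Rough sketch of a DeepSpeech 0.6.1 fine-tuning run, driven from Python.
# Assumes the mozilla/DeepSpeech v0.6.1 repo is the current directory;
# all paths and hyperparameters below are placeholders.
import subprocess

subprocess.run([
    "python3", "DeepSpeech.py",
    "--train_files", "indic/train.csv",
    "--dev_files", "indic/dev.csv",
    "--test_files", "indic/test.csv",
    "--alphabet_config_path", "data/alphabet.txt",
    "--lm_binary_path", "data/lm/lm.binary",
    "--lm_trie_path", "data/lm/trie",
    "--checkpoint_dir", "checkpoints/",
    "--export_dir", "exported_model/",
    "--epochs", "10",
    "--train_batch_size", "32",
    "--learning_rate", "0.0001",
], check=True)
```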
Indic TTS Project: Downloaded 50+ GB of the Indic TTS voice database from the Speech and Music Technology Lab, IIT Madras, which comprises 10,000+ spoken sentences from 20+ states (both male and female native speakers).
https://www.iitm.ac.in/donlab/tts/index.php
You can also record your own audio or let ebook-reader apps read a document aloud, but I found that this is insufficient to train such a heavy model. I then requested the support of the IIT Madras Speech Lab, which kindly granted access to their voice database.
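DeepSpeech consumes training data as CSV manifests with wav_filename, wav_filesize and transcript columns, so the downloaded recordings have to be indexed into that format first. Below is a hypothetical sketch of building such a manifest; the directory layout it assumes (one .txt transcript next to each .wav) is for illustration only and not the actual Indic TTS structure.

```python
# Hypothetical sketch: build a DeepSpeech-style CSV manifest
# (wav_filename, wav_filesize, transcript) from a folder of wav/txt pairs.
# The "one .txt per .wav" layout is an assumption, not the real Indic TTS layout.
import csv
import os
from pathlib import Path

def build_manifest(data_dir, out_csv):
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav in sorted(Path(data_dir).rglob("*.wav")):
            txt = wav.with_suffix(".txt")
            if not txt.exists():
                continue  # skip clips without a transcript
            transcript = txt.read_text(encoding="utf-8").strip().lower()
            writer.writerow([str(wav), os.path.getsize(wav), transcript])

build_manifest("indic_tts/", "indic/train.csv")
```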