parmarjh/Indian-Accent-Speech-Recognition

Indian Accent Speech Recognition

Traditional ASR (Signal Analysis, MFCC, DTW, HMM & Language Modelling) and DNNs (Custom Models & Baidu DeepSpeech Model) on Indian Accent Speech

Update: the pre-trained model has been uploaded, owing to requests.
The generated trie file is uploaded to the pre-trained-models directory, so you can skip the KenLM Toolkit step.

For the context and theory behind this project, head over to my blog:
https://towardsdatascience.com/indian-accent-speech-recognition-2d433eb7edac

How to Use?

Starter code for using the model is in the file Starter.ipynb. You can run it in Google Colab after uploading the 3 files (listed in the params) to your Google Drive.

  • Install DeepSpeech 0.6.1
  • Download the pre-trained model (.pbmm), the language model and the trie file.
  • Download instructions are in the pre-trained-models folder. After downloading, pass the files as arguments:
!deepspeech --model speech/output_graph.pbmm --lm speech/lm.binary --trie speech/trie --audio /content/06_M_artic_01_004.wav

If you run into issues while loading the pre-trained model, the most likely cause is your DeepSpeech version; the steps above assume 0.6.1.
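The same transcription can also be run from Python instead of the command line. The following is a minimal sketch, assuming the DeepSpeech 0.6.x Python API (`Model`, `enableDecoderWithLM`, `sampleRate`, `stt`) and NumPy are installed. The helper names `read_pcm16_mono` and `transcribe` are hypothetical, and the beam width and LM weights are the defaults from the DeepSpeech 0.6.x example client, not values confirmed by this repository.

```python
import wave

# Decoder settings (assumption: defaults of the DeepSpeech 0.6.x example client).
BEAM_WIDTH = 500
LM_ALPHA = 0.75
LM_BETA = 1.85


def read_pcm16_mono(path, expected_rate=16000):
    """Return the raw frames of a mono 16-bit PCM WAV file, or raise ValueError."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz audio, got {wav.getframerate()} Hz"
            )
        return wav.readframes(wav.getnframes())


def transcribe(model_path, lm_path, trie_path, audio_path):
    """Transcribe one WAV file (hypothetical helper, DeepSpeech 0.6.x only)."""
    # Imported lazily so read_pcm16_mono works without these packages installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path, BEAM_WIDTH)
    model.enableDecoderWithLM(lm_path, trie_path, LM_ALPHA, LM_BETA)
    frames = read_pcm16_mono(audio_path, expected_rate=model.sampleRate())
    audio = np.frombuffer(frames, dtype=np.int16)
    return model.stt(audio)
```

With the three downloaded files in place, a call would look like `transcribe("speech/output_graph.pbmm", "speech/lm.binary", "speech/trie", "/content/06_M_artic_01_004.wav")`. The format check is there because a WAV file with the wrong sample rate or channel count tends to produce poor transcripts rather than a clear error.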

Contents:

  • vui_notebook.ipynb: DNN Custom Models and Comparative Analysis to make a custom Speech Recognition model.
  • DeepSpeech_Training.ipynb: Retraining of DeepSpeech Model with Indian Accent Voice Data.
  • Training_Instructions.docx: Instructions to train DeepSpeech model.

Data Source/ Training Data:

Indic TTS Project: I downloaded 50+ GB of the Indic TTS voice database from the Speech and Music Technology Lab, IIT Madras. It comprises 10000+ spoken sentences from 20+ states, recorded by both male and female native speakers.

https://www.iitm.ac.in/donlab/tts/index.php

You can also record your own audio or have an ebook reader app read a document aloud. However, I found that this does not yield enough data to train such a heavy model, so I requested support from the Speech Lab at IIT Madras, who kindly granted access to their voice database.
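If you do collect your own recordings, DeepSpeech training expects them to be listed in a CSV file with the columns `wav_filename`, `wav_filesize` and `transcript`. The sketch below shows one way to build such a file. The function name `write_manifest` and the lowercasing step are assumptions for illustration, not taken from this repository's notebooks.

```python
import csv
import os


def write_manifest(clips, csv_path):
    """Write a DeepSpeech-style CSV from (wav_path, transcript) pairs."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["wav_filename", "wav_filesize", "transcript"])
        for wav_path, transcript in clips:
            # Assumption: transcripts are lowercased to match an English
            # alphabet file that contains only lowercase letters.
            text = transcript.strip().lower()
            writer.writerow([wav_path, os.path.getsize(wav_path), text])
```

Each row pairs one audio clip with its text, and the file size column lets the trainer order clips by length.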

DNN Custom Models for Speech Recognition: