This directory contains scripts and tools for training custom speech recognition models with the Kaldi toolkit. It covers acoustic model training, language model creation, and decoding pipelines.
.
├── cmd.sh                    # Command configuration for training and decoding
├── conf/
│   ├── mfcc.conf             # Configuration for MFCC feature extraction
│   └── online_cmvn.conf      # Online Cepstral Mean Variance Normalization (currently empty)
├── local/
│   ├── chain/
│   │   ├── run_ivector_common.sh   # Script for i-vector extraction during chain model training
│   │   └── run_tdnn.sh             # Script for training a TDNN model
│   ├── data_prep.sh          # Data preparation script for creating Kaldi data directories
│   ├── download_and_untar.sh # Script for downloading and extracting datasets
│   ├── download_lm.sh        # Downloads language models
│   ├── prepare_dict.sh       # Prepares the pronunciation dictionary
│   └── score.sh              # Scoring script for evaluation
├── path.sh                   # Script for setting Kaldi paths
├── RESULTS                   # Script for printing the best WER results
├── RESULTS.txt               # Contains WER results from decoding
├── run.sh                    # Main script for the entire training pipeline
├── steps -> ../../wsj/s5/steps/   # Link to Kaldi’s WSJ steps for acoustic model training
└── utils -> ../../wsj/s5/utils/   # Link to Kaldi’s utility scripts
- cmd.sh: Defines commands for running training and decoding tasks.
- path.sh: Sets up paths for Kaldi binaries and scripts.
- run.sh: Main entry point for the training pipeline, running tasks in stages.
- RESULTS: Displays Word Error Rate (WER) for the trained models.
- Kaldi: The Kaldi toolkit must be installed and configured.
- Required tools: ffmpeg, sox, and sctk for data preparation and scoring (a quick availability check is sketched below).
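To confirm these tools are on your PATH before starting, a quick check such as the one below can be used (sctk's scoring binary is sclite; this snippet is a convenience, not part of the recipe):

```bash
# Fail fast if any required tool is missing from PATH
for tool in ffmpeg sox sclite; do
  command -v "$tool" >/dev/null 2>&1 || echo "Missing required tool: $tool"
done
```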
- Clone the Vosk API repository.
- Install Kaldi and ensure that KALDI_ROOT is set correctly in path.sh.
- Set environment variables using cmd.sh and path.sh (typical contents are sketched below).
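For reference, path.sh and cmd.sh in Kaldi recipes usually follow the pattern below; the KALDI_ROOT path and parallelization settings are illustrative and depend on your installation:

```bash
# path.sh (sketch): point Kaldi binaries and scripts at your installation
export KALDI_ROOT=`pwd`/../../..          # adjust to where Kaldi is installed
[ -f $KALDI_ROOT/tools/env.sh ] && . $KALDI_ROOT/tools/env.sh
export PATH=$PWD/utils/:$KALDI_ROOT/tools/openfst/bin:$PWD:$PATH
. $KALDI_ROOT/tools/config/common_path.sh
export LC_ALL=C

# cmd.sh (sketch): run.pl runs jobs locally; use queue.pl or slurm.pl on a cluster
export train_cmd="run.pl --mem 2G"
export decode_cmd="run.pl --mem 4G"
export mkgraph_cmd="run.pl --mem 8G"
```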
Run the data preparation stage in run.sh:
bash run.sh --stage 0 --stop_stage 0
This stage downloads and prepares the LibriSpeech dataset.
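The data directories created here follow the standard Kaldi layout (wav.scp, text, utt2spk, spk2utt). A minimal sketch, with illustrative utterance/speaker IDs and a data/train directory name:

```bash
# data/train/wav.scp : utterance-id -> audio file, or a command producing WAV on stdout
#   spk1-utt1 flac -c -d -s /path/to/spk1-utt1.flac |
# data/train/text    : utterance-id -> transcript
#   spk1-utt1 HELLO WORLD
# data/train/utt2spk : utterance-id -> speaker-id
#   spk1-utt1 spk1

# Sort/repair the directory and check it is consistent before feature extraction:
utils/fix_data_dir.sh data/train
utils/validate_data_dir.sh --no-feats data/train
```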
Prepare the pronunciation dictionary with:
bash run.sh --stage 1 --stop_stage 1
This step generates the necessary files for Kaldi's prepare_lang.sh script.
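The generated dictionary directory is then consumed by Kaldi's standard utils/prepare_lang.sh; a typical invocation (the directory names and the <UNK> symbol are illustrative) looks like:

```bash
# data/local/dict must contain lexicon.txt, nonsilence_phones.txt,
# silence_phones.txt and optional_silence.txt, e.g. a lexicon entry:
#   hello  HH AH0 L OW1

# Build the lang directory used by all later stages:
utils/prepare_lang.sh data/local/dict "<UNK>" data/local/lang_tmp data/lang
```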
Run the MFCC extraction process:
bash run.sh --stage 2 --stop_stage 2
This step extracts Mel-frequency cepstral coefficient (MFCC) features and computes Cepstral Mean and Variance Normalization (CMVN) statistics.
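Under the hood this stage calls the standard Kaldi feature scripts; a sketch of the calls and a minimal conf/mfcc.conf (job counts and directory names are illustrative):

```bash
# conf/mfcc.conf (example contents):
#   --use-energy=false
#   --sample-frequency=16000

# Extract MFCCs, then per-speaker CMVN statistics:
steps/make_mfcc.sh --cmd "$train_cmd" --nj 20 data/train exp/make_mfcc/train mfcc
steps/compute_cmvn_stats.sh data/train exp/make_mfcc/train mfcc
utils/fix_data_dir.sh data/train
```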
Train monophone, LDA+MLLT, and SAT models:
bash run.sh --stage 3 --stop_stage 3
This stage trains GMM-based models and aligns the data for TDNN training.
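This follows the usual Kaldi GMM bootstrap sequence; a condensed sketch (leaf/Gaussian counts, job numbers, and directory names are illustrative):

```bash
# Monophone system, then alignments for the next model
steps/train_mono.sh --nj 20 --cmd "$train_cmd" data/train data/lang exp/mono
steps/align_si.sh --nj 20 --cmd "$train_cmd" data/train data/lang exp/mono exp/mono_ali

# LDA+MLLT triphone system
steps/train_lda_mllt.sh --cmd "$train_cmd" 2500 15000 data/train data/lang exp/mono_ali exp/tri1
steps/align_si.sh --nj 20 --cmd "$train_cmd" data/train data/lang exp/tri1 exp/tri1_ali

# Speaker-adapted (SAT) system, plus fMLLR alignments for the chain recipe
steps/train_sat.sh --cmd "$train_cmd" 2500 15000 data/train data/lang exp/tri1_ali exp/tri2
steps/align_fmllr.sh --nj 20 --cmd "$train_cmd" data/train data/lang exp/tri2 exp/tri2_ali
```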
Train a Time-Delay Neural Network (TDNN) chain model:
bash run.sh --stage 4 --stop_stage 4
The chain model uses i-vectors for speaker adaptation.
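The heavy lifting is done by the two scripts under local/chain/: run_ivector_common.sh builds the i-vector extractor and extracts online i-vectors, and run_tdnn.sh trains the chain TDNN on top of them. The core Kaldi call for i-vector extraction looks roughly like the sketch below (directory names are illustrative):

```bash
# Extract online i-vectors for the hi-res training data with a trained extractor
steps/online/nnet2/extract_ivectors_online.sh --cmd "$train_cmd" --nj 20 \
  data/train_hires exp/nnet3/extractor exp/nnet3/ivectors_train

# Then build the chain topology, the tree, and the TDNN itself:
local/chain/run_tdnn.sh
```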
After training, decode the test data:
bash run.sh --stage 5 --stop_stage 5
This step decodes using the trained model and evaluates the Word Error Rate (WER).
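For chain models, decoding usually builds a graph with utils/mkgraph.sh and decodes with steps/nnet3/decode.sh; a sketch of the underlying calls (LM/graph and directory names are illustrative, and the _rescore results in RESULTS.txt typically come from rescoring the lattices with a larger language model):

```bash
# Build the decoding graph; chain models use --self-loop-scale 1.0
utils/mkgraph.sh --self-loop-scale 1.0 data/lang_test exp/chain/tdnn exp/chain/tdnn/graph

# Decode the test set with online i-vectors; scoring produces the wer_* files
steps/nnet3/decode.sh --acwt 1.0 --post-decode-acwt 10.0 --cmd "$decode_cmd" --nj 20 \
  --online-ivector-dir exp/nnet3/ivectors_test \
  exp/chain/tdnn/graph data/test_hires exp/chain/tdnn/decode_test
```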
WER can be evaluated by running:
bash RESULTS
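The RESULTS script is typically a thin wrapper that greps the scoring output and keeps the best WER per decode directory; a sketch of the usual pattern:

```bash
# Print the best WER found in each decode directory under exp/
for d in exp/*/decode* exp/*/*/decode*; do
  [ -d "$d" ] && grep WER "$d"/wer_* 2>/dev/null | utils/best_wer.sh
done
```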
Example of RESULTS.txt:
%WER 14.10 [ 2839 / 20138, 214 ins, 487 del, 2138 sub ] exp/chain/tdnn/decode_test/wer_11_0.0
%WER 12.67 [ 2552 / 20138, 215 ins, 406 del, 1931 sub ] exp/chain/tdnn/decode_test_rescore/wer_11_0.0