georgepar/kaldi-grpc-server

Kaldi gRPC Server

This is a modern alternative for deploying Speech Recognition models developed using Kaldi.

Features:

  • Standardized API. We use a modified version of the Jarvis proto files, which mimic the Google Speech API. This allows for easy switching between Google Cloud speech recognizers and custom models developed with Kaldi
  • Fully pythonic implementation. We utilize pykaldi bindings to interface with Kaldi programmatically. This allows for a clean, customizable and extendable implementation
  • Fully bidirectional streaming using HTTP/2 (gRPC). Binary speech segments are streamed to the server and partial hypotheses are streamed back to the client
  • Transcribe arbitrarily long speech
  • DNN-HMM models supported out of the box
  • Supports RNNLM lattice rescoring
  • Clients for other languages can be easily generated using the proto files
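
To illustrate the streaming feature listed above, here is a minimal sketch of how a client could cut a WAV file into binary segments before sending them. The function name `iter_audio_chunks` and the 100 ms chunk size are assumptions made for this example and are not part of the repository. In Python gRPC, a streaming call accepts an iterator of request messages, so each chunk would still need to be wrapped in the request type defined by the proto files, which is not shown here.

```python
import wave
from typing import Iterator


def iter_audio_chunks(wav_path: str, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield raw PCM byte chunks of roughly chunk_ms milliseconds each.

    Hypothetical helper: the chunk size and the name are illustrative only.
    """
    with wave.open(wav_path, "rb") as wav:
        # Number of audio frames that make up one chunk (at least one frame).
        frames_per_chunk = max(1, wav.getframerate() * chunk_ms // 1000)
        while True:
            chunk = wav.readframes(frames_per_chunk)
            if not chunk:
                break  # end of file reached
            yield chunk
```

Because the function is a generator, the file is read lazily, so arbitrarily long recordings do not need to fit in memory.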

Getting started

Kaldi model structure

We recommend the following structure for the deployed model

model
├── conf
│   ├── ivector_extractor.conf
│   ├── mfcc.conf
│   ├── online_cmvn.conf
│   ├── online.conf
│   └── splice.conf
├── final.mdl
├── global_cmvn.stats
├── HCLG.fst
├── ivector_extractor
│   ├── final.dubm
│   ├── final.ie
│   ├── final.mat
│   ├── global_cmvn.stats
│   ├── online_cmvn.conf
│   ├── online_cmvn_iextractor
│   └── splice_opts
└── words.txt

The key files / directories are:

  • conf: Configuration files that are used to train the model
  • final.mdl: The acoustic model
  • HCLG.fst: The composed HCLG graph (output of mkgraph.sh)
  • global_cmvn.stats: Mean and variance statistics used for cepstral mean and variance normalization (CMVN)
  • words.txt: Vocabulary file, mapping words to integers
  • ivector_extractor: Model trained to extract ivector features (used for tdnn / chain models)
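
Before building a container it can help to check that a model directory follows the layout above. The following is a minimal sketch under stated assumptions: the helper `missing_model_entries` is hypothetical, and treating every key entry as required is an assumption of this example (for instance, a model without ivectors might legitimately lack `ivector_extractor`).

```python
from pathlib import Path
from typing import List

# Key entries from the recommended layout. Treating all of them as
# required is an assumption of this sketch, not a rule of the server.
EXPECTED_ENTRIES = [
    "conf",
    "final.mdl",
    "HCLG.fst",
    "global_cmvn.stats",
    "words.txt",
    "ivector_extractor",
]


def missing_model_entries(model_dir: str) -> List[str]:
    """Return the expected files / directories that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list means every key entry was found; otherwise the returned names show what is missing.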

Build Binary ASR recognizer (singularity)

We also provide the option to build what is, for all intents and purposes, a standalone binary on top of the Kaldi bindings using Singularity containers. In short, a Singularity container packages a fakeroot filesystem into a single executable file. For more info check the documentation.

Instructions:

  • Install singularity on your machine. Instructions here
  • Build the container
    make build-singularity kaldi_model=$MY_MODEL_DIR image_tag=myasr
    # Do not include special characters like : in the image_tag argument because this will be the path to the container file
    
  • Run the container with
    ./containers/myasr.sif --beam=11 --streaming --wav=$MYTEST.wav
    
  • For more options run
    ./containers/myasr.sif --help
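
Since the image_tag value ends up in the container file path, it may be worth validating it before calling make. This is a minimal sketch: the helper `is_safe_image_tag` is hypothetical, and the allowed character set (letters, digits, dot, underscore, hyphen) is an assumption chosen for this example rather than a rule enforced by the Makefile.

```python
import re

# Assumed safe set: starts with a letter or digit, then letters, digits,
# '.', '_' or '-'. Characters such as ':' or '/' are rejected.
_SAFE_TAG = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def is_safe_image_tag(tag: str) -> bool:
    """Return True if tag can be used as a plain file name component."""
    return _SAFE_TAG.fullmatch(tag) is not None
```

With this check, `myasr` is accepted, while a Docker-style tag such as `kaldigrpc:en-latest` is rejected because of the colon.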
    

Note: You can also run make build-flex-singularity to build a more flexible container that does not include or expect the model at build time and can therefore run any local model. You can then do something like:

./containers/asr.sif --model_dir=$MY_LOCAL_MODEL --wav=$MYTEST.wav
./containers/asr.sif --model_dir=$MY_OTHER_LOCAL_MODEL --wav=$MYTEST.wav

Dockerized server deployment

Once you create this model structure, you can use the provided Dockerfile to build the server container. Run:

make build-server kaldi_model=$MY_MODEL_DIR image_tag=$CONTAINER_TAG
# example: make build-server kaldi_model=/models/kaldi/english_model image_tag=kaldigrpc:en-latest

Then run the container:

# Run your container with a maximum of 3 simultaneous clients on port 1234
make run-server image_tag=kaldigrpc:en-latest max_workers=3 server_port=1234

Client usage

Install client library:

pip install kaldigrpc-client

Run the client from the command line, using the host and port your server listens on:

kaldigrpc-transcribe --streaming --host localhost --port 50051 mytest.wav

For more information refer to client/README.md
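
When transcribing from a script, the command-line call shown above can be assembled programmatically. The sketch below uses only the flags that appear in that example. The helper `build_transcribe_command` is hypothetical, its default host and port are copied from the example rather than from any documented defaults, and it assumes `--streaming` can be omitted.

```python
from typing import List


def build_transcribe_command(
    wav_path: str,
    host: str = "localhost",
    port: int = 50051,
    streaming: bool = True,
) -> List[str]:
    """Build the argument list for the kaldigrpc-transcribe command."""
    cmd = ["kaldigrpc-transcribe"]
    if streaming:
        # Assumption: the flag is optional and can simply be left out.
        cmd.append("--streaming")
    cmd += ["--host", host, "--port", str(port), wav_path]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing arguments as a list avoids shell quoting problems with file names that contain spaces.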

RNNLM Rescoring

TODO: Write documentation

Roadmap

  • Add support for mixed Kaldi and PyTorch acoustic / language models
  • Add full support for pause detection (interim results)
  • Add load balancer / benchmarks
  • Streamlined conversion scripts from exp folder to model tarball
  • Support all Speech API configuration options
