GitHub - lovemefan/fsmn-vad: An enterprise-grade Voice Activity Detector from modelscope and funasr.

FSMN VAD

An enterprise-grade Voice Activity Detector from modelscope and funasr.

Key Features

Fast

A 70 s audio file is processed in under 0.6 s on a Mac M1 Pro. Under ONNX Runtime, the real-time factor (RTF) can reach 0.0077.

Lightweight

No model download is needed: the model is loaded directly from memory, and the ONNX model is only 1.6 MB. PyTorch, torchaudio, and similar dependencies are not required.

General

FSMN VAD was trained on Chinese corpora and has undergone anti-noise training, so it has some noise rejection ability and performs well on audio from different domains with varying background noise and quality levels.
- file vad
- streaming vad

Flexible sampling rate
- 16 kHz
- 8 kHz

Highly Accurate

Coming soon.

Highly Portable

FSMN VAD benefits from the rich ecosystem built around ONNX and runs wherever an ONNX runtime is available.
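
The real-time factor quoted under Fast is the ratio of processing time to audio duration, so lower is better. The following is a minimal sketch of how such a figure could be measured. The helper names and the `process` callable are hypothetical and stand in for whichever VAD call you want to time; they are not part of this package.

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent processing / duration of the audio."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(process, audio_seconds):
    """Time one call to `process` (a hypothetical zero-argument callable)
    and return the resulting RTF."""
    start = time.perf_counter()
    process()
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, audio_seconds)


# 0.6 s of processing for 70 s of audio gives an RTF of roughly 0.0086.
print(round(real_time_factor(0.6, 70.0), 4))
```

A single timed call is noisy; averaging several runs after a warm-up call usually gives a steadier number.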

Installation

git clone https://github.com/lovemefan/fsmn-vad
cd fsmn-vad
python setup.py install

Usage

# Offline VAD: process a whole audio file in one call
from pathlib import Path

from fsmnvad import FSMNVad

vad = FSMNVad()
segments = vad.segments_offline(Path("/path/audio/vad_example.wav"))
print(segments)
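
Once segments are available, a common next step is to cut the detected speech out of the waveform. The sketch below assumes each segment is a `[start_ms, end_ms]` pair in milliseconds; this README does not state the output format, so check what `segments_offline` actually returns before relying on it. The `slice_segments` helper is hypothetical and works on any sliceable sequence of samples.

```python
def slice_segments(samples, sample_rate, segments_ms):
    """Return one slice of `samples` per segment.

    ASSUMPTION: every segment is a [start_ms, end_ms] pair in milliseconds.
    """
    clips = []
    for start_ms, end_ms in segments_ms:
        # Convert milliseconds to sample indices and clamp to the audio length.
        start = max(0, int(start_ms * sample_rate / 1000))
        end = min(len(samples), int(end_ms * sample_rate / 1000))
        if end > start:
            clips.append(samples[start:end])
    return clips


# Toy data: one second of fake audio at 16 kHz and two made-up segments.
samples = list(range(16000))
clips = slice_segments(samples, 16000, [[0, 100], [500, 750]])
print([len(clip) for clip in clips])  # [1600, 4000]
```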

# Streaming VAD: feed the audio chunk by chunk
from fsmnvad import AudioReader, FSMNVadOnline

in_cache = []
speech, sample_rate = AudioReader.read_wav_file('/path/audio/vad_example.wav')
speech_length = speech.shape[0]

step = 1600  # samples per chunk (100 ms of audio at 16 kHz)
vad_online = FSMNVadOnline()

for sample_offset in range(0, speech_length, step):
    # The last chunk may be shorter than `step`; slicing stops at the end of the audio.
    is_final = sample_offset + step >= speech_length
    segments_result, in_cache = vad_online.segments_online(
        speech[sample_offset: sample_offset + step],
        in_cache=in_cache, is_final=is_final)
    if segments_result:
        print(segments_result)
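
The chunking in the streaming loop can be factored into a small generator, which makes the "is this the last chunk" logic easy to test on its own. This is only a sketch: `iter_chunks` is a hypothetical helper, not part of fsmnvad, and it works on any sequence that supports `len` and slicing.

```python
def iter_chunks(samples, step):
    """Yield (chunk, is_final) pairs that cover `samples` in order.

    The last chunk may be shorter than `step`; it is the only one
    flagged with is_final=True.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    total = len(samples)
    for offset in range(0, total, step):
        yield samples[offset:offset + step], offset + step >= total


# Ten samples in chunks of four: two full chunks, then a final short one.
chunks = list(iter_chunks(list(range(10)), 4))
print([(len(chunk), is_final) for chunk, is_final in chunks])
# [(4, False), (4, False), (2, True)]
```

In the streaming example, each `chunk` and `is_final` pair would be passed to `vad_online.segments_online` together with `in_cache`.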

Citation

@inproceedings{zhang2018deep,
  title={Deep-FSMN for large vocabulary continuous speech recognition},
  author={Zhang, Shiliang and Lei, Ming and Yan, Zhijie and Dai, Lirong},
  booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5869--5873},
  year={2018},
  organization={IEEE}
}

@misc{FunASR,
  author = {Speech Lab, Alibaba Group, China},
  title = {FunASR: A Fundamental End-to-End Speech Recognition Toolkit},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/alibaba-damo-academy/FunASR/}},
}
