Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Updated Jul 11, 2024 - Python
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
On-device voice activity detection (VAD) powered by deep learning
A Python package to build AI-powered real-time audio applications
Code for ICASSP 2024 paper WhisperSeg: Positive Transfer of the Whisper Speech Transformer to Human and Animal Voice Activity Detection
CNN-based audio segmentation toolkit. Detects speech, music, noise, and speaker gender. Designed for large-scale gender equality studies based on speech time per gender.
Automatically synchronize and translate subtitles, or create new ones by transcribing, using pre-trained DNNs, Forced Alignments and Transformers. https://subaligner.readthedocs.io/
Tr-VAD: An Efficient Transformer based Voice Activity Detection Model
OVOS plugin for voice activity detection using Silero VAD
♂️♀️ Detect a person's gender from a voice file (90.7% +/- 1.3% accuracy).
A comprehensive AI companion leveraging advanced semantic analysis, sentiment detection, and voice processing to provide personalized and context-aware interactions using Autogen, semantic-router, and VoiceProcessingToolkit.
Python library for rVAD, a fast, unsupervised method for robust voice activity detection, as described in the paper "rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method."
Code used to produce the results of the ICASSP 2024 paper "Self-Supervised Pretraining for Robust Personalized Voice Activity Detection in Adverse Conditions"
Automagically synchronize subtitles with video.
Python AI assistant 🧠
OVOS plugin for voice activity detection using webrtcvad
Command-line utility to transcribe/translate from video/audio/subtitles to subtitles
Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering
The Voxseg implementation in PyTorch. Voxseg is a Python library for voice activity detection (VAD), used for speech/non-speech segmentation.
🎙️ Enhanced Speaker Diarisation 📒 with OSD, SS, and Advanced VAD🗣️.