GIST (Gwangju Institute of Science and Technology)
- Gwangju, Republic of Korea
- https://velog.io/@dongkeon
Stars
Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization (ACM MM 2024)
Scripts for data generation, scoring and data manifest preparation for CHiME-8 DASR task.
The official code repo of "HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection"
📄 Awesome CV is a LaTeX template for your outstanding job application
This is the Python library for an unsupervised, fast method for robust voice activity detection (rVAD), as in the paper rVAD: An Unsupervised Segment-Based Robust Voice Activity Detection Method.
[INTERSPEECH 2022] This dataset is designed for multi-modal speaker diarization and lip-speech synchronization in the wild.
Companion repo for the paper "PixIT: Joint Training of Speaker Diarization and Speech Separation from Real-world Multi-speaker Recordings" published at Odyssey 2024
Vundle, the plug-in manager for Vim
🙃 A delightful community-driven (with 2,300+ contributors) framework for managing your zsh configuration. Includes 300+ optional plugins (rails, git, macOS, hub, docker, homebrew, node, php, python…
The implementation of "End-to-End Neural Speaker Diarization with an Iterative Adaptive Attractor Estimation", which was accepted by Neural Networks.
Clustering-based methods for overlapping diarization
Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper
CHiME-7/8 diarization champion system: neural speaker diarization using memory-aware multi-speaker embedding with sequence-to-sequence architecture
Some comprehensive papers about speaker diarization
The Hugging Face Course on Transformers for Audio
Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
The official PyTorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors". [ICASSP 2024]
A collection of inspiring lists, manuals, cheatsheets, blogs, hacks, one-liners, cli/web tools and more.
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
This repository contains audio samples and supplementary materials accompanying publications by the "Speaker, Voice and Language" team at Google.