Westlake University, Hangzhou
Stars
A Python implementation of the COP-KMEANS algorithm
A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NeurIPS 2024]
Some comprehensive papers about speaker diarization
Tools for handling speech data in machine learning projects.
Python interface to the WebRTC Voice Activity Detector
The official PyTorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors". [ICASSP 2024]
[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
A PyTorch implementation of the paper "ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding"
Multimodal speaker diarization using pre-trained audio-visual synchronization model
Both audio-only and audio-visual speaker diarization datasets are listed here.
Out of time: automated lip sync in the wild
The project is associated with the recently-launched ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) to provide participants with baseline systems for speech recogniti…
kaldi-asr/kaldi is the official location of the Kaldi project.
This repository contains code to run (i.e., train, perform inference with, evaluate) a diarization method called EEND-vector-clustering.
A curated list of awesome Speaker Diarization papers, libraries, datasets, and other resources.
A PyTorch implementation of End-to-End Neural Diarization
Exploring Unsupervised Cell Recognition with Prior Self-activation Maps (MICCAI 2023)
A library built for easier audio self-supervised training and downstream task evaluation
PyTorch implementation of "FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."
The official repo: "McNet: Fuse Multiple Cues for Multichannel Speech Enhancement", ICASSP 2023
This repo includes the official implementations of "Fine-tune the pretrained ATST model for sound event detection".
Official PyTorch implementation of "RVAE-EM: Generative speech dereverberation based on recurrent variational auto-encoder and convolutive transfer function" [ICASSP 2024]
This repository is the official implementation of "Unimodal Aggregation for CTC-based Speech Recognition".