Stars
SLT 2024 Challenge: Post-ASR-Speaker-Tagging
Cross-Speaker Encoding Network for Multi-talker Speech Recognition
Code for Interspeech 2024 Paper: LipGER: Visually-Conditioned Generative Error Correction for Robust Automatic Speech Recognition
Repository for "LLM-based speaker diarization correction: A generalizable approach" paper
Single-blind supplementary materials for NeurIPS 2023 submission
MooER: Open-sourced LLM for audio understanding trained on 80,000 hours of data
Foal-Net: Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition
[ICLR 2023] Codebase for Copy-Generator model, including an implementation of kNN-LM
Textualized and Feature-based Models for Compound Multimodal Emotion Recognition in the Wild, for the 7th ABAW Challenge: Compound Expression (CE) Recognition Challenge
[Interspeech 2024] Whisper-Flamingo: Integrating Visual Features into Whisper for Audio-Visual Speech Recognition and Translation
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing, etc.
Generative Fusion Decoding (GFD) is a novel framework for integrating Large Language Models (LLMs) into multi-modal text recognition systems like ASR and OCR, improving performance and efficiency b…
PyTorch implementation for Tailor Versatile Multi-modal Learning for Multi-label Emotion Recognition
This repo contains a list of the 44,998 most common Japanese words in order of frequency, as determined by the University of Leeds Corpus.