Stars
Chinese and English multimodal conversational language model
🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps
An open-source framework for training large multimodal models.
UT-Sarulab MOS prediction system using SSL models
Python parser and tools for MUSDB18 Music Separation Dataset
Writing AI Conference Papers: A Handbook for Beginners
A family of diffusion models for text-to-audio generation.
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction (LLM-TSE)
Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".
Versatile audio super resolution (any sampling rate -> 48 kHz) with AudioSR.
PyTorch implementation of Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities.
Code for GeSS: Benchmarking Geometric Deep Learning under Scientific Applications with Distribution Shifts
Audio Codec Speech processing Universal PERformance Benchmark
AcademiCodec: An Open Source Audio Codec Model for Academic Research
Metadata, scripts and baselines for the MTG-Jamendo dataset
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
Official implementation of "Separate Anything You Describe"
Official Implementation of EnCLAP (ICASSP 2024)
The official code repo for "Zero-shot Audio Source Separation through Query-based Learning from Weakly-labeled Data", in AAAI 2022
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
Source code for the paper "Audio Captioning Transformer"
AudioLDM training, finetuning, evaluation and inference.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
Official repo for WavCraft, an AI agent for audio creation and editing