Stars
Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch
The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.
Code for the paper: GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
LAVIS - A One-stop Library for Language-Vision Intelligence
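Llama3 and Llama3.1 Chinese repository (companion book in progress... interesting weights from fine-tuned and heavily modified versions by community members and vendors & tutorial videos & docs for training, inference, evaluation, and deployment)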
Official Implementation of EnCLAP (ICASSP 2024)
A family of diffusion models for text-to-audio generation.
Official code and models for the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi…