Stars
An Open Source text-to-speech system built by inverting Whisper.
Just 1 minute of voice data can be used to train a good TTS model! (few-shot voice cloning)
Metrics for evaluating music and audio generative models – with a focus on long-form, full-band, and stereo generations.
Generative models for conditional audio generation
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
PyTorch implementation of MAE https://arxiv.org/abs/2111.06377
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Music Audio Representation Benchmark for Universal Evaluation
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
This repository contains metadata for the WavCaps dataset and code for downstream tasks.
This repo is a pipeline for VITS fine-tuning, for fast speaker-adaptation TTS and many-to-many voice conversion
Fine-Tuning your VITS model using a pre-trained model
Differentiable audio signal processors in PyTorch
The fundamentals for Digital Audio Signal Processing. Formerly `sample`.
[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
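VirtualWife is a virtual digital human project that supports Bilibili live streaming and supports openai and ollama
AGI social network bot. BiliBili | live-stream chat digital human | automatic replies to video @mentions | direct-message bot | terminal chat | voice interaction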
Ikaros-521 / AI-Vtuber
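Forked from sandboxdream/AI-Vtuber
AI Vtuber is a virtual streamer [Live2D/UE/xuniren] driven by [ChatterBot/ChatGPT/claude/langchain/chatglm/text-gen-webui/Wenda/Qianwen/kimi/ollama] that can interact with viewers in real time during live streams on [Bilibili/Douyin/Kuaishou/WeChat Channels/Pinduoduo/Douyu/YouTube/twitch/TikTok], or chat directly on the local machine…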
InvokeAI is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. Th…
A command-line utility that allows you to interact with the Shutterstock public API.
[CVPR'2023 Highlight & TPAMI] Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval?
A curated list of deep learning resources for video-text retrieval.
An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
Object-oriented handling of audio data, with GPU-powered augmentations, and more.
DeepAFx-ST - Style transfer of audio effects with differentiable signal processing. Please see https://csteinmetz1.github.io/DeepAFx-ST/