Stars
🤢 LipSick: Fast, High Quality, Low Resource Lipsync Tool 🤮
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Official repo for VGen: a holistic video generation ecosystem for video generation building on diffusion models
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Versatile audio super resolution (any -> 48kHz) with AudioSR.
An optimized pipeline for DINet reducing inference latency by up to 60% 🚀. Kudos to the authors of the original repo for this amazing work.
C0untFloyd / roop-unleashed
Forked from s0md3v/roop
Evolved Fork of roop with Web Server and lots of additions
Easy tool that splits given audio based on speaker.
(discontinued) AudioSlicer (Editor) for ai-voice-cloning by mrq
Implementation of Meta-Voicebox: The first generative AI model for speech to generalize across tasks with state-of-the-art performance.
Inpaint Anything extension performs stable diffusion inpainting on a browser UI using masks from Segment Anything.
Official Code for DragGAN (SIGGRAPH 2023)
Full GUI Version
Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies
[ECCV 2022] XMem: Long-Term Video Object Segmentation with an Atkinson-Shiffrin Memory Model
152334H / DL-Art-School
Forked from neonbjb/DL-Art-School
TorToiSe fine-tuning with DLAS
[CVPR 2023] MetaPortrait: Identity-Preserving Talking Head Generation with Fast Personalized Adaptation
Bringing Old Photos Back to Life (CVPR 2020 oral)
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For the HD commercial model, please try out Sync Labs
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer