WiLoR: End-to-end 3D hand localization and reconstruction in-the-wild
StoryMaker: Towards consistent characters in text-to-image generation
SCEPTER is an open-source framework used for training, fine-tuning, and inference with generative models.
Instant voice cloning by MIT and MyShell.
[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation
[IJCAI'24] Beyond Alignment: Blind Video Face Restoration via Parsing-Guided Temporal-Coherent Transformer
[ICLR 2024] Official pytorch implementation of "ControlVideo: Training-free Controllable Text-to-Video Generation"
This is the official repository for TalkSHOW: Generating Holistic 3D Human Motion from Speech [CVPR2023].
AutoShot: A Short Video Dataset and State-of-the-Art Shot Boundary Detection - CVPR NAS 2023
TransNet V2: Shot Boundary Detection Neural Network
A simple GUI to show shot boundary detection based on TransNet V2.
[BMVC 2023] 3D Structure-guided Network for Tooth Alignment in 2D Photograph
Real-time face swap and one-click video deepfake from only a single image
Controllable video and image generation: SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Official implementation of HumanVid, NeurIPS D&B Track 2024
Official implementation of the SIGGRAPH 2024 paper "N-Dimensional Gaussians for Fitting of High Dimensional Functions"
CatVTON is a simple and efficient virtual try-on diffusion model with 1) a lightweight network (899.06M parameters in total), 2) parameter-efficient training (49.57M trainable parameters), and 3) Simpl…
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
PyTorch implementation of few-shot photorealistic video-to-video translation.
Artstation-Artistic-face-HQ Dataset (AAHQ)
Official implementation of CVPR 2024 paper: "FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition"
Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning
V-Express aims to generate a talking-head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.
Official implementation of MotionClone: Training-Free Motion Cloning for Controllable Video Generation
📺 An End-to-End Solution for High-Resolution and Long Video Generation Based on Transformer Diffusion
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
A working list of recent human video generation methods. This repository focuses on half-/full-body human video generation; NeRF, Gaussian splatting, motion/pose, and talking-head/portrait methods are n…