Stars
PyTorch - FID calculation with proper image resizing and quantization steps [CVPR 2022]
Dual-Branch Network for Portrait Image Quality Assessment
PeRFlow: Piecewise Rectified Flow as Universal Plug-and-Play Accelerator (NeurIPS 2024)
A collection of various image grids created with Flux. Things like hair styles, clothing, nationalities, ages, etc.
A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation
Official code for Goldfish model for long video understanding and MiniGPT4-video for short video understanding
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising
Generate high-definition short videos with one click using large AI models (LLMs).
Video, image, and GIF upscaling (super-resolution) and video frame interpolation. Achieved with Waifu2x, Real-ESRGAN, Real-CUGAN, RTX Video Super Resolution VSR, SRMD, RealSR, Anime4K, RIFE, IF…
My ComfyUI workflows collection
A collection of awesome video generation studies.
Open-Sora: Democratizing Efficient Video Production for All
(CVPR 2023) CelebV-Text: A Large-Scale Facial Text-Video Dataset
Annotated Flow Matching paper
Infinite Photorealistic Worlds using Procedural Generation
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
[ECCV 2024, Oral] DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors
[NeurIPS'23] ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding
[CSUR] A Survey on Video Diffusion Models
[ICCV'23] Efficient Region-Aware Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis
Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
📖 A curated list of resources dedicated to talking face.
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
✨ Hotshot-XL: State-of-the-art AI text-to-GIF model trained to work alongside Stable Diffusion XL
The world's simplest facial recognition API for Python and the command line
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens