image-and-video-synthesis-and-generation.md


CVPR-2024-Papers


Image and Video Synthesis and Generation


| Title | Repo | Paper | Video |
|---|---|---|---|
| Alchemist: Parametric Control of Material Properties with Diffusion Models | WEB Page | thecvf, arXiv | |
| Analyzing and Improving the Training Dynamics of Diffusion Models | GitHub | thecvf, arXiv | |
| Attention Calibration for Disentangled Text-to-Image Personalization | GitHub | thecvf, arXiv | |
| FreeU: Free Lunch in Diffusion U-Net | WEB Page, GitHub | thecvf, arXiv | YouTube |
| Generative Image Dynamics | GitHub Page, GitHub | thecvf, arXiv | |
| Instruct-Imagen: Image Generation with Multi-Modal Instruction | GitHub Page | thecvf, arXiv | |
| NoiseCLR: A Contrastive Learning Approach for Unsupervised Discovery of Interpretable Directions in Diffusion Models | GitHub Page | thecvf, arXiv | YouTube |
| Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Style Aligned Image Generation via Shared Attention | GitHub Page, GitHub | thecvf, arXiv | |
| Visual Anagrams: Generating Multi-View Optical Illusions with Diffusion Models | GitHub Page, GitHub | thecvf, arXiv | |
| Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models | WEB Page | thecvf, arXiv | |
| Amodal Completion via Progressive Mixed Context Diffusion | GitHub Page, GitHub | thecvf, arXiv | |
| CLiC: Concept Learning in Context | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Clockwork Diffusion: Efficient Generation with Model-Step Distillation | GitHub | thecvf, arXiv | YouTube |
| Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis | GitHub | thecvf, arXiv | YouTube |
| CoDeF: Content Deformation Fields for Temporally Consistent Video Processing | GitHub Page, GitHub | thecvf, arXiv | |
| Correcting Diffusion Generation through Resampling | GitHub | thecvf, arXiv | |
| CosmicMan: A Text-to-Image Foundation Model for Humans | GitHub Page, GitHub, Hugging Face | thecvf, arXiv | YouTube |
| DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations | GitHub Page, GitHub | thecvf, arXiv | |
| Diffusion Handles: Enabling 3D Edits for Diffusion Models by Lifting Activations to 3D | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models | WEB Page, GitHub | thecvf, arXiv | YouTube |
| Don't Drop Your Samples! Coherence-Aware Training Benefits Conditional Diffusion | GitHub Page, GitHub | thecvf | YouTube |
| DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Dynamic Policy-Driven Adaptive Multi-Instance Learning for Whole Slide Image Classification | WEB Page, GitHub | thecvf, arXiv | YouTube |
| Fast ODE-based Sampling for Diffusion Models in Around 5 Steps | GitHub | thecvf, arXiv | |
| FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Generative Powers of Ten | GitHub Page, GitHub | thecvf, arXiv | |
| HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Image Neural Field Diffusion Models | GitHub Page | thecvf, arXiv | |
| Learning Adaptive Spatial Coherent Correlations for Speech-Preserving Facial Expression Manipulation | GitHub | thecvf | |
| LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching | GitHub | thecvf, arXiv | YouTube |
| MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation | GitHub Page | thecvf, arXiv | YouTube |
| MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis | GitHub Page, GitHub | thecvf, arXiv | |
| One-Dimensional Adapter to Rule them All: Concepts, Diffusion Models and Erasing Applications | GitHub Page, GitHub | thecvf, arXiv | |
| Orthogonal Adaptation for Modular Customization of Diffusion Models | WEB Page | thecvf, arXiv | YouTube |
| PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Emu Edit: Precise Image Editing via Recognition and Generation Tasks | WEB Page | thecvf, arXiv | YouTube |
| Predicated Diffusion: Predicate Logic-based Attention Guidance for Text-to-Image Diffusion Models | | thecvf, arXiv | |
| RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models | GitHub Page, GitHub, Hugging Face | thecvf, arXiv | YouTube |
| Readout Guidance: Learning Control from Diffusion Features | GitHub Page, GitHub | thecvf, arXiv | |
| Real-Time 3D-Aware Portrait Video Relighting | WEB Page | thecvf | |
| Residual Learning in Diffusion Models | | thecvf | |
| Rethinking FID: Towards a Better Evaluation Metric for Image Generation | GitHub Page | thecvf, arXiv | |
| Sat2Scene: 3D Urban Scene Generation from Satellite Images with Diffusion | | thecvf, arXiv | YouTube |
| SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models | GitHub Page, GitHub | thecvf, arXiv | |
| Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis | GitHub Page | thecvf, arXiv | YouTube |
| Style Injection in Diffusion: A Training-Free Approach for Adapting Large-Scale Diffusion Models for Style Transfer | GitHub Page, GitHub | thecvf, arXiv | |
| Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Taming Stable Diffusion for Text to 360 Panorama Image Generation | GitHub Page, GitHub | thecvf, arXiv | |
| TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models | GitHub Page, GitHub | thecvf, arXiv | |
| Total Selfie: Generating Full-Body Selfies | WEB Page, GitHub | thecvf, arXiv | YouTube |
| UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs | | thecvf, arXiv | |
| VecFusion: Vector Font Generation with Diffusion | GitHub Page | thecvf, arXiv | YouTube |
| ViVid-1-to-3: Novel View Synthesis with Video Diffusion Models | GitHub Page, GitHub | thecvf | |
| 3D Geometry-Aware Deformable Gaussian Splatting for Dynamic View Synthesis | GitHub Page | thecvf, arXiv | YouTube |
| 3D Multi-Frame Fusion for Video Stabilization | | thecvf, arXiv | YouTube |
| 4D-fy: Text-to-4D Generation using Hybrid Score Distillation Sampling | GitHub Page, GitHub | thecvf | |
| 360DVD: Controllable Panorama Video Generation with 360-Degree Video Diffusion Model | GitHub Page, GitHub | thecvf, arXiv | |
| RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization | GitHub Page | thecvf, arXiv | |
| Ƶ*: Zero-Shot Style Transfer via Attention Reweighting | GitHub | thecvf | |
| A Recipe for Scaling up Text-to-Video Generation with Text-Free Videos | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| A Unified Approach for Text- and Image-guided 4D Scene Generation | WEB Page, GitHub | thecvf, arXiv | |
| A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Accelerating Diffusion Sampling with Optimized Time Steps | GitHub | thecvf, arXiv | |
| ACT-Diffusion: Efficient Adversarial Consistency Training for One-Step Diffusion Models | GitHub | thecvf, arXiv | |
| Adversarial Score Distillation: When Score Distillation Meets GAN | GitHub Page, GitHub | thecvf, arXiv | |
| Adversarial Text to Continuous Image Generation | GitHub Page | thecvf | |
| AEROBLADE: Training-Free Detection of Latent Diffusion Images using Autoencoder Reconstruction Error | GitHub | thecvf, arXiv | YouTube |
| Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Animating General Image with Large Visual Motion Model | GitHub Page, GitHub | thecvf | |
| Anomaly Score: Evaluating Generative Models and Individual Generated Images based on Complexity and Vulnerability | | thecvf, arXiv | |
| AnyDoor: Zero-Shot Object-Level Image Customization | GitHub Page, GitHub, Hugging Face | thecvf, arXiv | YouTube |
| AnyScene: Customized Image Synthesis with Composited Foreground | | thecvf | YouTube |
| Arbitrary-Scale Image Generation and Upsampling using Latent Diffusion Model and Implicit Neural Decoder | | thecvf, arXiv | |
| ArtAdapter: Text-to-Image Style Transfer using Multi-Level Style Encoder and Explicit Adaptation | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| AVID: Any-Length Video Inpainting with Diffusion Model | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Balancing Act: Distribution-Guided Debiasing in Diffusion Models | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| BerfScene: Bev-Conditioned Equivariant Radiance Fields for Infinite 3D Scene Generation | GitHub Page, GitHub, Hugging Face | thecvf, arXiv | |
| Beyond First-Order Tweedie: Solving Inverse Problems using Latent Diffusion | | thecvf, arXiv | |
| Beyond Textual Constraints: Learning Novel Diffusion Conditions with Fewer Examples | GitHub | thecvf | YouTube |
| BIVDiff: A Training-Free Framework for General-Purpose Video Synthesis via Bridging Image and Video Diffusion Models | GitHub Page, GitHub | thecvf, arXiv | |
| Boosting Diffusion Models with Moving Average Sampling in Frequency Domain | | thecvf, arXiv | YouTube |
| C3: High-Performance and Low-Complexity Neural Compression from a Single Image or Video | GitHub Page, GitHub | thecvf, arXiv | |
| Cache Me if You Can: Accelerating Diffusion Models through Block Caching | GitHub Page | thecvf, arXiv | YouTube |
| CAMEL: CAusal Motion Enhancement Tailored for Lifting Text-Driven Video Editing | GitHub | thecvf | YouTube |
| CapHuman: Capture Your Moments in Parallel Universes | GitHub Page, GitHub | thecvf, arXiv | |
| Carve3D: Improving Multi-View Reconstruction Consistency for Diffusion Models with RL Finetuning | GitHub Page, GitHub | thecvf, arXiv | |
| CAT-DM: Controllable Accelerated Virtual Try-On with Diffusion Model | GitHub Page, GitHub | thecvf | YouTube |
| CCEdit: Creative and Controllable Video Editing via Diffusion Models | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| CDFormer: When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution | GitHub | thecvf, arXiv | |
| CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization | GitHub | thecvf, arXiv | YouTube |
| Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation | GitHub Page | thecvf, arXiv | |
| Cinematic Behavior Transfer via NeRF-based Differentiable Filming | GitHub Page, GitHub | thecvf, arXiv | |
| Codebook Transfer with Part-of-Speech for Vector-Quantized Image Modeling | | thecvf, arXiv | YouTube |
| CoDi: Conditional Diffusion Distillation for Higher-Fidelity and Faster Image Generation | GitHub Page, GitHub | thecvf, arXiv | |
| Combining Frame and GOP Embeddings for Neural Video Representation | | thecvf | |
| CommonCanvas: Open Diffusion Models Trained on Creative-Commons Images | GitHub Page, Hugging Face | thecvf | |
| Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models | | thecvf, arXiv | YouTube |
| Condition-Aware Neural Network for Controlled Image Generation | GitHub | thecvf, arXiv | YouTube |
| CONFORM: Contrast is All You Need for High-Fidelity Text-to-Image Diffusion Models | GitHub Page, GitHub | thecvf, arXiv | |
| ConsistNet: Enforcing 3D Consistency for Multi-View Images Diffusion | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Content-Style Decoupling for Unsupervised Makeup Transfer without Generating Pseudo Ground Truth | GitHub | thecvf, arXiv | YouTube |
| Contrastive Denoising Score for Text-Guided Latent Diffusion Image Editing | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| ControlRoom3D: Room Generation using Semantic Proxy Rooms | GitHub Page | thecvf, arXiv | YouTube |
| Cross Initialization for Face Personalization of Text-to-Image Models | GitHub | thecvf, arXiv | |
| Customization Assistant for Text-to-Image Generation | | thecvf, arXiv | |
| Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement | GitHub | thecvf, arXiv | YouTube |
| Deformable One-Shot Face Stylization via DINO Semantic Guidance | WEB Page, GitHub | thecvf, arXiv | YouTube |
| DemoCaricature: Democratising Caricature Generation with a Rough Sketch | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| DemoFusion: Democratising High-Resolution Image Generation with No $$$ | GitHub Page, GitHub, Hugging Face | thecvf, arXiv | YouTube |
| DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception | | thecvf, arXiv | |
| DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model | GitHub | thecvf | |
| DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing | GitHub Page, GitHub | thecvf, arXiv | |
| DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing | GitHub Page, GitHub, Hugging Face | thecvf, arXiv | YouTube |
| DiffPerformer: Iterative Learning of Consistent Latent Guidance for Diffusion-based Human Video Generation | | thecvf | |
| DiffSHEG: A Diffusion-based Approach for Real-Time Speech-Driven Holistic 3D Expression and Gesture Generation | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Diffusion Model Alignment using Direct Preference Optimization | GitHub | thecvf, arXiv | |
| Diffusion Models without Attention | | thecvf, arXiv | |
| Direct2.5: Diverse Text-to-3D Generation via Multi-View 2.5D Diffusion | GitHub Page, GitHub | thecvf, arXiv | |
| DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data | GitHub Page, GitHub | thecvf, arXiv | |
| DisCo: Disentangled Control for Realistic Human Dance Generation | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Discriminative Probing and Tuning for Text-to-Image Generation | GitHub Page, GitHub | thecvf, arXiv | |
| Distilling ODE Solvers of Diffusion Models into Smaller Steps | GitHub | thecvf, arXiv | |
| Diversity-Aware Channel Pruning for StyleGAN Compression | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting | GitHub | thecvf, arXiv | YouTube |
| Doubly Abductive Counterfactual Inference for Text-based Image Editing | GitHub | thecvf, arXiv | YouTube |
| Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation | GitHub | thecvf, arXiv | YouTube |
| DREAM: Diffusion Rectification and Estimation-Adaptive Models | WEB Page, GitHub | thecvf, arXiv | |
| DreamComposer: Controllable 3D Object Generation via Multi-View Conditions | GitHub Page, GitHub | thecvf, arXiv | |
| DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation | | thecvf, arXiv | |
| DreamVideo: Composing Your Dream Videos with Customized Subject and Motion | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| DyBluRF: Dynamic Neural Radiance Fields from Blurry Monocular Video | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing | GitHub Page | thecvf, arXiv | YouTube |
| Dysen-VDM: Empowering Dynamics-Aware Text-to-Video Diffusion with LLMs | GitHub | thecvf | YouTube |
| EasyDrag: Efficient Point-based Manipulation on Diffusion Models | GitHub | thecvf | |
| ECLIPSE: A Resource-Efficient Text-to-Image Prior for Image Generations | WEB Page, GitHub, Hugging Face | thecvf, arXiv | |
| Edit One for All: Interactive Batch Image Editing | GitHub Page, GitHub | thecvf, arXiv | |
| ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation | GitHub Page, GitHub | thecvf, arXiv | |
| EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models | WEB Page, GitHub | thecvf, arXiv | |
| EMOPortraits: Emotion-enhanced Multimodal One-Shot Head Avatars | GitHub Page, GitHub | thecvf, arXiv | |
| Exact Fusion via Feature Distribution Matching for Few-Shot Image Generation | GitHub | thecvf | YouTube |
| Exploiting Diffusion Prior for Generalizable Dense Prediction | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Face2Diffusion for Fast and Editable Face Personalization | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-Shot Subject-Driven Generation | GitHub | thecvf, arXiv | |
| Faces that Speak: Jointly Synthesising Talking Face and Speech from Text | WEB Page | thecvf, arXiv | YouTube |
| Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis | GitHub Page | thecvf, arXiv | |
| Fixed Point Diffusion Models | GitHub Page, GitHub | thecvf, arXiv | |
| Focus on Your Instruction: Fine-grained and Multi-Instruction Image Editing by Attention Modulation | GitHub | thecvf, arXiv | YouTube |
| FreeControl: Training-Free Spatial Control of any Text-to-Image Diffusion Model with any Condition | GitHub Page, GitHub | thecvf, arXiv | |
| FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition | GitHub Page, GitHub | thecvf, arXiv | |
| FreeDrag: Feature Dragging for Reliable Point-based Image Editing | WEB Page, GitHub | thecvf, arXiv | YouTube |
| FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation | WEB Page, GitHub, Hugging Face | thecvf, arXiv | YouTube |
| FSRT: Facial Scene Representation Transformer for Face Reenactment from Factorized Appearance, Head-Pose, and Facial Expression Features | GitHub Page, GitHub | thecvf, arXiv | YouTube |
| Gaussian Shell Maps for Efficient 3D Human Generation | GitHub | thecvf, arXiv | YouTube |
| GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models | WEB Page, GitHub | thecvf, arXiv | |
| GeneAvatar: Generic Expression-Aware Volumetric Head Avatar Editing from a Single Image | GitHub Page, GitHub | thecvf, arXiv | YouTube |