This is a repository for a digest of 3D diffusion generation research papers. The taxonomy and paper selection closely follow the survey paper (State of the Art on Diffusion Models for Visual Computing) and the repo: https://github.com/cwchenwang/awesome-3d-diffusion?tab=readme-ov-file
- State of the Art on Diffusion Models for Visual Computing ⭐⭐⭐⭐⭐
  This survey provides an insightful introduction to diffusion models and their key applications across 2D, 3D, video, and 4D generation.
This series of papers focuses on modeling the distribution of 3D shapes directly, so that 3D content can eventually be generated end-to-end by a well-trained 3D diffusion model. Taxonomy: papers are organized by the type of output and the underlying 3D representation.
This series of papers aims to leverage 2D diffusion models (pre-trained or trained from scratch) to generate high-quality and diverse 3D content. In this mode, the 3D content is not generated directly by the diffusion model; instead, score information from the 2D diffusion model is used to optimize a 3D representation.
- DreamFusion: Text-to-3D using 2D Diffusion ⭐⭐⭐⭐⭐
  $\textit{ICLR 2023 Outstanding Paper Award}$
  The first paper to propose Score Distillation Sampling (SDS), which optimizes a 3D representation against a pre-trained 2D diffusion model. (NeRF)
- Magic3D: High-Resolution Text-to-3D Content Creation ⭐⭐⭐⭐
  $\textit{CVPR 2023}$
  Building on SDS, they propose a coarse-to-fine two-stage optimization to generate high-resolution 3D output efficiently. (NeRF & Mesh)
- Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation ⭐⭐⭐
  $\textit{ICCV 2023}$
  They disentangle the modeling of geometry and appearance, optimizing each separately for higher-quality text-to-3D generation. (DMTet & BRDF)
- ProlificDreamer: High-Fidelity and Diverse Text-to-3D Generation with Variational Score Distillation ⭐⭐⭐⭐⭐
  $\textit{NeurIPS 2023 Spotlight}$
  This paper proposes Variational Score Distillation (VSD) to enhance the quality and diversity of 3D content. (NeRF & Mesh)
- NFSD: Noise Free Score Distillation
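The SDS optimization shared by the papers above can be sketched in a few lines. Below is a minimal toy sketch, assuming a stand-in `eps_model` for the pre-trained 2D noise predictor and an identity `render` in place of a real differentiable renderer; all names and the noise schedule here are illustrative, not from any paper's codebase.

```python
import numpy as np

rng = np.random.default_rng(0)

def render(theta):
    # Toy "renderer": the 3D parameters ARE the image here. A real
    # pipeline would render a NeRF or mesh from a random camera.
    return theta

def eps_model(x_t, t):
    # Hypothetical stand-in for a pre-trained 2D noise predictor;
    # a real model would be a large text-conditioned U-Net.
    return 0.9 * x_t

def sds_step(theta, lr=0.1, t=0.5):
    """One SDS update: noise the rendering, query the 2D prior, and
    push theta along (predicted noise - injected noise). The U-Net
    Jacobian is omitted, as in DreamFusion. VSD (ProlificDreamer)
    replaces the injected noise term with the prediction of a
    fine-tuned, scene-conditioned score model."""
    x = render(theta)
    eps = rng.standard_normal(x.shape)              # injected noise
    alpha_bar = 1.0 - t                             # toy schedule
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    w = 1.0 - alpha_bar                             # weighting w(t)
    grad = w * (eps_model(x_t, t) - eps)            # SDS gradient
    return theta - lr * grad

theta = rng.standard_normal((8, 8))                 # tiny "scene"
for _ in range(10):
    theta = sds_step(theta)
```

In a real system `theta` would be NeRF or mesh parameters and the gradient would flow through the renderer; the sketch only shows the shape of the update rule.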
These works aim to train or fine-tune a diffusion model to generate multi-view images from a single image. Regarding the model and output, they roughly fall into three categories. First, the output is color images with 3D consistency, based on 2D diffusion models. Second, the output is color images plus geometry images (depth maps, normal maps, etc.), based on 2D diffusion models. Third, information from 2D and 3D diffusion models is combined.
- RealFusion: 360° Reconstruction of Any Object from a Single Image ⭐⭐⭐
  $\textit{CVPR 2023}$
  A reconstruction loss and SDS are combined to reconstruct an object from a single given image.
- Zero-1-to-3: Zero-shot One Image to 3D Object ⭐⭐⭐⭐⭐
  $\textit{ICCV 2023}$
  Zero-shot transfer, single-image input, and 3D content generation (the meanings of "zero", "one", and "three").
- Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors ⭐⭐⭐⭐
  $\textit{ICCV 2023}$
  The idea is similar to Magic3D in that it is composed of two stages; it goes from a single image input to 3D content (the meanings of "one", "two", and "three").
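Combining 2D and 3D diffusion priors, as Magic123 does, amounts to summing two score-distillation gradients with a trade-off weight. A minimal sketch under toy assumptions: `eps_2d` and `eps_3d` are hypothetical stand-ins for the 2D prior (e.g. Stable Diffusion) and the 3D-aware prior (e.g. Zero-1-to-3), and `lam_3d` is an illustrative trade-off hyperparameter.

```python
import numpy as np

rng = np.random.default_rng(1)

def eps_2d(x_t, t):
    # Stand-in for a 2D diffusion prior's noise predictor.
    return 0.8 * x_t

def eps_3d(x_t, t):
    # Stand-in for a 3D-aware (view-conditioned) noise predictor.
    return 0.7 * x_t

def joint_sds_grad(x, t=0.5, lam_3d=0.4):
    """Weighted sum of two SDS-style gradients: lam_3d trades the 2D
    prior's imagination against the 3D prior's view consistency."""
    eps = rng.standard_normal(x.shape)
    alpha_bar = 1.0 - t
    x_t = np.sqrt(alpha_bar) * x + np.sqrt(1.0 - alpha_bar) * eps
    g2d = eps_2d(x_t, t) - eps
    g3d = eps_3d(x_t, t) - eps
    return g2d + lam_3d * g3d

x = rng.standard_normal((8, 8))    # toy rendering of the scene
g = joint_sds_grad(x)
```

Raising the 3D weight favors geometric consistency with the input view; lowering it lets the 2D prior hallucinate richer texture detail.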
- Wonder3D: Single Image to 3D using Cross-Domain Diffusion ⭐⭐⭐⭐⭐
  Fine-tunes a pre-trained diffusion model to output consistent color images and normal maps, then synthesizes a 3D model from the cross-domain images (SDF, etc.)
- DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation ⭐⭐⭐⭐⭐