HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance

Zhu, Junzhe; Zhuang, Peiye; Koyejo, Sanmi

arXiv:2305.18766 (cs)

[Submitted on 30 May 2023 (v1), last revised 11 Mar 2024 (this version, v4)]

Abstract: The advancements in automatic text-to-3D generation have been remarkable. Most existing methods use pre-trained text-to-image diffusion models to optimize 3D representations like Neural Radiance Fields (NeRFs) via latent-space denoising score matching. Yet, these methods often result in artifacts and inconsistencies across different views due to their suboptimal optimization approaches and limited understanding of 3D geometry. Moreover, the inherent constraints of NeRFs in rendering crisp geometry and stable textures usually lead to a two-stage optimization to attain high-resolution details. This work proposes holistic sampling and smoothing approaches to achieve high-quality text-to-3D generation, all in a single-stage optimization. We compute denoising scores in the text-to-image diffusion model's latent and image spaces. Instead of randomly sampling timesteps (also referred to as noise levels in denoising score matching), we introduce a novel timestep annealing approach that progressively reduces the sampled timestep throughout optimization. To generate high-quality renderings in a single-stage optimization, we propose regularization for the variance of z-coordinates along NeRF rays. To address texture flickering issues in NeRFs, we introduce a kernel smoothing technique that refines importance sampling weights coarse-to-fine, ensuring accurate and thorough sampling in high-density regions. Extensive experiments demonstrate the superiority of our method over previous approaches, enabling the generation of highly detailed and view-consistent 3D assets through a single-stage training process.
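The three mechanisms named in the abstract (timestep annealing, z-variance regularization, and kernel smoothing of importance sampling weights) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the square-root decay, the bounds `t_max=980` and `t_min=20`, and the box kernel with edge replication are all assumptions made for illustration; the abstract states only that the sampled timestep is progressively reduced, that the variance of z-coordinates along rays is regularized, and that sampling weights are smoothed.

```python
import math


def annealed_timestep(step, total_steps, t_max=980, t_min=20):
    """Return a diffusion timestep that shrinks as optimization proceeds.

    Assumption: square-root decay from t_max to t_min (hypothetical values).
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return int(round(t_max - (t_max - t_min) * math.sqrt(progress)))


def z_variance(z_vals, weights, eps=1e-8):
    """Weighted variance of sample depths along a single ray.

    z_vals:  depths of the samples along the ray
    weights: non-negative volume-rendering weights for those samples
    A small value means the density is concentrated near one depth.
    """
    total = sum(weights) + eps
    probs = [w / total for w in weights]  # normalize weights to a distribution
    mean_z = sum(p * z for p, z in zip(probs, z_vals))
    return sum(p * (z - mean_z) ** 2 for p, z in zip(probs, z_vals))


def smooth_weights(weights, kernel=(1.0, 1.0, 1.0)):
    """Sliding-window smoothing of coarse weights before importance sampling.

    Assumption: odd-length kernel, normalized by its sum, edges replicated.
    """
    half = len(kernel) // 2
    norm = sum(kernel)
    padded = [weights[0]] * half + list(weights) + [weights[-1]] * half
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel)) / norm
        for i in range(len(weights))
    ]
```

In a training loop, a regularizer of this kind would typically be averaged over all rays in a batch and added to the guidance loss with a small coefficient, while the smoothed weights would replace the raw coarse weights when drawing fine samples. Both choices are assumptions here, not details taken from the abstract.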

Comments:	Project page: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2305.18766 [cs.CV]
	(or arXiv:2305.18766v4 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2305.18766

Submission history

From: Peiye Zhuang
[v1] Tue, 30 May 2023 05:56:58 UTC (6,291 KB)
[v2] Wed, 31 May 2023 07:35:49 UTC (6,303 KB)
[v3] Tue, 28 Nov 2023 05:09:02 UTC (10,783 KB)
[v4] Mon, 11 Mar 2024 06:14:31 UTC (10,783 KB)
