Update README.md
Add "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics"
Add "Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections"
Add "L4GM: Large 4D Gaussian Reconstruction Model"
Add "PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting"
Add "Unified Gaussian Primitives for Scene Representation and Rendering"
Add "GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion"
Add "GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors"
riotofbug authored and gqk committed Jun 18, 2024
1 parent 57ce041 commit ec1c8f5
Showing 9 changed files with 135 additions and 42 deletions.
16 changes: 16 additions & 0 deletions Changelog.md
@@ -1,5 +1,21 @@
# Changelog

### 2024/06/18

Add "Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics"

Add "Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections"

Add "L4GM: Large 4D Gaussian Reconstruction Model"

Add "PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting"

Add "Unified Gaussian Primitives for Scene Representation and Rendering"

Add "GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion"

Add "GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors"

### 2024/06/14

Add "Modeling Ambient Scene Dynamics for Free-view Synthesis"
126 changes: 84 additions & 42 deletions README.md

Large diffs are not rendered by default.

5 changes: 5 additions & 0 deletions abs/2406.09733.md
@@ -0,0 +1,5 @@
### Unified Gaussian Primitives for Scene Representation and Rendering

Searching for a unified scene representation remains a research challenge in computer graphics. Traditional mesh-based representations are unsuitable for dense, fuzzy elements, and introduce additional complexity for filtering and differentiable rendering. Conversely, voxel-based representations struggle to model hard surfaces and suffer from intensive memory requirements. We propose a general-purpose rendering primitive based on the 3D Gaussian distribution for unified scene representation, featuring versatile appearance ranging from glossy surfaces to fuzzy elements, as well as physically based scattering to enable accurate global illumination. We formulate the rendering theory for the primitive based on non-exponential transport and derive efficient rendering operations to be compatible with Monte Carlo path tracing. The new representation can be converted from different sources, including meshes and 3D Gaussian splatting, and further refined via transmittance optimization thanks to its differentiability. We demonstrate the versatility of our representation in various rendering applications such as global illumination and appearance editing, while supporting arbitrary lighting conditions by nature. Additionally, we compare our representation to existing volumetric representations, highlighting its efficiency in reproducing detail.

Finding a unified scene representation remains a research challenge in computer graphics. Traditional mesh-based representations are unsuitable for dense, fuzzy elements and add extra complexity for filtering and differentiable rendering. Conversely, voxel-based representations struggle to model hard surfaces and suffer from heavy memory requirements. We propose a general-purpose rendering primitive based on the 3D Gaussian distribution for unified scene representation, offering versatile appearance from glossy surfaces to fuzzy elements as well as physically based scattering for accurate global illumination. We formulate the rendering theory for this primitive based on non-exponential transport and derive efficient rendering operations compatible with Monte Carlo path tracing. The new representation can be converted from different sources, including meshes and 3D Gaussian splatting, and, thanks to its differentiability, can be further refined through transmittance optimization. We demonstrate the versatility of our representation in various rendering applications such as global illumination and appearance editing, while naturally supporting arbitrary lighting conditions. In addition, we compare our representation with existing volumetric representations, highlighting its efficiency in reproducing detail.
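
As a rough, illustrative aside (not taken from the paper): one reason 3D Gaussians are convenient path-tracing primitives is that the density of a single Gaussian integrates along a ray in closed form. The sketch below uses only this standard identity and does not reproduce the paper's non-exponential transmittance model; σ, μ, Σ denote the primitive's magnitude, mean, and covariance, and o, d the ray origin and unit direction.

```latex
% Illustrative only: closed-form ray integral of one Gaussian primitive.
\[
\rho(\mathbf{x}) = \sigma \exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\Big),
\qquad \mathbf{x}(t) = \mathbf{o} + t\,\mathbf{d}.
\]
% With Delta = o - mu, a = d^T Sigma^{-1} d, and b = d^T Sigma^{-1} Delta, the
% exponent is quadratic in t, so the optical depth along the ray integrates to
\[
\tau = \int_{-\infty}^{\infty} \rho(\mathbf{o} + t\,\mathbf{d})\,\mathrm{d}t
     = \sigma \sqrt{\frac{2\pi}{a}}\,
       \exp\!\Big(-\tfrac{1}{2}\big(\boldsymbol{\Delta}^{\top}\Sigma^{-1}\boldsymbol{\Delta} - \tfrac{b^{2}}{a}\big)\Big).
\]
```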
5 changes: 5 additions & 0 deletions abs/2406.09850.md
@@ -0,0 +1,5 @@
### GradeADreamer: Enhanced Text-to-3D Generation Using Gaussian Splatting and Multi-View Diffusion

Text-to-3D generation has shown promising results, yet common challenges remain, such as the Multi-face Janus problem and long generation times for high-quality assets. In this paper, we address these issues by introducing a novel three-stage training pipeline called GradeADreamer. This pipeline is capable of producing high-quality assets with a total generation time of under 30 minutes using only a single RTX 3090 GPU. Our proposed method employs a Multi-view Diffusion Model, MVDream, to generate Gaussian Splats as a prior, followed by refining geometry and texture using StableDiffusion. Experimental results demonstrate that our approach significantly mitigates the Multi-face Janus problem and achieves the highest average user preference ranking compared to previous state-of-the-art methods.

Text-to-3D generation has shown promising results, but common challenges remain, such as the Multi-face Janus problem and long generation times for high-quality assets. In this paper, we address these issues by introducing a novel three-stage training pipeline called GradeADreamer. This pipeline can produce high-quality assets in under 30 minutes using only a single RTX 3090 GPU. Our proposed method employs a multi-view diffusion model (MVDream) to generate Gaussian splats as a prior, and then refines geometry and texture using StableDiffusion. Experimental results show that our approach significantly mitigates the Multi-face Janus problem and achieves the highest average user preference ranking compared with previous state-of-the-art methods.
5 changes: 5 additions & 0 deletions abs/2406.10111.md
@@ -0,0 +1,5 @@
### GaussianSR: 3D Gaussian Super-Resolution with 2D Diffusion Priors

Achieving high-resolution novel view synthesis (HRNVS) from low-resolution input views is a challenging task due to the lack of high-resolution data. Previous methods optimize a high-resolution Neural Radiance Field (NeRF) from low-resolution input views but suffer from slow rendering speed. In this work, we base our method on 3D Gaussian Splatting (3DGS) due to its capability of producing high-quality images at a faster rendering speed. To alleviate the shortage of data for higher-resolution synthesis, we propose to leverage off-the-shelf 2D diffusion priors by distilling the 2D knowledge into 3D with Score Distillation Sampling (SDS). Nevertheless, applying SDS directly to Gaussian-based 3D super-resolution leads to undesirable and redundant 3D Gaussian primitives, due to the randomness brought by generative priors. To mitigate this issue, we introduce two simple yet effective techniques to reduce the stochastic disturbances introduced by SDS. Specifically, we 1) shrink the range of diffusion timesteps in SDS with an annealing strategy; 2) randomly discard redundant Gaussian primitives during densification. Extensive experiments have demonstrated that our proposed GaussianSR can attain high-quality results for HRNVS with only low-resolution inputs on both synthetic and real-world datasets.

Achieving high-resolution novel view synthesis (HRNVS) from low-resolution input views is a challenging task because high-resolution data are scarce. Previous methods optimize a high-resolution Neural Radiance Field (NeRF) from low-resolution input views but suffer from slow rendering. In this work, we build on 3D Gaussian Splatting (3DGS) because it produces high-quality images at a faster rendering speed. To alleviate the shortage of data for higher-resolution synthesis, we propose to leverage off-the-shelf 2D diffusion priors by distilling 2D knowledge into 3D with Score Distillation Sampling (SDS). However, applying SDS directly to Gaussian-based 3D super-resolution leads to undesirable and redundant 3D Gaussian primitives, owing to the randomness introduced by the generative prior. To mitigate this, we introduce two simple yet effective techniques to reduce the stochastic disturbances introduced by SDS: 1) we shrink the range of diffusion timesteps in SDS with an annealing strategy; and 2) we randomly discard redundant Gaussian primitives during densification. Extensive experiments show that our proposed GaussianSR attains high-quality HRNVS results from only low-resolution inputs on both synthetic and real-world datasets.
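
A minimal sketch of the two stabilization techniques named in the abstract. The function names, the linear annealing schedule, and the 0.5 keep probability are assumptions for illustration, not the authors' implementation:

```python
import torch

def annealed_timestep(step: int, total_steps: int, t_min: int = 20,
                      t_max_start: int = 980, t_max_end: int = 500) -> int:
    """Sample an SDS diffusion timestep whose upper bound shrinks over training.

    Illustrative linear annealing: early iterations allow large, noisy timesteps;
    later iterations restrict sampling to a narrower, less stochastic range.
    """
    progress = min(step / max(total_steps, 1), 1.0)
    t_max = int(t_max_start + (t_max_end - t_max_start) * progress)
    return int(torch.randint(t_min, max(t_max, t_min + 1), (1,)).item())

def randomly_discard_new_gaussians(new_params: torch.Tensor,
                                   keep_prob: float = 0.5) -> torch.Tensor:
    """During densification, keep each newly cloned/split Gaussian only with
    probability keep_prob, dropping the rest to curb redundant primitives."""
    keep = torch.rand(new_params.shape[0], device=new_params.device) < keep_prob
    return new_params[keep]

# Dummy usage: one annealed timestep and a random thinning of 128 new Gaussians.
t = annealed_timestep(step=3000, total_steps=10000)
survivors = randomly_discard_new_gaussians(torch.randn(128, 14), keep_prob=0.5)
```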
5 changes: 5 additions & 0 deletions abs/2406.10219.md
@@ -0,0 +1,5 @@
### PUP 3D-GS: Principled Uncertainty Pruning for 3D Gaussian Splatting

Recent advancements in novel view synthesis have enabled real-time rendering speeds and high reconstruction accuracy. 3D Gaussian Splatting (3D-GS), a foundational point-based parametric 3D scene representation, models scenes as large sets of 3D Gaussians. Complex scenes can comprise millions of Gaussians, amounting to large storage and memory requirements that limit the viability of 3D-GS on devices with limited resources. Current techniques for compressing these pretrained models by pruning Gaussians rely on combining heuristics to determine which ones to remove. In this paper, we propose a principled spatial sensitivity pruning score that outperforms these approaches. It is computed as a second-order approximation of the reconstruction error on the training views with respect to the spatial parameters of each Gaussian. Additionally, we propose a multi-round prune-refine pipeline that can be applied to any pretrained 3D-GS model without changing the training pipeline. After pruning 88.44% of the Gaussians, we observe that our PUP 3D-GS pipeline increases the average rendering speed of 3D-GS by 2.65× while retaining more salient foreground information and achieving higher image quality metrics than previous pruning techniques on scenes from the Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets.

Recent advances in novel view synthesis have enabled real-time rendering speeds and high reconstruction accuracy. 3D Gaussian Splatting (3D-GS), a foundational point-based parametric 3D scene representation, models scenes with large sets of 3D Gaussians. Complex scenes can comprise millions of Gaussians, leading to large storage and memory requirements that limit the viability of 3D-GS on resource-constrained devices. Current techniques compress these pretrained models by pruning Gaussians, relying on combinations of heuristics to decide which ones to remove. In this paper, we propose a principled spatial sensitivity pruning score that outperforms these approaches. The score is computed as a second-order approximation of the reconstruction error on the training views with respect to the spatial parameters of each Gaussian. We further propose a multi-round prune-refine pipeline that can be applied to any pretrained 3D-GS model without changing its training pipeline. After pruning 88.44% of the Gaussians, we observe that our PUP 3D-GS pipeline increases the average rendering speed of 3D-GS by 2.65× while retaining more salient foreground information and achieving higher image quality metrics than previous pruning techniques on scenes from the Mip-NeRF 360, Tanks & Temples, and Deep Blending datasets.
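
A hedged sketch of what a second-order sensitivity score of this kind generally looks like (in the spirit of classical saliency-based pruning); the exact parameterization and normalization used by PUP 3D-GS may differ. Here L is the reconstruction error over the training views, g_i the spatial parameters (mean and scale) of Gaussian i, and I_p the rendered pixel values:

```latex
% Generic second-order sensitivity sketch (not the paper's exact expression).
\[
\Delta L_i \;\approx\; \tfrac{1}{2}\,\mathbf{g}_i^{\top}\mathbf{H}_i\,\mathbf{g}_i,
\qquad
\mathbf{H}_i \;\approx\; \sum_{p \in \text{pixels}} \nabla_{\mathbf{g}_i} I_p\,\nabla_{\mathbf{g}_i} I_p^{\top}.
\]
% Gaussians are ranked by this sensitivity; the lowest-scoring ones are pruned
% and the model briefly fine-tuned, repeated over several prune-refine rounds.
```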
5 changes: 5 additions & 0 deletions abs/2406.10324.md
@@ -0,0 +1,5 @@
### L4GM: Large 4D Gaussian Reconstruction Model

We present L4GM, the first 4D Large Reconstruction Model that produces animated objects from a single-view video input -- in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multiview videos containing curated, rendered animated objects from Objaverse. This dataset depicts 44K diverse objects with 110K animations rendered from 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep our L4GM simple for scalability and build directly on top of LGM, a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multiview image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low fps and then upsamples the representation to a higher fps to achieve temporal smoothness. We add temporal self-attention layers to the base LGM to help it learn consistency across time, and utilize a per-timestep multiview rendering loss to train the model. The representation is upsampled to a higher framerate by training an interpolation model which produces intermediate 3D Gaussian representations. We show that L4GM, trained only on synthetic data, generalizes extremely well to in-the-wild videos, producing high-quality animated 3D assets.

We present L4GM, the first 4D Large Reconstruction Model, which produces animated objects from a single-view video input in a single feed-forward pass that takes only a second. Key to our success is a novel dataset of multi-view videos containing curated, rendered animated objects from Objaverse. The dataset depicts 44K diverse objects with 110K animations rendered from 48 viewpoints, resulting in 12M videos with a total of 300M frames. We keep L4GM simple for scalability and build directly on top of LGM, a pretrained 3D Large Reconstruction Model that outputs 3D Gaussian ellipsoids from multi-view image input. L4GM outputs a per-frame 3D Gaussian Splatting representation from video frames sampled at a low frame rate and then upsamples the representation to a higher frame rate for temporal smoothness. We add temporal self-attention layers to the base LGM to help it learn consistency across time, and train the model with a per-timestep multi-view rendering loss. The representation is upsampled to a higher frame rate by training an interpolation model that produces intermediate 3D Gaussian representations. We show that L4GM, trained only on synthetic data, generalizes extremely well to in-the-wild videos and produces high-quality animated 3D assets.
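
A minimal, self-contained PyTorch sketch of the kind of temporal self-attention layer the abstract describes: tokens from the T frames attend to each other along the time axis only. The shapes, dimensions, and pre-norm/residual layout are assumptions, not the released architecture:

```python
import torch
import torch.nn as nn

class TemporalSelfAttention(nn.Module):
    """Attend across the time dimension for each spatial token independently.

    Input:  (B, T, N, C) - batch, frames, tokens per frame, channels.
    Output: same shape, with information mixed across the T frames.
    """
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, N, C = x.shape
        # Fold spatial tokens into the batch so attention only spans time.
        h = x.permute(0, 2, 1, 3).reshape(B * N, T, C)
        h = self.norm(h)
        out, _ = self.attn(h, h, h, need_weights=False)
        out = out.reshape(B, N, T, C).permute(0, 2, 1, 3)
        return x + out  # residual connection

# Dummy usage: 2 videos, 8 frames, 196 tokens per frame, 256 channels.
layer = TemporalSelfAttention(dim=256, heads=8)
y = layer(torch.randn(2, 8, 196, 256))
```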
5 changes: 5 additions & 0 deletions abs/2406.10373.md
@@ -0,0 +1,5 @@
### Wild-GS: Real-Time Novel View Synthesis from Unconstrained Photo Collections

Photographs captured in unstructured tourist environments frequently exhibit variable appearances and transient occlusions, challenging accurate scene reconstruction and inducing artifacts in novel view synthesis. Although prior approaches have integrated the Neural Radiance Field (NeRF) with additional learnable modules to handle the dynamic appearances and eliminate transient objects, their extensive training demands and slow rendering speeds limit practical deployment. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising alternative to NeRF, offering superior training and inference efficiency along with better rendering quality. This paper presents Wild-GS, an innovative adaptation of 3DGS optimized for unconstrained photo collections while preserving its efficiency benefits. Wild-GS determines the appearance of each 3D Gaussian by its inherent material attributes, per-image global illumination and camera properties, and point-level local variance of reflectance. Unlike previous methods that model reference features in image space, Wild-GS explicitly aligns the pixel appearance features to the corresponding local Gaussians by sampling the triplane extracted from the reference image. This novel design effectively transfers the high-frequency detailed appearance of the reference view to 3D space and significantly expedites the training process. Furthermore, 2D visibility maps and depth regularization are leveraged to mitigate the transient effects and constrain the geometry, respectively. Extensive experiments demonstrate that Wild-GS achieves state-of-the-art rendering performance and the highest efficiency in both training and inference among all the existing techniques.

Photographs captured in unstructured tourist environments often exhibit variable appearance and transient occlusions, which challenge accurate scene reconstruction and induce artifacts in novel view synthesis. Although previous approaches have combined the Neural Radiance Field (NeRF) with additional learnable modules to handle dynamic appearance and eliminate transient objects, their heavy training demands and slow rendering speeds limit practical deployment. Recently, 3D Gaussian Splatting (3DGS) has emerged as a promising alternative to NeRF, offering superior training and inference efficiency along with better rendering quality. This paper presents Wild-GS, an innovative adaptation of 3DGS optimized for unconstrained photo collections while preserving its efficiency benefits. Wild-GS determines the appearance of each 3D Gaussian from its inherent material attributes, per-image global illumination and camera properties, and point-level local variance of reflectance. Unlike previous methods that model reference features in image space, Wild-GS explicitly aligns pixel appearance features to the corresponding local Gaussians by sampling a triplane extracted from the reference image. This novel design effectively transfers the high-frequency detailed appearance of the reference view into 3D space and significantly speeds up training. In addition, 2D visibility maps and depth regularization are used to mitigate transient effects and constrain the geometry, respectively. Extensive experiments demonstrate that Wild-GS achieves state-of-the-art rendering performance and the highest training and inference efficiency among all existing techniques.
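
A minimal sketch of sampling a triplane at 3D Gaussian centers, the mechanism the abstract describes for aligning per-image appearance features with local Gaussians. The channel count, plane resolution, and the assumption that coordinates are pre-normalized to [-1, 1] are illustrative; in the method the triplane would be predicted from the reference image:

```python
import torch
import torch.nn.functional as F

def sample_triplane(triplane: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample per-point features from a triplane.

    triplane: (3, C, H, W) feature planes for the XY, XZ, and YZ planes.
    xyz:      (N, 3) 3D Gaussian centers, assumed already normalized to [-1, 1].
    returns:  (N, 3*C) concatenated features for each point.
    """
    # 2D coordinates of each point on the three axis-aligned planes.
    coords = torch.stack([xyz[:, [0, 1]],   # XY plane
                          xyz[:, [0, 2]],   # XZ plane
                          xyz[:, [1, 2]]])  # YZ plane -> (3, N, 2)
    # grid_sample expects (B, H_out, W_out, 2); use a 1 x N grid per plane.
    grid = coords.unsqueeze(1)                             # (3, 1, N, 2)
    feats = F.grid_sample(triplane, grid, mode="bilinear",
                          align_corners=True)              # (3, C, 1, N)
    return feats.squeeze(2).permute(2, 0, 1).reshape(xyz.shape[0], -1)

# Dummy usage: 32-channel 64x64 planes, 1000 Gaussian centers.
planes = torch.randn(3, 32, 64, 64)
points = torch.rand(1000, 3) * 2 - 1
appearance = sample_triplane(planes, points)   # (1000, 96)
```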
5 changes: 5 additions & 0 deletions abs/2406.10788.md
@@ -0,0 +1,5 @@
### Physically Embodied Gaussian Splatting: A Realtime Correctable World Model for Robotics

For robots to robustly understand and interact with the physical world, it is highly beneficial to have a comprehensive representation - modelling geometry, physics, and visual observations - that informs perception, planning, and control algorithms. We propose a novel dual Gaussian-Particle representation that models the physical world while (i) enabling predictive simulation of future states and (ii) allowing online correction from visual observations in a dynamic world. Our representation comprises particles that capture the geometrical aspect of objects in the world and can be used alongside a particle-based physics system to anticipate physically plausible future states. Attached to these particles are 3D Gaussians that render images from any viewpoint through a splatting process, thus capturing the visual state. By comparing the predicted and observed images, our approach generates visual forces that correct the particle positions while respecting known physical constraints. By integrating predictive physical modelling with continuous visually-derived corrections, our unified representation reasons about the present and future while synchronizing with reality. Our system runs in real time at 30 Hz using only 3 cameras. We validate our approach on 2D and 3D tracking tasks as well as photometric reconstruction quality.

For robots to robustly understand and interact with the physical world, it is highly beneficial to have a comprehensive representation - modelling geometry, physics, and visual observations - that informs perception, planning, and control algorithms. We propose a novel dual Gaussian-Particle representation that models the physical world while (i) enabling predictive simulation of future states and (ii) allowing online correction from visual observations in a dynamic world. Our representation comprises particles that capture the geometric aspects of objects in the world and can be used together with a particle-based physics system to anticipate physically plausible future states. Attached to these particles are 3D Gaussians that render images from any viewpoint through a splatting process, thereby capturing the visual state. By comparing predicted and observed images, our approach generates visual forces that correct the particle positions while respecting known physical constraints. By integrating predictive physical modelling with continuous visually derived corrections, our unified representation reasons about the present and future while staying synchronized with reality. Our system runs in real time at 30 Hz using only 3 cameras. We validate our approach on 2D and 3D tracking tasks as well as photometric reconstruction quality.
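
A toy sketch, entirely a stand-in rather than the authors' system, of the correction idea: if rendering from particle positions is differentiable, the gradient of the photometric error with respect to those positions acts as a "visual force" that can be blended with a physics prediction. A tiny differentiable point-splatting renderer is defined inline so the example runs on its own:

```python
import torch

def toy_render(positions: torch.Tensor, res: int = 32) -> torch.Tensor:
    """Differentiable stand-in renderer: splat 2D points (in [0,1]^2) onto a
    res x res image with isotropic Gaussian kernels (sigma = 0.1)."""
    ys, xs = torch.meshgrid(torch.linspace(0, 1, res),
                            torch.linspace(0, 1, res), indexing="ij")
    grid = torch.stack([xs, ys], dim=-1)                        # (res, res, 2)
    d2 = ((grid[None] - positions[:, None, None, :]) ** 2).sum(-1)
    return torch.exp(-d2 / (2 * 0.1 ** 2)).sum(0).clamp(max=1.0)

# The "observed" image comes from the true particle positions; the predicted
# (physics-simulated) positions have drifted by a small offset.
true_pos = torch.tensor([[0.3, 0.4], [0.7, 0.6]])
pred_pos = (true_pos + 0.05).clone().requires_grad_(True)
observed = toy_render(true_pos)

optimizer = torch.optim.Adam([pred_pos], lr=0.01)
for _ in range(100):
    # A physics predictor would update pred_pos here; this sketch only shows
    # the visual correction derived from the photometric residual.
    optimizer.zero_grad()
    loss = ((toy_render(pred_pos) - observed) ** 2).mean()
    loss.backward()      # pred_pos.grad is the "visual force" on each particle
    optimizer.step()     # apply the correction

print(pred_pos.detach())  # nudged back toward true_pos
```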
