Memorize What Matters: Emergent Scene Decomposition from Multitraverse

Li, Yiming; Wang, Zehong; Wang, Yue; Yu, Zhiding; Gojcic, Zan; Pavone, Marco; Feng, Chen; Alvarez, Jose M.

arXiv:2405.17187 (cs)

[Submitted on 27 May 2024 (v1), last revised 29 May 2024 (this version, v2)]

Abstract: Humans naturally retain memories of permanent elements, while ephemeral moments often slip through the cracks of memory. This selective retention is crucial for robotic perception, localization, and mapping. To endow robots with this capability, we introduce 3D Gaussian Mapping (3DGM), a self-supervised, camera-only offline mapping framework grounded in 3D Gaussian Splatting. 3DGM converts multitraverse RGB videos from the same region into a Gaussian-based environmental map while concurrently performing 2D ephemeral object segmentation. Our key observation is that the environment remains consistent across traversals, while objects frequently change. This allows us to exploit self-supervision from repeated traversals to achieve environment-object decomposition. More specifically, 3DGM formulates multitraverse environmental mapping as a robust differentiable rendering problem, treating pixels of the environment and objects as inliers and outliers, respectively. Using robust feature distillation, feature residuals mining, and robust optimization, 3DGM jointly performs 2D segmentation and 3D mapping without human intervention. We build the Mapverse benchmark, sourced from the Ithaca365 and nuPlan datasets, to evaluate our method in unsupervised 2D segmentation, 3D reconstruction, and neural rendering. Extensive results verify the effectiveness and potential of our method for self-driving and robotics.
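
The inlier/outlier idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the function names, the median-absolute-deviation (MAD) threshold, the factor `k`, and the sample residual values are all assumptions chosen for illustration. It only shows the general pattern of flagging pixels with unusually large residuals as outliers (ephemeral objects) and averaging the loss over the remaining inliers (environment).

```python
from statistics import median


def robust_inlier_mask(residuals, k=3.0):
    """Mark a pixel as an inlier if its residual is at most median + k * scaled MAD.

    Hypothetical helper: the MAD rule and k are illustrative assumptions,
    not the thresholding used in the paper.
    """
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    scale = 1.4826 * mad  # MAD rescaled to estimate a Gaussian standard deviation
    threshold = med + k * scale
    return [r <= threshold for r in residuals]


def masked_mean_loss(residuals, mask):
    """Average the residuals over inlier pixels only, so outliers do not drive the fit."""
    kept = [r for r, keep in zip(residuals, mask) if keep]
    return sum(kept) / len(kept) if kept else 0.0


# Made-up per-pixel residuals: two pixels are covered by a transient object.
residuals = [0.10, 0.12, 0.09, 0.11, 0.95, 0.10, 0.88, 0.13]
mask = robust_inlier_mask(residuals)
loss = masked_mean_loss(residuals, mask)
print(mask)
print(round(loss, 4))
```

In this toy setting the inverted mask plays the role of a 2D ephemeral-object segmentation, while the masked loss is what a mapping optimizer would minimize; a real system would compute residuals from rendered versus observed features rather than from a fixed list.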

Comments:	Project page: this https URL; Code and data: this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
Cite as:	arXiv:2405.17187 [cs.CV]
	(or arXiv:2405.17187v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2405.17187

Submission history

From: Yiming Li
[v1] Mon, 27 May 2024 14:11:17 UTC (8,891 KB)
[v2] Wed, 29 May 2024 23:32:23 UTC (13,757 KB)