GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction

Paper | Project Page

Yuanhui Huang, Wenzhao Zheng$\dagger$, Yunpeng Zhang, Jie Zhou, Jiwen Lu$\ddagger$

$\dagger$ Project leader $\ddagger$ Corresponding author

💥A pioneering step towards building an object-centric autonomous driving system. 💥

GaussianFormer proposes 3D semantic Gaussians as an object-centric representation for driving scenes that is more efficient than 3D occupancy.

News.

[2024/09/12] Training code release.
[2024/09/05] An updated version of GaussianFormer modeling only the occupied area.
[2024/09/05] Model weights and evaluation code release.
[2024/07/01] GaussianFormer is accepted to ECCV24!
[2024/05/28] Paper released on arXiv.
[2024/05/28] Demo release.

Demo

Overview

Motivated by the universal approximation ability of Gaussian mixtures, we propose an object-centric 3D semantic Gaussian representation that describes the fine-grained structure of 3D scenes without dense grids. Our GaussianFormer model combines sparse convolution and cross-attention to efficiently transform 2D images into 3D Gaussian representations. To generate dense 3D occupancy, we design a Gaussian-to-voxel splatting module that can be implemented efficiently in CUDA. With comparable performance, GaussianFormer reduces the memory consumption of existing 3D occupancy prediction methods by 75.2% to 82.2%.
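
To make the Gaussian-to-voxel splatting idea concrete, here is a minimal NumPy sketch. It is not the CUDA module from this repository. It assumes each Gaussian is parameterized by a mean, per-axis scales, a rotation quaternion, and a vector of semantic logits, and that a voxel's prediction is the sum of every Gaussian's logits weighted by its unnormalized density at the voxel center. The function names and the dense loop over all Gaussians are illustrative only.

```python
import numpy as np


def quaternion_to_rotation(q):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ])


def splat_gaussians_to_voxels(means, scales, quats, logits, voxel_centers):
    """Accumulate per-Gaussian semantic logits at each voxel center.

    Shapes: means (P, 3), scales (P, 3), quats (P, 4), logits (P, C),
    voxel_centers (V, 3). Returns an array of shape (V, C).
    """
    out = np.zeros((voxel_centers.shape[0], logits.shape[1]))
    for m, s, q, c in zip(means, scales, quats, logits):
        R = quaternion_to_rotation(q)
        # Assumed covariance: rotate the axis-aligned variances.
        cov = R @ np.diag(s ** 2) @ R.T
        diff = voxel_centers - m  # (V, 3)
        # Squared Mahalanobis distance of every voxel center to this Gaussian.
        maha = np.einsum("vi,ij,vj->v", diff, np.linalg.inv(cov), diff)
        out += np.exp(-0.5 * maha)[:, None] * c[None, :]
    return out
```

This dense version visits every Gaussian for every voxel, so its cost grows with the product of both counts. An efficient implementation would restrict each voxel to nearby Gaussians.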

Getting Started

Installation

Follow instructions HERE to prepare the environment.

Data Preparation

Download the nuScenes V1.0 full dataset HERE.
Download the occupancy annotations from SurroundOcc HERE and unzip them.
Download the pkl files HERE.

Folder structure

GaussianFormer
├── ...
├── data/
│   ├── nuscenes/
│   │   ├── maps/
│   │   ├── samples/
│   │   ├── sweeps/
│   │   ├── v1.0-test/
│   │   ├── v1.0-trainval/
│   ├── nuscenes_cam/
│   │   ├── nuscenes_infos_train_sweeps_occ.pkl
│   │   ├── nuscenes_infos_val_sweeps_occ.pkl
│   ├── surroundocc/
│   │   ├── samples/
│   │   │   ├── xxxxxxxx.pcd.bin.npy
│   │   │   ├── ...
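
Before running inference or training, it can help to confirm that the layout above is in place. The following is a small, hypothetical helper (not part of this repository) that checks for the directories and pkl files listed in the tree. It assumes it is run from the repository root.

```python
from pathlib import Path

# Paths copied from the folder structure above, relative to the repo root.
EXPECTED_PATHS = [
    "data/nuscenes/maps",
    "data/nuscenes/samples",
    "data/nuscenes/sweeps",
    "data/nuscenes/v1.0-test",
    "data/nuscenes/v1.0-trainval",
    "data/nuscenes_cam/nuscenes_infos_train_sweeps_occ.pkl",
    "data/nuscenes_cam/nuscenes_infos_val_sweeps_occ.pkl",
    "data/surroundocc/samples",
]


def find_missing_paths(repo_root="."):
    """Return the expected paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing_paths()
    if missing:
        print("Missing entries:")
        for p in missing:
            print("  " + p)
    else:
        print("Data layout looks complete.")
```

The check only tests for existence. It does not verify that the dataset contents are complete or uncorrupted.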

Inference

We provide two checkpoints trained on the SurroundOcc dataset:

The checkpoint that reproduces the result in Table 1 of our paper.
🔥🔥An updated version of GaussianFormer that assigns semantic Gaussians to model only the occupied area, leaving the empty space to one fixed, infinitely large Gaussian. This modification significantly reduces the number of Gaussians needed for similar model capacity (144000 -> 25600), making the model even more efficient. Check our GaussianHead for more details.

python eval.py --py-config config/nuscenes_gs144000.py --work-dir out/nuscenes_gs144000/ --resume-from out/nuscenes_gs144000/state_dict.pth

python eval.py --py-config config/nuscenes_gs25600_solid.py --work-dir out/nuscenes_gs25600_solid/ --resume-from out/nuscenes_gs25600_solid/state_dict.pth
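
If you want to evaluate both checkpoints in one go, the two commands above can be generated from the config names. The sketch below is a hypothetical wrapper, not a script shipped with this repository. It uses only the flags shown above and assumes each checkpoint has been saved as out/<config>/state_dict.pth.

```python
import subprocess

# Config names taken from the two commands above.
CONFIGS = ["nuscenes_gs144000", "nuscenes_gs25600_solid"]


def build_eval_command(config_name):
    """Build the eval.py command for one config, mirroring the commands above."""
    work_dir = f"out/{config_name}/"
    return [
        "python", "eval.py",
        "--py-config", f"config/{config_name}.py",
        "--work-dir", work_dir,
        "--resume-from", f"{work_dir}state_dict.pth",
    ]


def evaluate_all(dry_run=True):
    """Print each command; run it only when dry_run is False."""
    for name in CONFIGS:
        cmd = build_eval_command(name)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    evaluate_all(dry_run=True)
```

With dry_run=True the script only prints the commands, which is a quick way to confirm the paths before starting a long evaluation.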

Train

Run the following command to launch training. Note that the setting with 144000 Gaussians requires ~40 GB of GPU memory during training, so we recommend trying the 25600 version, which achieves even better performance!🚀

Download the pretrained weights for the image backbone HERE and put them inside ckpts.

CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --py-config config/nuscenes_gs25600_solid.py --work-dir out/nuscenes_gs25600_solid

Config	mIoU	Log	Weight
nuscenes_gs25600_solid	19.31	log	weight

Stay tuned for more exciting work and models!🤗

Related Projects

Our work is inspired by these excellent open-source repos: TPVFormer, PointOcc, SelfOcc, SurroundOcc, OccFormer, and BEVFormer.

Our code was originally based on Sparse4D and later migrated to the general framework of SelfOcc.

Citation

If you find this project helpful, please consider citing the following paper:

@article{huang2024gaussian,
    title={GaussianFormer: Scene as Gaussians for Vision-Based 3D Semantic Occupancy Prediction},
    author={Huang, Yuanhui and Zheng, Wenzhao and Zhang, Yunpeng and Zhou, Jie and Lu, Jiwen},
    journal={arXiv preprint arXiv:2405.17429},
    year={2024}
}
