Xiang Zhang*, Zeyuan Chen*, Fangyin Wei, and Zhuowen Tu (*Equal contribution)
This is the repository for the paper Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction (ICCV 2023).
[Paper]
We have pre-packaged all dependencies in a Docker image, built on top of PyTorch 2.1.2 with CUDA 11.8.
You can pull the image via

```bash
docker pull zx1239856/uni-3d:0.1.0
```
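For example, to open an interactive shell inside the container, mirroring the flags used by the training commands below (adjust `--shm-size` and the mounted path for your machine; this assumes `bash` is available in the image, which is typical for PyTorch-based images):

```bash
docker run -it --gpus all --shm-size 128G -v "$(pwd)":/workspace zx1239856/uni-3d:0.1.0 bash
```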
Alternatively, assume you already have a proper PyTorch (>=1.10.1) and CUDA (>=11.3) installation.
- Install the following system dependencies:

  ```bash
  apt-get install ninja-build libopenblas-dev libopenexr-dev
  ```
- Uncomment Line 9 of requirements.txt, then install the required Python packages via

  ```bash
  pip install -r requirements.txt
  ```
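After installation, a quick sanity check (a minimal sketch; it simply prints the detected PyTorch and CUDA versions and whether a GPU is visible):

```bash
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```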
Please download the pre-processed 3D-FRONT dataset from Dahnert et al. (Panoptic 3D Scene Reconstruction from a Single RGB Image) and extract it under `datasets/front3d/data`:

```bash
unzip front3d.zip -d datasets/front3d/data
```
Please request the Matterport dataset from the authors of panoptic-reconstruction and extract it under `datasets/matterport/data`.

Also download the room masks and generated depth from BUOL, and extract them under `datasets/matterport/room_mask` and `datasets/matterport/depth_gen`, respectively.
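For example, assuming the downloads arrive as zip archives (the archive names below are placeholders; use the actual filenames you receive):

```bash
mkdir -p datasets/matterport
unzip matterport.zip -d datasets/matterport/data       # requested from the panoptic-reconstruction authors
unzip room_mask.zip -d datasets/matterport/room_mask   # room masks from BUOL
unzip depth_gen.zip -d datasets/matterport/depth_gen   # generated depth from BUOL
```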
The expected directory layout is as follows:

```
matterport/
  meta/
    train_3d.json                                    # Training set metadata
    ...
  data/
    <scene_id>/
    ├── <image_id>_i<frame_id>.png                   # Color image: 320x240x3
    ├── <image_id>_segmap<frame_id>.mapped.npz       # 2D segmentation: 320x240x2, channel 0: pre-mapped semantics, channel 1: instances
    ├── <image_id>_intrinsics_<camera_id>.png        # Intrinsics matrix: 4x4
    ├── <image_id>_geometry<frame_id>.npz            # 3D geometry: 256x256x256x1, truncated (unsigned) distance field at 3cm voxel resolution and 12-voxel truncation
    ├── <image_id>_segmentation<frame_id>.mapped.npz # 3D segmentation: 256x256x256x2, channel 0: pre-mapped semantics, channel 1: instances
    ├── <image_id>_weighting<frame_id>.npz           # 3D weighting mask: 256x256x256x1
  depth_gen/
    <scene_id>/
    ├── <position_id>_d<frame_id>.png                # Depth image: 320x240x1
  room_mask/
    <scene_id>/
    ├── <position_id>_rm<frame_id>.png               # Room mask: 320x240x1
```
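To sanity-check a sample, you can list the arrays stored in one of the `.npz` files (a minimal sketch; substitute real scene/image/frame ids, and note that the key names inside each archive are not documented here, so the snippet simply enumerates them):

```bash
python -c "
import numpy as np
d = np.load('datasets/matterport/data/<scene_id>/<image_id>_geometry<frame_id>.npz')
for k in d.files:  # NpzFile.files lists the stored array names
    print(k, d[k].shape, d[k].dtype)
"
```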
| Model | PRQ | RSQ | RRQ | Download |
|---|---|---|---|---|
| 3D-FRONT Pretrained 2D | -- | -- | -- | front3d_dps_160k.pth |
| 3D-FRONT Single-scale | 52.51 | 60.89 | 83.97 | front3d_full_single_scale.pth |
| 3D-FRONT Multi-scale | 53.53 | 61.69 | 84.69 | front3d_full_multi_scale.pth |
| Matterport Pretrained 2D | -- | -- | -- | matterport_dps_120k.pth |
| Matterport Single-scale | 16.58 | 44.26 | 36.68 | matterport_full_single_scale.pth |
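After downloading a checkpoint, you can verify that it loads (a minimal sketch, assuming the `.pth` files are standard PyTorch serialized objects):

```bash
python -c "
import torch
ckpt = torch.load('front3d_full_multi_scale.pth', map_location='cpu')
print(type(ckpt))
print(list(ckpt.keys())[:5] if isinstance(ckpt, dict) else ckpt)
"
```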
If you are using Docker, you may set the following prefix for convenience:

```bash
export DOCKER_PREFIX="docker run -it --gpus all --shm-size 128G -v "$(pwd)":/workspace zx1239856/uni-3d:0.1.0"
```
First, pre-train the 2D model:

```bash
$DOCKER_PREFIX OMP_NUM_THREADS=16 torchrun --nproc_per_node=8 train_net.py --config-file configs/front3d/mask2former_R50_bs16_160k.yaml OUTPUT_DIR <path-to-output-dir>
```

Then train the full model, initializing from the pre-trained 2D weights:

```bash
$DOCKER_PREFIX OMP_NUM_THREADS=16 torchrun --nproc_per_node=8 train_net.py --config-file configs/front3d/uni_3d_R50.yaml MODEL.WEIGHTS <path-to-pretrained-2d-model> OUTPUT_DIR <path-to-output-dir>
```
Use `uni_3d_R50_ms.yaml` for multi-scale feature reprojection. Please adjust `--nproc_per_node`, `OMP_NUM_THREADS`, and `SOLVER.IMS_PER_BATCH` based on your environment, as in the sketch below.
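For example, a single-node run on 4 GPUs with a reduced batch size might look like this (illustrative values only; changing the batch size may also require adjusting the solver schedule):

```bash
$DOCKER_PREFIX OMP_NUM_THREADS=8 torchrun --nproc_per_node=4 train_net.py --config-file configs/front3d/uni_3d_R50.yaml MODEL.WEIGHTS <path-to-pretrained-2d-model> SOLVER.IMS_PER_BATCH 8 OUTPUT_DIR <path-to-output-dir>
```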
Please add the `--eval-only` flag to the training scripts above for evaluation.
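For example, to evaluate a trained or downloaded checkpoint (a sketch; all paths are placeholders):

```bash
$DOCKER_PREFIX OMP_NUM_THREADS=16 torchrun --nproc_per_node=8 train_net.py --config-file configs/front3d/uni_3d_R50.yaml --eval-only MODEL.WEIGHTS <path-to-checkpoint> OUTPUT_DIR <path-to-output-dir>
```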
You can generate meshes for visualization of 3D-FRONT images via the following command:

```bash
python demo_front3d.py -i <path-to-3d-front-image> -o <path-to-output-dir> -m <path-to-pretrained-model>
```
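For instance, using the multi-scale checkpoint from the model zoo above (the input and output paths are placeholders):

```bash
python demo_front3d.py -i <path-to-3d-front-image> -o output/demo -m front3d_full_multi_scale.pth
```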
Please consider citing Uni-3D if you find the work helpful.
```bibtex
@InProceedings{Zhang_2023_ICCV,
    author    = {Zhang, Xiang and Chen, Zeyuan and Wei, Fangyin and Tu, Zhuowen},
    title     = {Uni-3D: A Universal Model for Panoptic 3D Scene Reconstruction},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
    month     = {October},
    year      = {2023},
    pages     = {9256-9266}
}
```
This repository is released under the Apache License 2.0. The full license text can be found in the LICENSE file.
- Mask2Former for the framework.
- panoptic-reconstruction for the pre-processed 3D-FRONT and Matterport datasets, as well as the evaluation code.
- BUOL for the generated depth and room masks on the Matterport dataset.