by Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng
Unsupervised 3D object detection aims to accurately detect objects in unstructured environments without explicit supervisory signals. Given only sparse LiDAR point clouds, this task often suffers from degraded performance on small or distant objects due to the inherent sparsity and limited spatial resolution. In this paper, we are among the early efforts to integrate LiDAR data with 2D images for unsupervised 3D detection and introduce a new method, dubbed LiDAR-2D Self-paced Learning (LiSe). We argue that RGB images serve as a valuable complement to LiDAR data, offering precise 2D localization cues, particularly when only scarce LiDAR points are available for certain objects. Considering the unique characteristics of both modalities, our framework devises a self-paced learning pipeline that incorporates adaptive sampling and weak model aggregation strategies. The adaptive sampling strategy dynamically tunes the distribution of pseudo labels during training, countering the tendency of models to overfit to easily detected samples such as nearby and large objects; in doing so, it ensures a balanced learning trajectory across varying object scales and distances. The weak model aggregation component consolidates the strengths of models trained under different pseudo label distributions, culminating in a robust and powerful final model. Experimental evaluations validate the efficacy of the proposed LiSe method, which yields significant improvements of +7.1% AP_BEV and +3.4% AP_3D on nuScenes, and +8.3% AP_BEV and +7.4% AP_3D on Lyft over existing techniques.
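For intuition, here is a minimal Python sketch of the adaptive sampling idea (the 30 m near/far split, the linear schedule, and the function name are illustrative assumptions, not the actual LiSe implementation): the sampling distribution of pseudo labels shifts from easy (near) objects toward hard (far) objects as self-paced training progresses.

```python
import numpy as np

def adaptive_sample(pseudo_boxes, distances, round_idx, num_rounds, seed=0):
    """Hypothetical sketch of adaptive sampling over pseudo labels.

    Not the actual LiSe code: it only illustrates shifting the sampling
    distribution from easy (near) toward hard (far) objects as self-paced
    training rounds progress.
    """
    rng = np.random.default_rng(seed)
    near = [i for i, d in enumerate(distances) if d <= 30.0]   # "easy" bin
    far = [i for i, d in enumerate(distances) if d > 30.0]     # "hard" bin
    # Fraction of far samples grows linearly with the training round.
    far_ratio = 0.2 + 0.6 * round_idx / max(num_rounds - 1, 1)
    n_far = min(len(far), int(far_ratio * len(pseudo_boxes)))
    n_near = min(len(near), len(pseudo_boxes) - n_far)
    picked = rng.permutation(far)[:n_far].tolist() + rng.permutation(near)[:n_near].tolist()
    return [pseudo_boxes[i] for i in picked]
```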
- For MODEST, follow MODEST/README.md to install.
- For GroundingDINO, follow GroundingDINO/README.md to install.
- For FGR, follow FGR/README.md to install.
```bash
# Identify discriminative objects within the images
conda activate GroundingDINO
cd LiSe/GroundingDINO
ln -s PATH_TO_YOUR_DATASET_DIR data
mkdir lyft_2Dbox_img
mkdir lyft_2Dbox_txt
mkdir lyft_2Dbox_pkl
python gd_inference.py
```
```bash
# Lifting process from 2D to 3D: obtain image-based pseudo 3D boxes
conda activate FGR
cd LiSe/FGR/FGR
python save_region_grow_result.py --kitti_dataset_path ../data/lyft --output_dir ../lyft_output --root_dir ../split/lyft_train.txt --dataset lyft
python detect.py --kitti_dataset_path ../data/lyft --final_save_path ../lyft_output_detect --pickle_save_path ../lyft_output --save_det_image --train_split_path ../split/lyft_train.txt --not_merge_valid_labels
```
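FGR performs the lifting with ground removal, region growing, and key-vertex localization; the snippet below is only a simplified frustum-cropping sketch of the general 2D-to-3D lifting idea (the calibration handling and the axis-aligned box fit are assumptions for illustration, not FGR's algorithm).

```python
import numpy as np

def lift_2d_box(points_lidar, box_2d, P2, Tr_velo_to_cam):
    """Simplified frustum-based 2D-to-3D lifting (illustration only).

    FGR additionally applies ground removal, region growing and key-vertex
    refinement; here we just keep LiDAR points whose image projection falls
    inside the 2D box and fit an axis-aligned box around them.
    """
    # Homogeneous LiDAR points -> camera frame -> image plane.
    pts_h = np.hstack([points_lidar[:, :3], np.ones((len(points_lidar), 1))])
    pts_cam = pts_h @ Tr_velo_to_cam.T               # (N, 3) camera coordinates
    pts_img = pts_cam @ P2[:, :3].T + P2[:, 3]       # (N, 3) projective image coords
    u, v = pts_img[:, 0] / pts_img[:, 2], pts_img[:, 1] / pts_img[:, 2]
    x1, y1, x2, y2 = box_2d
    in_frustum = (pts_cam[:, 2] > 0) & (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    frustum_pts = pts_cam[in_frustum]
    if len(frustum_pts) == 0:
        return None
    # Crude axis-aligned box from the frustum points (no orientation estimate).
    mins, maxs = frustum_pts.min(0), frustum_pts.max(0)
    return (mins + maxs) / 2.0, maxs - mins
```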
```bash
# Integration with 2D scenes: distance-aware fusion
conda activate MODEST
cd LiSe/MODEST
python integration_with_2D_scenes.py --B_LiDAR_dir_path LiSe/MODEST/generate_cluster_mask/intermediate_results/lyft_original_seed/lyft_labels_pp_score_fw70_2m_r0.3_fov/\* --B_img_dir_path LiSe/FGR/lyft_output_detect/labels/ --B_final_dir_path LiSe/MODEST/generate_cluster_mask/intermediate_results/lyft_labels_pp_score_fw70_2m_r0.3_fov/ --d_min 10
```
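The `--d_min` flag sets the distance threshold used in the fusion. The sketch below shows one assumed reading of distance-aware fusion (the center-distance matching rule and box array layout are illustrative, not the exact LiSe rule): image-based pseudo boxes complement LiDAR-based ones mainly beyond a minimum range, where LiDAR points become scarce.

```python
import numpy as np

def fuse_pseudo_boxes(lidar_boxes, img_boxes, d_min=10.0, match_thresh=2.0):
    """Assumed sketch of distance-aware fusion of pseudo 3D boxes.

    lidar_boxes / img_boxes: arrays of shape (N, 7) = (x, y, z, l, w, h, yaw).
    Image-based boxes are only added when they are at least `d_min` metres
    away and do not duplicate an existing LiDAR-based box.
    """
    fused = list(lidar_boxes)
    lidar_centers = np.array([b[:2] for b in lidar_boxes]) if len(lidar_boxes) else np.zeros((0, 2))
    for box in img_boxes:
        dist = float(np.linalg.norm(box[:2]))          # BEV range to the ego vehicle
        if dist < d_min:
            continue                                   # near range: trust LiDAR clusters
        if len(lidar_centers) and np.linalg.norm(lidar_centers - box[:2], axis=1).min() < match_thresh:
            continue                                   # already covered by a LiDAR-based box
        fused.append(box)
    return np.array(fused)
```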
We upload the final generated pseudo labels at MODEST/storage; they can be used directly for training.
```bash
conda activate MODEST
cd LiSe/MODEST
bash scripts/seed_training_lyft.sh
bash scripts/self_training_lyft.sh -C "det_filtering.pp_score_threshold=0.7 det_filtering.pp_score_percentile=20 data_paths.bbox_info_save_dst=null data_root=$(pwd)/downstream/OpenPCDet/data/lyft/training"
```
```bash
conda activate MODEST
cd LiSe/MODEST
bash scripts/seed_training_nuscenes.sh
bash scripts/self_training_nusc.sh -C "data_paths=nusc.yaml det_filtering.pp_score_threshold=0.7 det_filtering.pp_score_percentile=20 data_paths.bbox_info_save_dst=null calib_path=$(pwd)/downstream/OpenPCDet/data/nuscenes_boston/training/calib ptc_path=$(pwd)/downstream/OpenPCDet/data/nuscenes_boston/training/velodyne image_shape=[900,1600]"
```
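The `det_filtering.pp_score_threshold` and `det_filtering.pp_score_percentile` overrides follow MODEST's self-training, which filters detections by the persistence point (PP) score of the LiDAR points they contain. The sketch below is an assumed reading of that rule (the comparison direction and helper names are illustrative, not taken from the MODEST code): a detection is dropped when even a low percentile of its points' PP scores exceeds the threshold, i.e. when its points look persistent (static background) across traversals.

```python
import numpy as np

def filter_detections_by_pp_score(boxes, pp_scores_per_box,
                                  pp_score_threshold=0.7, pp_score_percentile=20):
    """Assumed sketch of MODEST-style PP-score filtering of detections.

    boxes: list of detected 3D boxes.
    pp_scores_per_box: list of 1D arrays, PP scores of the LiDAR points
    inside each box (high score = point persists across traversals,
    i.e. likely static background). The exact rule in MODEST/LiSe may differ.
    """
    kept = []
    for box, scores in zip(boxes, pp_scores_per_box):
        if len(scores) == 0:
            continue
        low_percentile = np.percentile(scores, pp_score_percentile)
        # Keep boxes dominated by ephemeral points (mobile objects);
        # drop boxes whose points look persistent (static background).
        if low_percentile <= pp_score_threshold:
            kept.append(box)
    return kept
```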
```bash
conda activate MODEST
cd LiSe/MODEST/downstream/OpenPCDet/tools
bash scripts/dist_test.sh 4 --cfg_file ../../downstream/OpenPCDet/tools/cfgs/lyft_models/pointrcnn_dynamic_obj.yaml --ckpt PATH_TO_CKPT
```
```bash
conda activate MODEST
cd LiSe/MODEST/downstream/OpenPCDet/tools
bash scripts/dist_test.sh 4 --cfg_file ../../downstream/OpenPCDet/tools/cfgs/nuscenes_boston_models/pointrcnn_dynamic_obj.yaml --ckpt PATH_TO_CKPT
```
| Model | ST rounds | Checkpoint | Config file |
|---|---|---|---|
| PointRCNN | 0 | link | cfg |
| PointRCNN | 1 | link | cfg |
| PointRCNN | 10 | link | cfg |
| Model | ST rounds | Checkpoint | Config file |
|---|---|---|---|
| PointRCNN | 0 | link | cfg |
| PointRCNN | 1 | link | cfg |
| PointRCNN | 10 | link | cfg |
- GroundingDINO/gd_inference.py: GroundingDINO inference on the image set of the autonomous driving dataset.
- FGR/FGR/save_region_grow_result.py, FGR/FGR/detect.py: lift GroundingDINO 2D boxes to 3D boxes.
- MODEST/integration_with_2D_scenes.py: integrate LiDAR-based pseudo boxes and image-based pseudo boxes.
This project is under the MIT License.
Please open an issue if you have any questions about using this repo.
Our repo is based on MODEST (CVPR'22), GroundingDINO (ECCV'24), FGR (ICRA'21), and OpenPCDet. Thanks for their great work and open-source efforts!
```bibtex
@inproceedings{zhang2024approaching,
  title={Approaching Outside: Scaling Unsupervised 3D Object Detection from 2D Scene},
  author={Zhang, Ruiyang and Zhang, Hu and Yu, Hang and Zheng, Zhedong},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  year={2024}
}
```