This is the code for the paper "Learning Point-Language Hierarchical Alignment for 3D Visual Grounding".
Our method ranked 1st on the ScanRefer benchmark (2022/10 - 2023/3) and won the ECCV 2022 2nd Workshop on Language for 3D Scenes challenge.
- Download the ScanRefer dataset and unzip it under `data/`.
- Download the preprocessed GloVe embeddings and put them under `data/`.
- Download the ScanNetV2 dataset and put `scans/` under `data/scannet/scans/`.
- Pre-process the ScanNet data:

```bash
cd data/scannet
python batch_load_scannet_data.py
```
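If you want to verify the layout before training, the sketch below checks for the expected files. The specific file names (`ScanRefer_filtered_train.json`, `glove.p`) are assumptions based on the upstream ScanRefer release and may differ in your download:

```python
# Minimal sanity check of the data layout described above.
# File names are assumptions (typical of ScanRefer releases); adjust if yours differ.
import os

expected = [
    "data/ScanRefer_filtered_train.json",  # ScanRefer annotations (assumed name)
    "data/ScanRefer_filtered_val.json",
    "data/glove.p",                        # preprocessed GloVe embeddings (assumed name)
    "data/scannet/scans",                  # raw ScanNetV2 scans
]

for path in expected:
    status = "ok" if os.path.exists(path) else "MISSING"
    print(f"{status:8s} {path}")
```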
```bash
pip install -r requirements.txt
cd lib/pointnet2
python setup.py install
```
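The `pointnet2` extension is compiled against your local CUDA toolkit, so it is worth confirming that PyTorch can see a GPU and that the extension imports after building. A minimal check follows; the `pointnet2_utils` module name is an assumption based on common PointNet++ packaging, not confirmed by this repo:

```python
import torch

# The CUDA extension is only useful if PyTorch can actually see a GPU.
print("CUDA available:", torch.cuda.is_available())
print("Device count:  ", torch.cuda.device_count())

# Module name is an assumption (typical PointNet++ packaging); adjust if needed.
try:
    from pointnet2 import pointnet2_utils  # noqa: F401
    print("pointnet2 extension imported successfully")
except ImportError as e:
    print("pointnet2 extension not found:", e)
```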
Set the correct project path in `lib/config.py`.
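The exact variable to edit depends on the codebase; in ScanRefer-derived repositories it is usually a base-path entry in a config dict. The excerpt below is purely illustrative: `CONF.PATH.BASE` and the other names are assumptions, not necessarily what `lib/config.py` actually uses:

```python
# Illustrative excerpt of lib/config.py (variable names are assumed, not confirmed)
import os
from easydict import EasyDict

CONF = EasyDict()
CONF.PATH = EasyDict()
CONF.PATH.BASE = "/home/user/project"      # <- set this to your project root
CONF.PATH.DATA = os.path.join(CONF.PATH.BASE, "data")
CONF.PATH.SCANNET = os.path.join(CONF.PATH.DATA, "scannet")
```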
Use `--tag` to name your experiment; training snapshots and results will be saved under `outputs/TAG_NAME_[timestamp]`.
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch \
    --master_port 19999 --nproc_per_node 8 ./scripts/train_dist.py \
    --fuse_with_key --use_spa --sent_aug \
    --use_color --use_normal \
    --tag TAG_NAME
```
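If you have fewer GPUs, reduce `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` together. For example, a single-GPU run might look like the sketch below; whether the default batch size and learning rate still reproduce the reported results on fewer GPUs is untested here:

```bash
# Single-GPU variant (assumes the script runs with --nproc_per_node 1; hyperparameters untested)
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch \
    --master_port 19999 --nproc_per_node 1 ./scripts/train_dist.py \
    --fuse_with_key --use_spa --sent_aug \
    --use_color --use_normal \
    --tag TAG_NAME
```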
```bash
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch \
    --master_port 19998 --nproc_per_node 1 ./benchmark/predict.py \
    --fuse_with_key --use_spa --use_color --use_normal \
    --no_nms --pred_split val --folder TAG_NAME
```
We thank the authors of ScanRefer, 3DVG-Transformer, and GroupFree for their open-source codebases.