WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language
This repository contains the WildRefer dataset and the official implementation of WildRefer: 3D Object Localization in Large-scale Dynamic Scenes with Multi-modal Visual Data and Natural Language.
Our dataset can be downloaded here.
We strongly recommend using our pre-processed HuCenLife and STCrowd data, which can be downloaded here.
Please prepare the dataset in the following folder structure:
./
├── data/
│   ├── liferefer_test.json
│   ├── liferefer_train.json
│   ├── strefer_test.json
│   └── strefer_train.json
└── src/
    ├── LifeRefer.zip
    └── STRefer.zip
Unzip our processed data:
cd src
unzip LifeRefer.zip
unzip STRefer.zip
cd ..
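Before unzipping, it can help to confirm that the downloaded files are where the folder structure expects them. The following is a minimal sketch, not part of the repository: the helper name `missing_dataset_files` is hypothetical, and it assumes that `data/` and `src/` sit side by side at the repository root, as the `cd src` command suggests.

```python
from pathlib import Path

# File names are copied from the folder structure in this README.
# Assumption: data/ and src/ are both direct children of the repository root.
EXPECTED_FILES = [
    "data/liferefer_test.json",
    "data/liferefer_train.json",
    "data/strefer_test.json",
    "data/strefer_train.json",
    "src/LifeRefer.zip",
    "src/STRefer.zip",
]


def missing_dataset_files(root="."):
    """Return the expected files (relative paths) that are absent under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected dataset files are in place.")
```

Run it from the repository root. It only checks for the annotation files and the zip archives, since the contents of the archives are not listed here.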
Our environment is based on Python 3.8 and CUDA 11.3.
You can install the environment with conda:
conda create -n wildrefer_env python=3.8 -y
conda activate wildrefer_env
conda install conda-forge::cudatoolkit-dev=11.3 -y
pip install torch==1.11.0 torchvision==0.12.0 --index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt
python -m spacy download en_core_web_sm
cd pointnet2
python setup.py install
cd ..
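After installation, a quick sanity check can confirm that the interpreter and PyTorch build match the versions named above (Python 3.8, torch 1.11.0, CUDA 11.3). This is a hedged sketch rather than a script shipped with the repository: `version_matches` and `environment_report` are hypothetical helper names, and the check degrades gracefully if PyTorch is not installed.

```python
import importlib.util
import sys

# Versions taken from the installation commands in this README.
EXPECTED_PYTHON = (3, 8)
EXPECTED_TORCH = "1.11.0"
EXPECTED_CUDA = "11.3"


def version_matches(found, expected):
    """Compare versions while ignoring a local build suffix such as '+cu113'."""
    return str(found).split("+")[0] == expected


def environment_report():
    """Collect a small dict describing whether the environment matches."""
    report = {"python_ok": sys.version_info[:2] == EXPECTED_PYTHON}
    if importlib.util.find_spec("torch") is None:
        report["torch_installed"] = False
        return report

    import torch  # imported lazily so the check also runs without PyTorch

    report["torch_installed"] = True
    report["torch_ok"] = version_matches(torch.__version__, EXPECTED_TORCH)
    # torch.version.cuda is None for CPU-only builds.
    report["cuda_ok"] = torch.version.cuda == EXPECTED_CUDA
    report["gpu_visible"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

A `False` entry does not necessarily mean the code will fail, only that the environment differs from the one described here.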
Our weights can be downloaded here.
You can put the weights under the weights/ folder:
./
└── weights/
    ├── liferefer_weights.pth
    └── strefer_weights.pth
python test.py --dataset strefer --pretrain weights/strefer_weights.pth --max_lang_num 50 --frame_num 2 --batch_size 36
python test.py --dataset liferefer --pretrain weights/liferefer_weights.pth --frame_num 2 --batch_size 32
python train.py --dataset strefer --max_lang_num 50
python train.py --dataset liferefer --max_lang_num 100
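The two evaluation commands above differ only in their per-dataset flag values, so they can be assembled from a small settings table. The sketch below is an illustrative convenience and not part of the repository: `EVAL_SETTINGS` and `build_eval_command` are hypothetical names, and the flag values are copied verbatim from the `test.py` commands above.

```python
import shlex

# Flag values copied from the evaluation commands in this README.
# Note that the liferefer command does not pass --max_lang_num.
EVAL_SETTINGS = {
    "strefer": {"max_lang_num": 50, "frame_num": 2, "batch_size": 36},
    "liferefer": {"frame_num": 2, "batch_size": 32},
}


def build_eval_command(dataset, weights_dir="weights"):
    """Return the argv list for evaluating one dataset with test.py."""
    args = [
        "python", "test.py",
        "--dataset", dataset,
        "--pretrain", f"{weights_dir}/{dataset}_weights.pth",
    ]
    for flag, value in EVAL_SETTINGS[dataset].items():
        args += [f"--{flag}", str(value)]
    return args


if __name__ == "__main__":
    for name in EVAL_SETTINGS:
        # Print the command; pass the list to subprocess.run to execute it.
        print(shlex.join(build_eval_command(name)))
```

Printing the commands first makes it easy to compare them against the ones listed above before running anything.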
All datasets are published under the Creative Commons Attribution-NonCommercial-ShareAlike license. This means that you must attribute the work in the manner specified by the authors, you may not use this work for commercial purposes, and if you alter, transform, or build upon this work, you may distribute the resulting work only under the same license.