Transcrib3D: 3D Referring Expression Resolution through Large Language Models (IROS 2024)

Jiading Fang*, Xiangshan Tan*, Shengjie Lin*, Igor Vasiljevic, Vitor Guizilini, Hongyuan Mei, Rares Ambrus, Gregory Shakhnarovich, Matthew R. Walter

Paper PDF | Project Page

Demo video: Transcrib3d_real_robot_demo_compressed.mp4

Transcrib3D reasons and acts according to complex 3D referring expressions with real robots.

Environment Settings

For evaluation, only a few simple packages are required: numpy, openai, and tenacity.

pip install numpy openai tenacity

Some additional packages are required for data preprocessing:

pip install plyfile scikit-learn scipy pandas

Set up your OpenAI API key as the environment variable OPENAI_API_KEY:

export OPENAI_API_KEY=xxx
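To confirm the key is visible to your Python environment, here is a quick sanity check (a hypothetical snippet, not part of the repo):

# Hypothetical sanity check, not part of the repo: fail loudly if the key is missing.
import os
assert os.environ.get("OPENAI_API_KEY"), "OPENAI_API_KEY is not set"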

Data Preparation

Since the ReferIt3D dataset (which includes Sr3D and Nr3D) and the ScanRefer dataset both depend on ScanNet, the first step is to preprocess the ScanNet data.

quick start

To make things easier, we provide the bounding boxes for each scene at data/scannet_object_info. Currently this only includes ground-truth bounding boxes (the setting for Nr3D and Sr3D from the ReferIt3D benchmark); detected bounding boxes will be provided later. There is no need to prepare the original ScanNet scene data for the sole purpose of testing (the original scene data remain useful for debugging and visualization).

You can jump directly to Evaluation for a quick start.

If you want to generate the bounding boxes from the original ScanNet data, follow the steps below.

download ScanNet data

Follow the official instructions here to download the data. You have to fill out a form and email the ScanNet authors. You will then receive a response email with detailed instructions and a Python download script, download-scannet.py. Run the script to download specific types of data:

python download-scannet.py -o [directory in which to download] --type [file suffix]

Since the original 1.3 TB of ScanNet data contains many file types, some of which this project does not need (such as the RGB-D stream .sens files), you can use the --type flag to download only the necessary types:

_vh_clean_2.ply _vh_clean_2.labels.ply _vh_clean_2.0.010000.segs.json _vh_clean.segs.json .aggregation.json _vh_clean.aggregation.json .txt

Run the following bash or Windows CMD snippet to download them (to avoid pressing a key for each file during download, comment out the code key = input('') at lines 147 and 225 of download-scannet.py):

# bash
download_dir="your_scannet_download_directory"
suffixes=(
    "_vh_clean_2.ply"
    "_vh_clean_2.labels.ply"
    "_vh_clean_2.0.010000.segs.json"
    "_vh_clean.segs.json"
    ".aggregation.json"
    "_vh_clean.aggregation.json"
    ".txt"
)
for suffix in "${suffixes[@]}"; do
    python download-scannet.py -o "$download_dir" --type "$suffix"
done
:: CMD (use %%s instead of %s when running inside a .bat file)
set download_dir=your_scannet_download_directory
set suffixes=_vh_clean_2.ply;_vh_clean_2.labels.ply;_vh_clean_2.0.010000.segs.json;_vh_clean.segs.json;.aggregation.json;_vh_clean.aggregation.json;.txt

for %s in (%suffixes%) do (
  python download-scannet.py -o %download_dir% --type %s
)
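Alternatively, here is a platform-independent Python loop (a sketch; it assumes download-scannet.py sits in the current directory):

# Hypothetical cross-platform alternative to the shell loops above.
import subprocess

download_dir = "your_scannet_download_directory"
suffixes = [
    "_vh_clean_2.ply", "_vh_clean_2.labels.ply",
    "_vh_clean_2.0.010000.segs.json", "_vh_clean.segs.json",
    ".aggregation.json", "_vh_clean.aggregation.json", ".txt",
]
for suffix in suffixes:
    # Invoke the official download script once per file type.
    subprocess.run(["python", "download-scannet.py", "-o", download_dir, "--type", suffix], check=True)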

After downloading, your folder should look like:

your_scannet_download_directory/
|-- scans/
|   |-- scene0000_00/
|   |   |-- scene0000_00_vh_clean_2.ply
|   |   |-- scene0000_00_vh_clean_2.labels.ply
|   |   |-- scene0000_00_vh_clean_2.0.010000.segs.json
|   |   |-- scene0000_00_vh_clean.segs.json
|   |   |-- scene0000_00.aggregation.json
|   |   |-- scene0000_00_vh_clean.aggregation.json
|   |   |-- scene0000_00.txt
|   |-- scenexxxx_xx/
|   |   |-- ...
|-- scans_test/
|   |-- scene0707_00/
|   |-- ...
|-- scannetv2-labels.combined.tsv

axis-align

Then, use the axis-alignment matrices (recorded in each scenexxxx_xx.txt) to transform the vertex coordinates:

python preprocessing/align_scannet_mesh.py --scannet_download_path [your_scannet_download_directory]
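For reference, the core of this transformation looks roughly like the following (a minimal sketch, assuming ScanNet's axisAlignment metadata format of 16 row-major floats in the scene .txt file):

import numpy as np

def load_axis_align_matrix(meta_file):
    # scene<id>.txt stores a line "axisAlignment = m00 m01 ... m33" (16 floats, row-major)
    with open(meta_file) as f:
        for line in f:
            if line.startswith("axisAlignment"):
                values = [float(x) for x in line.split("=")[1].split()]
                return np.array(values).reshape(4, 4)
    return np.eye(4)  # no alignment recorded: fall back to identity

def align_vertices(vertices, axis_align_matrix):
    # vertices: (N, 3) array; apply the 4x4 transform in homogeneous coordinates
    homogeneous = np.hstack([vertices, np.ones((len(vertices), 1))])
    return (homogeneous @ axis_align_matrix.T)[:, :3]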

download ReferIt3D and ScanRefer data

Follow the ReferIt3D official guide to download nr3d.csv, sr3d.csv, sr3d_train.csv, and sr3d_test.csv, and save them in the data/referit3d folder.

Follow the ScanRefer official guide to download the dataset and put it under the data/scanrefer folder.

generate object information

In this step, we process the ScanNet data to extract quantitative and semantic information about the objects in each scene.

For object instance segmentation, we use either the ground-truth (ScanNet official) data or an off-the-shelf segmentation tool (Mask3D).

To use ground truth segmentation data, run:

python preprocessing/gen_obj_list.py --scannet_download_path [your_scannet_download_directory] --bbox_type gt

You can find the results in scannet_download_path/scans/objects_info/ and scannet_download_path/scans_test/objects_info/.

To use Mask3D segmentation data, first follow the Mask3D official guide to produce the instance segmentation results, then run:

python preprocessing/gen_obj_list.py --scannet_download_path [your_scannet_download_directory] \
    --bbox_type mask3d \
    --mask3d_result_path [your_mask3d_result_directory]
# Note: mask3d_result_path should look like xxx/Mask3D/eval_output/instance_evaluation_mask3d_export_scannet200_0/val/

You can find the results in scannet_download_path/scans/objects_info_mask3d_200c/.

Evaluation

quick start

Run the first 50 data records of nr3d_test_sampled1000.csv with config index 1:

python main.py --workspace_path /path/to/Transcribe3D/project/folder --scannet_data_root /path/to/ScanNet/Data/  --mode eval --dataset_type nr3d --conf_idx 1 --range 2 52

Remember to replace the paths.

Note that scannet_data_root can be set to /path/to/Transcribe3D/project/folder/data/scannet_object_info, as we already provide the GT ScanNet bounding boxes. If you preprocessed the data yourself, it can instead be set to scannet_download_path/scans/objects_info/.

how to modify configurations

  • To run our model on different referring datasets, simply set --dataset_type to one of sr3d, nr3d, or scanrefer.

  • To select the evaluation range of the dataset, modify the --range setting. For Sr3D and Nr3D, which use .csv files, the minimum index is 2. For ScanRefer, which uses .json files, the minimum index is 0.

  • For convenience, additional configurations live in config/config.py, which contains three dictionaries: confs_nr3d, confs_sr3d, and confs_scanrefer. Each holds several configurations for that dataset, whose meanings can be inferred from the variable names. Set --conf_idx to select a configuration; you can also add your own (see the sketch after this list).

  • More options are documented via python main.py -h.
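A hypothetical custom entry might look like the following (the key names are illustrative assumptions, not the repo's actual schema; mirror the fields of the existing entries in config/config.py):

# Hypothetical addition at the bottom of config/config.py; all key names are
# illustrative assumptions; copy the fields of an existing entry instead.
confs_nr3d[100] = {
    "model": "gpt-4",              # assumed field: which LLM backs the reasoning
    "use_code_interpreter": True,  # assumed field: let the LLM execute code
}
# Then select it with: python main.py ... --conf_idx 100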

result storage

After running the evaluation with a certain configuration, a folder whose name starts with eval_results_ and encodes the configuration information is created under the results folder. Under this folder, subfolders are named after the start time of each experiment.

Analyze Result

You might run one or more experiments with a given evaluation configuration, producing subfolders named with formatted timestamps. These timestamps are used to analyze the results. An example timestamp looks like 2023-10-26-15-48-12.

Specify the formatted timestamp(s) after the --ft setting:

python main.py --workspace_path /path/to/Transcribe3D/project/folder/ --scannet_data_root /path/to/ScanNet/Data/  --mode result --dataset_type nr3d --conf_idx 1 --ft time1 time2

Check ScanRefer

Check how many cases come with a detected box whose IoU with the ground-truth box is 0.5 or higher; this fraction indicates the upper bound of performance on ScanRefer.

python main.py --workspace_path /path/to/Transcribe3D/project/folder/ --scannet_data_root /path/to/ScanNet/Data/ --mode check_scanrefer --dataset_type scanrefer --conf_idx 1
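For reference, the metric behind this check is the 3D intersection-over-union of axis-aligned boxes, roughly as sketched below (this assumes boxes stored as min/max corners; the repo may represent boxes differently):

import numpy as np

def box3d_iou(box_a, box_b):
    # Each box: array-like (xmin, ymin, zmin, xmax, ymax, zmax), axis-aligned.
    box_a, box_b = np.asarray(box_a, float), np.asarray(box_b, float)
    lo = np.maximum(box_a[:3], box_b[:3])   # lower corner of the intersection
    hi = np.minimum(box_a[3:], box_b[3:])   # upper corner of the intersection
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    vol_a = np.prod(box_a[3:] - box_a[:3])
    vol_b = np.prod(box_b[3:] - box_b[:3])
    return inter / (vol_a + vol_b - inter)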

Finetuning

We provide scripts for finetuning open-source LLMs (e.g., CodeLlama, Llama 2) under the finetune directory.

Environment

The scripts use the Hugging Face TRL library (https://github.com/huggingface/trl) to perform the finetuning jobs. The main dependencies are the Hugging Face accelerate, transformers, datasets, peft, and trl packages.
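They can be installed from PyPI (these are the published package names; pin versions as needed for your setup):

pip install accelerate transformers datasets peft trl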

Data

We provide processed finetuning data following the OpenAI finetuning file protocol in the finetune/finetune_files directory. It contains data for the different settings described in our paper. The original processing script is finetune/prepare_finetuning_data.py, which processes results produced by the main script.
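In that protocol (for chat models), each file is JSONL with one chat example per line, shaped like {"messages": [{"role": ..., "content": ...}, ...]}. A quick way to inspect a record (the file name below is illustrative, not an actual file in the repo):

import json

# Print the roles and truncated contents of the first training example.
with open("finetune/finetune_files/example.jsonl") as f:  # illustrative name
    record = json.loads(f.readline())
for msg in record["messages"]:
    print(msg["role"], ":", msg["content"][:80])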

Scripts

We provide two example shell scripts for running finetuning jobs, one with the CodeLlama model (finetune/trl_finetune_codellama_instruct.sh) and the other with the Llama 2 chat model (finetune/trl_finetune_llama2_chat.sh). You can also customize the finetuning job using finetune/trl_finetune.py.

Notes

  • As of September 2023, the finetuned open-source models (e.g., CodeLlama, Llama 2) still substantially underperform the finetuned closed-source model (gpt-3.5-turbo). We expect this gap to narrow quickly as open-source models improve.
  • Finetuning requires roughly 24 GB+ of GPU memory for 7B models and 36 GB+ for 13B models.

Bibtex

If you find our paper useful and use it in a publication, we would appreciate the following citation:

@misc{fang2024transcrib3d3dreferringexpression,
      title={Transcrib3D: 3D Referring Expression Resolution through Large Language Models}, 
      author={Jiading Fang and Xiangshan Tan and Shengjie Lin and Igor Vasiljevic and Vitor Guizilini and Hongyuan Mei and Rares Ambrus and Gregory Shakhnarovich and Matthew R Walter},
      year={2024},
      eprint={2404.19221},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2404.19221}, 
}
