CaCao

This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World" (accepted by ICCV 2023).

Complete code for CaCao and boosted SGG

Here we provide sample code for boosting SGG datasets with CaCao in both the standard setting and the open-world setting.

Enhanced fine-grained predicates for VG

The enhanced dataset for VG training can be downloaded from this Google Drive link.

Dataset preparation

The VG dataset is required. Put all of its images in a single folder named VG_100K.

Download

Put the above in the ./datasets/vg/ folder.

Put coco2014_train in the ./datasets/coco folder.

Put ViT in the ./vit-base-patch32-224-in21k folder.

Put the BERT pytorch.bin in the ./bert-base-uncased folder.
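Before running any script, it can help to confirm that the folders described above are in place. The following is a minimal sketch rather than part of the repository: the helper name `missing_dirs` is hypothetical, and placing `VG_100K` directly under `datasets/vg` is an assumption based on the steps above.

```python
from pathlib import Path

# Folders named in the preparation steps above.
# Nesting VG_100K directly under datasets/vg is an assumption.
EXPECTED_DIRS = [
    "datasets/vg",
    "datasets/vg/VG_100K",
    "datasets/coco",
    "vit-base-patch32-224-in21k",
    "bert-base-uncased",
]


def missing_dirs(root="."):
    """Return the expected folders that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing folders:")
        for d in missing:
            print("  " + d)
    else:
        print("All expected folders are present.")
```

Run it from the repository root. It only checks that the folders exist, not that the files inside them are complete.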

Running Script Tutorial

# create imdb_512.h5
python vg_to_imdb.py

# obtain initialized clusters for CaCao
python adaptive_cluster.py

# establish the mapping from open-world boosted data to target predicates for enhancement
python fine_grained_mapping.py

# obtain cross-modal prompt tuning models for better predicate boosting
python cross_modal_tuning.py --mode 50
python cross_modal_tuning.py --mode all

# enhance the existing SGG dataset with our CaCao model in <pre_trained_visually_prompted_model>
# step 1: prepare the data for predicate boosting
python fine_grained_predicate_boosting_data_prepare.py --mode 50
python fine_grained_predicate_boosting_data_prepare.py --mode all

# step 2: run predicate boosting
python fine_grained_predicate_boosting.py --mode 50
python fine_grained_predicate_boosting.py --mode all
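
The commands above can also be chained from one small wrapper, which makes the order explicit and stops at the first failing step. This is a sketch under assumptions, not a script shipped with the repository: `build_commands`, `run_pipeline`, and the `--run` flag are hypothetical names belonging to the wrapper only, and it assumes the scripts are launched from the repository root in the order listed above.

```python
import subprocess
import sys

# Scripts that take no arguments, in the order listed above.
SETUP_SCRIPTS = [
    "vg_to_imdb.py",
    "adaptive_cluster.py",
    "fine_grained_mapping.py",
]

# Scripts that are run once per --mode value.
MODE_SCRIPTS = [
    "cross_modal_tuning.py",
    "fine_grained_predicate_boosting_data_prepare.py",
    "fine_grained_predicate_boosting.py",
]


def build_commands(modes=("50", "all")):
    """Return the tutorial commands as argument lists, in tutorial order."""
    commands = [[sys.executable, script] for script in SETUP_SCRIPTS]
    for script in MODE_SCRIPTS:
        for mode in modes:
            commands.append([sys.executable, script, "--mode", mode])
    return commands


def run_pipeline(modes=("50", "all"), dry_run=True):
    """Print each command; also execute it when dry_run is False."""
    for command in build_commands(modes):
        print(" ".join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # --run is a flag of this wrapper only, not of the repository scripts.
    run_pipeline(dry_run="--run" not in sys.argv)
```

By default the wrapper only prints the nine commands. Passing `--run` executes them one after another.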

Quantitative Analysis

Qualitative Analysis

Predicate Boosting

Predicate Prediction Distribution

Acknowledgement

The SGG code is built on Scene-Graph-Benchmark.pytorch, FGPL, and SSRCNN (One-Stage). We thank the authors for their great work!

📜 Citation

If you find this work useful for your research, please cite our paper and star our git repo:

@article{yu2023visually,
  title={Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World},
  author={Yu, Qifan and Li, Juncheng and Wu, Yu and Tang, Siliang and Ji, Wei and Zhuang, Yueting},
  journal={arXiv preprint arXiv:2303.13233},
  year={2023}
}

or

@inproceedings{yu2023visually,
  title={Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World},
  author={Yu, Qifan and Li, Juncheng and Wu, Yu and Tang, Siliang and Ji, Wei and Zhuang, Yueting},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}


Folders and files

Latest commit

History

Repository files navigation

CaCao

Complete code for CaCao and boosted SGG

Enhanced fine-grained predicates for VG

Dataset prepare

Running Script Tutorial

Quantitative Analysis

Qualitative Analysis

Predicate Boosting

Predicate Prediction Distribution

Acknowledgement

📜 Citation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages