
This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World" (Accepted by ICCV 2023)


zhangjingxian1998/CaCao

 
 


CaCao

This is the official repository for the CaCao framework from the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World" (accepted by ICCV 2023).

Complete code for CaCao and boosted SGG

Here we provide sample code for boosting SGG datasets with CaCao in both the standard setting and the open-world setting.

Enhanced fine-grained predicates for VG

The enhanced dataset for VG training can be downloaded from this Google Drive link.

Dataset preparation

The VG dataset is required. Put all of its images in a single folder named VG_100K.

Download the following files:

VG-SGG.h5

objects.json

relationships.json

image_data.json

Put the files above in the ./datasets/vg/ folder.

Put coco2014_train in the ./datasets/coco folder.

Put the ViT model in the ./vit-base-patch32-224-in21k folder.

Put the BERT pytorch.bin file in the ./bert-base-uncased folder.
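Before running any script, it can help to confirm that the layout described above is in place. The following is a minimal sketch, not part of the repository: the helper name `find_missing` and the list `REQUIRED_PATHS` are hypothetical, and the paths are simply copied from the instructions above. The location of VG_100K is not stated here, so it is not checked.

```python
from pathlib import Path

# Paths copied from the dataset instructions above. Only the existence of each
# file or folder is checked, not its contents. VG_100K is left out because the
# README does not say where that folder should live.
REQUIRED_PATHS = [
    "datasets/vg/VG-SGG.h5",
    "datasets/vg/objects.json",
    "datasets/vg/relationships.json",
    "datasets/vg/image_data.json",
    "datasets/coco",
    "vit-base-patch32-224-in21k",
    "bert-base-uncased",
]


def find_missing(root=".", required=REQUIRED_PATHS):
    """Return the entries of `required` that do not exist under `root`."""
    root = Path(root)
    return [p for p in required if not (root / p).exists()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing files or folders:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected files and folders are present.")
```

Run it from the repository root; an empty result means every listed path exists.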

Running Script Tutorial

# create imdb_512.h5
python vg_to_imdb.py
# obtain initialized clusters for CaCao
python adaptive_cluster.py 
# establish the mapping from open-world boosted data to target predicates for enhancement
python fine_grained_mapping.py 
# obtain cross-modal prompt tuning models for better predicate boosting
python cross_modal_tuning.py --mode 50 
python cross_modal_tuning.py --mode all
# enhance the existing SGG dataset with our CaCao model in <pre_trained_visually_prompted_model>
python fine_grained_predicate_boosting_data_prepare.py --mode 50 
python fine_grained_predicate_boosting_data_prepare.py --mode all

python fine_grained_predicate_boosting.py --mode 50
python fine_grained_predicate_boosting.py --mode all 
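The commands above can also be chained from a small driver so that a failing step stops the run. This is a hedged sketch rather than a script shipped with the repository: `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical names, and running one `--mode` value per invocation is an assumption, since the tutorial lists both `50` and `all` for each step without saying whether they are alternatives.

```python
import subprocess
import sys

# Script names and flags copied from the tutorial above, in the same order.
# "{mode}" is filled in with "50" or "all" when the commands are built.
PIPELINE = [
    ["vg_to_imdb.py"],
    ["adaptive_cluster.py"],
    ["fine_grained_mapping.py"],
    ["cross_modal_tuning.py", "--mode", "{mode}"],
    ["fine_grained_predicate_boosting_data_prepare.py", "--mode", "{mode}"],
    ["fine_grained_predicate_boosting.py", "--mode", "{mode}"],
]


def build_commands(mode):
    """Return the full command line for every step, for one --mode value."""
    if mode not in ("50", "all"):
        raise ValueError("mode must be '50' or 'all'")
    return [
        [sys.executable] + [arg.format(mode=mode) for arg in step]
        for step in PIPELINE
    ]


def run_pipeline(mode, dry_run=False):
    """Print each command and, unless dry_run is set, execute it in order."""
    for cmd in build_commands(mode):
        print("$ " + " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the commands; call run_pipeline("50") to actually run them.
    run_pipeline("50", dry_run=True)
```

Keeping the dry run as the default here is deliberate: the steps are long-running, so it is worth inspecting the command list before executing anything.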

Quantitative Analysis

[Figure: quantitative analysis]

Qualitative Analysis

[Figures: qualitative visualizations]

Predicate Boosting

[Figure: predicate boosting]

Predicate Prediction Distribution

[Figures: predicate prediction distributions]

Acknowledgement

The SGG code is built on Scene-Graph-Benchmark.pytorch, FGPL, and SSRCNN (One-Stage). We thank the authors for their great work!

📜 Citation

If you find this work useful for your research, please cite our paper and star our git repo:

@article{yu2023visually,
  title={Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World},
  author={Yu, Qifan and Li, Juncheng and Wu, Yu and Tang, Siliang and Ji, Wei and Zhuang, Yueting},
  journal={arXiv preprint arXiv:2303.13233},
  year={2023}
}

or

@inproceedings{yu2023visually,
  title={Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World},
  author={Yu, Qifan and Li, Juncheng and Wu, Yu and Tang, Siliang and Ji, Wei and Zhuang, Yueting},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2023}
}
