# SimREC

[![Python](https://img.shields.io/badge/python-blue.svg)](https://www.python.org/) ![PyTorch](https://img.shields.io/badge/pytorch-%237732a8) [![MAC-Group](https://img.shields.io/badge/mac-group-orange.svg)](https://mac.xmu.edu.cn/)

SimREC is a simple and lightweight codebase for research on referring expression comprehension and segmentation, supporting large-scale pre-training and multi-task learning. SimREC includes the official implementations of our papers "[Multi-task Collaborative Network for Joint Referring Expression Comprehension and Segmentation](https://openaccess.thecvf.com/content_CVPR_2020/papers/Luo_Multi-Task_Collaborative_Network_for_Joint_Referring_Expression_Comprehension_and_Segmentation_CVPR_2020_paper.pdf)" and "[What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study](https://arxiv.org/pdf/2204.07913.pdf)". More models will be added.

## Updates

- (2022/4/20) Our novel point-based prediction paradigm [SeqTR](https://arxiv.org/abs/2203.16265) is released [here](https://github.com/sean-zhuh/SeqTR) and will also be added to this repository.
- (2022/4/20) Implement MCN in the SimREC project.
- (2022/4/19) Release our SimREC project.

## Installation

- Clone this repo:

  ```bash
  git clone https://github.com/luogen1996/SimREC.git
  cd SimREC
  ```

- Create a conda virtual environment and activate it:

  ```bash
  conda create -n simrec python=3.7 -y
  conda activate simrec
  ```

- Install PyTorch following the [official installation instructions](https://pytorch.org/get-started/locally/).
- Install mmcv following the [installation guide](https://github.com/open-mmlab/mmcv#installation).
- Install [Spacy](https://spacy.io/), initialize the [GloVe vectors](https://github.com/explosion/spacy-models/releases/tag/en_vectors_web_lg-2.1.0), and install the other requirements as follows:

  ```bash
  pip install -r requirements.txt
  wget https://github.com/explosion/spacy-models/releases/download/en_vectors_web_lg-2.1.0/en_vectors_web_lg-2.1.0.tar.gz -O en_vectors_web_lg-2.1.0.tar.gz
  pip install en_vectors_web_lg-2.1.0.tar.gz
  ```

## Data preparation

- Follow the instructions in [DATA_PRE_README.md](./DATA_PRE_README.md) to generate the training and testing data.
- Download the pretrained weights of the backbones (vgg, darknet, cspdarknet, DResNet, etc.). Except for DResNet, all pretrained backbones are trained on the COCO 2014 *train+val* set, with the images that appear in the *val+test* sets of RefCOCO, RefCOCO+, and RefCOCOg removed (nearly 6,500 images). Please follow the instructions in [DATA_PRE_README.md](./DATA_PRE_README.md) to download them.

## Training and Evaluation

1. **Config preparation**. Prepare your own configs in [configs](./configs); you don't need to rewrite every field each time. You can import a config as a Python file and reuse the defaults in [configs/common](./configs/common) as follows:

   ```python
   # your own config.py
   from simrec.config import LazyCall
   from .common.dataset import dataset
   from .common.train import train
   from .common.optim import optim
   from .common.models.simrec import model

   # modify the config to fit your own needs
   dataset.ann_path["refcoco"] = "path/to/refcoco_data"
   train.output_dir = "./path/to/output_dir"
   ...
   ```

2. **Train the model**. Run `train.sh` under [tools](./tools) to start training, for example, training the `simrec` model on 4 GPUs:

   ```shell
   bash tools/train.sh config/simrec_refcoco_scratch.py 4
   ```

   The training logs, tensorboard logs, `config.yaml`, and model checkpoints are saved automatically under `cfg.train.output_dir`.

3. **Resume training**. Two resume modes are supported: you can resume from the latest checkpoint or from a specific one.

   - Auto-resume from `last_checkpoint.pth`:

     ```python
     # config.py
     from .common.train import train

     train.auto_resume.enabled = True
     ```

     Setting `train.auto_resume.enabled = True` automatically resumes training from the `last_checkpoint.pth` saved in `cfg.train.output_dir`.

   - Resume from a specific checkpoint:

     ```python
     # config.py
     from .common.train import train

     # disable auto-resume first
     train.auto_resume.enabled = False

     # set the resume path
     train.resume_path = "path/to/specific/checkpoint.pth"
     ```

     Set `train.resume_path` to the specific `checkpoint.pth` you want to resume from.

4. **Test the model**. Run `eval.sh` under [tools](./tools) to evaluate a saved checkpoint (to evaluate several checkpoints in one go, see the sketch right after this list):

   ```shell
   bash tools/eval.sh config/simrec_refcoco_scratch.py 4 /path/to/checkpoint
   ```
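Since `eval.sh` takes the checkpoint path as its last argument, sweeping all checkpoints saved by one run reduces to a plain bash loop over the command above. This is a minimal sketch: the output directory and `*.pth` naming below are illustrative, not fixed by SimREC.

```shell
# evaluate every checkpoint under an output directory (illustrative path and naming)
for ckpt in ./path/to/output_dir/*.pth; do
    echo "Evaluating ${ckpt}"
    bash tools/eval.sh config/simrec_refcoco_scratch.py 4 "${ckpt}"
done
```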
## Model Zoo

SimREC supports all benchmarks of REC and RES and easily achieves very competitive performance. More results are available in the [Model Zoo](https://github.com/luogen1996/SimREC/blob/main/MODEL_ZOO.md).

| Tasks   | RefCOCO | RefCOCO+ | RefCOCOg | Referit | Flickr | All REC data + VG (0.2M) |
| :-----: | :-----: | :------: | :------: | :-----: | :----: | :----------------------: |
| REC     | ✓       | ✓        | ✓        | ✓       | ✓      | ✓                        |
| RES     | ✓       | ✓        | ✓        | ✓       | ×      | ×                        |
| REC+RES | ✓       | ✓        | ✓        | ✓       | ×      | ×                        |

## License

This project is released under the [Apache 2.0 license](LICENSE).

## Citation

If this repository is helpful for your research, or if you refer to the provided results in your paper, please consider citing:

```BibTeX
@article{luo2022what,
  title={What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study},
  author={Luo, Gen and Zhou, Yiyi and Sun, Jiamu and Huang, Shubin and Sun, Xiaoshuai and Ye, Qixiang and Wu, Yongjian and Ji, Rongrong},
  journal={arXiv preprint arXiv:2204.07913},
  year={2022}
}
```

```BibTeX
@article{zhu2022seqtr,
  title={SeqTR: A Simple yet Universal Network for Visual Grounding},
  author={Zhu, Chaoyang and Zhou, Yiyi and Shen, Yunhang and Luo, Gen and Pan, Xingjia and Lin, Mingbao and Chen, Chao and Cao, Liujuan and Sun, Xiaoshuai and Ji, Rongrong},
  journal={arXiv preprint arXiv:2203.16265},
  year={2022}
}
```

```BibTeX
@InProceedings{Luo_2020_CVPR,
  author={Luo, Gen and Zhou, Yiyi and Sun, Xiaoshuai and Cao, Liujuan and Wu, Chenglin and Deng, Cheng and Ji, Rongrong},
  title={Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month={June},
  year={2020}
}
```

```BibTeX
@inproceedings{luo2020cascade,
  title={Cascade grouped attention network for referring expression segmentation},
  author={Luo, Gen and Zhou, Yiyi and Ji, Rongrong and Sun, Xiaoshuai and Su, Jinsong and Lin, Chia-Wen and Tian, Qi},
  booktitle={Proceedings of the 28th ACM International Conference on Multimedia},
  pages={1274--1282},
  year={2020}
}
```

```BibTeX
@article{zhou2021real,
  title={A real-time global inference network for one-stage referring expression comprehension},
  author={Zhou, Yiyi and Ji, Rongrong and Luo, Gen and Sun, Xiaoshuai and Su, Jinsong and Ding, Xinghao and Lin, Chia-Wen and Tian, Qi},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2021},
  publisher={IEEE}
}
```

```BibTeX
@misc{luo2022simrec,
  author={Gen Luo and Tianhe Ren},
  title={SimREC},
  howpublished={\url{https://github.com/luogen1996/SimREC}},
  year={2022}
}
```

## Acknowledgement

Thanks for the nicely organized code from the following repos:

- [OpenVQA](https://github.com/MILVLG/openvqa)
- [Swin-Transformer](https://github.com/microsoft/Swin-Transformer)
- [Detectron2](https://github.com/facebookresearch/detectron2)