This is the official PyTorch implementation of CORA (CVPR 2023).
CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching (CVPR 2023)
Xiaoshi Wu, Feng Zhu, Rui Zhao, Hongsheng Li
We propose CORA, a DETR-style framework that adapts CLIP for Open-vocabulary detection by Region prompting and Anchor pre-matching. Our method achieves state-of-the-art results on both the COCO and LVIS OVD benchmarks.
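At a high level, region prompting adds learnable prompt embeddings to region features pooled from the CLIP image encoder so that region-level features align better with CLIP's text embeddings, which are then used for CLIP-style classification. The snippet below is a minimal, illustrative sketch of this idea; the module name, shapes, and temperature are assumptions for exposition and do not reflect the repository's actual code:

```python
import torch
import torch.nn as nn

class RegionPrompter(nn.Module):
    """Illustrative sketch of region prompting (not the official module).

    Learnable prompt embeddings are added to region features pooled from the
    CLIP image encoder before CLIP-style matching against text embeddings.
    """

    def __init__(self, feat_dim: int = 1024, num_positions: int = 7 * 7):
        super().__init__()
        # One learnable offset per spatial position of the pooled region feature map.
        self.prompts = nn.Parameter(torch.zeros(num_positions, feat_dim))

    def forward(self, region_feats: torch.Tensor) -> torch.Tensor:
        # region_feats: (num_regions, num_positions, feat_dim),
        # e.g. RoI-aligned features from the CLIP image encoder.
        return region_feats + self.prompts


def classify_regions(region_embeds: torch.Tensor,
                     text_embeds: torch.Tensor,
                     temperature: float = 0.01) -> torch.Tensor:
    """CLIP-style cosine-similarity classification of regions against class names."""
    region_embeds = region_embeds / region_embeds.norm(dim=-1, keepdim=True)
    text_embeds = text_embeds / text_embeds.norm(dim=-1, keepdim=True)
    return region_embeds @ text_embeds.t() / temperature
```

Anchor pre-matching applies the same kind of CLIP similarity to anchor boxes before decoding, so that each query is conditioned on a candidate class; please refer to the paper for the exact formulation.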
- Linux with Python ≥ 3.9.12
- CUDA 11
- The provided environment is suggested for reproducing our results; similar configurations may also work.
```bash
# environment
conda create -n cora python=3.9.12
conda activate cora
conda install pytorch==1.12.0 torchvision==0.13.0 cudatoolkit=11.3 -c pytorch

# cora
git clone git@github.com:tgxs002/CORA.git
cd CORA

# other dependencies
pip install -r requirements.txt

# install detectron2
```
Please install detectron2 as instructed in the official tutorial (https://detectron2.readthedocs.io/en/latest/tutorials/install.html). We use version 0.6 in our experiments.
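As a quick sanity check (this snippet is illustrative and not part of the repository), you can verify from Python that the expected versions are picked up:

```python
# Illustrative environment check; not part of the repository.
import torch
import torchvision
import detectron2

print("torch:", torch.__version__)              # expected 1.12.0
print("torchvision:", torchvision.__version__)  # expected 0.13.0
print("CUDA available:", torch.cuda.is_available())
print("detectron2:", detectron2.__version__)    # expected 0.6
```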
Check `docs/dataset.md` for dataset preparation.
Besides the dataset, we also provide the files necessary to reproduce our results. Please download the learned region prompts and put them under the `logs` folder. A guide for training the region prompts is provided in Region Prompting.
Method | Pretraining Model | Novel AP50 | All AP50 | Checkpoint |
---|---|---|---|---|
CORA | RN50 | 35.1 | 35.4 | Checkpoint |
CORA | RN50x4 | 41.7 | 43.8 | Checkpoint |
Checkpoints for LVIS will be released soon.
Run the following command to evaluate the RN50 model:

```bash
# if you are running locally
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh test 8 local --resume /path/to/checkpoint.pth --eval

# if you are running on a cluster with slurm scheduler
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh test 8 slurm quota_type partition_name --resume /path/to/checkpoint.pth --eval
```
If you are using slurm, remember to replace `quota_type` and `partition_name` with your quota type and the partition you are using. You can change the config and checkpoint path directly to evaluate other models.
Before training the localizer, please make sure that the region prompts and relabeled annotations are prepared as instructed in Data Preparation.
Run the following command to train the RN50 model:
```bash
# if you are running locally
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh RN50 8 local

# if you are running on a cluster with slurm scheduler
bash configs/COCO/R50_dab_ovd_3enc_apm128_splcls0.2_relabel_noinit.sh RN50 8 slurm quota_type partition_name
```
If you are using slurm, remember to replace `quota_type` and `partition_name` with your quota type and the partition you are using. You can change the config directly to train other models.
We provide pre-trained region prompts as specified in Data Preparation. Please refer to the region branch for training and exporting the region prompts:
```bash
git checkout region
```
The code for CLIP-Aligned Labeling will be released soon in another branch of this repository. In the meantime, we provide the pre-computed relabeled annotations as specified in Data Preparation.
If you find this repo useful, please consider citing our paper:
```
@article{wu2023cora,
  title={CORA: Adapting CLIP for Open-Vocabulary Detection with Region Prompting and Anchor Pre-Matching},
  author={Xiaoshi Wu and Feng Zhu and Rui Zhao and Hongsheng Li},
  journal={ArXiv},
  year={2023},
  volume={abs/2303.13076}
}
```
This repository was built on top of SAM-DETR, CLIP, RegionCLIP, and DAB-DETR. We thank the community for their efforts.