
APE: Aligning and Prompting Everything All at Once for Universal Visual Perception

🍇 [Read our arXiv Paper] 🍎 [Try our Online Demo]

💡 Highlight

High Performance. SotA (or competitive) performance on 160 datasets with only one model.
Perception in the Wild. Detect and segment everything with thousands of vocabularies or language descriptions all at once.
Flexible. Supports both foreground objects and background stuff for instance segmentation and semantic segmentation.

🔥 News

2024.02.27 APE has been accepted to CVPR 2024!
2023.12.05 Released training code!
2023.12.05 Released checkpoints!
2023.12.05 Released inference code and demo!

🏷️ TODO

Release inference code and demo.
Release checkpoints.
Release training code.
Add clean docs.

🛠️ Install

Clone the APE repository from GitHub:

git clone https://github.com/shenyunhang/APE
cd APE

Install the required dependencies and APE:

pip3 install -r requirements.txt
python3 -m pip install -e .

▶️ Local Demo

Web UI demo

pip3 install gradio
cd APE/demo
python3 app.py

The demo detects available GPUs and, if any are present, runs on one of them.

Please feel free to try our Online Demo!

📚 Data Prepare

Follow the instructions here to prepare the datasets below:

	COCO	LVIS	Objects365	Openimages	VisualGenome	SA-1B	RefCOCO	GQA	PhraseCut	Flickr30k	ODinW	SegInW	Roboflow100	ADE20k	ADE-full	BDD10k	Cityscapes	PC459	PC59	VOC	D3
Train	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗	✗
Test	✓	✓	✓	✓	✗	✗	✓	✗	✗	✗	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓	✓

Note that we do not use coco_2017_train for training.

Instead, we augment lvis_v1_train with annotations from COCO and keep the image set unchanged.

We register the result as lvis_v1_train+coco for instance segmentation and lvis_v1_train+coco_panoptic_separated for panoptic segmentation.
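
The augmentation described above can be pictured with a small sketch. It assumes COCO-style annotation dictionaries with `images` and `annotations` lists, and it relies on LVIS v1 reusing COCO 2017 images with the same image ids. The function name `augment_lvis_with_coco` and the toy data are hypothetical. Remapping categories between the two label spaces is deliberately left out, so this is an illustration of the idea, not the repository's registration code.

```python
def augment_lvis_with_coco(lvis, coco):
    """Add COCO annotations to LVIS images without changing the image set."""
    lvis_image_ids = {img["id"] for img in lvis["images"]}
    # Keep only COCO annotations that belong to images already in LVIS.
    extra = [ann for ann in coco["annotations"] if ann["image_id"] in lvis_image_ids]
    return {
        "images": list(lvis["images"]),  # image set is unchanged
        "annotations": list(lvis["annotations"]) + extra,
    }


# Toy data (hypothetical): image 2 exists only in COCO and must not be added.
lvis = {
    "images": [{"id": 1}],
    "annotations": [{"id": 10, "image_id": 1, "category": "teapot"}],
}
coco = {
    "images": [{"id": 1}, {"id": 2}],
    "annotations": [
        {"id": 20, "image_id": 1, "category": "person"},
        {"id": 21, "image_id": 2, "category": "car"},
    ],
}
merged = augment_lvis_with_coco(lvis, coco)
print(len(merged["images"]), len(merged["annotations"]))  # prints "1 2"
```

In real annotation files the annotation ids of the two sources may collide, so a full implementation would also need to reassign them.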

🧪 Inference

Infer on 160+ datasets

We provide several scripts to evaluate all models.

It is necessary to adjust the checkpoint location and GPU number in the scripts before running them.

scripts/eval_all_D.sh
scripts/eval_all_C.sh
scripts/eval_all_B.sh
scripts/eval_all_A.sh

Infer on images or videos

APE-D

python3.9 demo/demo_lazy.py \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py \
--input image1.jpg image2.jpg image3.jpg \
--output /path/to/output/dir \
--confidence-threshold 0.1 \
--text-prompt 'person,car,chess piece of horse head' \
--with-box \
--with-mask \
--with-sseg \
--opts \
train.init_checkpoint=/path/to/APE-D/checkpoint \
model.model_language.cache_dir="" \
model.model_vision.select_box_nums_for_evaluation=500 \
model.model_vision.text_feature_bank_reset=True \
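
For scripted use, the command above can be assembled programmatically. The sketch below only builds the argument list and does not run anything. Every flag and override is copied from the command above, while the helper name `build_demo_command` and its parameters are hypothetical. It assumes `--text-prompt` takes a single comma-separated string, as in the example.

```python
import shlex

CONFIG = (
    "configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/"
    "ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k.py"
)


def build_demo_command(inputs, output_dir, prompts, checkpoint, threshold=0.1):
    """Return the demo_lazy.py invocation as an argv-style list (hypothetical helper)."""
    return [
        "python3.9", "demo/demo_lazy.py",
        "--config-file", CONFIG,
        "--input", *inputs,
        "--output", output_dir,
        "--confidence-threshold", str(threshold),
        # All prompts are joined into one comma-separated argument.
        "--text-prompt", ",".join(prompts),
        "--with-box", "--with-mask", "--with-sseg",
        "--opts",
        f"train.init_checkpoint={checkpoint}",
        # In the shell command the "" is stripped, leaving an empty value.
        "model.model_language.cache_dir=",
        "model.model_vision.select_box_nums_for_evaluation=500",
        "model.model_vision.text_feature_bank_reset=True",
    ]


cmd = build_demo_command(
    inputs=["image1.jpg", "image2.jpg"],
    output_dir="/path/to/output/dir",
    prompts=["person", "car", "chess piece of horse head"],
    checkpoint="/path/to/APE-D/checkpoint",
)
# shlex.join quotes arguments that contain spaces, such as the text prompt.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the repository root.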

To disable xformers, add the following option:

model.model_vision.backbone.net.xattn=False \

To use the PyTorch implementation of MultiScaleDeformableAttention, add the following options:

model.model_vision.transformer.encoder.pytorch_attn=True \
model.model_vision.transformer.decoder.pytorch_attn=True \
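
When commands are generated from a script, these optional overrides can be collected by a small helper and appended after `--opts`. The override strings below are the ones listed above; the function name `extra_overrides` and its keyword arguments are hypothetical.

```python
def extra_overrides(disable_xformers=False, pytorch_deformable_attn=False):
    """Return optional config overrides to append after --opts."""
    opts = []
    if disable_xformers:
        opts.append("model.model_vision.backbone.net.xattn=False")
    if pytorch_deformable_attn:
        # Encoder and decoder are switched together.
        opts.append("model.model_vision.transformer.encoder.pytorch_attn=True")
        opts.append("model.model_vision.transformer.decoder.pytorch_attn=True")
    return opts


for opt in extra_overrides(disable_xformers=True, pytorch_deformable_attn=True):
    print(opt)
```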

🚋 Training

Prepare backbone and language models

git lfs install
git clone https://huggingface.co/QuanSun/EVA-CLIP models/QuanSun/EVA-CLIP/
git clone https://huggingface.co/BAAI/EVA models/BAAI/EVA/
git clone https://huggingface.co/Yuxin-CV/EVA-02 models/Yuxin-CV/EVA-02/

Resize patch size:

python3.9 tools/eva_interpolate_patch_14to16.py --input models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14_plus_s9B.pt --output models/QuanSun/EVA-CLIP/EVA02_CLIP_E_psz14to16_plus_s9B.pt --image_size 224
python3.9 tools/eva_interpolate_patch_14to16.py --input models/QuanSun/EVA-CLIP/EVA01_CLIP_g_14_plus_psz14_s11B.pt --output models/QuanSun/EVA-CLIP/EVA01_CLIP_g_14_plus_psz14to16_s11B.pt --image_size 224
python3.9 tools/eva_interpolate_patch_14to16.py --input models/QuanSun/EVA-CLIP/EVA02_CLIP_L_336_psz14_s6B.pt --output models/QuanSun/EVA-CLIP/EVA02_CLIP_L_336_psz14to16_s6B.pt --image_size 336
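
The three commands follow one pattern: the output file name is the input name with `psz14` replaced by `psz14to16`, and only the image size varies. The sketch below generates them from a table; the file names, script path and flags come from the commands above, while `interpolate_command` is a hypothetical helper.

```python
import shlex

ROOT = "models/QuanSun/EVA-CLIP"

# (checkpoint file name, --image_size) pairs taken from the commands above.
CHECKPOINTS = [
    ("EVA02_CLIP_E_psz14_plus_s9B.pt", 224),
    ("EVA01_CLIP_g_14_plus_psz14_s11B.pt", 224),
    ("EVA02_CLIP_L_336_psz14_s6B.pt", 336),
]


def interpolate_command(name, image_size):
    """Build one eva_interpolate_patch_14to16.py invocation as an argv list."""
    out_name = name.replace("psz14", "psz14to16")
    return [
        "python3.9", "tools/eva_interpolate_patch_14to16.py",
        "--input", f"{ROOT}/{name}",
        "--output", f"{ROOT}/{out_name}",
        "--image_size", str(image_size),
    ]


for name, size in CHECKPOINTS:
    print(shlex.join(interpolate_command(name, size)))
```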

Train APE-D

Single node:

python3.9 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_`date +'%Y%m%d_%H%M%S'`

Multiple nodes:

python3.9 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl_`date +'%Y%m%d_%H'`0000
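
The single-node and multi-node commands differ in the timestamp appended to `train.output_dir`: the former uses second resolution, the latter truncates to the hour and pads with `0000`. A plausible reason, stated here as an assumption and not something the repository documents, is that nodes launched a few seconds apart still resolve to the same directory. The sketch below reproduces both suffixes in Python; the function names are hypothetical.

```python
from datetime import datetime


def single_node_suffix(now):
    # Mirrors `date +'%Y%m%d_%H%M%S'`
    return now.strftime("%Y%m%d_%H%M%S")


def multi_node_suffix(now):
    # Mirrors `date +'%Y%m%d_%H'`0000
    return now.strftime("%Y%m%d_%H") + "0000"


# Two nodes started five seconds apart (illustrative times).
node0 = datetime(2023, 12, 5, 14, 3, 7)
node1 = datetime(2023, 12, 5, 14, 3, 12)

print(single_node_suffix(node0), single_node_suffix(node1))
# prints "20231205_140307 20231205_140312" (different directories)
print(multi_node_suffix(node0), multi_node_suffix(node1))
# prints "20231205_140000 20231205_140000" (same directory)
```

Launches that straddle an hour boundary would still end up with different suffixes, so it may be safer to compute the directory once and pass the same value to every node.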

Train APE-C

Single node:

python3.9 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H%M%S'`

Multiple nodes:

python3.9 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H'`0000

Train APE-B

Single node:

python3.9 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H%M%S'`

Multiple nodes:

python3.9 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k_`date +'%Y%m%d_%H'`0000

Train APE-A

Single node:

python3.9 tools/train_net.py \
--num-gpus 8 \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k_`date +'%Y%m%d_%H%M%S'`

Multiple nodes:

python3.9 tools/train_net.py \
--dist-url="tcp://${MASTER_IP}:${MASTER_PORT}" \
--num-gpus ${HOST_GPU_NUM} \
--num-machines ${HOST_NUM} \
--machine-rank ${INDEX} \
--resume \
--config-file configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k.py \
train.output_dir=output/APE/configs/LVISCOCOCOCOSTUFF_O365_OID_VG/ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k_`date +'%Y%m%d_%H'`0000
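
The four single-node commands share one template: only the config file changes, and the output directory is the config path without `.py`, prefixed with `output/APE/` and suffixed with a timestamp. The sketch below captures that pattern in a lookup table. The config paths and flags are copied from the commands above; `single_node_command` and its parameters are hypothetical.

```python
import shlex
from datetime import datetime

CONFIGS = {
    "APE-D": "configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO_GQA_PhraseCut_Flickr30k/"
             "ape_deta/ape_deta_vitl_eva02_clip_vlf_lsj1024_cp_16x4_1080k_mdl.py",
    "APE-C": "configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_SA1B_REFCOCO/"
             "ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py",
    "APE-B": "configs/LVISCOCOCOCOSTUFF_O365_OID_VGR_REFCOCO/"
             "ape_deta/ape_deta_vitl_eva02_vlf_lsj1024_cp_1080k.py",
    "APE-A": "configs/LVISCOCOCOCOSTUFF_O365_OID_VG/"
             "ape_deta/ape_deta_vitl_eva02_lsj1024_cp_720k.py",
}


def single_node_command(variant, num_gpus=8, now=None):
    """Build the single-node training command for one APE variant."""
    config = CONFIGS[variant]
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    # Output directory: config path without ".py", plus "_<timestamp>".
    output_dir = f"output/APE/{config[:-len('.py')]}_{stamp}"
    return [
        "python3.9", "tools/train_net.py",
        "--num-gpus", str(num_gpus),
        "--resume",
        "--config-file", config,
        f"train.output_dir={output_dir}",
    ]


print(shlex.join(single_node_command("APE-A")))
```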

🧳 Checkpoints

git lfs install
git clone https://huggingface.co/shenyunhang/APE

	name	Checkpoint	Config
1	APE-A	HF link	link
2	APE-B	HF link	link
3	APE-C	HF link	link
4	APE-D	HF link	link

🎖️ Results

✒️ Citation

If you find our work helpful for your research, please consider citing the following BibTeX entry.

@inproceedings{APE,
  title={Aligning and Prompting Everything All at Once for Universal Visual Perception},
  author={Shen, Yunhang and Fu, Chaoyou and Chen, Peixian and Zhang, Mengdan and Li, Ke and Sun, Xing and Wu, Yunsheng and Lin, Shaohui and Ji, Rongrong},
  booktitle={CVPR},
  year={2024}
}
