By Zeren Chen, Gengshi Huang, Wei Li, Jianing Teng, Kun Wang, Jing Shao, Chen Change Loy, Lu Sheng.
This repository is an official implementation of the paper Siamese DETR.
We propose Siamese DETR, a novel self-supervised pretraining method for DETR. With two newly designed pretext tasks, we directly locate the query regions generated by EdgeBoxes in a cross-view manner and maximize cross-view semantic consistency, learning localization and discrimination representations that transfer to downstream detection tasks.
This code is implemented on top of the MMSelfSup codebase. We test the code with Python 3.6, PyTorch 1.8.1, CUDA 9.0 and cuDNN 7.6.5. Other requirements can be installed via:
pip install -r requirements/runtime.txt --user
Compile the wheel following the instructions in the Deformable DETR repo.
You can generate offline EdgeBoxes via:
cd tools/hycplayers
pip install -r requirements.txt --user
sh run.sh <YOUR_PARTITION> <NUM_PROCESS> <PATH/TO/DATA/ROOT> <PATH/TO/DATA/META> <PATH/TO/SAVE/DIR> <DATASET> # <DATASET> should be imagenet or coco
Siamese DETR uses a frozen SwAV backbone to extract features. You can download the SwAV pretrained backbone ([email protected] top1) from the SwAV repo.
We provide COCO-style PASCAL VOC meta files (see data/datasets/voc_meta) for downstream finetuning. Put them into your VOC directory.
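Before finetuning, it can help to confirm that a meta file really follows the COCO detection layout, whose top-level keys are images, annotations and categories. The following is a minimal sketch under that assumption; the helper names check_coco_style and load_and_check are hypothetical and not part of this repository.

```python
import json

# Top-level keys of a COCO-style detection annotation file.
REQUIRED_KEYS = ("images", "annotations", "categories")


def check_coco_style(meta):
    """Return the required COCO keys that are missing from a loaded meta dict."""
    return [key for key in REQUIRED_KEYS if key not in meta]


def load_and_check(path):
    """Load a JSON meta file and raise if it lacks a required COCO key."""
    with open(path, "r") as f:
        meta = json.load(f)
    missing = check_coco_style(meta)
    if missing:
        raise ValueError("{} is missing COCO keys: {}".format(path, missing))
    return meta
```

Calling load_and_check on each JSON file under data/datasets/voc/meta should return without raising if the files are in the expected format.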
mkdir -p data/datasets/edgebox data/model_zoo/resnet
ln -s path/to/swav/backbone data/model_zoo/resnet/swav_800ep_pretrain_oss.pth.tar
# dataset
ln -s path/to/coco2017 data/datasets/mscoco2017
ln -s path/to/imagenet data/datasets/imagenet
ln -s path/to/voc data/datasets/voc
cp -rf data/datasets/voc_meta data/datasets/voc/meta
# edgebox
ln -s path/to/imagenet/edgebox data/datasets/edgebox/imagenet
ln -s path/to/coco/edgebox data/datasets/edgebox/coco
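A broken symlink is easy to miss until training fails. The sketch below checks that each path created by the commands above resolves to something on disk. The path list is copied from those commands; the helper name find_missing is hypothetical.

```python
import os

# Paths created by the setup commands, relative to the repository root.
EXPECTED_PATHS = [
    "data/model_zoo/resnet/swav_800ep_pretrain_oss.pth.tar",
    "data/datasets/mscoco2017",
    "data/datasets/imagenet",
    "data/datasets/voc",
    "data/datasets/voc/meta",
    "data/datasets/edgebox/imagenet",
    "data/datasets/edgebox/coco",
]


def find_missing(root=".", expected=EXPECTED_PATHS):
    """Return the expected paths that do not resolve under root.

    os.path.exists follows symlinks, so a dangling link counts as missing.
    """
    return [p for p in expected if not os.path.exists(os.path.join(root, p))]
```

Running find_missing() from the repository root returns an empty list when every link points at an existing target.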
We provide an example for downstream finetuning, i.e., Conditional DETR in downstream_finetune/conditionaldetr.
The primary modifications compared to the original repo include:
- Add the --pretrain argument to load a Siamese DETR pretrained checkpoint. (See main.py L160-168)
- Add the PASCAL VOC dataset. (See voc.py)
# upstream pretraining on ImageNet/COCO
# You can check work_dirs/selfsup/siamese_detr/<cfgs>/<time>.log.json for training details.
sh tools/srun_train.sh <PARTITION> <CONFIG> <NUM_GPU> <JOB_NAME>
# convert openselfsup (OSS) checkpoint to detr checkpoint
python tools/convert_to_detr.py --ckpt <OSS_CKPT> --export <SAVE_DIR> [--deform]
# downstream finetune
cd downstream_finetune/<DETR_VARIANTS_DIR>
sh [train_coco.sh|train_voc.sh] <PARTITION> <CONVERTED_CKPT> <NUM_GPU> <JOB_NAME>
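To inspect pretraining progress, the .log.json file mentioned above can be parsed directly. The sketch below assumes the file holds one JSON object per line, as MMCV-style text loggers usually write, and that training records carry a mode field equal to "train" and a loss field. Both field names are assumptions to verify against your own log, and read_train_losses is a hypothetical helper.

```python
import json


def read_train_losses(lines, loss_key="loss"):
    """Collect loss values from an iterable of JSON-lines log records.

    Lines that are blank, are not training records, or lack loss_key are skipped.
    """
    losses = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("mode") == "train" and loss_key in record:
            losses.append(float(record[loss_key]))
    return losses
```

An open file object can be passed as lines, for example `with open(log_path) as f: losses = read_train_losses(f)`.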
We provide pretrained checkpoints here.
Transfer results on ImageNet -> COCO (we report AP on the downstream benchmark)
Method | Vanilla | ConditionDETR-100q | DeformableDETR-MS-300q |
---|---|---|---|
from scratch | 39.7 | 37.7 | 45.5 |
UP-DETR | 40.5 | 39.4 | 45.3 |
DETReg | 41.9 | 40.2 | 45.5 |
SiameseDETR | 42.0 | 40.5 | 46.4 |
Transfer results on ImageNet -> PASCAL VOC
Method | Vanilla | ConditionDETR-100q | DeformableDETR-MS-300q |
---|---|---|---|
from scratch | 47.8 | 49.9 | 56.1 |
UP-DETR | 54.4 | 56.9 | 56.4 |
DETReg | 57.0 | 57.5 | 59.7 |
SiameseDETR | 57.4 | 58.1 | 61.2 |
Transfer results on COCO -> PASCAL VOC
Method | ConditionDETR-100q |
---|---|
from scratch | 49.9 |
UP-DETR | 51.3 |
DETReg | 55.9 |
SiameseDETR | 57.7 |
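To make the tables easier to compare, the sketch below computes the AP gain of SiameseDETR over training from scratch for each setting. The numbers are copied from the tables above; the dictionary layout and the helper name gain_over_scratch are only illustrative.

```python
# AP values copied from the transfer tables (from-scratch and SiameseDETR rows).
RESULTS = {
    "ImageNet -> COCO": {
        "from scratch": {"Vanilla": 39.7, "ConditionDETR-100q": 37.7, "DeformableDETR-MS-300q": 45.5},
        "SiameseDETR": {"Vanilla": 42.0, "ConditionDETR-100q": 40.5, "DeformableDETR-MS-300q": 46.4},
    },
    "ImageNet -> PASCAL VOC": {
        "from scratch": {"Vanilla": 47.8, "ConditionDETR-100q": 49.9, "DeformableDETR-MS-300q": 56.1},
        "SiameseDETR": {"Vanilla": 57.4, "ConditionDETR-100q": 58.1, "DeformableDETR-MS-300q": 61.2},
    },
    "COCO -> PASCAL VOC": {
        "from scratch": {"ConditionDETR-100q": 49.9},
        "SiameseDETR": {"ConditionDETR-100q": 57.7},
    },
}


def gain_over_scratch(table, method="SiameseDETR"):
    """Return the per-detector AP difference between method and from scratch."""
    base = table["from scratch"]
    ours = table[method]
    # Round to one decimal to match the precision reported in the tables.
    return {name: round(ours[name] - base[name], 1) for name in base}


for setting, table in RESULTS.items():
    print(setting, gain_over_scratch(table))
```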
Siamese DETR is released under the Apache 2.0 license. Please see the LICENSE file for more information.
@article{chen2023siamese,
title={Siamese DETR},
author={Chen, Zeren and Huang, Gengshi and Li, Wei and Teng, Jianing and Wang, Kun and Shao, Jing and Loy, Chen Change and Sheng, Lu},
journal={arXiv preprint arXiv:2303.18144},
year={2023}
}