Anchor DETR: Query Design for Transformer-Based Object Detection

Wang, Yingming; Zhang, Xiangyu; Yang, Tong; Sun, Jian

arXiv:2109.07107 (cs)

[Submitted on 15 Sep 2021 (v1), last revised 4 Jan 2022 (this version, v2)]

Abstract: In this paper, we propose a novel query design for transformer-based object detection. In previous transformer-based detectors, the object queries are a set of learned embeddings. However, each learned embedding has no explicit physical meaning, and we cannot explain where it will focus. It is also difficult to optimize, as the prediction slot of each object query does not have a specific mode. In other words, each object query does not focus on a specific region. To solve these problems, our query design bases object queries on anchor points, which are widely used in CNN-based detectors, so that each object query focuses on the objects near its anchor point. Moreover, our query design can predict multiple objects at one position, which addresses the "one region, multiple objects" difficulty. In addition, we design an attention variant that reduces the memory cost while achieving similar or better performance than the standard attention in DETR. Thanks to the query design and the attention variant, the proposed detector, which we call Anchor DETR, achieves better performance and runs faster than DETR with 10$\times$ fewer training epochs. For example, it achieves 44.2 AP at 19 FPS on the MSCOCO dataset when using the ResNet50-DC5 feature and training for 50 epochs. Extensive experiments on the MSCOCO benchmark demonstrate the effectiveness of the proposed methods. Code is available at this https URL.
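
The query design described in the abstract can be illustrated with a minimal sketch. It assumes a uniform grid of anchor points, a sinusoidal position encoding, and a small set of pattern embeddings shared across all anchors. The function names (grid_anchor_points, sine_encode, build_queries), the grid layout, the random pattern values, and the additive combination are all illustrative assumptions, not the authors' released implementation.

```python
import math
import random


def grid_anchor_points(rows, cols):
    """Return (x, y) anchor points at the cell centres of a rows x cols grid in [0, 1]^2."""
    return [((c + 0.5) / cols, (r + 0.5) / rows)
            for r in range(rows) for c in range(cols)]


def sine_encode(point, dim, temperature=10000.0):
    """Sinusoidal encoding of a 2D point (assumed scheme); dim must be divisible by 4."""
    if dim % 4 != 0:
        raise ValueError("dim must be divisible by 4")
    n_freq = dim // 4
    encoding = []
    for coord in point:
        for i in range(n_freq):
            angle = 2 * math.pi * coord / (temperature ** (i / n_freq))
            encoding.extend([math.sin(angle), math.cos(angle)])
    return encoding


def build_queries(anchors, patterns, dim):
    """Pair every anchor with every pattern: one query per (anchor, pattern).

    Several queries share one anchor, so several objects can be
    predicted at the same position.
    """
    queries = []
    for anchor in anchors:
        pos = sine_encode(anchor, dim)
        for p_idx, pattern in enumerate(patterns):
            queries.append({
                "anchor": anchor,
                "pattern": p_idx,
                # Assumption: pattern and position are combined by addition.
                "embedding": [p + q for p, q in zip(pattern, pos)],
            })
    return queries


# Usage: a 2 x 3 grid with 3 patterns per anchor.
random.seed(0)
DIM = 8
anchors = grid_anchor_points(2, 3)
# Stand-in for learned pattern embeddings (random values here).
patterns = [[random.uniform(-1, 1) for _ in range(DIM)] for _ in range(3)]
queries = build_queries(anchors, patterns, DIM)
print(len(queries))  # 6 anchors x 3 patterns = 18 queries
```

Because each query carries the position of its anchor, its prediction can be interpreted as "objects near this point", and the multiple patterns per anchor give the "one region, multiple objects" case a separate slot for each object.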

Comments:	Accepted to AAAI 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2109.07107 [cs.CV]
	(or arXiv:2109.07107v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2109.07107

Submission history

From: Yingming Wang
[v1] Wed, 15 Sep 2021 06:31:55 UTC (338 KB)
[v2] Tue, 4 Jan 2022 08:20:42 UTC (394 KB)
