Going deeper with Image Transformers

Touvron, Hugo; Cord, Matthieu; Sablayrolles, Alexandre; Synnaeve, Gabriel; Jégou, Hervé

arXiv:2103.17239v1 (cs)

[Submitted on 31 Mar 2021 (this version), latest version 7 Apr 2021 (v2)]

Abstract: Transformers have recently been adapted for large-scale image classification, achieving high scores that shake up the long supremacy of convolutional neural networks. However, the optimization of image transformers has been little studied so far. In this work, we build and optimize deeper transformer networks for image classification. In particular, we investigate the interplay of architecture and optimization of such dedicated transformers. We make two transformer architecture changes that significantly improve the accuracy of deep transformers. This leads us to produce models whose performance does not saturate early with more depth; for instance, we obtain 86.3% top-1 accuracy on Imagenet when training with no external data. Our best model establishes the new state of the art on Imagenet with Reassessed labels and Imagenet-V2 / match frequency, in the setting with no additional training data.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2103.17239 [cs.CV]
	(or arXiv:2103.17239v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.17239

Submission history

From: Hugo Touvron
[v1] Wed, 31 Mar 2021 17:37:32 UTC (3,594 KB)
[v2] Wed, 7 Apr 2021 08:08:39 UTC (3,295 KB)
