Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking

Wang, Ning; Zhou, Wengang; Wang, Jie; Li, Houqiang

arXiv:2103.11681 (cs)

[Submitted on 22 Mar 2021 (v1), last revised 24 Mar 2021 (this version, v2)]

Abstract: In video object tracking, there exist rich temporal contexts among successive frames, which have been largely overlooked in existing trackers. In this work, we bridge the individual video frames and explore the temporal contexts across them via a transformer architecture for robust object tracking. Different from classic usage of the transformer in natural language processing tasks, we separate its encoder and decoder into two parallel branches and carefully design them within the Siamese-like tracking pipelines. The transformer encoder promotes the target templates via attention-based feature reinforcement, which benefits the high-quality tracking model generation. The transformer decoder propagates the tracking cues from previous templates to the current frame, which facilitates the object searching process. Our transformer-assisted tracking framework is neat and trained in an end-to-end manner. With the proposed transformer, a simple Siamese matching approach is able to outperform the current top-performing trackers. By combining our transformer with the recent discriminative tracking pipeline, our method sets several new state-of-the-art records on prevalent tracking benchmarks.
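
Illustrative sketch (not from the paper): the two branches described in the abstract can be pictured with plain scaled dot-product attention. The NumPy code below is a minimal sketch under stated assumptions, not the authors' implementation. The feature shapes, the random inputs, the residual-plus-normalization step, and every function and variable name are hypothetical choices made only for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(query, key, value):
    """Scaled dot-product attention.

    Each query row is replaced by a weighted mix of the value rows,
    with weights given by query-key similarity.
    """
    scale = np.sqrt(query.shape[-1])
    weights = softmax(query @ key.T / scale, axis=-1)
    return weights @ value, weights


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each feature vector to unit length (assumed normalization)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


rng = np.random.default_rng(0)
dim = 16

# Assumed toy inputs: 3 past template frames with 10 spatial positions each,
# and one current search frame with 20 spatial positions.
templates = rng.standard_normal((3 * 10, dim))
search = rng.standard_normal((20, dim))

# Encoder-like branch: template features attend to each other, so every
# template position is reinforced by similar positions in other frames.
enc_out, enc_weights = attention(templates, templates, templates)
reinforced = l2_normalize(templates + enc_out)

# Decoder-like branch: the current search features attend to the reinforced
# templates, carrying cues from previous frames into the current one.
dec_out, dec_weights = attention(search, reinforced, reinforced)
search_out = l2_normalize(search + dec_out)

print(reinforced.shape, search_out.shape)
```

In this toy setup `reinforced` would feed whatever tracking model is built from the templates, and `search_out` would be the feature map that model is matched against. How those outputs are actually consumed is specified in the paper, not here.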

Comments:	To appear in CVPR 2021 (Oral)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2103.11681 [cs.CV]
	(or arXiv:2103.11681v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2103.11681

Submission history

From: Ning Wang
[v1] Mon, 22 Mar 2021 09:20:05 UTC (2,085 KB)
[v2] Wed, 24 Mar 2021 09:23:57 UTC (2,098 KB)
