M2TR: Multi-modal Multi-scale Transformers for Deepfake Detection

Wang, Junke; Wu, Zuxuan; Ouyang, Wenhao; Han, Xintong; Chen, Jingjing; Lim, Ser-Nam; Jiang, Yu-Gang


arXiv:2104.09770 (cs)

[Submitted on 20 Apr 2021 (v1), last revised 19 Apr 2022 (this version, v3)]



Abstract: The widespread dissemination of Deepfakes demands effective approaches that can detect perceptually convincing forged images. In this paper, we aim to capture the subtle manipulation artifacts at different scales using transformer models. In particular, we introduce a Multi-modal Multi-scale TRansformer (M2TR), which operates on patches of different sizes to detect local inconsistencies in images at different spatial levels. M2TR further learns to detect forgery artifacts in the frequency domain to complement RGB information through a carefully designed cross-modality fusion block. In addition, to stimulate Deepfake detection research, we introduce a high-quality Deepfake dataset, SR-DF, which consists of 4,000 Deepfake videos generated by state-of-the-art face swapping and facial reenactment methods. We conduct extensive experiments to verify the effectiveness of the proposed method, which outperforms state-of-the-art Deepfake detection methods by clear margins.
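
The idea of operating on patches of different sizes can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `split_into_patches` and `multi_scale_patches`, the patch sizes (2, 4, 8), and the 8x8 toy grayscale image are all assumptions chosen for illustration, and the frequency-domain branch and fusion block are not covered.

```python
from typing import Dict, List, Sequence

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def split_into_patches(image: Image, patch: int) -> List[Image]:
    """Split an H x W image into non-overlapping patch x patch tiles (row-major order)."""
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    tiles = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


def multi_scale_patches(image: Image, patch_sizes: Sequence[int] = (2, 4, 8)) -> Dict[int, List[Image]]:
    """Map each patch size to its list of tiles. Sizes are hypothetical, not from the paper."""
    return {p: split_into_patches(image, p) for p in patch_sizes}


# Toy 8x8 image whose pixel value encodes its position (row * 8 + column).
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
scales = multi_scale_patches(image)
for size, tiles in scales.items():
    print(f"patch size {size}: {len(tiles)} tiles")
```

Smaller patches yield more, finer-grained tiles from the same image, while larger patches yield fewer tiles that each cover a wider region. A multi-scale model would attend over each of these tile sets separately.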

Comments:	accepted by ICMR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2104.09770 [cs.CV]
	(or arXiv:2104.09770v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2104.09770

Submission history

From: Junke Wang
[v1] Tue, 20 Apr 2021 05:43:44 UTC (5,171 KB)
[v2] Wed, 21 Apr 2021 12:59:29 UTC (5,143 KB)
[v3] Tue, 19 Apr 2022 06:08:33 UTC (21,724 KB)

