Image inpainting forensics aims to localize the inpainted regions within a test image. Although many strong techniques have been proposed for this task, most studies focus only on visible inpainting methods and perform poorly on invisible inpainting methods. To address this, this paper proposes a hybrid CNN-Transformer feature fusion network (HCTNet) for image inpainting forensics. First, we extract spatial features and noise features in parallel: the Transformer branch uses its inherent self-attention mechanism to capture global spatial relationships between pixels, and the CNN branch extracts the local noise features left behind in the inpainted region. Second, we use a cross-modal fusion (CMF) module to fuse the two types of multi-scale features. Finally, we design a progressive decoder that uses attention gates (AG) to combine multi-scale features more effectively and improve the localization of inpainted regions. Extensive experiments show that, on datasets covering multiple invisible inpainting methods, the proposed method localizes inpainted regions more accurately and is strongly robust.
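The abstract does not spell out the form of the attention gate used in the decoder. As an illustration only, the following minimal NumPy sketch assumes the common additive attention gate (as used in Attention U-Net), not necessarily HCTNet's exact design. The function names, channel sizes, and random weights are all hypothetical, and the gating features are assumed to be already resized to the skip features' resolution.

```python
import numpy as np

def conv1x1(x, weight, bias=None):
    """1x1 convolution on a (C_in, H, W) map; weight has shape (C_out, C_in)."""
    out = np.einsum("oc,chw->ohw", weight, x)
    if bias is not None:
        out = out + bias[:, None, None]
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(skip, gate, w_skip, w_gate, psi, b_int=None, b_psi=0.0):
    """Additive attention gate (assumed form, not taken from the HCTNet code).

    skip: (C_s, H, W) encoder (skip-connection) features
    gate: (C_g, H, W) decoder gating features, same spatial size as skip
    Returns the gated skip features and the (1, H, W) attention map.
    """
    # Project both inputs to a shared intermediate channel space and add them.
    q = conv1x1(skip, w_skip) + conv1x1(gate, w_gate, b_int)
    q = np.maximum(q, 0.0)  # ReLU
    # Collapse to one channel and squash to (0, 1) to get per-pixel weights.
    alpha = sigmoid(conv1x1(q, psi) + b_psi)
    # Broadcast the attention map over all skip channels.
    return skip * alpha, alpha

# Hypothetical sizes: 8 skip channels, 4 gating channels, 6 intermediate channels.
rng = np.random.default_rng(0)
skip = rng.standard_normal((8, 16, 16))
gate = rng.standard_normal((4, 16, 16))
w_skip = rng.standard_normal((6, 8)) * 0.1
w_gate = rng.standard_normal((6, 4)) * 0.1
psi = rng.standard_normal((1, 6)) * 0.1

gated, alpha = attention_gate(skip, gate, w_skip, w_gate, psi)
```

In a decoder of this kind, `gated` would typically be concatenated with the upsampled decoder features before the next convolution stage, so that skip features are down-weighted wherever the attention map is small.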
DreamBro-T/HCTNet