Authors: Qingwei Wang, Jinyu Yang, Xiaosheng Yu, Fangyi Wang, Peng Chen, Feng Zheng.
🔥🔥🔥 This repository provides code for "Depth-aided Camouflaged Object Detection" (ACM MM 2023).
Figure 1: Examples of camouflaged objects, with ground truth and depth maps generated by monocular depth estimation. Examples (a) to (c) show cases where camouflaged objects are salient in the depth images, while examples (d) to (f) show cases where the depth maps are less helpful for COD.
Camouflaged Object Detection (COD) aims to identify and segment objects that blend into their surroundings. Because the color and texture of camouflaged objects are extremely similar to those of the surrounding environment, precisely detecting them is highly challenging for vision models. Inspired by research on biology and evolution, we introduce depth information as an additional cue to help break camouflage, since it provides spatial information and a texture-free separation between foreground and background. To mine clues about camouflaged objects in both the RGB and depth modalities, we propose Depth-aided Camouflaged Object Detection (DaCOD), which involves two key components. We first propose the Multi-modal Collaborative Learning (MCL) module, which collaboratively learns deep features from both RGB and depth channels via a hybrid backbone. We then propose a novel Cross-modal Asymmetric Fusion (CAF) strategy, which asymmetrically fuses RGB and depth information to enhance the complementary depth features and produce accurate predictions. We conducted extensive experiments on three widely used, challenging COD benchmark datasets, on which DaCOD outperforms the current state-of-the-art methods by a large margin.
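Standard COD benchmarks provide only RGB images, so the depth maps used as the additional cue are generated by monocular depth estimation (see Figure 1). Below is a hedged sketch of producing such a pseudo depth map with MiDaS via torch.hub; MiDaS is only one possible choice, the input filename is hypothetical, and the exact estimator used to prepare the depth data for this repository may differ.

```python
import cv2
import torch

# Monocular depth estimation with MiDaS (one possible estimator; the depth
# maps shipped with the datasets may have been produced with a different model).
midas = torch.hub.load("intel-isl/MiDaS", "DPT_Large")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.dpt_transform

img = cv2.imread("camouflage_example.jpg")           # hypothetical input image
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

with torch.no_grad():
    prediction = midas(transform(img))               # (1, H', W') relative depth
    depth = torch.nn.functional.interpolate(
        prediction.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()                                      # resize back to the input resolution

# Normalize to 8-bit and save as a grayscale depth image.
depth = (255 * (depth - depth.min()) / (depth.max() - depth.min())).cpu().numpy()
cv2.imwrite("camouflage_example_depth.png", depth.astype("uint8"))
```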
Figure 2: Overall framework of the proposed DaCOD. RGB and depth images are first concatenated along the batch dimension ("batch connection") and fed into two different backbones, a ResNet and a Swin Transformer, for collaborative learning. The collaborative features are then separated by the Batch Split Block (BSB) and sent to the Cross-modal Asymmetric Fusion (CAF) module to produce the final prediction.
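For intuition, here is a minimal PyTorch sketch of the Figure 2 data flow. It is illustrative only: the wiring follows the caption above, the backbone and CAF modules are placeholders supplied by the caller, and combining the two backbone outputs by summation assumes they have been projected to a common shape, which the actual MCL module handles differently.

```python
import torch
import torch.nn as nn

class DaCODSketch(nn.Module):
    """Hypothetical sketch of the Figure 2 pipeline, not the repository's exact code."""

    def __init__(self, swin_backbone, resnet_backbone, caf):
        super().__init__()
        self.swin = swin_backbone      # Swin Transformer branch of the hybrid backbone
        self.resnet = resnet_backbone  # ResNet branch of the hybrid backbone
        self.caf = caf                 # Cross-modal Asymmetric Fusion + prediction head

    def forward(self, rgb, depth):
        # Assumes the single-channel depth map has been replicated to 3 channels
        # so it can share the same backbones as the RGB image.
        b = rgb.size(0)

        # "Batch connection": stack RGB and depth along the batch axis so both
        # modalities are processed by the same backbone weights (MCL).
        x = torch.cat([rgb, depth], dim=0)            # (2B, 3, H, W)

        # Hybrid backbone; summation is a placeholder for how the two feature
        # streams are actually merged.
        feats = self.swin(x) + self.resnet(x)

        # Batch Split Block (BSB): recover the per-modality features.
        rgb_feat, depth_feat = feats[:b], feats[b:]

        # CAF fuses the two streams asymmetrically into the final camouflage map.
        return self.caf(rgb_feat, depth_feat)
```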
The training and testing experiments are conducted in PyTorch on a single NVIDIA RTX 3090 GPU with 24 GB of memory.
Download swin_large_patch4_window7_224_22k.pth here (Code: ksv5) and put it into .\backbone
Download resnet50-19c8e357.pth here (Code: qxju) and put it into .\backbone
Download 55.pth here (Code: sj8w) and put it into .\checkpoints\Depth_cod
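After these three downloads, the pretrained weights and the checkpoint should sit roughly as follows (directory names are taken from the paths above; the project root name is just a placeholder):

```
DaCOD/
├── backbone/
│   ├── swin_large_patch4_window7_224_22k.pth
│   └── resnet50-19c8e357.pth
└── checkpoints/
    └── Depth_cod/
        └── 55.pth
```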
All data can be downloaded from Google Drive.
You can download the datasets here (Code: hqxc)
You can download the results here (Code: h7vj)
Please cite our paper if you find this work useful:
@inproceedings{10.1145/3581783.3611874,
  author    = {Wang, Qingwei and Yang, Jinyu and Yu, Xiaosheng and Wang, Fangyi and Chen, Peng and Zheng, Feng},
  title     = {Depth-Aided Camouflaged Object Detection},
  year      = {2023},
  publisher = {Association for Computing Machinery},
  booktitle = {Proceedings of the 31st ACM International Conference on Multimedia},
  series    = {MM '23}
}