This is the open-source repository for our paper Learning Camouflaged Object Detection from Noisy Pseudo Label, accepted at ECCV 2024!
Our paper is available here: Paper
We introduce a novel training protocol named Weakly Semi-Supervised Camouflaged Object Detection (WSSCOD), which leverages box annotations, complemented by a minimal amount of pixel-level annotations, and uses the boxes as prompts to generate high-accuracy pseudo labels.
- Dataset Division:
  - $\mathcal{D}_m = \{\mathcal{X}_m, \mathcal{F}_m, \mathcal{B}_m\}_{m=1}^M$: training images $\mathcal{X}_m$ with pixel-level annotations $\mathcal{F}_m$ and box annotations $\mathcal{B}_m$.
  - $\mathcal{D}_n = \{\mathcal{X}_n, \mathcal{B}_n\}_{n=1}^N$: training images with box annotations only, where $M+N$ is the total number of training samples.
- Training ANet:
  - Train ANet on $\mathcal{D}_m$, using $\mathcal{B}_m$ as prompts and $\mathcal{F}_m$ as supervision.
- Generating Pseudo Labels:
  - Use the trained ANet on $\mathcal{D}_n$ to predict pseudo labels $\mathcal{W}_n$.
- Constructing the Weakly Semi-Supervised Dataset:
  - Combine $\{\mathcal{X}_m, \mathcal{F}_m\}_{m=1}^M$ and $\{\mathcal{X}_n, \mathcal{W}_n\}_{n=1}^N$ to form $\mathcal{D}_t$.
- Training PNet:
  - Train PNet on $\mathcal{D}_t$.
  - Evaluate performance with different $M$ and $N$ ratios:
    - PNet$_{F1}$: $M=1\%$, $N=99\%$
    - PNet$_{F5}$: $M=5\%$, $N=95\%$
    - PNet$_{F10}$: $M=10\%$, $N=90\%$
    - PNet$_{F20}$: $M=20\%$, $N=80\%$
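The steps above can be condensed into a short sketch. Every name here (`Sample`, `train_anet`, `generate_pseudo_labels`, `train_pnet`) is an illustrative stand-in, not the repository's real API; the actual implementations live in `code/TrainANet` and `code/TrainPNet`.

```python
# Minimal sketch of the two-stage WSSCOD pipeline (all names are placeholders).
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Sample:
    image: str                  # image id (stand-in for pixel data)
    box: tuple                  # box annotation B
    mask: Optional[str] = None  # pixel-level annotation F, or pseudo label W

def train_anet(d_m: List[Sample]) -> dict:
    """Stage 1: fit ANet on D_m, with boxes as prompts and masks as supervision."""
    return {"trained_on": len(d_m)}  # placeholder "model"

def generate_pseudo_labels(anet: dict, d_n: List[Sample]) -> List[Sample]:
    """Stage 2: the trained ANet predicts a pseudo label W_n for each box-only sample."""
    return [Sample(s.image, s.box, mask=f"pseudo({s.image})") for s in d_n]

def train_pnet(d_t: List[Sample]) -> dict:
    """Stage 3: train PNet on D_t = {X_m, F_m} U {X_n, W_n} (no boxes needed)."""
    return {"trained_on": len(d_t)}

# D_m: M images with pixel-level masks and boxes; D_n: N images with boxes only.
d_m = [Sample(f"img{m}", (0, 0, 10, 10), mask=f"gt{m}") for m in range(2)]
d_n = [Sample(f"img{n}", (5, 5, 20, 20)) for n in range(2, 10)]

anet = train_anet(d_m)
d_n_labeled = generate_pseudo_labels(anet, d_n)
d_t = d_m + d_n_labeled           # weakly semi-supervised training set
pnet = train_pnet(d_t)
print(pnet["trained_on"])         # M + N = 10
```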
| Aspect | ANet (Auxiliary Network) | PNet (Primary Network) |
| --- | --- | --- |
| Stage | First | Second |
| Objective | Generate high-accuracy pseudo labels | Main camouflaged object detection |
| Data Input | Subset $\mathcal{D}_m$ | Weakly semi-supervised dataset $\mathcal{D}_t$ |
| Training Dataset | $\mathcal{D}_m = \{\mathcal{X}_m, \mathcal{F}_m, \mathcal{B}_m\}_{m=1}^M$ | $\mathcal{D}_t = \{\mathcal{X}_m, \mathcal{F}_m\}_{m=1}^M \cup \{\mathcal{X}_n, \mathcal{W}_n\}_{n=1}^N$ |
| Annotations | Pixel-level | Pseudo labels |
| Supervision | Pixel-level | Pseudo labels |
| Input Prompts | Box annotations | Images |
| Performance Evaluation | - | Different settings: PNet$_{F1}$, PNet$_{F5}$, PNet$_{F10}$, PNet$_{F20}$ |
| Training Goal | Generate high-quality pseudo labels | Improve detection accuracy with various $M$ and $N$ ratios |
We have made the training and test sets available for download via the following links:
- Google Drive
- BaiDu Drive (Passwd: ECCV)
Once downloaded, place `data.zip` in the `code/data` directory and unzip it.
```shell
python code/TrainANet/TrainDDP.py --gpu_id 0 --ration 1
# ration represents the proportion of pixel-level labels
# we find that training on a single GPU works better than on four or eight GPUs
```
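Assuming `--ration 1` means that 1% of the training images carry pixel-level labels (the PNet$_{F1}$ setting), the split sizes can be computed as below. The `split_counts` helper is hypothetical, and the 4040-image set size (the commonly used COD10K-train plus CAMO-train combination) and the rounding are illustrative assumptions; the repository's exact selection may differ.

```python
# Illustrative split: --ration is the percentage of pixel-level labels.
# split_counts is a hypothetical helper, not part of the repository.
def split_counts(total: int, ration: int):
    m = round(total * ration / 100)  # fully annotated images (D_m)
    n = total - m                    # box-only images (D_n)
    return m, n

# e.g. a 4040-image training set with ration=1 (the PNet_F1 setting):
print(split_counts(4040, 1))  # -> (40, 4000)
```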
```shell
python code/TrainANet/Test.py --ration 1
# ration represents the proportion of pixel-level labels
```
```shell
python code/TrainPNet/TrainDDP.py --gpu_id 0 --ration 1 --q_epoch 20 --batchsize_fully 6 --batchsize_weakly 24
# ration represents the proportion of pixel-level labels
# q_epoch is the epoch at which we change q to 1
# batchsize_fully is the number of fully annotated samples in a batch
# batchsize_weakly is the number of weakly annotated samples in a batch
```
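The two batch-size flags imply that each PNet training step mixes fully annotated and pseudo-labeled samples in one batch. Below is a minimal sketch of such mixed-batch sampling with pure-Python stand-ins; `cycle_batches` and the sample pools are hypothetical, not the repository's data loaders, and the `q_epoch` loss-schedule switch is orthogonal to batching and not shown.

```python
# Sketch of mixed-batch construction: each step draws batchsize_fully samples
# from the fully annotated pool and batchsize_weakly from the pseudo-labeled
# pool. All names here are illustrative placeholders.
import random

def cycle_batches(pool, batch_size, rng):
    """Yield an endless stream of shuffled, full-size batches from a finite pool."""
    while True:
        order = pool[:]
        rng.shuffle(order)
        for i in range(0, len(order) - batch_size + 1, batch_size):
            yield order[i:i + batch_size]

rng = random.Random(0)
fully = [f"full{i}" for i in range(40)]     # (X_m, F_m) samples
weakly = [f"weak{i}" for i in range(4000)]  # (X_n, W_n) samples

full_iter = cycle_batches(fully, 6, rng)    # --batchsize_fully 6
weak_iter = cycle_batches(weakly, 24, rng)  # --batchsize_weakly 24

for step in range(3):
    batch = next(full_iter) + next(weak_iter)  # 6 + 24 = 30 samples per step
print(len(batch))  # 30
```

With `--batchsize_fully 6 --batchsize_weakly 24`, every optimizer step therefore sees 30 samples, 20% of them carrying ground-truth masks regardless of the overall $M{:}N$ ratio.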
```shell
python code/TrainPNet/Test.py --ration 1
# ration represents the proportion of pixel-level labels
```
| Model | Pretrained Weight | Prediction Description |
| --- | --- | --- |
| PNet$_{F1}$ | BaiDu Link, Google Link | |
| PNet$_{F5}$ | BaiDu Link, Google Link | |
| PNet$_{F10}$ | BaiDu Link, Google Link | |
| PNet$_{F20}$ | BaiDu Link, Google Link | |
@inproceedings{OVCOS_ECCV2024,
title={Learning Camouflaged Object Detection from Noisy Pseudo Label},
author={Zhang, Jin and Zhang, Ruiheng and Shi, Yanjiao and Cao, Zhe and Liu, Nian and Khan, Fahad Shahbaz},
booktitle={ECCV},
year={2024},
}