Distilling Cognitive Backdoor Patterns within an Image: A SOTA Method for Backdoor Sample Detection

Code for ICLR 2023 Paper "Distilling Cognitive Backdoor Patterns within an Image"

Use Cognitive Distillation on a pretrained model and images.

lr: the learning rate (step size) for extracting the mask.
p: the L_p norm constraint of the mask.
gamma (alpha used in the paper) and beta: hyperparameters for the objective function.
num_steps: number of steps for extracting the mask.
preprocessor: image preprocessor. For example, if input normalization is used, pass torchvision.transforms.Normalize(mean, std); otherwise, pass torch.nn.Identity().
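
For orientation, the objective that these hyperparameters control can be sketched as below. This is a paraphrase that assumes the formulation given in the paper, where $f_\theta$ is the model, $x$ the input image, $m$ the mask with values in $[0, 1]$, $\delta$ a random noise term, and $\mathrm{TV}$ the total variation of the mask. Refer to the paper for the authoritative definition.

```latex
\min_{m}\; \lVert f_\theta(x) - f_\theta(x_{cp}) \rVert_1
  + \alpha \lVert m \rVert_p
  + \beta \, \mathrm{TV}(m),
\qquad
x_{cp} = x \odot m + (1 - m) \odot \delta
```

Here $\alpha$ corresponds to the gamma argument and $\beta$ to beta, while lr and num_steps configure the optimizer that solves for $m$.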

import torch
from cognitive_distillation import CognitiveDistillation

images = ...  # placeholder: batch of images (torch.Tensor) [b,c,h,w]
model = ...  # placeholder: a pre-trained model (torch.nn.Module)
preprocessor = torch.nn.Identity()  # or torchvision.transforms.Normalize(mean, std)

cd = CognitiveDistillation(lr=0.1, p=1, gamma=0.01, beta=10.0, num_steps=100)
masks = cd(model, images, preprocessor=preprocessor) # the extracted masks (torch.Tensor) [b,1,h,w]
cognitive_pattern = images * masks # extracted cognitive pattern (torch.Tensor) [b,c,h,w]
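
One way to turn the extracted masks into per-sample detection scores is to take the L1 norm of each mask. The sketch below assumes, following the paper's observation, that backdoored samples tend to yield smaller masks than clean ones. The random `masks` tensor only stands in for the output of `cd(...)`, and the thresholding rule is a hypothetical choice made for illustration, not the rule used in the paper.

```python
import torch

torch.manual_seed(0)

# Stand-in for the output of cd(model, images, ...): shape [b, 1, h, w], values in [0, 1].
masks = torch.rand(8, 1, 32, 32)

# Per-sample L1 norm of the mask; a smaller value means less of the image was kept.
scores = masks.abs().sum(dim=(1, 2, 3))  # shape [b]

# Hypothetical rule (an assumption, not from the paper): flag samples whose
# mask norm falls more than one standard deviation below the batch mean.
threshold = scores.mean() - scores.std()
suspected = scores < threshold  # boolean tensor, shape [b]
```

In practice the threshold would be calibrated on data known to be clean rather than on the batch being screened.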

Visualizations of the masks and Cognitive Patterns

Reproduce results from the paper

Configurations for each experiment are stored in the configs/ folder.
Trigger patterns can be downloaded from the NAD GitHub repo.
ISSBA poisoned data can be downloaded from the ISSBA GitHub repo.
The dynamic attack generator can be downloaded from the Dynamic Attack GitHub repo.
For the DFST attack, data can be generated using the DFST GitHub repo.
Other triggers (the trigger folder in this repo) can be downloaded from this Google Drive.
Frequency detector model weights can be downloaded from this Google Drive. Note that this model is trained on the GTSRB dataset (reproduced using PyTorch), based on frequency-backdoor.

Train a model

$exp_path: the path where you want to store experiment results, checkpoints, logs
$exp_config: where the experiment config is located
$exp_name: name of the specific experiment configurations (*.yaml)

python train.py --exp_path $exp_path \
 --exp_config $exp_config \
 --exp_name $exp_name

Run detections

The following command will save the detection results (e.g., masks of Cognitive Distillation, a confidence score for other baselines) to $exp_path.

The --method argument specifies the detection method: one of ['CD', 'ABL', 'Feature', 'FCT', 'STRIP'].
$gamma is the hyperparameter value for Cognitive Distillation.
'Feature' is used to extract deep features (used by AC and SS).
ABL does not need a separate detection run. All training losses are stored in $exp_path.

python extract.py --exp_path $exp_path \
 --exp_config $exp_config \
 --exp_name $exp_name \
 --method "CD" --gamma $gamma
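
To run several detection methods in sequence, the command above can be assembled programmatically. The following is a minimal sketch: the paths and experiment name are hypothetical placeholders, and passing --gamma only for 'CD' is an assumption about the script rather than documented behavior.

```python
import shlex

def build_extract_cmd(exp_path, exp_config, exp_name, method, gamma=None):
    """Build the argv list for one extract.py run, using the flags shown above."""
    cmd = ["python", "extract.py",
           "--exp_path", exp_path,
           "--exp_config", exp_config,
           "--exp_name", exp_name,
           "--method", method]
    if gamma is not None:
        cmd += ["--gamma", str(gamma)]
    return cmd

# Hypothetical paths and names, for illustration only.
for method in ["CD", "Feature", "FCT", "STRIP"]:
    gamma = 0.01 if method == "CD" else None  # assumption: only CD needs --gamma
    cmd = build_extract_cmd("experiments/demo", "configs/demo", "demo_exp", method, gamma)
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```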

Analyze detection results

The following command computes AUPRC/AUROC for the saved detection results.

The --method argument specifies the detection method: one of ['CD', 'AC', 'ABL', 'FCT', 'Frequency', 'SS', 'STRIP'].

python detect_analysis.py --exp_path $exp_path \
                          --exp_config $exp_config \
                          --exp_name $exp_name \
                          --gamma $gamma
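
As a reference for what the AUROC figure measures, the sketch below computes it directly from scores and ground-truth labels using the pairwise-ranking definition. It is an illustration on toy data, not the repository's implementation; detect_analysis.py may compute the metric differently.

```python
def auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative.

    scores: higher means more likely backdoored; labels: 1 = backdoor, 0 = clean.
    Ties count as half. Runs in O(n_pos * n_neg), which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: mask L1 norms, where backdoor samples (label 1) have smaller norms.
mask_norms = [12.0, 15.0, 300.0, 280.0, 20.0]
labels = [1, 1, 0, 0, 0]
# Negate so that a higher score means "more likely backdoored".
result = auroc([-v for v in mask_norms], labels)
```

The negation reflects the assumption that smaller masks indicate backdoored samples; for baselines whose scores already increase with suspicion, the scores would be passed unchanged.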

Citation

If you use this code in your work, please cite the accompanying paper:

@inproceedings{
huang2023distilling,
title={Distilling Cognitive Backdoor Patterns within an Image},
author={Hanxun Huang and Xingjun Ma and Sarah Monazam Erfani and James Bailey},
booktitle={ICLR},
year={2023},
}

Acknowledgements

This research was undertaken using the LIEF HPC-GPGPU Facility hosted at the University of Melbourne. This Facility was established with the assistance of LIEF Grant LE170100200. The authors would like to thank Yige Li for sharing several of the triggers used in the experiments.

Part of the code is based on the following repositories:

Dynamic Attack: https://github.com/VinAIResearch/input-aware-backdoor-attack-release
Spectral Signatures: https://github.com/MadryLab/backdoor_data_poisoning
STRIP: https://github.com/garrisongys/STRIP
NAD: https://github.com/bboylyg/NAD
ABL: https://github.com/bboylyg/ABL
ISSBA: https://github.com/yuezunli/ISSBA
Frequency: https://github.com/YiZeng623/frequency-backdoor
https://github.com/JonasGeiping/data-poisoning/blob/main/forest/filtering_defenses.py
